HW 6 review #10

CanyonFoot · 2018-03-28T21:49:12Z

changclark · 2018-03-30T00:27:49Z

Data Tidying

voter_rate: If you convert the "YES" and "NO" responses to logicals (TRUE/FALSE), you can compute VOTE_PROP in summarize() with mean() instead of sum(), since the mean of a logical vector is the proportion of TRUE values.
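A minimal sketch of that idea with dplyr, on made-up toy data (the column name VOTED and the party codes are hypothetical, chosen only for illustration):

```r
library(dplyr)

# Hypothetical toy data: one row per voter, with a "YES"/"NO" voting flag
voters <- data.frame(
  PARTY_CODE = c("DEM", "DEM", "REP", "REP", "UNA"),
  VOTED      = c("YES", "NO", "YES", "YES", "NO"),
  stringsAsFactors = FALSE
)

voter_rate <- voters %>%
  mutate(VOTED = VOTED == "YES") %>%   # "YES" -> TRUE, "NO" -> FALSE
  group_by(PARTY_CODE) %>%
  summarize(VOTE_PROP = mean(VOTED))   # mean of a logical = share of TRUE

voter_rate
```

Converting once up front also means any later filtering or modelling can use the logical column directly.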

voter_2016_election: The way you've created the age column works but is a little opaque. The str_sub() function in the stringr package accepts negative indices, so you can select the last characters of a string, e.g. as.numeric(str_sub(BIRTH_DATE, -4, -1)). Alternatively, you can use the interval() function from lubridate to calculate the voters' ages.
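Both options side by side, as a sketch. It assumes BIRTH_DATE is stored as a "MM/DD/YYYY" string (an assumption; adjust the parser if the real format differs), and the three birth dates are invented:

```r
library(stringr)
library(lubridate)

# Hypothetical birth dates, assumed to be "MM/DD/YYYY" strings
voters <- data.frame(BIRTH_DATE = c("03/14/1950", "11/09/1990", "07/04/1998"),
                     stringsAsFactors = FALSE)

# Option 1: the last four characters are the birth year
birth_year  <- as.numeric(str_sub(voters$BIRTH_DATE, -4, -1))
age_by_year <- 2016 - birth_year

# Option 2: whole years between birth and election day (2016-11-08)
election_day <- ymd("2016-11-08")
age_exact    <- interval(mdy(voters$BIRTH_DATE), election_day) %/% years(1)
```

The year-only version overstates the age by one for anyone whose birthday falls after election day (the second voter here), which the interval() version handles.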

It would've been nice to see at least a quick glimpse() of what the final data frame looked like.

Model Fitting and Assessment

m5: I liked the variables you created for this model (Major_party and VOTE_PROP). However, since Major_party covaries with PARTY_CODE, including both as predictors makes the model a little less robust.
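To see the extreme case, suppose (an assumption, not something I checked in your code) that Major_party is TRUE exactly when PARTY_CODE is "DEM" or "REP". Then Major_party is a linear combination of the PARTY_CODE dummies and the intercept, and glm() cannot estimate it separately. A sketch on simulated, hypothetical data:

```r
set.seed(1)
n <- 200
# Hypothetical simulated voters; party codes and vote outcomes are random
d <- data.frame(PARTY_CODE = sample(c("DEM", "REP", "UNA", "LIB"), n, replace = TRUE))
d$Major_party <- d$PARTY_CODE %in% c("DEM", "REP")   # derived from PARTY_CODE
d$voted <- rbinom(n, 1, 0.6)

fit <- glm(voted ~ PARTY_CODE + Major_party, data = d, family = binomial)
coef(fit)   # Major_partyTRUE comes back NA: it is fully determined by PARTY_CODE
```

If the relationship is only partial rather than exact, the fit still runs but the two sets of coefficients become unstable and hard to interpret.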

There was a lot of repeated code for creating the confusion matrices. To reduce errors and improve readability, it might have been worth writing a function that computes the misclassification rate.
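A sketch of what such helpers could look like. The function names and the 0.5 cutoff are hypothetical, and the response is assumed to be coded TRUE/FALSE or 1/0; the usage lines use the built-in mtcars data purely as a stand-in:

```r
# Hypothetical helper: confusion matrix for a fitted logistic regression
confusion <- function(model, data, response, cutoff = 0.5) {
  p_hat <- predict(model, newdata = data, type = "response")  # fitted probabilities
  table(observed  = as.logical(data[[response]]),
        predicted = p_hat > cutoff)
}

# Hypothetical helper: share of cases where prediction and observation disagree
misclass_rate <- function(model, data, response, cutoff = 0.5) {
  p_hat <- predict(model, newdata = data, type = "response")
  mean((p_hat > cutoff) != as.logical(data[[response]]))
}

# Usage on mtcars (am: 1 = manual, 0 = automatic)
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
cm   <- confusion(fit, mtcars, "am")
rate <- misclass_rate(fit, mtcars, "am")
```

Each model then needs one call per data set (e.g. training vs. test) instead of a copied block.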

Interpretation

I would caution against using the size of a coefficient (beyond its sign) to judge how strongly a predictor influences the response when many other predictors are present. Furthermore, in a logistic regression it's even harder to interpret what the magnitude of a coefficient really means. You make a comparison with a model that didn't include age, which is good. Another measure you may want to consider is statistical significance (p-value).
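For a glm, the p-values live in the coefficient table returned by summary(). A sketch using mtcars as a stand-in for your voter models; exponentiating the coefficients turns them into odds ratios, which are often easier to read than raw log-odds:

```r
# Stand-in model on the built-in mtcars data
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, z value, Pr(>|z|)
pvals <- coefs[, "Pr(>|z|)"]

# Multiplicative change in the odds per one-unit increase, other predictors held fixed
odds_ratios <- exp(coef(fit))
```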
You've considered quite a few alternative models, which is good. Do you have any theories as to why your model works the best? Could there possibly be some overfitting going on? What does it mean to have a cubed age term?
For this question, I think we were supposed to look at the coefficients for motor_voter. In my models, this coefficient was consistently negative, which is a little unexpected unless you consider the source of the data.
While the models created aren't perfect, I think there's still a lot of useful information to be gathered from them. For example, the coefficients will also give a sense of which parties are more likely to vote without needing to create the tables. The supplementary tables created have a lot of interesting information, but it would have been more readable if the numbers were in proportions rather than raw counts.
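prop.table() converts a table of counts into proportions; with margin = 1 each row sums to 1, so every party's turnout reads directly. The vectors here are hypothetical toy data:

```r
# Hypothetical toy data
party <- c("DEM", "DEM", "REP", "REP", "UNA", "UNA")
voted <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

counts <- table(party, voted)
props  <- prop.table(counts, margin = 1)   # row proportions: each party sums to 1
round(props, 2)
```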

Other: It would have been nice if somewhere you at least reported the coefficients of the models created. You can use coef() if you think summary() takes up too much space.
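One compact option, sketched with two stand-in models on mtcars: collect the models in a named list and print only their rounded coefficients.

```r
# Stand-in models on the built-in mtcars data
models <- list(
  m_wt    = glm(am ~ wt,      data = mtcars, family = binomial),
  m_wt_hp = glm(am ~ wt + hp, data = mtcars, family = binomial)
)

# One short named vector per model instead of a full summary() printout
coef_list <- lapply(models, function(m) round(coef(m), 3))
coef_list
```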
