HW 6 review #10

Open
CanyonFoot opened this issue Mar 28, 2018 · 1 comment
@CanyonFoot

@changclark

Data Tidying

voter_rate: If you change the "YES" and "NO" responses to booleans (TRUE/FALSE), you can summarize() VOTE_PROP using mean() instead of sum(), since the mean of a logical vector is the proportion of TRUE values.
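A minimal sketch of that idea, on made-up data. The column names VOTED and COUNTY are assumptions for illustration; only VOTE_PROP comes from your code.

```r
library(dplyr)

# Hypothetical toy data: VOTED and COUNTY are assumed column names.
voters <- data.frame(
  COUNTY = c("A", "A", "A", "B", "B"),
  VOTED  = c("YES", "NO", "YES", "NO", "NO"),
  stringsAsFactors = FALSE
)

voter_rate <- voters %>%
  mutate(VOTED = VOTED == "YES") %>%   # "YES"/"NO" becomes TRUE/FALSE
  group_by(COUNTY) %>%
  summarize(VOTE_PROP = mean(VOTED))   # mean of a logical = share of TRUE

voter_rate
```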

voter_2016_election: The way you've created the age column works but is a little opaque. Using str_sub() from the stringr package lets you select the last characters of a string, e.g. as.numeric(str_sub(BIRTH_DATE,-4,-1)). Alternatively, you can use the interval() function from lubridate to calculate the voters' ages.
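Here is a rough sketch of both options. I'm assuming BIRTH_DATE is a string in month-day-year order that ends in a four-digit year, and that age on the 2016 general election day (November 8, 2016) is what you want; adjust the parser and the reference date if your data differ.

```r
library(stringr)
library(lubridate)

# Hypothetical values: the "MM-DD-YYYY" format is an assumption.
BIRTH_DATE <- c("03-15-1990", "12-01-1950")

# Option 1: pull the birth year from the last four characters.
birth_year <- as.numeric(str_sub(BIRTH_DATE, -4, -1))

# Option 2: age in completed years on an assumed reference date.
election_day <- ymd("2016-11-08")
age <- interval(mdy(BIRTH_DATE), election_day) %/% years(1)
```

The second option accounts for whether the birthday had already passed by election day, which subtracting years alone does not.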

It would've been nice to have at least a glimpse() of what the final dataframe looked like.

Model Fitting and Assessment

m5: I liked the variables you created for this model (Major_party and VOTE_PROP). However, since Major_party covaries with PARTY_CODE, including both as predictors makes the model a little less robust: their coefficients become hard to estimate separately.

There was a lot of repeated code for creating the confusion matrices. To reduce errors and improve readability, it might have been worth the time to write a function for the misclassification rate.
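Something along these lines would do it. This is only a sketch: the name misclass_rate, its arguments, and the 0.5 cutoff are my own choices, and it assumes a binomial glm with a logical or 0/1 response. I'm using the built-in mtcars data as a stand-in for the voter data.

```r
# Hypothetical helper: returns the confusion matrix and the
# misclassification rate for a fitted binomial glm.
misclass_rate <- function(model, data, response, threshold = 0.5) {
  probs <- predict(model, newdata = data, type = "response")
  predicted <- probs > threshold            # TRUE = predicted "success"
  actual <- data[[response]]                # must be logical or 0/1
  list(
    confusion = table(predicted = predicted, actual = actual),
    rate = mean(predicted != actual)        # share of wrong predictions
  )
}

# Stand-in example: mtcars replaces the voter data here.
m_demo <- glm(am ~ wt, data = mtcars, family = binomial)
res <- misclass_rate(m_demo, mtcars, "am")
res$confusion
res$rate
```

Each model then needs only one call, e.g. misclass_rate(m5, your_data, "your_response"), instead of a copied block.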

Interpretation

  1. I would caution against using the values of the coefficients (beyond their signs) to judge the influence of a predictor on the response when many other predictors are present. Furthermore, with a logistic regression it's even harder to interpret what the absolute value of a coefficient really means. You make a comparison with a model that didn't include age, which is good. Another measure you may want to consider is significance (p-value).

  2. You've considered quite a few alternative models, which is good. Do you have any theories as to why your model works the best? Could there possibly be some overfitting going on? What does it mean to have a cubed age term?

  3. For this question, I think we were supposed to look at the coefficients for motor_voter. In my models, this coefficient was consistently negative, which would be a little unexpected if we weren't considering the source of the data.

  4. While the models created aren't perfect, I think there's still a lot of useful information to be gathered from them. For example, the coefficients also give a sense of which parties are more likely to vote without needing to create the tables. The supplementary tables have a lot of interesting information, but they would have been more readable if the numbers were shown as proportions rather than raw counts.
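On the last point in item 4, prop.table() converts a table of counts into proportions. A small sketch with made-up values (the party codes and the voted flag are hypothetical, not taken from your data):

```r
# Hypothetical toy counts of turnout by party.
counts <- table(
  party = c("DEM", "DEM", "REP", "REP", "REP", "NAV"),
  voted = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# margin = 1 gives row proportions: within each party, the share
# that did and did not vote.
props <- prop.table(counts, margin = 1)
round(props, 2)
```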

Other: It would have been nice if somewhere you at least reported the coefficients of the models created. You can use coef() if you think summary() takes up too much space.
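For instance, a compact way to report a fitted model, again with mtcars standing in for the voter data (m_demo is a hypothetical model, not one of yours). The coefficient matrix from summary() also carries the p-values mentioned under Interpretation.

```r
# Stand-in logistic regression on built-in data.
m_demo <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Just the coefficients (log-odds scale).
coef(m_demo)

# Estimates, standard errors, z values, and p-values as a matrix,
# without the rest of the summary() printout.
coef_table <- summary(m_demo)$coefficients
round(coef_table, 3)

# Exponentiating gives odds ratios, which are often easier to read.
exp(coef(m_demo))
```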
