hw3 Review #5

Open
CanyonFoot opened this issue Feb 23, 2018 · 2 comments

@CanyonFoot
Collaborator

CanyonFoot commented Feb 23, 2018

@simonpcouch

@simonpcouch
Member

5.1: The challenge here was to distinguish home runs hit from home runs allowed in the graph, not to graph a ratio of the two. That is the step that required gather(): once both counts are stacked into a single column, hit and allowed can be plotted as separate lines, differentiated by color. Otherwise, clean and efficient code.
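
For reference, a minimal sketch of what I mean, using made-up numbers. The data frame `hr_wide` and its columns `yearID`, `HR`, and `HRA` are assumptions for illustration, not the actual homework data.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data: one row per season, with home runs hit (HR)
# and home runs allowed (HRA) in separate columns.
hr_wide <- data.frame(
  yearID = c(2014, 2015, 2016),
  HR     = c(125, 177, 218),
  HRA    = c(155, 152, 149)
)

# gather() stacks the two count columns into a key column ("type")
# and a value column ("home_runs"), giving one row per season and type.
hr_long <- gather(hr_wide, key = "type", value = "home_runs", HR, HRA)

# With the data in long form, a single geom_line() draws both series,
# and color tells them apart.
p <- ggplot(hr_long, aes(x = yearID, y = home_runs, color = type)) +
  geom_line()
```

Printing `p` draws the plot; the point is that `type` only exists as a column to map to color after the gather() step.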

5.2: Yup, this looks great! Efficient code, and really helpful that you put an example in to test it.

5.7: Super well done on this one. For one, recreating the dataset made testing the code really easy.

5.15: I also switched the "F" values to female; good looks on that one. Selecting only the years and names that had both a male and a female entry really helps in terms of computing power. The way you did it was really artful/creative, too.

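It's generally considered bad practice to put the next step on the same line as the pipe; start a new line after each %>%. Also, saving the intermediate steps (e.g. babynames1) might not be a good idea when working with datasets as big as this one, since each saved object is another full-size data frame kept in the workspace.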
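
A rough sketch of the style I have in mind: one step per line and no saved intermediates. The toy data frame `babynames_toy` is an assumption that only mimics the `year`, `sex`, `name`, and `n` columns of babynames, and the `n_distinct()` filter is just one way to keep name/year pairs with both sexes, not necessarily how you did it.

```r
library(dplyr)

# Hypothetical stand-in for the babynames data (same column names, tiny size).
babynames_toy <- data.frame(
  year = c(2000, 2000, 2000, 2001),
  sex  = c("F", "M", "F", "M"),
  name = c("Riley", "Riley", "Emma", "Riley"),
  n    = c(100, 300, 500, 250),
  stringsAsFactors = FALSE
)

# One unbroken pipe, a new line after every %>%, nothing like babynames1 saved.
both_sexes <- babynames_toy %>%
  group_by(year, name) %>%
  filter(n_distinct(sex) == 2) %>%  # keep groups that have both an F and an M row
  ungroup()
```

Here only the two Riley rows from 2000 survive, since that is the only name/year pair with both an "F" and an "M" entry.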

Really cool idea on how to filter the rare names! Last few lines are efficient and intuitive.

5.16: Any reason for saving the intermediate steps? As far as I can tell, breaking the pipe doesn't seem necessary.

Also, gather() and spread() are intended to do what you did with dplyr (ending in the full_join) but in many fewer steps. Your plots get back to the same thing I mentioned with 5.1; the goal of this is to tidy the data so that both geom_lines can be on the same plot, differentiated by color (or some other aesthetic). A final gather() right before piping into the plot would make the difference here. As well done as is possible using dplyr though!
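
To make that concrete, here is a small sketch under assumed data. The frame `temps_long` and its columns `month`, `layer`, and `temp` are hypothetical, since I'm guessing at the shape of the original dataset.

```r
library(tidyr)
library(ggplot2)

# Hypothetical long-format data: one row per month and layer.
temps_long <- data.frame(
  month = rep(1:3, times = 2),
  layer = rep(c("surface", "bottom"), each = 3),
  temp  = c(12.1, 13.4, 15.0, 8.2, 8.5, 9.1),
  stringsAsFactors = FALSE
)

# spread() gives one row per month with a column per layer in a single step,
# which is roughly what the split-then-full_join approach builds by hand.
temps_wide <- spread(temps_long, key = layer, value = temp)

# A final gather() right before plotting puts both layers back in one column,
# so one geom_line() can draw both, differentiated by color.
temps_plot <- gather(temps_wide, key = "layer", value = "temp", surface, bottom)

p <- ggplot(temps_plot, aes(x = month, y = temp, color = layer)) +
  geom_line()
```

The wide table is handy for reading the data, while the gathered version is what lets ggplot2 treat `layer` as an aesthetic.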

@CanyonFoot
Collaborator Author

Re: 5.16, I thought about doing it the way you described, but I understood each month to represent an observation, and therefore felt that it was appropriate to make the surface and bottom temperatures the columns. I split up the dataset and joined it just to make this a little easier. Additionally, the choice to create two plots was intentional. As you can see, the plots are similar, and even using color and alpha I found a single plot very difficult to interpret with both lines overlapping so much.
