
Make Work for RSocrata #1

Open
stuagano opened this issue May 23, 2016 · 2 comments

Comments

@stuagano

stuagano commented May 23, 2016

```r
# Get and clean Atlanta Police Department Data
get.apd.data <- function(selector.text, save.location){
  require(rvest)
  require(dplyr)
  temp <- tempfile()
  # Scrape the download link matching selector.text and fetch the zip file
  # (mode = "wb" keeps the zip from being corrupted on Windows)
  download.file(
    read_html("http://www.atlantapd.org/crimedatadownloads.aspx") %>%
      html_nodes(selector.text) %>%
      html_attr("href") %>%
      paste0("http://www.atlantapd.org/", .),
    destfile = temp, mode = "wb")
  # Save filename of csv crime data raw file, for later use
  crime.data.filename <- unzip(temp, list = TRUE)$Name
  # Unzip crime data to save.location and delete the temp file
  unzip(temp, exdir = save.location)
  unlink(temp)
  crime.data.filename
}
```

Was there a reason you did it this way instead of using SODA (the Socrata Open Data API)? Other police departments could take advantage of this if it used RSocrata, the R package from the City of Chicago.
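For comparison, a SODA-based version of the download step could be very short. This is a minimal sketch, assuming the APD crime data is published as a Socrata dataset; `get.apd.data.socrata` is a hypothetical name, and the URL in the usage line is a placeholder, not a real endpoint.

```r
# Hypothetical SODA-based alternative to get.apd.data().
# Assumes the data lives in a Socrata dataset; dataset.url is a placeholder.
get.apd.data.socrata <- function(dataset.url, app.token = NULL) {
  if (!requireNamespace("RSocrata", quietly = TRUE)) {
    stop("Install the RSocrata package first: install.packages(\"RSocrata\")")
  }
  # read.socrata() requests the SODA endpoint (handling paging)
  # and returns the rows as a data.frame
  RSocrata::read.socrata(dataset.url, app_token = app.token)
}

# Usage (placeholder URL; substitute the real domain and dataset ID):
# crimes <- get.apd.data.socrata("https://<domain>/resource/<dataset-id>.csv")
```

Because the function only takes a dataset URL, another department could reuse it by pointing it at its own Socrata dataset instead of scraping a department-specific web page.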

@bbrewington
Member

Good point. That will probably be the best route, though I think we need to clean up the data in Socrata first. The Socrata data and the raw COBRA file start to diverge in 2015:

Socrata data:

[Chart: Socrata crimes by rpt_date]

COBRA051916.csv file:

[Chart: number of crimes by rpt_date, by year and month]

Possible cause: the import tool is duplicating rows.

[Screenshot: Socrata APD crime data, affected rows repeating]
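One way to test the duplicated-rows theory is to compare monthly counts by `rpt_date` from both sources and count exact repeated rows. This is a minimal base-R sketch; `compare.monthly.counts` and `count.exact.duplicates` are hypothetical helpers, and it assumes `rpt_date` is stored as text in MM/DD/YYYY form (adjust `date.format` if the data uses another layout).

```r
# Compare monthly crime counts from two sources (e.g. the Socrata export and
# the COBRA csv) and report the difference per month.
compare.monthly.counts <- function(socrata, file, date.col = "rpt_date",
                                   date.format = "%m/%d/%Y") {
  month.of <- function(df) {
    format(as.Date(df[[date.col]], format = date.format), "%Y-%m")
  }
  m.soc <- month.of(socrata)
  m.file <- month.of(file)
  months <- sort(unique(c(m.soc, m.file)))  # sort() drops unparseable (NA) dates
  out <- data.frame(
    month = months,
    socrata = as.vector(table(factor(m.soc, levels = months))),
    file = as.vector(table(factor(m.file, levels = months))),
    stringsAsFactors = FALSE
  )
  out$diff <- out$socrata - out$file  # positive = extra rows in Socrata
  out
}

# Number of rows that exactly repeat an earlier row
count.exact.duplicates <- function(df) sum(duplicated(df))
```

If the positive differences start in 2015 and `count.exact.duplicates()` on the Socrata data is roughly the size of that gap, that would support the duplicate-import explanation. If the import tool adds its own row ID, pass only the original columns to `count.exact.duplicates()`.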

@bbrewington
Member

bbrewington commented May 24, 2016

@stuagano, would it make more sense to do the ETL with Socrata's Pentaho Kettle connector (https://dev.socrata.com/connectors/pentaho-kettle.html) instead of the R script in this repo? The R script could still work if there were a way to schedule an R ETL job. What do you think?

Here's the current ETL process (it would be nice to automate the last step, publishing the cleaned data to Socrata):

[Diagram: ETL process]

Edit, note to self: https://github.com/jwijffels/taskscheduleR
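A minimal sketch of how the last step could be automated from R, assuming the RSocrata package's `write.socrata()` for publishing and taskscheduleR's `taskscheduler_create()` for scheduling (Windows Task Scheduler only). The function names `publish.clean.data` and `schedule.etl`, the task name, the environment variable names, and the script path are all hypothetical placeholders; check both calls against the installed package versions.

```r
# Hypothetical: push the cleaned data frame to an existing Socrata dataset.
# Credentials are read from (assumed) environment variables, not hard-coded.
publish.clean.data <- function(clean.df, dataset.json.endpoint,
                               update.mode = "REPLACE") {
  RSocrata::write.socrata(
    dataframe = clean.df,
    dataset_json_endpoint = dataset.json.endpoint,
    update_mode = update.mode,  # "REPLACE" overwrites, "UPSERT" merges
    email = Sys.getenv("SOCRATA_EMAIL"),
    password = Sys.getenv("SOCRATA_PASSWORD")
  )
}

# Hypothetical: register a daily run of the ETL script (Windows only).
schedule.etl <- function(rscript.path, start.time = "02:00") {
  taskscheduleR::taskscheduler_create(
    taskname = "apd_crime_etl",
    rscript = rscript.path,
    schedule = "DAILY",
    starttime = start.time
  )
}

# Usage (placeholders):
# schedule.etl("C:/etl/apd_etl.R")
```

Using `"REPLACE"` avoids appending the same rows again on every run, which matters given the duplicated-rows issue above.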
