Feasibility of read_dta? #15
Comments
The retroharmonize read_spss is indeed a wrapper, but the package needed a new class inherited from haven (which depends on labelled), because haven does not handle SPSS files correctly: it does not write back missing values, and it often has problems with the missing value range. In other words, the challenge is not making the reader work, but mapping a different file format into R. SPSS handles metadata in a particular way, and translating that metadata into R terms was a big challenge. If there is an interesting use case, we can do more extensive testing with .dta files; it may turn out to be a very simple task, or a very difficult one. Can you provide a retrospective harmonization example with at least two publicly accessible .dta files? Or make a small subsample for republication? I would gladly take a look.
Please check the 0.1.19 development version with devtools::install_github("rOpenGov/retroharmonize"). It would be great if you could provide a reproducible example for testing missing values. I tested on two .dta files, but only far enough to confirm that they import into the survey class.
Wow, thank you so much! I'll test and return with a reprex. I personally don't run into .dta files with extended missing values often (
Writing a wrapper is not a big deal if there are no special metadata issues. The thing with SPSS files is that the user can record otherwise valid values (such as 9999) as a numeric code for "Do not know", etc. Such a code can either be translated into a factor category or should be omitted (as NA) when calculating averages. If Stata files do not have similar issues, then I do not think you'll run into trouble.
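To make the 9999 problem concrete, here is a minimal sketch using haven's labelled_spss() class directly. The variable and its labels are invented for illustration; this is not retroharmonize's own implementation, only a demonstration of the two treatments mentioned above (category versus NA).

```r
library(haven)

# Hypothetical 1-5 agreement item where 9999 was recorded as "Do not know".
x <- labelled_spss(
  c(1, 3, 5, 9999, 2),
  labels = c("Strongly disagree" = 1, "Strongly agree" = 5, "Do not know" = 9999),
  na_values = 9999
)

# Treating 9999 as a valid number distorts the average.
naive_mean <- mean(c(1, 3, 5, 9999, 2))

# zap_labels() converts user-defined missing values to NA by default,
# so the code can be dropped when averaging.
clean_mean <- mean(zap_labels(x), na.rm = TRUE)

# Alternatively, keep "Do not know" as a category.
x_factor <- as_factor(x)
```

Here naive_mean is 2002, while clean_mean is 2.75, the average of the four substantive answers.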
I see now; yes, the same does happen in Stata. I'm finishing up an .Rmd to share, where I'm trying to work through some of these "don't know/incomplete" value issues from the .dta files. As a side note, one thing I noticed is that in both the read_dta and read_spss functions I couldn't figure out how to pass the "encoding" argument on to haven. Not sure how important this is for .sav files, but for .dta files older than version 14 (surprisingly common), haven apparently needs the encoding specified sometimes; the help file for haven's read_dta explains this. Otherwise
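One way the encoding argument could be forwarded is sketched below. The function read_dta_with_encoding() is hypothetical and is not part of retroharmonize; it only shows haven::read_dta() receiving an encoding through a thin wrapper. The temporary Stata 13 file is a stand-in for a real pre-14 survey file.

```r
library(haven)

# Hypothetical wrapper: forwards `encoding` (and any other arguments) to haven.
read_dta_with_encoding <- function(file, encoding = NULL, ...) {
  haven::read_dta(file, encoding = encoding, ...)
}

# Stand-in for an older survey file: write a small Stata 13 file to a temp path.
old_path <- tempfile(fileext = ".dta")
write_dta(
  data.frame(id = c(1, 2), answer = c("yes", "no"), stringsAsFactors = FALSE),
  old_path,
  version = 13
)

# Pre-14 files do not declare an encoding, so it may need to be given explicitly.
old_survey <- read_dta_with_encoding(old_path, encoding = "latin1")
```

With encoding = NULL the wrapper falls back to haven's default behaviour, so existing calls would not change.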
I'm linking to an .Rmd file here that walks through my attempt to try out |
Hello and thank you all so much for creating this package! I love using the metadata file to help track changes in labels and variable names over time.
I'm trying to use read_surveys() with .dta files, but it doesn't seem to read them properly unless I first convert from .dta to .rds with haven, and then import with read_rds().
Since it looks like retroharmonize::read_spss() is a wrapper for haven::read_spss(), to what extent could a retroharmonize::read_dta() be created by substituting the equivalent haven function? Conceptually, what other changes to the code might you think necessary? Again, thanks so much for retroharmonize and for any help or input you could provide.
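The .dta to .rds workaround described in this post can be sketched roughly as follows. The small data frame and the temporary paths are made up so the example runs on its own; the final import uses base readRDS() only to keep the sketch free of other dependencies, where the post uses retroharmonize's read_rds().

```r
library(haven)

# Stand-in for a real survey file: write a tiny .dta to a temporary path.
dta_path <- tempfile(fileext = ".dta")
write_dta(data.frame(id = c(1, 2, 3), trust = c(1, 2, 9)), dta_path)

# Step 1: read the Stata file with haven.
survey <- read_dta(dta_path)

# Step 2: save it as .rds next to the original file.
rds_path <- sub("\\.dta$", ".rds", dta_path)
saveRDS(survey, rds_path)

# Step 3: import the .rds file (read_rds() in the post; readRDS() here).
roundtrip <- readRDS(rds_path)
```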