Duplicates in NPO dataset #3
For example, see rows 5 and 6 in the output shown in the image (I forgot to mention that above).
We need to check back against the raw data: do we actually have the same organization filing twice? On our side, it could be one of two things. Perhaps the geocoding code encountered an ambiguous address and returned a couple of results. Or it was a non-match at one step, so it was passed to another step (match by PO box or by ZIP code only) and somehow got added back to the sample twice. More likely, if the EIN appears twice in the raw data with two different addresses, it was a resubmission of the application. Usually that is to submit clarifying information, or perhaps the organization lost its status for failing to file its annual 990 and had to reapply (though that would typically happen a couple of years apart, not in the same year). My guess is that they submitted an update to their application with a new address.
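One way to separate the two explanations above is to group the raw (pre-geocoding) rows by EIN and check whether the repeated rows share the same address. The sketch below assumes each row is a dict with hypothetical keys `ein` and `address`; the real column names in the 1023-EZ file may differ, and the address normalization (trim plus uppercase) is deliberately crude.

```python
from collections import defaultdict


def classify_duplicate_eins(rows):
    """Label each EIN that appears more than once in the raw data.

    rows: iterable of dicts with hypothetical keys "ein" and "address"
    (the raw address string, before geocoding).
    Returns {ein: label} for duplicated EINs only.
    """
    addresses = defaultdict(list)
    for row in rows:
        # Crude normalization so "10 Main St" and "10 main st " compare equal.
        addresses[row["ein"]].append(row["address"].strip().upper())

    labels = {}
    for ein, addrs in addresses.items():
        if len(addrs) < 2:
            continue  # appears once: not a duplicate
        if len(set(addrs)) == 1:
            # Same raw address repeated: points at our pipeline (ambiguous
            # geocode, or a row re-added by a fallback match step).
            labels[ein] = "same_address"
        else:
            # Different raw addresses: consistent with a resubmitted application.
            labels[ein] = "different_address"
    return labels


rows = [
    {"ein": "123456789", "address": "10 Main St"},
    {"ein": "123456789", "address": "10 main st "},
    {"ein": "987654321", "address": "PO Box 5"},
    {"ein": "987654321", "address": "22 Oak Ave"},
    {"ein": "555555555", "address": "1 Elm St"},
]
print(classify_duplicate_eins(rows))
# {'123456789': 'same_address', '987654321': 'different_address'}
```

If most duplicates come back as `same_address`, the problem is more likely in the geocoding/matching steps than in the IRS data itself.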
EIN 11111111 seems dubious, though! I would check that one first.
For these cases, they are likely applying for reinstatement of nonprofit status after failing to file the 990 on time and having their status revoked (note the lag between the first and second ruledate):
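The reinstatement pattern above can be screened for by measuring, per EIN, the gap between the first and second ruling dates. This is a minimal sketch assuming a hypothetical `ruledate` key stored as ISO `YYYY-MM-DD` strings; the one-year cutoff (`min_gap_days`) is an illustrative choice, not an IRS rule.

```python
from collections import defaultdict
from datetime import date


def ruledate_gaps(rows, min_gap_days=365):
    """Return {ein: gap_in_days} for EINs whose first and second ruling
    dates are at least `min_gap_days` apart.

    rows: dicts with hypothetical keys "ein" and "ruledate" ("YYYY-MM-DD").
    min_gap_days: hypothetical cutoff separating same-year resubmissions
    from likely reinstatements.
    """
    dates = defaultdict(list)
    for row in rows:
        dates[row["ein"]].append(date.fromisoformat(row["ruledate"]))

    gaps = {}
    for ein, ds in dates.items():
        if len(ds) < 2:
            continue  # single filing: nothing to compare
        ds.sort()
        gap = (ds[1] - ds[0]).days  # first vs. second ruling date
        if gap >= min_gap_days:
            gaps[ein] = gap
    return gaps


rows = [
    {"ein": "123456789", "ruledate": "2016-03-01"},
    {"ein": "123456789", "ruledate": "2019-06-15"},
    {"ein": "987654321", "ruledate": "2018-01-10"},
    {"ein": "987654321", "ruledate": "2018-02-10"},
]
print(ruledate_gaps(rows))  # {'123456789': 1201}
```

Here the second EIN's two filings are only a month apart, so it is treated as a same-year update rather than a possible reinstatement.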
Thank you for following up.
I agree about EIN 1111111. I will take a look; I think this is something that will need to be addressed. If we are counting the same organization twice, it will obviously skew the results. Just taking note of it for now.
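To avoid double counting, one option is to drop placeholder-looking EINs and keep a single row per EIN. The sketch below assumes hypothetical `ein` and `ruledate` keys (ISO dates) and keeps the latest filing, on the assumption (from the guess earlier in the thread) that a later filing carries the updated address. Real EINs are 9 digits; since leading zeros can be lost when the column is read as an integer, shorter values are not rejected on length alone.

```python
def is_suspicious_ein(ein):
    """Flag EINs that look like filler rather than real identifiers."""
    digits = str(ein).strip().replace("-", "")
    if not digits.isdigit() or len(digits) > 9:
        return True
    # A single repeated digit (e.g. 11111111 or 000000000) is likely a placeholder.
    return len(set(digits.lstrip("0"))) <= 1


def keep_latest_per_ein(rows):
    """Keep one row per EIN: the one with the latest "ruledate".

    ISO "YYYY-MM-DD" strings sort chronologically, so plain string
    comparison is enough here.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["ein"])
        if current is None or row["ruledate"] > current["ruledate"]:
            latest[row["ein"]] = row
    return list(latest.values())


rows = [
    {"ein": "11111111", "ruledate": "2017-05-01"},
    {"ein": "123456789", "ruledate": "2016-03-01"},
    {"ein": "123456789", "ruledate": "2019-06-15"},
]
clean = [r for r in keep_latest_per_ein(rows) if not is_suspicious_ein(r["ein"])]
print(clean)  # [{'ein': '123456789', 'ruledate': '2019-06-15'}]
```

Keeping the latest row is a judgment call; for the spatial grids it may be better to keep whichever address was valid in the study period.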
SOLVED: Hey Jesse, I found another identifying column in the dataset. Just to have it documented: I believe we should be using the
Hey Jesse,
I am in the process of creating a rodeo dataset that will be used for the spatial grids (detailing distances between NPOs and board members). I've come across an interesting finding: there are ~7,000 EINs that are duplicated in the NPO IRS 1023-EZ dataset. Importantly, some of those duplicates have different geocoded locations (see attached image). I am trying to understand this discrepancy in the data and wanted to reach out for your insights, since you have worked with these data extensively. Could it be that these reflect NPOs moving addresses over the years?
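A count like the ~7,000 above, split by whether the duplicates actually land in different places, could be produced along these lines. This sketch assumes rows with hypothetical keys `ein`, `lat`, and `lon` (floats); coordinates are rounded to `precision` decimals (a hypothetical default of 5, roughly 1 m) so tiny floating-point differences are not counted as a move.

```python
from collections import defaultdict


def duplicate_summary(rows, precision=5):
    """Return (duplicated_eins, multi_location_eins) as sets.

    duplicated_eins: EINs appearing in more than one row.
    multi_location_eins: the subset geocoded to more than one distinct
    (rounded) coordinate pair.
    """
    counts = defaultdict(int)
    locations = defaultdict(set)
    for row in rows:
        counts[row["ein"]] += 1
        locations[row["ein"]].add(
            (round(row["lat"], precision), round(row["lon"], precision))
        )
    duplicated = {ein for ein, n in counts.items() if n > 1}
    multi_location = {ein for ein in duplicated if len(locations[ein]) > 1}
    return duplicated, multi_location


rows = [
    {"ein": "111222333", "lat": 29.7604, "lon": -95.3698},
    {"ein": "111222333", "lat": 29.7604, "lon": -95.3698},
    {"ein": "444555666", "lat": 30.2672, "lon": -97.7431},
    {"ein": "444555666", "lat": 32.7767, "lon": -96.7970},
    {"ein": "777888999", "lat": 29.4241, "lon": -98.4936},
]
dups, moved = duplicate_summary(rows)
print(len(dups), len(moved))  # 2 1
```

Duplicates at identical coordinates are more likely pipeline artifacts, while those in `moved` are the candidates for address changes.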