Missing some non-utility patents in bulk download files #20

Closed
crew102 opened this issue Mar 23, 2018 · 9 comments

@crew102

crew102 commented Mar 23, 2018

As reported by @mustberuss in #5, it looks like the bulk data files are missing some non-utility patents. For example, patent number RE46653 is indexed in whatever backend db the PatentsView API is currently using:

library(patentsview)
res <- search_pv('{"_eq":{"patent_number":"RE46653"}}')
res$query_results
#> #### Distinct entity counts across all downloadable pages of output:

#> total_patent_count = 1

...But this patent is not present in the bulk data files.
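For anyone who wants to run the same check against a bulk file, here is a minimal Python sketch. It assumes the patent file is tab-separated and has a `number` column; that layout is an assumption, not something confirmed in this thread. The helper `patent_in_bulk_file` and the sample rows are made up for illustration.

```python
import csv
import io


def patent_in_bulk_file(tsv_file, patent_number, column="number"):
    """Return True if patent_number appears in `column` of a tab-separated stream.

    `column` defaults to "number", which is an assumed header name.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return any(row[column] == patent_number for row in reader)


# Made-up stand-in for a bulk patent file that holds utility patents only.
sample = "id\ttype\tnumber\n1000001\tutility\t1000001\n1000002\tutility\t1000002\n"

found_utility = patent_in_bulk_file(io.StringIO(sample), "1000001")
found_reissue = patent_in_bulk_file(io.StringIO(sample), "RE46653")
print(found_utility, found_reissue)
```

With a real download, the `io.StringIO` object would be replaced by the opened, unzipped file.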

@sarahkelley
Contributor

@crew102 Thanks for pointing this out! We will look into the issue and get back to you as soon as we can!

@sarahkelley
Contributor

sarahkelley commented Apr 4, 2018

@crew102 I can't replicate your error. I can find this patent in the patent bulk download file, and the count of patents in the bulk downloads matches the count of patents in the backend DB too. Are you looking in a different bulk download file?

@crew102
Author

crew102 commented Apr 4, 2018

I can't seem to get to the bulk data download page right now. Also, the API isn't responding to my requests, so it's kinda hard to figure out what's going on at the moment.

@sarahkelley
Contributor

Thanks for your patience. We are having an issue with the volume of requests right now and are working on fixing it! Hopefully things will be back in operation soon!

@mustberuss

@sarahkelley the issue I mentioned in #5 is that the bulk CPC zip file only contains utility patents. I emailed the USPTO and was told there is no bulk CPC file for plant and reissued patents. As a result, PatentsView is missing CPCs for all reissued patents and for the plant patents that have CPCs (roughly half have been assigned CPCs).
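A rough way to see the size of this gap is to count, per patent type, how many patents have no CPC row at all. The Python sketch below assumes you have already loaded (patent id, patent type) pairs and the set of patent ids present in the CPC file. The function name and the sample ids are hypothetical and do not come from the actual data.

```python
from collections import Counter


def missing_cpc_by_type(patents, cpc_patent_ids):
    """Count patents that have no CPC row, grouped by patent type.

    patents: iterable of (patent_id, patent_type) pairs
    cpc_patent_ids: set of patent ids that appear in the CPC file
    """
    return Counter(ptype for pid, ptype in patents if pid not in cpc_patent_ids)


# Made-up sample: only the utility patents have CPC rows.
patents = [
    ("1000001", "utility"),
    ("1000002", "utility"),
    ("RE00001", "reissue"),
    ("PP00001", "plant"),
]
cpc_ids = {"1000001", "1000002"}

gaps = missing_cpc_by_type(patents, cpc_ids)
print(dict(gaps))
```

A nonzero count for a type is only a starting point, since some patents may legitimately have no CPC assigned.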

@sarahkelley
Contributor

@mustberuss Ah, I see! That makes sense.

@sarahkelley
Contributor

@crew102 API is working again.

@crew102
Author

crew102 commented Apr 4, 2018

Regarding the missing patents, was this issue fixed after 2017/10/03? I haven't updated my database with fresh patentsview data (taken from bulk download files) since then. I don't see patent RE46653 in my db, but that could be b/c the issue was fixed after my last update and I just haven't ingested the most recent data.

Also, can you confirm that the bulk files were indeed updated since 2017/10/03? http://www.patentsview.org/download/ suggests that this date was their last update, but I seem to remember them being updated more recently than that. I suspect that they have since been updated but that the dates in the filenames (e.g., http://www.patentsview.org/data/20171226/patent.tsv.zip) and the date under "MOST RECENT DATA" weren't changed.
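To compare a download URL against the date of a local ingest, something like the following Python sketch could work. It assumes the eight-digit path segment in the URL is a YYYYMMDD release date, which (per the concern above) may not always match the actual data. The function name is hypothetical.

```python
import re
from datetime import date, datetime


def release_date_from_url(url):
    """Return the YYYYMMDD path segment of a URL as a date, or None if absent."""
    match = re.search(r"/(\d{8})/", url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


url = "http://www.patentsview.org/data/20171226/patent.tsv.zip"
last_ingest = date(2017, 10, 3)  # date of the local database's last update

release = release_date_from_url(url)
# True when the URL advertises a release newer than the local ingest.
needs_refresh = release is not None and release > last_ingest
print(release, needs_refresh)
```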

@sarahkelley
Contributor

@crew102 Yes, the missing patent issue should be entirely resolved in the December data and the bulk downloads were most recently updated using the December data (the website should now reflect that change also).

I am going to close this issue now, but please reach out again if you are still finding any issues!
