Convert doc-number to patent number? #2

Open
victorconan opened this issue Aug 8, 2022 · 4 comments

Comments

@victorconan

I noticed that the parser returns the doc-number rather than the patent number for each patent. Although one can search for a patent by doc-number, I cannot find a mapping between doc-numbers and patent numbers. Do you know how to get the patent number? Thanks!

@TamerKhraisha
Owner

Hi @victorconan,
Thank you for reporting this issue. I will investigate your request in the coming days and get back to you. I am sure the patent number is in the document and can be extracted.

@victorconan
Author

> Hi @victorconan Thank you for reporting this issue, I will investigate your request in the coming days and will get back to you. I am sure the patent number is in the document and could be extracted

I looked at the XML file, and it seems it uses doc-number everywhere without distinguishing whether the value is a patent number or something else :/ However, the bibliographic data section contains both a publication-reference and an application-reference:

<us-bibliographic-data-grant>
  <publication-reference>
    <document-id>
      <country>US</country>
      <doc-number>D0939807</doc-number>
      <kind>S1</kind>
      <date>20220104</date>
    </document-id>
  </publication-reference>
  <application-reference appl-type="design">
    <document-id>
      <country>US</country>
      <doc-number>29667332</doc-number>
      <date>20181019</date>
    </document-id>
  </application-reference>

My guess is that D0939807 from publication-reference is the patent number with an extra 0 after the D (not sure why, it is weird), and that 29667332 from application-reference is the application number. I think the parser only returns the latter?
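If the guess above is right, and the extra 0 is just zero padding of the numeric part, the publication doc-number could be turned into the familiar patent number with a small helper. This is only a sketch under that assumption; `normalize_doc_number` is a hypothetical name and not part of the parser.

```python
import re


def normalize_doc_number(doc_number: str) -> str:
    """Strip zero padding from the numeric part of a publication doc-number.

    Assumption: the value is an optional letter prefix (e.g. "D") followed by
    a zero-padded number. Anything else is returned unchanged.
    """
    match = re.fullmatch(r"([A-Za-z]*)0*(\d+)", doc_number.strip())
    if match is None:
        return doc_number  # unrecognised shape: leave it alone
    prefix, digits = match.groups()
    return f"{prefix.upper()}{digits}"


print(normalize_doc_number("D0939807"))
```

This should only be applied to the doc-number from publication-reference. The one from application-reference is an application number, where a leading zero may well be significant.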

@victorconan
Author

I think the bug is here:

def get_patent_identification_data(root_tree):
    publication_info = root_tree.find(publication_info_base_path)
    application_info = root_tree.find(application_info_base_path)
    term_of_grant_info = root_tree.find(us_term_of_grant_path)
    term_of_grant_length = root_tree.find(us_term_of_grant_length)
    term_of_grant_extension = root_tree.find(us_term_of_grant_extension)
    us_term_of_grant_disclaimer = root_tree.find(us_term_of_grant_disclaimer_text)
    invention_title = root_tree.find(invention_title_path)
    document_data = {}    
    if publication_info != None:
        publication_reference_info = {element.tag: element.text for element in list(publication_info)}
        document_data = {**document_data,**publication_reference_info}
    if application_info !=None:
        application_reference_info = {element.tag: element.text for element in list(application_info)}
        if application_info.attrib and application_info.attrib['appl-type']:
            application_reference_info['application_type'] =  application_info.attrib['appl-type']
        document_data = {**document_data,**application_reference_info}

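Here, if a patent has application info, the publication info is overwritten, presumably because both dictionaries share keys such as country, doc-number, and date, so merging the application values into document_data replaces the publication values for those keys.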
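One possible way to avoid the collision is to prefix the keys from each reference before merging. The sketch below is not the repository's code: the paths, the key prefixes, and the names `prefixed_fields` and `get_identification_data` are assumptions made for illustration, and the sample is the excerpt from the earlier comment with its root tag closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths, relative to <us-bibliographic-data-grant>.
PUBLICATION_PATH = "publication-reference"
APPLICATION_PATH = "application-reference"


def prefixed_fields(reference, prefix):
    """Flatten reference/document-id children into {prefix_tag: text}."""
    if reference is None:
        return {}
    document_id = reference.find("document-id")
    if document_id is None:
        return {}
    return {f"{prefix}_{child.tag}": child.text for child in document_id}


def get_identification_data(biblio):
    publication = biblio.find(PUBLICATION_PATH)
    application = biblio.find(APPLICATION_PATH)
    # Distinct prefixes keep the two doc-numbers from overwriting each other.
    data = {
        **prefixed_fields(publication, "publication"),
        **prefixed_fields(application, "application"),
    }
    if application is not None and application.get("appl-type"):
        data["application_type"] = application.get("appl-type")
    return data


sample = ET.fromstring("""
<us-bibliographic-data-grant>
  <publication-reference>
    <document-id>
      <country>US</country>
      <doc-number>D0939807</doc-number>
      <kind>S1</kind>
      <date>20220104</date>
    </document-id>
  </publication-reference>
  <application-reference appl-type="design">
    <document-id>
      <country>US</country>
      <doc-number>29667332</doc-number>
      <date>20181019</date>
    </document-id>
  </application-reference>
</us-bibliographic-data-grant>
""")

data = get_identification_data(sample)
print(data["publication_doc-number"], data["application_doc-number"])
```

With prefixed keys, both the publication doc-number and the application doc-number survive in the result, at the cost of changing the key names that downstream code sees.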

@federiconuta

Hi all,
Sorry for jumping into the conversation. Maybe a workaround is to rely on the Google Patents API to convert the doc-number to a patent number.
Cheers
