This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Incorrect parsing of USPC for 1-digit classes #1

Open
Radcliffe opened this issue May 31, 2016 · 0 comments


Radcliffe commented May 31, 2016

Issue

US patent classifications having single-digit classes are parsed incorrectly. For example, 2/322 is misinterpreted as 23/22.

Explanation

In the source XML files, the US patent classification is represented by a character string of variable length. The first three characters contain the class, and the remaining characters contain the subclass. If the class has fewer than three characters, it is padded with leading spaces. In particular, a one-digit class is preceded by two leading spaces.
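The fixed-width layout described above can be sketched as follows. This is a minimal illustration, not the project's actual parser: the function name `parse_uspc` and the sample strings are assumptions made for the example.

```python
def parse_uspc(raw):
    """Split a fixed-width USPC string into (class, subclass).

    Assumes the layout described in this issue: the first three
    characters hold the class (left-padded with spaces), and the
    rest of the string holds the subclass.
    """
    main_class = raw[:3].strip()   # e.g. "  2" -> "2"
    subclass = raw[3:].strip()     # e.g. "322" -> "322"
    return main_class, subclass


# A one-digit class keeps its two leading spaces in the raw string.
print(parse_uspc("  2322"))   # ('2', '322')
print(parse_uspc("123456"))   # ('123', '456')
```

The split only works if the padding is intact when the string reaches the parser, so any whitespace normalization has to happen after the class and subclass are separated.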

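The string containing the US patent classification is cleaned using a number of functions, including xml_util.remove_escape_sequences(string). This function replaces the two leading spaces with a single space, which shifts every following character one position to the left. The first three characters then contain one padding space, the class digit, and the first digit of the subclass. For 2/322, the parser therefore reads class 23 and subclass 22.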
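The failure can be reproduced in isolation with a short sketch. Note that `collapse_whitespace` below is a hypothetical stand-in for the cleaning step and is not the actual code of `xml_util.remove_escape_sequences`; `split_fixed_width` and the sample string are likewise assumptions made for illustration.

```python
import re


def collapse_whitespace(s):
    # Hypothetical stand-in for the cleaning step: replaces every run
    # of whitespace with a single space. Not the real xml_util code.
    return re.sub(r"\s+", " ", s)


def split_fixed_width(raw):
    # First three characters are the class, the rest is the subclass.
    return raw[:3].strip(), raw[3:].strip()


raw = "  2322"                      # class 2, subclass 322
cleaned = collapse_whitespace(raw)  # " 2322": one leading space lost

print(split_fixed_width(raw))       # ('2', '322')
print(split_fixed_width(cleaned))   # ('23', '22')
```

One possible fix, under these assumptions, is to split the class and subclass from the raw string first and apply the cleaning functions to each part afterwards.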
