Attempting to process older Greenbook-format grant_bibliographic files fails with either a Java heap space error (java.lang.OutOfMemoryError) or a java.util.NoSuchElementException.
Is this a gap in format coverage, or were these files never intended to be processed, and is there a better source?
Good examples are the 1990 and 1998 files. Each year is supplied as a single .dat file, and weekly pba*.zip files are also supplied; those do not load either.
http://patentscur.reedtech.com/downloads/GrantRedBookBib/1990/1990.zip
http://patentscur.reedtech.com/downloads/GrantRedBookBib/1998/pba19980106_wk01.zip
http://patentscur.reedtech.com/downloads/GrantRedBookBib/1998/1998.zip
(The GrantRedBookBib subfolder name in the source URL appears misleading: the text files, when manually extracted, are clearly APS.)
Both zips fail during TransformerCli with a NoSuchElementException, and the file appears to be misclassified as CpcMasterFile format. Log:
2017-05-02 18:03:57,678 INFO [ main] TransformerCli - --- Start ---
2017-05-02 18:03:57,709 INFO [ main] 1998.zip TransformerCli - Dump File[1]: C:\data\out\uspto\grant_bibliographic\1998\1998.zip
2017-05-02 18:03:57,709 INFO [ main] 1998.zip PatentDocFormatDetect - PatentDocFormat fromFileName: CpcMasterFile
2017-05-02 18:03:57,724 INFO [ main] 1998.zip ZipReader - Reading zip file: C:\data\out\uspto\grant_bibliographic\1998\1998.zip
Exception in thread "main" java.util.NoSuchElementException
    at gov.uspto.common.file.archive.ZipReader.next(ZipReader.java:122)
    at gov.uspto.patent.bulk.DumpFile.open(DumpFile.java:65)
    at gov.uspto.patent.bulk.DumpFileXml.open(DumpFileXml.java:31)
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:166)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:122)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:301)
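One way to check what an archive actually contains before handing it to the tool is to list its entries with the JDK's own zip classes. The sketch below is only a diagnostic aid and makes no claim about how ZipReader selects entries internally; the class name `ZipEntryLister` and the helper `sampleZip` are hypothetical names introduced for illustration, and the demo uses an in-memory archive rather than the real 1998.zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLister {

    /** Returns the names of all non-directory entries, in archive order. */
    static List<String> listEntries(InputStream zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            // getNextEntry() returns null at the end of the archive,
            // so an empty or unexpected archive yields an empty list.
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Hypothetical helper: builds a small in-memory zip for the demo. */
    static byte[] sampleZip(String... entryNames) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buffer)) {
            for (String name : entryNames) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip("1998.dat");
        for (String name : listEntries(new ByteArrayInputStream(zip))) {
            System.out.println(name);
        }
    }
}
```

For a real archive, the same method can be given a file stream (for example from `java.nio.file.Files.newInputStream`) in place of the in-memory bytes. If the listing shows only a .dat entry, that would at least be consistent with a reader that expects a different entry type, though that remains a guess.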
Manually extracting the text file from the zip doesn't get much further. In the case of 1998, the file is too large for my VM (1.7GB):
2017-05-02 18:05:23,347 INFO [ main] TransformerCli - --- Start ---
2017-05-02 18:05:23,378 INFO [ main] 1998.dat TransformerCli - Dump File[1]: C:\data\out\uspto\grant_bibliographic\1998\1998.dat
2017-05-02 18:05:23,378 INFO [ main] 1998.dat PatentDocFormatDetect - PatentDocFormat fromFileName: CpcMasterFile
2017-05-02 18:05:23,378 INFO [ main] 1998.dat PatentDocFormatDetect - PatentType fromContent: Greenbook
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at gov.uspto.patent.bulk.DumpFileXml.read(DumpFileXml.java:66)
    at gov.uspto.patent.bulk.DumpFile.next(DumpFile.java:92)
    at gov.uspto.patent.bulk.DumpFile.next(DumpFile.java:1)
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:173)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:122)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:301)
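The trace above shows a StringBuilder growing inside DumpFileXml.read until the heap runs out, which would fit a reader that never finds its expected record boundary in an APS file. As a rough illustration of the alternative, here is a minimal sketch of splitting such a file into records one at a time. It assumes that every Greenbook record begins with a line starting with `PATN`; that marker, the class name `GreenbookRecordSplitter`, and the sample data are assumptions for illustration, not the library's actual implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GreenbookRecordSplitter {

    // Assumption: each patent record starts with a line beginning "PATN".
    static final String RECORD_START = "PATN";

    /**
     * Reads line by line and hands each complete record to the consumer,
     * so only one record is held in memory at a time.
     * Lines before the first record marker (e.g. a file header) are skipped.
     * Returns the number of records emitted.
     */
    static int split(Reader in, Consumer<String> onRecord) {
        int count = 0;
        StringBuilder current = null;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(RECORD_START)) {
                    if (current != null) {
                        onRecord.accept(current.toString());
                        count++;
                    }
                    current = new StringBuilder();
                }
                if (current != null) {
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the final record, which has no following marker.
        if (current != null) {
            onRecord.accept(current.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data; real files contain many more fields per record.
        String sample = "file header\nPATN\nWKU  000000001\nPATN\nWKU  000000002\n";
        List<String> records = new ArrayList<>();
        int n = split(new StringReader(sample), records::add);
        System.out.println(n + " records");
    }
}
```

With this shape, memory use is bounded by the largest single record rather than by the size of the yearly file.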
Or, using the 1990 file, the file size appears OK but the format cannot be parsed:
2017-05-02 18:06:59,785 INFO [ main] TransformerCli - --- Start ---
2017-05-02 18:06:59,826 INFO [ main] 1990.dat TransformerCli - Dump File[1]: C:\data\out\uspto\grant_bibliographic\1990\1990.dat
2017-05-02 18:06:59,828 INFO [ main] 1990.dat PatentDocFormatDetect - PatentDocFormat fromFileName: CpcMasterFile
2017-05-02 18:06:59,831 INFO [ main] 1990.dat PatentDocFormatDetect - PatentType fromContent: Greenbook
Exception in thread "main" java.lang.NullPointerException
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:175)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:122)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:301)