Normalized Output in HTML
Brian Feldman edited this page Dec 3, 2018 · 2 revisions
Within the output JSON, each formatted field contains "raw", "normalized" and "plaintext" values. The "normalized" value is the HTML described on this page. This HTML format is intended for display in a web browser. It is a best-effort mapping to one standardized document structure across all patent document types (Greenbook, SGML, PAP, RedBook XML).
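As a rough illustration of how a consumer might choose between the three values, here is a small JavaScript sketch. The record layout, the field name "abstract" and the function name fieldValue are hypothetical. Only the "raw", "normalized" and "plaintext" keys come from this page.

```javascript
// Hypothetical output record. Only the "raw" / "normalized" / "plaintext" keys
// are documented on this page; the "abstract" field name and values are made up.
const record = {
  abstract: {
    raw: "(source markup as it appeared in the input document)",
    normalized: '<p id="P-00001" level="0">Water (H\u2082O) is supplied.</p>',
    plaintext: "Water (H\u2082O) is supplied.",
  },
};

// Pick the representation that suits the task: normalized HTML for browser
// display, plain text for indexing or search. Returns null for a missing field.
function fieldValue(field, purpose) {
  if (!field) return null;
  return purpose === "display" ? field.normalized : field.plaintext;
}

console.log(fieldValue(record.abstract, "display"));
console.log(fieldValue(record.abstract, "search"));
```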
Tags used in the normalized HTML:
- BR
- P
- B, U
- SUB, SUP
- TABLE, THEAD, TGROUP, TR, TD
- UL, OL, LI
- DL, DT, DD
- H1, H2, H3, H4, H5, H6
- A, SPAN, PRE
- Q, DEL, INS
- O, SMALLCAPS, SUB2, SUP2
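The tag list can also serve as a whitelist when checking output. This matters because O, SMALLCAPS, SUB2, SUP2 and TGROUP are not standard HTML elements, so a generic sanitizer may strip them.

The JavaScript sketch below reports any element outside the list. A few points are assumptions:
- The name findUnexpectedTags is hypothetical.
- Tag names are lower-cased, on the assumption that output uses lower-case XHTML tags as in the examples on this page.
- <math> blocks are skipped, because MathML child elements are not in the list.

A regex scan like this is a quick check, not a full HTML parser.

```javascript
// Allowed element names, taken from the tag list on this page (lower-cased).
const ALLOWED_TAGS = new Set([
  "br", "p", "b", "u", "sub", "sup",
  "table", "thead", "tgroup", "tr", "td",
  "ul", "ol", "li", "dl", "dt", "dd",
  "h1", "h2", "h3", "h4", "h5", "h6",
  "a", "span", "pre", "q", "del", "ins",
  "o", "smallcaps", "sub2", "sup2",
]);

// Return the distinct element names in an HTML fragment that are not allowed.
// MathML islands (<math>...</math>) are removed first; their children are not listed.
function findUnexpectedTags(html) {
  const withoutMath = html.replace(/<math\b[\s\S]*?<\/math>/gi, "");
  const unexpected = new Set();
  // Match opening and closing tags and capture the element name.
  for (const m of withoutMath.matchAll(/<\/?([A-Za-z][A-Za-z0-9]*)\b/g)) {
    const name = m[1].toLowerCase();
    if (!ALLOWED_TAGS.has(name)) unexpected.add(name);
  }
  return [...unexpected];
}

console.log(findUnexpectedTags('<p id="P-00001" level="0">x<br/><font>y</font></p>'));
// → [ 'font' ]
```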
- XHTML style: all tags are closed, including BR tags (<br/>).
- Entities have a class denoting their type.
- All entity instances have an id.
- The HTML link element "a" is used to denote a reference; it could support clickable links in the future.
- The HTML "span" element is used to annotate text.
- Subscript and superscript content is replaced with the equivalent Unicode characters when a mapping exists.
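The last rule can be sketched as follows. This is an assumption about how such a mapping might work, not the project's code:
- Only digits and + - = ( ) are mapped.
- An element is converted only when every character in it has a Unicode form; otherwise the tag is kept.
- Only bare <sub>/<sup> tags without attributes are handled, so <sub2>/<sup2> are left alone.

The names toUnicodeScript and replaceScripts are hypothetical.

```javascript
// Plain characters and their Unicode superscript/subscript forms, index by index.
// Superscript 1-3 live in Latin-1 (U+00B9, U+00B2, U+00B3); the rest in U+2070..U+207E.
const PLAIN = "0123456789+-=()";
const SUP_CHARS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u207A\u207B\u207C\u207D\u207E";
const SUB_CHARS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u208A\u208B\u208C\u208D\u208E";

const makeMap = (chars) => new Map([...PLAIN].map((ch, i) => [ch, chars[i]]));
const SUPERSCRIPT = makeMap(SUP_CHARS);
const SUBSCRIPT = makeMap(SUB_CHARS);

// Map the text of one <sub>/<sup> element; return null if any character has
// no Unicode equivalent, so the caller can keep the original tag.
function toUnicodeScript(text, tag) {
  const table = tag === "sup" ? SUPERSCRIPT : SUBSCRIPT;
  let out = "";
  for (const ch of text) {
    const mapped = table.get(ch);
    if (mapped === undefined) return null;
    out += mapped;
  }
  return out;
}

// Replace bare, non-empty <sub>/<sup> elements whose whole content is mappable.
function replaceScripts(html) {
  return html.replace(/<(sub|sup)>([^<]+)<\/\1>/g,
    (match, tag, text) => toUnicodeScript(text, tag) ?? match);
}

console.log(replaceScripts("H<sub>2</sub>O and x<sup>n</sup>"));
// → H₂O and x<sup>n</sup>
```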
<h2 id="H-0001" level="1"></h2>
Headings are usually denoted in the source XML as a paragraph whose id starts with "H".
<h4 id="H-0001" level="1"></h4>
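Because each heading carries an id and a level attribute, an outline can be collected from a normalized field. The following is a regex-based JavaScript sketch. It assumes double-quoted attributes, as in the examples on this page. The name extractHeadings and the sample heading text are hypothetical, and a real DOM parser would be more robust.

```javascript
// Collect h1-h6 headings with their id, level attribute and text content.
function extractHeadings(html) {
  const headings = [];
  // Group 1: tag name, group 2: attribute text, group 3: inner HTML.
  const re = /<(h[1-6])\b([^>]*)>([\s\S]*?)<\/\1>/g;
  for (const m of html.matchAll(re)) {
    // Parse name="value" pairs from the start tag.
    const attrs = Object.fromEntries(
      [...m[2].matchAll(/([\w-]+)="([^"]*)"/g)].map((a) => [a[1], a[2]]),
    );
    headings.push({
      tag: m[1],
      id: attrs.id ?? null,
      level: attrs.level !== undefined ? Number(attrs.level) : null,
      text: m[3].replace(/<[^>]+>/g, "").trim(), // drop inline markup
    });
  }
  return headings;
}

console.log(extractHeadings('<h2 id="H-0001" level="1">SUMMARY</h2>'));
```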
<p id="P-00001" level="0"></p>
<a id="FR-0001" idref="FIG-1A" class="figref">FIG. 1A</a>
<a id="CR-0001" idref="CLM-00001" class="claim">claim 1</a>
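Since references are "a" elements with an idref, they can be listed, and later turned into clickable links. Below is a JavaScript sketch that does both. The function names are hypothetical. The linking step assumes that the target element (for example a claim with id "CLM-00001") is rendered on the same page with that id; this page does not confirm that.

```javascript
// Parse double-quoted attributes (name="value") from the inside of a start tag.
function parseAttrs(attrText) {
  return Object.fromEntries(
    [...attrText.matchAll(/([\w-]+)="([^"]*)"/g)].map((m) => [m[1], m[2]]),
  );
}

// List every <a> element that has an idref, as { id, idref, type, text },
// where type is the element's class (e.g. "figref" or "claim").
function extractReferences(html) {
  return [...html.matchAll(/<a\b([^>]*)>([\s\S]*?)<\/a>/g)]
    .map((m) => ({ attrs: parseAttrs(m[1]), text: m[2] }))
    .filter(({ attrs }) => attrs.idref !== undefined)
    .map(({ attrs, text }) => ({ id: attrs.id, idref: attrs.idref, type: attrs.class, text }));
}

// Add href="#<idref>" to references that do not already have an href.
function linkReferences(html) {
  return html.replace(
    /<a\b(?![^>]*\bhref=)([^>]*\bidref="([^"]+)"[^>]*)>/g,
    (match, attrs, idref) => `<a href="#${idref}"${attrs}>`,
  );
}

console.log(linkReferences('<a id="CR-0001" idref="CLM-00001" class="claim">claim 1</a>'));
// → <a href="#CLM-00001" id="CR-0001" idref="CLM-00001" class="claim">claim 1</a>
```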
<span id="FOR-0001" class="formula">c=a+b</span>
<span id="MTH-0001" class="math" format="mathml">
<math> ... </math>
</span>
Note: Chrome did not support displaying MathML at the time of writing, so a JavaScript library such as MathJax must be included in the page to render it. An example is given below in the section "Display in Browser".
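When normalized HTML is inserted into a page after MathJax has already loaded, MathJax 2 has to be asked to typeset the new content. The sketch below uses MathJax 2's documented MathJax.Hub.Queue(["Typeset", MathJax.Hub, element]) pattern, which matches the 2.7.5 version loaded by the page template. The function name showNormalized and the element id "claims" are hypothetical. The hub is passed in as a parameter so the function can be exercised without a browser.

```javascript
// Insert a normalized HTML fragment into a container element, then queue a
// MathJax 2 typeset pass for that element if the fragment contains MathML.
// `hub` is normally window.MathJax.Hub; it may be absent if MathJax is not loaded.
function showNormalized(container, normalizedHtml, hub) {
  container.innerHTML = normalizedHtml;
  const hasMath = /<math\b/.test(normalizedHtml);
  if (hasMath && hub) {
    // MathJax 2 pattern: typeset only the newly inserted element.
    hub.Queue(["Typeset", hub, container]);
  }
  return hasMath;
}

// In a browser (hypothetical element id):
// showNormalized(document.getElementById("claims"), field.normalized,
//                window.MathJax && window.MathJax.Hub);
```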
<ul id="ul0002">
<li id="ul0002-0001">element 1</li>
<li id="ul0002-0002">element 2</li>
</ul>
<table id="TBL-0001">
<tr>
<td>cell1</td>
<td>cell2</td>
</tr>
</table>
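A structured table like the one above can be turned into rows of cell text, for example for CSV export. The JavaScript sketch below handles this; its name, tableToRows, is hypothetical. It has these limitations:
- rowspan/colspan and HTML entities are ignored.
- <th> is accepted in case it appears.
- Free-text tables (the <pre class="freetext-table"> form) have no cell structure, so they yield an empty array.

```javascript
// Convert normalized <table> markup into an array of rows of cell text.
function tableToRows(html) {
  const rows = [];
  for (const tr of html.matchAll(/<tr\b[^>]*>([\s\S]*?)<\/tr>/g)) {
    // Collect td/th cells in document order, stripping inline markup.
    const cells = [...tr[1].matchAll(/<(td|th)\b[^>]*>([\s\S]*?)<\/\1>/g)]
      .map((c) => c[2].replace(/<[^>]+>/g, "").trim());
    rows.push(cells);
  }
  return rows;
}

console.log(tableToRows('<table id="TBL-0001">\n<tr>\n<td>cell1</td>\n<td>cell2</td>\n</tr>\n</table>'));
// → [ [ 'cell1', 'cell2' ] ]
```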
<pre id="TBL-0001" class="freetext-table"></pre>
Display in Browser
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:400,400italic,500,500italic,700"/>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Product+Sans"/>
<style type="text/css">
body {counter-reset:paragraph;}
body, table {font-family: 'Roboto', sans-serif;background-color:#fff;color:#333;}
body ::selection{background-color: #C6DAFC;color: #333;}
p{padding-left:45px;font-size:14.5px;line-height:22px;text-indent:15px;display:block;word-break:break-word; -webkit-margin-before:1em; -webkit-margin-after:1em;-webkit-margin-start:0px; -webkit-margin-end:0px;}
p:before {position:absolute; margin-left:-70px; color:#CCC; content:counter(paragraph); counter-increment: paragraph;}
table.pgwide-1{width:100%}
table.pgwide-0{width:100%}
table {border-collapse:collapse;}
table.border-all{box-shadow:0 2px 3px rgba(0,0,0,0.06);}
table.border-sides{box-shadow:0 0 3px rgba(0,0,0,0.06);}
table.border-topbot{box-shadow:0 3px 0 rgba(0,0,0,0.06);}
td, th {padding-left:12px;padding-right: 12px;}
th {background-color:#f1f1f1;line-height:23px;}
.border-all{border:1px solid rgba(150,150,150,0.3);border-bottom:1px solid rgba(125,125,125,0.3);}
.border-none{border: none;}
.border-topbot{border-top:1px solid rgba(150,150,150,0.3);border-bottom:1px solid rgba(125,125,125,0.3);}
.border-sides{border-left:1px solid rgba(150,150,150,0.4);border-right:1px solid rgba(150,150,150,0.4);}
.border-top{border-top:1px solid rgba(150,150,150,0.3);}
.border-bottom{border-bottom:1px solid rgba(125,125,125,0.3);}
.border-undefined{border-collapse: collapse;}
h2{font-size:18px;text-align:center;}
h4{font-size:13.5px}
h2.level-1, h4.level-1{text-indent:12px;}
h2.level-2, h4.level-2{text-indent:24px;}
h2.level-3, h4.level-3{text-indent:36px;}
span.figref, span.clmref, span.patcite, span.nplcite{font-weight:bold;}
entry{display:table-column;}
sup2{vertical-align:65%;font-size:smaller;}
sub2{vertical-align:-65%;font-size:smaller;}
ul.ul-dash{list-style:none;margin-left:0;padding-left:1em;}
ul.ul-dash > li:before {display:inline-block;content:"-";width:1em;margin-left:-1em;}
o{text-decoration:overline;}
o.single{text-decoration:overline;}
u.single{text-decoration:underline;text-decoration-style:solid;}
u.double, o.double{text-decoration-style:double;}
u.dots, o.dots{text-decoration-style:dotted;}
u.dash, o.dash{text-decoration-style:dashed;}
smallcaps{font-variant: small-caps;}
</style>
<!-- MathJax renders the MathML inside span.math for browsers without native support -->
<script>window.MathJax = { MathML: { extensions: ["mml3.js"]}};</script>
<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=MML_HTMLorMML"></script>
</head>
<body>
.... PLACE CONTENT HERE ....
</body>
</html>
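To fill the page template with normalized fields programmatically, the placeholder line can be replaced with the fragments. Below is a JavaScript sketch of that step; renderPage is a hypothetical name. It uses a replacer function, so that patterns such as "$&" in patent text are inserted literally rather than interpreted by String.prototype.replace.

```javascript
// Placeholder text copied from the template on this page.
const PLACEHOLDER = ".... PLACE CONTENT HERE ....";

// Put normalized HTML fragments (one per field) into the template, in order.
function renderPage(template, fragments) {
  if (!template.includes(PLACEHOLDER)) {
    throw new Error("template has no content placeholder");
  }
  // A function replacer inserts the joined fragments verbatim.
  return template.replace(PLACEHOLDER, () => fragments.join("\n"));
}
```

In Node.js the template could be read with fs.readFileSync("template.html", "utf8"), where template.html is a hypothetical file holding the page above.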