Releases: weblyzard/inscriptis
Improved indentation and custom rendering styles
- improved indentation when span and div tags are used
- support for custom rendering styles
- improved documentation
- use Travis CI for continuous integration
- requires Python 2.7+ or Python 3.5+, since lxml does not support Python 3 versions earlier than 3.5
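The release notes do not describe the custom rendering style API. As a hedged illustration only, the sketch below models a rendering style as a per-tag record (display mode, margins, indentation) and lets callers override the defaults. `Style`, `DEFAULT_STYLES` and `build_styles` are hypothetical names, and the default values are assumptions, not Inscriptis's actual configuration.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Style:
    """Hypothetical per-tag rendering style (not the Inscriptis API)."""
    display: str = "inline"   # "inline" or "block"
    margin_before: int = 0    # blank lines before the element
    margin_after: int = 0     # blank lines after the element
    indent: int = 0           # spaces of indentation for the element's content


# Assumed defaults, chosen only for illustration.
DEFAULT_STYLES: Dict[str, Style] = {
    "div": Style(display="block"),
    "p": Style(display="block", margin_before=1, margin_after=1),
    "li": Style(display="block", indent=2),
    "span": Style(),
}


def build_styles(overrides: Dict[str, Dict[str, Any]]) -> Dict[str, Style]:
    """Return a new style table with the given per-tag overrides applied."""
    styles = dict(DEFAULT_STYLES)  # copy, so the defaults stay unchanged
    for tag, changes in overrides.items():
        # Unknown tags start from an inline default style.
        styles[tag] = replace(styles.get(tag, Style()), **changes)
    return styles
```

For example, `build_styles({"div": {"indent": 4}})` keeps `div` as a block element but indents its content by four spaces, while all other tags keep their defaults. Using frozen records means an override always produces a new style instead of silently mutating the shared defaults.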
Improved table rendering (nested tables and line breaks in tables)
- Correctly handle nested tables and line breaks (e.g. due to enumerations, list or paragraph breaks) in tables.
- Improved content stripping.
Please take a look at the Rendering document for an overview of how Inscriptis renders different tables.
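As a rough illustration of why line breaks and nested tables need the same treatment, the hypothetical `render_table` function below (a sketch, not the Inscriptis implementation) splits every cell into lines, pads each column to its widest line, and pads each row to its tallest cell. A nested table, once rendered to a multi-line string, then becomes just another multi-line cell.

```python
from typing import List


def render_table(rows: List[List[str]], sep: str = " ") -> str:
    """Render rows of (possibly multi-line) cells as aligned plain text."""
    if not rows:
        return ""
    # Split every cell into its lines; a nested table is simply a multi-line cell.
    split = [[cell.split("\n") for cell in row] for row in rows]
    n_cols = max(len(row) for row in split)

    # Column width = length of the longest line in any cell of that column.
    widths = [0] * n_cols
    for row in split:
        for col, lines in enumerate(row):
            widths[col] = max(widths[col], max(len(line) for line in lines))

    out = []
    for row in split:
        # Row height = number of lines in the tallest cell of that row.
        height = max(len(lines) for lines in row)
        for i in range(height):
            parts = []
            for col in range(n_cols):
                lines = row[col] if col < len(row) else [""]
                text = lines[i] if i < len(lines) else ""
                parts.append(text.ljust(widths[col]))
            out.append(sep.join(parts).rstrip())
    return "\n".join(out)
```

Usage: `inner = render_table([["x", "y"], ["z", "w"]])` yields `"x y\nz w"`, and `render_table([["A", inner]])` places that inner table in the second column of the outer one, with its second line aligned under its first.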
Use the requests library for URL fetching
- use requests for URL fetching (this addresses #17 and prevents 403 responses from some web servers).
Fixed handling of negative margins.
- correctly parse negative margins in CSS definitions.
- This fixes a bug that, on some pages, produced a very large number (>1000) of newlines between content blocks.
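The changelog entry does not show how Inscriptis parses margins. The sketch below, with hypothetical helpers `parse_margin`, `collapse_margins` and `newlines_between`, illustrates the general idea: keep the sign when parsing a CSS length, combine two adjoining vertical margins using the CSS 2.1 collapsing rule, and never emit a negative number of newlines. Reading the numeric value directly as a count of lines (ignoring the unit) is a simplifying assumption.

```python
import re

# A CSS length such as "-2em", "1.5em" or "0"; the sign is kept.
_LENGTH = re.compile(r"^([+-]?\d+(?:\.\d+)?)\s*([a-z%]*)$")


def parse_margin(value: str) -> int:
    """Parse a CSS margin into a signed number of text lines.

    Simplification: the unit is ignored and the number is read as lines.
    Unparseable values (e.g. "auto") fall back to 0.
    """
    match = _LENGTH.match(value.strip().lower())
    if match is None:
        return 0
    return round(float(match.group(1)))


def collapse_margins(bottom: int, top: int) -> int:
    """CSS 2.1 collapsing: largest positive margin plus most negative margin."""
    positives = [m for m in (bottom, top) if m > 0]
    negatives = [m for m in (bottom, top) if m < 0]
    return max(positives, default=0) + min(negatives, default=0)


def newlines_between(margin_bottom: str, margin_top: str) -> int:
    # A negative result means the blocks overlap; emit no blank lines instead.
    return max(0, collapse_margins(parse_margin(margin_bottom), parse_margin(margin_top)))
```

For instance, `newlines_between("-3em", "1em")` returns 0 rather than a negative or wrapped-around count, while `newlines_between("1em", "2em")` returns 2.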
Use the server encoding, if available, in the inscript.py client.
This prevents encoding errors when using inscript.py to convert HTML pages to text.
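A minimal sketch of "use the server encoding, if available", assuming the charset comes from the HTTP `Content-Type` header and that UTF-8 is an acceptable fallback (both are assumptions; the release note does not specify them). It uses only the standard library; when the requests library is used, the header-derived value is exposed as `Response.encoding`. The function names are hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset declared in a Content-Type header, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset lower-cases the value and returns failobj if it is missing.
    return msg.get_content_charset(failobj=default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body with the server-declared encoding."""
    encoding = charset_from_content_type(content_type)
    try:
        return body.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that do not match the declared charset.
        return body.decode("utf-8", errors="replace")
```

For example, a Latin-1 body served with `Content-Type: text/html; charset=ISO-8859-1` decodes correctly instead of raising or producing mojibake under a hard-coded UTF-8 assumption.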
Decode HTML entities
Decode HTML entities such as &Auml;, &Ouml;, and &Uuml; prior to returning the plain-text version of the HTML page.
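For reference, Python's standard library performs this kind of decoding with `html.unescape`, which handles named, decimal, and hexadecimal character references. This is shown only to illustrate the behavior; the release note does not say which function Inscriptis uses internally.

```python
import html

# Named (&Auml;), decimal (&#196;) and hexadecimal (&#xC4;) references all decode to "Ä".
text = html.unescape("&Auml;pfel, &Ouml;l und &Uuml;bung")
print(text)  # Äpfel, Öl und Übung
```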
Improved parsing and PyPI metadata
- improved handling of highly nested tables
- more comprehensive PyPI metadata
Flask web service and more reliable parsing
Changelog
- optional Flask web service for converting HTML to text
- bug fixes
- allow arbitrarily deep nested lists
- fix a CSS parsing bug
- correctly handle empty documents
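One way to render lists of any nesting depth without hitting Python's recursion limit is to walk them with an explicit stack. The hypothetical `render_list` below is a sketch of that technique for nested Python lists of strings, not the Inscriptis implementation; the bullet and indentation strings are assumptions.

```python
from typing import Any, Iterator, List, Tuple

_DONE = object()  # sentinel marking an exhausted iterator


def render_list(items: List[Any], bullet: str = "* ", indent: str = "  ") -> str:
    """Render nested lists of strings as indented bullets, iteratively."""
    lines = []
    # Each stack entry holds an iterator over one list level and its depth.
    stack: List[Tuple[Iterator[Any], int]] = [(iter(items), 0)]
    while stack:
        iterator, depth = stack[-1]
        item = next(iterator, _DONE)
        if item is _DONE:
            stack.pop()                               # this level is finished
        elif isinstance(item, list):
            stack.append((iter(item), depth + 1))     # descend one level
        else:
            lines.append(indent * depth + bullet + item)
    return "\n".join(lines)
```

For example, `render_list(["a", ["b", ["c"]], "d"])` produces `* a`, `  * b`, `    * c` and `* d` on separate lines. Because depth is tracked on a heap-allocated stack rather than the call stack, thousands of nesting levels are handled, and an empty document simply yields an empty string.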