Releases: weblyzard/inscriptis

Improved indentation and custom rendering styles

25 Sep 13:09
b064737
  • improved indentation, if span and div tags are used
  • support for custom rendering styles
  • improved documentation
  • use Travis for automated CI
  • requires Python 2.7+ or Python 3.5+, since lxml does not support Python 3 versions below 3.5

Improved table rendering (nested tables and line breaks in tables)

26 Feb 09:33
d45c687
  • Correctly handle nested tables and line breaks (e.g. due to enumerations, list or paragraph breaks) in tables.
  • Improved content stripping.

Please take a look at the Rendering document for an overview of how Inscriptis renders different tables.

Use the requests library for URL fetching

31 Jan 13:49
  • use requests for URL fetching (this addresses #17 and prevents 403 responses with some Web servers).

Fixed handling of negative margins.

21 Dec 14:35
  • correctly parse negative margins in CSS definitions.
  • This fixes a bug that produced a very large number (>1000) of newlines between content on some pages.
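
The release note does not spell out how the fix works. As a rough illustration only, the sketch below shows one defensive way to turn a CSS margin value into a number of blank lines, clamping negative values to zero. The function name `margin_to_lines`, the regular expression, and the "one unit equals one line" simplification are all assumptions made for this example and are not taken from the Inscriptis source.

```python
import re

# Hypothetical pattern: an optionally negative number followed by an optional unit.
_MARGIN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(em|px|pt)?\s*$")


def margin_to_lines(value: str) -> int:
    """Convert a CSS margin value to a non-negative count of blank lines.

    Simplification for illustration: the numeric part is truncated to an
    integer and the unit is ignored.
    """
    match = _MARGIN.match(value)
    if match is None:
        # Unparseable values such as "auto" contribute no margin.
        return 0
    # Clamp negative margins to zero so they can never be misread
    # as a huge positive number of newlines.
    return max(0, int(float(match.group(1))))
```

With this sketch, `margin_to_lines("-2em")` yields 0 instead of a negative or wrapped-around value, and `margin_to_lines("2em")` yields 2.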

Use the server encoding, if available, in the inscript.py client.

11 Dec 19:50

This prevents encoding errors when using inscript.py for converting HTML pages to text.
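
As an illustration of the idea (not the actual inscript.py code), the following sketch decodes a response body with the charset announced in the server's Content-Type header and falls back to UTF-8 when none is given. The helper name `decode_body` and the fallback choice are assumptions made for this example; only standard-library calls are used.

```python
from email.message import Message
from typing import Optional


def decode_body(body: bytes, content_type: Optional[str],
                fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the server-declared charset, if any."""
    charset = None
    if content_type:
        # Reuse the standard library's header parser to extract "charset=...".
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()
    try:
        return body.decode(charset or fallback, errors="replace")
    except LookupError:
        # The server announced a charset Python does not know about.
        return body.decode(fallback, errors="replace")
```

For example, a Latin-1 encoded page served with `Content-Type: text/html; charset=ISO-8859-1` is decoded correctly instead of raising an error or yielding garbled umlauts.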

Decode HTML entities

15 Nov 15:26

Decode HTML entities such as &Auml;, &Ouml;, &Uuml; prior to returning the plain text version of the HTML page.
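
To show what entity decoding means in practice, here is a minimal sketch based on Python 3's standard `html.unescape`. It is not necessarily how Inscriptis implements this step; the wrapper name `decode_entities` is an assumption made for this example.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric HTML character references with their characters."""
    return html.unescape(text)


print(decode_entities("&Auml;, &Ouml;, &Uuml;"))  # prints: Ä, Ö, Ü
```

Decoding happens in a single pass, so a doubly escaped sequence such as `&amp;lt;` becomes `&lt;` rather than `<`.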

Improved parsing and PyPI metadata

17 Apr 11:19
  • improved handling of highly nested tables
  • more comprehensive PyPI metadata

Flask web service and more reliable parsing

24 Nov 08:47

Changelog

  1. optional Flask web service for converting HTML to text
  2. bug fixes
    • allow infinitely nested lists
    • fix a CSS parsing bug
    • correctly handle empty documents