CSS3Selectors

A Nim CSS Selectors library for the WHATWG standard compliant Chame HTML parser. Query HTML using CSS selectors with Nim just like you can with JavaScript's document.querySelector/document.querySelectorAll.

CSS3Selectors is largely based on GULPF's Nimquery library. Rather than using Nim's htmlparser, which is currently unreliable for scraping real-world HTML, it uses the Chame HTML parser.

CSS3Selectors is almost fully compliant with the CSS3 Selectors standard. The following selectors are not supported:

  • :root, :lang(...), :enabled, :disabled
  • :link, ::first-line, ::first-letter, :visited
  • :active, ::before, ::after, :hover
  • :focus, :target, :checked

Those selectors were not implemented because they didn't make much sense in the situations where Nimquery was useful.

While this library has been rigorously stress-tested, there may still be bugs. Please report any you encounter in the wild :)

Installation

Install with Nimble: nimble install css3selectors

Alternatively, clone the repository with git: git clone https://github.com/Niminem/CSS3Selectors

Usage

import std/streams
import pkg/chame/minidom
import css3selectors

let html = """
    <!DOCTYPE html>
    <html>
    <head><title>Example</title></head>
    <body>
        <p>1</p>
        <p>2</p>
        <p>3</p>
        <p>4</p>
    </body>
    </html>
    """
let document = Node(parseHtml(newStringStream(html)))
let elements = document.querySelectorAll("p:nth-child(odd)")
echo elements # @[<p>1</p>, <p>3</p>]

let htmlFragment = parseHTMLFragment("<h1 id='test'>Hello World</h1><h2>Test Test</h2>", Element())
let element = htmlFragment.querySelector("#test")
echo element # <h1 id="test">Hello World</h1>

API

proc querySelectorAll*(root: Node | seq[Node],
                       queryString: string,
                       options: set[QueryOption] = DefaultQueryOptions): seq[Element]

Get all elements matching queryString. Raises ParseError if parsing of queryString fails. See Options for information about the options parameter.

The root parameter is either a Node (for HTML documents parsed with parseHtml) or a seq[Node] (for HTML fragments parsed with parseHTMLFragment).


proc querySelector*(root: Node | seq[Node],
                    queryString: string,
                    options: set[QueryOption] = DefaultQueryOptions): Element

Get the first element matching queryString, or nil if no such element exists. Raises ParseError if parsing of queryString fails. See Options for information about the options parameter.

The root parameter is either a Node (for HTML documents parsed with parseHtml) or a seq[Node] (for HTML fragments parsed with parseHTMLFragment).
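Because querySelector returns nil when nothing matches, and both procs raise on a malformed selector, callers usually want to guard both cases. The following is a minimal sketch, not part of the library's documentation. The HTML and selectors are made up for illustration, and the handler catches the broad CatchableError on the assumption that ParseError is a regular catchable exception (the exact exception hierarchy is not stated here).

```nim
import std/streams
import pkg/chame/minidom
import css3selectors

let document = Node(parseHtml(newStringStream(
  "<!DOCTYPE html><html><body><p id='intro'>Hi</p></body></html>")))

# querySelector returns nil when no element matches, so check before use.
let missing = document.querySelector("#does-not-exist")
if missing.isNil:
  echo "no match"
else:
  echo missing

# An unterminated :not( is expected to be rejected by the selector parser.
# CatchableError is used here as a deliberately broad assumption.
try:
  discard document.querySelectorAll("p:not(")
except CatchableError as e:
  echo "invalid selector: ", e.msg
```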


proc parseHtmlQuery*(queryString: string,
                     options: set[QueryOption] = DefaultQueryOptions): Query

Parses a query for later use. Raises ParseError if parsing of queryString fails. See Options for information about the options parameter.


proc exec*(query: Query,
           root: Node,
           single: bool): seq[Element]

Execute an already parsed query. If single is true, at most one element is returned.
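When the same selector is run many times, parsing it once with parseHtmlQuery and reusing the resulting Query avoids parsing it again on every call. The sketch below uses only the signatures listed above. The HTML is the same illustrative document as in the Usage section, and the variable names are arbitrary.

```nim
import std/streams
import pkg/chame/minidom
import css3selectors

let html = """
<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <p>1</p>
    <p>2</p>
    <p>3</p>
    <p>4</p>
  </body>
</html>
"""
let document = Node(parseHtml(newStringStream(html)))

# Parse the selector once...
let query = parseHtmlQuery("p:nth-child(odd)")

# ...then execute it as often as needed.
let all = query.exec(document, single = false)   # every match
let first = query.exec(document, single = true)  # at most one match

echo all
echo first
```

Passing single = true corresponds to the querySelector use case, except that exec still returns a seq[Element], so the result has to be checked for emptiness rather than for nil.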

Note: The root parameter accepts a Node. If you would like to execute on an HTML Fragment via parseHTMLFragment (which returns a seq[Node]) you will need to make a root element for it using:

# dom_utils.nim
func makeElemRoot*(list: seq[Node]): Element

Options

The QueryOption enum contains flags for configuring the behavior when parsing/searching:

  • optUniqueIds: Indicates whether id attributes should be assumed to be unique.
  • optSimpleNot: Indicates whether only simple selectors are allowed as an argument to the :not(...) pseudo-class. Note that combinators are not allowed in the argument even if this flag is excluded.
  • optUnicodeIdentifiers: Indicates whether Unicode characters are allowed inside identifiers. This does not affect strings, where Unicode is always allowed.

The default options are defined as const DefaultQueryOptions* = { optUniqueIds, optUnicodeIdentifiers, optSimpleNot }.

Below is an example of using the options parameter to allow a complex :not(...) selector.

import std/streams
import pkg/chame/minidom
import css3selectors

let html = """
<!DOCTYPE html>
  <html>
    <head><title>Example</title></head>
    <body>
      <p>1</p>
      <p class="maybe-skip">2</p>
      <p class="maybe-skip">3</p>
      <p>4</p>
    </body>
  </html>
"""
let document = Node(parseHtml(newStringStream(html)))
let options = DefaultQueryOptions - { optSimpleNot }
let elements = document.querySelectorAll("p:not(.maybe-skip:nth-child(even))", options)
echo elements
# @[<p>1</p>, <p class="maybe-skip">3</p>, <p>4</p>]

TODO

  • Add more helper procs like those in std/xmltree for easier DOM handling (e.g. innerText()). We may want to move these into another library over time.
