rni-l/pure-js-html-parser

pure-js-html-parser

An HTML parser implemented in pure JS (TS) with no library dependencies. It supports basic operations: querying, adding, modifying, and removing elements, and converting the parsed tree back to HTML code.

Chinese documentation

Quick start

Install:

npm i pure-js-html-parser

Use:

import { Parser } from 'pure-js-html-parser'

const txt = `<div a='a' b="2">a</div>`

const $ = new Parser(txt)

// Get the parsed data
$.parserData
/**
 * [
    {
      tag: "div",
      value: "",
      type: "tag",
      children: [
        {
          tag: "",
          value: "a",
          type: "text",
          children: [],
          attributes: [],
        },
      ],
      attributes: [
        {
          key: "a",
          value: "a",
        },
        {
          key: "b",
          value: "2",
        },
      ],
    },
  ]
 */

Data structure

export interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
export type IParseValueType = "tag" | "text";
export interface IParseHtmlItem {
  tag: string;
  value: string;
  type: IParseValueType;
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

If the node is a tag, such as <div id="a"></div>, then its type is tag. The output is:

{
  tag: 'div',
  value: '',
  type: 'tag',
  children: [],
  attributes: [ { key: 'id', value: 'a' } ]
}

If the node is text, such as a bare a or the a inside <div>a</div>, then its type is text. The output is:

{
  tag: '',
  value: 'a',
  type: 'text',
  children: [],
  attributes: []
}
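
Because every node has the same shape, the parsed data can be walked with a short recursive function. The following is a minimal sketch that gathers the text of all text nodes. The interfaces are copied from this section; `collectText` and `sampleTree` are hypothetical names used only for illustration and are not part of the library.

```typescript
// Types copied from the "Data structure" section.
interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
type IParseValueType = "tag" | "text";
interface IParseHtmlItem {
  tag: string;
  value: string;
  type: IParseValueType;
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

// Hypothetical helper (not a library API): concatenate the value of every
// text node in depth-first order.
function collectText(nodes: IParseHtmlItem[]): string {
  let out = "";
  for (const node of nodes) {
    if (node.type === "text") {
      out += node.value;
    } else {
      out += collectText(node.children);
    }
  }
  return out;
}

// Hand-written tree matching the parsed output shown for <div a='a' b="2">a</div>.
const sampleTree: IParseHtmlItem[] = [
  {
    tag: "div",
    value: "",
    type: "tag",
    children: [{ tag: "", value: "a", type: "text", children: [], attributes: [] }],
    attributes: [
      { key: "a", value: "a" },
      { key: "b", value: "2" },
    ],
  },
];

console.log(collectText(sampleTree)); // prints "a"
```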

Query elements

const txt = `<div class="a" id="a" a='a' b="2">a</div>`
const $ = new Parser(txt)
// query tag
$.query('div')
// query class
$.query('.a')
// query id
$.query('#a')
// query all
$.queryAll('.a')
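
To make the three selector forms concrete, here is a rough sketch of how a tag, class, or id selector could be matched against the data structure. It assumes classes are stored in an attribute with key `class`, separated by whitespace; the library's own matching logic may differ. `matchesSimpleSelector` and `findAllNodes` are hypothetical names, not library APIs.

```typescript
interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
interface IParseHtmlItem {
  tag: string;
  value: string;
  type: "tag" | "text";
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

// Look up an attribute value by key; undefined when the key is absent.
function attributeValue(item: IParseHtmlItem, key: string): string | undefined {
  for (const attr of item.attributes) {
    if (attr.key === key) return attr.value;
  }
  return undefined;
}

// Assumption: only three selector forms exist: "tag", ".class", "#id".
function matchesSimpleSelector(item: IParseHtmlItem, selector: string): boolean {
  if (item.type !== "tag") return false;
  const name = selector.slice(1);
  if (selector.charAt(0) === "#") return attributeValue(item, "id") === name;
  if (selector.charAt(0) === ".") {
    const classes = (attributeValue(item, "class") || "").split(/\s+/);
    return classes.indexOf(name) !== -1;
  }
  return item.tag === selector;
}

// Depth-first search that returns every matching node.
function findAllNodes(nodes: IParseHtmlItem[], selector: string): IParseHtmlItem[] {
  let found: IParseHtmlItem[] = [];
  for (const node of nodes) {
    if (matchesSimpleSelector(node, selector)) found.push(node);
    found = found.concat(findAllNodes(node.children, selector));
  }
  return found;
}

// Hand-written tree for <div class="a" id="a"><div class="b"></div></div>.
const queryTree: IParseHtmlItem[] = [
  {
    tag: "div",
    value: "",
    type: "tag",
    attributes: [{ key: "class", value: "a" }, { key: "id", value: "a" }],
    children: [
      { tag: "div", value: "", type: "tag", children: [], attributes: [{ key: "class", value: "b" }] },
    ],
  },
];

console.log(findAllNodes(queryTree, "div").length); // prints 2
console.log(findAllNodes(queryTree, ".b").length); // prints 1
```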

Add element

const txt = `<div class="a" id="a" a='a' b="2">a</div>`
const $ = new Parser(txt)
// Insert an element at the end of the ".a" element
$.push({
  tag: 'div',
  value: '',
  type: 'tag',
  children: [],
  attributes: []
}, '.a')
// Insert an element at the end
$.push({
  tag: 'div',
  value: '',
  type: 'tag',
  children: [],
  attributes: []
})
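
Writing the full object literal for every new node gets repetitive. A small pair of factory functions can fill in the fixed fields. This is only a convenience sketch built on the data structure above; `createTagNode` and `createTextNode` are hypothetical helpers and are not exported by the library.

```typescript
interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
interface IParseHtmlItem {
  tag: string;
  value: string;
  type: "tag" | "text";
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

// Build a text node: empty tag, no children, no attributes.
function createTextNode(value: string): IParseHtmlItem {
  return { tag: "", value: value, type: "text", children: [], attributes: [] };
}

// Build a tag node: empty value, optional attributes and children.
function createTagNode(
  tag: string,
  attributes: IParseHtmlAttribute[] = [],
  children: IParseHtmlItem[] = []
): IParseHtmlItem {
  return { tag: tag, value: "", type: "tag", children: children, attributes: attributes };
}

const newItem = createTagNode("span", [{ key: "class", value: "c" }], [createTextNode("hi")]);
console.log(newItem.children[0].value); // prints "hi"
```

The result can then be passed to push as in the example above, for instance `$.push(newItem, '.a')`.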

Modify element

const txt = `<div class="a" id="a" a='a' b="2">a</div>`
const $ = new Parser(txt)
// Modify an attribute of the ".a" element (index 2 is the a='a' attribute)
$.modify('.a', (item) => {
  item.attributes[2].value = "a2"
  return item
})
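
Addressing an attribute by index depends on the order in which attributes appear in the source HTML. A lookup by key is less fragile. The sketch below shows one way to write it; `setAttributeByKey` is a hypothetical helper, not part of the library.

```typescript
interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
interface IParseHtmlItem {
  tag: string;
  value: string;
  type: "tag" | "text";
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

// Update the attribute with the given key, or append it when it is missing.
// Mutates and returns the same item so it can be used as a modify callback body.
function setAttributeByKey(item: IParseHtmlItem, key: string, value: string): IParseHtmlItem {
  for (const attr of item.attributes) {
    if (attr.key === key) {
      attr.value = value;
      return item;
    }
  }
  item.attributes.push({ key: key, value: value });
  return item;
}

// Hand-written node for <div class="a" id="a" a='a' b="2"></div>.
const target: IParseHtmlItem = {
  tag: "div",
  value: "",
  type: "tag",
  children: [],
  attributes: [
    { key: "class", value: "a" },
    { key: "id", value: "a" },
    { key: "a", value: "a" },
    { key: "b", value: "2" },
  ],
};

setAttributeByKey(target, "a", "a2");
console.log(target.attributes[2].value); // prints "a2"
```

With this helper, the call above could be written as `$.modify('.a', (item) => setAttributeByKey(item, 'a', 'a2'))`.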

Remove element

const txt = `<div class="a" id="a" a='a' b="2"><div class="b"></div></div>`
const $ = new Parser(txt)
$.remove('.b')

Transform to HTML code

const txt = `<div class="a" id="a" a='a' b="2"><div class="b"></div></div>`
const $ = new Parser(txt)
$.transform()
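
The exact string returned by `$.transform()` is not shown here. As an illustration of the idea only, the following sketch serializes a tree of the documented shape back to HTML. It is not the library's implementation: it always emits a closing tag, always uses double quotes, and does no escaping. `serializeTree` is a hypothetical name.

```typescript
interface IParseHtmlAttribute {
  key: string;
  value: string | undefined;
}
interface IParseHtmlItem {
  tag: string;
  value: string;
  type: "tag" | "text";
  children: IParseHtmlItem[];
  attributes: IParseHtmlAttribute[];
}

// Recursively turn nodes into an HTML string.
// Assumption: an attribute whose value is undefined is written as a bare key.
function serializeTree(nodes: IParseHtmlItem[]): string {
  return nodes
    .map((node) => {
      if (node.type === "text") return node.value;
      const attrs = node.attributes
        .map((a) => (a.value === undefined ? ` ${a.key}` : ` ${a.key}="${a.value}"`))
        .join("");
      return `<${node.tag}${attrs}>${serializeTree(node.children)}</${node.tag}>`;
    })
    .join("");
}

// Hand-written tree for <div class="a"><div class="b"></div></div>.
const htmlTree: IParseHtmlItem[] = [
  {
    tag: "div",
    value: "",
    type: "tag",
    attributes: [{ key: "class", value: "a" }],
    children: [
      { tag: "div", value: "", type: "tag", children: [], attributes: [{ key: "class", value: "b" }] },
    ],
  },
];

console.log(serializeTree(htmlTree)); // prints <div class="a"><div class="b"></div></div>
```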

TODO

  • Refactor the entire implementation
  • Handle svg tags
  • Handle comments
