
parsing_block

Ortiz Troncoso, Alvaro edited this page Oct 23, 2017 · 1 revision

Block rules

Block rules parse Markdown syntax that spans one or more full lines and emit the tokens required to represent those blocks: for example, code blocks, blockquotes, headers and horizontal rules (hr).

A block rule is a function expecting the following arguments:

  1. state: an instance of StateBlock
  2. startLine: the index of the current line
  3. endLine: the index of the last available line
  4. checkMode: a flag indicating whether we should simply check if the current line marks the beginning of the syntax we are trying to parse.

Both startLine and endLine are line indices. They are used to look up the per-line information stored in the state.

checkMode exists as an optimization. If it is set to true, the rule can return true as soon as it is sure the current line marks the beginning of a block it parses. For example, for a fenced code block the check would be: "Is the current line some indentation, the "```" marker and, optionally, a language name?"
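
The fenced-code check described above can be sketched as a small predicate. This is only an illustration under stated assumptions: isFenceStart and fenceState are hypothetical names, the state is a hand-built object with just the fields the check needs, and the real fence rule in the library may apply stricter conditions. A rule called with checkMode set to true could return the result of such a predicate directly.

```javascript
// Hypothetical predicate: does `line` look like the start of a fenced code block?
// Expected shape: indentation, at least three backticks, optional language name.
function isFenceStart(state, line) {
  let pos = state.bMarks[line] + state.tShift[line]; // first non-space character
  const max = state.eMarks[line];                    // end of the line in src

  // Count the run of backticks that forms the marker.
  const markerStart = pos;
  while (pos < max && state.src.charCodeAt(pos) === 0x60 /* ` */) {
    pos++;
  }
  if (pos - markerStart < 3) {
    return false;
  }

  // Assumption: the rest of the line (the language name) holds no backtick.
  return state.src.slice(pos, max).indexOf("`") === -1;
}

// Hand-built state for "  ```js", "console.log(1)", "```" (assumed shape).
const fenceState = {
  src: "  ```js\nconsole.log(1)\n```",
  bMarks: [0, 8, 23],
  eMarks: [7, 22, 26],
  tShift: [2, 0, 0],
};

console.log(isFenceStart(fenceState, 0)); // true
console.log(isFenceStart(fenceState, 1)); // false
```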

State block

The StateBlock prototype is defined in state_block.js. Its data consists of:

  • src: the complete string the parser is currently working on
  • parser: the current block parser (here to make nested calls easier)
  • env: a namespaced key-value data store that lets core rules exchange data
  • tokens: the tokens generated by the parser so far; you emit new tokens by calling push(newToken) on it
  • bMarks: a collection marking for each line the position of its start in src
  • eMarks: a collection marking for each line the position of its end in src
  • tShift: a collection marking, for each line, how many spaces were used to indent it
  • blkIndent: how many spaces of indentation are required by the parent block
  • level: the nesting level of the current block
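
To make bMarks, eMarks and tShift concrete, the following sketch computes them for a short source string. It is a simplified approximation, not the actual StateBlock constructor: computeLineMarks is a hypothetical helper, only "\n" line breaks and space indentation are handled, and the library may store extra entries or treat tabs differently.

```javascript
// Hypothetical helper: derive per-line marks from a source string.
function computeLineMarks(src) {
  const bMarks = []; // offset in src where each line starts
  const eMarks = []; // offset in src where each line ends (before the "\n")
  const tShift = []; // number of leading spaces on each line
  let pos = 0;

  for (const line of src.split("\n")) {
    bMarks.push(pos);
    eMarks.push(pos + line.length);
    tShift.push(/^ */.exec(line)[0].length);
    pos += line.length + 1; // +1 skips the line feed
  }
  return { bMarks, eMarks, tShift };
}

const marks = computeLineMarks("# Title\n\n    code\n> quote");
console.log(marks.bMarks); // [ 0, 8, 9, 18 ]
console.log(marks.eMarks); // [ 7, 8, 17, 25 ]
console.log(marks.tShift); // [ 0, 0, 4, 0 ]
```

With these arrays, the visible content of line i is src.slice(bMarks[i] + tShift[i], eMarks[i]).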

The most important methods are:

  • isEmpty(line): checks whether the line at index line is empty (or consists solely of blank space)
  • skipEmptyLines(from): returns the index of the first non-empty line after from
  • skipSpaces(pos): returns the next non-blank position at or after pos
  • skipChars(pos, code): returns the next position holding a character different from code, at or after pos
  • skipCharsBack(pos, code, min): returns the previous position holding a character different from code, at or before pos but after min
  • getLines(begin, end, indent, keepLastLF): returns the text content of the block of lines from begin (included) to end (excluded). For each line, the initial blank spaces will be skipped (up to indent blank spaces per line). If keepLastLF is set to true, the last character of the excerpt will be a line-feed.
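
The behaviour of a few of these helpers can be sketched as stand-alone functions. This is a simplified re-implementation for illustration only: the real versions are methods on StateBlock (called as state.isEmpty(line), and so on), whereas here the state is passed explicitly, and helperState is a hand-built object with an assumed shape.

```javascript
// Simplified stand-ins for some StateBlock helpers (illustrative only).

// True when the line holds nothing but indentation.
function isEmpty(state, line) {
  return state.bMarks[line] + state.tShift[line] >= state.eMarks[line];
}

// Advance past consecutive spaces starting at pos.
function skipSpaces(state, pos) {
  while (pos < state.src.length && state.src.charCodeAt(pos) === 0x20) {
    pos++;
  }
  return pos;
}

// Advance past consecutive characters equal to `code` starting at pos.
function skipChars(state, pos, code) {
  while (pos < state.src.length && state.src.charCodeAt(pos) === code) {
    pos++;
  }
  return pos;
}

// Hand-built state for the lines "```js", "" and "  text".
const helperState = {
  src: "```js\n\n  text",
  bMarks: [0, 6, 7],
  eMarks: [5, 6, 13],
  tShift: [0, 0, 2],
};

console.log(isEmpty(helperState, 1));         // true
console.log(skipSpaces(helperState, 7));      // 9
console.log(skipChars(helperState, 0, 0x60)); // 3
```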

Rule parser behaviour

If checkMode is set to true, simply return a boolean indicating whether the current line should be considered as the first line of your block. Otherwise, proceed with the complete parsing.

NB: It is your responsibility to make sure you have not reached the maximum nesting level allowed, by comparing state.level and state.options.maxNesting.

NB: If, for any reason, the block you are trying to parse is incorrectly formatted and you are unable to parse it, you must abort and return false without modifying state in any way.

To completely parse a block, you will need to emit additional tokens in state.tokens.

Once you are sure the current line marks the beginning of one of "your" blocks, you should push an open tag token corresponding to the beginning of your block. You will also need to find its end. Use the state methods to help you with this.

Your next decision should be whether you wish to allow other blocks to be nested in your content or not. If you do, you will need to invoke state.parser.tokenize(state, startLine, endLine, true), where state has been updated so that the next batch of rules can run smoothly, and startLine and endLine are, respectively, the first and last lines of your block's content.

If you do not want other blocks to be nested, simply push a new inline token with the content of the block. You could use getLines(begin, end, indent, keepLastLF) to help you with that.

If your block needs to be divided further, you may push whatever combination of intervening tokens you deem necessary.

The last token you will need to emit is the end tag token of your block.

Finally, you will need to update state to reflect that the part of src covering your block has been taken care of: set state.line to the index of the first line following your block, then return true.
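
The steps above can be put together in a sketch of a complete rule for a block without nested blocks. Everything here is an assumption made for illustration: the ":::" syntax, the noteRule and makeMockState names, the token type names and token fields are hypothetical, the state is a mock rather than a real StateBlock, and endLine is treated as an exclusive upper bound.

```javascript
// Hypothetical rule: a ":::" line opens a note and the next ":::" line closes it.
function noteRule(state, startLine, endLine, checkMode) {
  // Text of a line without its indentation or trailing blanks.
  const lineText = (line) =>
    state.src
      .slice(state.bMarks[line] + state.tShift[line], state.eMarks[line])
      .trim();

  if (lineText(startLine) !== ":::") return false;
  if (state.level >= state.options.maxNesting) return false;
  if (checkMode) return true; // the first line is enough to answer

  // Find the closing marker (endLine assumed exclusive in this sketch).
  let closeLine = startLine + 1;
  while (closeLine < endLine && lineText(closeLine) !== ":::") closeLine++;
  if (closeLine >= endLine) return false; // unterminated: abort, state untouched

  const content = [];
  for (let l = startLine + 1; l < closeLine; l++) content.push(lineText(l));

  // Open tag, inline content, close tag (token fields are assumed).
  state.tokens.push({ type: "note_open", level: state.level });
  state.tokens.push({
    type: "inline",
    content: content.join("\n"),
    level: state.level + 1,
    children: [],
  });
  state.tokens.push({ type: "note_close", level: state.level });

  state.line = closeLine + 1; // first line after the block
  return true;
}

// Mock state with only the fields the sketch reads or writes.
function makeMockState(src) {
  const state = {
    src, bMarks: [], eMarks: [], tShift: [],
    tokens: [], line: 0, level: 0, options: { maxNesting: 20 },
  };
  let pos = 0;
  for (const line of src.split("\n")) {
    state.bMarks.push(pos);
    state.eMarks.push(pos + line.length);
    state.tShift.push(/^ */.exec(line)[0].length);
    pos += line.length + 1;
  }
  return state;
}

const noteState = makeMockState(":::\nhello\nworld\n:::\nafter");
console.log(noteRule(noteState, 0, 5, false));   // true
console.log(noteState.tokens.map((t) => t.type)); // [ 'note_open', 'inline', 'note_close' ]
console.log(noteState.line);                      // 4
```

Note that the sketch returns false before pushing anything when the closing marker is missing, so a failed parse leaves state unmodified.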
