- The `parsergen`/`scannergen` combo generates source code files of an LR1/GLR parser & scanner from a set of annotated production rules, aka grammar.
- Both `parsergen` & `scannergen` use the same combo (i.e. themselves) to re-generate their own parser & scanner, respectively, to evolve.
- Building the generated code with `-std=c++2a` is required.
- 🧘 Most often you need the combo, but not always:
  - Sometimes reusing an existing scanner with another parser is feasible and cheaper. (`%IDDEF_SOURCE`)
  - Sometimes a standalone scanner suffices. (see `CBrackets`)
in ArchLinux
- Make sure you have installed `yay` or any other pacman wrapper.
- `yay -S parsergen` to install.
- `yay -Ql parsergen` to see the installed files:

      parsergen /usr/
      parsergen /usr/bin/
      parsergen /usr/bin/grammarstrip
      parsergen /usr/bin/parsergen
      parsergen /usr/bin/scannergen
      parsergen /usr/share/
      parsergen /usr/share/licenses/
      parsergen /usr/share/licenses/parsergen/
      parsergen /usr/share/licenses/parsergen/LICENSE
      parsergen /usr/share/parsergen/
      parsergen /usr/share/parsergen/RE_Suite.txt

- Three commands `grammarstrip`, `parsergen`, `scannergen` are at your disposal.
from github on any Linux distro
- Make sure you have installed `cmake`, `make`, `gcc`, `git`, or the like.
- Clone and build:

      git clone https://github.com/buck-yeh/parsergen.git
      cd parsergen
      cmake -D FETCH_DEPENDEES=1 -D DEPENDEE_ROOT=_deps .
      make -j
      PSGEN_DIR="/full/path/to/current/dir"

  p.s. You can install a tagged version by replacing `main` with the tag name.
- Three commands at your disposal:
  - `$PSGEN_DIR/ParserGen/grammarstrip`
  - `$PSGEN_DIR/ParserGen/parsergen`
  - `$PSGEN_DIR/ScannerGen/scannergen`
- 🤔 But is it possible to just type `grammarstrip`, `parsergen`, `scannergen` to run them?

  💡 Append the following lines to `~/.bashrc`:

      PSGEN_DIR="/full/path/to/parsergen/dir"
      alias grammarstrip="$PSGEN_DIR/ParserGen/grammarstrip"
      alias parsergen="$PSGEN_DIR/ParserGen/parsergen"
      alias scannergen="$PSGEN_DIR/ScannerGen/scannergen"

  And run the following line:

      . ~/.bashrc

  There you go! The aliases also take effect in subsequently opened console windows and persist after reboot.
When you need to quickly implement a parser for an improvised or deliberately designed DSL, prepare a grammar file in simple BNF rules with semantic annotations and then let the combo generate C++ code of parser & scanner.
`example/CalcInt/grammar.txt` defines a calculator for basic arithmetic `+ - * / %` of integral constants in decimal, octal, or hexadecimal.
lexid Spaces // (1)
//
// Output Options (2)
//
%CONTEXT [[std::ostream &]]
%ON_ERROR [[
$c <<"COL#" <<$pos.m_Col <<": " <<$message <<'\n';
]]
%EXTRA_TOKENS [[dec_num|oct_num|hex_num|spaces]]
//%SHOW_UNDEFINED
//
// Operator Precedence (3)
//
left + -
left * / %
right ( )
//
// Grammar with Reduction Code (4)
//
<@> ::= <Expr> [[
$r = $1;
]]
<Expr> ::= <Expr> + <Expr> [[
bux::unlex<int>($1) += bux::unlex<int>($3);
$r = $1;
]]
<Expr> ::= <Expr> - <Expr> [[
bux::unlex<int>($1) -= bux::unlex<int>($3);
$r = $1;
]]
<Expr> ::= <Expr> * <Expr> [[
bux::unlex<int>($1) *= bux::unlex<int>($3);
$r = $1;
]]
<Expr> ::= <Expr> / <Expr> [[
bux::unlex<int>($1) /= bux::unlex<int>($3);
$r = $1;
]]
<Expr> ::= <Expr> % <Expr> [[
bux::unlex<int>($1) %= bux::unlex<int>($3);
$r = $1;
]]
<Expr> ::= ( <Expr> ) [[
$r = $2;
]]
<Expr> ::= $Num [[
$r = bux::createLex(dynamic_cast<bux::C_IntegerLex&>(*$1).value<int>());
]]
(1) New lexid
(2) % Option
(4) Production rule
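To make the intent of the precedence declarations and reduction code concrete, here is a minimal hand-written sketch of the same arithmetic in plain C++. It is not the code `parsergen` generates and it does not use the `bux` library; `CalcSketch` and `evalSketch` are hypothetical names. It also assumes C-style literal prefixes (`0x` for hexadecimal, leading `0` for octal), which the grammar above does not spell out. It uses precedence climbing, so `* / %` bind tighter than `+ -` and operators of equal precedence group to the left, as `left + -` and `left * / %` declare.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written evaluator; NOT the parser that parsergen generates.
class CalcSketch {
public:
    explicit CalcSketch(const std::string &src) : m_src(src) {}

    // Evaluate the whole input; throws on malformed input.
    int run() {
        const int v = expr(0);
        skipSpaces();
        if (m_pos != m_src.size())
            throw std::runtime_error("Unexpected trailing input");
        return v;
    }

private:
    const std::string m_src;
    std::size_t m_pos = 0;

    void skipSpaces() {
        while (m_pos < m_src.size() &&
               std::isspace(static_cast<unsigned char>(m_src[m_pos])))
            ++m_pos;
    }

    // Binding strength: 1 for + -, 2 for * / %, 0 for anything else.
    static int precedence(char op) {
        switch (op) {
        case '+': case '-': return 1;
        case '*': case '/': case '%': return 2;
        default: return 0;
        }
    }

    // A number or a parenthesized expression.
    int primary() {
        skipSpaces();
        if (m_pos < m_src.size() && m_src[m_pos] == '(') {
            ++m_pos;
            const int v = expr(0);
            skipSpaces();
            if (m_pos >= m_src.size() || m_src[m_pos] != ')')
                throw std::runtime_error("Missing ')'");
            ++m_pos;
            return v;
        }
        if (m_pos >= m_src.size() ||
            !std::isdigit(static_cast<unsigned char>(m_src[m_pos])))
            throw std::runtime_error("Number expected");
        std::size_t used = 0;
        // Base 0 lets std::stoi choose hex (0x..), octal (0..) or decimal.
        const int v = std::stoi(m_src.substr(m_pos), &used, 0);
        m_pos += used;
        return v;
    }

    // Precedence climbing: only consume operators stronger than minPrec,
    // which makes operators of equal precedence left-associative.
    int expr(int minPrec) {
        int lhs = primary();
        for (;;) {
            skipSpaces();
            if (m_pos >= m_src.size())
                return lhs;
            const char op = m_src[m_pos];
            const int prec = precedence(op);
            if (prec == 0 || prec <= minPrec)
                return lhs;
            ++m_pos;
            const int rhs = expr(prec);
            if ((op == '/' || op == '%') && rhs == 0)
                throw std::runtime_error("Division by zero");
            switch (op) {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/': lhs /= rhs; break;
            case '%': lhs %= rhs; break;
            }
        }
    }
};

// Convenience wrapper around the sketch above.
int evalSketch(const std::string &line) {
    return CalcSketch(line).run();
}
```

With this sketch, `evalSketch("1+2*3")` yields 7, `evalSketch("10 - 4 - 3")` yields 3, and `evalSketch("0x10 + 010 + 10")` yields 34. The generated LR parser reaches the same results through table-driven shifts and reductions rather than recursion.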
When package `parsergen` is installed in ArchLinux:
parsergen grammar.txt Parser tokens.txt && \
scannergen Scanner /usr/share/parsergen/RE_Suite.txt tokens.txt
When `parsergen` is built from github:
parsergen grammar.txt Parser tokens.txt && \
scannergen Scanner "$PSGEN_DIR/ScannerGen/RE_Suite.txt" tokens.txt
where

| Parameter | Description |
|---|---|
| `grammar.txt` | Annotated BNF rules and other types of options. |
| `Parser` | Output file base - `parsergen` generates `Parser.cpp`, `Parser.h`, `ParserIdDef.h` |
| `Scanner` | Output file base - `scannergen` generates `Scanner.cpp`, `Scanner.h` |
| `tokens.txt` | Output of `parsergen` & input of `scannergen` |
| `RE_Suite.txt` | Recurring token definitions provided with `scannergen` and used by `tokens.txt` |
💡 Put the commands in a script called `reparse` for recurring use.

ℹ️ When the output files already exist, `parsergen` will prompt (y/n) questions three times and `scannergen` will prompt twice.
> ./reparse
About to parse 'grammar.txt' ...
Total 1 lex-symbols 1 nonterms 9 literals
states = 30 shifts = 106
Spent 0.005232879"
38 out of 106 goto keys erased for redundancy.
ParserIdDef.h already exists. Overwrite it ?(y/n)y
Parser.h already exists. Overwrite it ?(y/n)y
Parser.cpp already exists. Overwrite it ?(y/n)y
Parser created
#pos_args = 4
About to parse '/usr/share/parsergen/RE_Suite.txt' ...
About to parse 'tokens.txt' ...
Scanner.h already exists. Overwrite it ?(y/n)y
Scanner.cpp already exists. Overwrite it ?(y/n)y
> _
ℹ️ from example/CalcInt/main.cpp
#include "Parser.h" // C_Parser
#include "ParserIdDef.h" // TID_LEX_Spaces
#include "Scanner.h" // C_Scanner
💡 Including `ParserIdDef.h` may not be necessary when spaces can't be ignored.
C_Parser parser{/*args of context ctor*/};
bux::C_ScreenerNo<TID_LEX_Spaces> screener{parser}; // (1)
C_Scanner scanner{screener};
bux::C_IMemStream in{line}; // or other std::istream derived
bux::scanFile(">", in, scanner);
// Check if parsing is ok
// ... (2)
// Acceptance
if (!parser.accepted())
{
std::cerr <<"Incomplete expression!\n";
continue; // or break or return
}
// Apply the result
// parser.getFinalLex() ... (3)
(1) A screener sits between the scanner and the parser and can filter out, change, or aggregate selected tokens. Don't use it if you don't need it:
C_Parser parser{/*args of context ctor*/};
C_Scanner scanner{parser};
bux::C_IMemStream in{line}; // or other std::istream derived
bux::scanFile(">", in, scanner);
(2) This is the time to check the integrity of your context state.
(3) `parser.getFinalLex()` returns a reference to the merged result of type `bux::LR1::C_LexInfo`. In this example, the expected result is an integral value of type `int` and can be conveniently obtained by calling `bux::unlex<T>()`:

    bux::unlex<int>(parser.getFinalLex())

An alternative way is to store the result in the user context instance through "production code" instead of calling `parser.getFinalLex()`.
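The screener idea from note (1) can be sketched without the library. The following is an illustration only: `Token`, `TokenSink`, `dropTokens`, `screenSketch`, and the token ids are all hypothetical and are not part of `bux`. It shows a filter stage placed between a token producer (the scanner's role) and a token consumer (the parser's role) that drops tokens of one id, which is conceptually what `bux::C_ScreenerNo<TID_LEX_Spaces>` is used for above.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; the real library uses its own token classes.
struct Token {
    int id;
    std::string text;
};

// Made-up token ids for this sketch only.
constexpr int TOK_SPACES = 1;
constexpr int TOK_NUMBER = 2;

// Anything that accepts tokens one at a time.
using TokenSink = std::function<void(const Token &)>;

// Build a sink that forwards every token to `next`
// except those whose id equals `dropId`.
TokenSink dropTokens(int dropId, TokenSink next) {
    return [dropId, next = std::move(next)](const Token &t) {
        if (t.id != dropId)
            next(t);
    };
}

// Demo: push tokens through a space-dropping stage and
// collect the text of whatever reaches the consumer end.
std::vector<std::string> screenSketch(const std::vector<Token> &input) {
    std::vector<std::string> seen;
    TokenSink parserEnd = [&seen](const Token &t) { seen.push_back(t.text); };
    TokenSink screener = dropTokens(TOK_SPACES, parserEnd);
    for (const auto &t : input)
        screener(t); // the producer end hands each token downstream
    return seen;
}
```

Feeding the tokens `1`, a space, and `2` through `screenSketch` leaves only `1` and `2` at the consumer end. Chaining stages this way mirrors the construction order in the example, where the parser is built first, then the screener wrapping it, then the scanner wrapping the screener.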