BNF Parser Generator - Homepage
|
The BNF parser generator takes a syntax not unlike BNF and generates a "C" parser for it, a parser that can parse either strings or files. This is a flexible tool, ment for smaller parsing tasks where bison+flex are just to big to use.
The BNF input grammar, as the name suggests follows the estabishled BNF syntax quite closely, and if you're familliar with that grammar you won't have very much trouble understanding the following. All grammar rules are written in the format as the bnf program recognizes them, so they also function as examples.
grammar ::= [{ assignment }] eoi;
A BNF grammar is an optional repetition of assignments, followed by eoi
which matches with the end of the input.
assignment ::= name ('::=' | '=') expression;
An assignment is a name
followed by '=' or '::=' followed by
an expression. The expression is the meat of the grammar.
expression = term [{ '|' term }];
An expression is a list of terms, seperated by a '|', meaning or
.
term = factor [{ white factor }];
A term is a list of whitespace
seperated factors.
factor = IO | name | '[' expression ']' | '&{' expression '&}';
A factor is either an IO
term, or its a name
, or its
an optional expression
, or its an repetition
of an
expression.
IO = '\'' string '\'' | '"' string '"' | '`' string '`';
With the IO rule we handle the input that the generated parser reads
and the output it produces.
IO are strings, quoted in three different ways, either
forward quoted
to denote that it matches a string in the input only,
doublequoted
meaning that the string should be copied verbatim to the output stream,
or
backward quoted
meaning that this string should be copied to
the output by the parser.
As a first example we want a program that reads a repetition of the word "foo" and replaces it with a single word "bar". Here is the grammar you feed to the BNF program.
sample = [{ {'foo'} `bar` | . }] eoi;
There is something new here, the dot
, which simply means 'match
and echo any character'.
When you place the above grammar in a sample.bnf file and generate
the output "C" parser, bnf defines a "C" function called sample()
and the body of that function parses the input we want. We need to write
an additional "C" file containing a call to this function. That "C" file
looks like this:
#include <stdio.h> int __BNF_fil_io_parser(FILE* input,FILE* output,int (*syntax)()); int sample(void); int main() { return __BNF_fil_io_parser(stdin,stdout,sample); }
The __BNF_fil_io_parser()
function, like the sample()
function are defined in the generated "C" file. Now all there is left
to do is compile the two files and link them together. There is also a
__BNF_str_io_parser()
function that uses strings instead of files,
so you can also use the parser in the situation that files are unapproperiate.
A fundamental property of parsers generated with BNF is that they are simple "C" functions, so you can intermix calls to ordinairy "C" functions that you define yourself within the grammar. This is the fundamental way of making parsers that do more than simple echoing of the input. You can make stuff like:
foo = somefunc { 'foo' `bar` | . } eoi; ----- int somefunc() { /* do something */ return 0; }
Returning 0 in such a function denotes OK, which means to to parser can
continue with processing input. If you return a non-zero value, the parser
will 'reject' it. Ofcourse you could make the call an optional element in
the grammar like this: foo = [somefunc] "foo";
. Now the returnvalue
of somefunc()
is always ignored.
Any decent parser generator should be able to parse infix algebraic expressions, and BNF is no exception, so here follows the grammar:
input ::= ws expr ws eoi; expr ::= ws powterm [{ws '^' ws powterm}]; powterm ::= ws factor [{ws ('*'|'/') ws factor}]; factor ::= ws term [{ws ('+'|'-') ws term}]; term ::= '(' ws expr ws ')' | '-' ws expr | number; number ::= {dgt} ['.' {dgt}] [('e'|'E') ['-'] {dgt}]; dgt ::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'; ws ::= [{' '|'\t'|'\n'|'\r'}];
A simple compile body like the one above could be used and then you'll be able to verify weather a certain expression is correct or not. Now that's exciting, isn't it? We would also like to calculate the value of the expression ofcourse, and that's exactly where things get ugly with calls to "C" functions cluttered everywhere around the above syntax.
Therefore, before actually start cluttering your code, always check if the
bare grammar parses what you want and as a next stage, start implementing
'what to do' with the grammar. To play with this you should know that there
is an important (global) variable: extern char* __BNF_input
that
always points to 'where you are' in the input stream/string.
Want to contribute? Join the project, get source from cvs, code away, and publish the release.