Standard language theory: Rationals, Context-free, Contextual, $LL$, $LR$, $LALR$...
Lexer: Transform a stream of characters into a stream of tokens
Parser: Transform a stream of tokens into a parse tree
Tools: Antlr, Javacc, byacc/j... Takes as input an annotated BNF grammar and output executable code
Grammar is expressed in different language(s)
Preprocessing step at build time to generate parsers
More complex builds, plugins, tools, tools, tools...
$$ \pi: \Sigma^* \longrightarrow \Sigma^* \times T$$
$$ \lfloor \pi\rfloor: \Sigma^* \longrightarrow T$$
$$ \lfloor \pi(w)\rfloor = \{ t | (\epsilon,t) \in \pi(w) \} $$
are pure functions...
that can be combined arbitrarily...
to build complex parsers from simple ones
Grammar and semantics are written in the same language
Language can be more easily grown piecemeal
Parser is easier to (unit) test $\longrightarrow$ TDD
Parsers can be reused, factorized in libraries...
Implementations for a wide variety of programming languages
Originally developed by Ben Yu (see original site on Codehaus)
Ported to C# and Ruby
Now hosted and maintained on github
{<?> [{<?> aField :: INTEGER, anOptional :: UUID?}] anotherField :: FLOAT} {<?> aField :: (STRING-> INTEGER )} {<LEGITIMATE_SON> [{<?> aField :: INTEGER, anOptional :: UUID?}, {<NAMED_SCHEMA> a :: INTEGER, b :: UUID?}] aDate :: DATETIME}
((SAUCISSON OR SAUCISSONS) ((INTOXICATION* ALIMENTAIRE* ADJ 1) OR SANT[EÈÉËÊ] OR NUTRITION* OR APPORT OR APPORTS) WITHIN 25) OR ((SAUCISSON OR SAUCISSONS) (AIL OR LAIL) ADJ 1) OR ((SAUCISSON OR SAUCISSONS) (SALAGE OR SALAGES OR FUMAGE OR FUMAGES OR (CHAIR CUITIER* ADJ 1)) WITHIN 30) ...
// Totals for the first displayed breakdown column IF ( NOT bIsBkd AND ( NOT bIsTitle ) ) THEN { IF ( iLevel == iFirstDisplayedBreakdown OR ( ( iLevel == ( iFirstDisplayedBreakdown + 1 ) ) AND bIsVerticalTotal ) ) THEN { SetBackgroundColor(235, 235, 235), SetFontBold(TRUE) }, // note the absence of parens around conditional... IF Net_total<0. AND ( NOT bIsTotal ) AND HasTitle( "Net total" )==TRUE THEN { SetBackgroundColor(255, 212, 148) //SetFontBold(TRUE) } }
Warning Performance of parsing is rarely a bottleneck in compilers/interpreters...
Compared two parsers for the same language, one in jparsec, another in Antlr
Using differentiation of parser combinators
Can skip a rule to some token (eg. semicolon) on errors
Can express grammar over stream of arbitrary objects, not just characters
Dualize parsers, For all those post-modern programmers (and for testing too...)
Pull requests are welcomed!