java.lang.Object
  org.codehaus.jparsec.Parser<T>
public abstract class Parser<T>
Defines grammar and encapsulates parsing logic. A Parser takes as input a CharSequence source and parses it when the parse(CharSequence) method is called. A value of type T will be returned if parsing succeeds, or a ParserException is thrown to indicate parsing error. For example:
  Parser<String> scanner = Scanners.IDENTIFIER;
  assertEquals("foo", scanner.parse("foo"));
Parsers are immutable and inherently covariant on the type parameter T. Because Java generics have no native support for covariant type parameters, a workaround is to use the cast() method to explicitly force covariance whenever needed.
Parsers run either on character level to scan the source, or on token level to parse a list of Token objects returned from another parser. This other parser that returns the list of tokens for token level parsing is hooked up via the from(Parser, Parser) or from(Parser) method.
The following are important naming conventions used throughout the library:
A character level parser that returns a list of Token objects is called a lexer.
All index parameters are 0-based indexes in the original source.
Nested Class Summary | |
---|---|
static class | Parser.Reference<T>: An atomic mutable reference to Parser used in recursive grammars. |
Method Summary | | |
---|---|---|
Parser<List<T>> | atLeast(int min) | A Parser that runs this parser greedily for at least min times. |
Parser<T> | atomic() | A Parser that undoes any partial match if this fails. |
Parser<T> | between(Parser<?> before, Parser<?> after) | A Parser that runs this between before and after. |
<R> Parser<R> | cast() | Casts this to a Parser of type R. |
Parser<List<T>> | endBy(Parser<?> delim) | A Parser that runs this for 0 or more times delimited and terminated by delim. |
Parser<List<T>> | endBy1(Parser<?> delim) | A Parser that runs this for 1 or more times delimited and terminated by delim. |
Parser<Boolean> | fails() | A Parser that returns true if this fails, false otherwise. |
Parser<T> | followedBy(Parser<?> parser) | A Parser that sequentially executes this and then parser, whose return value is ignored. |
Parser<T> | from(Parser<?> tokenizer, Parser<Void> delim) | A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. |
Parser<T> | from(Parser<? extends Collection<Token>> lexer) | A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. |
<R> Parser<R> | ifelse(Map<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative) | A Parser that runs consequence if this succeeds, or alternative otherwise. |
<R> Parser<R> | ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative) | A Parser that runs consequence if this succeeds, or alternative otherwise. |
Parser<T> | infixl(Parser<? extends Map2<? super T,? super T,? extends T>> op) | A Parser for left-associative infix operator. |
Parser<T> | infixn(Parser<? extends Map2<? super T,? super T,? extends T>> op) | A Parser that parses non-associative infix operator. |
Parser<T> | infixr(Parser<? extends Map2<? super T,? super T,? extends T>> op) | A Parser for right-associative infix operator. |
Parser<T> | label(String name) | A Parser that reports an error about name expected, if this fails with no partial match. |
Parser<List<Token>> | lexer(Parser<?> delim) | A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. |
Parser<List<T>> | many() | p.many() is equivalent to p* in EBNF. |
Parser<List<T>> | many1() | p.many1() is equivalent to p+ in EBNF. |
<R> Parser<R> | map(Map<? super T,? extends R> map) | A Parser that runs this parser and transforms the return value using map. |
static <T> Parser.Reference<T> | newReference() | Creates a new instance of Parser.Reference. |
<To> Parser<To> | next(Map<? super T,? extends Parser<? extends To>> map) | A Parser that executes this, maps the result using map to another Parser object to be executed as the next step. |
<R> Parser<R> | next(Parser<R> parser) | A Parser that sequentially executes this and then parser. |
Parser<?> | not() | A Parser that fails if this succeeds. |
Parser<?> | not(String unexpected) | A Parser that fails if this succeeds. |
Parser<T> | notFollowedBy(Parser<?> parser) | A Parser that succeeds if this succeeds and the pattern recognized by parser isn't following. |
Parser<T> | optional() | p.optional() is equivalent to p? in EBNF. |
Parser<T> | optional(T defaultValue) | A Parser that returns defaultValue if this fails with no partial match. |
Parser<T> | or(Parser<? extends T> alternative) | p1.or(p2) is equivalent to p1 \| p2 in EBNF. |
T | parse(CharSequence source) | Parses source. |
T | parse(CharSequence source, String moduleName) | Parses source. |
T | parse(Readable readable) | Parses source read from readable. |
T | parse(Readable readable, String moduleName) | Parses source read from readable. |
Parser<T> | peek() | A Parser that runs this and undoes any input consumption if this succeeds. |
Parser<T> | postfix(Parser<? extends Map<? super T,? extends T>> op) | A Parser that runs this and then runs op for 0 or more times greedily. |
Parser<T> | prefix(Parser<? extends Map<? super T,? extends T>> op) | A Parser that runs op for 0 or more times greedily, then runs this. |
Parser<T> | reluctantBetween(Parser<?> before, Parser<?> after) | Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way. |
<R> Parser<R> | retn(R value) | A Parser that executes this, and returns value if this succeeds. |
Parser<List<T>> | sepBy(Parser<?> delim) | A Parser that runs this 0 or more times separated by delim. |
Parser<List<T>> | sepBy1(Parser<?> delim) | A Parser that runs this 1 or more times separated by delim. |
Parser<List<T>> | sepEndBy(Parser<?> delim) | A Parser that runs this for 0 or more times separated and optionally terminated by delim. |
Parser<List<T>> | sepEndBy1(Parser<?> delim) | A Parser that runs this for 1 or more times separated and optionally terminated by delim. |
Parser<Void> | skipAtLeast(int min) | A Parser that runs this parser greedily for at least min times and ignores the return values. |
Parser<Void> | skipMany() | p.skipMany() is equivalent to p* in EBNF. |
Parser<Void> | skipMany1() | p.skipMany1() is equivalent to p+ in EBNF. |
Parser<Void> | skipTimes(int n) | A Parser that sequentially runs this for n times and ignores the return values. |
Parser<Void> | skipTimes(int min, int max) | A Parser that runs this parser for at least min times and up to max times, with all the return values ignored. |
Parser<String> | source() | A Parser that returns the matched string in the original source. |
Parser<Boolean> | succeeds() | A Parser that returns true if this succeeds, false otherwise. |
Parser<List<T>> | times(int n) | A Parser that runs this for n times and collects the return values in a List. |
Parser<List<T>> | times(int min, int max) | A Parser that runs this parser for at least min times and up to max times. |
Parser<Token> | token() | A Parser that runs this and wraps the return value in a Token. |
Parser<List<T>> | until(Parser<?> parser) | A Parser that matches this parser zero or more times until the given parser succeeds. |
Parser<WithSource<T>> | withSource() | A Parser that returns both parsed object and matched string. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static <T> Parser.Reference<T> newReference()
Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).
public final <R> Parser<R> retn(R value)
A Parser that executes this, and returns value if this succeeds.
public final <R> Parser<R> next(Parser<R> parser)
A Parser that sequentially executes this and then parser. The return value of parser is preserved.
public final <To> Parser<To> next(Map<? super T,? extends Parser<? extends To>> map)
A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
public final Parser<List<T>> until(Parser<?> parser)
A Parser that matches this parser zero or more times until the given parser succeeds. The input that matches the given parser will not be consumed. The input that matches this parser will be collected in a list that will be returned by this function.
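
The stopping rule of until can be modeled outside the library. What follows is a minimal sketch in plain Java, not jparsec code: the class UntilSketch and its two predicates are hypothetical stand-ins for this parser and the given parser, and treating end of input as a failure is an assumption of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of until(); it does not use any jparsec class. */
public class UntilSketch {

    /**
     * Collects leading elements accepted by item until stop accepts the current element.
     * The element accepted by stop is left in the input: it is neither consumed nor collected.
     */
    static List<String> until(List<String> input, Predicate<String> item, Predicate<String> stop) {
        List<String> collected = new ArrayList<>();
        for (String element : input) {
            if (stop.test(element)) {
                return collected; // stop matched: finish without consuming it
            }
            if (!item.test(element)) {
                throw new IllegalArgumentException("unexpected element: " + element);
            }
            collected.add(element);
        }
        // Assumption of this sketch: running out of input before stop matches is a failure.
        throw new IllegalArgumentException("stop element never found");
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", ";", "c");
        List<String> result = until(input, s -> s.matches("[a-z]"), ";"::equals);
        // The size of the result is also the index of the first unconsumed element.
        System.out.println(result + ", next unconsumed index = " + result.size());
    }
}
```

The returned list holds only what matched before the stop element, so a caller can continue parsing from the stop element itself.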
public final Parser<T> followedBy(Parser<?> parser)
A Parser that sequentially executes this and then parser, whose return value is ignored.
public final Parser<T> notFollowedBy(Parser<?> parser)
A Parser that succeeds if this succeeds and the pattern recognized by parser isn't following.
public final Parser<List<T>> many()
p.many() is equivalent to p* in EBNF. The return values are collected and returned in a List.
public final Parser<Void> skipMany()
p.skipMany() is equivalent to p* in EBNF. The return values are discarded.
public final Parser<List<T>> many1()
p.many1() is equivalent to p+ in EBNF. The return values are collected and returned in a List.
public final Parser<Void> skipMany1()
p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.
public final Parser<List<T>> atLeast(int min)
A Parser that runs this parser greedily for at least min times. The return values are collected and returned in a List.
public final Parser<Void> skipAtLeast(int min)
A Parser that runs this parser greedily for at least min times and ignores the return values.
public final Parser<Void> skipTimes(int n)
A Parser that sequentially runs this for n times and ignores the return values.
public final Parser<List<T>> times(int n)
A Parser that runs this for n times and collects the return values in a List.
public final Parser<List<T>> times(int min, int max)
A Parser that runs this parser for at least min times and up to max times. The return values are collected and returned in a List.
public final Parser<Void> skipTimes(int min, int max)
A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
public final <R> Parser<R> map(Map<? super T,? extends R> map)
A Parser that runs this parser and transforms the return value using map.
public final Parser<T> or(Parser<? extends T> alternative)
p1.or(p2) is equivalent to p1 | p2 in EBNF.
alternative - the alternative parser to run if this fails.
public final Parser<T> optional()
p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.
public final Parser<T> optional(T defaultValue)
A Parser that returns defaultValue if this fails with no partial match.
public final Parser<?> not()
A Parser that fails if this succeeds. Any input consumption is undone.
public final Parser<?> not(String unexpected)
A Parser that fails if this succeeds. Any input consumption is undone.
unexpected - the name of what we don't expect.
public final Parser<T> peek()
A Parser that runs this and undoes any input consumption if this succeeds.
public final Parser<T> atomic()
A Parser that undoes any partial match if this fails.
public final Parser<Boolean> succeeds()
A Parser that returns true if this succeeds, false otherwise.
public final Parser<Boolean> fails()
A Parser that returns true if this fails, false otherwise.
public final <R> Parser<R> ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative)
A Parser that runs consequence if this succeeds, or alternative otherwise.
public final <R> Parser<R> ifelse(Map<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
A Parser that runs consequence if this succeeds, or alternative otherwise.
public final Parser<T> label(String name)
A Parser that reports an error about name expected, if this fails with no partial match.
public final <R> Parser<R> cast()
Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.
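
The idea of forcing covariance with an unchecked cast can be shown without the library. This sketch is an illustration only: it uses java.util.function.Supplier as a stand-in for an immutable producer of values, and the class CovariantCastSketch and its cast helper are hypothetical names, not part of jparsec.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of a covariance-forcing cast; not jparsec code. */
public class CovariantCastSketch {

    /**
     * Reinterprets a producer as a producer of R. This is only safe when every value
     * the producer returns really is an R, because the compiler cannot check it.
     */
    @SuppressWarnings("unchecked")
    static <R> Supplier<R> cast(Supplier<?> producer) {
        return (Supplier<R>) producer;
    }

    public static void main(String[] args) {
        Supplier<Integer> ints = () -> 42;
        // Supplier<Integer> is not a Supplier<Number> in Java, so an explicit cast is needed.
        Supplier<Number> numbers = cast(ints);
        System.out.println(numbers.get());
    }
}
```

The cast is harmless here because the producer only hands values out and never accepts them; a wrong target type surfaces later as a ClassCastException at the point of use.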
public final Parser<T> between(Parser<?> before, Parser<?> after)
A Parser that runs this between before and after. The return value of this is preserved.
Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.
@Deprecated public final Parser<T> reluctantBetween(Parser<?> before, Parser<?> after)
A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.
public final Parser<List<T>> sepBy1(Parser<?> delim)
A Parser that runs this 1 or more times separated by delim. The return values are collected in a List.
public final Parser<List<T>> sepBy(Parser<?> delim)
A Parser that runs this 0 or more times separated by delim. The return values are collected in a List.
public final Parser<List<T>> endBy(Parser<?> delim)
A Parser that runs this for 0 or more times delimited and terminated by delim. The return values are collected in a List.
public final Parser<List<T>> endBy1(Parser<?> delim)
A Parser that runs this for 1 or more times delimited and terminated by delim. The return values are collected in a List.
public final Parser<List<T>> sepEndBy1(Parser<?> delim)
A Parser that runs this for 1 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy1(semicolon). The return values are collected in a List.
public final Parser<List<T>> sepEndBy(Parser<?> delim)
A Parser that runs this for 0 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy(semicolon). The return values are collected in a List.
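
The "separated and optionally terminated" rule is easy to trace by hand. Here is a minimal sketch of that rule over plain strings, assuming a fixed non-empty item and delimiter; SepEndBySketch and its methods are hypothetical names and do not call into jparsec.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of sepEndBy/sepEndBy1 on literal strings; not jparsec code. */
public class SepEndBySketch {

    /** Matches item zero or more times, separated and optionally terminated by delim. */
    static List<String> sepEndBy(String input, String item, String delim) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        // Assumes item is non-empty, otherwise this loop would not advance.
        while (input.startsWith(item, pos)) {
            items.add(item);
            pos += item.length();
            if (!input.startsWith(delim, pos)) {
                break; // no delimiter after the item: the sequence ends here
            }
            pos += delim.length(); // a separator, or the optional terminator
        }
        return items;
    }

    /** Same as sepEndBy, but requires at least one item. */
    static List<String> sepEndBy1(String input, String item, String delim) {
        List<String> items = sepEndBy(input, item, delim);
        if (items.isEmpty()) {
            throw new IllegalArgumentException(item + " expected");
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(sepEndBy1("foo;foo;foo", "foo", ";").size());
        System.out.println(sepEndBy1("foo;foo;", "foo", ";").size());
        System.out.println(sepEndBy("", "foo", ";").size());
    }
}
```

Both sample inputs from the description are accepted: the first yields three items and the second yields two, with the trailing semicolon consumed as a terminator.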
public final Parser<T> prefix(Parser<? extends Map<? super T,? extends T>> op)
A Parser that runs op for 0 or more times greedily, then runs this. The Map objects returned from op are applied from right to left to the return value of this.
p.prefix(op) is equivalent to op* p in EBNF.
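
The right-to-left application order matters when several prefix operators are stacked. The sketch below models only that ordering, assuming the operators and the operand have already been parsed; PrefixSketch is a hypothetical class and java.util.function.UnaryOperator stands in for the library's Map type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical model of how prefix operators are applied; not jparsec code. */
public class PrefixSketch {

    /** Applies ops to operand starting with the operator closest to the operand. */
    static <T> T applyPrefix(List<UnaryOperator<T>> ops, T operand) {
        T result = operand;
        for (int i = ops.size() - 1; i >= 0; i--) {
            result = ops.get(i).apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        UnaryOperator<String> neg = s -> "neg(" + s + ")";
        UnaryOperator<String> not = s -> "not(" + s + ")";
        // Models the source text "-!x": operators in source order, then the operand.
        System.out.println(applyPrefix(Arrays.asList(neg, not), "x"));
    }
}
```

The operator written nearest to the operand binds first, so the outermost node of the result corresponds to the first operator in the source.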
public final Parser<T> postfix(Parser<? extends Map<? super T,? extends T>> op)
A Parser that runs this and then runs op for 0 or more times greedily. The Map objects returned from op are applied from left to right to the return value of this.
This is the preferred API to avoid StackOverflowError in left-recursive parsers. For example, to parse array types in the form of "T[]" or "T[][]", the following left-recursive grammar will fail:
  Terminals terms = Terminals.operators("[", "]");
  Parser.Reference<Type> ref = Parser.newReference();
  ref.set(Parsers.or(leafTypeParser,
      Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
  return ref.get();
A correct implementation is:
  Terminals terms = Terminals.operators("[", "]");
  return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
A less obvious example is parsing the expr ? a : b ternary operator. It too is a left-recursive grammar, and, counterintuitively, it can also be thought of as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
  Parser<Expr> ternary(Parser<Expr> expr) {
    return expr.postfix(
        Parsers.sequence(terms.token("?"), expr, terms.token(":"), expr,
            new Map4<...>() {
              public Unary<Expr> map(unused, consequence, unused, alternative) {
                // (condition) -> Ternary(condition, consequence, alternative)
                return new Unary<Expr>() {
                  ...
                  return new TernaryExpr(condition, consequence, alternative);
                };
              }
            }));
  }
OperatorTable also handles left recursion transparently.
p.postfix(op) is equivalent to p op* in EBNF.
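
The reason p op* avoids left recursion is that the operand is parsed once and the operators are then consumed in a loop. As a rough illustration, here is a hand-written recognizer for array types in plain Java; PostfixSketch and parseArrayType are hypothetical names, the Array(...) text format is invented for the example, and nothing here uses jparsec.

```java
/** Hypothetical hand-written equivalent of leaf.postfix("[]"); not jparsec code. */
public class PostfixSketch {

    /** Parses an identifier followed by zero or more "[]" suffixes. */
    static String parseArrayType(String src) {
        int pos = 0;
        while (pos < src.length() && Character.isJavaIdentifierPart(src.charAt(pos))) {
            pos++;
        }
        if (pos == 0) {
            throw new IllegalArgumentException("leaf type expected");
        }
        String type = src.substring(0, pos); // p: the leaf type, parsed exactly once
        while (src.startsWith("[]", pos)) {  // op*: applied left to right, no recursion
            type = "Array(" + type + ")";
            pos += 2;
        }
        if (pos != src.length()) {
            throw new IllegalArgumentException("unexpected input at index " + pos);
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(parseArrayType("int[][]"));
    }
}
```

Each "[]" wraps whatever has been built so far, which is the left-to-right application order described for postfix.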
public final Parser<T> infixn(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser that parses non-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand optionally. The Map2 objects returned from op are applied to the return values of the two operands, if any.
p.infixn(op) is equivalent to p (op p)? in EBNF.
public final Parser<T> infixl(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser for left-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The Map2 objects returned from op are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as (((a + b) + c) + d).
p.infixl(op) is equivalent to p (op p)* in EBNF.
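
Left-to-right application amounts to a left fold over the parsed operators and operands. The following sketch shows that fold in isolation, assuming the operands and operators have already been recognized; InfixlSketch and foldLeft are hypothetical names, and java.util.function.BinaryOperator stands in for the library's Map2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical model of left-associative combination; not jparsec code. */
public class InfixlSketch {

    /** Combines first with rest, where ops.get(i) sits immediately before rest.get(i). */
    static <T> T foldLeft(T first, List<BinaryOperator<T>> ops, List<T> rest) {
        T result = first;
        for (int i = 0; i < ops.size(); i++) {
            result = ops.get(i).apply(result, rest.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BinaryOperator<String> plus = (l, r) -> "(" + l + " + " + r + ")";
        System.out.println(
            foldLeft("a", Arrays.asList(plus, plus, plus), Arrays.asList("b", "c", "d")));
    }
}
```

With a non-commutative operator the grouping is observable: folding 10 - 3 - 2 this way computes (10 - 3) - 2.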
public final Parser<T> infixr(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser for right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The Map2 objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).
p.infixr(op) is equivalent to p (op p)* in EBNF.
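
Right-to-left application means the last operator is applied first, which is a right fold. This sketch shows only the fold, assuming the operands and operators are already parsed; InfixrSketch and foldRight are hypothetical names, and java.util.function.BinaryOperator stands in for the library's Map2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical model of right-associative combination; not jparsec code. */
public class InfixrSketch {

    /** Combines first with rest, where ops.get(i) sits immediately before rest.get(i). */
    static <T> T foldRight(T first, List<BinaryOperator<T>> ops, List<T> rest) {
        if (ops.isEmpty()) {
            return first; // a lone operand: nothing to combine
        }
        T result = rest.get(rest.size() - 1); // start from the rightmost operand
        for (int i = ops.size() - 1; i >= 1; i--) {
            result = ops.get(i).apply(rest.get(i - 1), result);
        }
        return ops.get(0).apply(first, result);
    }

    public static void main(String[] args) {
        BinaryOperator<String> plus = (l, r) -> "(" + l + " + " + r + ")";
        System.out.println(
            foldRight("a", Arrays.asList(plus, plus, plus), Arrays.asList("b", "c", "d")));
    }
}
```

With a non-commutative operator the grouping is observable: folding 10 - 3 - 2 this way computes 10 - (3 - 2).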
public final Parser<Token> token()
A Parser that runs this and wraps the return value in a Token.
It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.
public final Parser<String> source()
A Parser that returns the matched string in the original source.
public final Parser<WithSource<T>> withSource()
A Parser that returns both parsed object and matched string.
public final Parser<T> from(Parser<? extends Collection<Token>> lexer)
A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.
this must be a token level parser.
public final Parser<T> from(Parser<?> tokenizer, Parser<Void> delim)
A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.
The following example uses Terminals.tokenizer():
  Terminals terminals = ...;
  return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
And tokens are optionally delimited by whitespaces.
Optionally, you can skip comments by using a scanner other than WHITESPACES as the delimiter:
  Terminals terminals = ...;
  Parser<?> delim = Parsers.or(
      Scanners.WHITESPACES,
      Scanners.JAVA_LINE_COMMENT,
      Scanners.JAVA_BLOCK_COMMENT).skipMany();
  return parser.from(terminals.tokenizer(), delim).parse(str);
In both examples, it's important to make sure the delimiter scanner can accept the empty string (either through optional() or skipMany()), unless adjacent operator characters shouldn't be parsed as separate operators, e.g. "((" as two left parenthesis operators.
this must be a token level parser.
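
Why the delimiter should accept the empty string can be seen with a toy tokenizer. The sketch below is plain Java and only models the idea: DelimiterSketch, tokenize and the delimiterRequired flag are hypothetical, tokens are limited to parentheses and alphanumeric words, and failing at the first missing delimiter is an assumption about how a mandatory delimiter would behave.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy tokenizer showing optional versus mandatory delimiters; not jparsec code. */
public class DelimiterSketch {

    /** Splits src into tokens; when delimiterRequired is true, whitespace must separate tokens. */
    static List<String> tokenize(String src, boolean delimiterRequired) {
        List<String> tokens = new ArrayList<>();
        int pos = skipWhitespace(src, 0);
        while (pos < src.length()) {
            int end = tokenEnd(src, pos);
            tokens.add(src.substring(pos, end));
            int next = skipWhitespace(src, end);
            // A mandatory delimiter that matched nothing between two tokens is an error.
            if (delimiterRequired && next == end && next < src.length()) {
                throw new IllegalArgumentException("delimiter expected at index " + end);
            }
            pos = next;
        }
        return tokens;
    }

    /** Returns the end index of the token starting at pos: one parenthesis or one word. */
    static int tokenEnd(String src, int pos) {
        char c = src.charAt(pos);
        if (c == '(' || c == ')') {
            return pos + 1;
        }
        int end = pos;
        while (end < src.length() && Character.isLetterOrDigit(src.charAt(end))) {
            end++;
        }
        if (end == pos) {
            throw new IllegalArgumentException("unexpected character at index " + pos);
        }
        return end;
    }

    static int skipWhitespace(String src, int pos) {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("((a b))", false)); // adjacent parentheses are separate tokens
        System.out.println(tokenize("( ( a", true));    // accepted only because spaces are present
    }
}
```

With the optional delimiter, "((" yields two tokens; with the mandatory one, the same input is rejected after the first parenthesis.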
public Parser<List<Token>> lexer(Parser<?> delim)
A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.
It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example to parse an indentation-sensitive language, a pre-processor of the token list may be needed.
this must be a tokenizer that returns a token value.
public final T parse(CharSequence source, String moduleName)
Parses source.
source - the source string
moduleName - the name of the module; this name appears in error messages
public final T parse(CharSequence source)
Parses source.
public final T parse(Readable readable) throws IOException
Parses source read from readable.
Throws: IOException
public final T parse(Readable readable, String moduleName) throws IOException
Parses source read from readable.
readable - where the source is read from
moduleName - the name of the module; this name appears in error messages
Throws: IOException