org.codehaus.jparsec
Class Parser<T>

java.lang.Object
  extended by org.codehaus.jparsec.Parser<T>

public abstract class Parser<T>
extends Object

Defines a grammar and encapsulates parsing logic. A Parser takes a CharSequence source as input and parses it when the parse(CharSequence) method is called. A value of type T is returned if parsing succeeds; otherwise a ParserException is thrown to indicate the parsing error. For example:

 Parser<String> scanner = Scanners.IDENTIFIER;
 assertEquals("foo", scanner.parse("foo"));
 

Parsers are immutable and inherently covariant on the type parameter T. Because Java generics have no native support for covariant type parameters, the workaround is to call the cast() method to force covariance explicitly wherever it is needed.
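To see why such a workaround is needed, here is a minimal sketch that does not use jparsec at all. Producer<T> is a hypothetical stand-in for an immutable, read-only type such as Parser<T>, and its cast() only mirrors the idea described above; it is an assumption for illustration, not the library's implementation.

```java
class CovarianceSketch {
  // Hypothetical stand-in for an immutable, read-only producer such as Parser<T>.
  static final class Producer<T> {
    private final T value;

    Producer(T value) {
      this.value = value;
    }

    T get() {
      return value;
    }

    // Unchecked reinterpretation of the type parameter; safe only when the
    // produced value really is an R.
    @SuppressWarnings("unchecked")
    <R> Producer<R> cast() {
      return (Producer<R>) this;
    }
  }

  static Number widen() {
    Producer<Integer> ints = new Producer<Integer>(42);
    // Producer<Number> nums = ints;  // rejected by the compiler: generics are invariant
    Producer<Number> nums = ints.<Number>cast();
    return nums.get();
  }
}
```

Because the producer is immutable and only ever hands values out, treating a Producer<Integer> as a Producer<Number> cannot corrupt it, which is the sense in which such a type is "inherently covariant".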

Parsers run either at the character level to scan the source, or at the token level to parse a list of Token objects returned from another parser. That other parser, which produces the list of tokens for token-level parsing, is hooked up via the from(Parser, Parser) or from(Parser) method.

The following are important naming conventions used throughout the library:

Author:
Ben Yu

Nested Class Summary
static class Parser.Reference<T>
          An atomic mutable reference to Parser used in recursive grammars.
 
Method Summary
 Parser<List<T>> atLeast(int min)
          A Parser that runs this parser greedily for at least min times.
 Parser<T> atomic()
          A Parser that undoes any partial match if this fails.
 Parser<T> between(Parser<?> before, Parser<?> after)
          A Parser that runs this between before and after.
<R> Parser<R>
cast()
          Casts this to a Parser of type R.
 Parser<List<T>> endBy(Parser<?> delim)
          A Parser that runs this for 0 or more times delimited and terminated by delim.
 Parser<List<T>> endBy1(Parser<?> delim)
          A Parser that runs this for 1 or more times delimited and terminated by delim.
 Parser<Boolean> fails()
          A Parser that returns true if this fails, false otherwise.
 Parser<T> followedBy(Parser<?> parser)
          A Parser that sequentially executes this and then parser, whose return value is ignored.
 Parser<T> from(Parser<?> tokenizer, Parser<Void> delim)
          A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens.
 Parser<T> from(Parser<? extends Collection<Token>> lexer)
          A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens.
<R> Parser<R>
ifelse(Map<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
          A Parser that runs consequence if this succeeds, or alternative otherwise.
<R> Parser<R>
ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative)
          A Parser that runs consequence if this succeeds, or alternative otherwise.
 Parser<T> infixl(Parser<? extends Map2<? super T,? super T,? extends T>> op)
          A Parser for left-associative infix operator.
 Parser<T> infixn(Parser<? extends Map2<? super T,? super T,? extends T>> op)
          A Parser that parses non-associative infix operator.
 Parser<T> infixr(Parser<? extends Map2<? super T,? super T,? extends T>> op)
          A Parser for right-associative infix operator.
 Parser<T> label(String name)
          A Parser that reports an error about name expected, if this fails with no partial match.
 Parser<List<Token>> lexer(Parser<?> delim)
          A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence.
 Parser<List<T>> many()
          p.many() is equivalent to p* in EBNF.
 Parser<List<T>> many1()
          p.many1() is equivalent to p+ in EBNF.
<R> Parser<R>
map(Map<? super T,? extends R> map)
          A Parser that runs this parser and transforms the return value using map.
static
<T> Parser.Reference<T>
newReference()
          Creates a new instance of Parser.Reference.
<To> Parser<To>
next(Map<? super T,? extends Parser<? extends To>> map)
          A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
<R> Parser<R>
next(Parser<R> parser)
          A Parser that sequentially executes this and then parser.
 Parser<?> not()
          A Parser that fails if this succeeds.
 Parser<?> not(String unexpected)
          A Parser that fails if this succeeds.
 Parser<T> notFollowedBy(Parser<?> parser)
          A Parser that succeeds if this succeeds and the input that follows is not recognized by parser.
 Parser<T> optional()
          p.optional() is equivalent to p? in EBNF.
 Parser<T> optional(T defaultValue)
          A Parser that returns defaultValue if this fails with no partial match.
 Parser<T> or(Parser<? extends T> alternative)
          p1.or(p2) is equivalent to p1 | p2 in EBNF.
 T parse(CharSequence source)
          Parses source.
 T parse(CharSequence source, String moduleName)
          Parses source.
 T parse(Readable readable)
          Parses source read from readable.
 T parse(Readable readable, String moduleName)
          Parses source read from readable.
 Parser<T> peek()
          A Parser that runs this and undoes any input consumption if this succeeds.
 Parser<T> postfix(Parser<? extends Map<? super T,? extends T>> op)
          A Parser that runs this and then runs op for 0 or more times greedily.
 Parser<T> prefix(Parser<? extends Map<? super T,? extends T>> op)
          A Parser that runs op for 0 or more times greedily, then runs this.
 Parser<T> reluctantBetween(Parser<?> before, Parser<?> after)
          Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way.
<R> Parser<R>
retn(R value)
          A Parser that executes this and returns value if this succeeds.
 Parser<List<T>> sepBy(Parser<?> delim)
          A Parser that runs this 0 or more times separated by delim.
 Parser<List<T>> sepBy1(Parser<?> delim)
          A Parser that runs this 1 or more times separated by delim.
 Parser<List<T>> sepEndBy(Parser<?> delim)
          A Parser that runs this for 0 or more times separated and optionally terminated by delim.
 Parser<List<T>> sepEndBy1(Parser<?> delim)
          A Parser that runs this for 1 or more times separated and optionally terminated by delim.
 Parser<Void> skipAtLeast(int min)
          A Parser that runs this parser greedily for at least min times and ignores the return values.
 Parser<Void> skipMany()
          p.skipMany() is equivalent to p* in EBNF.
 Parser<Void> skipMany1()
          p.skipMany1() is equivalent to p+ in EBNF.
 Parser<Void> skipTimes(int n)
          A Parser that sequentially runs this for n times and ignores the return values.
 Parser<Void> skipTimes(int min, int max)
          A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
 Parser<String> source()
          A Parser that returns the matched string in the original source.
 Parser<Boolean> succeeds()
          A Parser that returns true if this succeeds, false otherwise.
 Parser<List<T>> times(int n)
          A Parser that runs this for n times and collects the return values in a List.
 Parser<List<T>> times(int min, int max)
          A Parser that runs this parser for at least min times and up to max times.
 Parser<Token> token()
          A Parser that runs this and wraps the return value in a Token.
 Parser<List<T>> until(Parser<?> parser)
          A Parser that matches this parser zero or more times until the given parser succeeds.
 Parser<WithSource<T>> withSource()
          A Parser that returns both parsed object and matched string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

newReference

public static <T> Parser.Reference<T> newReference()
Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).


retn

public final <R> Parser<R> retn(R value)
A Parser that executes this and returns value if this succeeds.


next

public final <R> Parser<R> next(Parser<R> parser)
A Parser that sequentially executes this and then parser. The return value of parser is preserved.


next

public final <To> Parser<To> next(Map<? super T,? extends Parser<? extends To>> map)
A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.


until

public final Parser<List<T>> until(Parser<?> parser)
A Parser that matches this parser zero or more times until the given parser succeeds. The input that matches the given parser is not consumed. The values returned for the input that matches this parser are collected in a list, which is the result of the returned parser.


followedBy

public final Parser<T> followedBy(Parser<?> parser)
A Parser that sequentially executes this and then parser, whose return value is ignored.


notFollowedBy

public final Parser<T> notFollowedBy(Parser<?> parser)
A Parser that succeeds if this succeeds and the input that follows is not recognized by parser.


many

public final Parser<List<T>> many()
p.many() is equivalent to p* in EBNF. The return values are collected and returned in a List.


skipMany

public final Parser<Void> skipMany()
p.skipMany() is equivalent to p* in EBNF. The return values are discarded.


many1

public final Parser<List<T>> many1()
p.many1() is equivalent to p+ in EBNF. The return values are collected and returned in a List.


skipMany1

public final Parser<Void> skipMany1()
p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.


atLeast

public final Parser<List<T>> atLeast(int min)
A Parser that runs this parser greedily for at least min times. The return values are collected and returned in a List.


skipAtLeast

public final Parser<Void> skipAtLeast(int min)
A Parser that runs this parser greedily for at least min times and ignores the return values.


skipTimes

public final Parser<Void> skipTimes(int n)
A Parser that sequentially runs this for n times and ignores the return values.


times

public final Parser<List<T>> times(int n)
A Parser that runs this for n times and collects the return values in a List.


times

public final Parser<List<T>> times(int min,
                                   int max)
A Parser that runs this parser for at least min times and up to max times. The return values are collected and returned in a List.


skipTimes

public final Parser<Void> skipTimes(int min,
                                    int max)
A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.


map

public final <R> Parser<R> map(Map<? super T,? extends R> map)
A Parser that runs this parser and transforms the return value using map.


or

public final Parser<T> or(Parser<? extends T> alternative)
p1.or(p2) is equivalent to p1 | p2 in EBNF.

Parameters:
alternative - the alternative parser to run if this fails.

optional

public final Parser<T> optional()
p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.


optional

public final Parser<T> optional(T defaultValue)
A Parser that returns defaultValue if this fails with no partial match.


not

public final Parser<?> not()
A Parser that fails if this succeeds. Any input consumption is undone.


not

public final Parser<?> not(String unexpected)
A Parser that fails if this succeeds. Any input consumption is undone.

Parameters:
unexpected - the name of what we don't expect.

peek

public final Parser<T> peek()
A Parser that runs this and undoes any input consumption if this succeeds.


atomic

public final Parser<T> atomic()
A Parser that undoes any partial match if this fails.


succeeds

public final Parser<Boolean> succeeds()
A Parser that returns true if this succeeds, false otherwise.


fails

public final Parser<Boolean> fails()
A Parser that returns true if this fails, false otherwise.


ifelse

public final <R> Parser<R> ifelse(Parser<? extends R> consequence,
                                  Parser<? extends R> alternative)
A Parser that runs consequence if this succeeds, or alternative otherwise.


ifelse

public final <R> Parser<R> ifelse(Map<? super T,? extends Parser<? extends R>> consequence,
                                  Parser<? extends R> alternative)
A Parser that runs consequence if this succeeds, or alternative otherwise.


label

public final Parser<T> label(String name)
A Parser that reports an error about name expected, if this fails with no partial match.


cast

public final <R> Parser<R> cast()
Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.


between

public final Parser<T> between(Parser<?> before,
                               Parser<?> after)
A Parser that runs this between before and after. The return value of this is preserved.

Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.


reluctantBetween

@Deprecated
public final Parser<T> reluctantBetween(Parser<?> before,
                                        Parser<?> after)
Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way.

A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.


sepBy1

public final Parser<List<T>> sepBy1(Parser<?> delim)
A Parser that runs this 1 or more times separated by delim.

The return values are collected in a List.


sepBy

public final Parser<List<T>> sepBy(Parser<?> delim)
A Parser that runs this 0 or more times separated by delim.

The return values are collected in a List.


endBy

public final Parser<List<T>> endBy(Parser<?> delim)
A Parser that runs this for 0 or more times delimited and terminated by delim.

The return values are collected in a List.


endBy1

public final Parser<List<T>> endBy1(Parser<?> delim)
A Parser that runs this for 1 or more times delimited and terminated by delim.

The return values are collected in a List.


sepEndBy1

public final Parser<List<T>> sepEndBy1(Parser<?> delim)
A Parser that runs this for 1 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy1(semicolon).

The return values are collected in a List.


sepEndBy

public final Parser<List<T>> sepEndBy(Parser<?> delim)
A Parser that runs this for 0 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy(semicolon).

The return values are collected in a List.
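The sepBy, endBy and sepEndBy families differ only in where the delimiter may appear. As a rough illustration, the sketch below uses java.util.regex rather than jparsec to show which whole inputs each shape admits, with the item fixed to "foo" and the delimiter to ";". The class and constant names are made up for this example, and a regular expression says nothing about the List result or about error reporting.

```java
import java.util.regex.Pattern;

class SeparatorSketch {
  // Regex analogues of the accepted input shapes, with item "foo" and delimiter ";".
  static final Pattern SEP_BY1 = Pattern.compile("foo(;foo)*");         // sepBy1
  static final Pattern END_BY1 = Pattern.compile("(foo;)+");            // endBy1
  static final Pattern SEP_END_BY1 = Pattern.compile("foo(;foo)*;?");   // sepEndBy1
  static final Pattern SEP_END_BY = Pattern.compile("(foo(;foo)*;?)?"); // sepEndBy

  // True if the whole input has the given shape.
  static boolean accepts(Pattern shape, String input) {
    return shape.matcher(input).matches();
  }
}
```

Under this approximation, "foo;foo;foo" fits the sepBy1 and sepEndBy1 shapes but not endBy1, "foo;foo;" fits endBy1 and sepEndBy1 but not sepBy1, and the empty input fits only the variants without the 1 suffix.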


prefix

public final Parser<T> prefix(Parser<? extends Map<? super T,? extends T>> op)
A Parser that runs op for 0 or more times greedily, then runs this. The Map objects returned from op are applied from right to left to the return value of p.

p.prefix(op) is equivalent to op* p in EBNF.
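The right-to-left application order can be pictured with plain Java, independent of jparsec. In this sketch UnaryOperator stands in for the Map objects returned by op, and applyPrefix is a hypothetical helper that assumes the operators have already been collected in source order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class PrefixSketch {
  // ops are listed in source order; the operator nearest the operand is applied first.
  static <T> T applyPrefix(List<UnaryOperator<T>> ops, T operand) {
    T result = operand;
    for (int i = ops.size() - 1; i >= 0; i--) {
      result = ops.get(i).apply(result);
    }
    return result;
  }

  // For an input shaped like "- ! x": two prefix operators, then the operand.
  static String demo() {
    List<UnaryOperator<String>> ops = new ArrayList<>();
    ops.add(s -> "neg(" + s + ")");
    ops.add(s -> "not(" + s + ")");
    return applyPrefix(ops, "x");
  }
}
```

Here demo() yields "neg(not(x))": the operator written closest to the operand binds first, which is what right-to-left application means.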


postfix

public final Parser<T> postfix(Parser<? extends Map<? super T,? extends T>> op)
A Parser that runs this and then runs op for 0 or more times greedily. The Map objects returned from op are applied from left to right to the return value of p.

This is the preferred API to avoid StackOverflowError in left-recursive parsers. For example, to parse array types in the form of "T[]" or "T[][]", the following left recursive grammar will fail:

   Terminals terms = Terminals.operators("[", "]");
   Parser.Reference<Type> ref = Parser.newReference();
   ref.set(Parsers.or(leafTypeParser,
       Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
   return ref.get();
 
A correct implementation is:
   Terminals terms = Terminals.operators("[", "]");
   return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
 
A less obvious example is parsing the expr ? a : b ternary operator. It too is a left-recursive grammar, and, counterintuitively, it can also be thought of as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
   Parser<Expr> ternary(Parser<Expr> expr) {
     return expr.postfix(
       Parsers.sequence(terms.token("?"), expr, terms.token(":"), expr,
       new Map4<...>() {
         public Unary<Expr> map(unused, consequence, unused, alternative) {
           // (condition) -> Ternary(condition, consequence, alternative)
           return new Unary<Expr>() {
             ...
             return new TernaryExpr(condition, consequence, alternative);
           }
         }
       }));
   }
 
OperatorTable also handles left recursion transparently.

p.postfix(op) is equivalent to p op* in EBNF.
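The left-to-right application order, and the idea of treating "? a : b" as a single postfix operator, can be sketched in plain Java without jparsec. UnaryOperator stands in for the Map objects returned by op; applyPostfix and ternarySuffix are hypothetical helpers, and expressions are modelled as strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class PostfixSketch {
  // ops are listed in source order and applied in that same order.
  static <T> T applyPostfix(T operand, List<UnaryOperator<T>> ops) {
    T result = operand;
    for (UnaryOperator<T> op : ops) {
      result = op.apply(result);
    }
    return result;
  }

  // "? a : b" viewed as one postfix operator that is still waiting for its condition.
  static UnaryOperator<String> ternarySuffix(String consequence, String alternative) {
    return condition -> "Ternary(" + condition + ", " + consequence + ", " + alternative + ")";
  }

  // For an input shaped like "a[i]++": the operand, then two postfix operators.
  static String indexThenIncrement() {
    List<UnaryOperator<String>> ops = new ArrayList<>();
    ops.add(e -> "Index(" + e + ", i)");
    ops.add(e -> "PostInc(" + e + ")");
    return applyPostfix("a", ops);
  }

  // For an input shaped like "cond ? a : b".
  static String ternary() {
    List<UnaryOperator<String>> ops = new ArrayList<>();
    ops.add(ternarySuffix("a", "b"));
    return applyPostfix("cond", ops);
  }
}
```

indexThenIncrement() yields "PostInc(Index(a, i))" and ternary() yields "Ternary(cond, a, b)". In both cases the operand is parsed first and never refers back to itself, which is why no left recursion arises.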


infixn

public final Parser<T> infixn(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser that parses non-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand optionally. The Map2 objects returned from op are applied to the return values of the two operands, if any.

p.infixn(op) is equivalent to p (op p)? in EBNF.


infixl

public final Parser<T> infixl(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser for left-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The Map2 objects returned from op are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as ((a + b) + c) + d.

p.infixl(op) is equivalent to p (op p)* in EBNF.


infixr

public final Parser<T> infixr(Parser<? extends Map2<? super T,? super T,? extends T>> op)
A Parser for right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The Map2 objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).

p.infixr(op) is equivalent to p (op p)* in EBNF.
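infixl and infixr accept the same EBNF shape and differ only in how the operators are applied. The sketch below shows that difference as two folds in plain Java, without jparsec. BinaryOperator stands in for Map2, the operands are assumed to be already parsed into a non-empty list, and a single operator is used throughout for simplicity.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class AssociativitySketch {
  // Left-to-right application, as described for infixl: ((a op b) op c) op d.
  static int foldLeft(List<Integer> operands, BinaryOperator<Integer> op) {
    int result = operands.get(0);
    for (int i = 1; i < operands.size(); i++) {
      result = op.apply(result, operands.get(i));
    }
    return result;
  }

  // Right-to-left application, as described for infixr: a op (b op (c op d)).
  static int foldRight(List<Integer> operands, BinaryOperator<Integer> op) {
    int last = operands.size() - 1;
    int result = operands.get(last);
    for (int i = last - 1; i >= 0; i--) {
      result = op.apply(operands.get(i), result);
    }
    return result;
  }
}
```

With subtraction over the operands 10, 3, 2, 1, the left fold computes ((10 - 3) - 2) - 1 = 4 while the right fold computes 10 - (3 - (2 - 1)) = 8, so the choice matters for any operator that is not associative.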


token

public final Parser<Token> token()
A Parser that runs this and wraps the return value in a Token.

It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.


source

public final Parser<String> source()
A Parser that returns the matched string in the original source.


withSource

public final Parser<WithSource<T>> withSource()
A Parser that returns both parsed object and matched string.


from

public final Parser<T> from(Parser<? extends Collection<Token>> lexer)
A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.

this must be a token level parser.


from

public final Parser<T> from(Parser<?> tokenizer,
                            Parser<Void> delim)
A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.

The following example uses Terminals.tokenizer():

 Terminals terminals = ...;
 return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
 
Here, tokens are optionally delimited by whitespace.

Optionally, you can also skip comments by using a delimiter scanner other than plain WHITESPACES:

   Terminals terminals = ...;
   Parser<Void> delim = Parsers.or(
       Scanners.WHITESPACES,
       Scanners.JAVA_LINE_COMMENT,
       Scanners.JAVA_BLOCK_COMMENT).skipMany();
   return parser.from(terminals.tokenizer(), delim).parse(str);
 

In both examples, it's important to make sure the delimiter scanner accepts the empty string (through either optional() or skipMany()). Otherwise, adjacent operator characters cannot be parsed as separate operators, e.g. "((" as two left-parenthesis operators.

this must be a token level parser.


lexer

public Parser<List<Token>> lexer(Parser<?> delim)
A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.

It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example to parse an indentation-sensitive language, a pre-processor of the token list may be needed.

this must be a tokenizer that returns a token value.


parse

public final T parse(CharSequence source,
                     String moduleName)
Parses source.

Parameters:
source - the source string
moduleName - the name of the module; this name appears in error messages
Returns:
the result

parse

public final T parse(CharSequence source)
Parses source.


parse

public final T parse(Readable readable)
              throws IOException
Parses source read from readable.

Throws:
IOException

parse

public final T parse(Readable readable,
                     String moduleName)
              throws IOException
Parses source read from readable.

Parameters:
readable - where the source is read from
moduleName - the name of the module; this name appears in error messages
Returns:
the result
Throws:
IOException


Copyright © 2014. All rights reserved.