ANTLR predicatesEdit
This page provides an overview of the different types of predicate available in ANTLR:
Syntactic predicates
- Specifies the syntactic validity of applying an alternative
- Indicate syntactic context that must be satisfied in order for an alternative to match
- Automatically look ahead at future input to help make decisions (arbitrary lookahead)
- Used to order a rule’s alternatives (specify precedence among rules that would otherwise be ambiguous); this would suggest that it makes no sense to apply syntactic predicates in a rule with only one alternative and experience seems to confirm this (see PEG-style predicates in ANTLR for example)
- First alternative that matches wins
- Put another way, syntactic predicates indicate where backtracking may occur
- Take the form:
(...)=>
where ...
is a grammar fragment
- May only appear on the extreme left edge of an alternative
- In a rule with many alternatives, the last rule does not need an explicit predicate (it is assumed to be the default if nothing else matches)
- Similarly, in a rule with many alternatives, if some of the rules are not mutually ambiguous then they don’t need explicit predicates either
- The
backtrack = true
option may be used to have ANTLR automatically insert syntactic predicates on alternatives that do not already have a user-specified predicate
- The overhead of backtracking may be amortized by turning on memoizing, preferably on a per-rule basis (the
memoize = true
option)
- Actions are not executed during backtracking
- Under the covers use a "pushdown machine" rather than a DFA (as in normal LL(*) parsing), so are more powerful recognizers (they can recognize context-free structures, unlike DFAs which cannot recognize recursive structures and are equivalent in power to regular expressions)
- Explicitly specified syntactic predicates are implemented in ANTLR using gated semantic predicates behind the scenes
- Implicitly added syntactic predicates (added for auto-backtracking) are implemented using normal disambiguating semantic predicates and are therefore only evaluated when LL(*) fails to predict an alternative
- Documented in Chapter 14 of the ANTLR book (starting on page 331)
Example
(...)=>
where ...
is a grammar fragmentbacktrack = true
option may be used to have ANTLR automatically insert syntactic predicates on alternatives that do not already have a user-specified predicatememoize = true
option)Here is an example lexer rule that uses a syntactic predicate to order two ambiguous alternatives:
CRLF
: ('\r'? '\n')=> '\r'? '\n'
| '\r'
;
This will match \r\n
, \r
or \n
, even in the same file (even if a file has mixed line endings). Without the syntactic predicate ANTLR doesn’t know what to do when faced with input such as \r\n
(is it two CRLF
tokens or one?); with the predicate ANTLR doesn’t issue an ambiguity warning and always chooses the longest match (\r\n
) if possible.
Semantic predicates
- Specifies the semantic validity of applying an alternative
- Are boolean expressions in the target language that are evaluated at runtime
- Used to guide recognition
- Enforce non-syntactic rules (rules for which mere syntax is not enough to describe a valid sentence)
- Help to recognize context-sensitive language constructs
- Are "hoisted" up into other rules when necessary to inform the decision making process
- Must be free of side effects
- Must be possible to repeatedly evaluate them and get the same result
- Evaluation order must not matter
- Should not reference local variables or parameters (because this would break hoisting)
- Documented in Chapter 13 of the ANTLR book (starting on page 317)
Validating semantic predicates
- Must be possible to repeatedly evaluate them and get the same result
- Evaluation order must not matter
- Look like normal actions but are followed immediately by a question mark
{..}?
...
is a boolean expression written in the target language; if it evaluates to false then aFailedPredicateException
is thrown
Gated semantic predicates
- Precede alternatives and are written as
{...}?=>
where ...
is a boolean expression written in the target language
- When evaluating to false, effectively disable the alternative, making it invisible to the recognizer (no exception is thrown)
- Put another way, they dynamically "turn on" or "turn off" portions of a grammar
- Always hoisted, even when decisions are deterministic
- Useful when you want to distinguish between alternatives that are not syntactically ambiguous
- May only appear in rules that actually have multiple alternatives
- May be used in lexer and parser rules
Disambiguating semantic predicates
- Precede alternatives and are written as
{...}?
- Used in making prediction decisions
- Are only used (evaluated) when syntax alone is insufficient to distinguish between alternatives
- Put another way, they disambiguate syntactically identical alternatives
- Unlike gated semantic predicates may be used in rules which have only one alternative
- Are hoisted into rules higher up in the decision chain when LL(*) lookahead alone is not sufficient to distinguish between alternatives
- Unlike gated semantic predicates, are not hoisted for deterministic decisions
- Only predicates reachable from the left edge without consuming an input symbol are hoisted
- To fully resolve a non-determinism, all alternatives must be covered by a disambiguating predicate
- As a special case, if the last (and only the last) conflicting alternative is not covered then ANLTR implicitly covers it as a "default"
- Predicated alternatives specified earlier have precedence over those specified later; that is, the semantic predicates are evaluated in the order specified in the alternatives in the grammar
- Multiple predicates may be specified for a single alternative:
rule
: {pred1}? {pred2}? TOK // both pred1 and pred2 must evaluate to true
| {pred3}? TOK
;
See also
{...}?=>
where ...
is a boolean expression written in the target language- Precede alternatives and are written as
{...}?
- Used in making prediction decisions
- Are only used (evaluated) when syntax alone is insufficient to distinguish between alternatives
- Put another way, they disambiguate syntactically identical alternatives
- Unlike gated semantic predicates may be used in rules which have only one alternative
- Are hoisted into rules higher up in the decision chain when LL(*) lookahead alone is not sufficient to distinguish between alternatives
- Unlike gated semantic predicates, are not hoisted for deterministic decisions
- Only predicates reachable from the left edge without consuming an input symbol are hoisted
- To fully resolve a non-determinism, all alternatives must be covered by a disambiguating predicate
- As a special case, if the last (and only the last) conflicting alternative is not covered then ANLTR implicitly covers it as a "default"
- Predicated alternatives specified earlier have precedence over those specified later; that is, the semantic predicates are evaluated in the order specified in the alternatives in the grammar
- Multiple predicates may be specified for a single alternative:
rule
: {pred1}? {pred2}? TOK // both pred1 and pred2 must evaluate to true
| {pred3}? TOK
;