Grammatica Reference Manual

Tokens

Each token definition in the grammar file consists of a token name and a token pattern. The token name must consist of characters from the set [a-zA-Z0-9_] and must not conflict with any other token name or with any production name.

A token pattern can be specified either as a string, in double quotes ("), or as a regular expression, between the special delimiters << and >>. The regular expression syntax is largely the one supported by JDK 1.4, as documented in the Java API documentation for the java.util.regex.Pattern class. See the figure below for two example token definitions.

STRING_TOKEN   = "Value"
REGEXP_TOKEN   = <<.>>

Figure 1. Two example token definitions. The first defines a simple verbatim string, and the second a regular expression.
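
The difference between the two pattern kinds can be illustrated with a minimal Java sketch. It does not use the Grammatica API; it only uses java.util.regex to mimic the two definitions from Figure 1. The class name TokenPatternSketch and the helper matchLength are hypothetical names chosen for this illustration, and quoting the string pattern with Pattern.quote is an assumption about how a verbatim string is best emulated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenPatternSketch {

    // String token: quoted so that every character is matched literally.
    static final Pattern STRING_TOKEN = Pattern.compile(Pattern.quote("Value"));

    // Regular expression token: "." matches any single character
    // except a line terminator (default java.util.regex behavior).
    static final Pattern REGEXP_TOKEN = Pattern.compile(".");

    // Returns the length of the match anchored at the start of the input,
    // or -1 if the pattern does not match there.
    static int matchLength(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(matchLength(STRING_TOKEN, "Value = 1")); // 5
        System.out.println(matchLength(REGEXP_TOKEN, "Value = 1")); // 1
        System.out.println(matchLength(STRING_TOKEN, "value"));     // -1
    }
}
```

The string pattern matches only the exact, case-sensitive text, whereas the regular expression accepts any one character.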

It is also possible to set an ignore or an error flag on a token definition. The ignore flag is used to signal that the token should be discarded after being read, whereas the error flag is used to cause a parse error whenever the token is found. Two example token declarations using these flags are listed in the figure below.

WHITESPACE    = <<[ \t\n\r]+>> %ignore%
UNKNOWN_CHAR  = <<.>> %error unexpected token%

Figure 2. Two example token definitions with ignore and error flags. The error flag can also carry a specific error message, which is included in the parse error thrown when the token is encountered.
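
The effect of the two flags can be sketched with a small stand-alone tokenizer in Java. This is not Grammatica's implementation or API: the class TokenFlagSketch, the TokenDef type, the WORD token, and the use of IllegalStateException as a stand-in for a parse error are all hypothetical. The sketch also assumes that the longest match wins and that the earliest definition wins a tie, which is a simplification made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFlagSketch {

    // Hypothetical token definition: a name, a pattern, and the two flags.
    static final class TokenDef {
        final String name;
        final Pattern pattern;
        final boolean ignore;
        final String errorMessage; // null when no error flag is set

        TokenDef(String name, String regex, boolean ignore, String errorMessage) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.ignore = ignore;
            this.errorMessage = errorMessage;
        }
    }

    // Definitions mirroring Figure 2, plus an illustrative WORD token.
    static List<TokenDef> exampleDefs() {
        return Arrays.asList(
            new TokenDef("WORD", "[a-z]+", false, null),
            new TokenDef("WHITESPACE", "[ \\t\\n\\r]+", true, null),
            new TokenDef("UNKNOWN_CHAR", ".", false, "unexpected token"));
    }

    // Assumed rule: longest match wins; the earliest definition wins ties.
    static List<String> tokenize(String input, List<TokenDef> defs) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            TokenDef best = null;
            int bestEnd = pos;
            for (TokenDef def : defs) {
                Matcher m = def.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = def;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalStateException("no token matches at position " + pos);
            }
            if (best.errorMessage != null) {
                // Error flag: finding the token is itself a parse error.
                throw new IllegalStateException(best.errorMessage + " at position " + pos);
            }
            if (!best.ignore) {
                // Ignore flag: the token is read but never reported.
                result.add(best.name + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ab cd", exampleDefs())); // [WORD:ab, WORD:cd]
    }
}
```

In this sketch the whitespace between the two words is consumed but not returned, while an input such as "ab ?" fails with the message attached to UNKNOWN_CHAR.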