C language target (ANTLR 3.0 prerelease)Edit
These notes made trying to figure out how to use the C language target with the ANTLR prerelease (prior to the release of ANTLR 3.0). With the official 3.0 release extensive examples for the C language target were made available, rendering these notes largely redundant. See C language target for the most up-to-date information.
Grammar setup
Specify the target language in the options
block:
grammar Walrus;
options { language = C; }
CLASSPATH
To run ANTLR from the command line you must set up the CLASSPATH
environment variable correctly. For example, on my system where ANTLR 3.0b7 is installed under /usr/local/
I have the following in my CLASSPATH
:
/usr/local/antlr/lib/antlr-3.0b7.jar
/usr/local/antlr/lib/antlr-2.7.7.jar
/usr/local/antlr/lib/stringtemplate-3.0.jar
I set this in my Bash shell with the following command:
# split across two lines for readability
export CLASSPATH="/usr/local/antlr/lib/antlr-3.0b7.jar:/usr/local/antlr/lib/antlr-2.7.7.jar"
export CLASSPATH="$CLASSPATH:/usr/local/antlr/lib/stringtemplate-3.0.jar"
Building
Compiling the grammar
Given a grammar file, Walrus.g
:
java org.antlr.Tool Walrus.g
Produces the following files:
Walrus.tokens
WalrusLexer.c
WalrusLexer.h
WalrusParser.c
WalrusParser.h
Walrus__.g
The lexer header contains information about how to actually use it; for example, an extremely simple lexer with only a few different token types yields:
* The lexer WalrusLexerhas the callable functions (rules) shown below,
* which will invoke the code for the associated rule in the source grammar
* assuming that the input stream is pointing to a token/text stream that could begin
* this rule.
*
* For instance if you call the first (topmost) rule in a parser grammar, you will
* get the results of a full parse, but calling a rule half way through the grammar will
* allow you to pass part of a full token stream to the parser, such as for syntax checking
* in editors and so on.
*
* The parser entry points are called indirectly (by function pointer to function) via
* a parser context typedef pWalrusLexer, which is returned from a call to WalrusLexerNew().
*
* As this is a generated lexer, it is unlikely you will call it 'manually'. However
* the entry points are provided anyway.
* * The entry points for WalrusLexer are as follows:
*
* - void pWalrusLexer->SPECIAL_CHAR(pWalrusLexer)
* - void pWalrusLexer->RAW_TEXT(pWalrusLexer)
* - void pWalrusLexer->WALRUS_ESCAPE_SEQUENCE(pWalrusLexer)
* - void pWalrusLexer->Tokens(pWalrusLexer)
*
* The return type for any particular rule is of course determined by the source
* grammar file.
And here is the parser header for an extremely simple parser with only one rule:
* The parser WalrusParserhas the callable functions (rules) shown below,
* which will invoke the code for the associated rule in the source grammar
* assuming that the input stream is pointing to a token/text stream that could begin
* this rule.
*
* For instance if you call the first (topmost) rule in a parser grammar, you will
* get the results of a full parse, but calling a rule half way through the grammar will
* allow you to pass part of a full token stream to the parser, such as for syntax checking
* in editors and so on.
*
* The parser entry points are called indirectly (by function pointer to function) via
* a parser context typedef pWalrusParser, which is returned from a call to WalrusParserNew().
*
* The entry points for WalrusParser are as follows:
*
* - WalrusParser_anything_return pWalrusParser->anything(pWalrusParser)
*
* The return type for any particular rule is of course determined by the source
* grammar file.
Compiling the generated C code
Prerequisites
The generated files #include
the file, antlr3.h
(on my system located at /usr/local/antlr-3.0b7/lib/C/include/antlr3.h
).
At least in 3.0b7 the ANTLR C runtime libraries are not shipped in binary form anywhere that I know of, so you have to build and install them first.
# from top-level of expanded ANTLR distribution tarball
cd lib/C/dist
tar xzvf libantlr3c-3.0.0-rc7.tar.gz
cd libantlr3c-3.0.0-rc7
./configure
make
make check # (currently does nothing)
sudo make install
This installs the following libraries:
/usr/local/lib/libantlr3c.dylib
/usr/local/lib/libantlr3c.la
/usr/local/lib/libantlr3c.a
As well as the headers in /usr/local/include/
:
/usr/local/include/antlr3.h
(and various others)
Technically, I think it would be more correct if these were installed in /usr/local/lib/antlr/
because there are quite a few of them.
Compiling
To compile the lexer, parser and link to the libantlr3c
runtime library:
gcc -Wall main.c WalrusLexer.c WalrusParser.c -lantlr3c -o walrus_parser
A main.c
file is required because otherwise the linker will complain about a missing _main
symbol. The simplest possible example (which does nothing) is this:
int main(int argc, char *argv[])
{
return 0;
}
Now an example which actually does something:
- Initialize a stream from a string at runtime
- Check for errors
- Manually invoke the lexer and print tokens one at a time
- Invoke the parser (commented out; comment out the lexer instead to see what happens)
#import "WalrusLexer.h"
[/tags/import #import] "WalrusParser.h"
int main(int argc, char *argv[])
{
pANTLR3_UINT8 input_string = "hello world";
pANTLR3_INPUT_STREAM stream = antlr3NewAsciiStringInPlaceStream(input_string, strlen(input_string), "input text stream");
if (stream == (pANTLR3_INPUT_STREAM)ANTLR3_ERR_NOMEM)
{
fprintf(stderr, "no memory\n");
exit(EXIT_FAILURE);
}
else if (stream == (pANTLR3_INPUT_STREAM)ANTLR3_ERR_NOFILE)
{
fprintf(stderr, "file not found\n");
exit(EXIT_FAILURE);
}
pWalrusLexer lexer = WalrusLexerNew(stream);
pANTLR3_COMMON_TOKEN_STREAM tokens = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, lexer->pLexer->tokSource);
pANTLR3_COMMON_TOKEN token;
while ((token = tokens->tstream->LT(tokens->tstream, 1))->getType(token) != ANTLR3_TOKEN_EOF)
{
printf("Token: %s\n", token->toString(token)->chars);
tokens->tstream->istream->consume(tokens->tstream->istream);
}
/*pWalrusParser parser = WalrusParserNew(tokens);
parser->anything(parser);*/
return 0;
}
The example does compile with some warnings:
main.c:6: warning: pointer targets in initialization differ in signedness
main.c:7: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
main.c:7: warning: pointer targets in passing argument 3 of ‘antlr3NewAsciiStringInPlaceStream’ differ in signedness
Grammar details
To insert material at the top of the parser file
For example, to include a header file, you would place something like this after the options
block:
@header {
[/tags/include #include] <ruby.h>
}
Obviously, for this to work you would need to pass the appropriate include path to GCC via the -I
switch.
To insert material at the top of the lexer implementation
*@lexer::header {
[/tags/include #include] <ruby.h>
}
The @members
section
*@lexer::header {
[/tags/include #include] <ruby.h>
}
@members
section
Given that C is not object-oriented it doesn’t make much sense to use a @members
section when using the C language target. If you do use it, you will wind up with a file-scoped variable that will be inserted about half-way down the generated file; in that case, if you want file-scoped variables then you may as well just stick them in the @header
section anyway.
However, if you want to include a helper function that can be called by any of your actions then the @members
section may indeed be the best place to put it.
The @declarations
and @init
sections
Each rule (lexer or parser rule) compiles down to a single function. The @declarations
and @init
sections allow you to declare variables that will be local to a given function and then initialize them. By performing the declaration and initialization in separate sections the generated should be very portable (for example, without using C99 mode).
Example
foo
@declarations { int bar; }
@init { bar = 10; } :
A_TOKEN;
foo
@declarations { int bar; }
@init { bar = 10; } :
A_TOKEN;
ANTLR will generate a function for parsing the foo
rule. int bar;
will appear near the top of the generated function, after the other variables in the function are declared. bar = 10;
will appear a little further down, after the other variabels in the function are initialized.