Python-like indentation tokens for ANTLR4 JavaScript runtime
This project adds INDENT and DEDENT tokens to autogenerated ANTLR4 parsers for Python-like scopes. It defines a DenterHelper that can be added to an ANTLR4 grammar.
This is a JavaScript port of the original ANTLR-Denter project, adapted for use with the ANTLR4 JavaScript runtime.
This is a plugin that is spliced into an ANTLR grammar's lexer, and allows that lexer to make use of INDENT and DEDENT to represent Python-like scope entry and termination.
When DenterHelper injects DEDENT tokens, it will prefix any string of them with a single NL. A single NL is also inserted before the EOF token if there are no DEDENTs to insert (that is, if the last line of the source file is not indented). A NL is _not_ inserted before an INDENT, since indents always imply a newline before them (and thus make the newline token meaningless).
For example, given this input:
```
hello
  world
    universe
dolly
```

This would be parsed as:

```
"hello"
INDENT
"world"
INDENT
"universe"
NL
DEDENT
DEDENT
"dolly"
NL
```

This approach lets you define expressions, single-line statements, and block statements naturally.
1. Expressions in your parser grammar should not end in newlines. This makes compound expressions work naturally.
2. Single-line statements in your grammar should end in newlines. For example, an assignment expression might be `identifier '=' expression NL`.
3. Blocks are bookended by INDENT and DEDENT, without mentioning extra newlines: `block: INDENT statement+ DEDENT`.
   - You should _not_ include a newline before the INDENT.
   - An `if` would be something like `if expression ':' block`. (Note the lack of NL after the `:`.)
In the example above, `universe` and `dolly` represent simple expressions, and you can imagine that the grammar would contain something like `statement: expression NL | helloBlock;`.
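The token rules described above can be illustrated with a small standalone sketch. This is not the library's implementation: `tokenizeIndentation` is a hypothetical helper that works on plain strings instead of ANTLR tokens, and it assumes space-only indentation and well-formed input (every dedent returns to a previously seen level).

```javascript
// Standalone sketch: turns indented lines into a flat list of token names.
// Assumptions: spaces only, blank lines are skipped, and every dedent lands
// on an indentation level that is already open.
function tokenizeIndentation(source) {
  const tokens = [];
  const indentStack = [0]; // widths of the currently open scopes, innermost last
  const lines = source.split('\n').filter((line) => line.trim() !== '');

  lines.forEach((line, index) => {
    const width = line.length - line.trimStart().length;
    const current = indentStack[indentStack.length - 1];

    if (index > 0) {
      if (width > current) {
        // Deeper indentation opens a scope; no NL is emitted before INDENT.
        indentStack.push(width);
        tokens.push('INDENT');
      } else if (width < current) {
        // A run of DEDENTs is prefixed with a single NL.
        tokens.push('NL');
        while (indentStack[indentStack.length - 1] > width) {
          indentStack.pop();
          tokens.push('DEDENT');
        }
      } else {
        // Same level: the previous line simply ends with NL.
        tokens.push('NL');
      }
    }
    tokens.push(JSON.stringify(line.trim()));
  });

  // End of input: a single NL, followed by DEDENTs for any scopes still open.
  if (lines.length > 0) {
    tokens.push('NL');
    while (indentStack.length > 1) {
      indentStack.pop();
      tokens.push('DEDENT');
    }
  }
  return tokens;
}

const sample = 'hello\n  world\n    universe\ndolly';
console.log(tokenizeIndentation(sample).join('\n'));
```

Run on the four-line input shown earlier, the sketch produces the same sequence as the listing: no NL before either INDENT, one NL ahead of the pair of DEDENTs, and a final NL after the last unindented line.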
The DenterHelper processor asserts correct indentation on DEDENT. Take the following example:
```
someStatement()
if foo():
    if bar():
        fooAndBar()
  bogusLine()
```

bogusLine() does not dedent to the indentation of any valid scope: it is not indented enough to be part of the `if foo():` scope, yet it is too indented to share a scope with someStatement(). In Python, this raises an IndentationError.
The DenterHelper processor handles this by inserting two tokens: a DEDENT followed immediately by an INDENT (the total sequence here would actually be two DEDENTs followed by an INDENT, since bogusLine() is twice-dedented from fooAndBar()). The rationale is that the line has dedented to its parent, and then indented.
As a consequence, the DenterHelper processor will also assert correct indentation for all lines where an INDENT is not expected. Take the following example in a Python-like grammar of two method calls:
```
someStatement()
  bogusLine()
```

This would be illegal because no INDENT is expected after someStatement().
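The DEDENT-then-INDENT handling of a partially dedented line can be sketched as follows. This is an illustrative reconstruction from the behavior described above, not the library's source: `unwindTo` is a hypothetical name, and the function works on plain token names and indentation widths.

```javascript
// Sketch of dedent resolution under the assumptions stated above.
// indentStack holds the widths of open scopes, innermost last, e.g. [0, 4, 8].
// The stack is updated in place; the emitted token names are returned.
function unwindTo(indentStack, targetWidth) {
  const tokens = ['NL']; // a run of DEDENTs is prefixed with a single NL
  while (indentStack.length > 0) {
    const previous = indentStack.pop();
    if (previous === targetWidth) {
      break; // landed exactly on an open scope
    }
    if (targetWidth > previous) {
      // The line sits between two known levels: treat it as having
      // dedented to its parent and then indented again.
      indentStack.push(previous);
      tokens.push('INDENT');
      break;
    }
    tokens.push('DEDENT');
  }
  indentStack.push(targetWidth);
  return tokens;
}

// Widths for: top level, the body of `if foo():`, the body of `if bar():`.
const scopes = [0, 4, 8];
console.log(unwindTo(scopes, 2)); // bogusLine() indented by 2
```

For the `bogusLine()` example, the sketch yields NL, DEDENT, DEDENT, INDENT and leaves the scope widths at `[0, 2]`. Because the resulting INDENT appears where the grammar expects none, the parser rejects the line.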
```bash
npm install antlr-denter-js
```

In an ANTLR grammar definition MyGrammar.g4, use the following:
```antlr
tokens { INDENT, DEDENT }

@lexer::header {
import { DenterHelper } from 'antlr-denter-js';
}

@lexer::members {
// MyGrammarLexer is the lexer class ANTLR generates for MyGrammar.g4.
this.denter = DenterHelper.builder()
    .nl(MyGrammarLexer.NL)
    .indent(MyGrammarLexer.INDENT)
    .dedent(MyGrammarLexer.DEDENT)
    .pullToken(() => super.nextToken());
// Route token requests through the denter so INDENT and DEDENT are injected.
this.nextToken = () => this.denter.nextToken();
}

NL: ('\r'? '\n' ' '*); // For tabs just switch out ' '* with '\t'*
```

Note: The exact syntax for @lexer::header and @lexer::members may vary depending on your ANTLR4 JavaScript target version. Adjust accordingly.
See the example/ directory for a complete working example with a simple calculator grammar that uses indentation.
DenterHelper is the main class that handles indentation processing.
#### Static Methods
- DenterHelper.builder(): Returns a new builder instance for creating a DenterHelper.
#### Instance Methods
- nextToken(): Returns the next token, handling indentation as needed.
- getOptions(): Returns a DenterOptions instance for configuring behavior.
DenterOptions holds the options for configuring DenterHelper behavior.
#### Methods
- ignoreEof(): Don't do any special handling for EOFs; they'll just be passed through normally. This is useful when the lexer will be used to parse rules that are within a line, such as expressions.
Use the builder pattern to create a DenterHelper instance:
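```javascript
import { DenterHelper } from 'antlr-denter-js';

// Token types come from your generated lexer; pullTokenFunction returns
// the next raw token from the underlying lexer.
const denter = DenterHelper.builder()
  .nl(NL_TOKEN_TYPE)
  .indent(INDENT_TOKEN_TYPE)
  .dedent(DEDENT_TOKEN_TYPE)
  .pullToken(pullTokenFunction);
```
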
MIT License - see the LICENSE file for details.
Many thanks to yshavit for developing the original ANTLR-Denter project, which this JavaScript port is based on.
- The original ANTLR-Denter for Java.
- ANTLR4, the language toolkit.
- antlr-denter-cs for C#.