Tokenizes a text from left to right and returns the list of tokens with their types.
I needed a tokenizer and tried some of the existing ones, but they missed some features I needed:
- node-tokenizer doesn't return information about the token types it finds, only the text.
- tokenizer-array returns this information, but finds tokens in a strange way: it searches by bisecting the text, which didn't find all tokens, at least in my case.
Since the task of creating a tokenizer doesn't seem too hard, I decided to create my own - here it is.
The only requirement is ES6. It makes the code easier to read (in my opinion); you might need a transpiler like Babel if you want to use it in some browsers or with older Node versions (time to update?), but it is simply the future.
```
npm i -S js-tokeniser
```

```javascript
const tokeniser = require('js-tokeniser')
let result = tokeniser('// Test file\nName = Test\nAuthor = "Joachim Schirrmacher"',
[
{type: "comment", regex: /^\s*\/\/[^\n]*/},
{type: "definition", regex: /^\s*(\w+)\s*=\s*((?:[^\n]*\\\n)*[^\n]+)/},
]
)
console.log(result)
```
prints the following output:
```javascript
[ { type: 'comment', matches: [] },
{ type: 'definition', matches: [ 'Name', 'Test' ] },
{ type: 'definition', matches: [ 'Author', '"Joachim Schirrmacher"' ] }
]
```
As you can see, you get not only the three recognised tokens, but also the capture-group matches that were found. This makes it easy to handle many cases without defining separate token types; instead, you can use patterns with capture groups (like in 'definition'), as the sketch below shows.
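For example, here is a minimal sketch of consuming those matches to collect all 'definition' tokens into a plain key/value object (the result shape is the one documented above; the helper name `toConfig` is made up for illustration):

```javascript
const tokeniser = require('js-tokeniser')

// Hypothetical helper: reduce all 'definition' tokens to a key/value object.
function toConfig(tokens) {
  return tokens
    .filter(token => token.type === 'definition')
    .reduce((config, token) => {
      const [name, value] = token.matches  // capture groups: name, value
      config[name] = value
      return config
    }, {})
}

const tokens = tokeniser('Name = Test\nAuthor = "Joachim Schirrmacher"', [
  {type: "definition", regex: /^\s*(\w+)\s*=\s*((?:[^\n]*\\\n)*[^\n]+)/},
])
console.log(toConfig(tokens))
// { Name: 'Test', Author: '"Joachim Schirrmacher"' }
```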
In previous versions, tokeniser returned the unfiltered result of String.match(), which contains `input` and `index` attributes as well as the whole match as its first element. These elements are stripped beginning with version 1.1.0, so if you relied on this side effect, be sure to adapt your code.
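A sketch of what that adaptation might look like (the pre-1.1.0 shape is inferred from the description above):

```javascript
function readDefinition(token) {
  // Pre-1.1.0 (assumed shape): matches was the raw String.match() result,
  // so the whole match sat at index 0 and capture groups started at index 1:
  // const [whole, name, value] = token.matches
  // From 1.1.0 on, matches contains only the capture groups:
  const [name, value] = token.matches
  return {name, value}
}

console.log(readDefinition({type: 'definition', matches: ['Name', 'Test']}))
// { name: 'Name', value: 'Test' }
```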