A tiny but elegant parser combinator library written by Mepy
npm install nihil``nihil` is a parser combinator library for Javascript(ES2015+).
nihil is null, nothingness, etc, while `nihil` is useful and elegant
NPM.
`nihil` is a lot inspired by bnb
But written in the style of FP(Functional Programming), mainly, I thought
details in ## Reference.
`nihil` has zero dependencies and is about 5.2KB(exactly 5291 Bytes) raw Javascript.
As for minifying and zipping, I haven't tried it.
`nihil` source code uses some features of ES2015(or ES6),
such as destructuring assignment, arrow function.
So you'd better use modern browser like Chrome, Firefox, etc.
`nihil` library contains only, maybe you can consider, one object `nihil`.
So it of course supports both Browser and NodeJS.
`nihil` is elegant and easy to use, try it? -> Read ## Tutorial
nihil` is a lot inspired by bnb
At first, I used `bnb` for another toy project
However, I met some troubles :
1. Creating a recursive parser is a little hard
2. Assign bnb.parser.parse to a variable will throw an error
Because `bnb` implement parser with class
Assign instance.parse to a variable will lose `this` pointer
I decide to implement my own one, and I read the source code of `bnb`
But `nihil` is totally new written, I thought it should be NOT a derivative of `bnb`
The similarity between `nihil` and `bnb` might only be
`bnb.match(RegExp)` <=> `nihil(RegExp)`
and chain method `.and`, `.or`, etc.Tutorial
$3
You can use `nihil(RegExp)` to create parsers:
`js
nihil(/a+/) // a parser
`$3
With a parser, you can parse raw string with method `.try` or `.parse`:
`js
const a = nihil(/a+/)
a.try("aaaa") // "aaaa"
a.try("bbbb") // throw { expected: '/a+/', location: 0 }
a.parse("aaaa") // { right: true, value: 'aaaa' }
a.parse("bbbb") // { right: false, error: { expected: '/a+/', location: 0 } }
`
The difference between them is that `.try` would throw error while `.parse` would return.For convenience, we will use
`.try` in the following text.$3
You can use `nihil.and(RegExp,...)` to create a sequential parser:
`js
nihil.and(/a/,/b/,/c*/)
.try("bbbbcccc") // [ '', 'bbbb', 'cccc' ]
`
If exists a parser `k`, you can use `k.and(parser,...)`:
`js
const k = nihil(/k+/)
const g = nihil(/g+/)
k.and(g,k)
.try("kkkggggkkkk") // [ 'kkk', 'gggg', 'kkkk' ]
`
But It would trouble you if the second parser is a simple parser generated by `nihil`
Well, you can simply enter RegExps in place of simple parsers:
`js
k.and(/g+/,k)
.try("kkkggggkkkk") // [ 'kkk', 'gggg', 'kkkk' ]
`$3
If you want to parse "a" or "b", You can of course use RegExp `/a|b/`
But if you want to parse [one complex language] or [another complex language],
You can also do it with a complex RegExp which might bring errors,
but using `nihil.or`, you can implement the same function as `/a|b/`
with well-understanding code style:
`js
const a_b = nihil.or(/a/,/b/)
a_b.try("a") // [ 'a' ]
a_b.try("b") // [ 'b' ]
``.or` is similar to `.and`, so you can also do like these:
`js
const a = nihil(/a/)
const b = nihil(/b/)
const c = nihil(/c/)
a.or(b,c).try("c") // [ 'c' ]
a.or(b,/c/).try("c") // [ 'c' ]
`ATTENTION : The order of parser makes difference, see ### Larger Precedes.
The combined parser will try to parse using from left parser to right parser.
`js
nihil.or(/a/,/a+/).parse("aaa") // { right: false, error: { expected: '', location: 1 } }
nihil.or(/a+/,/a/).parse("aaa") // { right: true, value: [ 'aaa' ] }
`Well, maybe you are considering it is just another style to write RegExp
and doubting whether it is necessary to learn
`.or` method.
What if the complex language is not a RE language but a LL(1) language?
You can not implement it with RegExp!!!
Read the following tutorial, you can create a parser of LL(m) .
(m as big as you want and if your computer can compute it in time.) $3
`.keep` method is used for selecting parsers according former value.
With this, you can implement a LL(m) parser.
For example, maybe improper, we implement a LL(1) parser
which accepts a string like 'aA', 'bB', ..., 'zZ'
`js
const lowUp = nihil(/[a-z]/)
.keep($1=>nihil(RegExp($1.toUpperCase())))
.parse
lowUp("nN") // { right: true, value: [ 'n', 'N' ] }
lowUp("iJ") // { right: false, error: { expected: '/I/', location: 1 } }
`ATTENTION : in
`.keep` method, you must give a function
which returns a parser but NOT a RegExp,
different from `.and` and `.or`.As for LL(m), the format is like this, simply m = 2
`js
const char4 = nihil.and(/./,/./,/./,/./)
// result of char4 is an array with a length of 4 > m = 2
const parser = char4.keep(([$1,$2,$3,$4])=>{
if($4=='a'){return nihil(/r/)}
else if($3=='b'){return nihil(/s/)}
else{return nihil(/t/)}
}).try// parser accepts /...ar/, /..b[^a]s/ or /..[^b][^a]t/
parser("wxyar"), // [ 'w', 'x', 'y', 'a', 'r' ]
parser("wxbzs"), // [ 'w', 'x', 'b', 'z', 's' ]
parser("wxyzt"), // [ 'w', 'x', 'y', 'z', 't' ]
`REASON : Why name this method after 'keep'?
Because this method KEEP the former value (e.g. result of
`char4`)
Can we choose not to keep it?
Of course, use `.drop` instead of `.keep`
But it is little used in my view.$3
Until now, value of result of parsers are all strings.
Sometimes, we need to transform them, e.g. transform `'3'` to `3`
`js
nihil(/0|[1-9][0-9]*/) // String of number, decimal
.map(Number) // from String to Number
.try("3") // 3
``.map` could do other things, like verifying the value:
For example, we decide to reject 114514
`js
const num = nihil(/0|[1-9][0-9]*/)
.map(Number)
.map($1=>{
if($1==114514)
return undefined
else
return $1
})
num.and(/\s+/,num)
.try("114514 3") // [ undefined, ' ', 3 ]
`However, you might realize that the parser,
in fact, replace 114514 with undefined.
Can we DROP 114514, make the result
`[ ' ', 3 ]`?
Use `.drop`.$3
Except that `.drop` drops the former parser's result,
it is nearly the same as `.keep`What needs to be emphasized is
that
`fn` in `.drop(fn)` MUST return a parser
NEITHER RegExp, NOR array of values in `.map``nihil.box(value)` will return a parser
which does nothing but return value,
so called a box (a parser containing value).
You might feels it useful, right?`js
const num = nihil(/0|[1-9][0-9]*/)
.map(Number)
.drop($1=>{
if($1==114514)
return nihil
else
return nihil.box($1)
})const foo = num.and(/\s+/,num).try
foo("114514 3") // [ ' ', 3]
foo("10492 3") // [ 10492, ' ', 3 ]
`
But wait, what is `nihil` in `return nihil`?$3
As is seen, `nihil` acts as a parser, doing nothing. Formly,
`nihil`, as its name, is a nihil (NULL parser, PSEUDO parser).It can accept a RegExp and return a parser, which we have been using.
`nihil` is a parser, so you can use its `.and`, `.or`.
But `nihil`'s `.keep` , `.drop` and `.map` must accept a parser first,
because it doesn't return value when parsing. See API.Until now, we have nearly been empowered to write LL(∞) parsers.
But without knowing the flaw of parser combinator, i.e. the top down parser,
The following methods would be hard to make by the above methods.
Higher Order Tutorial
For a quick-look, You can only read built-in method.
But reading completely is useful for understanding `nihil`'s feature.$3
Recall the result of example in `.drop` tutorial, `[ ' ' , 3 ]`
We might find that the ' ' is so ugly, we want to drop it!
Of course, we can use `.map` to map the array `[ ' ' , 3 ]` to `[ 3 ]`.
But the annoying thing is we need to consider more situations :
`js
const number = nihil(/0|[1-9][0-9]*/).map(Number)
const space = nihil(/\s*/) // accept 0~∞ space(s)
const left = " 3"
const right = "3 "
const leftright = " 3 "
`#### using built-in
`.sep`
`nihil` implements a built-in method `.sep` for parsers:
`js
var parse = number.sep(space).try
parse(left) // 3
parse(right) // 3
parse(leftright) // 3
`
But it hides some interesting features, let's implement our `.sep`.#### using simple
`.and`
`js
// first method
var parse = nihil.and(space, number, space).parse
parse(left) // { right: false, error: { expected: '/\\s*/', location: 3 } }
parse(right) // { right: true, value: [ 3, ' ' ] }
parse(leftright) // { right: true, value: [ ' ', 3, ' ' ] } }
`
Obviously, sometimes it would err.#### using simple
`.and` and partial `.or(nihil)`
`js
// first method
var parse = nihil.and(space, number, space.or(nihil)).parse
parse(left) // { right: true, value: [ ' ', 3 ] }
parse(right) // { right: true, value: [ 3, ' ' ] }
parse(leftright) // { right: true, value: [ ' ', 3, ' ' ] } }
`
We have solved the problem it errs, but how to map three different cases to the same result `3`?
It is a bit difficult, especially differentiating the first and the second. #### using
`.drop` ahead of combining parsers
When meeting \s*, we drop it.
`js
var space_drop = space.drop(_=>nihil);
var parse = nihil.and(space_drop, number, space_drop.or(nihil))
.map($=>$[0])
.parse
parse(left) // { right: true, value: 3 }
parse(right) // { right: true, value: 3 }
parse(leftright) // { right: true, value: 3 }
`
Finally, we solve it. #### feature
-
`parser.drop(_=>nihil)` can drop the value of the parser
- `parser.or(nihil)` can skip the parser if it couldn't match, similar to the symbol `?` in the RegExp.$3
If we want to parse `" 3 4 5 "`,
we can use `number.and(number, number)` with `.sep`.
But if we don't know how many numbers occur?
`js
const array4 = "3 4 6 4 "
const array2 = " 5 9"
`#### using built-in
`.loop`
Of course, there is a built-in `.loop`:
`js
var array = number.sep(space)
.loop()
array.try(array4) // [ 3, 4, 6, 4 ]
array.try(array2) // [ 5, 9 ]
`
but it also hides some interesting features, let's implement our `.loop`.#### loop = recursion
As programmers all know, loop is equal to recusion.
So we can implement a recursive parser, the following is a BNF(Backus-Naur Form):
`BNF
array ::= number array | nihil
`
But how can `array` call itself?#### straightly calling itself
`js
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(array)
.or(nihil)
// Thrown:
// TypeError: Cannot read property 'raw' of undefined
`
It failed, because array is undefined when we define array. #### using wrapper and closure
In Javascript, closure is useful.
In this way, an object can be captured by a function.
We can use the object in the function when called.
In other words, we wrap the object with a function.
We usually call this trick "late-bound" (especially in λ-calculus)
`js
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(_=>array(_))
.or(nihil)
array.try(array4) // [ 3, [ 4, [ 6, [ 4 ] ] ] ]
array.try(array2) // [ 5, [ 9 ] ]
`
ATTENTION : We defer the use time of `array`!
It was called when the arrow function `_=>array(_)` called.Wait, is parser a function, now that called with argument
`_`?
Yes, a parser is a function, it accepts `source` and return `value`.
We use `.try` or `.parse`,
because `source` is a wrapper of `String`,
and we also need to wrap `value`s into results or throw errors.
See nihil.parser#### unfolding the result
Noticing that the result is a recursive array,
we need to unfold it into a simple array.
`js
const unfold = a=>(a.length == 1)?a:([a[0],...a[1]]) // a means array
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(_=>array(_))
.map(unfold)
.or(nihil)
array.try(array4) // [ 3, 4, 6, 4 ]
array.try(array2) // [ 5, 9 ]
`#### feature
-
`parser` is indeed a function, we can call it with source
- `_=>parser(_)` can defer the time of using parser.
- `unfold` is needed for maping recursive array to simple array.$3
Now, we could parse string such as `"3 4 6 4 "`, `" 5 9"`, etc.
We want to add `"["` and `"]"` on the left and right of array.#### using built-in
`.wrap`
`js
array
.wrap(/\[/,/\]/)
.try("[3 4 6 4 ]") // [ 3, 4, 6, 4 ]
`
#### using `.and` and `.map`
`js
nihil
.and(/\[/,array,/\]/)
.map(([l,v,r])=>v)
.try("[3 4 6 4 ]") // [ 3, 4, 6, 4 ]
`
`nihil` indeed uses such a method to offer a handy tool `.wrap`.#### feature
-
`parser.wrap` is a useful tool.$3
Let's run 2 code segments with a slight difference:
`js
var number = nihil(/0|[1-9][0-9]/).map(Number).sep(/\s/)
number.loop().or(number)
.parse("3 4") // { right: true, value: [ 3, 4 ] }
`
`js
var number = nihil(/0|[1-9][0-9]/).map(Number).sep(/\s/)
number.or(number.loop())
.parse("3 4") // { right: false, error: { expected: '', location: 2 } }
`
The second one is incorrect.
The `number` parsed the string first and succueeded,
so `number.loop()` had no opportunity to parse the raw string.
However, `number` could only parse one number,
so it expected \ after parsing `"3 "`, but given `"4"`,
finally it returned an error.Notation :
- P, Q, R, ... are Parsers.
- P | Q means
`P.or(Q)`
- L(P) is the language described by P.
- p ∈ L(P) is a string of the language L(P).
- p < q means p is a prefix substring of q.
Definition : P < Q iff L(P) < L(Q) iff ∀q ∈ L(Q), ∃p ∈ L(P), p < qTheorem : P < Q ⇒ L(P | Q) = L(P)
Proof : Omitted.
Therefore, if P < Q,
`P.or(Q)` acts as `P`.#### feature
- when
`.or` combining parsers, the larger should precedes. $3
Recall the BNF in the loop recursion
`BNF
array ::= number array | nihil
`
As a matter of fact, the following BNF is also correct.
`BNF
array ::= array number | nihil
`
However, the corresponding Javascript code is incorrect:
`js
var number = nihil(/0|[1-9][0-9]/).map(Number).sep(/\s/)
var array = nihil.and(_=>array(_),number).or(nihil)
array.try("3 4 ")
// Thrown:
// RangeError: Maximum call stack size exceeded
`
This is because `array` always calling itself,
making the function calling stack overflow.We call the first BNF "right recursion", the second "left recursion".
#### feature
- when
`.and` creating recursive parsers, transform left recursion to right recursion.$3
The recursive parser made in the `.loop` is linked-list-like.
Now we implement a tree-like one.
`js
var number = nihil(/0|[1-9][0-9]*/).map(Number)
var array = number.or(_=>array(_)).sep(/\s*/).loop().wrap(/\[/,/\]/)
array.try("[ 3 [5[9]] 6 [4 5 ] ]") // [ 3, [ 5, [ 9 ] ], 6, [ 4, 5 ] ]
`
Useful Tools
$3
Sometimes we want to add a label to the parser result.
`js
var decimal = nihil(/0|[1-9][0-9]*/).map(Number).label("decimal")
var hexadecimal = nihil(/0x(0|[0-9a-fA-F][1-9a-fA-F]*)/).map(Number).label("hexadecimal")
var number = decimal.or(hexadecimal)
number.try("33") // { label: 'decimal', value: 33 }
`
In fact, you can use `.map` to implement it.
`js
parser.label = label=>parser.map(value=>({label,value}))
`
$3
Sometimes we want to locate where parser works.
`js
var a = nihil(/a+/)
var b = nihil(/b+/).locate()
a.or(b).loop()
.try("aaabbbbaa") // [ 'aaa', { beg: 3, value: 'bbbb', end: 7 }, 'aa' ]
`Current Bugs
`js
var a = nihil(/a+/).drop(_=>nihil)
var b = nihil(/b+/)
b.or(a).sep(/\s+/).loop().parse("aaabbbbaa")
// { right: false, error: { expected: '', location: 3 } }
`
The reason might be that `.loop` uses `nihil.nihil` to break the looping, while `.drop` offers `nihil` generating `nihil.nihil`.
API
$3
nihil(parser A)=>parser A
nihil(source)=>nihil.nihil
nihil(RegExp)=>parser(RE)$3
- `{right:true, value : ... }` is the result when parsing succeeded.
- `{right:false, error : ... }` is the result when parsing failed.
- `nihil.nihil = { right:true, nihil:true }` is the result of the nihil parser.$3
It is the helper function of nihil
If you are a new hand of Javascript but familiar with OOP(Object Oriented Programming),
you can consider it as `class parser`As a matter of fact, its real member function are
`.try` and `parse`
for easily parse string and return proper result.
Because a parser is a function indeed, you can call it with argument `source`,
see nihil.sourceAs for other functions, they are all for convenience of creating parsers
in the grammar of chain calling.
You can intuitively feels it in Tutorial.
Also, the source code of
`nihil.parser`
promotes your understanding of the whole `nihil`.$3
Label the raw string with an ima(a Japanese word, means "current") location.
`js
const source = nihil.source("abc") // source = { raw: "abc", ima: 0 }
const ab = nihil(/ab/)
ab(source) // { right: true, value: "ab" }
// source = { raw: "abc", ima: 2 }
`$3
A parser returning the ima location of parsing source as value.
`js
const source = nihil.source("abc") // source = { raw: "3", ima: 0 }
const ab = nihil(/ab/)
ab(source)
nihil.location(source) // {right: true, value: 2}
// source = { raw: "abc", ima: 2 }
`$3
input array of RegExp or parser, return a parser
which parses source sequentially in the order of array$3
input array of RegExp or parser, return a parser
which tries to parse source from array[0] to array[array.length-1]:=end.
If array[k] succeeds, array[k+1],...,end wouldn't be used to parse source.$3
It helps implement the parser of LL(∞).
`nihil.keep` accepts `parser` and
selecting function `fn`
with value return by a `parser` as argument
and a parser as return value, and returns a new parser.
`js
var fn = value=>parser
var LLm = nihil.keep(parser)(fn)
`$3
It is similar to `nihil.keep` with difference
that it would DROP the values parsed out by `parse`
$3
Used to map the value of `parse` to the format you like
`js
nihil.map(parse)(mapping)
`$3
Used to deal with the `seperator` on the left and the right of `parser`.
`js
nihil.sep(parser)(seperator)
`$3
Used to looping parse string with `parser`.
`js
nihil.loop(parser)
`$3
Used to label the value of `parser`.
`js
nihil.label(parser)(label)
`$3
Used to locate the value of `parser`
`js
nihil.locate(parser)
``