A tiny parser framework
WARNING: This is currently in beta as I finalize the API and write docs and examples.
Teg is a tiny declarative parser toolkit written in TypeScript. It aims to be a semantic and approachable library for parsing. Teg's semantics are mostly based on PEGs: Parsing Expression Grammars.
* 0 dependencies
* Browser or Node
* 4.4kb minified (but highly tree-shakeable!)
* Well-tested
* Helpful error messages
* Straightforward and semantic by default
* But also a powerful and composable API
```sh
npm install teg-parser
```

```ts
import assert from "node:assert"
import { template, line } from "teg-parser"

/** Parse markdown level 1 headings */
const h1Parser = template`# ${line}`

const result = h1Parser.run("# heading\n")

assert(result.isSuccess())
assert.deepEqual(result.value, ["heading"])

const failResult = h1Parser.run("not a heading")

assert(failResult.isFailure())
console.log(failResult)
/**
 * Logs
Parse Failure

| not a heading
| ^

Failed at index 0: Char did not match "#"
In middle of parsing text("#") at 0
In middle of parsing text("# ") at 0
In middle of parsing template(text("# "), line, text("")) at 0
*/
```
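The last line of the failure trace shows the template being treated as `template(text("# "), line, text(""))`. That shape follows from how JavaScript tagged templates work: the tag function receives the literal string pieces and the interpolated values separately. The sketch below only illustrates that interleaving. `describeTemplate` is a hypothetical helper, not part of teg, and the parser is stood in for by a plain name.

```ts
// Hypothetical helper, not part of teg: shows how a tagged template
// receives its literal pieces and its interpolated values.
function describeTemplate(
  strings: TemplateStringsArray,
  ...names: string[]
): string {
  const parts: string[] = []
  strings.forEach((piece, i) => {
    // Every literal piece becomes a text(...) step, even an empty one.
    parts.push(`text(${JSON.stringify(piece)})`)
    // There is always one more literal piece than interpolated values.
    const name = names[i]
    if (name !== undefined) parts.push(name)
  })
  return `template(${parts.join(", ")})`
}

// The string "line" stands in for the real `line` parser.
const described = describeTemplate`# ${"line"}`
console.log(described)
```

This prints `template(text("# "), line, text(""))`, which is why a trailing empty `text("")` step shows up in the trace even though nothing follows `${line}` in the template.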
Often, you'll want to do some processing on a successful parse. To make this ergonomic, parsers define a `map` function that lets you transform successfully parsed content.
```ts
import assert from "node:assert"
import { Parser, template, maybe, zeroOrMore, line, takeUntilAfter } from "teg-parser"

type Blockquote = {
  content: string
}

const blockquote: Parser<Blockquote> = zeroOrMore(template`> ${line}`)
  .map((lines) => lines.map(([line]) => line).join("\n"))
  .map((content) => ({ content }))

const result = blockquote.run(
  `> Line 1\n> Line 2\n> Line 3`
)

assert(result.isSuccess())
assert.deepEqual(result.value, {
  content: "Line 1\nLine 2\nLine 3",
})
```

Since it's written in TypeScript, types are inferred as much as possible.

Much of the idea comes from Chet Corcos's article on parsers. Although `Parser`s currently implement `bimap`, `fold`, and `chain` methods as described in the article, I haven't found them as useful in real-world usage, and may remove or change them.

Examples
There are some examples available in the `examples` directory. It's TODO to build out more; help out if you want!

- [x] Markdown
- [x] CLI args
- [ ] Unordered list
- [ ] JSON
- [ ] LaTeX

You can also see an example of a bigger parser I use for my custom blog post format here: https://github.com/tanishqkancharla/tk-parser/blob/main/src/index.ts (although it's using an older version of teg right now).
API
Combinators
```tsx
/** Matches a text string */
export const text = <T extends string>(value: T) => Parser<T>
```

```tsx
/**
 * Tagged template text for parsing.
 *
 * "template`# ${line}`" will parse "# Heading" to ["Heading"]
 *
 * Can use multiple parsers together. Keep in mind parsers run greedily,
 * so "template`${word}content`" will fail on "textcontent" b/c the `word` parser
 * will match "textcontent", and then it will try to match the text "content"
 */
export const template
```

```tsx
/**
 * Match the given parser n or more times, with an optional delimiter parser
 * in between.
 */
const nOrMore: (
  n: number,
  parser: Parser,
  delimiter?: Parser
) => Parser

/**
 * Match the given parser zero or more times, with an optional delimiter
 * NOTE: this will always succeed.
 */
const zeroOrMore: (parser: Parser, delimiter?: Parser) => Parser

/**
 * Match the given parser one or more times, with an optional delimiter
 */
const oneOrMore: (parser: Parser, delimiter?: Parser) => Parser
```

```tsx
/** Matches exactly one of the given parsers, checked in the given order */
const oneOf: <ParserArray extends Parser[]>(
  parsers: ParserArray
) => ParserArray[number]
```

```tsx
/**
 * Match the given parsers in sequence
 *
 * @example
 * sequence([text("a"), text("b"), text("c")]) => Parser<"abc">
 */
const sequence: (
  parsers: Parser[],
  delimiter?: Parser
) => Parser
```

```tsx
/**
 * Look ahead in the stream to match the given parser.
 * NOTE: This consumes no tokens.
 */
const lookahead: (parser: Parser) => Parser
```

```tsx
/**
 * Tries matching a parser, returns undefined if it fails
 * NOTE: This parser always succeeds
 */
const maybe: (parser: Parser) => Parser
```

```tsx
/**
 * Keep consuming until the given parser succeeds.
 * Returns all the characters that were consumed before the parser succeeded.
 *
 * @example
 * `takeUntilAfter(text("\n"))` takes until after the newline but
 * doesn't include the newline itself in the result
 */
const takeUntilAfter: (parser: Parser) => Parser

/**
 * Keep consuming until before the given parser succeeds.
 * Returns all the characters that were consumed before the parser succeeded.
 *
 * @example
 * `takeUpTo(text("\n"))` takes all chars until before the newline
 */
export const takeUpTo: (parser: Parser) => Parser
```

Built-in primitive parsers
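Since the `line` primitive in this section is defined with `takeUntilAfter`, it may help to see how the two `take*` combinators described above differ in what they leave behind. The sketch below is a simplification over plain strings, with a string delimiter instead of a parser. `untilAfter` and `upTo` are hypothetical names, and returning `undefined` when the delimiter never appears is an assumption rather than documented teg behavior.

```ts
// Hypothetical string-based stand-ins; teg's combinators take parsers.
type Taken = { value: string; rest: string }

// Consume through the delimiter, but leave it out of the returned value.
function untilAfter(input: string, delimiter: string): Taken | undefined {
  const at = input.indexOf(delimiter)
  if (at === -1) return undefined // assumption: no match means failure
  return { value: input.slice(0, at), rest: input.slice(at + delimiter.length) }
}

// Stop just before the delimiter, leaving it in the remaining input.
function upTo(input: string, delimiter: string): Taken | undefined {
  const at = input.indexOf(delimiter)
  if (at === -1) return undefined // assumption: no match means failure
  return { value: input.slice(0, at), rest: input.slice(at) }
}

const after = untilAfter("heading\nbody", "\n")
const before = upTo("heading\nbody", "\n")
console.log(after, before)
```

Both calls return the value `"heading"`. The difference is in the remaining input: `"body"` for `untilAfter`, and `"\nbody"` for `upTo`, where the newline is still waiting to be parsed.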
```tsx
/**
 * Takes the first line in the stream
 * i.e. up to (and including) the first newline
 */
const line = takeUntilAfter(text("\n"));

/** Matches a single lowercase English letter */
const lower: Parser

/** Matches a single uppercase English letter */
const upper: Parser

/** Matches a single English letter, case insensitive */
const letter: Parser

/**
 * Match an English word
 */
const word: Parser

/** Match a single digit from 0 to 9 */
const digit: Parser

const integer: Parser

/** Match a single hexadecimal digit (0-9, A-F), case insensitive */
const hexDigit: Parser

/** Match a single English letter or digit */
const alphaNumeric: Parser
```

Custom Parser
```tsx
const custom = new Parser((stream) => {
  // ... logic
  return new ParseSuccess(result, stream)
  // or
  return new ParseFailure(errorMessage, stream)
})
```

All primitive parsers and combinators are built using these constructors, so you can look at those for examples.
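As a rough, self-contained illustration of what "built using these constructors" can look like, the sketch below defines stand-in `Parser`, `ParseSuccess`, and `ParseFailure` classes and builds a `text` primitive plus `maybe` and `lookahead` combinators on top of them. These classes are hypothetical simplifications, not teg's actual implementation. In particular, modeling the stream as a string plus an index is an assumption made for brevity.

```ts
// Hypothetical stand-ins for teg's classes; the real ones may differ.
type Stream = { content: string; index: number }

class ParseSuccess<T> {
  value: T
  stream: Stream
  constructor(value: T, stream: Stream) {
    this.value = value
    this.stream = stream
  }
}

class ParseFailure {
  message: string
  stream: Stream
  constructor(message: string, stream: Stream) {
    this.message = message
    this.stream = stream
  }
}

type ParseResult<T> = ParseSuccess<T> | ParseFailure

class Parser<T> {
  parse: (stream: Stream) => ParseResult<T>
  constructor(parse: (stream: Stream) => ParseResult<T>) {
    this.parse = parse
  }
  run(content: string): ParseResult<T> {
    return this.parse({ content, index: 0 })
  }
}

// Primitive: match an exact string at the current position.
const text = <S extends string>(value: S): Parser<S> =>
  new Parser<S>((stream) => {
    if (stream.content.startsWith(value, stream.index)) {
      const next = { content: stream.content, index: stream.index + value.length }
      return new ParseSuccess(value, next)
    }
    return new ParseFailure(`Expected "${value}"`, stream)
  })

// Combinator: never fails; yields undefined and consumes nothing on a miss.
const maybe = <T>(parser: Parser<T>): Parser<T | undefined> =>
  new Parser<T | undefined>((stream) => {
    const result = parser.parse(stream)
    return result instanceof ParseSuccess
      ? result
      : new ParseSuccess(undefined, stream)
  })

// Combinator: reports the match but leaves the stream position untouched.
const lookahead = <T>(parser: Parser<T>): Parser<T> =>
  new Parser<T>((stream) => {
    const result = parser.parse(stream)
    return result instanceof ParseSuccess
      ? new ParseSuccess(result.value, stream)
      : result
  })
```

With these definitions, `maybe(text("#")).run("heading")` succeeds with `undefined` at index 0, and `lookahead(text("#")).run("# heading")` succeeds with `"#"` while the index stays at 0, mirroring the notes on `maybe` and `lookahead` in the Combinators section.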
Testing parsers
Teg ships utilities to test parsers at `teg-parser/testParser`. It is used like this:

```tsx
import { testParser } from "teg-parser/testParser";

const test = testParser(parser)

/** Assert the content passed in completely parses to the expected value */
test.parses(content, expected)

/**
 * Assert the content gets parsed to the expected value, but without asserting
 * all the content is consumed
 */
test.parsePartial(content, expected)

/** Assert the parser successfully matches the given content */
test.matches(content)

/** Assert the parser fails on the given content */
test.fails(content)
```

ESM and CJS
`teg` comes with out-of-the-box support for both ESM and CJS. The correct format will be used depending on whether you use `import` (ESM) or `require` (CJS). However, a lot of parsers in teg are just simple utilities, so if you use ESM, you will probably be able to tree-shake away a significant portion of the library.

Name
(Tiny or Typed) Parser Expression Grammar
Author
Please open an issue on GitHub or email/DM me if you have feedback or suggestions!