Regex Utils

Zero-dependency TypeScript library for regex utilities that go beyond string matching.
These are surprisingly hard to come by for any programming language. ✨

- Documentation
- Online demos:
  - RegExp Equivalence Checker
  - Random Password Generator

API Overview 🚀

- 🔗 Set-style operations:
  - `.and(...)` - Compute the intersection of two regexes.
  - `.not()` - Compute the complement of a regex.
  - `.without(...)` - Compute the difference of two regexes.
- ✅ Set-style predicates:
  - `.isEquivalent(...)` - Check whether two regexes match the same strings.
  - `.isSubsetOf(...)`
  - `.isSupersetOf(...)`
  - `.isDisjointFrom(...)`
  - `.isEmpty()` - Check whether a regex matches no strings.
- 📜 Generate strings:
  - `.sample(...)` - Generate random strings matching a regex.
  - `.enumerate()` - Exhaustively enumerate strings matching a regex.
- 🔧 Miscellaneous:
  - `.size()` - Count the number of strings that a regex matches.
  - `.derivative(...)` - Compute a Brzozowski derivative of a regex.
  - and others...
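
For readers unfamiliar with the term, a Brzozowski derivative of a regex with respect to a character `c` is a regex matching every suffix `s` such that `c + s` is matched by the original. The following is a minimal sketch on a toy syntax tree. The `Re` type and the `nullable`, `derivative`, and `matches` functions are hypothetical names made up for this illustration and say nothing about how this library represents or processes regexes internally.

```typescript
// Toy regex syntax tree (illustration only, not the library's data model).
type Re =
  | { kind: 'empty' }                      // matches no string at all
  | { kind: 'epsilon' }                    // matches only the empty string
  | { kind: 'char'; ch: string }           // matches exactly one character
  | { kind: 'concat'; left: Re; right: Re }
  | { kind: 'union'; left: Re; right: Re }
  | { kind: 'star'; inner: Re }

// Does the regex match the empty string?
function nullable(re: Re): boolean {
  switch (re.kind) {
    case 'empty': return false
    case 'epsilon': return true
    case 'char': return false
    case 'concat': return nullable(re.left) && nullable(re.right)
    case 'union': return nullable(re.left) || nullable(re.right)
    case 'star': return true
  }
}

// Regex for "what may follow after reading the character c".
function derivative(re: Re, c: string): Re {
  switch (re.kind) {
    case 'empty': return { kind: 'empty' }
    case 'epsilon': return { kind: 'empty' }
    case 'char': return re.ch === c ? { kind: 'epsilon' } : { kind: 'empty' }
    case 'union':
      return { kind: 'union', left: derivative(re.left, c), right: derivative(re.right, c) }
    case 'star':
      return { kind: 'concat', left: derivative(re.inner, c), right: re }
    case 'concat': {
      const first: Re = { kind: 'concat', left: derivative(re.left, c), right: re.right }
      // If the left part can be empty, c may also be consumed by the right part.
      return nullable(re.left)
        ? { kind: 'union', left: first, right: derivative(re.right, c) }
        : first
    }
  }
}

// Matching = take one derivative per input character, then test for nullability.
function matches(re: Re, input: string): boolean {
  let current = re
  for (let i = 0; i < input.length; i++) {
    current = derivative(current, input[i])
  }
  return nullable(current)
}

// (ab)*
const abStar: Re = {
  kind: 'star',
  inner: { kind: 'concat', left: { kind: 'char', ch: 'a' }, right: { kind: 'char', ch: 'b' } },
}

console.log(matches(abStar, 'abab')) // true
console.log(matches(abStar, 'aba'))  // false
```

The sketch does no simplification of intermediate terms, so derivatives grow with input length; a practical implementation would normalize them.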

Installation 📦

```bash
npm install @gruhn/regex-utils
```
```typescript
import { RB } from '@gruhn/regex-utils'
```

Syntax Support

| Feature | Support | Examples |
|---------|---------|-------------|
| Quantifiers | ✅ | `a*`, `a+`, `a{3,10}`, `a?` |
| Lazy Quantifiers | ✅ | `a*?`, `a+?`, `a{3,10}?`, `a??` |
| Alternation | ✅ | `a\|b` |
| Character classes | ✅ | `.`, `\w`, `[a-zA-Z]`, ... |
| Escaping | ✅ | `\$`, `\.`, ... |
| (Non-)capturing groups | ✅ | `(?:...)`, `(...)` |
| Start/end anchors | ⚠️¹ | `^`, `$` |
| Lookahead | ⚠️² | `(?=...)`, `(?!...)` |
| Lookbehind | ⚠️² | `(?<=...)`, `(?<!...)` |
| Word boundary | ❌ | `\b`, `\B` |
| Unicode property escapes | ❌ | `\p{...}`, `\P{...}` |
| Backreferences | ❌ | `\1` `\2` ... |
| `dotAll` flag | ✅ | `/.../s`, `(?s:...)` |
| `global` flag | ✅ | `/.../g` |
| `hasIndices` flag | ✅ | `/.../d` |
| `ignoreCase` flag | ❌ | `/.../i` `(?i:...)` |
| `multiline` flag | ❌ | `/.../m` `(?m:...)` |
| `unicode` flag | ❌ | `/.../u` |
| `unicodeSets` flag | ❌ | `/.../v` |
| `sticky` flag | ❌ | `/.../y` |

1. Some complex patterns are not supported, such as anchors inside quantifiers `(^a)+` or anchors inside lookaheads `(?=^a)`.
2. Nested lookaheads/lookbehinds like `(?=a(?=b))` and lookahead/lookbehind combinations like `(?=a)b(?<=c)` are not supported.

An `UnsupportedSyntaxError` is thrown when unsupported patterns are detected. The library SHOULD ALWAYS either throw an error or respect the regex specification exactly. Please report a bug if the library silently uses a faulty interpretation.

Handling syntax-related errors:
```typescript
import { RB, ParseError, UnsupportedSyntaxError } from '@gruhn/regex-utils'

try {
  RB(/^[a-z]*$/)
} catch (error) {
  if (error instanceof SyntaxError) {
    // Invalid regex syntax! Native error, not emitted by this library.
    // E.g. this will also throw a `SyntaxError`: new RegExp(')')
  } else if (error instanceof ParseError) {
    // The regex syntax is valid but the internal parser could not handle it.
    // If this happens it's a bug in this library.
  } else if (error instanceof UnsupportedSyntaxError) {
    // Regex syntax is valid but not supported by this library.
  }
}
```
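
The first branch above concerns the native `SyntaxError`, which can be observed without the library at all. As a small sketch, the hypothetical helper `isValidRegexSource` (a name made up for this example) uses only the built-in `RegExp` constructor to tell valid from invalid pattern sources:

```typescript
// Uses only the built-in RegExp constructor, not @gruhn/regex-utils.
// `isValidRegexSource` is a hypothetical helper for this illustration.
function isValidRegexSource(source: string): boolean {
  try {
    new RegExp(source)
    return true
  } catch (error) {
    // Invalid pattern syntax surfaces as a native SyntaxError.
    if (error instanceof SyntaxError) return false
    // Anything else is unexpected, so pass it on.
    throw error
  }
}

console.log(isValidRegexSource(')'))        // false
console.log(isValidRegexSource('^[a-z]*$')) // true
```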

Example use cases 💡

Generate test data from regex

Generate 5 random email addresses:
```typescript
const email = RB(/^[a-z]+@[a-z]+\.[a-z]{2,3}$/)
for (const str of email.sample().take(5)) {
  console.log(str)
}
```
```
ky@e.no
cc@gg.gaj
z@if.ojk
vr@y.ehl
e@zx.hzq
```

Generate 5 random email addresses, which have exactly 20 characters:
```typescript
const emailLength20 = email.and(/^.{20}$/)
for (const str of emailLength20.sample().take(5)) {
  console.log(str)
}
```
```
kahragjijttzyze@i.mv
gnpbjzll@cwoktvw.hhd
knqmyotxxblh@yip.ccc
kopfpstjlnbq@lal.nmi
vrskllsvblqb@gemi.wc
```
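
Intersection means a string is matched only if both regexes match it. As a sanity check that relies on native `RegExp` alone (no library calls), the sample output above can be tested against both patterns. The variable names `emailPattern`, `length20`, and `matchesBoth` are chosen for this sketch only:

```typescript
// Native RegExp only. Membership in the intersection of two regexes
// is the same as matching both of them.
const emailPattern = /^[a-z]+@[a-z]+\.[a-z]{2,3}$/
const length20 = /^.{20}$/

function matchesBoth(str: string): boolean {
  return emailPattern.test(str) && length20.test(str)
}

const samples = [
  'kahragjijttzyze@i.mv',
  'gnpbjzll@cwoktvw.hhd',
  'knqmyotxxblh@yip.ccc',
  'kopfpstjlnbq@lal.nmi',
  'vrskllsvblqb@gemi.wc',
]

for (const str of samples) {
  console.log(str, matchesBoth(str)) // true for every sample
}

// A valid email of the wrong length is not in the intersection.
console.log(matchesBoth('ky@e.no')) // false
```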

Refactor a regex, then check equivalence

ONLINE DEMO

Say we found this incredibly complicated regex somewhere in the codebase:
```typescript
const oldRegex = /^a|b$/
```

This can be simplified, right?
```typescript
const newRegex = /^[ab]$/
```

But to double-check we can use `.isEquivalent` to verify that the new version matches exactly the same strings as the old version. That is, whether `oldRegex.test(str) === newRegex.test(str)` for every possible input string:

```typescript
RB(oldRegex).isEquivalent(newRegex) // false
```
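
To build intuition for what "every possible input string" means, here is a brute-force sketch that compares two native regexes on all strings up to a bounded length over a small alphabet. It is not how the library decides equivalence, and it can only find disagreements, never prove equivalence. The helpers `boundedStrings` and `findDisagreement` are hypothetical names for this illustration:

```typescript
// All strings over `alphabet` with length <= maxLen, shortest first.
function boundedStrings(alphabet: string[], maxLen: number): string[] {
  const all = ['']
  let layer = ['']
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = []
    for (const prefix of layer) {
      for (const ch of alphabet) next.push(prefix + ch)
    }
    all.push(...next)
    layer = next
  }
  return all
}

// First bounded string on which the two regexes disagree, if any.
// Assumes regexes without the `g` or `y` flag, so `.test` is stateless.
function findDisagreement(
  a: RegExp,
  b: RegExp,
  alphabet: string[],
  maxLen: number,
): string | undefined {
  for (const str of boundedStrings(alphabet, maxLen)) {
    if (a.test(str) !== b.test(str)) return str
  }
  return undefined
}

console.log(findDisagreement(/^a|b$/, /^[ab]$/, ['a', 'b'], 3))   // "aa"
console.log(findDisagreement(/^(a|b)$/, /^[ab]$/, ['a', 'b'], 3)) // undefined
```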

Looks like we made a mistake. We can generate counterexamples using `.without(...)` and `.sample(...)`. First, we derive new regexes that match exactly what `newRegex` matches but not `oldRegex`, and vice versa:
```typescript
const onlyNew = RB(newRegex).without(oldRegex)
const onlyOld = RB(oldRegex).without(newRegex)
```
`onlyNew` turns out to be empty (`onlyNew.isEmpty() === true`) but `onlyOld` has some matches:
```typescript
for (const str of onlyOld.sample().take(5)) {
  console.log(str)
}
```
```
aaba
aa
aba
bab
aababa
```
Why does `oldRegex` match all these strings with multiple characters? Shouldn't it only match "a" or "b" like `newRegex`? It turns out we thought that `oldRegex` is the same as `^(a|b)$`, but in reality it's the same as `(^a)|(b$)`.
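
The precedence point can be confirmed with native `RegExp` alone: alternation binds more loosely than the anchors, so the counterexamples above match both `/^a|b$/` and the explicitly grouped `/(^a)|(b$)/`, but not `/^(a|b)$/`. The variable names below are chosen for this sketch:

```typescript
// Native RegExp only; no library calls.
const original = /^a|b$/
const explicitGrouping = /(^a)|(b$)/ // how the original is actually parsed
const intended = /^(a|b)$/           // what we thought it meant

const counterexamples = ['aaba', 'aa', 'aba', 'bab', 'aababa']

for (const str of counterexamples) {
  // Each line prints: <str> true true false
  console.log(str, original.test(str), explicitGrouping.test(str), intended.test(str))
}
```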

Match HTML comments using regex complement

How do you write a regex that matches HTML comments like:
```
<!-- This is a comment -->
```
A straightforward attempt would be:
```typescript
<!--.*-->
```
The problem is that `.*` also matches the end marker `-->`, so this is also a match:
```typescript
<!-- This is a comment --> and this shouldn't be part of it -->
```
We need to specify that the inner part can be any string that does not contain `-->`. With `.not()` (aka. regex complement) this is easy:

```typescript
import { RB } from '@gruhn/regex-utils'

const commentStart = RB('<!--')
const commentInner = RB(/^.*-->.*$/).not()
const commentEnd = RB('-->')

const comment = commentStart.concat(commentInner).concat(commentEnd)
```

With `.toRegExp()` we can convert back to a native JavaScript regex:
```typescript
comment.toRegExp()
```
```
/^…
```
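
The over-matching problem with the straightforward attempt can be reproduced with native `RegExp` alone. The sketch below also includes a common hand-written workaround based on a negative lookahead; that pattern is an assumption of this example for comparison, not the regex the library generates, and no equivalence with the library's result is claimed.

```typescript
// Native RegExp only; none of this uses @gruhn/regex-utils.
const naive = /^<!--.*-->$/

// Hand-written workaround (an assumption of this sketch): consume a
// character only if "-->" does not start at that position.
const handWritten = /^<!--(?:(?!-->).)*-->$/

const wellFormed = '<!-- This is a comment -->'
const overMatch = "<!-- This is a comment --> and this shouldn't be part of it -->"

console.log(naive.test(wellFormed))       // true
console.log(naive.test(overMatch))        // true (the unwanted match)
console.log(handWritten.test(wellFormed)) // true
console.log(handWritten.test(overMatch))  // false
```

Lookahead patterns like this are easy to get subtly wrong, which is the motivation for deriving the inner part with `.not()` instead.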
