TypeScript library for regex equivalence, intersection, complement and other utilities that go beyond string matching.
Zero-dependency TypeScript library for regex utilities that go beyond string matching.
These are surprisingly hard to come by for any programming language. ✨
- Documentation
- Online demos:
  - RegExp Equivalence Checker
  - Random Password Generator
- 🔗 Set-style operations:
  - `.and(...)` - Compute intersection of two regex.
  - `.not()` - Compute the complement of a regex.
  - `.without(...)` - Compute the difference of two regex.
- ✅ Set-style predicates:
  - `.isEquivalent(...)` - Check whether two regex match the same strings.
  - `.isSubsetOf(...)`
  - `.isSupersetOf(...)`
  - `.isDisjointFrom(...)`
  - `.isEmpty()` - Check whether a regex matches no strings.
- 📜 Generate strings:
  - `.sample(...)` - Generate random strings matching a regex.
  - `.enumerate()` - Exhaustively enumerate strings matching a regex.
- 🔧 Miscellaneous:
  - `.size()` - Count the number of strings that a regex matches.
  - `.derivative(...)` - Compute a Brzozowski derivative of a regex.
  - and others...
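
For readers unfamiliar with Brzozowski derivatives: the derivative of a regex with respect to a character `c` is a regex that matches every string `s` for which the original matches `c + s`. The following is a minimal textbook-style sketch over a tiny hand-rolled regex AST. The `Re` type and every function in it are hypothetical and written for illustration; it does not show the signature or the internals of the library's `.derivative(...)`.

```typescript
// Tiny regex AST (hypothetical, for illustration only).
type Re =
  | { kind: 'nothing' }                       // matches no string at all
  | { kind: 'epsilon' }                       // matches only the empty string
  | { kind: 'char'; ch: string }              // matches exactly one character
  | { kind: 'concat'; left: Re; right: Re }
  | { kind: 'union'; left: Re; right: Re }
  | { kind: 'star'; inner: Re }

const nothing: Re = { kind: 'nothing' }
const epsilon: Re = { kind: 'epsilon' }
const char = (ch: string): Re => ({ kind: 'char', ch })
const concat = (left: Re, right: Re): Re => ({ kind: 'concat', left, right })
const union = (left: Re, right: Re): Re => ({ kind: 'union', left, right })
const star = (inner: Re): Re => ({ kind: 'star', inner })

// Does the regex match the empty string?
function nullable(re: Re): boolean {
  switch (re.kind) {
    case 'nothing': return false
    case 'epsilon': return true
    case 'char': return false
    case 'concat': return nullable(re.left) && nullable(re.right)
    case 'union': return nullable(re.left) || nullable(re.right)
    case 'star': return true
  }
}

// Derivative with respect to a single character `c`.
function derivative(c: string, re: Re): Re {
  switch (re.kind) {
    case 'nothing': return nothing
    case 'epsilon': return nothing
    case 'char': return re.ch === c ? epsilon : nothing
    case 'union': return union(derivative(c, re.left), derivative(c, re.right))
    case 'concat': {
      const viaLeft = concat(derivative(c, re.left), re.right)
      // If the left part can match "", `c` may also be consumed by the right part.
      return nullable(re.left) ? union(viaLeft, derivative(c, re.right)) : viaLeft
    }
    case 'star': return concat(derivative(c, re.inner), re)
  }
}

// Matching = take the derivative for each input character, then check nullability.
function matches(re: Re, input: string): boolean {
  let current = re
  for (let i = 0; i < input.length; i++) {
    current = derivative(input.charAt(i), current)
  }
  return nullable(current)
}

const abStar = star(concat(char('a'), char('b'))) // corresponds to /^(ab)*$/
console.log(matches(abStar, 'abab')) // true
console.log(matches(abStar, 'aba'))  // false
```

The sketch does no simplification, so the intermediate terms grow with the input length. That is acceptable for short strings but is not meant as a practical matcher.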
```bash
npm install @gruhn/regex-utils
```
```typescript
import { RB } from '@gruhn/regex-utils'
```
| Feature | Support | Examples |
|---------|---------|-------------|
| Quantifiers | ✅ | `a*`, `a+`, `a{3,10}`, `a?` |
| Lazy Quantifiers | ✅ | `a*?`, `a+?`, `a{3,10}?`, `a??` |
| Alternation | ✅ | `a\|b` |
| Character classes | ✅ | `.`, `\w`, `[a-zA-Z]`, ... |
| Escaping | ✅ | `\$`, `\.`, ... |
| (Non-)capturing groups | ✅ | `(?:...)`, `(...)` |
| Start/end anchors | ⚠️1 | `^`, `$` |
| Lookahead | ⚠️2 | `(?=...)`, `(?!...)` |
| Lookbehind | ⚠️2 | `(?<=...)`, `(?<!...)` |
| Word boundary | ❌ | `\b`, `\B` |
| Unicode property escapes | ❌ | `\p{...}`, `\P{...}` |
| Backreferences | ❌ | `\1` `\2` ... |
| `dotAll` flag | ✅ | `/.../s`, `(?s:...)` |
| `global` flag | ✅ | `/.../g` |
| `hasIndices` flag | ✅ | `/.../d` |
| `ignoreCase` flag | ❌ | `/.../i` `(?i:...)` |
| `multiline` flag | ❌ | `/.../m` `(?m:...)` |
| `unicode` flag | ❌ | `/.../u` |
| `unicodeSets` flag | ❌ | `/.../v` |
| `sticky` flag | ❌ | `/.../y` |

1. Some complex patterns are not supported, like anchors inside quantifiers `(^a)+` or anchors inside lookaheads `(?=^a)`.
2. Not supported are nested lookaheads/lookbehinds like `(?=a(?=b))` and lookahead/lookbehind combinations like `(?=a)b(?<=c)`.

An `UnsupportedSyntaxError` is thrown when unsupported patterns are detected.
The library SHOULD ALWAYS either throw an error or respect the regex specification exactly.
Please report a bug if the library silently uses a faulty interpretation.
Handling syntax-related errors:
```typescript
import { RB, ParseError, UnsupportedSyntaxError } from '@gruhn/regex-utils'

try {
  RB(/^[a-z]*$/)
} catch (error) {
  if (error instanceof SyntaxError) {
    // Invalid regex syntax! Native error, not emitted by this library.
    // E.g. this will also throw a `SyntaxError`: `new RegExp(')')`
  } else if (error instanceof ParseError) {
    // The regex syntax is valid but the internal parser could not handle it.
    // If this happens it's a bug in this library.
  } else if (error instanceof UnsupportedSyntaxError) {
    // Regex syntax is valid but not supported by this library.
  }
}
```
Generate 5 random email addresses:
```typescript
const email = RB(/^[a-z]+@[a-z]+\.[a-z]{2,3}$/)
for (const str of email.sample().take(5)) {
  console.log(str)
}
```
```
ky@e.no
cc@gg.gaj
z@if.ojk
vr@y.ehl
e@zx.hzq
```
Generate 5 random email addresses that have exactly 20 characters:
```typescript
const emailLength20 = email.and(/^.{20}$/)
for (const str of emailLength20.sample().take(5)) {
  console.log(str)
}
```
```
kahragjijttzyze@i.mv
gnpbjzll@cwoktvw.hhd
knqmyotxxblh@yip.ccc
kopfpstjlnbq@lal.nmi
vrskllsvblqb@gemi.wc
```
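
As a reading aid for what the intersection means: a string belongs to the intersection of two regex exactly when it matches both of them. The sketch below checks that with native `RegExp` only. The helper `matchesBoth` and the constant names are hypothetical and not part of the library; the candidate strings are taken from the sampled outputs above.

```typescript
const emailPattern = /^[a-z]+@[a-z]+\.[a-z]{2,3}$/
const length20Pattern = /^.{20}$/

// Hypothetical helper: membership test for the intersection of both patterns.
function matchesBoth(str: string): boolean {
  return emailPattern.test(str) && length20Pattern.test(str)
}

// Two of the 20-character samples and one of the shorter addresses.
const candidates = ['kahragjijttzyze@i.mv', 'gnpbjzll@cwoktvw.hhd', 'ky@e.no']
for (const str of candidates) {
  console.log(str, matchesBoth(str))
}
// kahragjijttzyze@i.mv true
// gnpbjzll@cwoktvw.hhd true
// ky@e.no false
```

A membership test like this can only check strings that already exist. Sampling new strings from the intersection requires a single regex or automaton that describes it, which is what `.and(...)` computes.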
Say we found this incredibly complicated regex somewhere in the codebase:
```typescript
const oldRegex = /^a|b$/
```
This can be simplified, right?
```typescript
const newRegex = /^[ab]$/
```
But to double-check we can use `.isEquivalent` to verify that the new version matches exactly the same strings as the old version.
That is, whether `oldRegex.test(str) === newRegex.test(str)` for every possible input string:
```typescript
RB(oldRegex).isEquivalent(newRegex) // false
```
Looks like we made some mistake.
We can generate counterexamples using `.without(...)` and `.sample(...)`.
First, we derive new regex that match exactly what `newRegex` matches but not `oldRegex` and vice versa:
```typescript
const onlyNew = RB(newRegex).without(oldRegex)
const onlyOld = RB(oldRegex).without(newRegex)
```
`onlyNew` turns out to be empty (`onlyNew.isEmpty() === true`) but `onlyOld` has some matches:
```typescript
for (const str of onlyOld.sample().take(5)) {
  console.log(str)
}
```
```
aaba
aa
aba
bab
aababa
```
Why does `oldRegex` match all these strings with multiple characters?
Shouldn't it only match "a" or "b" like `newRegex`?
Turns out we thought that `oldRegex` is the same as `^(a|b)$`
but in reality it's the same as `(^a)|(b$)`.
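
The precedence claim can be sanity-checked with native `RegExp` alone by comparing the regex on every string up to a small length. This is a bounded brute-force sketch, not a proof of equivalence and not how the library works. The helpers `allStrings` and `disagreements` are hypothetical names introduced for illustration.

```typescript
const oldRegex = /^a|b$/
const asIntended = /^(a|b)$/   // what we thought oldRegex means
const asParsed = /(^a)|(b$)/   // how the alternation actually groups

// Hypothetical helper: all strings over `alphabet` with length <= maxLength, shortest first.
function allStrings(alphabet: string[], maxLength: number): string[] {
  const result: string[] = ['']
  let layer: string[] = ['']
  for (let len = 1; len <= maxLength; len++) {
    const next: string[] = []
    for (const prefix of layer) {
      for (const ch of alphabet) next.push(prefix + ch)
    }
    result.push(...next)
    layer = next
  }
  return result
}

// Hypothetical helper: the inputs on which two regex give different answers.
function disagreements(x: RegExp, y: RegExp, inputs: string[]): string[] {
  return inputs.filter(str => x.test(str) !== y.test(str))
}

const inputs = allStrings(['a', 'b'], 4)
console.log(disagreements(oldRegex, asParsed, inputs))               // []
console.log(disagreements(oldRegex, asIntended, inputs).slice(0, 3)) // [ 'aa', 'ab', 'bb' ]
```

A bounded search like this can only find counterexamples. It cannot show that none exist beyond the length limit, which is where an exact check such as `.isEquivalent(...)` helps.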
How do you write a regex that matches HTML comments like:
```
<!-- ... -->
```
A straightforward attempt would be:
```typescript
<!--.*-->
```
The problem is that `.*` also matches the end marker `-->`,
so this is also a match:
```typescript
<!-- ... --> and this shouldn't be part of it -->
```
We need to specify that the inner part can be any string that does not contain `-->`.
With `.not()` (aka. regex complement) this is easy:
```typescript
import { RB } from '@gruhn/regex-utils'

const commentStart = RB('<!--')
const commentInner = RB(/^.*-->.*$/).not()
const commentEnd = RB('-->')

const comment = commentStart.concat(commentInner).concat(commentEnd)
```
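
To see the over-matching of the naive pattern concretely, here is a small sketch using native `RegExp` only. The input strings are made up for illustration, and `isSingleComment` is a hypothetical helper that spells out the intended rule (start marker, an inner part without `-->`, end marker) as plain string checks.

```typescript
const naive = /^<!--.*-->$/

// Hypothetical inputs, chosen for illustration.
const singleComment = '<!-- a comment -->'
const commentPlusTrailingText = "<!-- a comment --> and this shouldn't be part of it -->"

console.log(naive.test(singleComment))           // true
console.log(naive.test(commentPlusTrailingText)) // true, although it is not a single comment

// Hypothetical helper: the intended rule, written without regex complement.
function isSingleComment(str: string): boolean {
  const start = '<!--'
  const end = '-->'
  // The markers must not overlap, so the string needs room for both.
  if (str.length < start.length + end.length) return false
  if (str.slice(0, start.length) !== start) return false
  if (str.slice(-end.length) !== end) return false
  const inner = str.slice(start.length, str.length - end.length)
  return inner.indexOf(end) === -1
}

console.log(isSingleComment(singleComment))           // true
console.log(isSingleComment(commentPlusTrailingText)) // false
```

The helper expresses "does not contain `-->`" procedurally. Regex complement lets the same condition live inside a single regex instead.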
With `.toRegExp()` we can convert back to a native JavaScript regex:
```typescript
comment.toRegExp()
```
/^