Find and replace text in PDF files with preserved formatting
npm install @transkripid/pdf-text-replace
Find and replace text in PDF files while preserving formatting.
- Chainable API mimicking JavaScript's String.replace()
- Supports string and RegExp search patterns
- Preserves font styles, colors, and layout
- Handles FlateDecode compressed streams
- Graceful error handling (returns original buffer on failure)
- Automatic Unicode transliteration (CJK, Cyrillic, accented characters → ASCII)
- Pure TypeScript with minimal dependencies (pako for zlib, any-ascii for transliteration)
```bash
# From npm (when published)
npm install pdf-text-replace
```
Usage
```typescript
import { PDF } from 'pdf-text-replace';
import { readFileSync, writeFileSync } from 'fs';

const input = readFileSync('document.pdf');
const modified = new PDF(input)
  .replace('John Doe', 'Jane Smith')
  .replace('old@email.com', 'new@email.com')
  .replace(/\d{4}-\d{4}-\d{4}/g, 'XXXX-XXXX-XXXX')
  .toBuffer();
writeFileSync('modified.pdf', modified);
```
API
`new PDF(buffer)`
Create a new PDF instance from a buffer.
`.replace(search, replacement)`
Queue a text replacement operation. Returns `this` for chaining.
- `search` - String or RegExp pattern to find
- `replacement` - Text to replace matches with

`.toBuffer()`
Apply all queued replacements and return the modified PDF as a Buffer.
Returns the original buffer unchanged if:
- No matches are found
- An error occurs during processing
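
Because an unchanged buffer comes back in both cases, a caller may want to check whether anything was actually replaced. The sketch below is one way to do that by comparing bytes. `wasModified` is a hypothetical helper, not part of this library, and the byte arrays are stand-ins for a real input PDF and a `toBuffer()` result. It compares content rather than identity, since the README does not say whether the same `Buffer` instance is returned.

```typescript
// Hypothetical helper (not part of the library): true when the output bytes
// differ from the input bytes. Node's Buffer is a Uint8Array, so Buffers work here.
function wasModified(original: Uint8Array, result: Uint8Array): boolean {
  if (original.length !== result.length) return true;
  return original.some((byte, i) => byte !== result[i]);
}

// Stand-in data: the bytes of "%PDF" instead of a real document.
const inputBytes = Uint8Array.from([0x25, 0x50, 0x44, 0x46]);
const sameBytes = Uint8Array.from(inputBytes);                 // like a no-match result
const editedBytes = Uint8Array.from([0x25, 0x50, 0x44, 0x47]); // like a successful edit

console.log(wasModified(inputBytes, sameBytes));   // false
console.log(wasModified(inputBytes, editedBytes)); // true
```

With the library, the same check would be applied to the input buffer and the value returned by `toBuffer()`. It cannot tell a no-match result from a swallowed error, since both leave the bytes untouched.
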
How It Works
The library parses PDF content streams (both raw and FlateDecode compressed), finds text operators (`Tj`, `TJ`), and performs replacements while:
1. Preserving the original font and styling
2. Adjusting horizontal scaling (`Tz` operator) when the replacement text has a different width
3. Rebuilding the PDF with updated stream lengths and xref table

Unicode Support
Replacement text containing Unicode characters is automatically transliterated to ASCII for compatibility with standard PDF fonts (WinAnsiEncoding):
```typescript
// Chinese → Pinyin
.replace('Author', '银宵') // Becomes "YinXiao"
// Korean → Romanized
.replace('Name', '스트레이') // Becomes "seuteulei"
// Cyrillic → Latin
.replace('Hello', 'Привет') // Becomes "Privet"
// Accented → Plain ASCII
.replace('Name', 'José García') // Becomes "Jose Garcia"
```
This uses any-ascii for transliteration.
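
To give a feel for the accented-character case only, here is a minimal sketch that uses built-in Unicode normalization as a stand-in. It is not any-ascii and not this library's code: it does not romanize CJK or Cyrillic, and the names `stripAccents` and `isAscii` are made up for illustration.

```typescript
// Stand-in for illustration: decompose characters (NFD), then drop the
// combining accent marks. Scripts such as CJK or Cyrillic pass through unchanged.
function stripAccents(text: string): string {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// True when every character is 7-bit ASCII.
function isAscii(text: string): boolean {
  return /^[\u0000-\u007f]*$/.test(text);
}

const plainName = stripAccents('José García');
console.log(plainName);            // Jose Garcia
console.log(isAscii(plainName));   // true
console.log(isAscii('Привет'));    // false: needs real transliteration
```

A check like `isAscii` can also be run on replacement text up front to see whether transliteration will change it before the PDF is written.
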
Limitations
- Only works with PDFs using `WinAnsiEncoding` (standard Latin text)

License
MIT