remove invisible Unicode characters
npm install out-of-character
Unicode has a few dozen characters that, on purpose, do not render anything.
This is cool for cultural idiosyncrasies in historical languages.
More often, though, their use is unintentional (or nefarious!), and these characters end up causing problems when parsing text formats.

• these are sometimes called 'zero-width', 'ignorable', or 'tag-characters' •
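
To make the problem concrete, here is a small sketch in plain JavaScript (no library involved) of how a single invisible character can break comparisons and parsing. The two characters used, U+200B ZERO WIDTH SPACE and U+FEFF (the byte order mark), are picked purely for illustration; whether this library flags either of them is not something the example assumes.

```js
// Illustration only: plain JavaScript, no library calls.
const ZWSP = '\u200B' // ZERO WIDTH SPACE
const BOM = '\uFEFF' // ZERO WIDTH NO-BREAK SPACE (byte order mark)

// 1. Two strings that look identical on screen but do not compare equal.
const clean = 'admin'
const sneaky = 'ad' + ZWSP + 'min'
console.log(clean === sneaky) // false
console.log(clean.length, sneaky.length) // 5 6

// 2. A JSON key with a hidden character parses fine, but never matches.
const parsed = JSON.parse('{"na' + ZWSP + 'me": "x"}')
console.log('name' in parsed) // false

// 3. A leading BOM makes a strict parser reject otherwise valid input.
let parseError = null
try {
  JSON.parse(BOM + '{"ok": true}')
} catch (err) {
  parseError = err
}
console.log(parseError instanceof SyntaxError) // true

// Listing the code points reveals what is hiding in the string.
const codePoints = [...sneaky].map(
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
)
console.log(codePoints.join(' '))
// U+0061 U+0064 U+200B U+006D U+0069 U+006E
```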

This library helps spot and remove these funboys before they cause any trouble.
Please remember that some text is meant to have Khmer vowels or Kaithi-alphabet characters.


npm install -g out-of-character
detect invisible characters in all files in a directory
```bash
out-of-character ./path/to/dir
```
remove them from all files in a directory
```bash
out-of-character ./path/to/dir --replace
```
---
detect invisible characters in a file
```bash
out-of-character ./path/to/file.txt
```
remove invisible characters from a file
```bash
out-of-character ./path/to/file.txt --replace
```

```js
import { detect, replace } from 'out-of-character'

let str = 'nothing s͏neak឵y here' // actually, there is.
console.log(detect(str))
/* 😮 😮 😮
[
  {
    name: 'KHMER VOWEL INHERENT AA',
    code: 'U+17B5',
    offset: 15,
    replacement: ''
  },
  {
    name: 'MONGOLIAN VOWEL SEPARATOR',
    code: 'U+180E',
    offset: 19,
    replacement: ''
  }
]*/
// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
```
fixing/detecting in files can be done like:
```js
const fs = require('fs')
const { detect, replace } = require('out-of-character')

// read the file in as a string
let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.
// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))
// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// phew.
```
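
For a large file, the raw `detect` output can get long. The sketch below shows one way to boil it down to a count per character. `summarize` is a hypothetical helper, not part of this library, and it assumes each entry has the `{ name, code, offset, replacement }` shape shown in the sample output above. The sample data is invented so the snippet runs on its own.

```js
// Hypothetical helper (not part of out-of-character): count detections per code point.
// Assumes each entry looks like { name, code, offset, replacement }.
function summarize(results) {
  const counts = new Map()
  for (const { code, name } of results) {
    const entry = counts.get(code) || { code, name, count: 0 }
    entry.count += 1
    counts.set(code, entry)
  }
  // most frequent characters first
  return [...counts.values()].sort((a, b) => b.count - a.count)
}

// Made-up sample data in the same shape as the detect() output.
const sample = [
  { name: 'KHMER VOWEL INHERENT AA', code: 'U+17B5', offset: 15, replacement: '' },
  { name: 'MONGOLIAN VOWEL SEPARATOR', code: 'U+180E', offset: 19, replacement: '' },
  { name: 'KHMER VOWEL INHERENT AA', code: 'U+17B5', offset: 42, replacement: '' }
]

for (const { code, name, count } of summarize(sample)) {
  console.log(`${code} ${name}: ${count}`)
}
// U+17B5 KHMER VOWEL INHERENT AA: 2
// U+180E MONGOLIAN VOWEL SEPARATOR: 1
```

In real use, the array passed in would presumably be `detect(text)` instead of `sample`.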
Thank you to character.construction/blanks by Jan Lelis
and a tale of characters in Unicode by Stefan Judis

MIT