A minimalistic library for sanitizing strings so that they can be safely used as HTML.
npm
`bash
npm install purify-html
`
yarn
`bash
yarn add purify-html
`
CDN
`html
`
---
API
`javascript
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML(options);
`
### setParser
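`typescript
declare function setParser(parser: {
  // turns an HTML string into a root element
  parse(str: string): Element;
  // turns the root element back into an HTML string
  stringify(elem: Element): string;
}): void;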
`
See the setParser section below for details.
### sanitizer
#### sanitize(string): string
Performs string cleanup according to the rules passed in options.
`javascript
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML(options);
const untrustedString = '...';
console.log(sanitizer.sanitize(untrustedString));
`
#### toHTMLEntities(string): string
Converts a string to HTML entities. This is very similar to escaping, but it targets the HTML parser.
In HTML, the converted characters are rendered "as is" instead of being interpreted as markup.
See more here.
`javascript
const str = '<br />';
console.log(
  sanitizer.toHTMLEntities(str) // => entity-encoded string; displays on the page as the literal text '<br />'
);
`
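To illustrate the idea behind entity encoding, here is a minimal sketch. The helper `encodeHtmlEntities` is hypothetical and is not part of purify-html; the exact output format of `toHTMLEntities` may differ (for example, it may encode more characters or use named entities).

```javascript
// Hypothetical helper, NOT the library's implementation.
// Replaces the characters that are significant to the HTML parser
// with decimal numeric character references.
function encodeHtmlEntities(str) {
  return str.replace(/[&<>"']/g, ch => '&#' + ch.charCodeAt(0) + ';');
}

console.log(encodeHtmlEntities('<br />')); // '&#60;br /&#62;'
```

Because the angle brackets are encoded, a browser shows the text `<br />` instead of inserting a line break.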
---
### options
An array with rules for the sanitizer.
`typescript
// Deprecated!
type AttributeRulePresetName =
| '%correct-link%'
| '%http-link%'
| '%https-link%'
| '%ftp-link%'
| '%https-link-without-search-params%'
| '%http-link-without-search-params%'
| '%same-origin%';
interface AttributeRule {
// attribute name
name: string;
// rules for attribute value
value?:
| string
| string[]
| RegExp
| { preset: AttributeRulePresetName } // Deprecated!
| ((attributeValue: string) => boolean);
};
interface TagRule {
// tagname
name: string;
// rules for attributes: plain attribute names or AttributeRule objects
attributes?: (string | AttributeRule)[];
// Don't remove comments in THIS node.
// Comments in children nodes will be saved
dontRemoveComments?: boolean;
/*
Example:
Config
[
{ name: 'div', dontRemoveComments: true },
'span'
]
Input:
Output:
*/
};
`
`javascript
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML([
'hr',
{ name: 'br' },
{ name: 'img', attributes: [{ name: 'src' }] },
{
name: 'a',
attributes: [
{ name: 'target', value: ['_blank', '_self', '_parent', '_top'] },
{ name: 'href', value: /^https?:\/\// },
],
},
]);
`
NOTE When you use regular expressions to check untrusted strings, make sure the expressions themselves are not vulnerable to ReDoS.
_A successful ReDoS exploit makes the program hang while it tries to match a specially crafted string._
See more: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
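One way to reduce ReDoS exposure is to pass a function as the attribute value rule, cap the input length, and keep the pattern free of nested quantifiers. The sketch below is only an illustration: the names `MAX_ID_LENGTH`, `ID_PATTERN` and `isAllowedId` and the limit of 64 characters are assumptions, not part of purify-html.

```javascript
// Hypothetical limit; choose a value that fits your data.
const MAX_ID_LENGTH = 64;

// A single character class with one quantifier: no nested or
// overlapping repetition, so matching time stays linear.
const ID_PATTERN = /^awesome-[a-z0-9-]+$/;

// Function-style value check, as accepted by the `value` field of an attribute rule.
function isAllowedId(attributeValue) {
  return attributeValue.length <= MAX_ID_LENGTH && ID_PATTERN.test(attributeValue);
}

// Rule object in the shape described above.
const emRule = { name: 'em', attributes: [{ name: 'id', value: isAllowedId }] };
```

The length check runs first, so an oversized string is rejected before the regular expression is evaluated at all.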
Examples
via bundler:
`javascript
import PurifyHTML from 'purify-html';
const allowedTags = [
// only string
'hr',
// as object
{ name: 'br' },
// attributes check
{ name: 'b', attributes: ['class'] },
// advanced attributes check
{ name: 'p', attributes: [{ name: 'class' }] },
// check attributes values (string)
{ name: 'strong', attributes: [{ name: 'id', value: 'awesome-strong-id' }] },
// check attributes values (RegExp)
{ name: 'em', attributes: [{ name: 'id', value: /awesome-em-ids?/ }] },
// check attributes values (array of strings)
{
name: 'em',
attributes: [
{ name: 'id', value: ['awesome-strong-id', 'other-awesome-strong-id'] },
],
},
// check attribute value (function)
{
name: 'em',
attributes: [{ name: 'id', value: value => value.startsWith('awesome-') }],
},
// use attributes checks preset (Deprecated)
{
name: 'a',
attributes: [{ name: 'href', value: { preset: '%https-link%' } }], // presets are deprecated
},
];
const sanitizer = new PurifyHTML(allowedTags);
const dangerString =
123
;
const safeString = sanitizer.sanitize(dangerString);
console.log(safeString);
/*
<img src="1" onerror="alert(1)">
Bold
Bold
Bold
123
*/
`
Try it.
---
`html
`
Usage in the browser differs slightly from usage with bundlers. This is not ideal, but it was necessary to avoid polluting the global scope.
---
HTML comments
For example the line: .
Technically, inserting it into the DOM will not lead to code execution, but it cannot be considered safe either. The result of the sanitize method is supposed to conform to the rules specified when the sanitizer was initialized, and markup left inside a comment would break that guarantee.
> NOTE
> CDATA comments (or CDATA sections), although made for XML, are also supported in HTML. In purify-html they are treated as regular HTML comments.
See more about CDATA Section on MDN
Therefore, you can control where comments are kept and where they are removed.
By default, HTML comments are stripped. You can change it like this:
`javascript
import PurifyHTML from 'purify-html';
const sanitizer = new PurifyHTML(['#comments' /* ... */]);
sanitizer.sanitize(/* ... */);
`
If you want comments to be removed everywhere except for specific tags, then you can specify it like this:
`javascript
import PurifyHTML from 'purify-html';
const rules = ['#comments', { name: 'div', dontRemoveComments: true }];
const sanitizer = new PurifyHTML(rules);
sanitizer.sanitize(/* ... */);
`
node-js
When the library is used in an environment without the standard DOMParser, such as Node.js, you need to provide a parser manually.
For example:
`js
import { JSDOM } from 'jsdom';
global.DOMParser = new JSDOM().window.DOMParser;
import PurifyHTML from 'purify-html';
const sanitizer = new PurifyHTML(); // works
`
Or
`js
import { JSDOM } from 'jsdom';
import PurifyHTML, { setParser } from 'purify-html';
// Node.js has no global DOMParser, so take it from jsdom
const { DOMParser } = new JSDOM().window;
// Scope elem variable, reuse one parsed element for performance
{
  const elem = new DOMParser()
    .parseFromString('', 'text/html')
    .querySelector('body');
  // Set methods
  setParser({
    parse(string) {
      elem.innerHTML = string;
      return elem;
    },
    stringify(elem) {
      return elem.innerHTML;
    },
  });
}
`
setParser
In some cases, you may want to use your own parser instead of DOMParser.
This can be done like this:
`javascript
import PurifyHTML, { setParser } from 'purify-html';
setParser({
parse(HTMLString: string): HTMLElement {
// ...
},
stringify(element: HTMLElement): string {
// ...
},
});
const sanitizer = new PurifyHTML();
// ...
`
Note! The stringify function receives the root element and must return the CONTENT of that element, not the element itself.
`javascript
const input = document.createElement('div');
input.innerHTML = '<span>span</span>';
stringify(input); // '<span>span</span>' => OK
stringify(input); // '<div><span>span</span></div>' => WRONG
`
---
Why not document.createElement(...)?
Because processing the string like this is unsafe:
`javascript
const parse = str => {
  const node = document.createElement('div');
  // Assigning innerHTML on an element of the live document can trigger the payload
  node.innerHTML = str;
  return node;
};
`
Given a specially crafted payload, this function will actually RUN it. The following payload will send a network request:
With DOMParser, the code does not run.
---
Attributes checks preset list
- %correct-link% - only a correct link
- %http-link% - only an http link
- %https-link% - only an https link
- %ftp-link% - only an ftp link
- %https-link-without-search-params% - deletes all search params and forces the https protocol
- %http-link-without-search-params% - deletes all search params and forces the http protocol
- %same-https-origin% - only links that lead to the same origin as the current self.location.origin; forces the https protocol
- %same-http-origin% - only links that lead to the same origin as the current self.location.origin; forces the http protocol
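Since presets are marked as deprecated, a function-style value check can cover similar ground. The sketch below is an approximation under stated assumptions: `parseUrl`, `isHttpsLink` and `makeSameOriginCheck` are hypothetical names, the checks only accept or reject a value (they do not rewrite it as some presets do), and the behavior is not guaranteed to match the presets exactly.

```javascript
// Hypothetical helpers, not part of purify-html.
// Returns a URL object, or null when the value cannot be parsed.
function parseUrl(value, base) {
  try {
    return new URL(value, base);
  } catch {
    return null;
  }
}

// Rough counterpart of '%https-link%': accept only absolute https URLs.
const isHttpsLink = value => parseUrl(value)?.protocol === 'https:';

// Rough counterpart of the same-origin presets. The origin is passed in
// explicitly so the check also works outside the browser.
const makeSameOriginCheck = origin => value =>
  parseUrl(value, origin)?.origin === origin;

// Rule object using a function as the attribute value check.
const linkRule = {
  name: 'a',
  attributes: [{ name: 'href', value: isHttpsLink }],
};
```

In a browser, `makeSameOriginCheck(self.location.origin)` would produce a check bound to the current page's origin.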