@mtcute/html-parser

HTML entities parser for mtcute

> NOTE: The syntax implemented here is incompatible with Bot API _HTML_.
>
> Please read Syntax below for a detailed explanation

Features

- Supports all entities that Telegram supports
- Supports nested entities
- Proper newline/whitespace handling (just like in real HTML)
- Interpolation!

Usage

```ts
import { html } from '@mtcute/html-parser'

tg.sendText(
    'me',
    html`
        Hello, <b>me</b>! Updates from the feed:
        <br>
        ${await getUpdatesFromFeed()}
    `
)
```

Syntax

`@mtcute/html-parser` uses `htmlparser2` under the hood, so the parser accepts nearly any HTML. However, since the text is still processed in a custom way for Telegram, only a subset of features is supported. That subset is documented below.

Line breaks and spaces

Line breaks are not preserved; `<br>` is used instead, making the syntax very close to the one used when building web pages.

Multiple spaces and indents are collapsed (except in `<pre>`). When you do need multiple spaces, use `&nbsp;` instead.
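
To make these rules concrete, here is a minimal, self-contained sketch of HTML-style whitespace collapsing. It is an illustration only and not the library's implementation: `collapseWhitespace` is a hypothetical helper, and it assumes a string containing no tags other than `<br>` (in particular, it ignores the `<pre>` exception).

```typescript
// Hypothetical helper, for illustration only (not part of @mtcute/html-parser).
// Mimics HTML-style whitespace handling for a string without other tags:
// runs of whitespace collapse to one space, <br> becomes a newline,
// and &nbsp; becomes a non-breaking space that is never collapsed.
function collapseWhitespace(input: string): string {
    return input
        .replace(/[ \t\r\n]+/g, ' ')        // collapse spaces, tabs and source line breaks
        .replace(/ ?<br\s*\/?> ?/gi, '\n')  // explicit line breaks
        .replace(/&nbsp;/g, '\u00a0')       // explicit extra spaces
        .trim()
}

const collapsed = collapseWhitespace(`
    Hello,
    world!<br>
    a&nbsp;&nbsp;b
`)
console.log(JSON.stringify(collapsed))
```

In this sketch the source line break between `Hello,` and `world!` turns into a single space, while the `<br>` produces the only real line break and the two `&nbsp;` entities survive as two non-breaking spaces.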

Inline entities

Inline entities are entities that are in-line with other text. We support these entities:

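| Name             | Code                                                      | Result (visual) |
| ---------------- | --------------------------------------------------------- | --------------- |
| Bold             | `<b>text</b>`, `<strong>text</strong>`                    | **text**        |
| Italic           | `<i>text</i>`, `<em>text</em>`                            | _text_          |
| Underline        | `<u>text</u>`                                             | <u>text</u>     |
| Strikethrough    | `<s>text</s>`, `<del>text</del>`, `<strike>text</strike>` | ~~text~~        |
| Spoiler          | `<spoiler>text</spoiler>` (or `tg-spoiler`)               | N/A             |
| Monospace (code) | `<code>text</code>`                                       | `text`          |
| Text link        | `<a href="https://google.com">Google</a>`                 | Google          |
| Text mention     | `<a href="tg://user?id=1234567">Name</a>`                 | N/A             |
| Custom emoji     | `<emoji id="...">😄</emoji>` (or `<tg-emoji emoji-id="...">`) | N/A         |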

> Note: It is up to the client to look up the user's input entity by ID for text mentions.
> In most cases, you can only use IDs of users that were seen by the client while using the given storage.
>
> Alternatively, you can explicitly provide the access hash like this:
> `<a href="tg://user?id=1234567&hash=abc">Name</a>`, where `abc` is the user's access hash
> written as a hexadecimal integer. The order of the parameters matters, i.e.
> `tg://user?hash=abc&id=1234567` will not be processed as expected.
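
The parameter-order rule from the note can be illustrated with a small, self-contained sketch. This is not the library's code: `parseMentionUrl` and `MentionTarget` are hypothetical names, and rejecting the wrong order outright is an assumption made for the example (the real parser may handle such a link differently).

```typescript
// Hypothetical helper, for illustration only (not the library's actual code).
// Extracts the user ID and the optional access hash from a text-mention URL.
// The pattern is anchored, so `id` must come before `hash`.
interface MentionTarget {
    userId: number
    // Access hash, kept as the hexadecimal string found in the URL
    accessHashHex?: string
}

function parseMentionUrl(url: string): MentionTarget | null {
    const match = /^tg:\/\/user\?id=(\d+)(?:&hash=([0-9a-fA-F]+))?$/.exec(url)
    if (!match) return null

    const target: MentionTarget = { userId: Number(match[1]) }
    if (match[2] !== undefined) target.accessHashHex = match[2]
    return target
}

const byId = parseMentionUrl('tg://user?id=1234567')
const withHash = parseMentionUrl('tg://user?id=1234567&hash=abc')
// Assumption: a link with the parameters swapped is simply rejected here
const wrongOrder = parseMentionUrl('tg://user?hash=abc&id=1234567')
console.log(byId, withHash, wrongOrder)
```

In this sketch the first two calls return a target, while the swapped variant returns `null`.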

Block entities

The only block entities that Telegram supports are `<pre>` and `<blockquote>`, so these are the only block tags we support too.

Optionally, the language for a `<pre>` block can be specified like this:

```html
<pre language="typescript">export type Foo = 42</pre>
```

| Code                                                 | Result (visual)               |
| ---------------------------------------------------- | ----------------------------- |
| `<pre>multiline\ntext</pre>`                         | <pre>multiline<br>text</pre>  |
| `<pre language="javascript">export default 42</pre>` | <pre>export default 42</pre>  |

`<blockquote>` can be "expandable", in which case clients will only render the first three lines of the blockquote, and the rest will only be shown when the user clicks on the blockquote.

```html
<blockquote expandable>
  This is a blockquote that will be collapsed by default.<br />
  Lorem ipsum dolor sit amet, consectetur adipiscing elit.<br />
  Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.<br />
  This text is not shown until the blockquote is expanded.
</blockquote>
```

Nested and overlapped entities

HTML is a nested language, and so is this parser. It does support nested entities, but overlapped entities will not work as expected!

Overlapping entities are supported in `unparse()`, though.

| Code | Result (visual) |
| ---- | --------------- |
| `<b>Welcome back, <i>User</i>!</b>` | **Welcome back, _User_!** |
| `<b>bold <i>and</b> italic</i>` | **bold _and_** italic<br>⚠️ word "italic" is not actually italic! |
| `<b>bold <i>and</i></b><i> italic</i>`<br>⚠️ this is how `unparse()` handles overlapping entities | **bold _and_** _italic_ |

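
The last table row shows an overlapping entity being split so that the resulting tags nest properly. One way to do such a split can be sketched as follows. This is a simplified illustration and not the code used by `unparse()`: the `Entity` shape and the `splitOverlaps` helper are hypothetical.

```typescript
// Hypothetical types and helper, for illustration only.
interface Entity {
    type: string
    offset: number
    length: number
}

// Outer entities first: sort by start position, then longest first
const byPosition = (a: Entity, b: Entity): number =>
    a.offset - b.offset || b.length - a.length

// Splits entities that cross the end of an enclosing entity,
// so that every entity is either fully inside or fully outside another.
function splitOverlaps(entities: Entity[]): Entity[] {
    const queue = [...entities].sort(byPosition)
    const result: Entity[] = []
    const openEnds: number[] = [] // end positions of currently open entities

    while (queue.length > 0) {
        const entity = queue.shift()!

        // Close every entity that ends at or before this entity's start
        let parentEnd = openEnds[openEnds.length - 1] as number | undefined
        while (parentEnd !== undefined && parentEnd <= entity.offset) {
            openEnds.pop()
            parentEnd = openEnds[openEnds.length - 1] as number | undefined
        }

        const end = entity.offset + entity.length
        if (parentEnd !== undefined && end > parentEnd) {
            // Overlap: keep the part inside the parent, re-queue the remainder
            result.push({ ...entity, length: parentEnd - entity.offset })
            queue.push({ ...entity, offset: parentEnd, length: end - parentEnd })
            queue.sort(byPosition)
            openEnds.push(parentEnd)
        } else {
            result.push(entity)
            openEnds.push(end)
        }
    }

    return result
}

// "bold and italic": bold covers "bold and", italic covers "and italic"
const nested = splitOverlaps([
    { type: 'bold', offset: 0, length: 8 },
    { type: 'italic', offset: 5, length: 10 },
])
console.log(nested)
```

For this input the sketch keeps the bold entity as is and splits the italic one into two parts, covering "and" (inside the bold range) and " italic" (after it), which matches the shape of the markup in the table.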

Interpolation

Being a tagged template literal, `html` supports interpolation.

You can interpolate one of the following:

- `string` - will not be parsed, and appended to plain text as-is
  - In case you want the string to be parsed, use `html` as a simple function: `` html`... ${html('<b>bold</b>')} ...` ``
- `number` - will be converted to string and appended to plain text as-is
- `TextWithEntities` or `MessageEntity` - will add the text and its entities to the output. This is the type returned by `html` itself:
  ```ts
  const bold = html`<b>bold</b>`
  const text = html`Hello, ${bold}!`
  ```
- falsy value (i.e. `null`, `undefined`, `false`) - will be ignored

Note that because of interpolation, you almost never need to think about escaping anything,
since the values are not even parsed as HTML, and are appended to the output as-is.
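
Why interpolated strings need no escaping can be seen from how tagged templates work: the tag function receives the literal parts and the interpolated values separately. The sketch below is a deliberately tiny stand-in and not the real `html` tag: `miniHtml`, `BoldRange` and `Parsed` are hypothetical names, and only `<b>` is recognised.

```typescript
// Hypothetical miniature tag function, for illustration only.
// It "parses" only <b>...</b> in the literal parts of the template and
// appends interpolated values to the text without parsing them.
interface BoldRange {
    offset: number
    length: number
}

interface Parsed {
    text: string
    bold: BoldRange[]
}

type Interpolated = string | number | null | undefined | false

function miniHtml(strings: TemplateStringsArray, ...values: Interpolated[]): Parsed {
    let text = ''
    const bold: BoldRange[] = []
    let boldStart = -1

    strings.forEach((literal, index) => {
        // Only the literal parts are split into tags and text
        for (const token of literal.split(/(<\/?b>)/)) {
            if (token === '<b>') {
                boldStart = text.length
            } else if (token === '</b>') {
                if (boldStart >= 0) bold.push({ offset: boldStart, length: text.length - boldStart })
                boldStart = -1
            } else {
                text += token
            }
        }

        // Strings and numbers are appended as-is; null, undefined and false are skipped
        const value = index < values.length ? values[index] : undefined
        if (typeof value === 'string' || typeof value === 'number') text += String(value)
    })

    return { text, bold }
}

const userInput = '<b>not bold</b>'
const result = miniHtml`<b>Hi</b>, ${userInput}! You have ${3} messages.${null}`
console.log(result)
```

Here only the `<b>Hi</b>` written in the template itself produces a bold range. The tags inside `userInput` end up in the text verbatim, which is why user-provided strings can be interpolated without escaping.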
