Library to work against complex domain names, subdomains and URIs.
`tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.
Features:
1. Tuned for performance (order of 0.1 to 1 μs per input)
2. Handles both URLs and hostnames
3. Full Unicode/IDNA support
4. Supports parsing email addresses
5. Detects IPv4 and IPv6 addresses
6. Continuously updated version of the public suffix list
7. TypeScript, ships with umd, esm, cjs bundles and _type definitions_
8. Small bundles and small memory footprint
9. Battle tested: full test coverage and production use
```bash
npm install --save tldts
```

Using the command-line interface:

```js
$ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
{
  "domain": "writethedocs.org",
  "domainWithoutSuffix": "writethedocs",
  "hostname": "www.writethedocs.org",
  "isIcann": true,
  "isIp": false,
  "isPrivate": false,
  "publicSuffix": "org",
  "subdomain": "www"
}
```

Programmatically:

```js
const { parse } = require('tldts');

// Retrieving hostname-related information for a given URL
parse('http://www.writethedocs.org/conf/eu/2017/');
// { domain: 'writethedocs.org',
//   domainWithoutSuffix: 'writethedocs',
//   hostname: 'www.writethedocs.org',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'org',
//   subdomain: 'www' }
```

Modern _ES6 modules import_ is also supported:

```js
import { parse } from 'tldts';
```

Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts

- `tldts.parse(url | hostname, options)`
- `tldts.getHostname(url | hostname, options)`
- `tldts.getDomain(url | hostname, options)`
- `tldts.getPublicSuffix(url | hostname, options)`
- `tldts.getSubdomain(url | hostname, options)`
- `tldts.getDomainWithoutSuffix(url | hostname, options)`

The behavior of `tldts` can be customized using an `options` argument for all
the functions exposed as part of the public API. This is useful both to change
the behavior of the library and to fine-tune the performance depending on
your inputs.

```js
{
  // Use suffixes from ICANN section (default: true)
  allowIcannDomains: boolean;
  // Use suffixes from Private section (default: false)
  allowPrivateDomains: boolean;
  // Extract and validate hostname (default: true)
  // When set to `false`, inputs will be considered valid hostnames.
  extractHostname: boolean;
  // Validate hostnames after parsing (default: true)
  // If a hostname is not valid, no further processing is performed. When set
  // to `false`, inputs to the library will be considered valid and parsing will
  // proceed regardless.
  validateHostname: boolean;
  // Perform IP address detection (default: true).
  detectIp: boolean;
  // Assume that both URLs and hostnames can be given as input (default: true)
  // If set to `false` we assume only URLs will be given as input, which
  // speeds up processing.
  mixedInputs: boolean;
  // Specifies extra valid suffixes (default: null)
  validHosts: string[] | null;
}
```

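
Since every function in the public API accepts the same `options` argument, one way to avoid repeating it at each call site is to bind it once. The sketch below is only an illustration: `withOptions` and `HOSTNAME_ONLY` are hypothetical names that are not part of `tldts`, and the function to wrap is passed in as a parameter so the snippet runs on its own.

```javascript
// Hypothetical helper (not part of tldts): bind a fixed options object to any
// function that follows the (input, options) signature of the tldts API.
function withOptions(fn, options) {
  // Copy the options so later mutations by the caller do not leak in.
  const bound = { ...options };
  return (input) => fn(input, bound);
}

// Hypothetical preset for inputs already known to be valid hostnames:
// skip both hostname extraction and hostname validation.
const HOSTNAME_ONLY = { extractHostname: false, validateHostname: false };

// Usage, assuming tldts is installed:
//   const { getDomain } = require('tldts');
//   const getDomainFromHostname = withOptions(getDomain, HOSTNAME_ONLY);
//   getDomainFromHostname('www.example.co.uk');
```

Such a preset is only appropriate when the inputs really are clean hostnames; with full URLs, the default options should be kept.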
The `parse` method returns handy properties about a URL or a hostname.

```js
const tldts = require('tldts');

tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// { domain: 'amazonaws.com',
//   domainWithoutSuffix: 'amazonaws',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'com',
//   subdomain: 'spark-public.s3' }

tldts.parse(
  'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
  { allowPrivateDomains: true },
);
// { domain: 'spark-public.s3.amazonaws.com',
//   domainWithoutSuffix: 'spark-public',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: false,
//   isIp: false,
//   isPrivate: true,
//   publicSuffix: 's3.amazonaws.com',
//   subdomain: '' }

tldts.parse('gopher://domain.unknown/');
// { domain: 'domain.unknown',
//   domainWithoutSuffix: 'domain',
//   hostname: 'domain.unknown',
//   isIcann: false,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'unknown',
//   subdomain: '' }

tldts.parse('https://192.168.0.0'); // IPv4
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '192.168.0.0',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('https://[::1]'); // IPv6
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '::1',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('tldts@emailprovider.co.uk'); // email
// { domain: 'emailprovider.co.uk',
//   domainWithoutSuffix: 'emailprovider',
//   hostname: 'emailprovider.co.uk',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'co.uk',
//   subdomain: '' }
```

| Property Name         | Type   | Description                                     |
| :-------------------- | :----- | :---------------------------------------------- |
| `hostname`            | `str`  | `hostname` of the input extracted automatically |
| `domain`              | `str`  | Domain (tld + sld)                              |
| `domainWithoutSuffix` | `str`  | Domain without public suffix                    |
| `subdomain`           | `str`  | Sub domain (what comes before the domain)       |
| `publicSuffix`        | `str`  | Public Suffix (tld) of `hostname`               |
| `isIcann`             | `bool` | Does TLD come from ICANN part of the list       |
| `isPrivate`           | `bool` | Does TLD come from Private part of the list     |
| `isIp`                | `bool` | Is `hostname` an IP address?                    |

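
To make the relationship between these properties concrete, here is a minimal sketch of how a hostname can be split once its public suffix is known. It is not how `tldts` is implemented: `splitHostname` and the tiny `SUFFIXES` set are hypothetical, and the real public suffix list also contains wildcard and exception rules that this sketch ignores.

```javascript
// Tiny, hypothetical suffix list for illustration only; the real public
// suffix list has thousands of entries plus wildcard and exception rules.
const SUFFIXES = new Set(['com', 'org', 'uk', 'co.uk']);

function splitHostname(hostname) {
  const labels = hostname.toLowerCase().split('.');

  // Find the longest known suffix by trying the longest candidate first.
  let publicSuffix = null;
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join('.');
    if (SUFFIXES.has(candidate)) {
      publicSuffix = candidate;
      break;
    }
  }
  // Unknown suffix: fall back to the last label.
  if (publicSuffix === null) publicSuffix = labels[labels.length - 1];

  const suffixLabels = publicSuffix.split('.').length;
  // The hostname is itself a public suffix, so there is no domain.
  if (labels.length <= suffixLabels) {
    return {
      hostname,
      publicSuffix,
      domain: null,
      domainWithoutSuffix: null,
      subdomain: null,
    };
  }

  // The domain is the suffix plus exactly one more label to its left.
  const domainStart = labels.length - suffixLabels - 1;
  const domainWithoutSuffix = labels[domainStart];
  const domain = `${domainWithoutSuffix}.${publicSuffix}`;
  // Everything left of the domain is the subdomain ('' when there is none).
  const subdomain = labels.slice(0, domainStart).join('.');
  return { hostname, publicSuffix, domain, domainWithoutSuffix, subdomain };
}

console.log(splitHostname('foo.google.co.uk').domain); // prints google.co.uk
```

The sketch picks the longest matching suffix, which is why `foo.google.co.uk` resolves to the suffix `co.uk` rather than `uk`.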
These methods are shorthands if you want to retrieve only a single value (and
will perform better than `parse` because less work will be needed).
`getHostname` returns the hostname from a given string.

```javascript
const { getHostname } = require('tldts');

getHostname('google.com'); // returns 'google.com'
getHostname('fr.google.com'); // returns 'fr.google.com'
getHostname('fr.google.google'); // returns 'fr.google.google'
getHostname('foo.google.co.uk'); // returns 'foo.google.co.uk'
getHostname('t.co'); // returns 't.co'
getHostname('fr.t.co'); // returns 'fr.t.co'
getHostname(
  'https://user:password@example.co.uk:8080/some/path?and&query#hash',
); // returns 'example.co.uk'
```

`getDomain` returns the fully qualified domain from a given string.

```javascript
const { getDomain } = require('tldts');

getDomain('google.com'); // returns 'google.com'
getDomain('fr.google.com'); // returns 'google.com'
getDomain('fr.google.google'); // returns 'google.google'
getDomain('foo.google.co.uk'); // returns 'google.co.uk'
getDomain('t.co'); // returns 't.co'
getDomain('fr.t.co'); // returns 't.co'
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns 'example.co.uk'
```

`getDomainWithoutSuffix` returns the domain (as returned by `getDomain(...)`) without the public suffix part.

```javascript
const { getDomainWithoutSuffix } = require('tldts');

getDomainWithoutSuffix('google.com'); // returns 'google'
getDomainWithoutSuffix('fr.google.com'); // returns 'google'
getDomainWithoutSuffix('fr.google.google'); // returns 'google'
getDomainWithoutSuffix('foo.google.co.uk'); // returns 'google'
getDomainWithoutSuffix('t.co'); // returns 't'
getDomainWithoutSuffix('fr.t.co'); // returns 't'
getDomainWithoutSuffix(
  'https://user:password@example.co.uk:8080/some/path?and&query#hash',
); // returns 'example'
```

`getSubdomain` returns the complete subdomain for a given string.

```javascript
const { getSubdomain } = require('tldts');

getSubdomain('google.com'); // returns ''
getSubdomain('fr.google.com'); // returns 'fr'
getSubdomain('google.co.uk'); // returns ''
getSubdomain('foo.google.co.uk'); // returns 'foo'
getSubdomain('moar.foo.google.co.uk'); // returns 'moar.foo'
getSubdomain('t.co'); // returns ''
getSubdomain('fr.t.co'); // returns 'fr'
getSubdomain(
  'https://user:password@secure.example.co.uk:443/some/path?and&query#hash',
); // returns 'secure'
```

`getPublicSuffix` returns the [public suffix][] for a given string.

```javascript
const { getPublicSuffix } = require('tldts');

getPublicSuffix('google.com'); // returns 'com'
getPublicSuffix('fr.google.com'); // returns 'com'
getPublicSuffix('google.co.uk'); // returns 'co.uk'
getPublicSuffix('s3.amazonaws.com'); // returns 'com'
getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns 's3.amazonaws.com'
getPublicSuffix('tld.is.unknown'); // returns 'unknown'
```

`tldts` methods `getDomain` and `getSubdomain` are designed to work only with _known and valid_ TLDs.
This way, you can trust what a domain is.
`localhost` is a valid hostname but not a TLD. To handle it, or other custom hostnames, you can pass additional options to each method exposed by `tldts`:

```js
const tldts = require('tldts');

tldts.getDomain('localhost'); // returns null
tldts.getSubdomain('vhost.localhost'); // returns null

tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
```

tldts made the opinionated choice of shipping with a list of suffixes directly
in its bundle. There is currently no mechanism to update the lists yourself, but
we make sure that the version shipped is always up-to-date.
If you keep tldts updated, the lists should be up-to-date as well!
`tldts` is the _fastest JavaScript library_ available for parsing hostnames. It is able to parse _millions of inputs per second_ (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g., if you know you only manipulate valid hostnames, you can disable the hostname extraction step with `{ extractHostname: false }`).
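
Throughput depends heavily on hardware and inputs, so it can be worth measuring on your own data. The helper below is a rough sketch rather than the benchmark used by the project: `measureThroughput` is a hypothetical name, and the function under test is passed in as a parameter so the snippet has no dependency on `tldts`.

```javascript
// Hypothetical micro-benchmark helper (not part of tldts). `fn` is the
// function to measure, e.g. tldts.parse or tldts.getDomain.
function measureThroughput(fn, inputs, rounds = 10) {
  const start = process.hrtime.bigint();
  let sink = 0; // consume results so the calls have an observable effect
  for (let r = 0; r < rounds; r += 1) {
    for (const input of inputs) {
      if (fn(input) !== null) sink += 1;
    }
  }
  // Clamp to 1ns so a very fast run never divides by zero.
  const elapsedNs = Math.max(Number(process.hrtime.bigint() - start), 1);
  const calls = rounds * inputs.length;
  return { calls, sink, callsPerSecond: calls / (elapsedNs / 1e9) };
}

// Usage, assuming tldts is installed:
//   const { getDomain } = require('tldts');
//   const hostnames = ['www.example.co.uk', 'fr.google.com'];
//   measureThroughput(getDomain, hostnames, 100000);
//   measureThroughput((h) => getDomain(h, { extractHostname: false }), hostnames, 100000);
```

A single short run like this is noisy. Repeating the measurement and discarding the first runs (while the JIT warms up) gives more stable numbers.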
Please see this detailed comparison with other available libraries.
`tldts` is based upon the excellent `tld.js` library and would not exist without
the many contributors who worked on the project.
This project would not be possible without Mozilla's amazing
[public suffix list][]. Thank you for your hard work!
[badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
[badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
[public suffix list]: https://publicsuffix.org/list/
[list the recent changes]: https://github.com/publicsuffix/list/commits/master
[changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
[public suffix]: https://publicsuffix.org/learn/