Run 🤗 Transformers directly in your browser, with no need for a server!
Transformers.js is designed to be functionally equivalent to Hugging Face's transformers Python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:
- 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
- 🖼️ Computer Vision: image classification, object detection, segmentation, and depth estimation.
- 🗣️ Audio: automatic speech recognition, audio classification, and text-to-speech.
- 🐙 Multimodal: embeddings, zero-shot audio classification, zero-shot image classification, and zero-shot object detection.
Transformers.js uses ONNX Runtime to run models in the browser. The best part is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using 🤗 Optimum.
For more information, check out the full documentation.
## Installation
To install via NPM, run:
```bash
npm i @huggingface/transformers
```
Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using ES Modules, you can import the library with:
```html
<script type="module">
  <!-- For production use, pin a specific version, e.g. @huggingface/transformers@<version> -->
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
</script>
```
## Quick tour
It's super simple to translate from existing code! Just like the Python library, we support the pipeline API. Pipelines group together a pretrained model with preprocessing of inputs and postprocessing of outputs, making it the easiest way to run models with the library.
**Python (original):**

```python
from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
pipe = pipeline('sentiment-analysis')
out = pipe('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.999806941}]
```

**JavaScript (ours):**

```javascript
import { pipeline } from '@huggingface/transformers';

// Allocate a pipeline for sentiment-analysis
const pipe = await pipeline('sentiment-analysis');
const out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
```
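As in the Python library, pipelines also accept an array of inputs and return one result per input. A minimal sketch (the example strings are illustrative):

```javascript
// Pass an array to classify several texts in one call.
const results = await pipe([
  'I love transformers!',
  'I hate mondays.',
]);
// results is an array of { label, score } objects, one per input.
```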
You can also use a different model by specifying the model id or path as the second argument to the pipeline function. For example:
```javascript
// Use a different model for sentiment-analysis
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
```
By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like to run the model on your GPU (via WebGPU), you can do so by setting `device: 'webgpu'`, for example:
```javascript
// Run the model on WebGPU
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device: 'webgpu',
});
```
> [!WARNING]
> The WebGPU API is still experimental in many browsers, so if you run into any issues,
> please file a bug report.
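If you need to support browsers where WebGPU is unavailable, one option is to feature-detect it and fall back to WASM. A minimal sketch, reusing the model from the example above (`navigator.gpu` is the standard WebGPU entry point and is undefined in browsers without support):

```javascript
import { pipeline } from '@huggingface/transformers';

// Use WebGPU when the browser exposes it, otherwise fall back to WASM.
const device = navigator.gpu ? 'webgpu' : 'wasm';

const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device,
});
```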
In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of
the model to lower bandwidth and optimize performance. This can be achieved by adjusting the `dtype` option,
which allows you to select the appropriate data type for your model. While the available options may vary
depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"` (default for WASM), and `"q4"`. For more information, check out the quantization guide.
```javascript
// Run the model at 4-bit quantization
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  dtype: 'q4',
});
```
## Examples
Want to jump straight in? Get started with one of our sample applications/templates, which can be found here.
| Name | Description | Links |
|-------------------|----------------------------------|-------------------------------|
| Whisper Web | Speech recognition w/ Whisper | code, demo |
| Doodle Dash | Real-time sketch-recognition game | blog, code, demo |
| Code Playground | In-browser code completion website | code, demo |
| Semantic Image Search (client-side) | Search for images with text | code, demo |
| Semantic Image Search (server-side) | Search for images with text (Supabase) | code, demo |
| Vanilla JavaScript | In-browser object detection | video, code, demo |
| React | Multilingual translation website | code, demo |
| Text to speech (client-side) | In-browser speech synthesis | code, demo |
| Browser extension | Text classification extension | code |
| Electron | Text classification application | code |
| Next.js (client-side) | Sentiment analysis (in-browser inference) | code, demo |
| Next.js (server-side) | Sentiment analysis (Node.js inference) | code, demo |
| Node.js | Sentiment analysis API | code |
| Demo site | A collection of demos | code, demo |
Check out the Transformers.js template on Hugging Face to get started in one click!
By default, Transformers.js uses hosted pretrained models and precompiled WASM binaries, which should work out-of-the-box. You can customize this behaviour via the `env` object:

```javascript
import { env } from '@huggingface/transformers';

// Specify a custom location for models (defaults to '/models/').
env.localModelPath = '/path/to/models/';

// Disable the loading of remote models from the Hugging Face Hub:
env.allowRemoteModels = false;

// Set location of .wasm files. Defaults to use a CDN.
env.backends.onnx.wasm.wasmPaths = '/path/to/files/';
```
For a full list of available settings, check out the API Reference.
## Convert your models to ONNX
We recommend using our conversion script to convert your PyTorch, TensorFlow, or JAX models to ONNX in a single command. Behind the scenes, it uses đ€ Optimum to perform conversion and quantization of your model.
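For example, to convert and quantize a model in one step (run from the root of the Transformers.js repository; the exact script name and flags may vary between versions, so treat this as a sketch):

```bash
python -m scripts.convert --quantize --model_id bert-base-uncased
```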
## Supported tasks/models

Here is the list of all tasks and architectures currently supported by Transformers.js.
If you don't see your task/model listed here or it is not yet supported, feel free
to open up a feature request here.
To find compatible models on the Hub, select the "transformers.js" library tag in the filter menu (or visit this link).
You can refine your search by selecting the task you're interested in (e.g., text-classification).
### Tasks
#### Natural Language Processing
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Fill-Mask | `fill-mask` | Masking some of the words in a sentence and predicting which words should replace those masks. | ✅ (docs) (models) |
| Question Answering | `question-answering` | Retrieving the answer to a question from a given text. | ✅ (docs) (models) |
| Sentence Similarity | `sentence-similarity` | Determining how similar two texts are. | ✅ (docs) (models) |
| Summarization | `summarization` | Producing a shorter version of a document while preserving its important information. | ✅ (docs) (models) |
| Table Question Answering | `table-question-answering` | Answering a question about information from a given table. | ❌ |
| Text Classification | `text-classification` or `sentiment-analysis` | Assigning a label or class to a given text. | ✅ (docs) (models) |
| Text Generation | `text-generation` | Producing new text by predicting the next word in a sequence. | ✅ (docs) (models) |
| Text-to-text Generation | `text2text-generation` | Converting one text sequence into another text sequence. | ✅ (docs) (models) |
| Token Classification | `token-classification` or `ner` | Assigning a label to each token in a text. | ✅ (docs) (models) |
| Translation | `translation` | Converting text from one language to another. | ✅ (docs) (models) |
| Zero-Shot Classification | `zero-shot-classification` | Classifying text into classes that are unseen during training. | ✅ (docs) (models) |
| Feature Extraction | `feature-extraction` | Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. | ✅ (docs) (models) |
#### Vision
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Background Removal | `background-removal` | Isolating the main subject of an image by removing or making the background transparent. | ✅ (docs) (models) |
| Depth Estimation | `depth-estimation` | Predicting the depth of objects present in an image. | ✅ (docs) (models) |
| Image Classification | `image-classification` | Assigning a label or class to an entire image. | ✅ (docs) (models) |
| Image Segmentation | `image-segmentation` | Dividing an image into segments where each pixel is mapped to an object. This task has multiple variants, such as instance segmentation, panoptic segmentation, and semantic segmentation. | ✅ (docs) (models) |
| Image-to-Image | `image-to-image` | Transforming a source image to match the characteristics of a target image or a target image domain. | ✅ (docs) (models) |
| Mask Generation | `mask-generation` | Generating masks for the objects in an image. | ❌ |
| Object Detection | `object-detection` | Identifying objects of certain defined classes within an image. | ✅ (docs) (models) |
| Video Classification | n/a | Assigning a label or class to an entire video. | ❌ |
| Unconditional Image Generation | n/a | Generating images with no condition in any context (such as a text prompt or another image). | ❌ |
| Image Feature Extraction | `image-feature-extraction` | Transforming raw data into numerical features that can be processed while preserving the information in the original image. | ✅ (docs) (models) |
#### Audio
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Audio Classification | `audio-classification` | Assigning a label or class to a given audio clip. | ✅ (docs) (models) |
| Audio-to-Audio | n/a | Generating audio from an input audio source. | ❌ |
| Automatic Speech Recognition | `automatic-speech-recognition` | Transcribing a given audio clip into text. | ✅ (docs) (models) |
| Text-to-Speech | `text-to-speech` or `text-to-audio` | Generating natural-sounding speech given text input. | ✅ (docs) (models) |
#### Tabular
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Tabular Classification | n/a | Classifying a target category (a group) based on a set of attributes. | ❌ |
| Tabular Regression | n/a | Predicting a numerical value given a set of attributes. | ❌ |
#### Multimodal
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Document Question Answering | `document-question-answering` | Answering questions on document images. | ✅ (docs) (models) |
| Image-to-Text | `image-to-text` | Generating text from a given image. | ✅ (docs) (models) |
| Text-to-Image | `text-to-image` | Generating images from input text. | ❌ |
| Visual Question Answering | `visual-question-answering` | Answering open-ended questions based on an image. | ❌ |
| Zero-Shot Audio Classification | `zero-shot-audio-classification` | Classifying audio clips into classes that are unseen during training. | ✅ (docs) (models) |
| Zero-Shot Image Classification | `zero-shot-image-classification` | Classifying images into classes that are unseen during training. | ✅ (docs) (models) |
| Zero-Shot Object Detection | `zero-shot-object-detection` | Identifying objects of classes that are unseen during training. | ✅ (docs) (models) |
#### Reinforcement Learning
| Task | ID | Description | Supported? |
|--------------------------|----|-------------|------------|
| Reinforcement Learning | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ❌ |