Plug and play ONNX SAM segmentation in the browser
miniSAM is a JavaScript library for performing image segmentation directly in the browser using ONNX models. It uses a lightweight, distilled version of Meta AI's original Segment Anything Model (SAM), specifically optimized for efficient in-browser execution.
The training code used for the model distillation process will be made available soon. miniSAM provides a stateful API for interactive segmentation by allowing users to add include/exclude clicks on an image.
- In-Browser Segmentation: Runs entirely on the client-side, no server needed.
- ONNX Runtime: onnxruntime-web for efficient model inference.
- Stateful Segmentation Sessions: Manage clicks and masks per image using a SegmentationSession.
- Click-Based Interaction: Supports positive (include) and negative (exclude) clicks.
- Embedding Cache: Caches image embeddings to speed up subsequent operations on the same image.
- Customizable Model Paths: Allows specifying paths to your SAM encoder and decoder ONNX models.
Install miniSAM and its peer dependency onnxruntime-web using npm or yarn:
```bash
npm install minisam onnxruntime-web
```
or
```bash
yarn add minisam onnxruntime-web
```
Here's a basic example of how to use miniSAM:
```typescript
import {
  initSegmentation,
  createSession,
  precomputeEmbedding,
  ClickType,
} from "minisam";

async function runSegmentation(myImageElement, clickPoints) {
  try {
    // Initialize miniSAM (loads models)
    await initSegmentation({
      // To use default CDN models (recommended):
      // encoderModelPath: DEFAULT_ENCODER_MODEL_PATH,
      // samModelPath: DEFAULT_SAM_MODEL_PATH,
      // For custom local/CDN models:
      // encoderModelPath: '/path/to/your/encoder.onnx',
      // samModelPath: '/path/to/your/sam_decoder.onnx'
    });
    console.log("miniSAM initialized!");

    // Precompute image embedding for faster interaction
    await precomputeEmbedding(myImageElement);
    console.log("Embedding precomputed for the image.");

    // Create a segmentation session for the image
    const session = createSession(myImageElement);
    console.log("Segmentation session created.");

    // Add clicks
    clickPoints.forEach((p) => {
      session.addClick(p.x, p.y, p.type);
    });
    console.log(`Added ${session.getClickCount()} clicks.`);

    // Perform segmentation
    const imageDataMask = await session.segment(myImageElement);

    if (imageDataMask) {
      console.log("Segmentation successful! Mask generated:", imageDataMask);

      // Example: Draw mask on a canvas (myDisplayCanvas is an existing <canvas> element)
      const ctx = myDisplayCanvas.getContext('2d');
      myDisplayCanvas.width = imageDataMask.width;
      myDisplayCanvas.height = imageDataMask.height;
      ctx.putImageData(imageDataMask, 0, 0);

      // Example: Overlay on an existing canvas (mainDisplayCtx, originalImageWidth and
      // originalImageHeight come from your own rendering code)
      const tempMaskCanvas = document.createElement('canvas');
      tempMaskCanvas.width = imageDataMask.width;
      tempMaskCanvas.height = imageDataMask.height;
      tempMaskCanvas.getContext('2d').putImageData(imageDataMask, 0, 0);
      mainDisplayCtx.globalAlpha = 0.5;
      mainDisplayCtx.drawImage(tempMaskCanvas, 0, 0, originalImageWidth, originalImageHeight);
    } else {
      console.log("No mask generated.");
    }

    // Optional: Session cleanup
    session.dispose();
  } catch (error) {
    console.error("Error during segmentation:", error);
  }
}

// Example usage:
const imageEl = document.getElementById('my-image');
const exampleClicks = [
  { x: 100, y: 150, type: "include" as ClickType },
  { x: 200, y: 250, type: "exclude" as ClickType }
];
runSegmentation(imageEl, exampleClicks);
```
initSegmentation(opts?) initializes the segmentation engine by loading the ONNX models. It must be called before any other miniSAM functions. See the "Model Loading" section for details on how models are loaded and how to customize paths.
- opts (optional): InitializationOptions object.
  - encoderModelPath?: string: URL or path to the encoder ONNX model. If not provided, defaults to the CDN path DEFAULT_ENCODER_MODEL_PATH.
  - samModelPath?: string: URL or path to the SAM (decoder) ONNX model. If not provided, defaults to the CDN path DEFAULT_SAM_MODEL_PATH.
  - sessionOptions?: InferenceSession.SessionOptions: Advanced ONNX Runtime session options.
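For instance, a minimal sketch of initializing with self-hosted models and explicit runtime options; the model URLs are placeholders for wherever you host the .onnx files, and the executionProviders value is onnxruntime-web's standard session option (choose the provider that fits your deployment):

```typescript
import { initSegmentation } from "minisam";

// Sketch: point miniSAM at self-hosted models and pass explicit ONNX Runtime options.
// The .onnx paths below are placeholders, not paths shipped with the library.
await initSegmentation({
  encoderModelPath: "/models/minisam_encoder.onnx",
  samModelPath: "/models/minisam_decoder.onnx",
  sessionOptions: {
    executionProviders: ["wasm"], // standard onnxruntime-web option
  },
});
```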
precomputeEmbedding(image) precomputes and caches the embedding for a given image. This can significantly speed up the first call to session.segment() or the legacy segment() function for that image.
- image: The HTMLImageElement or HTMLCanvasElement to process.
- Returns: A Promise that resolves with a unique imageKey (string) for the processed image, which is used internally for caching.
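As an example, you could warm the cache for a set of images before the user starts clicking; the selector below is just an assumption about your markup:

```typescript
import { precomputeEmbedding } from "minisam";

// Warm the embedding cache for every image in a (hypothetical) gallery,
// so the first segmentation on each image responds quickly.
const images = document.querySelectorAll<HTMLImageElement>("img.gallery-item");
for (const img of Array.from(images)) {
  const imageKey = await precomputeEmbedding(img);
  console.log("Embedding cached for", img.src, "->", imageKey);
}
```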
createSession(image) creates a new stateful SegmentationSession for a specific image.
- image: The HTMLImageElement or HTMLCanvasElement this session will be associated with. The embedding for this image will be computed on the first segment() call if not already cached or precomputed.
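A sketch of the stateful flow: refine the mask across successive clicks on the same image. It uses the session methods from the quick-start example above, with the "include"/"exclude" click values shown there; the coordinates are arbitrary:

```typescript
import { createSession, ClickType } from "minisam";

const img = document.getElementById("my-image") as HTMLImageElement;
const session = createSession(img);

// First click: mark a point on the object, then segment.
session.addClick(420, 310, "include" as ClickType);
let mask = await session.segment(img); // embedding is computed (or reused) here

// Refine: exclude a region that leaked into the mask, then segment again.
session.addClick(480, 360, "exclude" as ClickType);
mask = await session.segment(img);

console.log("Clicks so far:", session.getClickCount());
session.dispose(); // optional cleanup when this image is no longer needed
```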
segment(params) is a stateless function that performs segmentation in a single call. It's recommended to use SegmentationSession for new implementations.
This function computes or retrieves a cached embedding for the image, then runs segmentation.
- params: SegmentParams object.
  - image: HTMLImageElement | HTMLCanvasElement: The image to segment.
  - clicks: Array<{ x: number, y: number, clickType: 0 | 1 }>: Array of click objects. clickType: 1 for include, 0 for exclude.
- Returns: A Promise that resolves with an ImageData object representing the mask.
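A minimal sketch of the stateless call, assuming segment is exported from the package root alongside the other functions:

```typescript
import { segment } from "minisam";

const img = document.getElementById("my-image") as HTMLImageElement;

// One-shot segmentation: the embedding is computed (or read from cache) internally.
const mask = await segment({
  image: img,
  clicks: [
    { x: 120, y: 160, clickType: 1 }, // 1 = include
    { x: 300, y: 200, clickType: 0 }, // 0 = exclude
  ],
});
// mask is an ImageData object representing the segmentation result
```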
Clears all cached image embeddings.
Clears all active SegmentationSession states. Note: This does not call dispose() on individual sessions but rather clears the central store.
---
Important: miniSAM returns masks in RGBA ImageData format where the mask information is stored in the alpha channel, not the RGB channels. When processing the returned ImageData object from session.segment():
- Alpha channel (A) = 255: Foreground pixel (included in the mask)
- Alpha channel (A) = 0: Background pixel (excluded from the mask)
- RGB channels: Typically all 0 (black) and should be ignored for mask logic
This differs from traditional grayscale masks where the mask information is stored in the RGB channels. When extracting objects or applying masks to images, always check the alpha channel values rather than RGB values. For example:
```javascript
const maskData = imageDataMask.data;
for (let i = 0; i < maskData.length; i += 4) {
  const alpha = maskData[i + 3];
  if (alpha === 255) {
    console.log("Foreground pixel at:", Math.floor(i / 4));
  }
}
```
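Because the mask lives in the alpha channel, canvas compositing with destination-in is a convenient way to cut the segmented object out of the source image without touching pixels by hand. A sketch (scaling and output handling are up to your application):

```typescript
// Cut the segmented object out of the original image using the alpha-channel mask.
function extractObject(image: HTMLImageElement, mask: ImageData): HTMLCanvasElement {
  // Put the (typically 256x256) mask on its own canvas so it can be scaled.
  const maskCanvas = document.createElement("canvas");
  maskCanvas.width = mask.width;
  maskCanvas.height = mask.height;
  maskCanvas.getContext("2d")!.putImageData(mask, 0, 0);

  const out = document.createElement("canvas");
  out.width = image.naturalWidth;
  out.height = image.naturalHeight;
  const ctx = out.getContext("2d")!;

  ctx.drawImage(image, 0, 0, out.width, out.height);
  // destination-in keeps image pixels only where the drawn mask is opaque (alpha = 255).
  ctx.globalCompositeOperation = "destination-in";
  ctx.drawImage(maskCanvas, 0, 0, out.width, out.height);
  return out;
}
```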
- The library expects SAM-compatible ONNX models: an image encoder and a mask decoder.
- Image preprocessing scales the longest side of the input image to 1024px and pads it to a square tensor for the encoder.
- Click coordinates are automatically scaled to match the preprocessed image dimensions.
- The output mask is an ImageData object, typically 256x256, which can then be upscaled and drawn onto a canvas.
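One practical consequence of this pipeline: if the image is displayed at a size other than its natural resolution, convert pointer positions before adding clicks. The sketch below assumes a plain <img> without CSS transforms, and assumes addClick expects coordinates in the image's natural pixel space (which the library then rescales as noted above):

```typescript
// Convert a pointer event on a displayed <img> into natural-image coordinates
// before passing it to session.addClick(...).
function toImageCoords(img: HTMLImageElement, event: MouseEvent) {
  const rect = img.getBoundingClientRect();
  const x = ((event.clientX - rect.left) / rect.width) * img.naturalWidth;
  const y = ((event.clientY - rect.top) / rect.height) * img.naturalHeight;
  return { x: Math.round(x), y: Math.round(y) };
}
```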
If you are working on the miniSAM library itself:
- Install dependencies: npm install
- Build the library: npm run build

The build command cleans the dist folder, runs webpack to bundle the library, and then uses npm pack to create a .tgz tarball in the miniSAM root directory (e.g., minisam-0.1.0.tgz). This tarball can be installed locally by other projects.