Call Apple's on-device Foundation Models from JavaScript — no servers, no setup.
Works with Node.js, Electron, and VS Code extensions.
## Requirements

- macOS 26+ (Tahoe)
- Apple Silicon (M-series)
- Apple Intelligence enabled in System Settings
## Installation

```bash
npm install apple-local-llm
```
## Quick Start

```typescript
import { createClient } from "apple-local-llm";

const client = createClient();

// Check compatibility first
const compat = await client.compatibility.check();
if (!compat.compatible) {
  console.log("Not available:", compat.reasonCode);
  // Handle fallback to a cloud API
}

// Generate a response
const result = await client.responses.create({
  input: "What is the capital of France?",
});

if (result.ok) {
  console.log(result.text); // "The capital of France is Paris."
}
```
### Streaming

```typescript
for await (const chunk of client.stream({ input: "Count from 1 to 5." })) {
  if ("delta" in chunk) {
    process.stdout.write(chunk.delta);
  }
}
```
## API

### `createClient(options?)`

Creates a new client instance.
```typescript
const client = createClient({
  model: "default",                 // Optional: model identifier (currently only "default")
  onLog: (msg) => console.log(msg), // Optional: debug logging
  idleTimeoutMs: 5 * 60 * 1000,     // Optional: helper idle timeout (default: 5 min)
});
```
Defaults:
- Helper auto-shuts down after 5 minutes of inactivity
- Helper auto-restarts up to 3 times on crash (with exponential backoff)
- Request timeout: 60 seconds (configurable via `timeoutMs`)
You can also import and instantiate the class directly:
```typescript
import { AppleLocalLLMClient } from "apple-local-llm";

const client = new AppleLocalLLMClient(options);
```
### `client.compatibility.check()`

Check if the local model is available. Always call this before making requests.
```typescript
const result = await client.compatibility.check();
// { compatible: true }
// or { compatible: false, reasonCode: "AI_DISABLED" }
```
Reason codes:

| Code | Description |
|------|-------------|
| `NOT_DARWIN` | Not running on macOS |
| `UNSUPPORTED_HARDWARE` | Not Apple Silicon |
| `AI_DISABLED` | Apple Intelligence not enabled |
| `MODEL_NOT_READY` | Model still downloading |
| `SPAWN_FAILED` | Helper binary failed to start |
| `HELPER_NOT_FOUND` | Helper binary not found |
| `HELPER_UNHEALTHY` | Helper process not responding correctly |
| `PROTOCOL_MISMATCH` | Helper version incompatible with client |
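For instance, you can map these codes to user-facing guidance before falling back to the cloud. A minimal sketch; the messages are illustrative, and `Partial` is used in case `ReasonCode` includes codes beyond this table:

```typescript
import type { ReasonCode } from "apple-local-llm";

// Illustrative messages only; adjust the wording to your product.
const reasonMessages: Partial<Record<ReasonCode, string>> = {
  NOT_DARWIN: "This feature requires macOS.",
  UNSUPPORTED_HARDWARE: "This feature requires an Apple Silicon Mac.",
  AI_DISABLED: "Enable Apple Intelligence in System Settings.",
  MODEL_NOT_READY: "The model is still downloading; try again shortly.",
  SPAWN_FAILED: "The local helper failed to start.",
  HELPER_NOT_FOUND: "The local helper binary was not found.",
  HELPER_UNHEALTHY: "The local helper is not responding.",
  PROTOCOL_MISMATCH: "Update the app to a version matching the helper.",
};
```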
### `client.capabilities.get()`

Get detailed model capabilities (calls the helper).
```typescript
const caps = await client.capabilities.get();
// { available: true, model: "apple-on-device" }
// or { available: false, reasonCode: "AI_DISABLED" }
```
### `client.responses.create(params)`

Generate a response.
```typescript
const result = await client.responses.create({
  input: "Your prompt here",
  model: "default",               // Optional: model identifier
  max_output_tokens: 500,         // Optional: limit response tokens
  stream: false,                  // Optional
  signal: abortController.signal, // Optional: AbortSignal
  timeoutMs: 60000,               // Optional: request timeout (ms)
  response_format: {              // Optional: structured JSON output
    type: "json_schema",
    json_schema: {
      name: "Result",
      schema: { type: "object", properties: { ... } }
    }
  }
});
```
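For example, the documented `signal` parameter works with a plain `AbortController`; a sketch that enforces a two-second deadline:

```typescript
const controller = new AbortController();

// Abort the request if it takes longer than two seconds.
const timer = setTimeout(() => controller.abort(), 2000);

const result = await client.responses.create({
  input: "Summarize a very long document...",
  signal: controller.signal,
});
clearTimeout(timer);

if (!result.ok && result.error.code === "CANCELLED") {
  console.log("Request was cancelled");
}
```

When you only need a deadline, the built-in `timeoutMs` option is simpler; `signal` is useful when cancellation is driven by user action.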
Structured Output Example:
```typescript
const result = await client.responses.create({
  input: "List 3 colors",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "Colors",
      schema: {
        type: "object",
        properties: {
          colors: { type: "array", items: { type: "string" } }
        }
      }
    }
  }
});

if (result.ok) {
  const data = JSON.parse(result.text); // { colors: ["red", "blue", "green"] }
}
```
> `response_format` is not supported with streaming.
Returns `ResponseResult` on success, or an error object:

```typescript
// Success:
{ ok: true, text: "...", request_id: "..." }

// Error:
{ ok: false, error: { code: "...", detail: "..." } }
```

Note: the return type is a discriminated union, not the exported `ResponseResult` interface.
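In TypeScript, checking `ok` narrows the union to the matching branch:

```typescript
const result = await client.responses.create({ input: "Hello" });

if (result.ok) {
  // Narrowed to the success shape
  console.log(result.text, result.request_id);
} else {
  // Narrowed to the error shape
  console.error(result.error.code, result.error.detail);
}
```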
Error codes:

| Code | Description |
|------|-------------|
| `UNAVAILABLE` | Model not available (see reason codes above) |
| `TIMEOUT` | Request timed out (default: 60s) |
| `CANCELLED` | Request was cancelled via AbortSignal |
| `RATE_LIMITED` | System rate limit exceeded |
| `GUARDRAIL` | Content violated Apple's safety guidelines |
| `INTERNAL` | Unexpected error |
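A sketch of routing on these codes; `callCloudFallback` is a hypothetical function you would implement against your own cloud provider:

```typescript
import { createClient } from "apple-local-llm";

const client = createClient();

// Hypothetical fallback; wire this to your own cloud provider.
async function callCloudFallback(prompt: string): Promise<string> {
  throw new Error("cloud fallback not implemented");
}

async function generate(prompt: string): Promise<string> {
  const result = await client.responses.create({ input: prompt });
  if (result.ok) return result.text;

  switch (result.error.code) {
    case "UNAVAILABLE": // local model missing: use the cloud instead
      return callCloudFallback(prompt);
    case "TIMEOUT": // transient: the caller may retry with backoff
    case "RATE_LIMITED":
      throw new Error(`retryable: ${result.error.code}`);
    default: // CANCELLED, GUARDRAIL, INTERNAL
      throw new Error(`${result.error.code}: ${result.error.detail}`);
  }
}
```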
### `client.stream(params)`

Async generator for streaming responses.
```typescript
for await (const chunk of client.stream({ input: "..." })) {
  if ("delta" in chunk) {
    // Partial content
    console.log(chunk.delta);
  } else if ("done" in chunk) {
    // Final complete text
    console.log(chunk.text);
  }
}
```
### `client.responses.cancel(requestId)`

Cancel an in-progress request.
```typescript
const result = await client.responses.cancel("req_123");
// { ok: true } or { ok: false, error: { code: "NOT_RUNNING", detail: "..." } }
```
### `client.shutdown()`

Gracefully shut down the helper process.
```typescript
await client.shutdown();
```
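For example, in a long-running Node.js process you might tie this to a signal handler (a sketch using standard `process` hooks):

```typescript
// Shut the helper down cleanly when the process is interrupted.
process.on("SIGINT", async () => {
  await client.shutdown();
  process.exit(0);
});
```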
## Types

All types are exported:
```typescript
import type {
  ClientOptions,
  ReasonCode,
  CompatibilityResult,
  CapabilitiesResult,
  ResponsesCreateParams,
  ResponseResult,
  JSONSchema,
  ResponseFormat,
} from "apple-local-llm";
```
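For example, the exported types can type your own wrappers (a minimal sketch):

```typescript
import { createClient } from "apple-local-llm";
import type { ResponsesCreateParams } from "apple-local-llm";

const client = createClient();

// A thin wrapper that reuses the library's parameter type and
// flattens the result union to `string | null`.
async function ask(params: ResponsesCreateParams): Promise<string | null> {
  const result = await client.responses.create(params);
  return result.ok ? result.text : null;
}
```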
## CLI

The `fm-proxy` binary can also be used directly from the command line:
```bash
# Simple prompt
fm-proxy "What is the capital of France?"
```
### HTTP Server

Run `fm-proxy --serve` to start a local HTTP server:

```bash
fm-proxy --serve --port=8080
```

Endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check and availability status |
| `/generate` | POST | Text generation (supports streaming) |

Options:
| Option | Description |
|--------|-------------|
| `--port=<port>` | Set server port (default: 8080) |
| `--auth-token=<token>` | Require Bearer token for `/generate` |

You can also set the `AUTH_TOKEN` environment variable instead of `--auth-token`.

CORS: All endpoints support CORS with `Access-Control-Allow-Origin: *`.

Examples:
```bash
# Health check
curl http://127.0.0.1:8080/health
# Response: {"status":"ok","model":"apple-on-device","available":true}

# Simple generation
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "What is 2+2?"}'
# Response: {"text":"2+2 equals 4."}

# With max_output_tokens
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "Count to 100", "max_output_tokens": 50}'

# With structured output (response_format)
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "List 3 colors", "response_format": {"type": "json_schema", "json_schema": {"name": "Colors", "schema": {"type": "object", "properties": {"colors": {"type": "array", "items": {"type": "string"}}}}}}}'

# With authentication
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"input": "Hello"}'
```
#### Streaming (SSE)
Add `"stream": true` to get Server-Sent Events with OpenAI-compatible chunks:

```bash
curl -N -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "Write a haiku", "stream": true}'
```

Response:
```
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"..."}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
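Consuming the stream from TypeScript takes a few lines of line-buffered parsing. A sketch for Node 18+, where `fetch` response bodies are async-iterable:

```typescript
// Stream tokens from /generate and print them as they arrive.
async function streamGenerate(input: string): Promise<void> {
  const res = await fetch("http://127.0.0.1:8080/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, stream: true }),
  });

  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of res.body!) {
    buffer += decoder.decode(chunk as Uint8Array, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop()!; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice(6).trim();
      if (payload === "[DONE]") return; // end of stream
      const event = JSON.parse(payload);
      const content = event.choices?.[0]?.delta?.content;
      if (content) process.stdout.write(content);
    }
  }
}

await streamGenerate("Write a haiku");
```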
## How It Works
This package bundles a small native helper (`fm-proxy`) that communicates with Apple's Foundation Models framework over stdio. The helper is spawned on the first request and stays alive to keep the model warm.

- No localhost server — the npm package uses stdio, not HTTP
- No user setup — just `npm install`
- Fails gracefully — check `compatibility.check()` and fall back to cloud

## Runtime Support
JS API (`createClient()`):

| Environment | Supported |
|-------------|-----------|
| Node.js | ✅ |
| Electron (main process) | ✅ |
| VS Code extensions | ✅ |
| Electron (renderer) | ❌ No `child_process` |
| Browser | ❌ |

HTTP Server (`fm-proxy --serve`): usable from any HTTP client, including browsers, since all endpoints support CORS.

## License

MIT