
# Client SDK API

Install the SDK with your package manager of choice:

```bash
npm install webllm
# or
yarn add webllm
# or
pnpm add webllm
```

Then generate text with a single call:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  prompt: 'Explain quantum computing in simple terms',
});

console.log(result.text);
```

## generateText(options)

Generates text using the WebLLM extension through a Vercel AI SDK-compatible interface.

**Parameters:**

```ts
interface GenerateTextOptions {
  // Intelligent model selection (recommended)
  task?:
    | 'general'
    | 'summarization'
    | 'translation'
    | 'qa'
    | 'coding'
    | 'creative'
    | 'extraction';
  hints?: ModelHints;

  // Expert overrides (optional)
  model?: string; // Specific model name (e.g., 'claude-sonnet-4', 'gpt-4o')
  provider?: string; // Specific provider ('anthropic', 'openai', etc.)

  // Content
  system?: string; // System prompt
  prompt?: string; // User prompt (alternative to messages)
  messages?: Message[]; // Conversation messages

  // Generation parameters
  temperature?: number; // Randomness (0.0-1.0, default: 0.7)
  maxTokens?: number; // Maximum tokens to generate
  topP?: number; // Nucleus sampling (0.0-1.0)
  topK?: number; // Top-k sampling
  frequencyPenalty?: number; // Reduce repetition (0.0-2.0)
  presencePenalty?: number; // Encourage new topics (0.0-2.0)
  stopSequences?: string[]; // Stop generation at these strings
}

interface ModelHints {
  speed?: 'fastest' | 'fast' | 'balanced' | 'quality';
  quality?: 'draft' | 'standard' | 'high' | 'best';
  maxModelSize?: number; // Max model size in GB
  maxMemory?: number; // Max memory in GB
  capabilities?: {
    multilingual?: boolean;
    codeGeneration?: boolean;
    reasoning?: boolean;
    longContext?: boolean;
    math?: boolean;
    functionCalling?: boolean;
  };
  modelId?: string; // Force a specific model
  excludeModels?: string[]; // Exclude models from selection
}

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}
```
**Returns:**

```ts
CancellablePromise<GenerateTextResult>;

// CancellablePromise extends Promise with cancellation support
interface CancellablePromise<T> extends Promise<T> {
  cancel(): void; // Cancel the in-flight request
  readonly requestId: string; // Unique identifier for this request
}

interface GenerateTextResult {
  text: string; // Generated text
  finishReason: string; // Reason for completion ('stop', 'length', etc.)
  usage: {
    promptTokens: number; // Tokens in the prompt
    completionTokens: number; // Tokens in the completion
    totalTokens: number; // Total tokens used
  };
  model?: string; // Model that was used
  provider?: string; // Provider that was used
  requestId?: string; // Unique request ID
  timestamp?: number; // Unix timestamp
}
```
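
The metadata fields are useful for logging and debugging; a minimal sketch reading only the documented result fields:

```ts
const result = await generateText({ prompt: 'Explain quantum computing' });

console.log(result.text);
console.log(`finishReason: ${result.finishReason}`); // 'stop', 'length', etc.
console.log(
  `tokens: ${result.usage.promptTokens} prompt + ` +
    `${result.usage.completionTokens} completion = ${result.usage.totalTokens} total`,
);
if (result.model) {
  console.log(`served by ${result.provider}/${result.model}`);
}
```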

**Examples:**

Simple prompt:

```ts
const result = await generateText({
  prompt: 'Write a haiku about coding',
});
```

With system prompt:

```ts
const result = await generateText({
  system: 'You are a helpful translator.',
  prompt: 'Translate to Spanish: Hello, how are you?',
});
```

Multi-turn conversation:

```ts
const result = await generateText({
  messages: [
    { role: 'user', content: 'What is the capital of France?' },
    { role: 'assistant', content: 'The capital of France is Paris.' },
    { role: 'user', content: 'What is its population?' },
  ],
});
```
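
To keep a conversation going, append the assistant reply from `result.text` and the next user turn before calling `generateText` again; a sketch using only the documented `Message` shape:

```ts
import { generateText } from 'webllm';
import type { Message } from 'webllm';

const messages: Message[] = [
  { role: 'user', content: 'What is the capital of France?' },
];

const first = await generateText({ messages });

// Feed the reply back in so the model sees the full history
messages.push({ role: 'assistant', content: first.text });
messages.push({ role: 'user', content: 'What is its population?' });

const second = await generateText({ messages });
console.log(second.text);
```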

With request cancellation:

```ts
// All requests return a CancellablePromise - cancel any time before resolution
const request = generateText({ prompt: 'Write a long story' });

// Cancel after 5 seconds if still running
setTimeout(() => request.cancel(), 5000);

try {
  const result = await request;
  console.log(result.text);
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Request was cancelled');
  }
}
```

With generation parameters:

```ts
const result = await generateText({
  prompt: 'Write a creative story',
  temperature: 0.9,
  maxTokens: 500,
  topP: 0.95,
  frequencyPenalty: 0.5,
  presencePenalty: 0.5,
});
```

Task-based intelligent routing:

```ts
const result = await generateText({
  task: 'coding',
  hints: {
    quality: 'best',
    capabilities: {
      reasoning: true,
      codeGeneration: true,
    },
  },
  prompt: 'Write a React component for a todo list',
});
```
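
Hints can also constrain selection without an expert override; a sketch using the documented `ModelHints` fields (the hint values shown are illustrative):

```ts
const result = await generateText({
  task: 'summarization',
  hints: {
    speed: 'fastest', // prefer latency over quality
    maxModelSize: 8, // cap model size at 8 GB
    excludeModels: ['gpt-4o'], // never route to these models
  },
  prompt: 'Summarize the following paragraph...',
});
```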

Force specific provider:

```ts
const result = await generateText({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  prompt: 'Explain quantum computing',
});
```

## promptInstall()

Shows an interactive modal that guides users through extension installation.

**Parameters:** None

**Returns:**

```ts
Promise<void>;
```

**Throws:**

- Error if the installation is cancelled by the user
- Error if the browser is not supported
**Example:**

```ts
import { promptInstall, generateText } from 'webllm';

try {
  await promptInstall();
  // Extension is now ready
  const result = await generateText({ prompt: 'Hello' });
} catch (error) {
  console.error('Installation failed:', error.message);
}
```

## webLlmReady(timeout?)

Waits for the WebLLM extension to become available and ready.

**Parameters:**

- `timeout` (number, optional) - Maximum time to wait in milliseconds (default: 30000)

**Returns:**

```ts
Promise<void>;
```

**Throws:**

- Error if the timeout expires
- Error if the browser is not supported
**Example:**

```ts
import { webLlmReady, generateText } from 'webllm';

try {
  await webLlmReady(10000); // Wait up to 10 seconds
  const result = await generateText({ prompt: 'Hello' });
} catch (error) {
  console.error('Extension not available:', error.message);
}
```

## isAvailable()

Synchronously checks whether the WebLLM extension is currently installed and available.

**Parameters:** None

**Returns:**

```ts
boolean;
```

**Example:**

```ts
import { isAvailable } from 'webllm';

if (isAvailable()) {
  console.log('Extension is installed');
} else {
  console.log('Extension not found');
}
```

## getBrowserInfo()

Returns information about browser compatibility with WebLLM.

**Parameters:** None

**Returns:**

```ts
BrowserInfo;

interface BrowserInfo {
  isSupported: boolean; // Whether WebLLM is supported on this browser
  browserName: string; // Browser name (e.g., 'Chrome', 'Edge', 'Firefox')
  reason?: string; // Reason if not supported
  installUrl?: string; // URL to install the extension
}
```

**Example:**

```ts
import { getBrowserInfo } from 'webllm';

const info = getBrowserInfo();
console.log('Browser:', info.browserName);
console.log('Supported:', info.isSupported);

if (!info.isSupported) {
  console.log('Reason:', info.reason);
} else if (info.installUrl) {
  console.log('Install from:', info.installUrl);
}
```
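
A common pattern is to gate the install prompt on compatibility, so users are only offered installation where it can succeed. A sketch (the `#install-ai` button selector is illustrative, not part of the SDK):

```ts
import { getBrowserInfo, promptInstall } from 'webllm';

const info = getBrowserInfo();
if (info.isSupported) {
  // Only wire up the install button on supported browsers
  document.querySelector('#install-ai')?.addEventListener('click', () => {
    promptInstall().catch((error) => console.error(error.message));
  });
} else {
  console.log(`AI features unavailable: ${info.reason}`);
}
```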

## chat.completions.create(options)

OpenAI-compatible chat completions interface.

**Parameters:**

```ts
interface ChatCompletionOptions {
  model?: string; // Model name (optional)
  messages: ChatCompletionMessage[]; // Conversation messages
  temperature?: number; // Randomness (0.0-2.0)
  max_tokens?: number; // Maximum tokens to generate
  top_p?: number; // Nucleus sampling (0.0-1.0)
  frequency_penalty?: number; // Reduce repetition (0.0-2.0)
  presence_penalty?: number; // Encourage new topics (0.0-2.0)
  stop?: string | string[]; // Stop sequences
  stream?: boolean; // Enable streaming (not yet supported)
}

interface ChatCompletionMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
  name?: string; // Optional message author name
}
```

**Returns:**

```ts
Promise<ChatCompletionResponse>;

interface ChatCompletionResponse {
  id: string;
  object: 'chat.completion';
  created: number; // Unix timestamp
  model: string;
  choices: Array<{
    index: number;
    message: ChatCompletionMessage;
    finish_reason: string;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}
```

**Examples:**

Basic chat:

```ts
import { webllm } from 'webllm';

const completion = await webllm.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);
```

With system message:

```ts
const completion = await webllm.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing' },
  ],
  temperature: 0.7,
  max_tokens: 500,
});
```

Drop-in OpenAI replacement:

```ts
// Import as openai for a seamless replacement
import { webllm as openai } from 'webllm';

const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
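
Because the request and response shapes match OpenAI's `chat.completions.create`, code written against a minimal structural type works with either client. A sketch (the `ChatClient` interface and `ask` helper are illustrative, not part of the SDK):

```ts
import { webllm } from 'webllm';

// Structural type covering only the surface this helper needs
interface ChatClient {
  chat: {
    completions: {
      create(options: {
        model?: string;
        messages: { role: 'user' | 'assistant' | 'system'; content: string }[];
      }): Promise<{ choices: { message: { content: string } }[] }>;
    };
  };
}

async function ask(client: ChatClient, question: string): Promise<string> {
  const completion = await client.chat.completions.create({
    messages: [{ role: 'user', content: question }],
  });
  return completion.choices[0].message.content;
}

// Works with webllm, or with any other client matching the same shape
console.log(await ask(webllm, 'Hello!'));
```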

## WebLLMClient

```ts
import { WebLLMClient } from 'webllm';

const client = new WebLLMClient();
```

The constructor automatically:

- Initializes readiness promise tracking
- Listens for `webllm:ready` events
- Checks if the extension is already installed

All functions above are available as instance methods:

```ts
const client = new WebLLMClient();

// Installation & setup
await client.promptInstall();
await client.webLlmReady(10000);
const available = client.isAvailable();
const browserInfo = client.getBrowserInfo();

// Text generation
const result = await client.generateText({ prompt: 'Hello' });

// OpenAI-compatible
const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

## TypeScript support

Full TypeScript definitions are included:

```ts
import type {
  BrowserInfo,
  CancellablePromise,
  GenerateTextOptions,
  GenerateTextResult,
  ChatCompletionOptions,
  ChatCompletionResponse,
  Message,
  ModelHints,
  Usage,
} from 'webllm';
```
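
These types make it straightforward to build typed wrappers around the SDK; a sketch of a helper that applies house defaults (`withDefaults` is an illustrative name, not part of the SDK):

```ts
import { generateText } from 'webllm';
import type { GenerateTextOptions, GenerateTextResult } from 'webllm';

// Apply defaults while letting callers override anything
function withDefaults(
  options: GenerateTextOptions,
): Promise<GenerateTextResult> {
  return generateText({ temperature: 0.3, maxTokens: 256, ...options });
}

const result = await withDefaults({ prompt: 'Hello' });
console.log(result.text);
```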

## Error handling

Extension not available:

```ts
import { generateText, promptInstall } from 'webllm';

try {
  await generateText({ prompt: 'Hello' });
} catch (error) {
  if (error.message.includes('extension not available')) {
    await promptInstall();
  }
}
```

Browser not supported:

```ts
import { promptInstall, getBrowserInfo } from 'webllm';

try {
  await promptInstall();
} catch (error) {
  if (error.message.includes('not supported')) {
    const info = getBrowserInfo();
    console.log(info.reason);
  }
}
```

Installation cancelled:

```ts
try {
  await promptInstall();
} catch (error) {
  if (error.message.includes('cancelled')) {
    console.log('User cancelled installation');
  }
}
```

Request cancelled:

```ts
const request = generateText({ prompt: 'Hello' });

// Cancel the request
request.cancel();

try {
  await request;
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Request was cancelled');
  }
}
```

## Best practices

1. **Check browser compatibility first** - Use `getBrowserInfo()` before prompting for installation.
2. **Use `promptInstall()` for required features** - Best UX for features that need AI.
3. **Use `webLlmReady()` for optional features** - Progressive enhancement with a timeout.
4. **Always provide fallbacks** - Your app should work without AI features (see the sketch after this list).
5. **Handle installation cancellation** - The user might click cancel.
6. **Cache readiness state** - Don't prompt repeatedly in the same session.
7. **Use task-based routing** - Let WebLLM select the best model for your use case.
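
A sketch combining several of these practices: cached readiness, a timeout, and a non-AI fallback (`summarize` and `fallbackSummary` are illustrative names, not part of the SDK):

```ts
import { isAvailable, webLlmReady, generateText } from 'webllm';

// Cache readiness for the session so we never wait or prompt twice
let ready: Promise<boolean> | null = null;

function ensureReady(): Promise<boolean> {
  if (!ready) {
    ready = isAvailable()
      ? Promise.resolve(true)
      : webLlmReady(5000).then(
          () => true,
          () => false,
        );
  }
  return ready;
}

async function summarize(text: string): Promise<string> {
  if (await ensureReady()) {
    const result = await generateText({ task: 'summarization', prompt: text });
    return result.text;
  }
  // Non-AI fallback keeps the feature working without the extension
  return fallbackSummary(text);
}

// Illustrative fallback: first sentence of the text
function fallbackSummary(text: string): string {
  return text.split('. ')[0] + '.';
}
```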