
# Client SDK API

Install the SDK with your package manager of choice:

```bash
npm install webllm
# or
yarn add webllm
# or
pnpm add webllm
```

Then generate text with a single call:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  prompt: 'Explain quantum computing in simple terms',
});

console.log(result.text);
```

## generateText(options)

Generates text using the WebLLM extension through a Vercel AI SDK-compatible interface.

**Parameters:**

```ts
interface GenerateTextOptions {
  // Intelligent model selection (recommended)
  task?:
    | 'general'
    | 'summarization'
    | 'translation'
    | 'qa'
    | 'coding'
    | 'creative'
    | 'extraction';
  hints?: ModelHints;

  // Expert overrides (optional)
  model?: string; // Specific model name (e.g., 'claude-sonnet-4', 'gpt-4o')
  provider?: string; // Specific provider ('anthropic', 'openai', etc.)

  // Content
  system?: string; // System prompt
  prompt?: string; // User prompt (alternative to messages)
  messages?: Message[]; // Conversation messages

  // Generation parameters
  temperature?: number; // Randomness (0.0-1.0, default: 0.7)
  maxTokens?: number; // Maximum tokens to generate
  topP?: number; // Nucleus sampling (0.0-1.0)
  topK?: number; // Top-k sampling
  frequencyPenalty?: number; // Reduce repetition (0.0-2.0)
  presencePenalty?: number; // Encourage new topics (0.0-2.0)
  stopSequences?: string[]; // Stop generation at these strings
}

interface ModelHints {
  speed?: 'fastest' | 'fast' | 'balanced' | 'quality';
  quality?: 'draft' | 'standard' | 'high' | 'best';
  maxModelSize?: number; // Max model size in GB
  maxMemory?: number; // Max memory in GB
  capabilities?: {
    multilingual?: boolean;
    codeGeneration?: boolean;
    reasoning?: boolean;
    longContext?: boolean;
    math?: boolean;
    functionCalling?: boolean;
  };
  modelId?: string; // Force a specific model
  excludeModels?: string[]; // Exclude models from selection
}

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}
```
**Returns:**

```ts
CancellablePromise<GenerateTextResult>;

// CancellablePromise extends Promise with cancellation support
interface CancellablePromise<T> extends Promise<T> {
  cancel(): void; // Cancel the in-flight request
  readonly requestId: string; // Unique identifier for this request
}

interface GenerateTextResult {
  text: string; // Generated text
  finishReason: string; // Reason for completion ('stop', 'length', etc.)
  usage: {
    promptTokens: number; // Tokens in the prompt
    completionTokens: number; // Tokens in the completion
    totalTokens: number; // Total tokens used
  };
  model?: string; // Model that was used
  provider?: string; // Provider that was used
  requestId?: string; // Unique request ID
  timestamp?: number; // Unix timestamp
}
```
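
The metadata fields are useful for logging and debugging; a minimal sketch reading only the documented result fields:

```ts
const result = await generateText({ prompt: 'Explain quantum computing' });

console.log(result.text);
console.log(`finishReason: ${result.finishReason}`); // 'stop', 'length', etc.
console.log(
  `tokens: ${result.usage.promptTokens} prompt + ` +
    `${result.usage.completionTokens} completion = ${result.usage.totalTokens} total`,
);
if (result.model) {
  console.log(`served by ${result.provider}/${result.model}`);
}
```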

**Examples:**

Simple prompt:

```ts
const result = await generateText({
  prompt: 'Write a haiku about coding',
});
```

With system prompt:

```ts
const result = await generateText({
  system: 'You are a helpful translator.',
  prompt: 'Translate to Spanish: Hello, how are you?',
});
```

Multi-turn conversation:

```ts
const result = await generateText({
  messages: [
    { role: 'user', content: 'What is the capital of France?' },
    { role: 'assistant', content: 'The capital of France is Paris.' },
    { role: 'user', content: 'What is its population?' },
  ],
});
```
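
To keep a conversation going, append the assistant reply from `result.text` and the next user turn before calling `generateText` again; a sketch using only the documented `Message` shape:

```ts
import { generateText } from 'webllm';
import type { Message } from 'webllm';

const messages: Message[] = [
  { role: 'user', content: 'What is the capital of France?' },
];

const first = await generateText({ messages });

// Feed the reply back in so the model sees the full history
messages.push({ role: 'assistant', content: first.text });
messages.push({ role: 'user', content: 'What is its population?' });

const second = await generateText({ messages });
console.log(second.text);
```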

With request cancellation:

```ts
// All requests return a CancellablePromise - cancel any time before resolution
const request = generateText({ prompt: 'Write a long story' });

// Cancel after 5 seconds if still running
setTimeout(() => request.cancel(), 5000);

try {
  const result = await request;
  console.log(result.text);
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Request was cancelled');
  }
}
```

With generation parameters:

```ts
const result = await generateText({
  prompt: 'Write a creative story',
  temperature: 0.9,
  maxTokens: 500,
  topP: 0.95,
  frequencyPenalty: 0.5,
  presencePenalty: 0.5,
});
```

Task-based intelligent routing:

```ts
const result = await generateText({
  task: 'coding',
  hints: {
    quality: 'best',
    capabilities: {
      reasoning: true,
      codeGeneration: true,
    },
  },
  prompt: 'Write a React component for a todo list',
});
```
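
Hints can also constrain selection without an expert override; a sketch using the documented `ModelHints` fields (the hint values shown are illustrative):

```ts
const result = await generateText({
  task: 'summarization',
  hints: {
    speed: 'fastest', // prefer latency over quality
    maxModelSize: 8, // cap model size at 8 GB
    excludeModels: ['gpt-4o'], // never route to these models
  },
  prompt: 'Summarize the following paragraph...',
});
```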

Force specific provider:

```ts
const result = await generateText({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  prompt: 'Explain quantum computing',
});
```

## promptInstall()

Shows an interactive modal that guides users through extension installation.

**Parameters:** None

**Returns:**

```ts
Promise<void>;
```

**Throws:**

- Error if the installation is cancelled by the user
- Error if the browser is not supported
**Example:**

```ts
import { promptInstall, generateText } from 'webllm';

try {
  await promptInstall();
  // Extension is now ready
  const result = await generateText({ prompt: 'Hello' });
} catch (error) {
  console.error('Installation failed:', error.message);
}
```

## webLlmReady(timeout?)

Waits for the WebLLM extension to become available and ready.

**Parameters:**

- `timeout` (number, optional) - Maximum time to wait in milliseconds (default: 30000)

**Returns:**

```ts
Promise<void>;
```

**Throws:**

- Error if the timeout expires
- Error if the browser is not supported
**Example:**

```ts
import { webLlmReady, generateText } from 'webllm';

try {
  await webLlmReady(10000); // Wait up to 10 seconds
  const result = await generateText({ prompt: 'Hello' });
} catch (error) {
  console.error('Extension not available:', error.message);
}
```

## isAvailable()

Synchronously checks whether the WebLLM extension is currently installed and available.

**Parameters:** None

**Returns:**

```ts
boolean;
```

**Example:**

```ts
import { isAvailable } from 'webllm';

if (isAvailable()) {
  console.log('Extension is installed');
} else {
  console.log('Extension not found');
}
```

## getBrowserInfo()

Returns information about browser compatibility with WebLLM.

**Parameters:** None

**Returns:**

```ts
BrowserInfo;

interface BrowserInfo {
  isSupported: boolean; // Whether WebLLM is supported on this browser
  browserName: string; // Browser name (e.g., 'Chrome', 'Edge', 'Firefox')
  reason?: string; // Reason if not supported
  installUrl?: string; // URL to install the extension
}
```

**Example:**

```ts
import { getBrowserInfo } from 'webllm';

const info = getBrowserInfo();
console.log('Browser:', info.browserName);
console.log('Supported:', info.isSupported);

if (!info.isSupported) {
  console.log('Reason:', info.reason);
} else if (info.installUrl) {
  console.log('Install from:', info.installUrl);
}
```
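
A common pattern is to gate the install prompt on compatibility, so users are only offered installation where it can succeed. A sketch (the `#install-ai` button selector is illustrative, not part of the SDK):

```ts
import { getBrowserInfo, promptInstall } from 'webllm';

const info = getBrowserInfo();
if (info.isSupported) {
  // Only wire up the install button on supported browsers
  document.querySelector('#install-ai')?.addEventListener('click', () => {
    promptInstall().catch((error) => console.error(error.message));
  });
} else {
  console.log(`AI features unavailable: ${info.reason}`);
}
```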

## chat.completions.create(options)

OpenAI-compatible chat completions interface.

**Parameters:**

```ts
interface ChatCompletionOptions {
  model?: string; // Model name (optional)
  messages: ChatCompletionMessage[]; // Conversation messages
  temperature?: number; // Randomness (0.0-2.0)
  max_tokens?: number; // Maximum tokens to generate
  top_p?: number; // Nucleus sampling (0.0-1.0)
  frequency_penalty?: number; // Reduce repetition (0.0-2.0)
  presence_penalty?: number; // Encourage new topics (0.0-2.0)
  stop?: string | string[]; // Stop sequences
  stream?: boolean; // Enable streaming (not yet supported)
}

interface ChatCompletionMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
  name?: string; // Optional message author name
}
```

**Returns:**

```ts
Promise<ChatCompletionResponse>;

interface ChatCompletionResponse {
  id: string;
  object: 'chat.completion';
  created: number; // Unix timestamp
  model: string;
  choices: Array<{
    index: number;
    message: ChatCompletionMessage;
    finish_reason: string;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}
```

**Examples:**

Basic chat:

```ts
import { webllm } from 'webllm';

const completion = await webllm.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);
```

With system message:

```ts
const completion = await webllm.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing' },
  ],
  temperature: 0.7,
  max_tokens: 500,
});
```

Drop-in OpenAI replacement:

```ts
// Import as openai for a seamless replacement
import { webllm as openai } from 'webllm';

const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
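
Because the request and response shapes match OpenAI's `chat.completions.create`, code written against a minimal structural type works with either client. A sketch (the `ChatClient` interface and `ask` helper are illustrative, not part of the SDK):

```ts
import { webllm } from 'webllm';

// Structural type covering only the surface this helper needs
interface ChatClient {
  chat: {
    completions: {
      create(options: {
        model?: string;
        messages: { role: 'user' | 'assistant' | 'system'; content: string }[];
      }): Promise<{ choices: { message: { content: string } }[] }>;
    };
  };
}

async function ask(client: ChatClient, question: string): Promise<string> {
  const completion = await client.chat.completions.create({
    messages: [{ role: 'user', content: question }],
  });
  return completion.choices[0].message.content;
}

// Works with webllm, or with any other client matching the same shape
console.log(await ask(webllm, 'Hello!'));
```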

## WebLLMClient

```ts
import { WebLLMClient } from 'webllm';

const client = new WebLLMClient();
```

The constructor automatically:

- Initializes readiness promise tracking
- Listens for `webllm:ready` events
- Checks if the extension is already installed

All functions above are available as instance methods:

```ts
const client = new WebLLMClient();

// Installation & setup
await client.promptInstall();
await client.webLlmReady(10000);
const available = client.isAvailable();
const browserInfo = client.getBrowserInfo();

// Text generation
const result = await client.generateText({ prompt: 'Hello' });

// OpenAI-compatible
const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

## TypeScript support

Full TypeScript definitions are included:

```ts
import type {
  BrowserInfo,
  CancellablePromise,
  GenerateTextOptions,
  GenerateTextResult,
  ChatCompletionOptions,
  ChatCompletionResponse,
  Message,
  ModelHints,
  Usage,
} from 'webllm';
```
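
These types make it straightforward to build typed wrappers around the SDK; a sketch of a helper that applies house defaults (`withDefaults` is an illustrative name, not part of the SDK):

```ts
import { generateText } from 'webllm';
import type { GenerateTextOptions, GenerateTextResult } from 'webllm';

// Apply defaults while letting callers override anything
function withDefaults(
  options: GenerateTextOptions,
): Promise<GenerateTextResult> {
  return generateText({ temperature: 0.3, maxTokens: 256, ...options });
}

const result = await withDefaults({ prompt: 'Hello' });
console.log(result.text);
```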

## Error handling

Extension not available:

```ts
import { generateText, promptInstall } from 'webllm';

try {
  await generateText({ prompt: 'Hello' });
} catch (error) {
  if (error.message.includes('extension not available')) {
    await promptInstall();
  }
}
```

Browser not supported:

```ts
import { promptInstall, getBrowserInfo } from 'webllm';

try {
  await promptInstall();
} catch (error) {
  if (error.message.includes('not supported')) {
    const info = getBrowserInfo();
    console.log(info.reason);
  }
}
```

Installation cancelled:

```ts
try {
  await promptInstall();
} catch (error) {
  if (error.message.includes('cancelled')) {
    console.log('User cancelled installation');
  }
}
```

Request cancelled:

```ts
const request = generateText({ prompt: 'Hello' });

// Cancel the request
request.cancel();

try {
  await request;
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Request was cancelled');
  }
}
```

## Best practices

1. **Check browser compatibility first** - Use `getBrowserInfo()` before prompting for installation.
2. **Use `promptInstall()` for required features** - Best UX for features that need AI.
3. **Use `webLlmReady()` for optional features** - Progressive enhancement with a timeout.
4. **Always provide fallbacks** - Your app should work without AI features (see the sketch after this list).
5. **Handle installation cancellation** - The user might click cancel.
6. **Cache readiness state** - Don't prompt repeatedly in the same session.
7. **Use task-based routing** - Let WebLLM select the best model for your use case.
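
A sketch combining several of these practices: cached readiness, a timeout, and a non-AI fallback (`summarize` and `fallbackSummary` are illustrative names, not part of the SDK):

```ts
import { isAvailable, webLlmReady, generateText } from 'webllm';

// Cache readiness for the session so we never wait or prompt twice
let ready: Promise<boolean> | null = null;

function ensureReady(): Promise<boolean> {
  if (!ready) {
    ready = isAvailable()
      ? Promise.resolve(true)
      : webLlmReady(5000).then(
          () => true,
          () => false,
        );
  }
  return ready;
}

async function summarize(text: string): Promise<string> {
  if (await ensureReady()) {
    const result = await generateText({ task: 'summarization', prompt: text });
    return result.text;
  }
  // Non-AI fallback keeps the feature working without the extension
  return fallbackSummary(text);
}

// Illustrative fallback: first sentence of the text
function fallbackSummary(text: string): string {
  return text.split('. ')[0] + '.';
}
```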