
Vercel AI Provider

The webllm-ai-provider package lets you use the Vercel AI SDK with intelligent model selection instead of hardcoding model names. This means:

  • Zero server costs - AI runs on the user's machine
  • Complete privacy - Data never leaves the browser
  • No model names needed - Describe what you need, WebLLM picks the best model
  • User choice - Let users configure their preferred providers

Unlike traditional providers where you specify exact models like 'claude-3-5-sonnet' or 'gpt-4', WebLLM uses task types and hints to automatically select the best available model.

You describe what you want to do (task + hints), and WebLLM intelligently routes to the best provider based on user configuration.

Install the provider from npm:

npm install webllm-ai-provider

Compare the traditional hosted-API approach with browser-native execution:

Before (Traditional API with Costs):

import Anthropic from '@anthropic-ai/sdk';

// Costs you money, requires API key management
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_KEY });

const result = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [{ role: 'user', content: 'Explain TypeScript' }]
});

After (WebLLM - Zero Cost, Intelligent Selection):

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

// Zero cost, intelligent model selection
const result = await generateText({
  model: webllm({ task: 'qa', hints: { quality: 'high' } }),
  prompt: 'Explain TypeScript'
});

Request flow:

Your App (Vercel AI SDK)
  → webllm({ task: 'qa', hints: { quality: 'high' } })
  → WebLLM Extension (intelligent selection)
  → Selected Provider: Anthropic (user's API key), OpenAI (user's API key), or local models (WebGPU)
  → Response back to your app

  1. Developer specifies task type and hints
  2. WebLLM receives the request in the user’s browser
  3. Analyzes requirements - task type, speed/quality preferences, capabilities needed
  4. Scores available models - based on task compatibility and user configuration (a sketch of this scoring pass follows the list)
  5. Selects best match - highest-scoring available model
  6. Executes request via selected provider
  7. Returns response with selection metadata
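
A minimal sketch of that scoring pass, assuming a hypothetical ModelInfo shape; none of these names come from the package API:

interface ModelInfo {
  id: string;
  tasks: string[];    // task types this model handles well
  sizeGB: number;
  available: boolean; // true when the user has configured this provider
}

// Hypothetical: task compatibility dominates, quality hints bias toward larger models
function scoreModel(model: ModelInfo, task: string, hints: { quality?: string }): number {
  if (!model.available) return -Infinity;          // unconfigured providers never win
  let score = model.tasks.includes(task) ? 10 : 0;
  if (hints.quality === 'high' || hints.quality === 'best') {
    score += model.sizeGB;
  }
  return score;
}

// The highest-scoring model is selected
function selectModel(models: ModelInfo[], task: string, hints: { quality?: string }) {
  return [...models].sort(
    (a, b) => scoreModel(b, task, hints) - scoreModel(a, task, hints)
  )[0];
}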

Task types describe what you want to do (a usage example follows the list):

  • general - General conversation, Q&A
  • summarization - Text summarization
  • translation - Language translation
  • qa - Question answering
  • coding - Code generation/assistance
  • creative - Creative writing
  • extraction - Structured data extraction
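
The call shape stays the same across task types; only the task value changes. For example:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

// Same pattern as 'qa' or 'coding', just a different task
const result = await generateText({
  model: webllm({ task: 'translation' }),
  prompt: 'Translate to French: "Good morning, everyone."'
});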

Guide selection with speed and quality preferences (a combined prototyping example follows the option lists):

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      speed: 'balanced',
      quality: 'high'
    }
  }),
  prompt: 'Write a React component'
});

Speed Options:

  • fastest - Smallest, fastest models
  • fast - Quick with decent quality
  • balanced - Balance of speed and quality
  • quality - Prioritize quality over speed

Quality Options:

  • draft - Good enough for prototyping
  • standard - Production quality
  • high - High quality results
  • best - Best available, regardless of speed
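
For quick prototyping you can combine the cheap ends of both scales:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

// Smallest, fastest model that is still good enough to iterate with
const result = await generateText({
  model: webllm({
    task: 'general',
    hints: { speed: 'fastest', quality: 'draft' }
  }),
  prompt: 'Draft a product description for a coffee mug'
});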

Specify required capabilities:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      capabilities: {
        codeGeneration: true,
        reasoning: true,
        longContext: true
      }
    }
  }),
  prompt: 'Refactor this large codebase...'
});

Available Capabilities:

  • multilingual - Good multilingual support
  • codeGeneration - Strong coding abilities
  • reasoning - Chain-of-thought reasoning
  • longContext - Requires a 32k+ token context window
  • math - Mathematical problem solving
  • functionCalling - Native function calling support (see the tools sketch below)
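
If a model with functionCalling is selected, standard AI SDK tools should work as usual; a sketch (the getWeather tool and its stubbed result are ours, not part of the provider):

import { generateText, tool } from 'ai';
import { z } from 'zod';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'general',
    hints: { capabilities: { functionCalling: true } }
  }),
  tools: {
    // Hypothetical tool with a stubbed implementation
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 })
    })
  },
  prompt: 'What is the weather in Paris?'
});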

Limit model size for mobile or resource-constrained devices:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'qa',
    hints: {
      maxModelSize: 2, // Max 2 GB model
      maxMemory: 4     // Max 4 GB RAM
    }
  }),
  prompt: 'Quick question...'
});
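
These caps can also be derived at runtime; a sketch using navigator.deviceMemory (Chromium-only, reports approximate RAM in GB; the halving heuristic is ours):

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

// deviceMemory is not available in all browsers; assume 4 GB when unsupported
const ramGB = (navigator as any).deviceMemory ?? 4;

const result = await generateText({
  model: webllm({
    task: 'qa',
    hints: {
      maxMemory: ramGB,
      maxModelSize: Math.max(1, ramGB / 2) // leave headroom for the page itself
    }
  }),
  prompt: 'Quick question...'
});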

Prioritize speed for quick customer interactions:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'general', hints: { speed: 'fastest' } }),
  prompt: 'How do I reset my password?'
});

Prioritize quality for deep technical analysis:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      quality: 'best',
      capabilities: {
        reasoning: true,
        codeGeneration: true
      }
    }
  }),
  prompt: 'Analyze this microservices architecture...'
});

For creative writing, pair a quality hint with a higher temperature:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'creative', hints: { quality: 'high' } }),
  prompt: 'Write a short story about AI...',
  temperature: 0.9
});

Streaming works through the AI SDK's streamText:

import { streamText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const { textStream } = await streamText({
  model: webllm({
    task: 'creative',
    hints: { quality: 'best' }
  }),
  prompt: 'Write a story about robots...'
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
}

In a React client component:

'use client';

import { useState } from 'react';
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

export default function ChatPage() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  const handleGenerate = async () => {
    setIsLoading(true);
    const result = await generateText({
      model: webllm({ task: 'qa' }),
      prompt: 'Explain quantum computing simply'
    });
    setResponse(result.text);
    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={handleGenerate} disabled={isLoading}>
        {isLoading ? 'Generating...' : 'Generate'}
      </button>
      {response && <p>{response}</p>}
    </div>
  );
}

Or use the useChat hook with an API route:

'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

// app/api/chat/route.ts
import { streamText } from 'ai';
import { webllm } from 'webllm-ai-provider';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: webllm({
      task: 'general',
      hints: { speed: 'balanced', quality: 'high' }
    }),
    messages,
  });

  return result.toDataStreamResponse();
}

Migrating from the Anthropic provider:

Before:

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});

After:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'general', hints: { quality: 'high' } }),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});

Migrating from the OpenAI provider:

Before:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Write code'
});

After:

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'coding', hints: { quality: 'best' } }),
  prompt: 'Write code'
});

How WebLLM selects models based on task and hints:

| Task          | Hints            | Available Models             | Selected                |
| ------------- | ---------------- | ---------------------------- | ----------------------- |
| qa            | quality: 'high'  | Claude Haiku, Sonnet, Opus   | Claude Sonnet           |
| coding        | speed: 'fastest' | GPT-4, GPT-3.5, Llama 3.2 1B | Llama 3.2 1B            |
| creative      | quality: 'best'  | All providers                | Claude Opus or GPT-4    |
| summarization | speed: 'fast'    | All providers                | GPT-3.5 or Claude Haiku |

webllm(settings) creates a WebLLM language model instance with intelligent selection.

Parameters:

  • settings (optional): Configuration object
    • task (string): Task type
    • hints (object): Model selection hints
      • speed (string): Speed preference
      • quality (string): Quality preference
      • maxModelSize (number): Max model size in GB
      • maxMemory (number): Max memory in GB
      • capabilities (object): Required capabilities
      • modelId (string): Force specific model (expert override)
      • excludeModels (string[]): Exclude specific models (see the examples below)

Returns: LanguageModelV1 instance compatible with Vercel AI SDK
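
For example, the expert overrides look like this (the model IDs are illustrative):

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

// Skip selection entirely and pin one model
const pinned = webllm({
  task: 'coding',
  hints: { modelId: 'claude-3-5-sonnet-20241022' } // illustrative ID
});

// Keep intelligent selection but rule specific models out
const filtered = webllm({
  task: 'coding',
  hints: { excludeModels: ['gpt-3.5-turbo'] } // illustrative ID
});

const result = await generateText({ model: pinned, prompt: 'Write code' });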

Check that the extension is available before using the provider:

import { isAvailable } from 'webllm-ai-provider';

if (!isAvailable()) {
  console.error('WebLLM extension not installed');
  // Show an installation prompt instead of calling the model
}
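
One graceful-degradation pattern is to fall back to your own server route when the extension is missing; a sketch (the /api/fallback-chat route is hypothetical):

import { generateText } from 'ai';
import { webllm, isAvailable } from 'webllm-ai-provider';

async function answer(prompt: string): Promise<string> {
  if (isAvailable()) {
    // Browser-native path: zero cost, private
    const result = await generateText({
      model: webllm({ task: 'qa' }),
      prompt
    });
    return result.text;
  }
  // Hypothetical fallback route backed by a traditional provider
  const res = await fetch('/api/fallback-chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  const data = await res.json();
  return data.text;
}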

This provider only works in browser environments. For server-side AI, use traditional providers:

// Server-side: use a direct API provider
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

export async function POST(req: Request) {
  const result = await generateText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    prompt: 'Server-side generation'
  });
  return Response.json(result);
}
  1. Always check availability before using WebLLM
  2. Use task types to guide intelligent selection
  3. Provide hints for better model matching
  4. Let users know they’re using browser-native AI
  5. Handle errors gracefully when providers are unavailable (see the sketch below)
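
For the last point, a minimal error-handling sketch (what the provider actually throws is an assumption; inspect the real error shape):

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

try {
  const result = await generateText({
    model: webllm({ task: 'general' }),
    prompt: 'Hello!'
  });
  console.log(result.text);
} catch (err) {
  // The extension may be missing or the user may have no provider configured
  console.error('WebLLM generation failed:', err);
  // Show a friendly message or fall back to a server route
}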