Vercel AI Provider
Overview
The webllm-ai-provider package lets you use the Vercel AI SDK with intelligent model selection instead of hardcoding model names. This means:
- Zero server costs - AI runs on the user’s machine
- Complete privacy - Data never leaves the browser
- No model names needed - Describe what you need, WebLLM picks the best model
- User choice - Let users configure their preferred providers
Key Concept: Task-Based Selection
Unlike traditional providers where you specify exact models like 'claude-3-5-sonnet' or 'gpt-4', WebLLM uses task types and hints to automatically select the best available model.
You describe what you want to do (task + hints), and WebLLM intelligently routes to the best provider based on user configuration.
Quick Start
Installation
Direct WebLLM SDK:

```sh
npm install webllm
```

Vercel AI SDK:

```sh
npm install webllm-ai-provider ai
```

Basic Usage
Compare approaches for browser-native AI execution:
Before (Traditional API with Costs):
```ts
import Anthropic from '@anthropic-ai/sdk';

// Costs you money, requires API key management
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_KEY });
const result = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024, // required by the Messages API
  messages: [{ role: 'user', content: 'Explain TypeScript' }]
});
```

After (WebLLM - Zero Cost, Intelligent Selection):
```ts
import { generateText } from 'webllm';

// Zero cost, intelligent model selection
const result = await generateText({
  task: 'qa',
  prompt: 'Explain TypeScript',
  hints: { quality: 'high' }
});
```

Before (Traditional Provider with API Costs):
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

// Costs you money, uses your API key, requires a server
const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Explain TypeScript'
});
```

After (WebLLM - Zero Cost, Intelligent Selection):
```ts
import { webllm } from 'webllm-ai-provider';
import { generateText } from 'ai';

// Zero cost, intelligent model selection
const result = await generateText({
  model: webllm({ task: 'qa', hints: { quality: 'high' } }),
  prompt: 'Explain TypeScript'
});
```

How It Works
Architecture Flow
```
Your App (Vercel AI SDK)
        ↓
webllm({ task: 'qa', hints: { quality: 'high' } })
        ↓
WebLLM Extension (intelligent selection)
        ↓
Selected Provider:
  - Anthropic (user's API key)
  - OpenAI (user's API key)
  - Local models (WebGPU)
        ↓
Response back to your app
```

Request Routing Process
1. Developer specifies a task type and hints
2. WebLLM receives the request in the user’s browser
3. Analyzes requirements - task type, speed/quality preferences, capabilities needed
4. Scores available models - based on task compatibility and user configuration (see the sketch below)
5. Selects the best match - the highest-scoring available model
6. Executes the request via the selected provider
7. Returns the response with selection metadata
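The scoring step is internal to the extension, but conceptually it might look like the sketch below. `ModelInfo`, the tier maps, and `scoreModel` are illustrative assumptions, not the package’s actual internals:

```ts
// Illustrative sketch only: ModelInfo, the tier maps, and scoreModel are
// assumptions about how scoring could work, not webllm's real implementation.
interface ModelInfo {
  id: string;
  tasks: string[];     // task types the model handles well
  speedTier: number;   // 1 (slowest) .. 4 (fastest)
  qualityTier: number; // 1 (draft) .. 4 (best)
  available: boolean;  // e.g. user configured an API key or downloaded the model
}

const SPEED = { quality: 1, balanced: 2, fast: 3, fastest: 4 } as const;
const QUALITY = { draft: 1, standard: 2, high: 3, best: 4 } as const;

function scoreModel(
  model: ModelInfo,
  task: string,
  hints: { speed?: keyof typeof SPEED; quality?: keyof typeof QUALITY }
): number {
  if (!model.available) return -Infinity;          // never select unavailable models
  let score = model.tasks.includes(task) ? 10 : 0; // task compatibility dominates
  if (hints.speed) score -= Math.abs(model.speedTier - SPEED[hints.speed]);
  if (hints.quality) score -= Math.abs(model.qualityTier - QUALITY[hints.quality]);
  return score;
}

// The highest-scoring model wins (assumes at least one model is registered).
function selectModel(models: ModelInfo[], task: string, hints: Parameters<typeof scoreModel>[2]) {
  return models.reduce((best, m) =>
    scoreModel(m, task, hints) > scoreModel(best, task, hints) ? m : best
  );
}
```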
Task Types
Describe what you want to do:
- `general` - General conversation, Q&A
- `summarization` - Text summarization
- `translation` - Language translation
- `qa` - Question answering
- `coding` - Code generation/assistance
- `creative` - Creative writing
- `extraction` - Structured data extraction (see the example below)
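As an example of the `extraction` task, the provider should pair naturally with the AI SDK’s structured-output helper. A minimal sketch, assuming `webllm-ai-provider` works with `generateObject`; the schema and prompt are made up for illustration:

```ts
import { generateObject } from 'ai';
import { webllm } from 'webllm-ai-provider';
import { z } from 'zod';

// Pull structured fields out of free text (illustrative schema).
const { object } = await generateObject({
  model: webllm({ task: 'extraction' }),
  schema: z.object({
    name: z.string(),
    email: z.string(),
  }),
  prompt: 'Extract the contact details from: "Reach Jane Doe at jane@example.com"',
});

console.log(object.name, object.email);
```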
Model Hints
Section titled “Model Hints”Performance Preferences
Guide selection with speed and quality preferences:
Direct WebLLM SDK:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  task: 'coding',
  hints: { speed: 'balanced', quality: 'high' },
  prompt: 'Write a React component'
});
```

Vercel AI SDK:

```ts
import { webllm } from 'webllm-ai-provider';
import { generateText } from 'ai';

const result = await generateText({
  model: webllm({ task: 'coding', hints: { speed: 'balanced', quality: 'high' } }),
  prompt: 'Write a React component'
});
```

Speed Options:
- `fastest` - Smallest, fastest models
- `fast` - Quick with decent quality
- `balanced` - Balance of speed and quality
- `quality` - Prioritize quality over speed
Quality Options:
- `draft` - Good enough for prototyping
- `standard` - Production quality
- `high` - High-quality results
- `best` - Best available, regardless of speed
Capability Requirements
Specify required capabilities:
Direct WebLLM SDK:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  task: 'coding',
  hints: {
    capabilities: {
      codeGeneration: true,
      reasoning: true,
      longContext: true
    }
  },
  prompt: 'Refactor this large codebase...'
});
```

Vercel AI SDK:

```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      capabilities: {
        codeGeneration: true,
        reasoning: true,
        longContext: true
      }
    }
  }),
  prompt: 'Refactor this large codebase...'
});
```

Available Capabilities:
- `multilingual` - Good multilingual support
- `codeGeneration` - Strong coding abilities
- `reasoning` - Chain-of-thought reasoning
- `longContext` - Needs a 32k+ context window
- `math` - Mathematical problem solving
- `functionCalling` - Native function calling support (see the sketch below)
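Requiring `functionCalling` matters when you pass tools through the AI SDK. A hedged sketch, assuming the provider forwards tool definitions to the selected model; the `getWeather` tool is invented for illustration:

```ts
import { generateText, tool } from 'ai';
import { webllm } from 'webllm-ai-provider';
import { z } from 'zod';

const result = await generateText({
  model: webllm({
    task: 'general',
    hints: { capabilities: { functionCalling: true } }, // only models with native tool support
  }),
  tools: {
    // Illustrative tool; execute returns dummy data.
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }),
    }),
  },
  prompt: 'What is the weather in Paris?',
});
```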
Resource Constraints
Limit model size for mobile or resource-constrained devices:
```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'qa',
    hints: {
      maxModelSize: 2, // Max 2 GB model
      maxMemory: 4     // Max 4 GB RAM
    }
  }),
  prompt: 'Quick question...'
});
```
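On the web, these limits can sometimes be derived from the device itself. A minimal sketch using the Chromium-only `navigator.deviceMemory` API; the 4 GB fallback and the half-of-RAM model budget are arbitrary illustrative choices:

```ts
import { webllm } from 'webllm-ai-provider';

// navigator.deviceMemory is Chromium-only and reports approximate RAM in GB;
// the cast is needed because it is missing from TypeScript's DOM typings.
const deviceMemory = (navigator as any).deviceMemory ?? 4;

const model = webllm({
  task: 'qa',
  hints: {
    maxMemory: deviceMemory,
    maxModelSize: Math.max(1, Math.floor(deviceMemory / 2)), // budget half of RAM for the model
  },
});
```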
Common Use Cases

Fast Customer Support
Prioritize speed for quick customer interactions:
Direct WebLLM SDK:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  task: 'general',
  hints: { speed: 'fastest' },
  prompt: 'How do I reset my password?'
});
```

Vercel AI SDK:

```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'general', hints: { speed: 'fastest' } }),
  prompt: 'How do I reset my password?'
});
```

Complex Code Analysis
Prioritize quality for deep technical analysis:
Direct WebLLM SDK:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  task: 'coding',
  hints: {
    quality: 'best',
    capabilities: { reasoning: true, codeGeneration: true }
  },
  prompt: 'Analyze this microservices architecture...'
});
```

Vercel AI SDK:

```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      quality: 'best',
      capabilities: { reasoning: true, codeGeneration: true }
    }
  }),
  prompt: 'Analyze this microservices architecture...'
});
```

Creative Writing
Balance capability and creativity:
Direct WebLLM SDK:

```ts
import { generateText } from 'webllm';

const result = await generateText({
  task: 'creative',
  hints: { quality: 'high' },
  prompt: 'Write a short story about AI...',
  temperature: 0.9
});
```

Vercel AI SDK:

```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'creative', hints: { quality: 'high' } }),
  prompt: 'Write a short story about AI...',
  temperature: 0.9
});
```

Streaming Responses
```ts
import { webllm } from 'webllm-ai-provider';
import { streamText } from 'ai';

const { textStream } = await streamText({
  model: webllm({ task: 'creative', hints: { quality: 'best' } }),
  prompt: 'Write a story about robots...'
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
}
```

React Integration
Client Component
Section titled “Client Component”'use client';
import { useState } from 'react';import { generateText } from 'ai';import { webllm } from 'webllm-ai-provider';
export default function ChatPage() { const [response, setResponse] = useState(''); const [isLoading, setIsLoading] = useState(false);
const handleGenerate = async () => { setIsLoading(true);
const result = await generateText({ model: webllm({ task: 'qa' }), prompt: 'Explain quantum computing simply' });
setResponse(result.text); setIsLoading(false); };
return ( <div> <button onClick={handleGenerate} disabled={isLoading}> {isLoading ? 'Generating...' : 'Generate'} </button> {response && <p>{response}</p>} </div> );}With useChat Hook
```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

API Route (Next.js App Router)
```ts
import { streamText } from 'ai';
import { webllm } from 'webllm-ai-provider';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: webllm({
      task: 'general',
      hints: { speed: 'balanced', quality: 'high' }
    }),
    messages,
  });

  return result.toDataStreamResponse();
}
```

Migration Guide
From Anthropic
Before:
```ts
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});
```

After:
```ts
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'general', hints: { quality: 'high' } }),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});
```

From OpenAI
Before:
```ts
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Write code'
});
```

After:
```ts
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'coding', hints: { quality: 'best' } }),
  prompt: 'Write code'
});
```

Selection Examples
How WebLLM selects models based on task and hints:
| Task | Hints | Available Models | Selected |
|---|---|---|---|
| `qa` | `quality: 'high'` | Claude Haiku, Sonnet, Opus | Claude Sonnet |
| `coding` | `speed: 'fastest'` | GPT-4, GPT-3.5, Llama 3.2 1B | Llama 3.2 1B |
| `creative` | `quality: 'best'` | All providers | Claude Opus or GPT-4 |
| `summarization` | `speed: 'fast'` | All providers | GPT-3.5 or Claude Haiku |
API Reference
webllm(settings?)
Creates a WebLLM language model instance with intelligent selection.
Parameters:
- `settings` (optional): Configuration object
  - `task` (string): Task type
  - `hints` (object): Model selection hints
    - `speed` (string): Speed preference
    - `quality` (string): Quality preference
    - `maxModelSize` (number): Max model size in GB
    - `maxMemory` (number): Max memory in GB
    - `capabilities` (object): Required capabilities
  - `modelId` (string): Force a specific model (expert override)
  - `excludeModels` (string[]): Exclude specific models
Returns: a `LanguageModelV1` instance compatible with the Vercel AI SDK
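For example, the override settings let you pin or exclude specific models while keeping task-based selection as the default. The model IDs below are illustrative; real IDs depend on the providers the user has configured:

```ts
import { webllm } from 'webllm-ai-provider';

// Expert override: pin one model and skip selection entirely (ID is illustrative).
const pinned = webllm({ modelId: 'claude-3-5-sonnet-20241022' });

// Keep intelligent selection, but never route to certain models (IDs illustrative).
const filtered = webllm({
  task: 'coding',
  hints: { quality: 'best' },
  excludeModels: ['gpt-3.5-turbo'],
});
```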
Troubleshooting
Extension Not Found
```ts
import { isAvailable } from 'webllm';

if (!isAvailable()) {
  console.error('WebLLM extension not installed');
  // Show installation prompt
}
```
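When the extension is missing, you can degrade gracefully instead of failing. A minimal sketch that falls back to a server-backed endpoint; `/api/generate` is a hypothetical route you would implement yourself:

```ts
import { isAvailable } from 'webllm';
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

async function answer(prompt: string): Promise<string> {
  if (isAvailable()) {
    // Browser-native path: zero cost, private.
    const { text } = await generateText({ model: webllm({ task: 'qa' }), prompt });
    return text;
  }
  // Fallback: hypothetical server route backed by a traditional provider.
  const res = await fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const { text } = await res.json();
  return text;
}
```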
Node.js Environment

This provider only works in browser environments. For server-side AI, use traditional providers:
```ts
// Server-side: use a direct API provider
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

export async function POST(req: Request) {
  const result = await generateText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    prompt: 'Server-side generation'
  });

  return Response.json(result);
}
```

Best Practices
- Always check availability before using WebLLM
- Use task types to guide intelligent selection
- Provide hints for better model matching
- Let users know they’re using browser-native AI
- Handle errors gracefully when providers are unavailable (see the sketch below)
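A minimal sketch of the last point, assuming selection and provider failures surface as ordinary thrown errors (the error behavior is an assumption, not a documented contract):

```ts
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

try {
  const { text } = await generateText({
    model: webllm({ task: 'qa' }),
    prompt: 'Explain TypeScript',
  });
  console.log(text);
} catch (err) {
  // Assumed behavior: selection or provider failures throw like any other error.
  console.error('WebLLM request failed:', err);
  // e.g. show a retry button or fall back to a server route
}
```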