Provider Management
Overview
WebLLM supports multiple AI providers with automatic fallback. Users configure their preferred providers and priorities, and the extension handles the rest.
20+ Supported Providers
API Providers
- Anthropic (Claude) - Requires API key from console.anthropic.com
- OpenAI (GPT) - Requires API key from platform.openai.com
- Custom OpenAI-compatible - Any API following OpenAI format
Local Providers
- Local Models - Run entirely in the browser via WebGPU/WASM
- Llama 3.2 1B (~1.2GB)
- Phi-3 Mini (~2GB)
- Other ONNX-compatible models
Priority System
Users configure provider priority in extension settings:
```
Priority Order (drag to reorder):
1. 🟢 Local Model (Llama 3.2 1B) [Enabled]
2. 🔑 Anthropic API [Enabled]
3. 🔑 OpenAI API [Disabled]
```
How Priority Works
When a request comes in (see the sketch after this list):
- Try highest priority provider first
- If unavailable (no API key, model not downloaded, rate limited), try next
- Continue until success or all providers exhausted
- Return error only if all providers fail
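The selection logic amounts to a short loop over the priority-ordered list. A minimal sketch, assuming the `Provider`, `LLMRequest`, and `LLMResponse` types described under Provider Interface below; the function name and error handling here are illustrative, not the extension's actual internals:

```ts
// Sketch of priority-based fallback (illustrative, not the extension's real code)
async function executeWithFallback(
  providers: Provider[],        // already sorted by user-configured priority
  request: LLMRequest
): Promise<LLMResponse> {
  const errors: Error[] = [];

  for (const provider of providers) {
    // Skip providers that are disabled, missing a key, not downloaded, or rate limited
    if (!(await provider.isAvailable())) continue;

    try {
      return await provider.execute(request); // first success wins
    } catch (err) {
      errors.push(err as Error);              // remember the failure and try the next provider
    }
  }

  // Only reached when every provider was unavailable or failed
  throw new AggregateError(errors, 'All providers exhausted');
}
```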
Example Scenarios
Scenario 1: Local Model Available
- Request → Local Model (free, instant)
- API providers never called
- No cost, maximum privacy
Scenario 2: Local Model Unavailable
- Request → Local Model (not downloaded) → Skip
- → Anthropic API (has key) → Success
- Uses user’s API key, user pays
Scenario 3: All Providers Need Setup
- Request → Extension prompts user to configure
- User adds API key or downloads model
- Request retried automatically
Provider Configuration
Adding API Provider
- Open extension settings
- Click “Add Provider” or configure existing
- Select provider (Anthropic, OpenAI, or Custom)
- Enter API key
- (Optional) Test connection
- Enable and set priority
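The exact storage schema is not documented here, but the entry created by these steps presumably carries roughly the fields you just filled in. A hypothetical sketch, assuming configuration lives in `chrome.storage.local`; the key name and field names are illustrative only:

```ts
// Illustrative shape only, not the extension's actual storage schema
interface ProviderConfig {
  id: string;
  type: 'anthropic' | 'openai' | 'custom' | 'local';
  apiKey?: string;      // API providers only
  baseUrl?: string;     // custom OpenAI-compatible endpoints only
  enabled: boolean;
  priority: number;     // lower number = tried first
}

// Example: persist a newly added Anthropic provider
const anthropic: ProviderConfig = {
  id: 'anthropic',
  type: 'anthropic',
  apiKey: 'sk-ant-...',
  enabled: true,
  priority: 2,
};

await chrome.storage.local.set({ providers: [anthropic] });
```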
Adding Local Model
- Open extension settings
- Go to “Model Management”
- Browse available models
- Click “Download” (downloads to IndexedDB)
- Once downloaded, the model appears in the provider list
- Enable and set priority
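Since these models are large (roughly 1 to 2 GB, per the list above) and are stored in IndexedDB, it can be worth checking available storage before starting a download. A small sketch using the standard `navigator.storage.estimate()` API; the size constant is just the approximate Llama 3.2 1B figure from earlier:

```ts
// Rough pre-download check using the standard StorageManager API
const MODEL_SIZE_BYTES = 1.2 * 1024 ** 3; // ~1.2 GB, e.g. Llama 3.2 1B

const { usage = 0, quota = 0 } = await navigator.storage.estimate();
const freeBytes = quota - usage;

if (freeBytes < MODEL_SIZE_BYTES) {
  console.warn(`Only ${(freeBytes / 1024 ** 3).toFixed(1)} GB free; the download may fail.`);
}
```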
Custom OpenAI-Compatible API
For services like:
- Together AI
- Anyscale
- Local LM Studio
- Self-hosted vLLM
Configuration:
```
Provider: Custom OpenAI-compatible
Base URL: https://api.together.xyz/v1
API Key: your-api-key
Model ID: meta-llama/Llama-3-8b-chat-hf
```
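"OpenAI-compatible" means the service accepts the standard `/chat/completions` request shape. As a sketch of what a request built from the configuration above would look like (the endpoint path and payload follow the common OpenAI convention; exact headers can vary by service):

```ts
// Standard OpenAI-style chat completion call against a custom base URL
const baseUrl = 'https://api.together.xyz/v1';
const apiKey = 'your-api-key';

const response = await fetch(`${baseUrl}/chat/completions`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model: 'meta-llama/Llama-3-8b-chat-hf',
    messages: [{ role: 'user', content: 'Summarize this page.' }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```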
Provider Availability
A provider is considered available if:
For API Providers
- ✅ API key is configured
- ✅ Provider is enabled
- ✅ Not rate-limited
- ✅ Internet connection available
For Local Providers
- ✅ Model is downloaded
- ✅ Provider is enabled
- ✅ Sufficient memory available
- ✅ WebGPU/WASM support detected
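Put together, an availability check for a local provider might look roughly like the sketch below. The WebGPU/WASM detection uses standard browser APIs; the `enabled` and `isDownloaded` fields stand in for whatever state the extension actually tracks, and the memory check is left as a comment:

```ts
// Illustrative availability check for a local provider
interface LocalModelState {
  enabled: boolean;
  isDownloaded: boolean; // model weights present in IndexedDB
}

function hasWebGPU(): boolean {
  return typeof navigator !== 'undefined' && 'gpu' in navigator;
}

function hasWasm(): boolean {
  return typeof WebAssembly === 'object';
}

async function isLocalProviderAvailable(state: LocalModelState): Promise<boolean> {
  if (!state.enabled || !state.isDownloaded) return false;
  if (!hasWebGPU() && !hasWasm()) return false;

  // A real implementation would also verify sufficient memory headroom here.
  return true;
}
```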
Automatic Fallback
The extension handles failures gracefully:
```js
// User doesn't see this complexity
const result = await llm.summarize(text);

// Behind the scenes:
// 1. Try local model → Out of memory
// 2. Try Anthropic → Rate limited
// 3. Try OpenAI → Success ✓
```
Failure Reasons
Common reasons for fallback:
- Local Model
  - Not downloaded
  - Out of memory
  - GPU not available
- API Provider
  - Invalid API key
  - Rate limit exceeded
  - Network error
  - Insufficient credits
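How a given error maps to "skip and fall back" is an implementation detail, but for HTTP-based providers the status code usually tells the story. A hypothetical classifier; the status-code mapping follows common API conventions and the names are illustrative:

```ts
// Illustrative mapping from an API failure to a fallback reason
type FailureReason =
  | 'invalid-key'           // 401/403: misconfigured, skip this provider
  | 'rate-limited'          // 429: cool down, fall back now
  | 'insufficient-credits'
  | 'network-error'
  | 'unknown';

function classifyApiFailure(status: number | null): FailureReason {
  if (status === null) return 'network-error';          // request never completed
  if (status === 401 || status === 403) return 'invalid-key';
  if (status === 429) return 'rate-limited';
  if (status === 402) return 'insufficient-credits';    // some services use 402 for billing issues
  return 'unknown';
}
```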
Provider Interface
All providers implement the same interface:
```ts
interface Provider {
  name: string;
  type: 'api' | 'local';

  // Check if ready to use
  isAvailable(): Promise<boolean>;

  // Execute request
  execute(request: LLMRequest): Promise<LLMResponse>;

  // Streaming support
  stream?(request: LLMRequest): Promise<ReadableStream>;
}
```
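To make the contract concrete, here is a skeleton of what a local provider could look like. This is illustrative only: the class, the `engine` object, and the `request.prompt` field are assumptions, and the returned object follows the normalized response shape described in the next subsection:

```ts
// Illustrative skeleton, not the extension's real local provider
class LocalModelProvider implements Provider {
  name = 'local-llama-3.2-1b';
  type = 'local' as const;

  // `engine` stands in for whatever inference runtime loads the downloaded weights
  constructor(private engine?: { generate(prompt: string): Promise<string> }) {}

  async isAvailable(): Promise<boolean> {
    // Ready only once the model has been loaded into the engine
    return this.engine !== undefined;
  }

  async execute(request: LLMRequest): Promise<LLMResponse> {
    if (!this.engine) throw new Error('Model not loaded');

    const started = performance.now();
    const content = await this.engine.generate(request.prompt); // `prompt` field is assumed

    return {
      content,
      usage: { inputTokens: 0, outputTokens: 0 }, // local inference is free; counts omitted in this sketch
      metadata: {
        provider: this.name,
        model: 'llama-3.2-1b',
        latency: performance.now() - started,
      },
    };
  }
}
```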
Normalized Response
Regardless of provider, responses use a standard format:
```
{
  content: string,        // Generated text
  usage: {
    inputTokens: number,
    outputTokens: number,
    cost?: number         // If known
  },
  metadata: {
    provider: string,     // Which provider was used
    model: string,        // Which model
    latency: number       // Response time in ms
  }
}
```
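Expressed as TypeScript, this is a direct transcription of the shape above; the interface names themselves are illustrative, since the SDK's exported type names are not shown here:

```ts
// Illustrative typings for the normalized response
interface LLMUsage {
  inputTokens: number;
  outputTokens: number;
  cost?: number;       // only when the provider's pricing is known
}

interface LLMMetadata {
  provider: string;    // which provider handled the request
  model: string;       // which model was used
  latency: number;     // response time in ms
}

interface LLMResponse {
  content: string;     // generated text
  usage: LLMUsage;
  metadata: LLMMetadata;
}
```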
Cost Tracking
When using API providers with known pricing:
```js
// Response includes cost information
const result = await llm.generate(prompt);

console.log(result.usage);
// {
//   inputTokens: 150,
//   outputTokens: 200,
//   cost: 0.0012  // $0.0012
// }
```
View accumulated costs in the extension:
- Per-origin spending
- Daily/weekly/monthly totals
- Breakdown by provider
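The cost figure itself is just token counts multiplied by a per-token rate. A sketch of the arithmetic with placeholder prices; the rates below are made-up example values, not any provider's actual pricing:

```ts
// Cost = tokens × per-token rate (rates are placeholders, not real pricing)
interface Pricing {
  inputPerMillion: number;   // USD per 1M input tokens
  outputPerMillion: number;  // USD per 1M output tokens
}

function estimateCost(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Example with placeholder rates
const cost = estimateCost(150, 200, { inputPerMillion: 3, outputPerMillion: 15 });
console.log(cost.toFixed(6)); // "0.003450"
```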
Best Practices
For Users
- Start with local models - Free and private
- Add API key as backup - For complex tasks
- Monitor costs - Check spending in settings
- Revoke unused permissions - Keep control
For Developers
Section titled “For Developers”- Respect user’s choices - Don’t require specific provider
- Handle unavailability - Extension might not be installed
- Degrade gracefully - Offer fallback UX if LLM unavailable
- Be transparent - Tell users what you’ll use AI for
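For the "Handle unavailability" and "Degrade gracefully" points above, a hedged sketch of what this might look like on a page. The `webllm` global and its `summarize` method are assumptions used for illustration; see the Developer SDK docs for the actual API surface:

```ts
// Hypothetical detection; the `webllm` global is an assumption, not the documented API
type WebLLM = { summarize(text: string): Promise<{ content: string }> };
const llm = (window as { webllm?: WebLLM }).webllm;

async function summarizeWithFallback(text: string): Promise<string> {
  if (!llm) {
    // Extension not installed: degrade gracefully instead of failing
    return text.slice(0, 280) + '…';
  }
  try {
    const { content } = await llm.summarize(text);
    return content;
  } catch {
    // All providers failed or the user declined; same fallback path
    return text.slice(0, 280) + '…';
  }
}
```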
Provider Limits
Rate Limits
API providers have rate limits:
- Extension tracks and respects them
- Shows cooldown timer to user
- Automatically tries next provider
Token Limits
Per-request maximums:
- Local models: Typically 2048-4096 tokens
- API models: Varies by model (8k-200k tokens)
- Extension validates before sending
Context Windows
Different models support different context sizes:
- Extension warns if prompt too long
- Truncates or chunks if necessary
- Shows warning to user
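An exact token count requires the model's tokenizer, but a rough heuristic of about 4 characters per token for English text is enough for a pre-flight check. A sketch of that validation; the heuristic and the limit values are illustrative:

```ts
// Rough pre-flight length check (about 4 characters per token for English text)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitToContext(prompt: string, maxTokens = 4096): { prompt: string; truncated: boolean } {
  if (estimateTokens(prompt) <= maxTokens) {
    return { prompt, truncated: false };
  }
  // Truncate to the approximate character budget and let the caller warn the user
  return { prompt: prompt.slice(0, maxTokens * 4), truncated: true };
}

// Example: fit page text into a small local-model context window
const article: string = document.body.innerText;
const { prompt, truncated } = fitToContext(article, 2048);
if (truncated) console.warn('Prompt was truncated to fit the model context window.');
```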
Next Steps
Section titled “Next Steps”- Learn about Data & Privacy
- Explore Extension Architecture
- See Developer SDK