Configuring Your AI Providers
WebLLM supports multiple AI providers, and you can use as many as you want. This guide will help you configure providers to match your needs.
Understanding Providers
A provider is where the AI processing happens. WebLLM supports three types:
🏠 Local Providers
- AI runs on your computer
- Free forever
- Private - data never leaves your device
- Works offline
- Good for basic tasks
☁️ Cloud Providers
- AI runs on the provider’s servers (using your API key)
- Premium capabilities (Claude, GPT-4, etc.)
- Your data goes to provider you choose
- Requires internet connection
- You pay provider directly (usually $0.01-0.10 per request)
🔀 Hybrid
- Use both local and cloud
- Automatic routing based on task complexity
- Best of both worlds
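These three modes can be pictured as a small configuration union. The TypeScript sketch below is purely illustrative; the type and field names are assumptions, not WebLLM’s actual schema:

```typescript
// Illustrative types only -- not WebLLM's real configuration format.
type LocalProvider = {
  kind: "local";
  model: string;            // e.g. "llama-3.2-1b"
  gpuAcceleration: boolean;
};

type CloudProvider = {
  kind: "cloud";
  vendor: "anthropic" | "openai";
  apiKey: string;           // placeholder key below, never a real one
  monthlyLimitUsd?: number;
};

type Provider = LocalProvider | CloudProvider;

// A hybrid setup is simply a prioritized list containing both kinds.
const providers: Provider[] = [
  { kind: "local", model: "llama-3.2-1b", gpuAcceleration: true },
  { kind: "cloud", vendor: "anthropic", apiKey: "sk-ant-placeholder", monthlyLimitUsd: 20 },
];
```

The point of the union is that everything downstream (priority, fallback, routing) can treat local and cloud providers uniformly.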
Setting Up Local Models
Local models are free and completely private. Perfect for getting started!
Download a Local Model
- Open WebLLM extension
- Click “Providers” tab
- Click “Add Provider” → “Local Model”
- Choose a model to download:
Recommended Models
Llama 3.2 1B - 1.2GB
- Best for: Getting started, quick tasks
- Speed: Very fast
- Quality: Good for summaries, simple Q&A
- Requirements: 4GB RAM
Llama 3.2 3B - 3.5GB
- Best for: Better quality responses
- Speed: Fast
- Quality: Great for most tasks
- Requirements: 6GB RAM
Phi-3 Mini - 2.3GB
- Best for: Code and technical content
- Speed: Fast
- Quality: Excellent for programming
- Requirements: 4GB RAM
Phi-3 Medium - 7.6GB
- Best for: Complex reasoning
- Speed: Moderate
- Quality: Very high quality
- Requirements: 12GB RAM
- Click “Download” on your chosen model
- Wait for download to complete (one-time only)
- Click “Activate”
✅ Local model ready!
Using Multiple Local Models
You can download multiple local models for different purposes:
- Quick model (1B) for fast, simple tasks
- Quality model (3B+) for important tasks
- Code model (Phi-3) for programming tasks
WebLLM can automatically choose the right model for each task.
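A per-task model picker along these lines could look like the following sketch. The task categories and model IDs are illustrative; WebLLM’s actual selection logic is internal:

```typescript
// Hypothetical sketch of per-task local-model selection.
// Task kinds and model identifiers are illustrative only.
type TaskKind = "quick" | "quality" | "code";

function pickLocalModel(task: TaskKind): string {
  switch (task) {
    case "quick":   return "llama-3.2-1b"; // fast, simple tasks
    case "quality": return "llama-3.2-3b"; // important tasks
    case "code":    return "phi-3-mini";   // programming tasks
  }
}
```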
Managing Downloaded Models
View your models:
- Providers → Local Models → See all downloaded models
Delete a model:
- Click the model → Click “Delete” → Confirm
- Frees up disk space
Update a model:
- Models auto-update when new versions are available
- You’ll get a notification
Setting Up Cloud Providers
Cloud providers give you access to state-of-the-art AI models.
Anthropic (Claude)
Claude excels at long conversations, analysis, and following instructions carefully.
Get an API Key:
- Go to console.anthropic.com
- Sign up or log in
- Go to API Keys
- Click “Create Key”
- Copy the key (starts with sk-ant-)
Add to WebLLM:
- Open WebLLM → Providers
- Click “Add Provider” → “Anthropic”
- Paste your API key
- (Optional) Set model preference:
- Claude 3.5 Sonnet - Best balance (recommended)
- Claude 3 Opus - Highest quality
- Claude 3 Haiku - Fastest, cheapest
- (Optional) Set monthly spending limit
- Click “Save”
✅ Claude ready to use!
OpenAI (GPT-4, ChatGPT)
GPT-4 is excellent for creative writing, general knowledge, and versatility.
Get an API Key:
- Go to platform.openai.com
- Sign up or log in
- Go to API Keys
- Click “Create new secret key”
- Copy the key (starts with sk-)
Add to WebLLM:
- Open WebLLM → Providers
- Click “Add Provider” → “OpenAI”
- Paste your API key
- (Optional) Set model preference:
- GPT-4 Turbo - Best balance (recommended)
- GPT-4 - Highest quality
- GPT-3.5 Turbo - Fastest, cheapest
- (Optional) Set monthly spending limit
- Click “Save”
✅ OpenAI ready to use!
Other Providers
WebLLM also supports:
- Google (Gemini) - Coming soon
- Mistral - Coming soon
- Custom providers - For advanced users
Provider Priority & Fallback
When you have multiple providers, WebLLM needs to know which one to use. You set the priority order.
Setting Priority
- Open WebLLM → Providers
- Drag providers to reorder them
- Top = Highest Priority
Example priority order:
- Local (Llama 3.2 1B) - Try first
- Claude 3.5 Sonnet - Fallback for complex tasks
- GPT-4 Turbo - Fallback if Claude fails
How Fallback Works
WebLLM tries providers in order:
- Try first provider (e.g., local model)
- If it can’t handle the request → Try next provider
- Continue until one succeeds
Automatic fallback happens when:
- Local model doesn’t have enough capabilities
- Provider is offline or rate-limited
- Request exceeds provider’s limits
- Provider returns an error
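The try-in-priority-order behavior described above can be sketched as a simple loop. The `ChatProvider` type and `sendWithFallback` function are hypothetical stand-ins, not WebLLM internals:

```typescript
// Sketch of priority-ordered fallback: try each provider in turn,
// return the first success, surface all errors if everything fails.
type ChatProvider = {
  name: string;
  send: (prompt: string) => Promise<string>;
};

async function sendWithFallback(providers: ChatProvider[], prompt: string): Promise<string> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return await p.send(prompt); // first success wins
    } catch (err) {
      // Offline, rate-limited, over a limit, or an error response:
      // record it and move on to the next provider in priority order.
      errors.push(`${p.name}: ${err}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```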
Smart Routing (Advanced)
Enable smart routing to automatically choose the best provider:
- Settings → Providers → Enable “Smart Routing”
- WebLLM will:
- Analyze each request
- Route simple tasks to fast/cheap providers
- Route complex tasks to powerful providers
Example:
- “Summarize this email” → Local model (fast, free)
- “Write a business proposal” → Claude (higher quality)
- “Fix this code bug” → GPT-4 (best for code)
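Conceptually, smart routing is a classifier over incoming requests. The toy heuristic below only illustrates the idea; WebLLM’s actual analysis is internal and certainly more sophisticated:

```typescript
// Toy complexity heuristic: long prompts or "complex-task" keywords
// go to a cloud provider, everything else stays local.
// Keyword list and threshold are made up for illustration.
function routeRequest(prompt: string): "local" | "cloud" {
  const complexKeywords = ["proposal", "essay", "code", "refactor", "analyze"];
  const looksComplex =
    prompt.length > 500 ||
    complexKeywords.some((k) => prompt.toLowerCase().includes(k));
  return looksComplex ? "cloud" : "local";
}
```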
Spending Limits & Cost Control
Protect yourself from unexpected API costs.
Set Spending Limits
For each cloud provider:
- Providers → [Provider Name] → Settings
- Set limits:
- Daily limit - Max spending per day
- Monthly limit - Max spending per month
- Per-request limit - Max cost per single request
- Click “Save”
What happens when limit is reached:
- Provider is temporarily disabled
- WebLLM falls back to next provider
- You get a notification
- Limit resets automatically (daily/monthly)
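The limit check behind this behavior can be sketched as a single guard function. Field names mirror the three limit types above but are otherwise assumptions:

```typescript
// Sketch of a spending-limit guard: a provider is available only if the
// estimated cost of the next request stays under every configured limit.
type Limits = { dailyUsd?: number; monthlyUsd?: number; perRequestUsd?: number };
type Spend = { todayUsd: number; monthUsd: number };

function providerAvailable(limits: Limits, spend: Spend, estCostUsd: number): boolean {
  if (limits.perRequestUsd !== undefined && estCostUsd > limits.perRequestUsd) return false;
  if (limits.dailyUsd !== undefined && spend.todayUsd + estCostUsd > limits.dailyUsd) return false;
  if (limits.monthlyUsd !== undefined && spend.monthUsd + estCostUsd > limits.monthlyUsd) return false;
  return true; // under all configured limits
}
```

When this returns false, the fallback mechanism simply moves on to the next provider in priority order.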
Cost Estimation
Before configuring cloud providers, estimate your costs:
Typical costs per request:
- Short query (10-50 tokens): $0.001-$0.01
- Medium task (100-500 tokens): $0.01-$0.05
- Long generation (1000+ tokens): $0.05-$0.20
Example monthly costs:
- 10 requests/day (casual use): $3-10/month
- 50 requests/day (regular use): $15-50/month
- 200 requests/day (heavy use): $60-200/month
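The monthly figures above are just requests per day times average cost per request times 30 days. A back-of-envelope helper (the per-request averages are rough, not exact provider pricing):

```typescript
// Rough monthly cost estimate: requests/day * avg cost/request * 30 days.
function estimateMonthlyCost(requestsPerDay: number, avgCostPerRequestUsd: number): number {
  return requestsPerDay * avgCostPerRequestUsd * 30;
}
```

For example, 10 requests/day at an average of $0.01 per request comes to about $3/month, the low end of the casual-use range above.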
Save money by:
- Using local models for simple tasks
- Choosing cheaper models (Haiku, GPT-3.5)
- Setting smart routing rules
- Monitoring usage regularly
View Usage & Costs
Track your spending:
- Open WebLLM → Usage
- See breakdown by:
- Provider
- Date
- Website
- Model used
- Export as CSV for analysis
Provider Settings
Per-Provider Configuration
Each provider has unique settings:
Local Models:
- Max context length - How much text to process
- Temperature - Creativity level (0-1)
- GPU acceleration - Use GPU if available
Cloud Providers:
- Model version - Which model to use
- Default parameters - Temperature, max tokens, etc.
- API endpoint - For custom deployments
- Timeout - How long to wait for responses
Global Settings
Settings that apply to all providers:
- Settings → Providers
- Configure:
- Request timeout - Max wait time (default: 30s)
- Retry attempts - How many times to retry on failure
- Cache responses - Save responses to speed up repeated requests
- Streaming - Show responses word-by-word as they generate
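Request timeout and retry attempts combine in the usual way: race each attempt against a timer, and retry on failure until attempts run out. A sketch of that behavior (the function itself is illustrative, not WebLLM’s implementation):

```typescript
// Sketch of retry-with-timeout: each attempt races the request against
// a timer; a timeout or error consumes one retry.
async function requestWithRetry<T>(
  fn: () => Promise<T>,
  retries: number,
  timeoutMs: number,
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), timeoutMs),
    );
    try {
      return await Promise.race([fn(), timeout]); // whichever settles first
    } catch (err) {
      if (attempt === retries) throw err; // retries exhausted
    }
  }
  throw new Error("unreachable");
}
```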
Common Provider Setups
Setup 1: Maximum Privacy
Goal: Keep everything local, no cloud services
Configuration:
- Download Llama 3.2 3B and Phi-3 Medium
- No cloud providers
- Enable GPU acceleration
Best for: Privacy-conscious users, sensitive data
Setup 2: Best Quality
Goal: Best AI capabilities, cost is not a concern
Configuration:
- Priority 1: Claude 3 Opus
- Priority 2: GPT-4 (fallback)
- Optional: Local model for offline use
Best for: Professional use, critical applications
Setup 3: Cost-Effective Balance
Goal: Great AI without breaking the bank
Configuration:
- Priority 1: Local (Llama 3.2 3B) - Free for most tasks
- Priority 2: Claude 3 Haiku - Cheap cloud fallback
- Priority 3: Claude 3.5 Sonnet - For complex tasks only
- Enable smart routing
- Set monthly spending limit: $20
Best for: Regular users, personal projects
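Written out as data, this cost-effective setup is just a prioritized provider list plus two settings. A hypothetical config object, for illustration only (field names are not WebLLM’s actual format):

```typescript
// Hypothetical shape for the cost-effective setup; illustrative only.
const costEffectiveSetup = {
  smartRouting: true,
  monthlyLimitUsd: 20,
  providers: [
    { priority: 1, kind: "local", name: "llama-3.2-3b" },     // free for most tasks
    { priority: 2, kind: "cloud", name: "claude-3-haiku" },    // cheap fallback
    { priority: 3, kind: "cloud", name: "claude-3.5-sonnet" }, // complex tasks only
  ],
};
```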
Setup 4: Offline-First
Goal: Work offline, cloud when connected
Configuration:
- Download multiple local models
- Add cloud providers (Claude, GPT-4)
- Enable “Offline mode” - Only use local unless online
- Cloud providers auto-activate when internet available
Best for: Travel, unreliable internet, privacy + convenience
Troubleshooting Providers
Local model not working
Check:
- Model fully downloaded? (Providers → Local → Check download status)
- Enough RAM? (Close other apps)
- GPU acceleration enabled? (Settings → Local Models → GPU)
Try:
- Restart browser
- Re-download model
- Try a smaller model
API key not working
Check:
- Key copied correctly? (No extra spaces)
- Account has credit? (Check provider dashboard)
- Key has right permissions? (Some keys are restricted)
Try:
- Regenerate API key
- Check provider’s status page
- Verify billing info with provider
Fallback not working
Check:
- Multiple providers configured?
- Providers in correct priority order?
- Fallback enabled? (Settings → Providers → Enable Fallback)
Try:
- Test each provider individually
- Check provider status
- Review error logs (Extension → History → Errors)
Next Steps
Now that your providers are configured:
➡️ Using WebLLM Websites - Learn to use WebLLM on websites
➡️ Privacy & Data Control - Understand how your data is protected
➡️ Advanced Configuration - Fine-tune for your needs
Questions
Can I use providers without API keys?
Yes! Local models require no API keys or accounts. They’re completely free and private.
How do I get free API credits?
Most providers offer free credits for new accounts:
- OpenAI: $5-18 free credits (varies)
- Anthropic: $5 free credits
- Check provider websites for current offers
Can I use my ChatGPT Plus subscription?
Not directly - ChatGPT Plus is for the ChatGPT interface. You need a separate API key from platform.openai.com.
However, API access is often cheaper than Plus for typical usage!
Which provider is best?
It depends on your needs:
- Best quality: Claude 3 Opus, GPT-4
- Best value: Local models (free), Claude Haiku
- Best for code: GPT-4, Phi-3
- Best privacy: Local models
- Best offline: Local models
Recommendation: Use multiple providers with smart routing!
Can I add my own custom provider?
Yes! Advanced users can add custom providers:
- Providers → Add Provider → Custom
- Enter API endpoint, authentication, and model info
- Save and test
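Many self-hosted servers expose an OpenAI-compatible `/v1/chat/completions` endpoint, which is the typical shape a custom provider entry points at. The sketch below shows such a call; the URL, model name, and API key are placeholders, and the injectable `fetchFn` parameter exists only to make the sketch testable:

```typescript
// Sketch of a call to an OpenAI-compatible custom endpoint.
// URL, model name, and key are placeholders -- substitute your own.
async function callCustomProvider(
  prompt: string,
  fetchFn: typeof fetch = fetch, // injectable for testing
): Promise<string> {
  const res = await fetchFn("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // placeholder
    },
    body: JSON.stringify({
      model: "my-custom-model", // placeholder
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Provider error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```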
Your providers are set up! Ready to use AI on your terms.