Configuring Your AI Providers

WebLLM supports multiple AI providers, and you can use as many as you want. This guide will help you configure providers to match your needs.

A provider is where the AI processing happens. WebLLM supports three types:

Local

  • AI runs on your computer
  • Free forever
  • Private - data never leaves your device
  • Works offline
  • Good for basic tasks

Cloud

  • AI runs on the provider’s servers (using your API key)
  • Premium capabilities (Claude, GPT-4, etc.)
  • Your data is sent to the provider you choose
  • Requires an internet connection
  • You pay the provider directly (usually $0.01-0.10 per request)

Hybrid

  • Use both local and cloud
  • Automatic routing based on task complexity
  • Best of both worlds
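Conceptually, each entry in your provider list is just a small configuration record. Here is a minimal sketch of that mental model in TypeScript - the field names are illustrative assumptions, not WebLLM’s actual schema:

```typescript
// Hypothetical shape of a provider entry - field names are illustrative,
// not WebLLM's actual schema.
type Provider =
  | { kind: "local"; model: string }                    // runs on-device, free
  | { kind: "cloud"; vendor: string; apiKey: string };  // runs on the vendor's servers

// A "hybrid" setup is simply both kinds configured, in priority order:
const providers: Provider[] = [
  { kind: "local", model: "llama-3.2-1b" },
  { kind: "cloud", vendor: "anthropic", apiKey: "sk-ant-..." },
];
```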

Local models are free and completely private. Perfect for getting started!

  1. Open WebLLM extension
  2. Click “Providers” tab
  3. Click “Add Provider” → “Local Model”
  4. Choose a model to download:

Llama 3.2 1B - 1.2GB

  • Best for: Getting started, quick tasks
  • Speed: Very fast
  • Quality: Good for summaries, simple Q&A
  • Requirements: 4GB RAM

Llama 3.2 3B - 3.5GB

  • Best for: Better quality responses
  • Speed: Fast
  • Quality: Great for most tasks
  • Requirements: 6GB RAM

Phi-3 Mini - 2.3GB

  • Best for: Code and technical content
  • Speed: Fast
  • Quality: Excellent for programming
  • Requirements: 4GB RAM

Phi-3 Medium - 7.6GB

  • Best for: Complex reasoning
  • Speed: Moderate
  • Quality: Very high quality
  • Requirements: 12GB RAM

  5. Click “Download” on your chosen model
  6. Wait for the download to complete (one-time only)
  7. Click “Activate”

Local model ready!

You can download multiple local models for different purposes:

  • Quick model (1B) for fast, simple tasks
  • Quality model (3B+) for important tasks
  • Code model (Phi-3) for programming tasks

WebLLM can automatically choose the right model for each task.
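As a rough illustration of that idea (the task labels and model IDs below are assumptions for the sketch, not WebLLM’s real API):

```typescript
// Illustrative only: mapping a task category to a downloaded local model.
const localModels: Record<string, string> = {
  quick: "llama-3.2-1b",   // fast, simple tasks
  quality: "llama-3.2-3b", // important tasks
  code: "phi-3-mini",      // programming tasks
};

function pickLocalModel(task: "quick" | "quality" | "code"): string {
  return localModels[task];
}
```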

View your models:

  • Providers → Local Models → See all downloaded models

Delete a model:

  • Click the model → Click “Delete” → Confirm
  • Frees up disk space

Update a model:

  • Models auto-update when new versions are available
  • You’ll get a notification

Cloud providers give you access to state-of-the-art AI models.

Claude excels at long conversations, analysis, and following instructions carefully.

Get an API Key:

  1. Go to console.anthropic.com
  2. Sign up or log in
  3. Go to API Keys
  4. Click “Create Key”
  5. Copy the key (starts with sk-ant-)

Add to WebLLM:

  1. Open WebLLM → Providers
  2. Click “Add Provider” → “Anthropic”
  3. Paste your API key
  4. (Optional) Set model preference:
    • Claude 3.5 Sonnet - Best balance (recommended)
    • Claude 3 Opus - Highest quality
    • Claude 3 Haiku - Fastest, cheapest
  5. (Optional) Set monthly spending limit
  6. Click “Save”

Claude ready to use!

GPT-4 is excellent for creative writing, general knowledge, and versatility.

Get an API Key:

  1. Go to platform.openai.com
  2. Sign up or log in
  3. Go to API Keys
  4. Click “Create new secret key”
  5. Copy the key (starts with sk-)

Add to WebLLM:

  1. Open WebLLM → Providers
  2. Click “Add Provider” → “OpenAI”
  3. Paste your API key
  4. (Optional) Set model preference:
    • GPT-4 Turbo - Best balance (recommended)
    • GPT-4 - Highest quality
    • GPT-3.5 Turbo - Fastest, cheapest
  5. (Optional) Set monthly spending limit
  6. Click “Save”

OpenAI ready to use!

WebLLM also supports:

  • Google (Gemini) - Coming soon
  • Mistral - Coming soon
  • Custom providers - For advanced users

When you have multiple providers, WebLLM needs to know which one to use. You set the priority order.

  1. Open WebLLM → Providers
  2. Drag providers to reorder them
  3. Top = Highest Priority

Example priority order:

  1. Local (Llama 3.2 1B) - Try first
  2. Claude 3.5 Sonnet - Fallback for complex tasks
  3. GPT-4 Turbo - Fallback if Claude fails

WebLLM tries providers in order:

  1. Try first provider (e.g., local model)
  2. If it can’t handle the request → Try next provider
  3. Continue until one succeeds
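In code terms, this is a simple ordered loop. A minimal sketch, assuming a hypothetical send method on each provider (a mental model, not WebLLM’s internal API):

```typescript
// Minimal sketch of priority-ordered fallback. `Provider` and `send`
// are hypothetical stand-ins, not WebLLM's actual interfaces.
interface Provider {
  name: string;
  send(request: string): Promise<string>; // rejects if offline, rate-limited, etc.
}

async function complete(providers: Provider[], request: string): Promise<string> {
  for (const p of providers) {        // top of the list = highest priority
    try {
      return await p.send(request);   // first provider that succeeds wins
    } catch {
      continue;                       // couldn't handle it -> try the next one
    }
  }
  throw new Error("All providers failed");
}
```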

Automatic fallback happens when:

  • The local model isn’t capable enough for the task
  • Provider is offline or rate-limited
  • Request exceeds provider’s limits
  • Provider returns an error

Enable smart routing to automatically choose the best provider:

  1. Settings → Providers → Enable “Smart Routing”
  2. WebLLM will:
    • Analyze each request
    • Route simple tasks to fast/cheap providers
    • Route complex tasks to powerful providers

Example:

  • “Summarize this email” → Local model (fast, free)
  • “Write a business proposal” → Claude (higher quality)
  • “Fix this code bug” → GPT-4 (best for code)
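You can think of smart routing as a heuristic that maps estimated task complexity to a provider tier. The sketch below uses prompt length as a deliberately crude stand-in for WebLLM’s actual (internal) request analysis:

```typescript
// Illustrative heuristic only - WebLLM's real request analysis is internal.
type Tier = "local" | "cloud-cheap" | "cloud-premium";

function route(prompt: string): Tier {
  if (prompt.length < 200) return "local";         // short/simple: fast and free
  if (prompt.length < 2000) return "cloud-cheap";  // medium: e.g. Haiku, GPT-3.5
  return "cloud-premium";                          // complex: e.g. Sonnet, GPT-4
}
```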

Protect yourself from unexpected API costs.

For each cloud provider:

  1. Providers → [Provider Name] → Settings
  2. Set limits:
    • Daily limit - Max spending per day
    • Monthly limit - Max spending per month
    • Per-request limit - Max cost per single request
  3. Click “Save”

What happens when a limit is reached:

  • Provider is temporarily disabled
  • WebLLM falls back to next provider
  • You get a notification
  • Limit resets automatically (daily/monthly)
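Conceptually, a spending cap is a pre-flight check before each request; if the check fails, normal fallback takes over. A sketch under assumed field names:

```typescript
// Sketch of a spending cap check - field names and numbers are examples,
// not WebLLM internals.
interface Budget {
  monthlyLimitUsd: number; // e.g. 20
  spentUsd: number;        // running total this month
}

function providerAvailable(b: Budget, estimatedCostUsd: number): boolean {
  // If this request would exceed the cap, skip the provider and let
  // fallback move on to the next one in the priority list.
  return b.spentUsd + estimatedCostUsd <= b.monthlyLimitUsd;
}
```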

Before configuring cloud providers, estimate your costs:

Typical costs per request:

  • Short query (10-50 tokens): $0.001-$0.01
  • Medium task (100-500 tokens): $0.01-$0.05
  • Long generation (1000+ tokens): $0.05-$0.20

Example monthly costs:

  • 10 requests/day (casual use): $3-10/month
  • 50 requests/day (regular use): $15-50/month
  • 200 requests/day (heavy use): $60-200/month
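To sanity-check these bands yourself, multiply requests per day by your average cost per request and by 30 days:

```typescript
// Back-of-envelope monthly estimate: requests/day x avg cost x 30 days.
const requestsPerDay = 50;   // "regular use"
const avgCostUsd = 0.03;     // a mid-range "medium task"
const monthlyUsd = requestsPerDay * avgCostUsd * 30; // = 45, inside the $15-50 band
```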

Save money by:

  • Using local models for simple tasks
  • Choosing cheaper models (Haiku, GPT-3.5)
  • Setting smart routing rules
  • Monitoring usage regularly

Track your spending:

  1. Open WebLLM → Usage
  2. See breakdown by:
    • Provider
    • Date
    • Website
    • Model used
  3. Export as CSV for analysis

Each provider has unique settings:

Local Models:

  • Max context length - How much text to process
  • Temperature - Creativity level (0-1)
  • GPU acceleration - Use GPU if available

Cloud Providers:

  • Model version - Which model to use
  • Default parameters - Temperature, max tokens, etc.
  • API endpoint - For custom deployments
  • Timeout - How long to wait for responses

Settings that apply to all providers:

  1. Settings → Providers
  2. Configure:
    • Request timeout - Max wait time (default: 30s)
    • Retry attempts - How many times to retry on failure
    • Cache responses - Save responses to speed up repeated requests
    • Streaming - Show responses word-by-word as they generate
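Putting the per-provider and global knobs together, a full configuration might conceptually look like the object below. The field names are assumptions for illustration; use the Settings UI for the real options:

```typescript
// Hypothetical combined settings - names are illustrative, not WebLLM's schema.
const settings = {
  local: {
    maxContextLength: 4096, // how much text the model can process at once
    temperature: 0.7,       // 0 = deterministic, 1 = most creative
    gpuAcceleration: true,  // use the GPU if available
  },
  cloud: {
    model: "claude-3-5-sonnet", // model version preference
    timeoutMs: 30_000,          // how long to wait for responses
  },
  global: {
    retryAttempts: 2,     // retries before falling back
    cacheResponses: true, // reuse answers to repeated requests
    streaming: true,      // show responses word-by-word
  },
};
```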

Privacy-First Setup

Goal: Keep everything local, no cloud services

Configuration:

  1. Download Llama 3.2 3B and Phi-3 Medium
  2. No cloud providers
  3. Enable GPU acceleration

Best for: Privacy-conscious users, sensitive data

Maximum-Quality Setup

Goal: Best AI capabilities, cost is not a concern

Configuration:

  1. Priority 1: Claude 3 Opus
  2. Priority 2: GPT-4 (fallback)
  3. Optional: Local model for offline use

Best for: Professional use, critical applications

Budget-Conscious Setup

Goal: Great AI without breaking the bank

Configuration:

  1. Priority 1: Local (Llama 3.2 3B) - Free for most tasks
  2. Priority 2: Claude 3 Haiku - Cheap cloud fallback
  3. Priority 3: Claude 3.5 Sonnet - For complex tasks only
  4. Enable smart routing
  5. Set monthly spending limit: $20

Best for: Regular users, personal projects

Offline-First Setup

Goal: Work offline, cloud when connected

Configuration:

  1. Download multiple local models
  2. Add cloud providers (Claude, GPT-4)
  3. Enable “Offline mode” - use only local models when there’s no connection
  4. Cloud providers auto-activate when an internet connection is available

Best for: Travel, unreliable internet, privacy + convenience

Troubleshooting

Local model slow or not responding?

Check:

  • Model fully downloaded? (Providers → Local → Check download status)
  • Enough RAM? (Close other apps)
  • GPU acceleration enabled? (Settings → Local Models → GPU)

Try:

  • Restart browser
  • Re-download model
  • Try a smaller model

API key not working?

Check:

  • Key copied correctly? (No extra spaces)
  • Account has credit? (Check provider dashboard)
  • Key has right permissions? (Some keys are restricted)

Try:

  • Regenerate API key
  • Check provider’s status page
  • Verify billing info with provider

Fallback not kicking in?

Check:

  • Multiple providers configured?
  • Providers in correct priority order?
  • Fallback enabled? (Settings → Providers → Enable Fallback)

Try:

  • Test each provider individually
  • Check provider status
  • Review error logs (Extension → History → Errors)

Now that your providers are configured:

➡️ Using WebLLM on Websites - Learn to use WebLLM on websites
➡️ Privacy & Data Control - Understand how your data is protected
➡️ Advanced Configuration - Fine-tune for your needs

Frequently Asked Questions

Can I use WebLLM without paying anything?

Yes! Local models require no API keys or accounts. They’re completely free and private.

Do cloud providers offer free trials?

Most providers offer free credits for new accounts:

  • OpenAI: $5-18 free credits (varies)
  • Anthropic: $5 free credits
  • Check provider websites for current offers

Can I use my ChatGPT Plus subscription?

Not directly - ChatGPT Plus is for the ChatGPT interface. You need a separate API key from platform.openai.com.

However, API access is often cheaper than Plus for typical usage!

Which provider is best?

It depends on your needs:

  • Best quality: Claude 3 Opus, GPT-4
  • Best value: Local models (free), Claude Haiku
  • Best for code: GPT-4, Phi-3
  • Best privacy: Local models
  • Best offline: Local models

Recommendation: Use multiple providers with smart routing!

Can I add a provider that isn’t listed?

Yes! Advanced users can add custom providers:

  1. Providers → Add Provider → Custom
  2. Enter API endpoint, authentication, and model info
  3. Save and test
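For example, a custom entry for an OpenAI-compatible endpoint might look roughly like this (every value below is a placeholder, not a real setting):

```typescript
// Illustrative custom-provider entry - all values here are placeholders.
const customProvider = {
  name: "my-self-hosted-llm",
  endpoint: "https://llm.example.com/v1/chat/completions", // your API endpoint
  apiKey: "YOUR_KEY_HERE",                                 // your server's auth token
  model: "my-model-name",                                  // model identifier to request
};
```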

Your providers are set up! Ready to use AI on your terms.