Configuring Your AI Providers

WebLLM supports multiple AI providers, and you can use as many as you want. This guide will help you configure providers to match your needs.

A provider is where the AI processing happens. WebLLM supports three types:

Local

  • AI runs on your computer
  • Free forever
  • Private - data never leaves your device
  • Works offline
  • Good for basic tasks

Cloud

  • AI runs on the provider’s servers (using your API key)
  • Premium capabilities (Claude, GPT-4, etc.)
  • Your data is sent to the provider you choose
  • Requires an internet connection
  • You pay the provider directly (usually $0.01-0.10 per request)

Hybrid

  • Use both local and cloud
  • Automatic routing based on task complexity
  • Best of both worlds
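Conceptually, each entry in your provider list is just a small configuration record. Here is a minimal sketch of that mental model in TypeScript - the field names are illustrative assumptions, not WebLLM’s actual schema:

```typescript
// Hypothetical shape of a provider entry - field names are illustrative,
// not WebLLM's actual schema.
type Provider =
  | { kind: "local"; model: string }                    // runs on-device, free
  | { kind: "cloud"; vendor: string; apiKey: string };  // runs on the vendor's servers

// A "hybrid" setup is simply both kinds configured, in priority order:
const providers: Provider[] = [
  { kind: "local", model: "llama-3.2-1b" },
  { kind: "cloud", vendor: "anthropic", apiKey: "sk-ant-..." },
];
```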

Local models are free and completely private. Perfect for getting started!

  1. Open WebLLM extension
  2. Click “Providers” tab
  3. Click “Add Provider” → “Local Model”
  4. Choose a model to download:

Llama 3.2 1B - 1.2GB

  • Best for: Getting started, quick tasks
  • Speed: Very fast
  • Quality: Good for summaries, simple Q&A
  • Requirements: 4GB RAM

Llama 3.2 3B - 3.5GB

  • Best for: Better quality responses
  • Speed: Fast
  • Quality: Great for most tasks
  • Requirements: 6GB RAM

Phi-3 Mini - 2.3GB

  • Best for: Code and technical content
  • Speed: Fast
  • Quality: Excellent for programming
  • Requirements: 4GB RAM

Phi-3 Medium - 7.6GB

  • Best for: Complex reasoning
  • Speed: Moderate
  • Quality: Very high quality
  • Requirements: 12GB RAM

  5. Click “Download” on your chosen model
  6. Wait for the download to complete (one-time only)
  7. Click “Activate”

Local model ready!

You can download multiple local models for different purposes:

  • Quick model (1B) for fast, simple tasks
  • Quality model (3B+) for important tasks
  • Code model (Phi-3) for programming tasks

WebLLM can automatically choose the right model for each task.
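As a rough illustration of that idea (the task labels and model IDs below are assumptions for the sketch, not WebLLM’s real API):

```typescript
// Illustrative only: mapping a task category to a downloaded local model.
const localModels: Record<string, string> = {
  quick: "llama-3.2-1b",   // fast, simple tasks
  quality: "llama-3.2-3b", // important tasks
  code: "phi-3-mini",      // programming tasks
};

function pickLocalModel(task: "quick" | "quality" | "code"): string {
  return localModels[task];
}
```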

View your models:

  • Providers → Local Models → See all downloaded models

Delete a model:

  • Click the model → Click “Delete” → Confirm
  • Frees up disk space

Update a model:

  • Models auto-update when new versions are available
  • You’ll get a notification

Cloud providers give you access to state-of-the-art AI models.

Claude excels at long conversations, analysis, and following instructions carefully.

Get an API Key:

  1. Go to console.anthropic.com
  2. Sign up or log in
  3. Go to API Keys
  4. Click “Create Key”
  5. Copy the key (starts with sk-ant-)

Add to WebLLM:

  1. Open WebLLM → Providers
  2. Click “Add Provider” → “Anthropic”
  3. Paste your API key
  4. (Optional) Set model preference:
    • Claude 3.5 Sonnet - Best balance (recommended)
    • Claude 3 Opus - Highest quality
    • Claude 3 Haiku - Fastest, cheapest
  5. (Optional) Set monthly spending limit
  6. Click “Save”

Claude ready to use!

GPT-4 is excellent for creative writing, general knowledge, and versatility.

Get an API Key:

  1. Go to platform.openai.com
  2. Sign up or log in
  3. Go to API Keys
  4. Click “Create new secret key”
  5. Copy the key (starts with sk-)

Add to WebLLM:

  1. Open WebLLM → Providers
  2. Click “Add Provider” → “OpenAI”
  3. Paste your API key
  4. (Optional) Set model preference:
    • GPT-4 Turbo - Best balance (recommended)
    • GPT-4 - Highest quality
    • GPT-3.5 Turbo - Fastest, cheapest
  5. (Optional) Set monthly spending limit
  6. Click “Save”

OpenAI ready to use!

WebLLM also supports:

  • Google (Gemini) - Coming soon
  • Mistral - Coming soon
  • Custom providers - For advanced users

When you have multiple providers, WebLLM needs to know which one to use. You set the priority order.

  1. Open WebLLM → Providers
  2. Drag providers to reorder them
  3. Top = Highest Priority

Example priority order:

  1. Local (Llama 3.2 1B) - Try first
  2. Claude 3.5 Sonnet - Fallback for complex tasks
  3. GPT-4 Turbo - Fallback if Claude fails

WebLLM tries providers in order:

  1. Try first provider (e.g., local model)
  2. If it can’t handle the request → Try next provider
  3. Continue until one succeeds
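In code terms, this is a simple ordered loop. A minimal sketch, assuming a hypothetical send method on each provider (a mental model, not WebLLM’s internal API):

```typescript
// Minimal sketch of priority-ordered fallback. `Provider` and `send`
// are hypothetical stand-ins, not WebLLM's actual interfaces.
interface Provider {
  name: string;
  send(request: string): Promise<string>; // rejects if offline, rate-limited, etc.
}

async function complete(providers: Provider[], request: string): Promise<string> {
  for (const p of providers) {        // top of the list = highest priority
    try {
      return await p.send(request);   // first provider that succeeds wins
    } catch {
      continue;                       // couldn't handle it -> try the next one
    }
  }
  throw new Error("All providers failed");
}
```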

Automatic fallback happens when:

  • The local model isn’t capable enough for the task
  • Provider is offline or rate-limited
  • Request exceeds provider’s limits
  • Provider returns an error

Enable smart routing to automatically choose the best provider:

  1. Settings → Providers → Enable “Smart Routing”
  2. WebLLM will:
    • Analyze each request
    • Route simple tasks to fast/cheap providers
    • Route complex tasks to powerful providers

Example:

  • “Summarize this email” → Local model (fast, free)
  • “Write a business proposal” → Claude (higher quality)
  • “Fix this code bug” → GPT-4 (best for code)
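You can think of smart routing as a heuristic that maps estimated task complexity to a provider tier. The sketch below uses prompt length as a deliberately crude stand-in for WebLLM’s actual (internal) request analysis:

```typescript
// Illustrative heuristic only - WebLLM's real request analysis is internal.
type Tier = "local" | "cloud-cheap" | "cloud-premium";

function route(prompt: string): Tier {
  if (prompt.length < 200) return "local";         // short/simple: fast and free
  if (prompt.length < 2000) return "cloud-cheap";  // medium: e.g. Haiku, GPT-3.5
  return "cloud-premium";                          // complex: e.g. Sonnet, GPT-4
}
```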

Protect yourself from unexpected API costs.

For each cloud provider:

  1. Providers → [Provider Name] → Settings
  2. Set limits:
    • Daily limit - Max spending per day
    • Monthly limit - Max spending per month
    • Per-request limit - Max cost per single request
  3. Click “Save”

What happens when a limit is reached:

  • Provider is temporarily disabled
  • WebLLM falls back to next provider
  • You get a notification
  • Limit resets automatically (daily/monthly)
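Conceptually, a spending cap is a pre-flight check before each request; if the check fails, normal fallback takes over. A sketch under assumed field names:

```typescript
// Sketch of a spending cap check - field names and numbers are examples,
// not WebLLM internals.
interface Budget {
  monthlyLimitUsd: number; // e.g. 20
  spentUsd: number;        // running total this month
}

function providerAvailable(b: Budget, estimatedCostUsd: number): boolean {
  // If this request would exceed the cap, skip the provider and let
  // fallback move on to the next one in the priority list.
  return b.spentUsd + estimatedCostUsd <= b.monthlyLimitUsd;
}
```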

Before configuring cloud providers, estimate your costs:

Typical costs per request:

  • Short query (10-50 tokens): $0.001-$0.01
  • Medium task (100-500 tokens): $0.01-$0.05
  • Long generation (1000+ tokens): $0.05-$0.20

Example monthly costs:

  • 10 requests/day (casual use): $3-10/month
  • 50 requests/day (regular use): $15-50/month
  • 200 requests/day (heavy use): $60-200/month
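To sanity-check these bands yourself, multiply requests per day by your average cost per request and by 30 days:

```typescript
// Back-of-envelope monthly estimate: requests/day x avg cost x 30 days.
const requestsPerDay = 50;   // "regular use"
const avgCostUsd = 0.03;     // a mid-range "medium task"
const monthlyUsd = requestsPerDay * avgCostUsd * 30; // = 45, inside the $15-50 band
```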

Save money by:

  • Using local models for simple tasks
  • Choosing cheaper models (Haiku, GPT-3.5)
  • Setting smart routing rules
  • Monitoring usage regularly

Track your spending:

  1. Open WebLLM → Usage
  2. See breakdown by:
    • Provider
    • Date
    • Website
    • Model used
  3. Export as CSV for analysis

Each provider has unique settings:

Local Models:

  • Max context length - How much text to process
  • Temperature - Creativity level (0-1)
  • GPU acceleration - Use GPU if available

Cloud Providers:

  • Model version - Which model to use
  • Default parameters - Temperature, max tokens, etc.
  • API endpoint - For custom deployments
  • Timeout - How long to wait for responses

Settings that apply to all providers:

  1. Settings → Providers
  2. Configure:
    • Request timeout - Max wait time (default: 30s)
    • Retry attempts - How many times to retry on failure
    • Cache responses - Save responses to speed up repeated requests
    • Streaming - Show responses word-by-word as they generate
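Putting the per-provider and global knobs together, a full configuration might conceptually look like the object below. The field names are assumptions for illustration; use the Settings UI for the real options:

```typescript
// Hypothetical combined settings - names are illustrative, not WebLLM's schema.
const settings = {
  local: {
    maxContextLength: 4096, // how much text the model can process at once
    temperature: 0.7,       // 0 = deterministic, 1 = most creative
    gpuAcceleration: true,  // use the GPU if available
  },
  cloud: {
    model: "claude-3-5-sonnet", // model version preference
    timeoutMs: 30_000,          // how long to wait for responses
  },
  global: {
    retryAttempts: 2,     // retries before falling back
    cacheResponses: true, // reuse answers to repeated requests
    streaming: true,      // show responses word-by-word
  },
};
```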

Privacy-First Setup

Goal: Keep everything local, no cloud services

Configuration:

  1. Download Llama 3.2 3B and Phi-3 Medium
  2. No cloud providers
  3. Enable GPU acceleration

Best for: Privacy-conscious users, sensitive data

Maximum-Quality Setup

Goal: Best AI capabilities, cost is not a concern

Configuration:

  1. Priority 1: Claude 3 Opus
  2. Priority 2: GPT-4 (fallback)
  3. Optional: Local model for offline use

Best for: Professional use, critical applications

Budget-Conscious Setup

Goal: Great AI without breaking the bank

Configuration:

  1. Priority 1: Local (Llama 3.2 3B) - Free for most tasks
  2. Priority 2: Claude 3 Haiku - Cheap cloud fallback
  3. Priority 3: Claude 3.5 Sonnet - For complex tasks only
  4. Enable smart routing
  5. Set monthly spending limit: $20

Best for: Regular users, personal projects

Offline-First Setup

Goal: Work offline, cloud when connected

Configuration:

  1. Download multiple local models
  2. Add cloud providers (Claude, GPT-4)
  3. Enable “Offline mode” - use only local models when there’s no connection
  4. Cloud providers auto-activate when an internet connection is available

Best for: Travel, unreliable internet, privacy + convenience

Troubleshooting

Local model slow or not responding?

Check:

  • Model fully downloaded? (Providers → Local → Check download status)
  • Enough RAM? (Close other apps)
  • GPU acceleration enabled? (Settings → Local Models → GPU)

Try:

  • Restart browser
  • Re-download model
  • Try a smaller model

API key not working?

Check:

  • Key copied correctly? (No extra spaces)
  • Account has credit? (Check provider dashboard)
  • Key has right permissions? (Some keys are restricted)

Try:

  • Regenerate API key
  • Check provider’s status page
  • Verify billing info with provider

Fallback not kicking in?

Check:

  • Multiple providers configured?
  • Providers in correct priority order?
  • Fallback enabled? (Settings → Providers → Enable Fallback)

Try:

  • Test each provider individually
  • Check provider status
  • Review error logs (Extension → History → Errors)

Now that your providers are configured:

➡️ Using WebLLM on Websites - Learn to use WebLLM on websites
➡️ Privacy & Data Control - Understand how your data is protected
➡️ Advanced Configuration - Fine-tune for your needs

Frequently Asked Questions

Can I use WebLLM without paying anything?

Yes! Local models require no API keys or accounts. They’re completely free and private.

Do cloud providers offer free trials?

Most providers offer free credits for new accounts:

  • OpenAI: $5-18 free credits (varies)
  • Anthropic: $5 free credits
  • Check provider websites for current offers

Can I use my ChatGPT Plus subscription?

Not directly - ChatGPT Plus is for the ChatGPT interface. You need a separate API key from platform.openai.com.

However, API access is often cheaper than Plus for typical usage!

Which provider is best?

It depends on your needs:

  • Best quality: Claude 3 Opus, GPT-4
  • Best value: Local models (free), Claude Haiku
  • Best for code: GPT-4, Phi-3
  • Best privacy: Local models
  • Best offline: Local models

Recommendation: Use multiple providers with smart routing!

Can I add a provider that isn’t listed?

Yes! Advanced users can add custom providers:

  1. Providers → Add Provider → Custom
  2. Enter API endpoint, authentication, and model info
  3. Save and test
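For example, a custom entry for an OpenAI-compatible endpoint might look roughly like this (every value below is a placeholder, not a real setting):

```typescript
// Illustrative custom-provider entry - all values here are placeholders.
const customProvider = {
  name: "my-self-hosted-llm",
  endpoint: "https://llm.example.com/v1/chat/completions", // your API endpoint
  apiKey: "YOUR_KEY_HERE",                                 // your server's auth token
  model: "my-model-name",                                  // model identifier to request
};
```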

Your providers are set up! Ready to use AI on your terms.