Provider Reference
WebLLM supports multiple AI providers, giving you the flexibility to choose between cloud-based APIs and local inference. This page provides a complete reference of all available providers.
Provider Types
WebLLM supports two types of providers:
API Providers
API providers are cloud-based services that require API keys. They offer:
- Premium model capabilities (Claude, GPT-4, etc.)
- Fast response times with optimized infrastructure
- Pay-per-use pricing
- Regular model updates and improvements
Trade-offs: Requires an internet connection and sends data to the provider’s servers.
Local Providers
Local providers run AI models on your own computer. They offer:
- Complete privacy - data never leaves your device
- No per-request costs
- Offline functionality
- Full control over model selection
Trade-offs: Requires downloading models and uses local compute resources.
All Providers
API Providers
Cloud-based AI services that require API keys. Pay-per-use pricing with premium capabilities.
OpenAI
Access OpenAI's GPT models, including GPT-4 and GPT-3.5. These models are excellent for creative writing, general knowledge tasks, and a broad range of general-purpose applications.
Key Features
- Wide range of capabilities
- Fast response times
- Excellent for creative writing
- Strong general knowledge
- Good at code generation
Anthropic
Access Claude, one of the most advanced AI models, via the Anthropic API. Claude excels at long conversations, careful instruction following, and complex reasoning tasks.
Key Features
- State-of-the-art language understanding
- Long context windows (up to 200K tokens)
- Strong safety and alignment
- Excellent at following instructions
- Great for analysis and reasoning
OpenRouter
OpenRouter provides unified access to hundreds of AI models from leading providers like Anthropic, Google, Meta, Mistral, and more. One API key for all models with transparent pricing and high availability.
Key Features
- Access to 300+ models from multiple providers
- Pay-as-you-go pricing with transparent costs
- No monthly fees or commitments
- Enterprise-grade infrastructure with automatic failover
- Simple integration with standardized API
- Immediate access to new models as they release
Community Resource Pool
Access LLM inference powered by community contributors running Ollama and LM Studio. Free, open-source, and community-driven. Contributors share their local compute resources to create a decentralized inference network.
Key Features
- Completely free community-powered inference
- Multiple open-source models (Llama, Mistral, etc.)
- Distributed architecture for high availability
- Privacy-focused with optional request anonymization
- Support the community by contributing your resources
- No API keys required for basic usage
WebLLM Gateway
Route requests through another WebLLM gateway instance. Enables federated inference across multiple gateway nodes, organization-hosted gateways, or fallback to other resources. Creates a true "web of computing nodes" where gateways can connect to each other.
Key Features
- Federation between gateway instances
- Organization-hosted private gateways
- Fallback to other gateways when primary fails
- Full streaming support
- Token-gated or API-key authentication
- Geographic routing to nearest gateway
Local Providers
Run AI models on your own computer. Completely free, private, and works offline.
Transformers.js
Run Hugging Face Transformers models directly in the browser or server-side using WebGPU/WebAssembly. No API keys are required; inference is fully local, with support for chat, vision, embeddings, and transcription. A minimal usage sketch follows the feature list below.
Key Features
- Runs fully in browser or Node.js
- WebGPU acceleration for fast inference
- Support for chat, vision, embeddings, transcription
- Model download progress tracking
- Web Worker support for off-main-thread execution
- No API keys or cloud dependency
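For reference, using the underlying Transformers.js library directly looks roughly like the sketch below. It uses the library's pipeline API; the model name is only an example, and when you go through WebLLM the extension drives this for you:

import { pipeline } from '@huggingface/transformers';

// Load a small instruction-tuned model; weights are downloaded and cached on first use.
const generator = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct');

// All inference runs locally in the browser or Node.js process.
const output = await generator('Write a haiku about the sea.', { max_new_tokens: 64 });
console.log(output);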
Local Models (WebGPU)
Run downloaded Transformers.js models locally in your browser with WebGPU acceleration. Models are cached for offline use and all inference happens on your device.
Key Features
- Fully offline after model download
- WebGPU acceleration
- No API keys or cloud dependency
- Privacy-first: data never leaves your device
- Automatic model caching
ComfyUI
Use ComfyUI for powerful local image generation with custom node-based workflows. Supports Stable Diffusion, SDXL, Flux, and many other image models. Perfect for advanced users who want full control over their image generation pipeline.
Key Features
- Custom node-based workflows
- Supports SDXL, Flux, and more
- ControlNet and LoRA support
- Inpainting and outpainting
- Multiple samplers and schedulers
- Completely free and local
AUTOMATIC1111 / Forge
AUTOMATIC1111 Stable Diffusion WebUI is the most popular and feature-rich interface for Stable Diffusion. This provider also works with Forge and SD.Next, which expose the same API. Supports txt2img, img2img, inpainting, ControlNet, LoRA, and many extensions. A sample txt2img request is sketched after the feature list below.
Key Features
- Most popular SD interface
- Extensive extension ecosystem
- ControlNet and LoRA support
- Inpainting and outpainting
- Upscaling and face restoration
- Compatible with Forge and SD.Next
- Completely free and local
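For reference, a direct txt2img call against a local AUTOMATIC1111 (or Forge) instance looks roughly like this. This is a minimal sketch that assumes the WebUI was launched with its API enabled and is listening on the default address; the prompt and parameters are placeholders:

// Illustrative txt2img request to a local AUTOMATIC1111/Forge instance.
// Assumes the WebUI is running with its API enabled on the default port.
const response = await fetch('http://127.0.0.1:7860/sdapi/v1/txt2img', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'a lighthouse at dusk, oil painting',
    negative_prompt: 'blurry, low quality',
    steps: 25,
    width: 768,
    height: 512,
  }),
});
const { images } = await response.json(); // base64-encoded image strings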
Fooocus
Fooocus is a streamlined image generation tool inspired by Midjourney. It features automatic prompt enhancement, style presets, and quality optimization. Great for users who want excellent results without complex settings.
Key Features
- Automatic prompt enhancement
- Style presets for easy customization
- Built-in quality optimization
- Simplified user interface
- SDXL-based for high quality
- Multiple performance modes
- Completely free and local
InvokeAI
InvokeAI is a professional-grade image generation suite with a node-based workflow system. Features a powerful canvas for inpainting, advanced model management, and a clean modern interface. Ideal for creative professionals.
Key Features
- Node-based workflow system
- Professional canvas with layers
- Advanced inpainting tools
- Unified model management
- Batch processing support
- Clean modern interface
- Completely free and local
Chrome AI (Gemini Nano)
Use Google's Gemini Nano model running directly in Chrome. Zero latency, completely private, and free. All inference happens on-device with no network requests. Requires Chrome 138+ with Chrome AI enabled.
Key Features
- Runs entirely on-device in Chrome
- Zero network latency
- Completely free to use
- Privacy-preserving - data never leaves device
- Streaming support
- System prompt support
- No API keys required
Choosing the Right Provider
For Maximum Privacy
Use local providers (such as Transformers.js, Local Models, or Chrome AI) exclusively. Your data never leaves your computer.
For Best Quality
Use API providers (Anthropic or OpenAI) for access to state-of-the-art models with the best capabilities.
For Cost-Effectiveness
Use local providers for simple tasks and API providers only for complex tasks. Configure provider priority to try local first.
For Offline Work
Use local providers exclusively. Download models ahead of time for full offline functionality.
Provider Priority and Fallback
WebLLM allows you to configure multiple providers with automatic fallback:
- Primary Provider - WebLLM tries this first
- Fallback Providers - If the primary fails or can’t handle the request, WebLLM tries these in order
- Automatic Selection - WebLLM can route requests based on task complexity
Example Configuration
// Users configure this in the extension UI:
Priority Order:
1. Ollama (local) - Try first for privacy and cost
2. Anthropic (Claude) - Fallback for complex tasks
3. OpenAI (GPT-4) - Final fallback

When a website uses WebLLM, the request automatically routes through this priority chain.
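Conceptually, the fallback chain behaves like the loop sketched below. This is only an illustration, not the extension's actual implementation; the Provider type and tryProvider callback are hypothetical stand-ins:

// Illustrative priority-based fallback loop (not WebLLM's real internals).
type Provider = 'ollama' | 'anthropic' | 'openai';

async function routeWithFallback(
  prompt: string,
  priority: Provider[],
  tryProvider: (provider: Provider, prompt: string) => Promise<string>,
): Promise<string> {
  let lastError: unknown;
  for (const provider of priority) {
    try {
      // The first provider that succeeds handles the request.
      return await tryProvider(provider, prompt);
    } catch (err) {
      lastError = err; // Record the failure and fall through to the next provider.
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}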
Configuration
All providers are configured through the WebLLM browser extension:
- Install Extension - Get the WebLLM extension for your browser
- Open Settings - Click the extension icon → Providers
- Add Provider - Choose from the list and configure
- Set Priority - Drag to reorder providers
- Test Connection - Verify your configuration works
See the Configuring Providers guide for detailed setup instructions.
For Developers
As a developer using WebLLM, you don’t need to configure providers. Your users configure their preferred providers in the extension.
Your code stays simple:
import { generateText } from 'webllm';
// This works regardless of which provider the user configured
const result = await generateText({
  prompt: 'Summarize this article...',
});

The user’s configured providers handle the request automatically.
Provider Hints (Optional)
You can optionally provide hints to help WebLLM choose the best provider:
const result = await generateText({
  prompt: 'Write a novel chapter...',
  preferences: {
    priority: ['quality'], // Prefer high-quality models
    taskHints: {
      type: 'creative',
      complexity: 'high',
    },
  },
});

This helps WebLLM route to appropriate providers (e.g., premium APIs for complex creative tasks, local models for simple tasks).
Adding New Providers
WebLLM is extensible. New providers can be added in two ways:
For Users
The extension supports OpenAI-compatible APIs (see the sketch after this list), allowing you to connect to:
- Custom OpenAI deployments
- Third-party API providers
- Self-hosted inference servers
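Any server that implements the OpenAI chat completions request format can be plugged in. As a rough illustration, a call to a self-hosted, OpenAI-compatible endpoint looks like this; the base URL, model name, and API key are placeholders, and in practice the extension makes this call for you:

// Illustrative request to a self-hosted, OpenAI-compatible server.
// The URL, model name, and API key below are placeholders.
const response = await fetch('http://localhost:8000/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY', // omit if your server does not require auth
  },
  body: JSON.stringify({
    model: 'your-model-name',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);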
For Contributors
See the Provider System documentation to learn how to contribute new provider integrations.
Provider Data Source
This documentation imports provider data directly from the source code (packages/server/src/providers/provider-registry.ts), ensuring it’s always up to date with the latest provider implementations.