Provider Reference

WebLLM supports multiple AI providers, giving you the flexibility to choose between cloud-based APIs and local inference. This page provides a complete reference of all available providers.

WebLLM supports two types of providers:

API providers are cloud-based services that require API keys. They offer:

  • Premium model capabilities (Claude, GPT-4, etc.)
  • Fast response times with optimized infrastructure
  • Pay-per-use pricing
  • Regular model updates and improvements

Trade-offs: Requires internet connection and sends data to provider’s servers.

Local providers run AI models on your own computer. They offer:

  • Complete privacy - data never leaves your device
  • No per-request costs
  • Offline functionality
  • Full control over model selection

Trade-offs: Requires downloading models and uses local compute resources.

API Providers

Cloud-based AI services that require API keys. Pay-per-use pricing with premium capabilities.

OpenAI

API

Access OpenAI's GPT family of models. These models are excellent for creative writing, general knowledge tasks, and versatile applications.

Key Features

  • Wide range of capabilities
  • Fast response times
  • Excellent for creative writing
  • Strong general knowledge
  • Good at code generation
Pricing: paid

Pay-per-use pricing. New users often receive free credits.

Default Model: gpt-5-mini
Required Configuration:
  • API Key — Your OpenAI API key from platform.openai.com
Get API Key →

Anthropic

API

Access Claude, one of the most advanced AI models, via the Anthropic API. Claude excels at long conversations, careful instruction following, and complex reasoning tasks.

Key Features

  • State-of-the-art language understanding
  • Long context windows (up to 200K tokens)
  • Strong safety and alignment
  • Excellent at following instructions
  • Great for analysis and reasoning
Pricing: paid

Pay-per-use pricing. New users get $5 in free credits.

Default Model: claude-3-5-sonnet-20241022
Required Configuration:
  • API Key — Your Anthropic API key from console.anthropic.com
Get API Key →

Anthropic on Vertex AI

API

Claude models via Google Cloud Vertex AI

Default Model: claude-3-5-sonnet-v2@20241022
Required Configuration:
  • GCP Project ID — Your Google Cloud project ID
Get API Key →

Google Generative AI

API

Gemini models via Google AI API

Default Model: gemini-2.0-flash-exp
Required Configuration:
  • API Key — Your Google AI API key from ai.google.dev
Get API Key →

Google Vertex AI

API

Enterprise Gemini models via Google Cloud Vertex AI

Default Model: gemini-2.0-flash-exp
Required Configuration:
  • GCP Project ID — Your Google Cloud project ID
Get API Key →

Azure OpenAI

API

Azure-hosted OpenAI models

Required Configuration:
  • API Key — Your Azure OpenAI API key
  • Resource Name — Your Azure OpenAI resource name
Get API Key →

Amazon Bedrock

API

AWS Bedrock models (Claude, Llama, Titan)

Required Configuration:
  • AWS Credentials — your AWS access key, secret key, and region
Get API Key →

xAI Grok

API

Grok models by xAI

Default Model: grok-2
Required Configuration:
  • API Key — Your xAI API key
Get API Key →

Groq

API

Ultra-fast inference with browser search

Required Configuration:
  • API Key — Your Groq API key from console.groq.com
Get API Key →

Fireworks

API

Fast LLM inference platform

Required Configuration:
  • API Key — Your Fireworks API key
Get API Key →

Together.ai

API

Wide selection of open-source models

Required Configuration:
  • API Key — Your Together.ai API key
Get API Key →

Cloudflare Workers AI

API

Serverless AI inference on Cloudflare edge network

Required Configuration:
  • API Token — Your Cloudflare API token
  • Account ID — Your Cloudflare account ID
Get API Key →

DeepSeek

API

Reasoning models with context caching

Default Model: deepseek-chat
Required Configuration:
  • API Key — Your DeepSeek API key from platform.deepseek.com
Get API Key →

Cerebras

API

Specialized hardware acceleration

Required Configuration:
  • API Key — Your Cerebras API key
Get API Key →

DeepInfra

API

Cost-effective model hosting

Required Configuration:
  • API Key — Your DeepInfra API key
Get API Key →

Mistral AI

API

Mistral Large and Codestral models

Default Model: mistral-large-latest
Required Configuration:
  • API Key — Your Mistral API key
Get API Key →

Cohere

API

Command R+ models with RAG support

Required Configuration:
  • API Key — Your Cohere API key
Get API Key →

Perplexity

API

Search-augmented language models

Required Configuration:
  • API Key — Your Perplexity API key
Get API Key →

Replicate

API

Community models and custom deployments

Required Configuration:
  • API Key — Your Replicate API token
Get API Key →

Baseten

API

Model deployment platform

Required Configuration:
  • API Key — Your Baseten API key
Get API Key →

Hugging Face

API

Inference API and serverless endpoints

Required Configuration:
  • API Key — Your Hugging Face API token
Get API Key →

OpenRouter

API

OpenRouter provides unified access to hundreds of AI models from leading providers like Anthropic, Google, Meta, Mistral, and more. One API key for all models with transparent pricing and high availability.

Key Features

  • Access to 300+ models from multiple providers
  • Pay-as-you-go pricing with transparent costs
  • No monthly fees or commitments
  • Enterprise-grade infrastructure with automatic failover
  • Simple integration with standardized API
  • Immediate access to new models as they are released
Pricing: paid

Pay-per-use pricing varies by model. Many free models available.

Default Model: anthropic/claude-3.5-sonnet
Required Configuration:
  • API Key — Your OpenRouter API key from openrouter.ai/keys
Get API Key →
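
For the curious: OpenRouter speaks the OpenAI-compatible chat completions protocol, so a direct request looks roughly like the sketch below. You normally never write this yourself; the extension makes the call once your key is configured.

// Rough sketch of a direct OpenRouter request (the extension does this for you).
const OPENROUTER_API_KEY = '...'; // key from openrouter.ai/keys

const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet', // the default model listed above
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);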

Portkey

API

AI gateway with observability, caching, and load balancing

Required Configuration:
  • API Key — Your Portkey API key from portkey.ai
Get API Key →

Community Resource Pool

API

Access LLM inference powered by community contributors running Ollama and LM Studio. Free, open-source, and community-driven. Contributors share their local compute resources to create a decentralized inference network.

Key Features

  • Completely free community-powered inference
  • Multiple open-source models (Llama, Mistral, etc.)
  • Distributed architecture for high availability
  • Privacy-focused with optional request anonymization
  • Support the community by contributing your resources
  • No API keys required for basic usage
Pricing: free

Completely free. Powered by community contributions. Optional authenticated access for priority routing.

Default Model: auto
Required Configuration:
  • None — no API key is required for basic usage (optional authentication enables priority routing)
Get API Key →

WebLLM Gateway

API

Route requests through another WebLLM gateway instance. Enables federated inference across multiple gateway nodes, organization-hosted gateways, or fallback to other resources. Creates a true "web of computing nodes" where gateways can connect to each other.

Key Features

  • Federation between gateway instances
  • Organization-hosted private gateways
  • Fallback to other gateways when primary fails
  • Full streaming support
  • Token-gated or API-key authentication
  • Geographic routing to nearest gateway
Pricing: free

Cost depends on target gateway configuration. Self-hosted gateways are free.

Required Configuration:
  • Gateway URL — URL of the remote WebLLM gateway server
Get API Key →

Fal

API

Fast image and video generation

Required Configuration:
  • API Key — Your Fal API key
Get API Key →

Black Forest Labs

API

FLUX image generation models

Required Configuration:
  • API Key — Your Black Forest Labs API key
Get API Key →

Luma

API

Video generation (Dream Machine)

Required Configuration:
  • API Key — Your Luma API key
Get API Key →

ElevenLabs

API

Voice synthesis and cloning

Required Configuration:
  • API Key — Your ElevenLabs API key
Get API Key →

AssemblyAI

API

Speech-to-text and transcription

Required Configuration:
  • API Key — Your AssemblyAI API key
Get API Key →

Deepgram

API

Real-time speech transcription

Required Configuration:
  • API Key — Your Deepgram API key
Get API Key →

Local Providers

Run AI models on your own computer. Completely free, private, and works offline.

LM Studio

LOCAL

Local models via LM Studio server

Configuration:
  • Host (text) — default: localhost
  • Port (number) — default: 1234
Learn More →

Ollama

LOCAL

Local models via Ollama

Configuration:
  • Host (text) — default: localhost
  • Port (number) — default: 11434
Learn More →
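
Both LM Studio and Ollama expose an OpenAI-compatible HTTP server on the default ports above. A quick way to sanity-check that your local Ollama install is reachable (the model name is just an example; use one you have pulled):

// Connectivity check against a local Ollama server (default port 11434).
// LM Studio works the same way on port 1234.
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2', // example; any model you've pulled with `ollama pull`
    messages: [{ role: 'user', content: 'Say hello' }],
  }),
});
console.log((await res.json()).choices[0].message.content);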

Transformers.js

LOCAL

Run Hugging Face Transformers models directly in the browser or server-side using WebGPU/WebAssembly. No API keys required, fully local inference with support for chat, vision, embeddings, and transcription.

Key Features

  • Runs fully in browser or Node.js
  • WebGPU acceleration for fast inference
  • Support for chat, vision, embeddings, transcription
  • Model download progress tracking
  • Web Worker support for off-main-thread execution
  • No API keys or cloud dependency
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Model ID (text)
  • Device (text) — default: auto
  • Data Type (text) — default: auto
Learn More →
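
The Model ID, Device, and Data Type settings map onto Transformers.js pipeline options. A minimal sketch (the model ID is an example; any compatible ONNX model works):

import { pipeline } from '@huggingface/transformers';

// Sketch of how the configuration fields map to Transformers.js options.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Llama-3.2-1B-Instruct', // Model ID (example)
  {
    device: 'webgpu',                         // Device: 'auto', 'webgpu', 'wasm', ...
    dtype: 'q4',                              // Data Type: quantization level
    progress_callback: (p) => console.log(p), // model download progress tracking
  },
);
const out = await generator('Write a haiku about the sea.', { max_new_tokens: 64 });
console.log(out);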

Local Models (WebGPU)

LOCAL

Run downloaded Transformers.js models locally in your browser with WebGPU acceleration. Models are cached for offline use and all inference happens on your device.

Key Features

  • Fully offline after model download
  • WebGPU acceleration
  • No API keys or cloud dependency
  • Privacy-first: data never leaves your device
  • Automatic model caching
Pricing: free

Completely free. All computation happens locally.

Configuration:
  • Model ID (text)
Learn More →

ComfyUI

LOCAL

Use ComfyUI for powerful local image generation with custom node-based workflows. Supports Stable Diffusion, SDXL, Flux, and many other image models. Perfect for advanced users who want full control over their image generation pipeline.

Key Features

  • Custom node-based workflows
  • Supports SDXL, Flux, and more
  • ControlNet and LoRA support
  • Inpainting and outpainting
  • Multiple samplers and schedulers
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • ComfyUI URL (url) — default: http://127.0.0.1:8188
  • Default Checkpoint (text)
Learn More →
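
Under the hood, ComfyUI accepts an exported node graph as JSON. A hedged sketch of submitting a workflow to the default local server (the workflow object is a placeholder for a real graph exported from ComfyUI in API format):

// Submit a workflow to a local ComfyUI server and get back a prompt_id.
const workflow = { /* node graph exported via "Save (API Format)" in ComfyUI */ };
const res = await fetch('http://127.0.0.1:8188/prompt', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: workflow }),
});
console.log(await res.json()); // { prompt_id, ... }; poll for generated images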

AUTOMATIC1111 / Forge

LOCAL

AUTOMATIC1111 Stable Diffusion WebUI is the most popular and feature-rich interface for Stable Diffusion. Also works with Forge and SD.Next which use the same API. Supports txt2img, img2img, inpainting, ControlNet, LoRA, and many extensions.

Key Features

  • Most popular SD interface
  • Extensive extension ecosystem
  • ControlNet and LoRA support
  • Inpainting and outpainting
  • Upscaling and face restoration
  • Compatible with Forge and SD.Next
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Server URL (url) — default: http://127.0.0.1:7860
  • Default Checkpoint (text)
Learn More →
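
If you want to verify that your WebUI install exposes the API this provider relies on, start it with the --api flag and try a direct txt2img call (a sketch; the parameters shown are a minimal subset):

// Direct txt2img call against AUTOMATIC1111 / Forge / SD.Next (requires --api).
const res = await fetch('http://127.0.0.1:7860/sdapi/v1/txt2img', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'a lighthouse at dusk, oil painting',
    steps: 20,
    width: 512,
    height: 512,
  }),
});
const { images } = await res.json(); // base64-encoded PNG images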

Fooocus

LOCAL

Fooocus is a streamlined image generation tool inspired by Midjourney. It features automatic prompt enhancement, style presets, and quality optimization. Great for users who want excellent results without complex settings.

Key Features

  • Automatic prompt enhancement
  • Style presets for easy customization
  • Built-in quality optimization
  • Simplified user interface
  • SDXL-based for high quality
  • Multiple performance modes
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Fooocus URL (url) — default: http://127.0.0.1:7865
  • Base Model (text)
Learn More →

InvokeAI

LOCAL

InvokeAI is a professional-grade image generation suite with a node-based workflow system. Features a powerful canvas for inpainting, advanced model management, and a clean modern interface. Ideal for creative professionals.

Key Features

  • Node-based workflow system
  • Professional canvas with layers
  • Advanced inpainting tools
  • Unified model management
  • Batch processing support
  • Clean modern interface
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • InvokeAI URL (url) — default: http://127.0.0.1:9090
  • Default Model (text)
Learn More →

Chrome AI (Gemini Nano)

LOCAL

Use Google's Gemini Nano model running directly in Chrome. Zero latency, completely private, and free. All inference happens on-device with no network requests. Requires Chrome 138+ with Chrome AI enabled.

Key Features

  • Runs entirely on-device in Chrome
  • Zero network latency
  • Completely free to use
  • Privacy-preserving - data never leaves device
  • Streaming support
  • System prompt support
  • No API keys required
Pricing: free

Completely free. All computation happens on-device in Chrome.

Learn More →
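
For reference, Chrome's built-in Prompt API looks roughly like the sketch below at the time of writing. The API surface is still evolving, so treat the names as approximate:

// Sketch of Chrome's built-in Prompt API (Chrome 138+; shape may change).
declare const LanguageModel: any; // not yet in TypeScript's DOM lib

if ('LanguageModel' in self) {
  const availability = await LanguageModel.availability();
  if (availability === 'available') {
    const session = await LanguageModel.create({
      initialPrompts: [{ role: 'system', content: 'You are a concise assistant.' }],
    });
    console.log(await session.prompt('Summarize WebGPU in one sentence.'));
    // Streaming responses are also supported:
    for await (const chunk of session.promptStreaming('Tell me a story.')) {
      console.log(chunk);
    }
  }
}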

Choosing Providers

  • Privacy first: Use local providers (LM Studio or Ollama) exclusively. Your data never leaves your computer.
  • Maximum capability: Use API providers (Anthropic or OpenAI) for access to state-of-the-art models with the best capabilities.
  • Balanced: Use local providers for simple tasks and API providers only for complex tasks. Configure provider priority to try local first.
  • Offline: Use local providers exclusively. Download models ahead of time for full offline functionality.

Provider Fallback

WebLLM allows you to configure multiple providers with automatic fallback:

  1. Primary Provider - WebLLM tries this first
  2. Fallback Providers - If the primary fails or can’t handle the request, WebLLM tries these in order
  3. Automatic Selection - WebLLM can route requests based on task complexity
// Users configure this in the extension UI:
Priority Order:
1. Ollama (local) - Try first for privacy and cost
2. Anthropic (Claude) - Fallback for complex tasks
3. OpenAI (GPT-4) - Final fallback

When a website uses WebLLM, the request automatically routes through this priority chain.
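
Conceptually, the fallback logic resembles a try-in-order loop. The sketch below is illustrative only; the Provider, canHandle, and generate names are hypothetical, not the extension's actual internals.

// Conceptual sketch of priority-based fallback. Not the extension's real
// implementation; Provider and ChatRequest are hypothetical types.
interface ChatRequest { prompt: string }
interface Provider {
  name: string;
  canHandle(req: ChatRequest): Promise<boolean>;
  generate(req: ChatRequest): Promise<string>;
}

async function routeRequest(providers: Provider[], req: ChatRequest): Promise<string> {
  for (const provider of providers) {
    try {
      // Skip providers that can't handle this request (e.g., wrong modality)
      if (!(await provider.canHandle(req))) continue;
      return await provider.generate(req);
    } catch (err) {
      // Network error, quota exhausted, etc.; fall through to the next provider
      console.warn(`${provider.name} failed, trying next`, err);
    }
  }
  throw new Error('All configured providers failed');
}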

Configuration

All providers are configured through the WebLLM browser extension:

  1. Install Extension - Get the WebLLM extension for your browser
  2. Open Settings - Click the extension icon → Providers
  3. Add Provider - Choose from the list and configure
  4. Set Priority - Drag to reorder providers
  5. Test Connection - Verify your configuration works

See the Configuring Providers guide for detailed setup instructions.

For Developers

As a developer using WebLLM, you don’t need to configure providers. Your users configure their preferred providers in the extension.

Your code stays simple:

import { generateText } from 'webllm';

// This works regardless of which provider the user configured
const result = await generateText({
  prompt: 'Summarize this article...',
});

The user’s configured providers handle the request automatically.

You can optionally provide hints to help WebLLM choose the best provider:

const result = await generateText({
  prompt: 'Write a novel chapter...',
  preferences: {
    priority: ['quality'], // Prefer high-quality models
    taskHints: {
      type: 'creative',
      complexity: 'high'
    }
  }
});

This helps WebLLM route to appropriate providers (e.g., premium APIs for complex creative tasks, local models for simple tasks).

Adding New Providers

WebLLM is extensible. New providers can be added either by contributing a provider integration or by connecting to an OpenAI-compatible API. The extension supports OpenAI-compatible APIs, allowing you to connect to:

  • Custom OpenAI deployments
  • Third-party API providers
  • Self-hosted inference servers
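
In practice, "OpenAI-compatible" means the server implements the standard chat completions endpoint. Any base URL that accepts the payload shape below can be configured (the URL and model name here are placeholders):

// Placeholder URL/model: any server implementing this shape can be added.
const res = await fetch('https://your-server.example.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_KEY', // if the server requires one
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'your-model-name',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
// The response mirrors OpenAI's schema: { choices: [{ message: { content } }] }
console.log(await res.json());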

See the Provider System documentation to learn how to contribute new provider integrations.

This documentation imports provider data directly from the source code (packages/server/src/providers/provider-registry.ts), ensuring it’s always up to date with the latest provider implementations.