Provider Reference

WebLLM supports multiple AI providers, giving you the flexibility to choose between cloud-based APIs and local inference. This page provides a complete reference of all available providers.

WebLLM supports two types of providers:

API providers are cloud-based services that require API keys. They offer:

  • Premium model capabilities (Claude, GPT-4, etc.)
  • Fast response times with optimized infrastructure
  • Pay-per-use pricing
  • Regular model updates and improvements

Trade-offs: Requires internet connection and sends data to provider’s servers.

Local providers run AI models on your own computer. They offer:

  • Complete privacy - data never leaves your device
  • No per-request costs
  • Offline functionality
  • Full control over model selection

Trade-offs: Requires downloading models and uses local compute resources.

API Providers

Cloud-based AI services that require API keys. Pay-per-use pricing with premium capabilities.

OpenAI

API

Access OpenAI's GPT family of models. These models are excellent for creative writing, general knowledge tasks, and versatile applications.

Key Features

  • Wide range of capabilities
  • Fast response times
  • Excellent for creative writing
  • Strong general knowledge
  • Good at code generation
Pricing: paid

Pay-per-use pricing. New users often receive free credits.

Default Model: gpt-5-mini
Required Configuration:
  • API Key — Your OpenAI API key from platform.openai.com
Get API Key →

Anthropic

API

Access Claude, one of the most advanced AI models, via the Anthropic API. Claude excels at long conversations, careful instruction following, and complex reasoning tasks.

Key Features

  • State-of-the-art language understanding
  • Long context windows (up to 200K tokens)
  • Strong safety and alignment
  • Excellent at following instructions
  • Great for analysis and reasoning
Pricing: paid

Pay-per-use pricing. New users get $5 in free credits.

Default Model: claude-3-5-sonnet-20241022
Required Configuration:
  • API Key — Your Anthropic API key from console.anthropic.com
Get API Key →

Anthropic on Vertex AI

API

Claude models via Google Cloud Vertex AI

Default Model: claude-3-5-sonnet-v2@20241022
Required Configuration:
  • GCP Project ID — Your Google Cloud project ID
Get API Key →

Google Generative AI

API

Gemini models via Google AI API

Default Model: gemini-2.0-flash-exp
Required Configuration:
  • API Key — Your Google AI API key from ai.google.dev
Get API Key →

Google Vertex AI

API

Enterprise Gemini models via Google Cloud Vertex AI

Default Model: gemini-2.0-flash-exp
Required Configuration:
  • GCP Project ID — Your Google Cloud project ID
Get API Key →

Azure OpenAI

API

Azure-hosted OpenAI models

Required Configuration:
  • API Key — Your Azure OpenAI API key
  • Resource Name — Your Azure OpenAI resource name
Get API Key →

Amazon Bedrock

API

AWS Bedrock models (Claude, Llama, Titan)

Required Configuration:
  • AWS Credentials — your AWS access key, secret key, and region
Get API Key →

xAI Grok

API

Grok models by xAI

Default Model: grok-2
Required Configuration:
  • API Key — Your xAI API key
Get API Key →

Groq

API

Ultra-fast inference with browser search

Required Configuration:
  • API Key — Your Groq API key from console.groq.com
Get API Key →

Fireworks

API

Fast LLM inference platform

Required Configuration:
  • API Key — Your Fireworks API key
Get API Key →

Together.ai

API

Wide selection of open-source models

Required Configuration:
  • API Key — Your Together.ai API key
Get API Key →

Cloudflare Workers AI

API

Serverless AI inference on Cloudflare edge network

Required Configuration:
  • API Token — Your Cloudflare API token
  • Account ID — Your Cloudflare account ID
Get API Key →

DeepSeek

API

Reasoning models with context caching

Default Model: deepseek-chat
Required Configuration:
  • API Key — Your DeepSeek API key from platform.deepseek.com
Get API Key →

Cerebras

API

Specialized hardware acceleration

Required Configuration:
  • API Key — Your Cerebras API key
Get API Key →

DeepInfra

API

Cost-effective model hosting

Required Configuration:
  • API Key — Your DeepInfra API key
Get API Key →

Mistral AI

API

Mistral Large and Codestral models

Default Model: mistral-large-latest
Required Configuration:
  • API Key — Your Mistral API key
Get API Key →

Cohere

API

Command R+ models with RAG support

Required Configuration:
  • API Key — Your Cohere API key
Get API Key →

Perplexity

API

Search-augmented language models

Required Configuration:
  • API Key — Your Perplexity API key
Get API Key →

Replicate

API

Community models and custom deployments

Required Configuration:
  • API Key — Your Replicate API token
Get API Key →

Baseten

API

Model deployment platform

Required Configuration:
  • API Key — Your Baseten API key
Get API Key →

Hugging Face

API

Inference API and serverless endpoints

Required Configuration:
  • API Key — Your Hugging Face API token
Get API Key →

OpenRouter

API

OpenRouter provides unified access to hundreds of AI models from leading providers like Anthropic, Google, Meta, Mistral, and more. One API key for all models with transparent pricing and high availability.

Key Features

  • Access to 300+ models from multiple providers
  • Pay-as-you-go pricing with transparent costs
  • No monthly fees or commitments
  • Enterprise-grade infrastructure with automatic failover
  • Simple integration with standardized API
  • Immediate access to new models as they are released
Pricing: paid

Pay-per-use pricing varies by model. Many free models available.

Default Model: anthropic/claude-3.5-sonnet
Required Configuration:
  • API Key — Your OpenRouter API key from openrouter.ai/keys
Get API Key →
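
For the curious: OpenRouter speaks the OpenAI-compatible chat completions protocol, so a direct request looks roughly like the sketch below. You normally never write this yourself; the extension makes the call once your key is configured.

// Rough sketch of a direct OpenRouter request (the extension does this for you).
const OPENROUTER_API_KEY = '...'; // key from openrouter.ai/keys

const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet', // the default model listed above
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);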

Portkey

API

AI gateway with observability, caching, and load balancing

Required Configuration:
  • API Key — Your Portkey API key from portkey.ai
Get API Key →

Community Resource Pool

API

Access LLM inference powered by community contributors running Ollama and LM Studio. Free, open-source, and community-driven. Contributors share their local compute resources to create a decentralized inference network.

Key Features

  • Completely free community-powered inference
  • Multiple open-source models (Llama, Mistral, etc.)
  • Distributed architecture for high availability
  • Privacy-focused with optional request anonymization
  • Support the community by contributing your resources
  • No API keys required for basic usage
Pricing: free

Completely free. Powered by community contributions. Optional authenticated access for priority routing.

Default Model: auto
Required Configuration:
  • None — no API key is required for basic usage (optional authentication enables priority routing)
Get API Key →

WebLLM Gateway

API

Route requests through another WebLLM gateway instance. Enables federated inference across multiple gateway nodes, organization-hosted gateways, or fallback to other resources. Creates a true "web of computing nodes" where gateways can connect to each other.

Key Features

  • Federation between gateway instances
  • Organization-hosted private gateways
  • Fallback to other gateways when primary fails
  • Full streaming support
  • Token-gated or API-key authentication
  • Geographic routing to nearest gateway
Pricing: free

Cost depends on target gateway configuration. Self-hosted gateways are free.

Required Configuration:
  • Gateway URL — URL of the remote WebLLM gateway server
Get API Key →

Fal

API

Fast image and video generation

Required Configuration:
  • API Key — Your Fal API key
Get API Key →

Black Forest Labs

API

FLUX image generation models

Required Configuration:
  • API Key — Your Black Forest Labs API key
Get API Key →

Luma

API

Video generation (Dream Machine)

Required Configuration:
  • API Key — Your Luma API key
Get API Key →

ElevenLabs

API

Voice synthesis and cloning

Required Configuration:
  • API Key — Your ElevenLabs API key
Get API Key →

AssemblyAI

API

Speech-to-text and transcription

Required Configuration:
  • API Key — Your AssemblyAI API key
Get API Key →

Deepgram

API

Real-time speech transcription

Required Configuration:
  • API Key — Your Deepgram API key
Get API Key →

Local Providers

Run AI models on your own computer. Completely free, private, and works offline.

LM Studio

LOCAL

Local models via LM Studio server

Configuration:
  • Host (text) — default: localhost
  • Port (number) — default: 1234
Learn More →

Ollama

LOCAL

Local models via Ollama

Configuration:
  • Host (text) — default: localhost
  • Port (number) — default: 11434
Learn More →
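
Both LM Studio and Ollama expose an OpenAI-compatible HTTP server on the default ports above. A quick way to sanity-check that your local Ollama install is reachable (the model name is just an example; use one you have pulled):

// Connectivity check against a local Ollama server (default port 11434).
// LM Studio works the same way on port 1234.
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2', // example; any model you've pulled with `ollama pull`
    messages: [{ role: 'user', content: 'Say hello' }],
  }),
});
console.log((await res.json()).choices[0].message.content);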

Transformers.js

LOCAL

Run Hugging Face Transformers models directly in the browser or server-side using WebGPU/WebAssembly. No API keys required, fully local inference with support for chat, vision, embeddings, and transcription.

Key Features

  • Runs fully in browser or Node.js
  • WebGPU acceleration for fast inference
  • Support for chat, vision, embeddings, transcription
  • Model download progress tracking
  • Web Worker support for off-main-thread execution
  • No API keys or cloud dependency
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Model ID (text)
  • Device (text) — default: auto
  • Data Type (text) — default: auto
Learn More →
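
The Model ID, Device, and Data Type settings map onto Transformers.js pipeline options. A minimal sketch (the model ID is an example; any compatible ONNX model works):

import { pipeline } from '@huggingface/transformers';

// Sketch of how the configuration fields map to Transformers.js options.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Llama-3.2-1B-Instruct', // Model ID (example)
  {
    device: 'webgpu',                         // Device: 'auto', 'webgpu', 'wasm', ...
    dtype: 'q4',                              // Data Type: quantization level
    progress_callback: (p) => console.log(p), // model download progress tracking
  },
);
const out = await generator('Write a haiku about the sea.', { max_new_tokens: 64 });
console.log(out);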

Local Models (WebGPU)

LOCAL

Run downloaded Transformers.js models locally in your browser with WebGPU acceleration. Models are cached for offline use and all inference happens on your device.

Key Features

  • Fully offline after model download
  • WebGPU acceleration
  • No API keys or cloud dependency
  • Privacy-first: data never leaves your device
  • Automatic model caching
Pricing: free

Completely free. All computation happens locally.

Configuration:
  • Model ID (text)
Learn More →

ComfyUI

LOCAL

Use ComfyUI for powerful local image generation with custom node-based workflows. Supports Stable Diffusion, SDXL, Flux, and many other image models. Perfect for advanced users who want full control over their image generation pipeline.

Key Features

  • Custom node-based workflows
  • Supports SDXL, Flux, and more
  • ControlNet and LoRA support
  • Inpainting and outpainting
  • Multiple samplers and schedulers
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • ComfyUI URL (url) — default: http://127.0.0.1:8188
  • Default Checkpoint (text)
Learn More →
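
Under the hood, ComfyUI accepts an exported node graph as JSON. A hedged sketch of submitting a workflow to the default local server (the workflow object is a placeholder for a real graph exported from ComfyUI in API format):

// Submit a workflow to a local ComfyUI server and get back a prompt_id.
const workflow = { /* node graph exported via "Save (API Format)" in ComfyUI */ };
const res = await fetch('http://127.0.0.1:8188/prompt', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: workflow }),
});
console.log(await res.json()); // { prompt_id, ... }; poll for generated images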

AUTOMATIC1111 / Forge

LOCAL

AUTOMATIC1111 Stable Diffusion WebUI is the most popular and feature-rich interface for Stable Diffusion. Also works with Forge and SD.Next which use the same API. Supports txt2img, img2img, inpainting, ControlNet, LoRA, and many extensions.

Key Features

  • Most popular SD interface
  • Extensive extension ecosystem
  • ControlNet and LoRA support
  • Inpainting and outpainting
  • Upscaling and face restoration
  • Compatible with Forge and SD.Next
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Server URL (url) — default: http://127.0.0.1:7860
  • Default Checkpoint (text)
Learn More →
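
If you want to verify that your WebUI install exposes the API this provider relies on, start it with the --api flag and try a direct txt2img call (a sketch; the parameters shown are a minimal subset):

// Direct txt2img call against AUTOMATIC1111 / Forge / SD.Next (requires --api).
const res = await fetch('http://127.0.0.1:7860/sdapi/v1/txt2img', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'a lighthouse at dusk, oil painting',
    steps: 20,
    width: 512,
    height: 512,
  }),
});
const { images } = await res.json(); // base64-encoded PNG images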

Fooocus

LOCAL

Fooocus is a streamlined image generation tool inspired by Midjourney. It features automatic prompt enhancement, style presets, and quality optimization. Great for users who want excellent results without complex settings.

Key Features

  • Automatic prompt enhancement
  • Style presets for easy customization
  • Built-in quality optimization
  • Simplified user interface
  • SDXL-based for high quality
  • Multiple performance modes
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • Fooocus URL (url) — default: http://127.0.0.1:7865
  • Base Model (text)
Learn More →

InvokeAI

LOCAL

InvokeAI is a professional-grade image generation suite with a node-based workflow system. Features a powerful canvas for inpainting, advanced model management, and a clean modern interface. Ideal for creative professionals.

Key Features

  • Node-based workflow system
  • Professional canvas with layers
  • Advanced inpainting tools
  • Unified model management
  • Batch processing support
  • Clean modern interface
  • Completely free and local
Pricing: free

Completely free. All computation happens locally on your device.

Configuration:
  • InvokeAI URL (url) — default: http://127.0.0.1:9090
  • Default Model (text)
Learn More →

Chrome AI (Gemini Nano)

LOCAL

Use Google's Gemini Nano model running directly in Chrome. Zero latency, completely private, and free. All inference happens on-device with no network requests. Requires Chrome 138+ with Chrome AI enabled.

Key Features

  • Runs entirely on-device in Chrome
  • Zero network latency
  • Completely free to use
  • Privacy-preserving - data never leaves device
  • Streaming support
  • System prompt support
  • No API keys required
Pricing: free

Completely free. All computation happens on-device in Chrome.

Learn More →
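
For reference, Chrome's built-in Prompt API looks roughly like the sketch below at the time of writing. The API surface is still evolving, so treat the names as approximate:

// Sketch of Chrome's built-in Prompt API (Chrome 138+; shape may change).
declare const LanguageModel: any; // not yet in TypeScript's DOM lib

if ('LanguageModel' in self) {
  const availability = await LanguageModel.availability();
  if (availability === 'available') {
    const session = await LanguageModel.create({
      initialPrompts: [{ role: 'system', content: 'You are a concise assistant.' }],
    });
    console.log(await session.prompt('Summarize WebGPU in one sentence.'));
    // Streaming responses are also supported:
    for await (const chunk of session.promptStreaming('Tell me a story.')) {
      console.log(chunk);
    }
  }
}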

Choosing Providers

  • Privacy first: Use local providers (LM Studio or Ollama) exclusively. Your data never leaves your computer.
  • Maximum capability: Use API providers (Anthropic or OpenAI) for access to state-of-the-art models with the best capabilities.
  • Balanced: Use local providers for simple tasks and API providers only for complex tasks. Configure provider priority to try local first.
  • Offline: Use local providers exclusively. Download models ahead of time for full offline functionality.

Provider Fallback

WebLLM allows you to configure multiple providers with automatic fallback:

  1. Primary Provider - WebLLM tries this first
  2. Fallback Providers - If the primary fails or can’t handle the request, WebLLM tries these in order
  3. Automatic Selection - WebLLM can route requests based on task complexity
// Users configure this in the extension UI:
Priority Order:
1. Ollama (local) - Try first for privacy and cost
2. Anthropic (Claude) - Fallback for complex tasks
3. OpenAI (GPT-4) - Final fallback

When a website uses WebLLM, the request automatically routes through this priority chain.
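
Conceptually, the fallback logic resembles a try-in-order loop. The sketch below is illustrative only; the Provider, canHandle, and generate names are hypothetical, not the extension's actual internals.

// Conceptual sketch of priority-based fallback. Not the extension's real
// implementation; Provider and ChatRequest are hypothetical types.
interface ChatRequest { prompt: string }
interface Provider {
  name: string;
  canHandle(req: ChatRequest): Promise<boolean>;
  generate(req: ChatRequest): Promise<string>;
}

async function routeRequest(providers: Provider[], req: ChatRequest): Promise<string> {
  for (const provider of providers) {
    try {
      // Skip providers that can't handle this request (e.g., wrong modality)
      if (!(await provider.canHandle(req))) continue;
      return await provider.generate(req);
    } catch (err) {
      // Network error, quota exhausted, etc.; fall through to the next provider
      console.warn(`${provider.name} failed, trying next`, err);
    }
  }
  throw new Error('All configured providers failed');
}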

Configuration

All providers are configured through the WebLLM browser extension:

  1. Install Extension - Get the WebLLM extension for your browser
  2. Open Settings - Click the extension icon → Providers
  3. Add Provider - Choose from the list and configure
  4. Set Priority - Drag to reorder providers
  5. Test Connection - Verify your configuration works

See the Configuring Providers guide for detailed setup instructions.

For Developers

As a developer using WebLLM, you don’t need to configure providers. Your users configure their preferred providers in the extension.

Your code stays simple:

import { generateText } from 'webllm';

// This works regardless of which provider the user configured
const result = await generateText({
  prompt: 'Summarize this article...',
});

The user’s configured providers handle the request automatically.

You can optionally provide hints to help WebLLM choose the best provider:

const result = await generateText({
  prompt: 'Write a novel chapter...',
  preferences: {
    priority: ['quality'], // Prefer high-quality models
    taskHints: {
      type: 'creative',
      complexity: 'high'
    }
  }
});

This helps WebLLM route to appropriate providers (e.g., premium APIs for complex creative tasks, local models for simple tasks).

Adding New Providers

WebLLM is extensible. New providers can be added either by contributing a provider integration or by connecting to an OpenAI-compatible API. The extension supports OpenAI-compatible APIs, allowing you to connect to:

  • Custom OpenAI deployments
  • Third-party API providers
  • Self-hosted inference servers
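
In practice, "OpenAI-compatible" means the server implements the standard chat completions endpoint. Any base URL that accepts the payload shape below can be configured (the URL and model name here are placeholders):

// Placeholder URL/model: any server implementing this shape can be added.
const res = await fetch('https://your-server.example.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_KEY', // if the server requires one
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'your-model-name',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
// The response mirrors OpenAI's schema: { choices: [{ message: { content } }] }
console.log(await res.json());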

See the Provider System documentation to learn how to contribute new provider integrations.

This documentation imports provider data directly from the source code (packages/server/src/providers/provider-registry.ts), ensuring it’s always up to date with the latest provider implementations.