Architecture Overview

WebLLM consists of several layers that work together to provide standardized AI access to web applications.

The protocol defines standard interfaces for AI interactions:

interface LLMProvider {
  readonly name: string;
  readonly version: string;
  readonly capabilities: ModelCapabilities;

  createSession(config: LLMSessionConfig): Promise<LLMSession>;
  listModels(): Promise<ModelInfo[]>;
}

interface LLMSession {
  generate(request: LLMRequest): Promise<LLMResponse>;
  stream(request: LLMRequest): Promise<ReadableStream<LLMResponse>>;
  abort(): void;
  close(): void;
}
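As a usage sketch of these interfaces (provider discovery via navigator.llm matches the permission example later in this document; the session-config and request fields are illustrative, since LLMSessionConfig and LLMRequest are not shown here):

// Hypothetical usage of the protocol interfaces above; the
// session-config and request fields are illustrative.
const provider: LLMProvider = await navigator.llm.getProvider('anthropic');

const models = await provider.listModels();
const session: LLMSession = await provider.createSession({ model: models[0].id });

const response = await session.generate({
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);

session.close();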
Beyond these interfaces, the protocol also specifies transport and security requirements:

  • Authentication - API key management, OAuth flows
  • Transport - HTTP/2, WebSocket for streaming
  • Encryption - TLS 1.3 minimum, end-to-end encryption options
  • Privacy - Local-first processing, data retention policies
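These requirements would surface in session configuration; a minimal sketch, assuming hypothetical field names (the excerpt above does not define LLMSessionConfig):

// Hypothetical session config covering the requirements above;
// every field name here is illustrative, not part of the spec.
const config = {
  auth: { kind: 'api-key', keyRef: 'personal-anthropic' }, // a reference, never the raw key
  transport: 'http2',                // or 'websocket' for streaming
  minTlsVersion: '1.3',
  privacy: {
    localFirst: true,                // prefer on-device models when available
    retentionDays: 30,               // auto-delete stored interactions
  },
};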

The extension bridges web applications to AI providers:

Web Page <-> Content Script <-> Background Worker
                                       |
                                       ├─> API Providers
                                       ├─> Local Models (WASM/WebGPU)
                                       └─> Native Messaging
  • Content Script - Injects navigator.llm API into pages
  • Background Service Worker - Orchestrates requests and manages providers
  • Settings UI - User configuration interface
  • Local Database - IndexedDB for conversation history and settings
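A simplified sketch of the bridge, assuming a Chrome MV3 extension (the message shape follows the request-flow example later in this document; details are illustrative):

// content-script.ts - relays WEBLLM_REQUEST messages from the page
// to the background service worker and posts the result back.
window.addEventListener('message', async (event) => {
  // Accept messages only from the page itself.
  if (event.source !== window || event.data?.type !== 'WEBLLM_REQUEST') return;

  // The background worker checks permissions and selects a provider.
  const result = await chrome.runtime.sendMessage({
    action: event.data.action,
    input: event.data.input,
  });

  window.postMessage({ type: 'WEBLLM_RESPONSE', result }, event.origin);
});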

Each provider exposes its capabilities:

interface ModelCapabilities {
  maxTokens: number;
  supportedModalities: string[]; // 'text', 'image', 'audio'
  supportsStreaming: boolean;
  supportsTools: boolean;
  pricing?: ModelPricing;
}

interface ModelInfo {
  id: string;
  provider: string;
  capabilities: ModelCapabilities;
  location: 'cloud' | 'local' | 'hybrid';
}
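For example, an application could filter on these fields before choosing a model; a sketch of one possible selection policy:

// Prefer a local, streaming-capable model; fall back to any
// streaming model, then to whatever is first in the list.
async function pickModel(provider: LLMProvider): Promise<ModelInfo | undefined> {
  const models = await provider.listModels();
  return (
    models.find((m) => m.location === 'local' && m.capabilities.supportsStreaming) ??
    models.find((m) => m.capabilities.supportsStreaming) ??
    models[0]
  );
}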

Permissions are integrated with the browser Permissions API:

// Request permission
const permission = await navigator.permissions.query({
  name: 'llm',
  provider: 'anthropic',
  purpose: 'chat',
});

if (permission.state === 'granted') {
  const provider = await navigator.llm.getProvider('anthropic');
}
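Because navigator.llm is a proposal, applications should feature-detect before calling it:

// Degrade gracefully in browsers without the proposed API.
if ('llm' in navigator) {
  const provider = await navigator.llm.getProvider('anthropic');
  // ... use provider
} else {
  // Fall back to a server-side endpoint or hide AI features.
}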
Request flow:

  1. Web application calls the WebLLM API
  2. Content script intercepts and validates request
  3. Background worker checks permissions
  4. Provider manager selects appropriate provider based on priority
  5. Provider executes request (API call or local inference)
  6. Response flows back through the chain
  7. Local DB stores interaction (based on retention policy)
A worked example of this flow:

// 1. Developer calls the API
const summary = await llm.summarize(articleText);

// 2. Content script receives the request
window.postMessage({
  type: 'WEBLLM_REQUEST',
  action: 'summarize',
  input: articleText,
});

// 3. Background worker processes it:
//    - Checks permission for the origin
//    - Shows a usage notification
//    - Selects a provider (local > API key > fallback)
//    - Executes the request
//    - Stores the interaction in the local DB
//    - Returns the result

// 4. Result returned to the developer
console.log(summary); // "This article discusses..."

Users configure provider priority:

  1. Local Model (Llama 3.2 1B) - Free, private, fast
  2. Personal API Key (Anthropic) - User’s account
  3. Fallback Provider (OpenAI) - Application-provided

If the local model fails (out of memory, model not downloaded), the system automatically falls back to the next provider, as sketched below.
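A minimal sketch of that fallback loop against the provider interface above (the session config and error handling are illustrative):

// Try providers in the user's configured priority order and move on
// when one fails (e.g. local model out of memory or not downloaded).
async function executeWithFallback(
  providers: LLMProvider[],
  request: LLMRequest,
): Promise<LLMResponse> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      const session = await provider.createSession({ model: 'default' }); // config illustrative
      try {
        return await session.generate(request);
      } finally {
        session.close();
      }
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError ?? new Error('No provider available');
}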

All providers implement the same interface:

class AnthropicProvider {
  async execute(request) {
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: request.model,
        messages: request.messages,
        max_tokens: request.maxTokens,
      }),
    });
    return this.normalize(await response.json());
  }

  normalize(response) {
    // Convert Anthropic's response shape to the standard format
    return {
      content: response.content[0].text,
      usage: {
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      },
    };
  }
}
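Because every provider normalizes to the same shape, callers never branch on the vendor. A usage sketch (the constructor and model id are illustrative):

// The orchestrator can treat all providers uniformly.
const provider = new AnthropicProvider(/* apiKey from secure storage */);

const result = await provider.execute({
  model: 'claude-sonnet-4-5',          // model id illustrative
  messages: [{ role: 'user', content: 'Summarize this page.' }],
  maxTokens: 512,
});

console.log(result.content, result.usage.outputTokens);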
Isolation:

  • Each origin has separate permissions
  • API keys are never exposed to web pages
  • Content scripts run in an isolated world

Transparency and consent:

  • First use - User prompted for permission
  • Notification - Visual indicator when AI is used
  • Revocation - Users can revoke access anytime
  • Granular control - Allow/block specific actions or models

Data handling:

  • Local storage - Encrypted IndexedDB
  • Retention policies - Auto-delete after N days
  • User control - Clear all data anytime
  • No tracking - No data sent to extension developer
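A sketch of how the retention policy might be enforced against the IndexedDB store (the 'conversations' store and 'createdAt' index are hypothetical names):

// Delete conversation records older than the retention window.
// Assumes a 'conversations' object store with a 'createdAt' index.
async function pruneHistory(db: IDBDatabase, retentionDays: number) {
  const cutoff = Date.now() - retentionDays * 24 * 60 * 60 * 1000;
  const tx = db.transaction('conversations', 'readwrite');
  const index = tx.objectStore('conversations').index('createdAt');
  index.openCursor(IDBKeyRange.upperBound(cutoff)).onsuccess = (event) => {
    const cursor = (event.target as IDBRequest<IDBCursorWithValue | null>).result;
    if (cursor) {
      cursor.delete();   // remove the expired record
      cursor.continue(); // advance to the next match
    }
  };
  await new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve(undefined);
    tx.onerror = () => reject(tx.error);
  });
}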

Current: Node Daemon Architecture (Available Now)


WebLLM now includes a production-ready Node.js daemon that validates the architecture browsers and operating systems will eventually implement natively:

┌─────────────────────────────────────────────────────────┐
│ Browser (Client)                                         │
│ @webllm/client (Auto-detects daemon or extension)        │
└────────────────────┬────────────────────────────────────┘
                     │ HTTP/SSE + Bearer Token
┌────────────────────▼────────────────────────────────────┐
│ WebLLM Daemon (localhost:54321)                          │
│ ┌───────────────────────────────────────────────────┐   │
│ │ Token Authentication + CORS Protection            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ HTTP/SSE Endpoints + Config Management            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ LLMServer - Provider Management & Orchestration   │   │
│ └───────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Key Features:

  • Token-based Authentication: Cryptographically secure Bearer tokens
  • CORS Protection: Configurable origin whitelist with wildcard support
  • Configuration API: Secure endpoints for managing provider secrets
  • Transport Fallback: Client automatically tries daemon first, falls back to extension (sketched after this list)
  • Port 54321: High port range reduces conflicts with development servers
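A sketch of that transport fallback (the endpoint path and extension bridge call are illustrative; see the daemon README for the actual API):

// Try the local daemon first; fall back to the extension bridge.
async function sendRequest(body: unknown, token: string) {
  try {
    const res = await fetch('http://localhost:54321/v1/generate', { // path illustrative
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return await res.json();
  } catch {
    // Daemon not running or unreachable; fall through.
  }
  // Extension transport via the injected API (call illustrative).
  return navigator.llm.generate(body);
}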

See the Node Daemon README for setup instructions.

This daemon architecture demonstrates what browsers and operating systems will need to implement natively: the WebLLM daemon is a working reference implementation of the protocols that browser vendors and OS makers will eventually build into their platforms.

What Browser Makers Need to Implement:

  1. Daemon Process Management

    • Background service similar to extension service workers
    • Runs independently of any specific tab or window
    • Survives browser restarts and updates
    • Secure IPC mechanism for browser-to-daemon communication
  2. Authentication & Security Layer

    • Token generation and validation (like the current daemon)
    • Integration with browser’s credential manager
    • Per-origin permissions and access control
    • Secure storage for API keys and tokens
  3. Provider Management System

    • Configuration API for managing AI providers
    • Secure storage for provider credentials
    • Provider priority and fallback logic
    • Local model download and caching
  4. Communication Protocols

    • HTTP/SSE endpoints for requests (or equivalent IPC)
    • Streaming support for real-time responses
    • Progress tracking for long operations
    • Resource quotas and rate limiting
  5. Native API Surface

    • navigator.llm API implementation
    • Permission prompts integrated with browser UI
    • Settings UI for provider configuration
    • Usage tracking and privacy dashboard

What OS Makers Need to Implement:

  1. System-Level Daemon Service

    • Runs as operating system service (systemd, launchd, Windows Service)
    • Accessible to all applications, not just browsers
    • Unified configuration across all apps
    • Secure inter-process communication
  2. Credential Management

    • Integration with OS keychain/credential manager
    • Secure API key storage (Keychain on macOS, Credential Manager on Windows)
    • Per-application access permissions
    • Biometric authentication support
  3. Model Management

    • Centralized model storage and caching
    • Download management with progress tracking
    • Automatic updates for models
    • Storage optimization and cleanup
  4. System Integration

    • System tray/menu bar interface
    • Native settings panel
    • Usage statistics and monitoring
    • Power management and resource limits
  5. Platform APIs

    • Native SDKs (Objective-C/Swift for Apple, C++/C# for Windows)
    • Language bindings (Python, Java, JavaScript)
    • Consistent API across platforms
    • Hardware acceleration support (Neural Engine, DirectML, CUDA)

Benefits of Native Implementation:

  • No Installation Required: Works out-of-the-box, no extension needed
  • Better Performance: Direct access to system resources and GPU
  • Enhanced Security: OS-level credential storage and sandboxing
  • Unified Experience: One configuration for all apps on the system
  • Lower Overhead: Native implementation vs. JavaScript runtime
  • Platform Integration: Native UI, notifications, and system features

The current Node.js daemon validates this architecture and serves as a working example that browser and OS vendors can reference when implementing native support.

Roadmap:

  • Extension base structure
  • IndexedDB schema
  • Settings UI
  • API key management
  • Single provider support
  • Content script injection
  • Permission system
  • Basic summarization
  • Provider priority system
  • Local model support
  • Provider fallback
  • Extended task types
  • SDK packages
  • Vercel AI adapter
  • Documentation
  • Example applications
  • W3C proposal
  • Browser vendor discussions
  • Security audit
  • Origin trials