Architecture Overview

WebLLM consists of several layers that work together to provide standardized AI access to web applications.

The protocol defines standard interfaces for AI interactions:

interface LLMProvider {
  readonly name: string;
  readonly version: string;
  readonly capabilities: ModelCapabilities;

  createSession(config: LLMSessionConfig): Promise<LLMSession>;
  listModels(): Promise<ModelInfo[]>;
}

interface LLMSession {
  generate(request: LLMRequest): Promise<LLMResponse>;
  stream(request: LLMRequest): Promise<ReadableStream<LLMResponse>>;
  abort(): void;
  close(): void;
}
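As a usage sketch of these interfaces (provider discovery via navigator.llm matches the permission example later in this document; the session-config and request fields are illustrative, since LLMSessionConfig and LLMRequest are not shown here):

// Hypothetical usage of the protocol interfaces above; the
// session-config and request fields are illustrative.
const provider: LLMProvider = await navigator.llm.getProvider('anthropic');

const models = await provider.listModels();
const session: LLMSession = await provider.createSession({ model: models[0].id });

const response = await session.generate({
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);

session.close();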
Beyond these interfaces, the protocol also specifies transport and security requirements:

  • Authentication - API key management, OAuth flows
  • Transport - HTTP/2, WebSocket for streaming
  • Encryption - TLS 1.3 minimum, end-to-end encryption options
  • Privacy - Local-first processing, data retention policies
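These requirements would surface in session configuration; a minimal sketch, assuming hypothetical field names (the excerpt above does not define LLMSessionConfig):

// Hypothetical session config covering the requirements above;
// every field name here is illustrative, not part of the spec.
const config = {
  auth: { kind: 'api-key', keyRef: 'personal-anthropic' }, // a reference, never the raw key
  transport: 'http2',                // or 'websocket' for streaming
  minTlsVersion: '1.3',
  privacy: {
    localFirst: true,                // prefer on-device models when available
    retentionDays: 30,               // auto-delete stored interactions
  },
};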

The extension bridges web applications to AI providers:

Web Page <-> Content Script <-> Background Worker
                                       |
                                       ├─> API Providers
                                       ├─> Local Models (WASM/WebGPU)
                                       └─> Native Messaging
  • Content Script - Injects navigator.llm API into pages
  • Background Service Worker - Orchestrates requests and manages providers
  • Settings UI - User configuration interface
  • Local Database - IndexedDB for conversation history and settings
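A simplified sketch of the bridge, assuming a Chrome MV3 extension (the message shape follows the request-flow example later in this document; details are illustrative):

// content-script.ts - relays WEBLLM_REQUEST messages from the page
// to the background service worker and posts the result back.
window.addEventListener('message', async (event) => {
  // Accept messages only from the page itself.
  if (event.source !== window || event.data?.type !== 'WEBLLM_REQUEST') return;

  // The background worker checks permissions and selects a provider.
  const result = await chrome.runtime.sendMessage({
    action: event.data.action,
    input: event.data.input,
  });

  window.postMessage({ type: 'WEBLLM_RESPONSE', result }, event.origin);
});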

Each provider exposes its capabilities:

interface ModelCapabilities {
  maxTokens: number;
  supportedModalities: string[]; // 'text', 'image', 'audio'
  supportsStreaming: boolean;
  supportsTools: boolean;
  pricing?: ModelPricing;
}

interface ModelInfo {
  id: string;
  provider: string;
  capabilities: ModelCapabilities;
  location: 'cloud' | 'local' | 'hybrid';
}
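For example, an application could filter on these fields before choosing a model; a sketch of one possible selection policy:

// Prefer a local, streaming-capable model; fall back to any
// streaming model, then to whatever is first in the list.
async function pickModel(provider: LLMProvider): Promise<ModelInfo | undefined> {
  const models = await provider.listModels();
  return (
    models.find((m) => m.location === 'local' && m.capabilities.supportsStreaming) ??
    models.find((m) => m.capabilities.supportsStreaming) ??
    models[0]
  );
}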

Permissions are integrated with the browser Permissions API:

// Request permission
const permission = await navigator.permissions.query({
  name: 'llm',
  provider: 'anthropic',
  purpose: 'chat',
});

if (permission.state === 'granted') {
  const provider = await navigator.llm.getProvider('anthropic');
}
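Because navigator.llm is a proposal, applications should feature-detect before calling it:

// Degrade gracefully in browsers without the proposed API.
if ('llm' in navigator) {
  const provider = await navigator.llm.getProvider('anthropic');
  // ... use provider
} else {
  // Fall back to a server-side endpoint or hide AI features.
}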
Request flow:

  1. Web application calls the WebLLM API
  2. Content script intercepts and validates request
  3. Background worker checks permissions
  4. Provider manager selects appropriate provider based on priority
  5. Provider executes request (API call or local inference)
  6. Response flows back through the chain
  7. Local DB stores interaction (based on retention policy)
A worked example of this flow:

// 1. Developer calls the API
const summary = await llm.summarize(articleText);

// 2. Content script receives the request
window.postMessage({
  type: 'WEBLLM_REQUEST',
  action: 'summarize',
  input: articleText,
});

// 3. Background worker processes it:
//    - Checks permission for the origin
//    - Shows a usage notification
//    - Selects a provider (local > API key > fallback)
//    - Executes the request
//    - Stores the interaction in the local DB
//    - Returns the result

// 4. Result returned to the developer
console.log(summary); // "This article discusses..."

Users configure provider priority:

  1. Local Model (Llama 3.2 1B) - Free, private, fast
  2. Personal API Key (Anthropic) - User’s account
  3. Fallback Provider (OpenAI) - Application-provided

If the local model fails (out of memory, model not downloaded), the system automatically falls back to the next provider, as sketched below.
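A minimal sketch of that fallback loop against the provider interface above (the session config and error handling are illustrative):

// Try providers in the user's configured priority order and move on
// when one fails (e.g. local model out of memory or not downloaded).
async function executeWithFallback(
  providers: LLMProvider[],
  request: LLMRequest,
): Promise<LLMResponse> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      const session = await provider.createSession({ model: 'default' }); // config illustrative
      try {
        return await session.generate(request);
      } finally {
        session.close();
      }
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError ?? new Error('No provider available');
}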

All providers implement the same interface:

class AnthropicProvider {
  async execute(request) {
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: request.model,
        messages: request.messages,
        max_tokens: request.maxTokens,
      }),
    });
    return this.normalize(await response.json());
  }

  normalize(response) {
    // Convert Anthropic's response shape to the standard format
    return {
      content: response.content[0].text,
      usage: {
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      },
    };
  }
}
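Because every provider normalizes to the same shape, callers never branch on the vendor. A usage sketch (the constructor and model id are illustrative):

// The orchestrator can treat all providers uniformly.
const provider = new AnthropicProvider(/* apiKey from secure storage */);

const result = await provider.execute({
  model: 'claude-sonnet-4-5',          // model id illustrative
  messages: [{ role: 'user', content: 'Summarize this page.' }],
  maxTokens: 512,
});

console.log(result.content, result.usage.outputTokens);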
Isolation:

  • Each origin has separate permissions
  • API keys are never exposed to web pages
  • Content scripts run in an isolated world

Transparency and consent:

  • First use - User prompted for permission
  • Notification - Visual indicator when AI is used
  • Revocation - Users can revoke access anytime
  • Granular control - Allow/block specific actions or models

Data handling:

  • Local storage - Encrypted IndexedDB
  • Retention policies - Auto-delete after N days
  • User control - Clear all data anytime
  • No tracking - No data sent to extension developer
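A sketch of how the retention policy might be enforced against the IndexedDB store (the 'conversations' store and 'createdAt' index are hypothetical names):

// Delete conversation records older than the retention window.
// Assumes a 'conversations' object store with a 'createdAt' index.
async function pruneHistory(db: IDBDatabase, retentionDays: number) {
  const cutoff = Date.now() - retentionDays * 24 * 60 * 60 * 1000;
  const tx = db.transaction('conversations', 'readwrite');
  const index = tx.objectStore('conversations').index('createdAt');
  index.openCursor(IDBKeyRange.upperBound(cutoff)).onsuccess = (event) => {
    const cursor = (event.target as IDBRequest<IDBCursorWithValue | null>).result;
    if (cursor) {
      cursor.delete();   // remove the expired record
      cursor.continue(); // advance to the next match
    }
  };
  await new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve(undefined);
    tx.onerror = () => reject(tx.error);
  });
}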

Current: Node Daemon Architecture (Available Now)


WebLLM now includes a production-ready Node.js daemon that validates the architecture browsers and operating systems will eventually implement natively:

┌─────────────────────────────────────────────────────────┐
│ Browser (Client)                                         │
│ @webllm/client (Auto-detects daemon or extension)        │
└────────────────────┬────────────────────────────────────┘
                     │ HTTP/SSE + Bearer Token
┌────────────────────▼────────────────────────────────────┐
│ WebLLM Daemon (localhost:54321)                          │
│ ┌───────────────────────────────────────────────────┐   │
│ │ Token Authentication + CORS Protection            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ HTTP/SSE Endpoints + Config Management            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ LLMServer - Provider Management & Orchestration   │   │
│ └───────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Key Features:

  • Token-based Authentication: Cryptographically secure Bearer tokens
  • CORS Protection: Configurable origin whitelist with wildcard support
  • Configuration API: Secure endpoints for managing provider secrets
  • Transport Fallback: Client automatically tries daemon first, falls back to extension (sketched after this list)
  • Port 54321: High port range reduces conflicts with development servers
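A sketch of that transport fallback (the endpoint path and extension bridge call are illustrative; see the daemon README for the actual API):

// Try the local daemon first; fall back to the extension bridge.
async function sendRequest(body: unknown, token: string) {
  try {
    const res = await fetch('http://localhost:54321/v1/generate', { // path illustrative
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return await res.json();
  } catch {
    // Daemon not running or unreachable; fall through.
  }
  // Extension transport via the injected API (call illustrative).
  return navigator.llm.generate(body);
}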

See the Node Daemon README for setup instructions.

This daemon architecture demonstrates what browsers and operating systems will need to implement natively: the WebLLM daemon is a working reference implementation of the protocols that browser vendors and OS makers will eventually build into their platforms.

What Browser Makers Need to Implement:

  1. Daemon Process Management

    • Background service similar to extension service workers
    • Runs independently of any specific tab or window
    • Survives browser restarts and updates
    • Secure IPC mechanism for browser-to-daemon communication
  2. Authentication & Security Layer

    • Token generation and validation (like the current daemon)
    • Integration with browser’s credential manager
    • Per-origin permissions and access control
    • Secure storage for API keys and tokens
  3. Provider Management System

    • Configuration API for managing AI providers
    • Secure storage for provider credentials
    • Provider priority and fallback logic
    • Local model download and caching
  4. Communication Protocols

    • HTTP/SSE endpoints for requests (or equivalent IPC)
    • Streaming support for real-time responses
    • Progress tracking for long operations
    • Resource quotas and rate limiting
  5. Native API Surface

    • navigator.llm API implementation
    • Permission prompts integrated with browser UI
    • Settings UI for provider configuration
    • Usage tracking and privacy dashboard

What OS Makers Need to Implement:

  1. System-Level Daemon Service

    • Runs as operating system service (systemd, launchd, Windows Service)
    • Accessible to all applications, not just browsers
    • Unified configuration across all apps
    • Secure inter-process communication
  2. Credential Management

    • Integration with OS keychain/credential manager
    • Secure API key storage (Keychain on macOS, Credential Manager on Windows)
    • Per-application access permissions
    • Biometric authentication support
  3. Model Management

    • Centralized model storage and caching
    • Download management with progress tracking
    • Automatic updates for models
    • Storage optimization and cleanup
  4. System Integration

    • System tray/menu bar interface
    • Native settings panel
    • Usage statistics and monitoring
    • Power management and resource limits
  5. Platform APIs

    • Native SDKs (Objective-C/Swift for Apple, C++/C# for Windows)
    • Language bindings (Python, Java, JavaScript)
    • Consistent API across platforms
    • Hardware acceleration support (Neural Engine, DirectML, CUDA)

Benefits of Native Implementation:

  • No Installation Required: Works out-of-the-box, no extension needed
  • Better Performance: Direct access to system resources and GPU
  • Enhanced Security: OS-level credential storage and sandboxing
  • Unified Experience: One configuration for all apps on the system
  • Lower Overhead: Native implementation vs. JavaScript runtime
  • Platform Integration: Native UI, notifications, and system features

The current Node.js daemon validates this architecture and serves as a working example that browser and OS vendors can reference when implementing native support.

Roadmap:

  • Extension base structure
  • IndexedDB schema
  • Settings UI
  • API key management
  • Single provider support
  • Content script injection
  • Permission system
  • Basic summarization
  • Provider priority system
  • Local model support
  • Provider fallback
  • Extended task types
  • SDK packages
  • Vercel AI adapter
  • Documentation
  • Example applications
  • W3C proposal
  • Browser vendor discussions
  • Security audit
  • Origin trials