# Architecture Overview
## System Architecture

WebLLM consists of several layers that work together to provide standardized AI access to web applications.
## Core Components

### 1. Protocol Specification Layer

The protocol defines standard interfaces for AI interactions:
```ts
interface LLMProvider {
  readonly name: string;
  readonly version: string;
  readonly capabilities: ModelCapabilities;

  createSession(config: LLMSessionConfig): Promise<LLMSession>;
  listModels(): Promise<ModelInfo[]>;
}

interface LLMSession {
  generate(request: LLMRequest): Promise<LLMResponse>;
  stream(request: LLMRequest): Promise<ReadableStream<LLMResponse>>;
  abort(): void;
  close(): void;
}
```

### 2. Transport & Security Layer

- Authentication - API key management, OAuth flows
- Transport - HTTP/2, WebSocket for streaming
- Encryption - TLS 1.3 minimum, end-to-end encryption options
- Privacy - Local-first processing, data retention policies
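The protocol interfaces above can be exercised with a minimal in-memory stub. This is a sketch, not a real provider: `EchoProvider`, the canned reply, and the trimmed-down request/response types are invented for illustration.

```typescript
// Minimal stub of the protocol shapes above. EchoProvider just echoes
// the prompt back; it exists to show the session lifecycle
// (create -> generate -> close), not to do inference.
type LLMRequest = { prompt: string };
type LLMResponse = { content: string };

interface EchoSession {
  generate(request: LLMRequest): Promise<LLMResponse>;
  close(): void;
}

class EchoProvider {
  readonly name = 'echo';
  readonly version = '0.0.1';

  async createSession(): Promise<EchoSession> {
    let closed = false;
    return {
      async generate(request: LLMRequest): Promise<LLMResponse> {
        if (closed) throw new Error('session closed');
        return { content: `echo: ${request.prompt}` };
      },
      close() {
        closed = true;
      },
    };
  }
}

async function demo(): Promise<string> {
  const session = await new EchoProvider().createSession();
  const response = await session.generate({ prompt: 'hello' });
  session.close();
  return response.content;
}
```

A real provider would return model output where the stub interpolates the prompt; the lifecycle shape stays the same.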
### 3. Browser Extension (Phase 1)

The extension bridges web applications to AI providers:

```
Web Page <-> Content Script <-> Background Worker
                                      |
                                      ├─> API Providers
                                      ├─> Local Models (WASM/WebGPU)
                                      └─> Native Messaging
```

#### Extension Components

- Content Script - Injects the `navigator.llm` API into pages
- Background Service Worker - Orchestrates requests and manages providers
- Settings UI - User configuration interface
- Local Database - IndexedDB for conversation history and settings
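The background worker's handling of content-script messages can be sketched as a pure router. The message shape follows the `WEBLLM_REQUEST` example later on this page; the handler table and `dispatch` helper are hypothetical, not the extension's actual API.

```typescript
// Hedged sketch of the background worker's request dispatch.
// A real implementation would receive these via extension messaging
// (e.g. chrome.runtime.onMessage) and call out to a provider.
type WebLLMMessage = { type: string; action: string; input: string };

const handlers: Record<string, (input: string) => string> = {
  // Stand-in for a real provider call.
  summarize: (input) => `summary(${input.length} chars)`,
};

function dispatch(message: WebLLMMessage): string {
  if (message.type !== 'WEBLLM_REQUEST') {
    throw new Error(`unexpected message type: ${message.type}`);
  }
  const handler = handlers[message.action];
  if (!handler) throw new Error(`unsupported action: ${message.action}`);
  return handler(message.input);
}
```

Rejecting unknown `type` and `action` values up front is the validation step the request flow below describes.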
### 4. Model Discovery & Capabilities

Each provider exposes its capabilities:

```ts
interface ModelCapabilities {
  maxTokens: number;
  supportedModalities: string[]; // 'text', 'image', 'audio'
  supportsStreaming: boolean;
  supportsTools: boolean;
  pricing?: ModelPricing;
}

interface ModelInfo {
  id: string;
  provider: string;
  capabilities: ModelCapabilities;
  location: 'cloud' | 'local' | 'hybrid';
}
```

### 5. Permission Model

Integrated with the browser permissions API:
```ts
// Request permission
const permission = await navigator.permissions.query({
  name: 'llm',
  provider: 'anthropic',
  purpose: 'chat',
});

if (permission.state === 'granted') {
  const provider = await navigator.llm.getProvider('anthropic');
}
```

## Data Flow

### Request Flow

1. Web application calls the WebLLM API
2. Content script intercepts and validates the request
3. Background worker checks permissions
4. Provider manager selects the appropriate provider based on priority
5. Provider executes the request (API call or local inference)
6. Response flows back through the chain
7. Local DB stores the interaction (based on retention policy)
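The provider-selection step above can be sketched against the `ModelCapabilities`/`ModelInfo` shapes from the previous section: take the first entry in priority order whose capabilities satisfy the request. The `selectModel` helper and the sample models are illustrative, not the actual provider manager.

```typescript
// Sketch of capability-based selection. `prioritized` is assumed to be
// already sorted by the user's configured provider priority.
interface ModelCapabilities {
  maxTokens: number;
  supportedModalities: string[];
  supportsStreaming: boolean;
}

interface ModelInfo {
  id: string;
  provider: string;
  capabilities: ModelCapabilities;
  location: 'cloud' | 'local' | 'hybrid';
}

function selectModel(
  prioritized: ModelInfo[],
  needs: { modality: string; minTokens: number; streaming: boolean },
): ModelInfo | undefined {
  return prioritized.find(
    (m) =>
      m.capabilities.supportedModalities.includes(needs.modality) &&
      m.capabilities.maxTokens >= needs.minTokens &&
      (!needs.streaming || m.capabilities.supportsStreaming),
  );
}

// Hypothetical catalog: a small local model first, a cloud model second.
const models: ModelInfo[] = [
  {
    id: 'local-llama',
    provider: 'local',
    capabilities: { maxTokens: 2048, supportedModalities: ['text'], supportsStreaming: true },
    location: 'local',
  },
  {
    id: 'claude',
    provider: 'anthropic',
    capabilities: { maxTokens: 200000, supportedModalities: ['text', 'image'], supportsStreaming: true },
    location: 'cloud',
  },
];
```

A text request within the local model's context window selects `local-llama`; an image request, or one needing a larger window, falls through to the cloud entry.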
### Example: Summarization Request

```ts
// 1. Developer calls the API
const summary = await llm.summarize(articleText);

// 2. Content script receives the request
window.postMessage({
  type: 'WEBLLM_REQUEST',
  action: 'summarize',
  input: articleText,
});

// 3. Background worker processes it:
//    - Checks permission for origin
//    - Shows usage notification
//    - Selects provider (local > API key > fallback)
//    - Executes request
//    - Stores in local DB
//    - Returns result

// 4. Result returned to the developer
console.log(summary); // "This article discusses..."
```

## Provider Management

### Priority System

Users configure provider priority:
1. Local Model (Llama 3.2 1B) - Free, private, fast
2. Personal API Key (Anthropic) - User’s account
3. Fallback Provider (OpenAI) - Application-provided
If the local model fails (out of memory, model not downloaded), the system automatically falls back to the next provider in the list.
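That fallback behavior can be sketched as a chain that tries each provider in priority order and moves on when one throws. The `Provider` shape and the stub names below are assumptions for illustration, not the actual provider-manager API.

```typescript
// Try providers in order; remember the last failure so the final error
// explains why everything failed.
type Provider = { name: string; execute: (prompt: string) => Promise<string> };

async function executeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; result: string }> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return { provider: provider.name, result: await provider.execute(prompt) };
    } catch (error) {
      lastError = error; // this provider failed; try the next one
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}
```

For example, if the local provider throws "out of memory", the chain transparently retries with the user's API-key provider, matching the priority list above.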
### Provider Interface

All providers implement the same interface:

```ts
class AnthropicProvider {
  async execute(request) {
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model: request.model,
        messages: request.messages,
        max_tokens: request.maxTokens,
      }),
    });
    return this.normalize(await response.json());
  }

  normalize(response) {
    // Convert to standard format
    return {
      content: response.content[0].text,
      usage: {
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      },
    };
  }
}
```

## Security Architecture

### Origin Isolation

- Each origin has separate permissions
- API keys never exposed to web pages
- Content scripts run in isolated world
### Permission System

- First use - User is prompted for permission
- Notification - Visual indicator when AI is used
- Revocation - Users can revoke access anytime
- Granular control - Allow/block specific actions or models
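One way to sketch this granular, per-origin state is a small permission store. This is in-memory only (the real extension persists settings to IndexedDB), and the `OriginPermissions` class is hypothetical; the `PermissionState` values mirror the browser Permissions API.

```typescript
// Per-origin, per-action permission state with first-use prompting,
// granular grants, and whole-origin revocation.
type PermissionState = 'granted' | 'denied' | 'prompt';

class OriginPermissions {
  private grants = new Map<string, PermissionState>();

  private key(origin: string, action: string): string {
    return `${origin}|${action}`;
  }

  set(origin: string, action: string, state: PermissionState): void {
    this.grants.set(this.key(origin, action), state);
  }

  query(origin: string, action: string): PermissionState {
    // Unknown combinations start in 'prompt', matching first-use prompting.
    return this.grants.get(this.key(origin, action)) ?? 'prompt';
  }

  revoke(origin: string): void {
    for (const key of [...this.grants.keys()]) {
      if (key.startsWith(`${origin}|`)) this.grants.delete(key);
    }
  }
}
```

Keying on `origin|action` is what makes "allow summarization but block chat for this site" expressible.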
### Data Protection

- Local storage - Encrypted IndexedDB
- Retention policies - Auto-delete after N days
- User control - Clear all data anytime
- No tracking - No data sent to extension developer
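The retention policy can be sketched as a pruning pass over stored records. In the extension this would run against IndexedDB; here a plain array stands in, and `prune` is an illustrative helper, not the extension's actual code.

```typescript
// Drop stored interactions older than the retention window.
interface StoredInteraction {
  id: string;
  createdAt: number; // epoch milliseconds
}

function prune(
  records: StoredInteraction[],
  retentionDays: number,
  now: number = Date.now(),
): StoredInteraction[] {
  const cutoff = now - retentionDays * 24 * 60 * 60 * 1000;
  return records.filter((r) => r.createdAt >= cutoff);
}
```

Passing `now` explicitly keeps the function deterministic and easy to test; "clear all data" is the degenerate case of pruning with a zero-day window.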
## Future: Native Browser Integration

### Current: Node Daemon Architecture (Available Now)

WebLLM now includes a production-ready Node.js daemon that validates the architecture browsers and operating systems will eventually implement natively:

```
┌─────────────────────────────────────────────────────────┐
│ Browser (Client)                                        │
│   @webllm/client (Auto-detects daemon or extension)     │
└────────────────────┬────────────────────────────────────┘
                     │ HTTP/SSE + Bearer Token
                     ▼
┌─────────────────────────────────────────────────────────┐
│ WebLLM Daemon (localhost:54321)                         │
│ ┌───────────────────────────────────────────────────┐   │
│ │ Token Authentication + CORS Protection            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ HTTP/SSE Endpoints + Config Management            │   │
│ └─────────────────┬─────────────────────────────────┘   │
│ ┌─────────────────▼─────────────────────────────────┐   │
│ │ LLMServer - Provider Management & Orchestration   │   │
│ └───────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```

Key Features:
- Token-based Authentication: Cryptographically secure Bearer tokens
- CORS Protection: Configurable origin whitelist with wildcard support
- Configuration API: Secure endpoints for managing provider secrets
- Transport Fallback: Client automatically tries daemon first, falls back to extension
- Port 54321: High port range reduces conflicts with development servers
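The token handshake can be sketched as follows: generate a random Bearer token at daemon startup and compare presented tokens in constant time. This is a sketch of the idea, not the daemon's actual code; the helper names are invented.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';

// Generate a cryptographically random token at daemon startup.
function generateToken(): string {
  return randomBytes(32).toString('hex'); // 256 bits of randomness
}

// Validate an incoming Authorization header against the daemon's token.
function isAuthorized(authorizationHeader: string | undefined, token: string): boolean {
  if (!authorizationHeader?.startsWith('Bearer ')) return false;
  const presented = authorizationHeader.slice('Bearer '.length);
  if (presented.length !== token.length) return false;
  // timingSafeEqual avoids leaking the mismatch position via timing.
  return timingSafeEqual(Buffer.from(presented), Buffer.from(token));
}
```

The constant-time comparison matters because the daemon listens on a local HTTP port: a naive `===` over a long-lived token could, in principle, be probed byte by byte.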
See the Node Daemon README for setup instructions.
### Next: Browser & OS Implementation

This daemon architecture demonstrates what browsers and operating systems will need to implement natively. The WebLLM daemon serves as a reference implementation of the protocols that browser vendors and OS makers will eventually build into their platforms.
What Browser Makers Need to Implement:

1. Daemon Process Management
   - Background service similar to extension service workers
   - Runs independently of any specific tab or window
   - Survives browser restarts and updates
   - Secure IPC mechanism for browser-to-daemon communication

2. Authentication & Security Layer
   - Token generation and validation (like the current daemon)
   - Integration with the browser’s credential manager
   - Per-origin permissions and access control
   - Secure storage for API keys and tokens

3. Provider Management System
   - Configuration API for managing AI providers
   - Secure storage for provider credentials
   - Provider priority and fallback logic
   - Local model download and caching

4. Communication Protocols
   - HTTP/SSE endpoints for requests (or equivalent IPC)
   - Streaming support for real-time responses
   - Progress tracking for long operations
   - Resource quotas and rate limiting

5. Native API Surface
   - `navigator.llm` API implementation
   - Permission prompts integrated with browser UI
   - Settings UI for provider configuration
   - Usage tracking and privacy dashboard
What OS Makers Need to Implement:

1. System-Level Daemon Service
   - Runs as an operating system service (systemd, launchd, Windows Service)
   - Accessible to all applications, not just browsers
   - Unified configuration across all apps
   - Secure inter-process communication

2. Credential Management
   - Integration with the OS keychain/credential manager
   - Secure API key storage (Keychain on macOS, Credential Manager on Windows)
   - Per-application access permissions
   - Biometric authentication support

3. Model Management
   - Centralized model storage and caching
   - Download management with progress tracking
   - Automatic updates for models
   - Storage optimization and cleanup

4. System Integration
   - System tray/menu bar interface
   - Native settings panel
   - Usage statistics and monitoring
   - Power management and resource limits

5. Platform APIs
   - Native SDKs (Objective-C/Swift for Apple, C++/C# for Windows)
   - Language bindings (Python, Java, JavaScript)
   - Consistent API across platforms
   - Hardware acceleration support (Neural Engine, DirectML, CUDA)
Benefits of Native Implementation:
- No Installation Required: Works out-of-the-box, no extension needed
- Better Performance: Direct access to system resources and GPU
- Enhanced Security: OS-level credential storage and sandboxing
- Unified Experience: One configuration for all apps on the system
- Lower Overhead: Native implementation vs. JavaScript runtime
- Platform Integration: Native UI, notifications, and system features
The current Node.js daemon validates this architecture and serves as a working example that browser and OS vendors can reference when implementing native support.
## Implementation Phases

### Phase 0: Foundation (Current)

- Extension base structure
- IndexedDB schema
- Settings UI
- API key management
### Phase 1: MVP

- Single provider support
- Content script injection
- Permission system
- Basic summarization
### Phase 2: Multi-Provider

- Provider priority system
- Local model support
- Provider fallback
- Extended task types
### Phase 3: Developer Adoption

- SDK packages
- Vercel AI adapter
- Documentation
- Example applications
### Phase 4: Standardization

- W3C proposal
- Browser vendor discussions
- Security audit
- Origin trials
## Learn More

- Browser Extension Architecture - Deep dive into the Phase 1 implementation
- Provider Management - How provider selection works
- Data & Privacy - Data handling and retention