# Browser Extension Architecture
## Overview

The WebLLM browser extension serves as a polyfill for the future native API, providing standardized AI access before browsers implement native support.
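Because the extension fills in `navigator.llm` ahead of native support, pages can feature-detect it and degrade gracefully. A hypothetical sketch (the request shape mirrors the content-script API shown below):

```js
// Hypothetical page-side feature detection.
async function summarizeIfAvailable(text) {
  if (!('llm' in navigator)) {
    return null; // no extension and no native support: skip AI features
  }
  // `request()` is the API injected by the content script (see below);
  // the config fields here are illustrative.
  return navigator.llm.request({ action: 'summarize', prompt: text });
}
```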
## Extension Components

### 1. Extension Popup UI

The popup provides user settings and configuration:
```text
Settings
├── Data Retention
│   ├── Keep history: [7 days | 30 days | 90 days | Forever]
│   ├── Auto-delete after: [checkbox]
│   └── Clear all data now [button]
├── Provider Configuration
│   ├── Priority Order (drag-to-reorder)
│   │   1. 🟢 Local Model (Llama 3.2 1B)
│   │   2. 🔑 Anthropic API (configured)
│   │   3. 🔑 OpenAI API (not configured)
│   └── Add Provider [+]
├── Model Management
│   ├── Download Local Models
│   └── Installed Models
└── Permissions
    ├── Allowed Sites
    └── Blocked Sites
```
### 2. Content Script

Injected into every web page, this script provides the `navigator.llm` API:
```js
// Injected into page context
navigator.llm = {
  async request(config) {
    return new Promise((resolve, reject) => {
      const messageId = crypto.randomUUID();

      // Send to background worker
      window.postMessage({
        type: 'WEBLLM_REQUEST',
        messageId,
        config,
        origin: window.location.origin
      }, '*');

      // Wait for response (optional chaining guards against
      // unrelated messages with null or non-object data)
      window.addEventListener('message', function handler(event) {
        if (event.data?.type === 'WEBLLM_RESPONSE' &&
            event.data.messageId === messageId) {
          window.removeEventListener('message', handler);
          if (event.data.error) reject(event.data.error);
          else resolve(event.data.result);
        }
      });
    });
  }
};
```
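The snippet above runs in the page's own context, so the content script still needs a small relay in its isolated world to forward page messages to the background worker. A minimal sketch, assuming the message shapes above and standard MV3 `chrome.runtime` messaging:

```js
// Content script (isolated world): bridge between the page and the
// background service worker.
window.addEventListener('message', async (event) => {
  // Accept requests only from the page this script is injected into.
  if (event.source !== window || event.data?.type !== 'WEBLLM_REQUEST') return;

  const { messageId, config } = event.data;
  const result = await chrome.runtime.sendMessage({ type: 'WEBLLM_REQUEST', config });

  if (result?.error) {
    window.postMessage({ type: 'WEBLLM_RESPONSE', messageId, error: result.error }, '*');
  } else {
    window.postMessage({ type: 'WEBLLM_RESPONSE', messageId, result }, '*');
  }
});
```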
### 3. Background Service Worker

The core orchestrator that:
- Manages permissions
- Routes requests to providers
- Stores conversation history
- Handles provider fallback
- Shows usage notifications
```js
class WebLLMService {
  async handleRequest(request, origin) {
    // 1. Check permissions
    const hasPermission = await this.permissionManager.check(origin);
    if (!hasPermission) {
      const granted = await this.requestPermission(origin);
      if (!granted) throw new Error('Permission denied');
    }

    // 2. Show notification
    this.notifyUsage(origin, request.action);

    // 3. Get provider
    const provider = await this.providerManager.getNextAvailable();

    // 4. Execute request
    const response = await provider.execute(request);

    // 5. Store in local DB
    await this.db.store({
      origin,
      request,
      response,
      timestamp: Date.now(),
      provider: provider.name
    });

    // 6. Schedule cleanup
    this.scheduleCleanup();

    return response;
  }
}
```
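On the receiving end, the service worker can wire `chrome.runtime.onMessage` into `handleRequest`. A sketch, assuming the content-script relay shown earlier:

```js
// Background service worker: entry point for relayed page requests.
const service = new WebLLMService();

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type !== 'WEBLLM_REQUEST') return;

  // Derive the origin from the sender, never from page-supplied data.
  const origin = new URL(sender.tab.url).origin;

  service.handleRequest(message.config, origin)
    .then(sendResponse)
    .catch((error) => sendResponse({ error: String(error) }));

  return true; // keep the channel open for the async response
});
```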
### 4. Local Database (IndexedDB)
Stores:
- Conversations - Request/response history with expiration
- Providers - User’s provider configurations
- Permissions - Per-origin access grants
- Settings - User preferences
```ts
{
  conversations: {
    id: string,
    origin: string,
    timestamp: number,
    messages: Message[],
    provider: string,
    expiresAt: number
  },

  providers: {
    id: string,
    type: 'api' | 'local',
    priority: number,
    config: {
      apiKey?: string,
      baseUrl?: string,
      modelPath?: string
    },
    enabled: boolean
  },

  permissions: {
    origin: string,
    granted: boolean,
    grantedAt: number,
    lastUsed: number
  },

  settings: {
    retentionDays: number,
    autoDelete: boolean,
    defaultProvider: string
  }
}
```
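These record shapes map directly onto IndexedDB object stores. A sketch using Dexie as the wrapper (an assumption; any IndexedDB layer works), indexing only the fields that get queried:

```js
import Dexie from 'dexie';

// Hypothetical database setup for the schema above.
const db = new Dexie('webllm');
db.version(1).stores({
  conversations: 'id, origin, expiresAt', // expiresAt index drives cleanup
  providers: 'id, priority',
  permissions: 'origin',
  settings: 'id',                         // single record under a fixed id
});

// Expired-history cleanup, e.g. from scheduleCleanup():
//   await db.conversations.where('expiresAt').below(Date.now()).delete();

export default db;
```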
### 5. Provider Manager

Handles provider selection and fallback:
```js
class ProviderManager {
  async getNextAvailable() {
    // Get enabled providers ordered by priority.
    // (Booleans are not valid IndexedDB keys, so filter in memory
    // rather than indexing on `enabled`.)
    const providers = (await db.providers.toArray())
      .filter((p) => p.enabled)
      .sort((a, b) => a.priority - b.priority);

    // Try each in order
    for (const provider of providers) {
      if (await this.isAvailable(provider)) {
        return this.instantiate(provider);
      }
    }

    throw new Error('No providers available');
  }

  async isAvailable(provider) {
    if (provider.type === 'local') {
      return await this.checkLocalModel(provider);
    } else {
      return provider.config.apiKey?.length > 0;
    }
  }
}
```
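`getNextAvailable()` returns only the first healthy provider, so runtime failures (rate limits, network errors) still need handling. A hypothetical fallback loop, layered on top of the class above:

```js
// Hypothetical extension of ProviderManager: walk the priority list,
// moving to the next provider when one fails at execution time.
class FallbackProviderManager extends ProviderManager {
  async executeWithFallback(request) {
    const providers = (await db.providers.toArray())
      .filter((p) => p.enabled)
      .sort((a, b) => a.priority - b.priority);

    let lastError;
    for (const entry of providers) {
      if (!(await this.isAvailable(entry))) continue;
      try {
        return await this.instantiate(entry).execute(request);
      } catch (error) {
        lastError = error; // record the failure and try the next provider
      }
    }
    throw lastError ?? new Error('No providers available');
  }
}
```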
## User Experience Flow

### First Time Use
Section titled “First Time Use”- User visits a website that uses WebLLM
- Website calls `llm.summarize(...)`
- Extension shows permission prompt: “Allow example.com to use WebLLM?”
- If no providers configured, shows setup wizard
- User configures first provider (API key or local model)
- Request proceeds
### Subsequent Uses

- Website calls the API
- Small notification appears: “example.com is using WebLLM”
- Request processed automatically
- Notification auto-dismisses after 5 seconds
### Permission Notification Design

```css
.webllm-notification {
  position: fixed;
  top: 16px;
  right: 16px;
  background: white;
  border: 1px solid #e0e0e0;
  border-radius: 8px;
  padding: 12px 16px;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
  z-index: 999999;
  animation: slideIn 0.3s ease;
}

/* Keyframes referenced above; a plausible slide-in from the right. */
@keyframes slideIn {
  from {
    transform: translateX(100%);
    opacity: 0;
  }
  to {
    transform: translateX(0);
    opacity: 1;
  }
}
```
## Local Model Support

The extension can run AI models locally using:
- WebGPU - Hardware-accelerated inference in browser
- WASM - CPU-based fallback
- ONNX Runtime - Optimized model execution
### Model Download Flow

- User opens extension settings
- Selects “Download Local Models”
- Sees available models (e.g., “Llama 3.2 1B - 1.2GB”)
- Clicks download; the model is saved to IndexedDB (sketched below)
- Model becomes available in provider list
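The download step can stream the weights straight into IndexedDB. A sketch with hypothetical names, assuming a `models` table for weight blobs has been added to the database schema (it is not part of the schema shown above):

```js
// Hypothetical download helper.
async function downloadModel(model) {
  const response = await fetch(model.url);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);

  // Blobs are valid IndexedDB values, so the weights can be stored as-is.
  const weights = await response.blob();
  await db.models.put({ id: model.id, name: model.name, weights });

  // Register the model as a provider so it shows up in the priority list.
  await db.providers.put({
    id: `local:${model.id}`,
    type: 'local',
    priority: 0,
    config: { modelPath: model.id },
    enabled: true,
  });
}
```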
### Local Inference
```js
class LocalModelProvider {
  constructor(config) {
    this.modelPath = config.modelPath;
    this.session = null;
  }

  async initialize() {
    // Load ONNX model, preferring WebGPU with a WASM fallback
    this.session = await ort.InferenceSession.create(this.modelPath, {
      executionProviders: ['webgpu', 'wasm']
    });
  }

  async execute(request) {
    if (!this.session) await this.initialize();

    // Tokenize (tokenizer helper not shown)
    const tokens = await this.tokenize(request.prompt);

    // Run inference; int64 tensors take a BigInt64Array
    const outputs = await this.session.run({
      input_ids: new ort.Tensor(
        'int64',
        BigInt64Array.from(tokens, (t) => BigInt(t)),
        [1, tokens.length]
      )
    });

    // Decode logits back into text (decoding helper not shown)
    const text = await this.decode(outputs.logits);

    return {
      content: text,
      usage: { inputTokens: tokens.length }
    };
  }
}
```
## Security Considerations

### Content Script Isolation
Section titled “Content Script Isolation”- Runs in isolated world (no access to page globals)
- Only exposes the `navigator.llm` API
- Cannot access the page’s API keys or sensitive data
### API Key Storage

- Stored in encrypted IndexedDB
- Only accessible to background worker
- Never exposed to content scripts or web pages
### Origin Permissions

- Separate permissions per origin
- Can revoke access anytime
- Blocked sites list prevents abuse
### Request Validation

- Validate all inputs before processing
- Rate limiting per origin
- Maximum token limits
- Timeout enforcement
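A sketch of how the background worker might enforce these rules before routing a request; every limit below is illustrative:

```js
// Hypothetical per-origin guard; all limits are illustrative.
const RATE_LIMIT = 30;           // requests per minute per origin
const MAX_PROMPT_CHARS = 32_000; // rough stand-in for a token limit
const TIMEOUT_MS = 60_000;

// In-memory log; resets when the service worker sleeps, so persist
// it (e.g. in IndexedDB) if stricter limits are needed.
const requestLog = new Map(); // origin -> recent request timestamps

function validateRequest(request, origin) {
  if (typeof request?.prompt !== 'string' || request.prompt.length === 0) {
    throw new Error('Invalid request: missing prompt');
  }
  if (request.prompt.length > MAX_PROMPT_CHARS) {
    throw new Error('Prompt exceeds maximum length');
  }

  const now = Date.now();
  const recent = (requestLog.get(origin) ?? []).filter((t) => now - t < 60_000);
  if (recent.length >= RATE_LIMIT) throw new Error('Rate limit exceeded');
  recent.push(now);
  requestLog.set(origin, recent);
}

// Timeout enforcement around provider execution:
function withTimeout(promise, ms = TIMEOUT_MS) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Request timed out')), ms)),
  ]);
}
```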
## Performance Optimization

### Request Batching
- Queue rapid requests
- Batch when possible
- Reduce API calls
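One way to queue and coalesce rapid requests is a short flush window; a hypothetical sketch:

```js
// Hypothetical batcher: collect requests arriving within a short window
// and hand them to the provider as one batch.
class RequestBatcher {
  constructor(flush, windowMs = 50) {
    this.flush = flush; // async (requests[]) => responses[]
    this.windowMs = windowMs;
    this.pending = [];
    this.timer = null;
  }

  enqueue(request) {
    return new Promise((resolve, reject) => {
      this.pending.push({ request, resolve, reject });
      this.timer ??= setTimeout(() => this.drain(), this.windowMs);
    });
  }

  async drain() {
    const batch = this.pending;
    this.pending = [];
    this.timer = null;
    try {
      const responses = await this.flush(batch.map((b) => b.request));
      batch.forEach((b, i) => b.resolve(responses[i]));
    } catch (error) {
      batch.forEach((b) => b.reject(error));
    }
  }
}
```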
### Caching

- Cache common requests (within retention period)
- Deduplicate identical prompts
- Cache model outputs
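Deduplication needs a stable cache key; hashing the fields that affect the output is one option. A sketch using the Web Crypto API:

```js
// Hypothetical cache key: SHA-256 over the fields that affect the output.
async function cacheKey(request, providerName) {
  const material = JSON.stringify({
    provider: providerName,
    action: request.action,
    prompt: request.prompt,
  });
  const digest = await crypto.subtle.digest(
    'SHA-256', new TextEncoder().encode(material));
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}
```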
### Lazy Loading

- Load providers on demand
- Initialize local models only when needed
- Unload unused models after timeout
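Unloading can be a simple idle timer that resets on every request. A hypothetical wrapper around `LocalModelProvider`, assuming the runtime exposes a `release()` method (as onnxruntime-web does):

```js
// Hypothetical idle-unload wrapper around a local provider instance.
const UNLOAD_AFTER_MS = 5 * 60_000; // illustrative timeout

function trackIdle(provider) {
  let timer = null;
  const original = provider.execute.bind(provider);

  provider.execute = async (request) => {
    clearTimeout(timer);
    try {
      return await original(request);
    } finally {
      timer = setTimeout(async () => {
        await provider.session?.release?.(); // free WebGPU/WASM resources
        provider.session = null;             // forces re-initialize next time
      }, UNLOAD_AFTER_MS);
    }
  };
  return provider;
}
```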
## Next Steps

- Learn about Provider Management
- Understand Data & Privacy
- Explore the Developer SDK