
Browser Extension Architecture

The WebLLM browser extension serves as a polyfill for the future native API, providing standardized AI access before browsers implement native support.
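
For example, a page can feature-detect the API before calling it (a minimal sketch; the action name and fallback are illustrative, not part of a fixed spec):

// Page-side usage sketch: feature-detect before calling
async function summarizePage() {
  if (!('llm' in navigator)) {
    console.warn('WebLLM not available; fall back to a server-side call.');
    return null;
  }
  const result = await navigator.llm.request({
    action: 'summarize',                            // illustrative action name
    prompt: document.body.innerText.slice(0, 4000)
  });
  return result.content;
}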

The popup provides user settings and configuration:

Settings
├── Data Retention
│   ├── Keep history: [7 days | 30 days | 90 days | Forever]
│   ├── Auto-delete after: [checkbox]
│   └── Clear all data now [button]
├── Provider Configuration
│   ├── Priority Order (drag-to-reorder)
│   │   1. 🟢 Local Model (Llama 3.2 1B)
│   │   2. 🔑 Anthropic API (configured)
│   │   3. 🔑 OpenAI API (not configured)
│   └── Add Provider [+]
├── Model Management
│   ├── Download Local Models
│   └── Installed Models
└── Permissions
    ├── Allowed Sites
    └── Blocked Sites
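
The defaults behind this UI might be persisted as a simple object on install (a sketch; field names follow the settings store in the storage schema further below):

// Hypothetical defaults written when the extension is first installed
const DEFAULT_SETTINGS = {
  retentionDays: 30,          // "Keep history: 30 days"
  autoDelete: true,           // "Auto-delete after" checkbox
  defaultProvider: 'local'    // first entry in the priority list
};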

The content script, injected into every web page, provides the navigator.llm API:

// Injected into page context
navigator.llm = {
  async request(config) {
    return new Promise((resolve, reject) => {
      const messageId = crypto.randomUUID();

      // Send to background worker
      window.postMessage({
        type: 'WEBLLM_REQUEST',
        messageId,
        config,
        origin: window.location.origin
      }, '*');

      // Wait for response
      window.addEventListener('message', function handler(event) {
        if (event.data.type === 'WEBLLM_RESPONSE' &&
            event.data.messageId === messageId) {
          window.removeEventListener('message', handler);
          if (event.data.error) reject(event.data.error);
          else resolve(event.data.result);
        }
      });
    });
  }
};
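
The page script above only posts messages; a companion content script (running in the isolated world) would relay them to the background worker, roughly like this (a sketch; the handler wiring is an assumption):

// content script: relay page requests to the background worker and back
window.addEventListener('message', async (event) => {
  // Only accept requests from this page's own window
  if (event.source !== window || event.data?.type !== 'WEBLLM_REQUEST') return;

  const { messageId, config } = event.data;
  try {
    const result = await chrome.runtime.sendMessage({
      type: 'WEBLLM_REQUEST',
      config,
      origin: window.location.origin
    });
    window.postMessage({ type: 'WEBLLM_RESPONSE', messageId, result }, '*');
  } catch (error) {
    window.postMessage({ type: 'WEBLLM_RESPONSE', messageId, error: error.message }, '*');
  }
});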

The background worker is the core orchestrator that:

  • Manages permissions
  • Routes requests to providers
  • Stores conversation history
  • Handles provider fallback
  • Shows usage notifications

class WebLLMService {
  async handleRequest(request, origin) {
    // 1. Check permissions
    const hasPermission = await this.permissionManager.check(origin);
    if (!hasPermission) {
      const granted = await this.requestPermission(origin);
      if (!granted) throw new Error('Permission denied');
    }

    // 2. Show notification
    this.notifyUsage(origin, request.action);

    // 3. Get provider
    const provider = await this.providerManager.getNextAvailable();

    // 4. Execute request
    const response = await provider.execute(request);

    // 5. Store in local DB
    await this.db.store({
      origin,
      request,
      response,
      timestamp: Date.now(),
      provider: provider.name
    });

    // 6. Schedule cleanup
    this.scheduleCleanup();

    return response;
  }
}
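
scheduleCleanup and the retention setting imply a periodic purge of expired rows; a minimal sketch using chrome.alarms, shown as free functions for brevity (the alarm name and interval are assumptions, and the query assumes the Dexie-style storage wrapper sketched in the next section):

// Called from handleRequest (step 6): ensure a recurring cleanup alarm exists
function scheduleCleanup() {
  chrome.alarms.create('webllm-cleanup', { periodInMinutes: 60 });
}

// Registered once at service-worker startup: purge expired conversations
chrome.alarms.onAlarm.addListener(async (alarm) => {
  if (alarm.name !== 'webllm-cleanup') return;
  await db.conversations.where('expiresAt').below(Date.now()).delete();
});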

The local database (IndexedDB) stores:

  • Conversations - Request/response history with expiration
  • Providers - User’s provider configurations
  • Permissions - Per-origin access grants
  • Settings - User preferences

{
  conversations: {
    id: string,
    origin: string,
    timestamp: number,
    messages: Message[],
    provider: string,
    expiresAt: number
  },
  providers: {
    id: string,
    type: 'api' | 'local',
    priority: number,
    config: {
      apiKey?: string,
      baseUrl?: string,
      modelPath?: string
    },
    enabled: boolean
  },
  permissions: {
    origin: string,
    granted: boolean,
    grantedAt: number,
    lastUsed: number
  },
  settings: {
    retentionDays: number,
    autoDelete: boolean,
    defaultProvider: string
  }
}
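
The query style used by the provider manager below suggests a Dexie-like wrapper over IndexedDB; the stores could be declared roughly as follows (a sketch, not the extension's actual code):

import Dexie from 'dexie';

const db = new Dexie('webllm');
db.version(1).stores({
  // Only indexed fields are listed; full objects from the schema above are stored as-is
  conversations: 'id, origin, timestamp, expiresAt',
  providers: 'id, priority, type',
  permissions: 'origin, lastUsed',
  settings: 'id'   // a single row holding the settings object
});

export default db;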

The provider manager handles provider selection and fallback:

class ProviderManager {
  async getNextAvailable() {
    // Get enabled providers ordered by priority
    // (booleans are not valid IndexedDB keys, so filter in memory)
    const providers = await db.providers
      .orderBy('priority')
      .filter((p) => p.enabled)
      .toArray();

    // Try each in order
    for (const provider of providers) {
      if (await this.isAvailable(provider)) {
        return this.instantiate(provider);
      }
    }
    throw new Error('No providers available');
  }

  async isAvailable(provider) {
    if (provider.type === 'local') {
      return await this.checkLocalModel(provider);
    } else {
      return provider.config.apiKey?.length > 0;
    }
  }
}
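
Provider fallback (listed among the background worker's responsibilities above) can then be layered on top of getNextAvailable, for example (a sketch; executeWithFallback is a hypothetical helper shown as a free function):

// Try providers in priority order, falling through to the next one on failure
async function executeWithFallback(manager, request) {
  const providers = await db.providers
    .orderBy('priority')
    .filter((p) => p.enabled)
    .toArray();

  let lastError;
  for (const config of providers) {
    if (!(await manager.isAvailable(config))) continue;
    try {
      return await manager.instantiate(config).execute(request);
    } catch (error) {
      lastError = error;   // remember the failure and try the next provider
    }
  }
  throw lastError ?? new Error('No providers available');
}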

First-time use (no permission or providers configured yet):

  1. User visits a website that uses WebLLM
  2. Website calls llm.summarize(...)
  3. Extension shows permission prompt: “Allow example.com to use WebLLM?” (see the sketch after this list)
  4. If no providers configured, shows setup wizard
  5. User configures first provider (API key or local model)
  6. Request proceeds
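
The permission prompt in step 3 could be implemented with a notification offering Allow/Deny buttons (one possible approach, not necessarily what the extension ships; the icon path is hypothetical):

// Sketch of requestPermission (would live on WebLLMService in practice)
function requestPermission(origin) {
  return new Promise((resolve) => {
    const id = `webllm-perm-${origin}`;
    chrome.notifications.create(id, {
      type: 'basic',
      iconUrl: 'icons/icon-48.png',        // hypothetical asset path
      title: 'WebLLM permission request',
      message: `Allow ${origin} to use WebLLM?`,
      buttons: [{ title: 'Allow' }, { title: 'Deny' }],
      requireInteraction: true
    });
    chrome.notifications.onButtonClicked.addListener(function handler(notifId, buttonIndex) {
      if (notifId !== id) return;
      chrome.notifications.onButtonClicked.removeListener(handler);
      resolve(buttonIndex === 0);           // first button = Allow
    });
  });
}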

Subsequent use (permission already granted):

  1. Website calls API
  2. Small notification appears: “example.com is using WebLLM”
  3. Request processed automatically
  4. Notification auto-dismisses after 5 seconds

The notification is styled as a small toast overlay:

.webllm-notification {
  position: fixed;
  top: 16px;
  right: 16px;
  background: white;
  border: 1px solid #e0e0e0;
  border-radius: 8px;
  padding: 12px 16px;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
  z-index: 999999;
  animation: slideIn 0.3s ease;
}
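
The toast itself (step 2 of the flow above) would be rendered by the content script, since the background worker has no access to the page's DOM; a minimal sketch using the class defined above:

// Show "example.com is using WebLLM" and remove it after 5 seconds
function showUsageToast(origin, action) {
  const el = document.createElement('div');
  el.className = 'webllm-notification';
  el.textContent = `${origin} is using WebLLM (${action})`;
  document.body.appendChild(el);
  setTimeout(() => el.remove(), 5000);
}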

The extension can run AI models locally using:

  • WebGPU - Hardware-accelerated inference in browser
  • WASM - CPU-based fallback
  • ONNX Runtime - Optimized model execution
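
WebGPU support can be probed up front so the execution-provider list prefers the GPU and falls back to WASM otherwise (a sketch using the standard navigator.gpu API; the helper name is an assumption):

// Choose ONNX Runtime execution providers based on what the browser supports
async function pickExecutionProviders() {
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return ['webgpu', 'wasm'];   // GPU first, CPU fallback
  }
  return ['wasm'];
}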

To install a local model:

  1. User opens extension settings
  2. Selects “Download Local Models”
  3. Sees available models (e.g., “Llama 3.2 1B - 1.2GB”)
  4. Clicks download, model downloads to IndexedDB (sketched below)
  5. Model becomes available in provider list
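
Step 4, the download into IndexedDB, might look roughly like this (a sketch; the models store and URL field are hypothetical):

// Fetch the model file and persist it for later sessions
async function downloadModel(model) {
  const response = await fetch(model.url);            // e.g. a CDN-hosted .onnx file
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  const data = await response.arrayBuffer();
  await db.models.put({ id: model.id, name: model.name, data });   // hypothetical 'models' store
}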

Once downloaded, the local provider runs the model with ONNX Runtime:

class LocalModelProvider {
  constructor(config) {
    this.modelPath = config.modelPath;
    this.session = null;
  }

  async initialize() {
    // Load ONNX model
    this.session = await ort.InferenceSession.create(this.modelPath, {
      executionProviders: ['webgpu', 'wasm']
    });
  }

  async execute(request) {
    if (!this.session) await this.initialize();

    // Tokenize
    const tokens = await this.tokenize(request.prompt);

    // Run inference
    const outputs = await this.session.run({
      input_ids: new ort.Tensor('int64', tokens, [1, tokens.length])
    });

    // Decode
    const text = await this.decode(outputs.logits);

    return { content: text, usage: { inputTokens: tokens.length } };
  }
}

Content script isolation:

  • Runs in isolated world (no access to page globals)
  • Only exposes navigator.llm API
  • Cannot access page’s API keys or sensitive data

API key storage:

  • Stored in encrypted IndexedDB
  • Only accessible to background worker
  • Never exposed to content scripts or web pages

Site permissions:

  • Separate permissions per origin
  • Can revoke access anytime
  • Blocked sites list prevents abuse

Request safeguards:

  • Validate all inputs before processing
  • Rate limiting per origin (sketched below)
  • Maximum token limits
  • Timeout enforcement
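
Per-origin rate limiting can be a simple sliding window maintained by the background worker (a sketch; the limits are illustrative):

// Allow at most MAX_REQUESTS per origin per minute (illustrative values)
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 20;
const recentRequests = new Map();   // origin -> timestamps of recent requests

function checkRateLimit(origin) {
  const now = Date.now();
  const times = (recentRequests.get(origin) ?? []).filter((t) => now - t < WINDOW_MS);
  if (times.length >= MAX_REQUESTS) throw new Error('Rate limit exceeded');
  times.push(now);
  recentRequests.set(origin, times);
}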

Request batching:

  • Queue rapid requests
  • Batch when possible
  • Reduce API calls

Caching:

  • Cache common requests (within retention period)
  • Deduplicate identical prompts (see sketch below)
  • Cache model outputs
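
Deduplicating identical prompts can reuse a stored response while it is still within the retention window (a sketch; the requestKey field is a hypothetical addition to the conversations store):

// Return a cached response for an identical request from the same origin, if any
async function findCachedResponse(origin, request) {
  const key = JSON.stringify({ action: request.action, prompt: request.prompt });
  const hit = await db.conversations
    .where('origin').equals(origin)
    .filter((c) => c.requestKey === key && c.expiresAt > Date.now())
    .first();
  return hit ? hit.response : null;
}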

Lazy model loading:

  • Load providers on demand
  • Initialize local models only when needed
  • Unload unused models after timeout