
Desktop & Mobile Platform Support

While WebLLM starts in web browsers, our vision is to make it a universal protocol for AI access across all platforms. This page outlines our plans for desktop apps, mobile browsers, and native mobile applications.

┌─────────────────────────────────────────────────┐
│                 WebLLM Protocol                 │
│       (Standardized AI Access Interface)        │
└────────────────────────┬────────────────────────┘
    ┌─────────────┬──────┴───────┬────────────┐
    │             │              │            │
┌───▼────┐ ┌──────▼──────┐ ┌──────▼──────┐ ┌───▼───┐
│  Web   │ │   Desktop   │ │   Mobile    │ │  IoT  │
│Browser │ │    Apps     │ │ Native Apps │ │ Edge  │
└────────┘ └─────────────┘ └─────────────┘ └───────┘

Users configure AI once. Works everywhere.

Electron Apps

Status: 📅 Planned Q2 2025

Electron apps can use WebLLM in two ways:

Option 1: Browser Extension (Available Now)

If users have the Chrome extension installed:

// In Electron renderer process
import { WebLLMClient } from 'webllm';
const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});

Pros:

  • Works today
  • No additional setup
  • User controls configuration

Cons:

  • Requires extension installed
  • Limited to Chromium-based Electron

Option 2: WebLLM Daemon (Planned)

A background WebLLM daemon that Electron apps connect to:

// Future: No extension needed
import { WebLLM } from '@webllm/electron';
const client = await WebLLM.connect();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});
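Until the daemon ships, an app can prefer it and fall back to the extension-based client from Option 1. A minimal sketch, assuming connect() rejects when no daemon is running:

// Prefer the planned daemon client, fall back to the extension bridge (Option 1)
import { WebLLM } from '@webllm/electron';
import { WebLLMClient } from 'webllm';

async function getClient() {
  try {
    // Assumed behavior: connect() rejects when no daemon is available
    return await WebLLM.connect();
  } catch {
    // Fall back to the extension-based client that works today
    return new WebLLMClient();
  }
}

const client = await getClient();
const response = await client.generate({ prompt: 'Hello from Electron!' });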

Architecture:

┌────────────────────┐
│    Electron App    │
│     (Your App)     │
└─────────┬──────────┘
          │ IPC/WebSocket
┌─────────▼──────────┐
│   WebLLM Daemon    │
│    (Background)    │
│  - Providers       │
│  - Models          │
│  - API Keys        │
└─────────┬──────────┘
      ┌───┴────┐
      │        │
  ┌───▼──┐ ┌───▼─────┐
  │Local │ │ Cloud   │
  │Models│ │Providers│
  └──────┘ └─────────┘

Benefits:

  • No extension required
  • Works across all Electron apps
  • Shared model cache
  • Better performance

Tauri Apps

Status: 📅 Planned Q2 2025

Similar architecture to Electron:

// Tauri backend (Rust)
use webllm::Client;

#[tauri::command]
async fn generate_text(prompt: String) -> Result<String, String> {
    let client = Client::connect().await.map_err(|e| e.to_string())?;
    let response = client.generate(&prompt).await.map_err(|e| e.to_string())?;
    Ok(response.text)
}

// Tauri frontend (JavaScript)
import { invoke } from '@tauri-apps/api';

const response = await invoke('generate_text', {
  prompt: 'Hello from Tauri!',
});

Benefits:

  • Smaller bundle size than Electron
  • Native performance
  • Cross-platform (Windows, macOS, Linux)

Desktop Daemon

Status: ✅ Reference Implementation Available (Node.js), OS-Native Planned Q3 2025

WebLLM includes a production-ready Node.js daemon that serves as a reference implementation of what operating system vendors will eventually build natively. The current daemon runs on localhost:54321 and demonstrates the architecture, protocols, and security model needed for native OS integration.

Current Node.js Daemon (Available Now):

The daemon validates the architecture that OS makers will implement:

# Start the daemon
npm run daemon
# Or in development mode with hot reload
npm run daemon:dev

Key Features:

  • Token-based authentication (Bearer tokens stored in ~/.webllm/daemon.token)
  • CORS protection with origin whitelisting
  • HTTP/SSE endpoints for requests and streaming
  • Configuration API for managing provider secrets
  • Provider priority and fallback system
  • Local model support (via WebGPU)
  • Progress tracking and usage statistics

See the Node Daemon README for setup instructions.
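As an illustration, a local script could read the daemon token and call the HTTP endpoint directly. This is a rough sketch based on the feature list above; the exact route and response shape are assumptions, so consult the README for the real API:

// Node.js sketch: authenticate against the local daemon with its Bearer token
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';

const token = (await readFile(join(homedir(), '.webllm', 'daemon.token'), 'utf8')).trim();

const res = await fetch('http://localhost:54321/api', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Hello from the daemon!' }),
});

console.log(await res.json());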

Operating system vendors (Apple, Microsoft, Linux distributions) will eventually implement WebLLM as a native system service. The current Node.js daemon serves as a reference for what they need to build.

What OS Makers Need to Implement:

A system service that provides WebLLM to all desktop apps:

macOS:

brew install webllm-daemon
webllm-daemon install

Windows:

winget install WebLLM.Daemon

Linux:

curl -fsSL https://webllm.org/install.sh | sh
sudo systemctl enable webllm-daemon

Unified Configuration:

  • One place to manage providers
  • API keys stored in OS keychain
  • Models downloaded once, shared by all apps

System Integration:

  • System tray icon
  • Notifications
  • Usage statistics
  • Update management

Developer API:

Multiple connection methods:

HTTP API:

fetch('http://localhost:8765/v1/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Hello',
    model: 'default',
  }),
});

WebSocket:

const ws = new WebSocket('ws://localhost:8765/v1/stream');
ws.send(JSON.stringify({ prompt: 'Hello' }));
ws.onmessage = (msg) => console.log(msg.data);

Native Libraries:

// Node.js
const { WebLLM } = require('@webllm/node');
const client = await WebLLM.connect();

# Python
from webllm import Client
client = Client.connect()
response = client.generate("Hello")

// Rust
use webllm::Client;
let client = Client::connect().await?;

Access Control:

  • Apps must request permission
  • User grants per-app access
  • Revocable anytime
  • Usage tracking

Sandboxing:

  • Daemon runs in isolated process
  • Limited system access
  • Secure IPC
  • Encrypted communication

When operating system vendors implement native WebLLM support, they will need to build components equivalent to the current Node.js daemon:

Background Service

What to Implement:

  • Background service that starts at boot
  • Runs with limited privileges (not as root/admin)
  • Manages lifecycle (start, stop, restart, updates)
  • Crash recovery and automatic restart

Platform-Specific:

  • macOS: launchd service in /Library/LaunchDaemons/
  • Windows: Windows Service managed by Service Control Manager
  • Linux: systemd unit file in /etc/systemd/system/

Equivalent to: The Node daemon’s main process

Permission System

What to Implement:

  • Per-application access tokens (like the daemon’s Bearer tokens)
  • Integration with OS permission system
  • Application identity verification
  • Token rotation and revocation
  • Audit logging

Platform-Specific:

  • macOS: TCC (Transparency, Consent, and Control) framework
  • Windows: AppContainer sandbox and capability checks
  • Linux: AppArmor or SELinux policies

Equivalent to: The daemon’s TokenManager and auth middleware

Credential Storage

What to Implement:

  • Secure storage for API keys and tokens
  • Per-user credential isolation
  • Biometric authentication support
  • Encrypted at rest
  • Protected from process memory dumps

Platform-Specific:

  • macOS: Keychain Services API
  • Windows: Credential Manager and Data Protection API (DPAPI)
  • Linux: Secret Service API (libsecret) or kernel keyring

Equivalent to: The daemon’s token storage in ~/.webllm/daemon.token

Inter-Process Communication

What to Implement:

  • Secure message passing between apps and daemon
  • Request/response pattern (like HTTP)
  • Streaming support (like SSE)
  • Message authentication
  • Rate limiting and resource quotas

Platform-Specific:

  • macOS: XPC (inter-process communication framework)
  • Windows: Named Pipes or WinRT APIs
  • Linux: D-Bus or Unix domain sockets

Equivalent to: The daemon’s HTTP/SSE endpoints at localhost:54321
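To make the streaming pattern concrete, here is a sketch of consuming a token stream from the current daemon over HTTP; the /api/stream route and chunk format are assumptions, not the daemon's documented API:

// Sketch: read a server-sent event stream of partial results
const response = await fetch('http://localhost:54321/api/stream', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <token>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Hello' }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  // Assumed format: "data: {...}" lines carrying partial text
  console.log(decoder.decode(value));
}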

Provider Registry

What to Implement:

  • Registry of AI providers (Anthropic, OpenAI, local models)
  • Configuration API for managing providers
  • Provider priority and fallback logic
  • Credential association (which app uses which provider)
  • Usage tracking and billing

Equivalent to: The daemon’s /config/providers endpoints and ProviderManager
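For illustration, registering a cloud provider through the configuration API might look roughly like this; the field names are assumptions, and the daemon's README documents the actual payload:

// Sketch: register a provider and set its fallback priority via the config API
await fetch('http://localhost:54321/config/providers', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <token>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    provider: 'anthropic', // hypothetical field names
    apiKey: '<secret>',
    priority: 1,           // lower values are tried first in the fallback chain
  }),
});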

Model Management

What to Implement:

  • Centralized model storage and caching
  • Download management with progress tracking
  • Model verification (checksums, signatures)
  • Automatic updates
  • Storage cleanup and quota management

Platform-Specific:

  • macOS: ~/Library/Application Support/WebLLM/models/
  • Windows: %LOCALAPPDATA%\WebLLM\models\
  • Linux: ~/.local/share/webllm/models/

Equivalent to: The daemon’s LocalModelManager
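A sketch of how a progress-tracked download could surface to apps, mirroring the daemon's progress tracking; downloadModel() and its options are hypothetical:

// Sketch: request a model download and observe progress events
const client = await WebLLM.connect(); // planned @webllm/electron client

// Hypothetical API: emits progress callbacks until the model is cached locally
await client.downloadModel('llama-3.2-1b', {
  onProgress: ({ downloadedBytes, totalBytes }) => {
    console.log(`Downloaded ${((100 * downloadedBytes) / totalBytes).toFixed(1)}%`);
  },
});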

Hardware Acceleration

What to Implement:

  • GPU detection and selection
  • Inference optimization
  • Memory management
  • Thermal and power management
  • Fallback to CPU when needed

Platform-Specific:

  • macOS: Metal Performance Shaders, Neural Engine (ANE)
  • Windows: DirectML, ONNX Runtime, NPU support
  • Linux: CUDA, ROCm, Vulkan Compute

Equivalent to: The daemon’s WebGPU integration for local models

Developer SDKs

What to Implement:

  • Language-specific SDKs (C/C++, Swift, C#, Python, etc.)
  • Platform conventions (async/await, callbacks, promises)
  • Error handling and logging
  • Documentation and samples

Platform Examples:

macOS (Swift):

import WebLLMKit
let client = WLMClient()
let response = try await client.generate(
    prompt: "Hello",
    provider: .default
)

Windows (C#):

using WebLLM;
var client = new LLMClient();
var response = await client.GenerateAsync(
    prompt: "Hello",
    provider: Provider.Default
);

Linux (C++):

#include <webllm/client.h>
webllm::Client client;
auto response = client.generate({
    .prompt = "Hello",
    .provider = webllm::Provider::Default
});

User Interface

What to Implement:

  • System preferences/settings panel
  • Menu bar/system tray interface
  • Usage notifications
  • Privacy dashboard
  • System-wide shortcuts

Platform-Specific:

  • macOS: Settings bundle, menu bar extra, SwiftUI interfaces
  • Windows: Settings app integration, system tray icon, WinUI interfaces
  • Linux: GNOME Settings panel, KDE System Settings, GTK/Qt interfaces

Equivalent to: The extension’s side panel UI and settings

The current Node.js daemon provides several benefits to OS vendors:

  1. Working Example: Fully functional implementation to study
  2. Protocol Validation: Proven communication patterns and APIs
  3. Security Model: Tested authentication and authorization approach
  4. Performance Baseline: Benchmarks for native implementation comparison
  5. Developer Experience: Reference for SDK design
  6. Community Feedback: Real-world usage insights

Migration Path

Today (Node Daemon):

// Apps connect to localhost:54321
const response = await fetch('http://localhost:54321/api', {
headers: { Authorization: 'Bearer token' },
});

Future (Native OS API):

// Apps use native OS API
const response = await os.webllm.generate({
  prompt: 'Hello',
});

The transition is transparent to end users: they simply get better performance and tighter integration.

iOS Safari

Status: 📅 Investigating

Challenges:

  • Safari doesn’t support Chrome extensions
  • Different extension architecture (Safari Web Extensions)
  • iOS restrictions on background processes

Approach 1: Safari Web Extension (Planned 2025)

Build WebLLM as a Safari Web Extension:

// Similar API, different packaging
const client = new WebLLMClient();
const response = await client.generate({ prompt: 'Hello' });

Limitations:

  • iOS restrictions on local model size
  • Limited background execution
  • Must use iOS native APIs

Approach 2: iOS App + Safari Extension (Preferred)

A companion iOS app that:

  • Manages providers and models
  • Downloads and caches models
  • Safari extension connects to app

┌──────────────┐
│    Safari    │
│  (Web Page)  │
└──────┬───────┘
┌──────▼───────┐
│ Safari Ext.  │
└──────┬───────┘
       │ XPC
┌──────▼───────┐
│  WebLLM iOS  │
│     App      │
│ - Providers  │
│ - Models     │
│ - API Keys   │
└──────────────┘

Benefits:

  • Larger model support (app storage)
  • Background processing (app)
  • Better UX (native settings)
  • Works offline

Timeline: Q3-Q4 2025

Android Chrome

Status: 📅 Planned Q2 2025

Chromium-based browsers on Android offer limited extension support:

Approach: Chrome Extension (Like desktop)

// Same API as desktop!
import { WebLLMClient } from 'webllm';
const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Android!',
});

Considerations:

  • Mobile performance (smaller models)
  • Battery life (optimize inference)
  • Storage (model size limits)
  • Network (cellular data costs)

Optimizations (see the sketch after this list):

  • Smaller quantized models
  • Battery-aware scheduling
  • WiFi-only downloads by default
  • Efficient inference
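
A sketch of how these optimizations might be exposed as client options; every option name below is an assumption:

// Sketch: mobile-friendly defaults mirroring the optimization list above
const client = new WebLLMClient({
  preferQuantizedModels: true,  // hypothetical: pick smaller quantized variants
  downloadOverWifiOnly: true,   // hypothetical: avoid cellular data for model downloads
  batteryAwareScheduling: true, // hypothetical: defer heavy inference on low battery
});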

Timeline: Q2 2025

Other mobile browsers:

Firefox Mobile: Q3 2025
Edge Mobile: Q2 2025 (Chromium-based)
Brave Mobile: Q2 2025 (Chromium-based)

iOS SDK

Status: 📅 Planned Q4 2025

Native iOS integration:

import UIKit
import WebLLM

class ViewController: UIViewController {
    let client = WebLLMClient()

    func generateText() async {
        do {
            let response = try await client.generate(
                prompt: "Hello from iOS!",
                model: .default
            )
            print(response.text)
        } catch {
            print("Error: \(error)")
        }
    }
}

Architecture:

┌────────────────────┐
│ iOS App │
│ (Your App) │
└─────────┬──────────┘
┌─────────▼──────────┐
│ WebLLM.framework │
│ - Provider Mgmt │
│ - Local Inference │
│ - API Key Storage │
└─────────┬──────────┘
┌──────┴──────┐
│ │
┌──▼────┐ ┌───▼────┐
│Core ML│ │ Cloud │
│Models │ │ APIs │
└───────┘ └────────┘

Integration:

Keychain for API keys:

// API keys stored in iOS Keychain
client.addProvider(.anthropic, apiKey: key)

Core ML for local inference:

// Use Apple's ML framework
let config = InferenceConfig(
    backend: .coreML,
    useNeuralEngine: true
)

iCloud sync:

// Sync settings across devices
client.enableiCloudSync = true

Widgets:

// Use WebLLM in widgets
import WidgetKit
import SwiftUI

struct AIWidget: Widget {
    var body: some WidgetConfiguration {
        StaticConfiguration(kind: "AI") { entry in
            AIWidgetView(client: WebLLMClient())
        }
    }
}

Android SDK

Status: 📅 Planned Q4 2025

Native Android integration:

import androidx.appcompat.app.AppCompatActivity
import com.webllm.WebLLMClient

class MainActivity : AppCompatActivity() {
    private val client = WebLLMClient()

    suspend fun generateText() {
        val response = client.generate(
            prompt = "Hello from Android!",
            model = Model.DEFAULT
        )
        println(response.text)
    }
}

Architecture:

┌────────────────────┐
│    Android App     │
│     (Your App)     │
└─────────┬──────────┘
┌─────────▼──────────┐
│   WebLLM Library   │
│       (.aar)       │
└─────────┬──────────┘
      ┌───┴────┐
      │        │
┌─────▼────┐ ┌─▼──────┐
│TensorFlow│ │ Cloud  │
│Lite/NNAPI│ │  APIs  │
└──────────┘ └────────┘

Integration:

Android Keystore:

// Secure key storage
client.addProvider(Provider.ANTHROPIC, apiKey)

TensorFlow Lite:

// Local inference
val config = InferenceConfig(
    backend = Backend.TFLITE,
    useNNAPI = true,
    useGPU = true
)

WorkManager:

// Background tasks
val workRequest = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
    .build()
WorkManager.getInstance(context).enqueue(workRequest)

Cross-Device Sync

Users configure once, and it works everywhere:

iCloud (Apple ecosystem):

  • Settings sync via iCloud
  • API keys in Keychain
  • Downloaded models (wifi only)

Google account (Android/Chrome):

  • Settings sync via Google account
  • Encrypted API keys
  • Model preferences

Manual export/import (see the sketch after this list):

  • Export configuration as file
  • Import on other devices
  • Secure transfer
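
A sketch of the manual transfer flow; exportConfig() and importConfig() are hypothetical method names:

// Sketch: export configuration on one device and import it on another
import { readFile, writeFile } from 'node:fs/promises';

const client = new WebLLMClient();

// Hypothetical: exportConfig() returns an encrypted blob of settings and keys
const backup = await client.exportConfig({ includeApiKeys: true });
await writeFile('webllm-config.backup', backup);

// On the other device:
await client.importConfig(await readFile('webllm-config.backup'));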

Apps on the same device share models:

Desktop:

~/Library/Application Support/WebLLM/models/
├─ llama-3.2-1b/
└─ phi-3-mini/

Multiple apps using the same models saves storage.

Mobile:

  • iOS: Shared app group
  • Android: Content provider

macOS

Apple Silicon optimizations:

  • Neural Engine for inference
  • Unified memory architecture
  • Metal Performance Shaders
  • Efficient Core/Performance Core usage

Integration:

  • Spotlight search with AI
  • Quick Actions in Finder
  • Shortcuts app integration
  • Menu bar utility

Windows

DirectML optimization:

  • GPU acceleration via DirectML
  • NPU support (AI PCs)
  • Windows ML integration

Integration:

  • PowerToys plugin
  • Context menu actions
  • Windows Terminal integration
  • Task bar utility

Linux

Flexibility:

  • Multiple GPU backends (CUDA, ROCm, Vulkan)
  • Wayland/X11 support
  • System tray integration

Integration:

  • GNOME extension
  • KDE plasmoid
  • Command-line tools
  • Systemd service

iOS

Battery optimization:

  • Background App Refresh awareness
  • Low Power Mode detection
  • Thermal management
  • Network-aware (WiFi vs cellular)

Integration:

  • Shortcuts app
  • Share sheet
  • Widgets
  • App Clips

Android

Battery optimization:

  • Doze mode awareness
  • JobScheduler integration
  • Background limits compliance

Integration:

  • Quick Settings tile
  • Share menu
  • Widgets
  • Accessibility services

Unified SDK

One SDK, all platforms:

Installation:

# Web
npm install webllm
# Node.js / Electron
npm install @webllm/node
# iOS (CocoaPods)
pod 'WebLLM'
# Android (Gradle)
implementation 'com.webllm:client:1.0.0'
# Python
pip install webllm

API Consistency:

Same concepts across platforms:

// JavaScript (Web/Node/Electron)
const client = new WebLLMClient();
const response = await client.generate({ prompt });

// Swift (iOS/macOS)
let client = WebLLMClient()
let response = try await client.generate(prompt: prompt)

// Kotlin (Android)
val client = WebLLMClient()
val response = client.generate(prompt)

# Python
client = WebLLMClient()
response = client.generate(prompt)

Different syntax, same semantics.

Platform Support Summary

Platform          Status              Timeline     Notes
Web Browsers      ✅ Available        Now          Chrome extension
Electron          📅 Planned          Q2 2025      Via daemon
Tauri             📅 Planned          Q2 2025      Via daemon
Desktop Daemon    📅 Planned          Q3 2025      macOS, Windows, Linux
Android Chrome    📅 Planned          Q2 2025      Chrome extension
Firefox Mobile    📅 Planned          Q3 2025      Mobile extension
iOS Safari        📅 Investigating    Q3-Q4 2025   App + extension
iOS SDK           📅 Planned          Q4 2025      Native framework
Android SDK       📅 Planned          Q4 2025      Native library

Code Editors:

  • AI autocomplete
  • Code explanation
  • Bug fixing
  • Documentation generation

Note-taking Apps:

  • Summarization
  • Text generation
  • Organization
  • Search enhancement

Email Clients:

  • Smart compose
  • Reply suggestions
  • Summarization
  • Translation

Reading Apps:

  • Article summarization
  • Translation
  • Text-to-speech
  • Comprehension help

Productivity Apps:

  • Task generation
  • Smart reminders
  • Note organization
  • Meeting summaries

Educational Apps:

  • Homework help
  • Explanations
  • Practice problems
  • Language learning

Challenge: Storage

Models are large (1-10 GB), while mobile storage is limited.

Solutions:

  • Smaller quantized models (500MB-2GB)
  • On-demand downloading
  • Cloud storage options
  • WiFi-only by default
  • Model sharing between apps

Challenge: Battery Life

AI inference is power-intensive.

Solutions:

  • Efficient models (optimized for mobile)
  • Battery-aware scheduling
  • Thermal management
  • Offload to cloud when on charger
  • User controls (low power mode)

Challenge: Network Costs

Cloud API calls consume data, which is expensive on cellular connections.

Solutions:

  • WiFi preference
  • Cellular data warnings
  • Local-first on cellular
  • Usage tracking
  • User limits

Challenge: Platform Fragmentation

Each platform has different APIs, capabilities, and constraints.

Solutions:

  • Abstract platform differences
  • Platform-specific optimizations
  • Consistent API surface
  • Thorough testing
  • Platform-specific docs

One protocol. Every platform. User control everywhere.