
Desktop & Mobile Platform Support

While WebLLM starts in web browsers, our vision is to make it a universal protocol for AI access across all platforms. This page outlines our plans for desktop apps, mobile browsers, and native mobile applications.

┌─────────────────────────────────────────────────┐
│                 WebLLM Protocol                 │
│       (Standardized AI Access Interface)        │
└────────────────────────┬────────────────────────┘
    ┌─────────────┬──────┴───────┬────────────┐
    │             │              │            │
┌───▼────┐ ┌──────▼──────┐ ┌──────▼──────┐ ┌───▼───┐
│  Web   │ │   Desktop   │ │   Mobile    │ │  IoT  │
│Browser │ │    Apps     │ │ Native Apps │ │ Edge  │
└────────┘ └─────────────┘ └─────────────┘ └───────┘

Users configure AI once. Works everywhere.

Electron Apps

Status: 📅 Planned Q2 2025

Electron apps can use WebLLM in two ways:

Option 1: Browser Extension (Available Now)

If users have the Chrome extension installed:

// In Electron renderer process
import { WebLLMClient } from 'webllm';
const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});

Pros:

  • Works today
  • No additional setup
  • User controls configuration

Cons:

  • Requires extension installed
  • Limited to Chromium-based Electron

Option 2: WebLLM Daemon (Planned)

A background WebLLM daemon that Electron apps connect to:

// Future: No extension needed
import { WebLLM } from '@webllm/electron';
const client = await WebLLM.connect();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});
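Until the daemon ships, an app can prefer it and fall back to the extension-based client from Option 1. A minimal sketch, assuming connect() rejects when no daemon is running:

// Prefer the planned daemon client, fall back to the extension bridge (Option 1)
import { WebLLM } from '@webllm/electron';
import { WebLLMClient } from 'webllm';

async function getClient() {
  try {
    // Assumed behavior: connect() rejects when no daemon is available
    return await WebLLM.connect();
  } catch {
    // Fall back to the extension-based client that works today
    return new WebLLMClient();
  }
}

const client = await getClient();
const response = await client.generate({ prompt: 'Hello from Electron!' });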

Architecture:

┌────────────────────┐
│    Electron App    │
│     (Your App)     │
└─────────┬──────────┘
          │ IPC/WebSocket
┌─────────▼──────────┐
│   WebLLM Daemon    │
│    (Background)    │
│  - Providers       │
│  - Models          │
│  - API Keys        │
└─────────┬──────────┘
      ┌───┴────┐
      │        │
  ┌───▼──┐ ┌───▼─────┐
  │Local │ │ Cloud   │
  │Models│ │Providers│
  └──────┘ └─────────┘

Benefits:

  • No extension required
  • Works across all Electron apps
  • Shared model cache
  • Better performance

Tauri Apps

Status: 📅 Planned Q2 2025

Similar architecture to Electron:

// Tauri backend (Rust)
use webllm::Client;

#[tauri::command]
async fn generate_text(prompt: String) -> Result<String, String> {
    let client = Client::connect().await.map_err(|e| e.to_string())?;
    let response = client.generate(&prompt).await.map_err(|e| e.to_string())?;
    Ok(response.text)
}

// Tauri frontend (JavaScript)
import { invoke } from '@tauri-apps/api';

const response = await invoke('generate_text', {
  prompt: 'Hello from Tauri!',
});

Benefits:

  • Smaller bundle size than Electron
  • Native performance
  • Cross-platform (Windows, macOS, Linux)

Desktop Daemon

Status: ✅ Reference Implementation Available (Node.js), OS-Native Planned Q3 2025

WebLLM includes a production-ready Node.js daemon that serves as a reference implementation of what operating system vendors will eventually build natively. The current daemon runs on localhost:54321 and demonstrates the architecture, protocols, and security model needed for native OS integration.

Current Node.js Daemon (Available Now):

The daemon validates the architecture that OS makers will implement:

# Start the daemon
npm run daemon
# Or in development mode with hot reload
npm run daemon:dev

Key Features:

  • Token-based authentication (Bearer tokens stored in ~/.webllm/daemon.token)
  • CORS protection with origin whitelisting
  • HTTP/SSE endpoints for requests and streaming
  • Configuration API for managing provider secrets
  • Provider priority and fallback system
  • Local model support (via WebGPU)
  • Progress tracking and usage statistics

See the Node Daemon README for setup instructions.
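As an illustration, a local script could read the daemon token and call the HTTP endpoint directly. This is a rough sketch based on the feature list above; the exact route and response shape are assumptions, so consult the README for the real API:

// Node.js sketch: authenticate against the local daemon with its Bearer token
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';

const token = (await readFile(join(homedir(), '.webllm', 'daemon.token'), 'utf8')).trim();

const res = await fetch('http://localhost:54321/api', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Hello from the daemon!' }),
});

console.log(await res.json());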

Operating system vendors (Apple, Microsoft, Linux distributions) will eventually implement WebLLM as a native system service. The current Node.js daemon serves as a reference for what they need to build.

What OS Makers Need to Implement:

A system service that provides WebLLM to all desktop apps:

macOS:

brew install webllm-daemon
webllm-daemon install

Windows:

winget install WebLLM.Daemon

Linux:

curl -fsSL https://webllm.org/install.sh | sh
sudo systemctl enable webllm-daemon

Unified Configuration:

  • One place to manage providers
  • API keys stored in OS keychain
  • Models downloaded once, shared by all apps

System Integration:

  • System tray icon
  • Notifications
  • Usage statistics
  • Update management

Developer API:

Multiple connection methods:

HTTP API:

fetch('http://localhost:8765/v1/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Hello',
    model: 'default',
  }),
});

WebSocket:

const ws = new WebSocket('ws://localhost:8765/v1/stream');
ws.send(JSON.stringify({ prompt: 'Hello' }));
ws.onmessage = (msg) => console.log(msg.data);

Native Libraries:

// Node.js
const { WebLLM } = require('@webllm/node');
const client = await WebLLM.connect();

# Python
from webllm import Client
client = Client.connect()
response = client.generate("Hello")

// Rust
use webllm::Client;
let client = Client::connect().await?;

Access Control:

  • Apps must request permission
  • User grants per-app access
  • Revocable anytime
  • Usage tracking

Sandboxing:

  • Daemon runs in isolated process
  • Limited system access
  • Secure IPC
  • Encrypted communication

When operating system vendors implement native WebLLM support, they will need to build components equivalent to the current Node.js daemon:

Background Service

What to Implement:

  • Background service that starts at boot
  • Runs with limited privileges (not as root/admin)
  • Manages lifecycle (start, stop, restart, updates)
  • Crash recovery and automatic restart

Platform-Specific:

  • macOS: launchd service in /Library/LaunchDaemons/
  • Windows: Windows Service managed by Service Control Manager
  • Linux: systemd unit file in /etc/systemd/system/

Equivalent to: The Node daemon’s main process

Permission System

What to Implement:

  • Per-application access tokens (like the daemon’s Bearer tokens)
  • Integration with OS permission system
  • Application identity verification
  • Token rotation and revocation
  • Audit logging

Platform-Specific:

  • macOS: TCC (Transparency, Consent, and Control) framework
  • Windows: AppContainer sandbox and capability checks
  • Linux: AppArmor or SELinux policies

Equivalent to: The daemon’s TokenManager and auth middleware

Credential Storage

What to Implement:

  • Secure storage for API keys and tokens
  • Per-user credential isolation
  • Biometric authentication support
  • Encrypted at rest
  • Protected from process memory dumps

Platform-Specific:

  • macOS: Keychain Services API
  • Windows: Credential Manager and Data Protection API (DPAPI)
  • Linux: Secret Service API (libsecret) or kernel keyring

Equivalent to: The daemon’s token storage in ~/.webllm/daemon.token

Inter-Process Communication

What to Implement:

  • Secure message passing between apps and daemon
  • Request/response pattern (like HTTP)
  • Streaming support (like SSE)
  • Message authentication
  • Rate limiting and resource quotas

Platform-Specific:

  • macOS: XPC (inter-process communication framework)
  • Windows: Named Pipes or WinRT APIs
  • Linux: D-Bus or Unix domain sockets

Equivalent to: The daemon’s HTTP/SSE endpoints at localhost:54321
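To make the streaming pattern concrete, here is a sketch of consuming a token stream from the current daemon over HTTP; the /api/stream route and chunk format are assumptions, not the daemon's documented API:

// Sketch: read a server-sent event stream of partial results
const response = await fetch('http://localhost:54321/api/stream', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <token>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Hello' }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  // Assumed format: "data: {...}" lines carrying partial text
  console.log(decoder.decode(value));
}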

Provider Registry

What to Implement:

  • Registry of AI providers (Anthropic, OpenAI, local models)
  • Configuration API for managing providers
  • Provider priority and fallback logic
  • Credential association (which app uses which provider)
  • Usage tracking and billing

Equivalent to: The daemon’s /config/providers endpoints and ProviderManager
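For illustration, registering a cloud provider through the configuration API might look roughly like this; the field names are assumptions, and the daemon's README documents the actual payload:

// Sketch: register a provider and set its fallback priority via the config API
await fetch('http://localhost:54321/config/providers', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <token>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    provider: 'anthropic', // hypothetical field names
    apiKey: '<secret>',
    priority: 1,           // lower values are tried first in the fallback chain
  }),
});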

Model Management

What to Implement:

  • Centralized model storage and caching
  • Download management with progress tracking
  • Model verification (checksums, signatures)
  • Automatic updates
  • Storage cleanup and quota management

Platform-Specific:

  • macOS: ~/Library/Application Support/WebLLM/models/
  • Windows: %LOCALAPPDATA%\WebLLM\models\
  • Linux: ~/.local/share/webllm/models/

Equivalent to: The daemon’s LocalModelManager
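A sketch of how a progress-tracked download could surface to apps, mirroring the daemon's progress tracking; downloadModel() and its options are hypothetical:

// Sketch: request a model download and observe progress events
const client = await WebLLM.connect(); // planned @webllm/electron client

// Hypothetical API: emits progress callbacks until the model is cached locally
await client.downloadModel('llama-3.2-1b', {
  onProgress: ({ downloadedBytes, totalBytes }) => {
    console.log(`Downloaded ${((100 * downloadedBytes) / totalBytes).toFixed(1)}%`);
  },
});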

Hardware Acceleration

What to Implement:

  • GPU detection and selection
  • Inference optimization
  • Memory management
  • Thermal and power management
  • Fallback to CPU when needed

Platform-Specific:

  • macOS: Metal Performance Shaders, Neural Engine (ANE)
  • Windows: DirectML, ONNX Runtime, NPU support
  • Linux: CUDA, ROCm, Vulkan Compute

Equivalent to: The daemon’s WebGPU integration for local models

Developer SDKs

What to Implement:

  • Language-specific SDKs (C/C++, Swift, C#, Python, etc.)
  • Platform conventions (async/await, callbacks, promises)
  • Error handling and logging
  • Documentation and samples

Platform Examples:

macOS (Swift):

import WebLLMKit
let client = WLMClient()
let response = try await client.generate(
    prompt: "Hello",
    provider: .default
)

Windows (C#):

using WebLLM;
var client = new LLMClient();
var response = await client.GenerateAsync(
    prompt: "Hello",
    provider: Provider.Default
);

Linux (C++):

#include <webllm/client.h>
webllm::Client client;
auto response = client.generate({
    .prompt = "Hello",
    .provider = webllm::Provider::Default
});

User Interface

What to Implement:

  • System preferences/settings panel
  • Menu bar/system tray interface
  • Usage notifications
  • Privacy dashboard
  • System-wide shortcuts

Platform-Specific:

  • macOS: Settings bundle, menu bar extra, SwiftUI interfaces
  • Windows: Settings app integration, system tray icon, WinUI interfaces
  • Linux: GNOME Settings panel, KDE System Settings, GTK/Qt interfaces

Equivalent to: The extension’s side panel UI and settings

The current Node.js daemon provides several benefits to OS vendors:

  1. Working Example: Fully functional implementation to study
  2. Protocol Validation: Proven communication patterns and APIs
  3. Security Model: Tested authentication and authorization approach
  4. Performance Baseline: Benchmarks for native implementation comparison
  5. Developer Experience: Reference for SDK design
  6. Community Feedback: Real-world usage insights

Migration Path

Today (Node Daemon):

// Apps connect to localhost:54321
const response = await fetch('http://localhost:54321/api', {
headers: { Authorization: 'Bearer token' },
});

Future (Native OS API):

// Apps use native OS API
const response = await os.webllm.generate({
  prompt: 'Hello',
});

The transition is transparent to end users: they simply get better performance and tighter integration.

iOS Safari

Status: 📅 Investigating

Challenges:

  • Safari doesn’t support Chrome extensions
  • Different extension architecture (Safari Web Extensions)
  • iOS restrictions on background processes

Approach 1: Safari Web Extension (Planned 2025)

Build WebLLM as a Safari Web Extension:

// Similar API, different packaging
const client = new WebLLMClient();
const response = await client.generate({ prompt: 'Hello' });

Limitations:

  • iOS restrictions on local model size
  • Limited background execution
  • Must use iOS native APIs

Approach 2: iOS App + Safari Extension (Preferred)

A companion iOS app that:

  • Manages providers and models
  • Downloads and caches models
  • Safari extension connects to app

┌──────────────┐
│    Safari    │
│  (Web Page)  │
└──────┬───────┘
┌──────▼───────┐
│ Safari Ext.  │
└──────┬───────┘
       │ XPC
┌──────▼───────┐
│  WebLLM iOS  │
│     App      │
│ - Providers  │
│ - Models     │
│ - API Keys   │
└──────────────┘

Benefits:

  • Larger model support (app storage)
  • Background processing (app)
  • Better UX (native settings)
  • Works offline

Timeline: Q3-Q4 2025

Android Chrome

Status: 📅 Planned Q2 2025

Chromium-based browsers on Android offer limited extension support:

Approach: Chrome Extension (Like desktop)

// Same API as desktop!
import { WebLLMClient } from 'webllm';
const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Android!',
});

Considerations:

  • Mobile performance (smaller models)
  • Battery life (optimize inference)
  • Storage (model size limits)
  • Network (cellular data costs)

Optimizations (see the sketch after this list):

  • Smaller quantized models
  • Battery-aware scheduling
  • WiFi-only downloads by default
  • Efficient inference
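
A sketch of how these optimizations might be exposed as client options; every option name below is an assumption:

// Sketch: mobile-friendly defaults mirroring the optimization list above
const client = new WebLLMClient({
  preferQuantizedModels: true,  // hypothetical: pick smaller quantized variants
  downloadOverWifiOnly: true,   // hypothetical: avoid cellular data for model downloads
  batteryAwareScheduling: true, // hypothetical: defer heavy inference on low battery
});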

Timeline: Q2 2025

Other mobile browsers:

Firefox Mobile: Q3 2025
Edge Mobile: Q2 2025 (Chromium-based)
Brave Mobile: Q2 2025 (Chromium-based)

iOS SDK

Status: 📅 Planned Q4 2025

Native iOS integration:

import UIKit
import WebLLM

class ViewController: UIViewController {
    let client = WebLLMClient()

    func generateText() async {
        do {
            let response = try await client.generate(
                prompt: "Hello from iOS!",
                model: .default
            )
            print(response.text)
        } catch {
            print("Error: \(error)")
        }
    }
}

Architecture:

┌────────────────────┐
│ iOS App │
│ (Your App) │
└─────────┬──────────┘
┌─────────▼──────────┐
│ WebLLM.framework │
│ - Provider Mgmt │
│ - Local Inference │
│ - API Key Storage │
└─────────┬──────────┘
┌──────┴──────┐
│ │
┌──▼────┐ ┌───▼────┐
│Core ML│ │ Cloud │
│Models │ │ APIs │
└───────┘ └────────┘

Integration:

Keychain for API keys:

// API keys stored in iOS Keychain
client.addProvider(.anthropic, apiKey: key)

Core ML for local inference:

// Use Apple's ML framework
let config = InferenceConfig(
    backend: .coreML,
    useNeuralEngine: true
)

iCloud sync:

// Sync settings across devices
client.enableiCloudSync = true

Widgets:

// Use WebLLM in widgets
import WidgetKit
import SwiftUI

struct AIWidget: Widget {
    var body: some WidgetConfiguration {
        StaticConfiguration(kind: "AI") { entry in
            AIWidgetView(client: WebLLMClient())
        }
    }
}

Android SDK

Status: 📅 Planned Q4 2025

Native Android integration:

import androidx.appcompat.app.AppCompatActivity
import com.webllm.WebLLMClient

class MainActivity : AppCompatActivity() {
    private val client = WebLLMClient()

    suspend fun generateText() {
        val response = client.generate(
            prompt = "Hello from Android!",
            model = Model.DEFAULT
        )
        println(response.text)
    }
}

Architecture:

┌────────────────────┐
│    Android App     │
│     (Your App)     │
└─────────┬──────────┘
┌─────────▼──────────┐
│   WebLLM Library   │
│       (.aar)       │
└─────────┬──────────┘
      ┌───┴────┐
      │        │
┌─────▼────┐ ┌─▼──────┐
│TensorFlow│ │ Cloud  │
│Lite/NNAPI│ │  APIs  │
└──────────┘ └────────┘

Integration:

Android Keystore:

// Secure key storage
client.addProvider(Provider.ANTHROPIC, apiKey)

TensorFlow Lite:

// Local inference
val config = InferenceConfig(
    backend = Backend.TFLITE,
    useNNAPI = true,
    useGPU = true
)

WorkManager:

// Background tasks
val workRequest = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
    .build()
WorkManager.getInstance(context).enqueue(workRequest)

Cross-Device Sync

Users configure once, and it works everywhere:

iCloud (Apple ecosystem):

  • Settings sync via iCloud
  • API keys in Keychain
  • Downloaded models (wifi only)

Google account (Android/Chrome):

  • Settings sync via Google account
  • Encrypted API keys
  • Model preferences

Manual export/import (see the sketch after this list):

  • Export configuration as file
  • Import on other devices
  • Secure transfer
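
A sketch of the manual transfer flow; exportConfig() and importConfig() are hypothetical method names:

// Sketch: export configuration on one device and import it on another
import { readFile, writeFile } from 'node:fs/promises';

const client = new WebLLMClient();

// Hypothetical: exportConfig() returns an encrypted blob of settings and keys
const backup = await client.exportConfig({ includeApiKeys: true });
await writeFile('webllm-config.backup', backup);

// On the other device:
await client.importConfig(await readFile('webllm-config.backup'));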

Apps on the same device share models:

Desktop:

~/Library/Application Support/WebLLM/models/
├─ llama-3.2-1b/
└─ phi-3-mini/

Multiple apps using the same models saves storage.

Mobile:

  • iOS: Shared app group
  • Android: Content provider

macOS

Apple Silicon optimizations:

  • Neural Engine for inference
  • Unified memory architecture
  • Metal Performance Shaders
  • Efficient Core/Performance Core usage

Integration:

  • Spotlight search with AI
  • Quick Actions in Finder
  • Shortcuts app integration
  • Menu bar utility

Windows

DirectML optimization:

  • GPU acceleration via DirectML
  • NPU support (AI PCs)
  • Windows ML integration

Integration:

  • PowerToys plugin
  • Context menu actions
  • Windows Terminal integration
  • Task bar utility

Linux

Flexibility:

  • Multiple GPU backends (CUDA, ROCm, Vulkan)
  • Wayland/X11 support
  • System tray integration

Integration:

  • GNOME extension
  • KDE plasmoid
  • Command-line tools
  • Systemd service

iOS

Battery optimization:

  • Background App Refresh awareness
  • Low Power Mode detection
  • Thermal management
  • Network-aware (WiFi vs cellular)

Integration:

  • Shortcuts app
  • Share sheet
  • Widgets
  • App Clips

Android

Battery optimization:

  • Doze mode awareness
  • JobScheduler integration
  • Background limits compliance

Integration:

  • Quick Settings tile
  • Share menu
  • Widgets
  • Accessibility services

Unified SDK

One SDK, all platforms:

Installation:

# Web
npm install webllm
# Node.js / Electron
npm install @webllm/node
# iOS (CocoaPods)
pod 'WebLLM'
# Android (Gradle)
implementation 'com.webllm:client:1.0.0'
# Python
pip install webllm

API Consistency:

Same concepts across platforms:

// JavaScript (Web/Node/Electron)
const client = new WebLLMClient();
const response = await client.generate({ prompt });

// Swift (iOS/macOS)
let client = WebLLMClient()
let response = try await client.generate(prompt: prompt)

// Kotlin (Android)
val client = WebLLMClient()
val response = client.generate(prompt)

# Python
client = WebLLMClient()
response = client.generate(prompt)

Different syntax, same semantics.

Platform Support Summary

Platform          Status              Timeline     Notes
Web Browsers      ✅ Available        Now          Chrome extension
Electron          📅 Planned          Q2 2025      Via daemon
Tauri             📅 Planned          Q2 2025      Via daemon
Desktop Daemon    📅 Planned          Q3 2025      macOS, Windows, Linux
Android Chrome    📅 Planned          Q2 2025      Chrome extension
Firefox Mobile    📅 Planned          Q3 2025      Mobile extension
iOS Safari        📅 Investigating    Q3-Q4 2025   App + extension
iOS SDK           📅 Planned          Q4 2025      Native framework
Android SDK       📅 Planned          Q4 2025      Native library

Code Editors:

  • AI autocomplete
  • Code explanation
  • Bug fixing
  • Documentation generation

Note-taking Apps:

  • Summarization
  • Text generation
  • Organization
  • Search enhancement

Email Clients:

  • Smart compose
  • Reply suggestions
  • Summarization
  • Translation

Reading Apps:

  • Article summarization
  • Translation
  • Text-to-speech
  • Comprehension help

Productivity Apps:

  • Task generation
  • Smart reminders
  • Note organization
  • Meeting summaries

Educational Apps:

  • Homework help
  • Explanations
  • Practice problems
  • Language learning

Challenge: Storage

Models are large (1-10 GB), while mobile storage is limited.

Solutions:

  • Smaller quantized models (500MB-2GB)
  • On-demand downloading
  • Cloud storage options
  • WiFi-only by default
  • Model sharing between apps

Challenge: Battery Life

AI inference is power-intensive.

Solutions:

  • Efficient models (optimized for mobile)
  • Battery-aware scheduling
  • Thermal management
  • Offload to cloud when on charger
  • User controls (low power mode)

Challenge: Network Costs

Cloud API calls consume data, which is expensive on cellular connections.

Solutions:

  • WiFi preference
  • Cellular data warnings
  • Local-first on cellular
  • Usage tracking
  • User limits

Challenge: Platform Fragmentation

Each platform has different APIs, capabilities, and constraints.

Solutions:

  • Abstract platform differences
  • Platform-specific optimizations
  • Consistent API surface
  • Thorough testing
  • Platform-specific docs

One protocol. Every platform. User control everywhere.