Desktop & Mobile Platform Support
While WebLLM starts in web browsers, our vision is to make it a universal protocol for AI access across all platforms. This page outlines our plans for desktop apps, mobile browsers, and native mobile applications.
The Vision: One Protocol, Everywhere
```
┌─────────────────────────────────────────────────┐
│                 WebLLM Protocol                 │
│        (Standardized AI Access Interface)       │
└─────────────────────────────────────────────────┘
        │
 ┌──────┴──────────────┬──────────────┬────────────┐
 │                     │              │            │
┌───▼────┐  ┌──────────▼──┐  ┌───────▼─────┐  ┌──▼────┐
│  Web   │  │   Desktop   │  │   Mobile    │  │  IoT  │
│Browser │  │    Apps     │  │ Native Apps │  │ Edge  │
└────────┘  └─────────────┘  └─────────────┘  └───────┘
```

Users configure AI once. Works everywhere.
Desktop Applications
Electron Apps
Status: 📅 Planned Q2 2025
Electron apps can use WebLLM in two ways:
Option 1: Browser Extension (Available Now)
If users have the Chrome extension installed:
```js
// In Electron renderer process
import { WebLLMClient } from 'webllm';

const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});
```

Pros:
- Works today
- No additional setup
- User controls configuration
Cons:
- Requires extension installed
- Limited to Chromium-based Electron
Option 2: Native Integration (Planned)
A WebLLM daemon that Electron apps connect to:
```js
// Future: No extension needed
import { WebLLM } from '@webllm/electron';

const client = await WebLLM.connect();
const response = await client.generate({
  prompt: 'Hello from Electron!',
});
```

Architecture:

```
┌────────────────────┐
│   Electron App     │
│   (Your App)       │
└─────────┬──────────┘
          │ IPC/WebSocket
┌─────────▼──────────┐
│   WebLLM Daemon    │
│   (Background)     │
│   - Providers      │
│   - Models         │
│   - API Keys       │
└─────────┬──────────┘
          │
   ┌──────┴──────┐
   │             │
┌──▼───┐    ┌───▼─────┐
│Local │    │  Cloud  │
│Models│    │Providers│
└──────┘    └─────────┘
```

Benefits:
- No extension required
- Works across all Electron apps
- Shared model cache
- Better performance
Tauri Apps
Status: 📅 Planned Q2 2025
Similar architecture to Electron:
```rust
// Tauri backend (Rust)
use webllm::Client;

#[tauri::command]
async fn generate_text(prompt: String) -> Result<String, String> {
    // Map library errors to String so they can cross the Tauri IPC boundary
    let client = Client::connect().await.map_err(|e| e.to_string())?;
    let response = client.generate(&prompt).await.map_err(|e| e.to_string())?;
    Ok(response.text)
}
```

```js
// Tauri frontend (JavaScript)
import { invoke } from '@tauri-apps/api';

const response = await invoke('generate_text', {
  prompt: 'Hello from Tauri!',
});
```

Benefits:
- Smaller bundle size than Electron
- Native performance
- Cross-platform (Windows, macOS, Linux)
WebLLM Daemon
Status: ✅ Reference Implementation Available (Node.js), OS-Native Planned Q3 2025
WebLLM includes a production-ready Node.js daemon that serves as a reference implementation of what operating system vendors will eventually build natively. The current daemon runs on localhost:54321 and demonstrates the architecture, protocols, and security model needed for native OS integration.
Current Node.js Daemon (Available Now):
The daemon validates the architecture that OS makers will implement:
```sh
# Start the daemon
npm run daemon

# Or in development mode with hot reload
npm run daemon:dev
```

Key Features:
- Token-based authentication (Bearer tokens stored in `~/.webllm/daemon.token`)
- CORS protection with origin whitelisting
- HTTP/SSE endpoints for requests and streaming
- Configuration API for managing provider secrets
- Provider priority and fallback system
- Local model support (via WebGPU)
- Progress tracking and usage statistics
See the Node Daemon README for setup instructions.
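To make the flow concrete, here is a minimal sketch of a desktop app talking to the daemon, assuming the token location and port documented above. The `/api` endpoint and request body follow the migration example later on this page; the exact request/response shapes are otherwise assumptions.

```js
// Minimal sketch: read the daemon token, then call the local daemon.
// Requires Node 18+ (global fetch, top-level await in ESM).
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';

// Bearer token that the daemon writes to ~/.webllm/daemon.token
const token = (
  await readFile(join(homedir(), '.webllm', 'daemon.token'), 'utf8')
).trim();

const res = await fetch('http://localhost:54321/api', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt: 'Hello from a desktop app!' }),
});
console.log(await res.json());
```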
Native OS Daemon (Future)
Operating system vendors (Apple, Microsoft, Linux distributions) will eventually implement WebLLM as a native system service. The current Node.js daemon serves as a reference for what they need to build.
What OS Makers Need to Implement:
A system service that provides WebLLM to all desktop apps:
Installation
macOS:

```sh
brew install webllm-daemon
webllm-daemon install
```

Windows:

```sh
winget install WebLLM.Daemon
```

Linux:

```sh
curl -fsSL https://webllm.org/install.sh | sh
sudo systemctl enable webllm-daemon
```

Features
Unified Configuration:
- One place to manage providers
- API keys stored in OS keychain
- Models downloaded once, shared by all apps
System Integration:
- System tray icon
- Notifications
- Usage statistics
- Update management
Developer API:
Multiple connection methods:
HTTP API:
```js
fetch('http://localhost:8765/v1/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Hello',
    model: 'default',
  }),
});
```

WebSocket:

```js
const ws = new WebSocket('ws://localhost:8765/v1/stream');
ws.send(JSON.stringify({ prompt: 'Hello' }));
ws.onmessage = (msg) => console.log(msg.data);
```

Native Libraries:

```js
// Node.js
const { WebLLM } = require('@webllm/node');
const client = await WebLLM.connect();
```

```python
# Python
from webllm import Client

client = Client.connect()
response = client.generate("Hello")
```

```rust
// Rust
use webllm::Client;

let client = Client::connect().await?;
```

Security
Access Control:
- Apps must request permission
- User grants per-app access
- Revocable anytime
- Usage tracking
Sandboxing:
- Daemon runs in isolated process
- Limited system access
- Secure IPC
- Encrypted communication
OS Implementation Requirements
When operating system vendors implement native WebLLM support, they will need to build components equivalent to the current Node.js daemon:
1. System Service / Daemon Process
What to Implement:
- Background service that starts at boot
- Runs with limited privileges (not as root/admin)
- Manages lifecycle (start, stop, restart, updates)
- Crash recovery and automatic restart
Platform-Specific:
- macOS: launchd service in `/Library/LaunchDaemons/`
- Windows: Windows Service managed by the Service Control Manager
- Linux: systemd unit file in `/etc/systemd/system/`
Equivalent to: The Node daemon’s main process
2. Authentication & Authorization
What to Implement:
- Per-application access tokens (like the daemon’s Bearer tokens)
- Integration with OS permission system
- Application identity verification
- Token rotation and revocation
- Audit logging
Platform-Specific:
- macOS: TCC (Transparency, Consent, and Control) framework
- Windows: AppContainer sandbox and capability checks
- Linux: AppArmor or SELinux policies
Equivalent to: The daemon’s TokenManager and auth middleware
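As a rough sketch of that layer, here is an Express-style middleware in the spirit of the daemon's Bearer-token check. The token file location comes from the daemon docs above; the function names and middleware shape are illustrative, not the daemon's actual code.

```js
// Sketch: Bearer-token auth middleware (Express-style).
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';
import { timingSafeEqual } from 'node:crypto';

const daemonToken = readFileSync(
  join(homedir(), '.webllm', 'daemon.token'), 'utf8'
).trim();

function requireAuth(req, res, next) {
  const header = req.headers.authorization ?? '';
  const presented = header.startsWith('Bearer ') ? header.slice(7) : '';
  const a = Buffer.from(presented);
  const b = Buffer.from(daemonToken);
  // Constant-time comparison to avoid leaking token bytes via timing.
  if (a.length !== b.length || !timingSafeEqual(a, b)) {
    return res.status(401).json({ error: 'unauthorized' });
  }
  next();
}
```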
3. Credential Storage
What to Implement:
- Secure storage for API keys and tokens
- Per-user credential isolation
- Biometric authentication support
- Encrypted at rest
- Protected from process memory dumps
Platform-Specific:
- macOS: Keychain Services API
- Windows: Credential Manager and Data Protection API (DPAPI)
- Linux: Secret Service API (libsecret) or kernel keyring
Equivalent to: The daemon’s token storage in ~/.webllm/daemon.token
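For illustration, a Node process can already reach all three platform keystores through the keytar library (Keychain on macOS, Credential Manager on Windows, libsecret on Linux); the service and account names below are arbitrary.

```js
// Sketch: store and retrieve a provider API key in the OS keychain.
import keytar from 'keytar';

await keytar.setPassword('webllm', 'anthropic', process.env.ANTHROPIC_API_KEY);
const apiKey = await keytar.getPassword('webllm', 'anthropic'); // null if absent
```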
4. Inter-Process Communication (IPC)
What to Implement:
- Secure message passing between apps and daemon
- Request/response pattern (like HTTP)
- Streaming support (like SSE)
- Message authentication
- Rate limiting and resource quotas
Platform-Specific:
- macOS: XPC (inter-process communication framework)
- Windows: Named Pipes or WinRT APIs
- Linux: D-Bus or Unix domain sockets
Equivalent to: The daemon’s HTTP/SSE endpoints at localhost:54321
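As a sketch of the Linux option, here is a request/response server over a Unix domain socket in Node. The socket path and the newline-delimited JSON framing are assumptions for illustration.

```js
// Sketch: request/response IPC over a Unix domain socket.
import net from 'node:net';

const SOCKET_PATH = '/tmp/webllm.sock'; // illustrative location

const server = net.createServer((conn) => {
  let buffer = '';
  conn.on('data', (chunk) => {
    buffer += chunk.toString();
    // Newline-delimited JSON: one request per line.
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const request = JSON.parse(buffer.slice(0, newline));
      buffer = buffer.slice(newline + 1);
      // ...authenticate, rate-limit, then dispatch to a provider...
      conn.write(JSON.stringify({ id: request.id, text: 'ok' }) + '\n');
    }
  });
});
server.listen(SOCKET_PATH);
```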
5. Provider Management
What to Implement:
- Registry of AI providers (Anthropic, OpenAI, local models)
- Configuration API for managing providers
- Provider priority and fallback logic
- Credential association (which app uses which provider)
- Usage tracking and billing
Equivalent to: The daemon’s /config/providers endpoints and ProviderManager
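A sketch of what registering a provider against those endpoints could look like. The `/config/providers` path is named above; the body fields and their semantics are assumptions.

```js
// Sketch: register a cloud provider via the daemon's configuration API.
await fetch('http://localhost:54321/config/providers', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    provider: 'anthropic', // provider id (assumed field name)
    priority: 1,           // lower = tried first (assumed semantics)
    fallback: 'local',     // used when this provider fails (assumed)
  }),
});
```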
6. Model Management
What to Implement:
- Centralized model storage and caching
- Download management with progress tracking
- Model verification (checksums, signatures)
- Automatic updates
- Storage cleanup and quota management
Platform-Specific:
- macOS: `~/Library/Application Support/WebLLM/models/`
- Windows: `%LOCALAPPDATA%\WebLLM\models\`
- Linux: `~/.local/share/webllm/models/`
Equivalent to: The daemon’s LocalModelManager
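Model verification in particular is straightforward to sketch: hash the downloaded file as a stream so multi-gigabyte models never sit in memory, then compare against a published checksum. The file path below is illustrative.

```js
// Sketch: verify a downloaded model against a known SHA-256 checksum.
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

async function sha256(path) {
  const hash = createHash('sha256');
  await pipeline(createReadStream(path), hash); // streams the file through the hash
  return hash.digest('hex');
}

const actual = await sha256('models/llama-3.2-1b/weights.bin');
if (actual !== expectedChecksum) throw new Error('Model verification failed');
```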
7. Hardware Acceleration
What to Implement:
- GPU detection and selection
- Inference optimization
- Memory management
- Thermal and power management
- Fallback to CPU when needed
Platform-Specific:
- macOS: Metal Performance Shaders, Neural Engine (ANE)
- Windows: DirectML, ONNX Runtime, NPU support
- Linux: CUDA, ROCm, Vulkan Compute
Equivalent to: The daemon’s WebGPU integration for local models
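A minimal sketch of the detect-and-fall-back step, using the WebGPU API the current daemon relies on. In the browser `navigator.gpu` is built in; in Node it requires a WebGPU implementation to be installed.

```js
// Sketch: pick a backend, preferring the GPU when an adapter is available.
async function pickBackend() {
  if (!navigator.gpu) return 'cpu'; // WebGPU unavailable: fall back
  const adapter = await navigator.gpu.requestAdapter({
    powerPreference: 'high-performance',
  });
  return adapter ? 'webgpu' : 'cpu';
}
```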
8. Native APIs
What to Implement:
- Language-specific SDKs (C/C++, Swift, C#, Python, etc.)
- Platform conventions (async/await, callbacks, promises)
- Error handling and logging
- Documentation and samples
Platform Examples:
macOS (Swift):
```swift
import WebLLMKit

let client = WLMClient()
let response = try await client.generate(
    prompt: "Hello",
    provider: .default
)
```

Windows (C#):

```csharp
using WebLLM;

var client = new LLMClient();
var response = await client.GenerateAsync(
    prompt: "Hello",
    provider: Provider.Default
);
```

Linux (C++):

```cpp
#include <webllm/client.h>

webllm::Client client;
auto response = client.generate({
    .prompt = "Hello",
    .provider = webllm::Provider::Default
});
```

9. System Integration
What to Implement:
- System preferences/settings panel
- Menu bar/system tray interface
- Usage notifications
- Privacy dashboard
- System-wide shortcuts
Platform-Specific:
- macOS: Settings bundle, menu bar extra, SwiftUI interfaces
- Windows: Settings app integration, system tray icon, WinUI interfaces
- Linux: GNOME Settings panel, KDE System Settings, GTK/Qt interfaces
Equivalent to: The extension’s side panel UI and settings
Reference Implementation Benefits
The current Node.js daemon provides several benefits to OS vendors:
- Working Example: Fully functional implementation to study
- Protocol Validation: Proven communication patterns and APIs
- Security Model: Tested authentication and authorization approach
- Performance Baseline: Benchmarks for native implementation comparison
- Developer Experience: Reference for SDK design
- Community Feedback: Real-world usage insights
Migration Path
Today (Node Daemon):
```js
// Apps connect to localhost:54321
const response = await fetch('http://localhost:54321/api', {
  headers: { Authorization: 'Bearer token' },
});
```

Future (Native OS API):

```js
// Apps use native OS API
const response = await os.webllm.generate({
  prompt: 'Hello',
});
```

The transition is transparent to end users: they simply get better performance and tighter integration.
Mobile Browsers
iOS Safari
Status: 📅 Investigating
Challenges:
- Safari doesn’t support Chrome extensions
- Different extension architecture (Safari Web Extensions)
- iOS restrictions on background processes
Approach 1: Safari Web Extension (Planned 2025)
Build WebLLM as a Safari Web Extension:
```js
// Similar API, different packaging
const client = new WebLLMClient();
const response = await client.generate({ prompt: 'Hello' });
```

Limitations:
- iOS restrictions on local model size
- Limited background execution
- Must use iOS native APIs
Approach 2: iOS App + Safari Extension (Preferred)
A companion iOS app that:
- Manages providers and models
- Downloads and caches models
- Safari extension connects to app
```
┌──────────────┐
│    Safari    │
│  (Web Page)  │
└──────┬───────┘
       │
┌──────▼───────┐
│ Safari Ext.  │
└──────┬───────┘
       │ XPC
┌──────▼───────┐
│  WebLLM iOS  │
│     App      │
│ - Providers  │
│ - Models     │
│ - API Keys   │
└──────────────┘
```

Benefits:
- Larger model support (app storage)
- Background processing (app)
- Better UX (native settings)
- Works offline
Timeline: Q3-Q4 2025
Android Chrome
Status: 📅 Planned Q2 2025
Chrome on Android offers limited extension support:
Approach: Chrome Extension (Like desktop)
```js
// Same API as desktop!
import { WebLLMClient } from 'webllm';

const client = new WebLLMClient();
const response = await client.generate({
  prompt: 'Hello from Android!',
});
```

Considerations:
- Mobile performance (smaller models)
- Battery life (optimize inference)
- Storage (model size limits)
- Network (cellular data costs)
Optimizations:
- Smaller quantized models
- Battery-aware scheduling (see the sketch after this list)
- WiFi-only downloads by default
- Efficient inference
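A sketch of the battery-aware scheduling item above, using the Battery Status API available in Chromium browsers; the 30% threshold is an arbitrary illustration.

```js
// Sketch: only run heavy local inference when power conditions allow.
async function canRunHeavyInference() {
  if (!('getBattery' in navigator)) return true; // API unsupported: don't block
  const battery = await navigator.getBattery();
  // Allow when charging, or when the battery is above an arbitrary floor.
  return battery.charging || battery.level > 0.3;
}
```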
Timeline: Q2 2025
Other Mobile Browsers
- Firefox Mobile: Q3 2025
- Edge Mobile: Q2 2025 (Chromium-based)
- Brave Mobile: Q2 2025 (Chromium-based)
Native Mobile Apps
iOS SDK
Status: 📅 Planned Q4 2025
Native iOS integration:
```swift
import WebLLM

class ViewController: UIViewController {
    let client = WebLLMClient()

    func generateText() async {
        do {
            let response = try await client.generate(
                prompt: "Hello from iOS!",
                model: .default
            )
            print(response.text)
        } catch {
            print("Error: \(error)")
        }
    }
}
```

Architecture:

```
┌────────────────────┐
│      iOS App       │
│    (Your App)      │
└─────────┬──────────┘
          │
┌─────────▼──────────┐
│  WebLLM.framework  │
│  - Provider Mgmt   │
│  - Local Inference │
│  - API Key Storage │
└─────────┬──────────┘
          │
   ┌──────┴──────┐
   │             │
┌──▼────┐   ┌───▼────┐
│Core ML│   │ Cloud  │
│Models │   │  APIs  │
└───────┘   └────────┘
```

Integration:
Keychain for API keys:
```swift
// API keys stored in iOS Keychain
client.addProvider(.anthropic, apiKey: key)
```

Core ML for local inference:

```swift
// Use Apple's ML framework
let config = InferenceConfig(
    backend: .coreML,
    useNeuralEngine: true
)
```

iCloud sync:

```swift
// Sync settings across devices
client.enableiCloudSync = true
```

Widgets:

```swift
// Use WebLLM in widgets
struct AIWidget: Widget {
    var body: some WidgetConfiguration {
        StaticConfiguration(kind: "AI") { entry in
            AIWidgetView(client: WebLLMClient())
        }
    }
}
```

Android SDK
Status: 📅 Planned Q4 2025
Native Android integration:
```kotlin
// Import matches the WebLLMClient class used below
import com.webllm.WebLLMClient

class MainActivity : AppCompatActivity() {
    private val client = WebLLMClient()

    suspend fun generateText() {
        val response = client.generate(
            prompt = "Hello from Android!",
            model = Model.DEFAULT
        )
        println(response.text)
    }
}
```

Architecture:

```
┌────────────────────┐
│    Android App     │
│    (Your App)      │
└─────────┬──────────┘
          │
┌─────────▼──────────┐
│   WebLLM Library   │
│      (.aar)        │
└─────────┬──────────┘
          │
   ┌──────┴──────┐
   │             │
┌──▼───────┐ ┌───▼────┐
│TensorFlow│ │ Cloud  │
│Lite/NNAPI│ │  APIs  │
└──────────┘ └────────┘
```

Integration:
Android Keystore:
```kotlin
// Secure key storage
client.addProvider(Provider.ANTHROPIC, apiKey)
```

TensorFlow Lite:

```kotlin
// Local inference
val config = InferenceConfig(
    backend = Backend.TFLITE,
    useNNAPI = true,
    useGPU = true
)
```

WorkManager:

```kotlin
// Background tasks
val workRequest = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
    .build()
WorkManager.getInstance(context).enqueue(workRequest)
```

Cross-Platform Sharing
Sync Across Devices
Users configure once, and it works everywhere:
iCloud (Apple ecosystem):
- Settings sync via iCloud
- API keys in Keychain
- Downloaded models (WiFi only)
Google account (Android/Chrome):
- Settings sync via Google account
- Encrypted API keys
- Model preferences
Manual export/import:
- Export configuration as file
- Import on other devices
- Secure transfer
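A sketch of the manual export flow above, encrypting the configuration with AES-256-GCM before it leaves the device. The file name, config shape, and key-derivation parameters are illustrative.

```js
// Sketch: export configuration as an encrypted file for manual transfer.
import { createCipheriv, randomBytes, scryptSync } from 'node:crypto';
import { writeFileSync } from 'node:fs';

function exportConfig(config, passphrase) {
  const salt = randomBytes(16);
  const key = scryptSync(passphrase, salt, 32); // derive key from passphrase
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(config)),
    cipher.final(),
  ]);
  // File layout: salt | iv | auth tag | ciphertext
  writeFileSync(
    'webllm-config.enc',
    Buffer.concat([salt, iv, cipher.getAuthTag(), data])
  );
}
```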
Shared Model Cache
Apps on the same device share models:
Desktop:
```
~/Library/Application Support/WebLLM/models/
├─ llama-3.2-1b/
└─ phi-3-mini/
```

Multiple apps using the same models saves storage.
Mobile:
- iOS: Shared app group
- Android: Content provider
Platform-Specific Optimizations
macOS
Apple Silicon optimizations:
- Neural Engine for inference
- Unified memory architecture
- Metal Performance Shaders
- Efficient Core/Performance Core usage
Integration:
- Spotlight search with AI
- Quick Actions in Finder
- Shortcuts app integration
- Menu bar utility
Windows
DirectML optimization:
- GPU acceleration via DirectML
- NPU support (AI PCs)
- Windows ML integration
Integration:
- PowerToys plugin
- Context menu actions
- Windows Terminal integration
- Task bar utility
Linux
Flexibility:
- Multiple GPU backends (CUDA, ROCm, Vulkan)
- Wayland/X11 support
- System tray integration
Integration:
- GNOME extension
- KDE plasmoid
- Command-line tools
- Systemd service
iOS
Battery optimization:
- Background App Refresh awareness
- Low Power Mode detection
- Thermal management
- Network-aware (WiFi vs cellular)
Integration:
- Shortcuts app
- Share sheet
- Widgets
- App Clips
Android
Battery optimization:
- Doze mode awareness
- JobScheduler integration
- Background limits compliance
Integration:
- Quick Settings tile
- Share menu
- Widgets
- Accessibility services
Developer Experience
Unified SDK
One SDK, all platforms:
Installation:
```sh
# Web
npm install webllm

# Node.js / Electron
npm install @webllm/node

# iOS (CocoaPods)
pod 'WebLLM'

# Android (Gradle)
implementation 'com.webllm:client:1.0.0'

# Python
pip install webllm
```

API Consistency:
Same concepts across platforms:
```js
// JavaScript (Web/Node/Electron)
const client = new WebLLMClient();
const response = await client.generate({ prompt });
```

```swift
// Swift (iOS/macOS)
let client = WebLLMClient()
let response = try await client.generate(prompt: prompt)
```

```kotlin
// Kotlin (Android)
val client = WebLLMClient()
val response = client.generate(prompt)
```

```python
# Python
client = WebLLMClient()
response = client.generate(prompt)
```

Different syntax, same semantics.
Timeline
| Platform | Status | Timeline | Notes |
|---|---|---|---|
| Web Browsers | ✅ Available | Now | Chrome extension |
| Electron | 📅 Planned | Q2 2025 | Via daemon |
| Tauri | 📅 Planned | Q2 2025 | Via daemon |
| Desktop Daemon | 📅 Planned | Q3 2025 | macOS, Windows, Linux |
| Android Chrome | 📅 Planned | Q2 2025 | Chrome extension |
| Firefox Mobile | 📅 Planned | Q3 2025 | Mobile extension |
| iOS Safari | 📅 Investigating | Q3-Q4 2025 | App + extension |
| iOS SDK | 📅 Planned | Q4 2025 | Native framework |
| Android SDK | 📅 Planned | Q4 2025 | Native library |
Use Cases
Desktop Apps
Code Editors:
- AI autocomplete
- Code explanation
- Bug fixing
- Documentation generation
Note-taking Apps:
- Summarization
- Text generation
- Organization
- Search enhancement
Email Clients:
- Smart compose
- Reply suggestions
- Summarization
- Translation
Mobile Apps
Reading Apps:
- Article summarization
- Translation
- Text-to-speech
- Comprehension help
Productivity Apps:
- Task generation
- Smart reminders
- Note organization
- Meeting summaries
Educational Apps:
- Homework help
- Explanations
- Practice problems
- Language learning
Challenges & Solutions
Challenge: Storage on Mobile
Models are large (1-10 GB), while mobile storage is limited.
Solutions:
- Smaller quantized models (500MB-2GB)
- On-demand downloading (see the sketch after this list)
- Cloud storage options
- WiFi-only by default
- Model sharing between apps
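A sketch of the on-demand downloading item above, with progress reporting via `fetch` and a readable stream; the URL and callback shape are illustrative.

```js
// Sketch: download a model on demand, reporting fractional progress.
async function downloadModel(url, onProgress) {
  const res = await fetch(url);
  const total = Number(res.headers.get('Content-Length')) || 0;
  const chunks = [];
  let received = 0;
  const reader = res.body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    if (total) onProgress(received / total); // 0..1
  }
  return new Blob(chunks);
}
```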
Challenge: Battery Life
AI inference is power-intensive.
Solutions:
- Efficient models (optimized for mobile)
- Battery-aware scheduling
- Thermal management
- Offload to cloud when on charger
- User controls (low power mode)
Challenge: Network Costs
Cloud API calls consume data, which is expensive on cellular connections.
Solutions:
- WiFi preference
- Cellular data warnings
- Local-first on cellular (see the sketch after this list)
- Usage tracking
- User limits
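A sketch of the local-first routing above, using the Network Information API (exposed by Chromium on Android); `runLocal` and `runCloud` are hypothetical stand-ins for the two inference paths.

```js
// Sketch: prefer local inference when the device is on cellular data.
function chooseRoute() {
  const conn = navigator.connection; // undefined in unsupported browsers
  const onCellular = !!conn && conn.type === 'cellular';
  return onCellular ? runLocal : runCloud;
}
```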
Challenge: Platform Fragmentation
Each platform is different.
Solutions:
- Abstract platform differences
- Platform-specific optimizations
- Consistent API surface
- Thorough testing
- Platform-specific docs
Related
- WebLLM Roadmap - Overall roadmap
- Native Browser Integration - Browser standardization
- Technical Specification - API details
One protocol. Every platform. User control everywhere.