
Developer Best Practices

Building with WebLLM is different from traditional AI integrations. This guide covers best practices for creating great user experiences while respecting user control and privacy.

Users choose the providers, not you. Design accordingly.

❌ Don’t:

// Don't assume a specific provider
const response = await client.generate({
  model: 'gpt-4', // User might not have OpenAI configured!
  prompt: 'Hello'
});

✅ Do:

// Describe your intent with task and hints
const response = await generateText({
  task: 'general',
  hints: { speed: 'fast' },
  prompt: 'Hello'
  // WebLLM intelligently selects the best provider based on task/hints
});

Always handle the case where users don’t have the extension yet.

✅ Good UX:

import { promptInstall, isAvailable } from 'webllm';

async function initializeAI() {
  try {
    // Prompt for installation if needed
    await promptInstall();
    // Now safe to use WebLLM
    return true;
  } catch (error) {
    // User declined or browser not supported
    console.log('WebLLM not available:', error.message);
    return false;
  }
}

// Use it
const canUseAI = await initializeAI();
if (canUseAI) {
  // Show AI features
} else {
  // Hide AI features or show fallback
}

AI features should enhance your app, not break it.

✅ Progressive enhancement:

// Feature detection
const hasAI = await webLlmReady().catch(() => false);

if (hasAI) {
  // Enhanced experience with AI
  showAISummaryButton();
} else {
  // Core functionality still works
  hideAISummaryButton();
}

Step 1: Check availability

import { isAvailable, getBrowserInfo } from 'webllm';

function checkWebLLM() {
  if (isAvailable()) {
    return { available: true };
  }

  const browserInfo = getBrowserInfo();
  return {
    available: false,
    supported: browserInfo.isSupported,
    installUrl: browserInfo.installUrl,
    reason: browserInfo.reason
  };
}

Step 2: Show appropriate UI

const status = checkWebLLM();

if (status.available) {
  // Show AI features
  showAIFeatures();
} else if (status.supported) {
  // Show install prompt
  showInstallPrompt(status.installUrl);
} else {
  // Browser not supported, hide AI features
  hideAIFeatures();
  console.warn('WebLLM not supported:', status.reason);
}

Step 3: Use promptInstall() for best UX

import { promptInstall } from 'webllm';

async function enableAIFeatures() {
  try {
    // Shows modal with install instructions
    await promptInstall();
    // User installed and configured
    showAIFeatures();
    return true;
  } catch (error) {
    // User cancelled
    return false;
  }
}

For straightforward prompts:

import { generateText } from 'webllm';

async function summarizeArticle(articleText) {
  try {
    const result = await generateText({
      task: 'summarization',
      hints: {
        quality: 'high',
        speed: 'balanced'
      },
      prompt: `Summarize this article:\n\n${articleText}`,
      maxTokens: 200,
      temperature: 0.3 // Lower for factual tasks
    });
    return result.text;
  } catch (error) {
    console.error('Summarization failed:', error);
    // Fallback: return first paragraph
    return articleText.split('\n\n')[0];
  }
}

For multi-turn conversations:

import { streamText } from 'webllm';

async function chatWithAI(messages, onChunk) {
  const stream = await streamText({
    messages: [
      { role: 'system', content: 'You are a helpful assistant' },
      ...messages
    ]
  });

  let fullResponse = '';
  for await (const chunk of stream) {
    fullResponse += chunk.text;
    onChunk(chunk.text); // Update UI incrementally
  }
  return fullResponse;
}
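
A minimal usage sketch, assuming a hypothetical app-side helper (`appendToChat`) that is not part of WebLLM: keep the running message history yourself and append both sides after each turn so the next request has full context.

// Minimal usage sketch. `appendToChat` is a hypothetical app-side helper.
const history = [];

async function sendMessage(userText) {
  history.push({ role: 'user', content: userText });

  const reply = await chatWithAI(history, (chunkText) => {
    appendToChat(chunkText); // Render each chunk as it arrives
  });

  // Store the full reply so the next turn has complete context
  history.push({ role: 'assistant', content: reply });
}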

For JSON or structured data:

import { generateText } from 'webllm';

async function extractData(text) {
  const result = await generateText({
    task: 'extraction',
    hints: {
      quality: 'high',
      capabilities: {
        functionCalling: true // Prefer models with structured output
      }
    },
    prompt: `Extract contact information from this text as JSON:
${text}
Return only valid JSON with fields: name, email, phone`,
    temperature: 0.1, // Very low for structured output
    maxTokens: 500
  });

  try {
    return JSON.parse(result.text);
  } catch (error) {
    console.error('Failed to parse JSON:', error);
    return null;
  }
}

Handle errors by error code, and retry transient failures:

import { generateText, WebLLMError } from 'webllm';

async function generateWithRetry(prompt, maxRetries = 2) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const result = await generateText({ prompt });
      return result.text;
    } catch (error) {
      if (error instanceof WebLLMError) {
        switch (error.code) {
          case 'EXTENSION_NOT_FOUND':
            // Extension not installed
            throw new Error('Please install WebLLM extension');
          case 'NO_PROVIDER_AVAILABLE':
            // User hasn't configured any providers
            throw new Error('Please configure an AI provider in WebLLM');
          case 'PERMISSION_DENIED':
            // User denied permission
            throw new Error('Permission denied');
          case 'RATE_LIMIT_EXCEEDED':
            // Rate limited, retry after delay
            if (i < maxRetries - 1) {
              await new Promise(resolve => setTimeout(resolve, 2000));
              continue;
            }
            throw new Error('Rate limit exceeded. Please try again later.');
          case 'PROVIDER_ERROR':
            // Provider-specific error
            console.error('Provider error:', error.message);
            throw error;
          default:
            throw error;
        }
      }
      throw error;
    }
  }
}

Map error codes to user-friendly messages:

function getUserFriendlyError(error) {
  if (!error) return 'Unknown error occurred';

  const errorMessages = {
    'EXTENSION_NOT_FOUND': 'WebLLM extension not installed. Please install it to use AI features.',
    'NO_PROVIDER_AVAILABLE': 'No AI provider configured. Please set up a provider in WebLLM settings.',
    'PERMISSION_DENIED': 'Permission denied. Click the WebLLM icon to grant access.',
    'RATE_LIMIT_EXCEEDED': 'Too many requests. Please wait a moment and try again.',
    'QUOTA_EXCEEDED': 'Your usage quota has been exceeded. Check your WebLLM settings.',
    'MODEL_NOT_AVAILABLE': 'The requested model is not available. Try a different provider.',
    'NETWORK_ERROR': 'Network error. Please check your connection and try again.'
  };

  return errorMessages[error.code] || error.message || 'An error occurred';
}

// Usage
try {
  await generateText({ prompt: 'Hello' });
} catch (error) {
  showErrorToUser(getUserFriendlyError(error));
}

For repeated queries:

const cache = new Map();

async function generateWithCache(prompt) {
  // Check cache first
  if (cache.has(prompt)) {
    return cache.get(prompt);
  }

  // Generate
  const result = await generateText({ prompt });

  // Cache result
  cache.set(prompt, result.text);

  // Limit cache size
  if (cache.size > 100) {
    const firstKey = cache.keys().next().value;
    cache.delete(firstKey);
  }

  return result.text;
}

For real-time features:

function debounce(fn, delay) {
  let timeout;
  return (...args) => {
    clearTimeout(timeout);
    timeout = setTimeout(() => fn(...args), delay);
  };
}

// Usage: AI-powered autocomplete
const generateSuggestions = debounce(async (text) => {
  const result = await generateText({
    prompt: `Complete this text: ${text}`,
    maxTokens: 50
  });
  updateSuggestions(result.text);
}, 500); // Wait 500ms after typing stops

inputElement.addEventListener('input', (e) => {
  generateSuggestions(e.target.value);
});

Show progress while a long response streams in:

import { streamText } from 'webllm';

async function generateWithProgress(prompt, onProgress) {
  onProgress({ status: 'starting', percent: 0 });

  const stream = await streamText({ prompt });
  let chunks = 0;
  let fullText = '';

  for await (const chunk of stream) {
    fullText += chunk.text;
    chunks++;
    // Update progress
    onProgress({
      status: 'generating',
      percent: Math.min(chunks * 5, 95), // Estimate
      text: fullText
    });
  }

  onProgress({ status: 'complete', percent: 100, text: fullText });
  return fullText;
}
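
As a minimal usage sketch (the `progress-bar` and `output` elements are hypothetical page markup, not part of WebLLM), the onProgress callback maps directly onto UI updates:

// Assumes <progress id="progress-bar" max="100"> and an output container exist.
const progressBar = document.getElementById('progress-bar');
const outputElement = document.getElementById('output');

await generateWithProgress('Write a short product description', (update) => {
  progressBar.value = update.percent; // 0-100 estimate
  if (update.text) {
    outputElement.textContent = update.text; // Partial text so far
  }
});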

Let users know if WebLLM is available:

import { isAvailable, promptInstall } from 'webllm';

function WebLLMStatus() {
  const available = isAvailable();

  // Referenced by the button below; walks the user through installation
  const handleInstall = () => {
    promptInstall().catch(() => {
      // User declined or browser not supported
    });
  };

  return (
    <div className="status-badge">
      {available ? (
        <span className="status-active">✓ AI Features Enabled</span>
      ) : (
        <button onClick={handleInstall}>
          Enable AI Features (Install WebLLM)
        </button>
      )}
    </div>
  );
}

Show responses as they generate:

import { streamText } from 'webllm';

async function displayStreamingResponse(prompt, outputElement) {
  outputElement.textContent = ''; // Clear previous output

  const stream = await streamText({ prompt });
  for await (const chunk of stream) {
    outputElement.textContent += chunk.text;
    // Scroll to bottom
    outputElement.scrollTop = outputElement.scrollHeight;
  }
}

Show clear feedback during generation:

async function generateAndDisplay(prompt) {
  const button = document.getElementById('generate-btn');
  const output = document.getElementById('output');

  // Show loading state
  button.disabled = true;
  button.textContent = 'Generating...';
  output.innerHTML = '<div class="spinner">⏳ Thinking...</div>';

  try {
    const result = await generateText({ prompt });
    output.textContent = result.text;
  } catch (error) {
    output.innerHTML = `<div class="error">Error: ${getUserFriendlyError(error)}</div>`;
  } finally {
    button.disabled = false;
    button.textContent = 'Generate';
  }
}

Respect user privacy - don’t send prompts to your analytics:

❌ Don’t:

const result = await generateText({ prompt });
analytics.track('ai_generation', { prompt }); // Don't do this!

✅ Do:

const result = await generateText({ prompt });
analytics.track('ai_generation', {
  promptLength: prompt.length,
  tokensUsed: result.usage.totalTokens
  // No actual prompt content
});

Always sanitize prompts from user input:

function sanitizePrompt(userInput) {
  // Remove potentially harmful content
  return userInput
    .replace(/<script>/gi, '')
    .replace(/javascript:/gi, '')
    .trim()
    .substring(0, 10000); // Limit length
}

async function generateFromUserInput(userInput) {
  const sanitized = sanitizePrompt(userInput);
  return await generateText({ prompt: sanitized });
}

Don’t keep sensitive data in memory longer than needed:

async function processSecureData(sensitiveData) {
  try {
    const result = await generateText({
      prompt: `Analyze: ${sensitiveData}`
    });
    return result.text;
  } finally {
    // Drop the local reference so it can be garbage-collected sooner
    sensitiveData = null;
  }
}

Use mocks in unit tests:

import { vi, test, expect } from 'vitest';
import { generateText } from 'webllm';

// Mock in test
vi.mock('webllm', () => ({
  generateText: vi.fn(async ({ prompt }) => ({
    text: 'Mocked response',
    model: 'mock-model',
    usage: { totalTokens: 10 }
  })),
  isAvailable: vi.fn(() => true)
}));

// Test
test('summarizes article', async () => {
  const summary = await summarizeArticle('Long article...');

  expect(summary).toBe('Mocked response');
  expect(generateText).toHaveBeenCalledWith(
    expect.objectContaining({
      prompt: expect.stringContaining('Summarize'),
      maxTokens: 200,
      temperature: 0.3
    })
  );
});

Test graceful degradation:

test('handles missing extension', async () => {
  // Mock extension not available
  vi.mocked(isAvailable).mockReturnValue(false);

  const app = render(<App />);

  // AI features should be hidden
  expect(app.queryByText('AI Summary')).toBeNull();
  // Core features should work
  expect(app.getByText('Read Article')).toBeTruthy();
});
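
You can also exercise the error-mapping path directly. This sketch assumes the getUserFriendlyError helper defined earlier and the same test setup; it relies only on that helper reading error.code and error.message:

test('maps error codes to friendly messages', () => {
  // Any object with a `code` property works here.
  const rateLimited = { code: 'RATE_LIMIT_EXCEEDED', message: 'raw provider error' };
  expect(getUserFriendlyError(rateLimited)).toBe(
    'Too many requests. Please wait a moment and try again.'
  );

  // Unknown codes fall back to the raw message
  expect(getUserFriendlyError({ code: 'SOMETHING_ELSE', message: 'oops' })).toBe('oops');
});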

Use hooks for better integration:

import { useState, useEffect } from 'react';
import { isAvailable, promptInstall, generateText } from 'webllm';

function useWebLLM() {
  const [available, setAvailable] = useState(false);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    setAvailable(isAvailable());
    setLoading(false);
  }, []);

  return { available, loading };
}

function useGenerate() {
  const [generating, setGenerating] = useState(false);
  const [error, setError] = useState(null);

  const generate = async (prompt) => {
    setGenerating(true);
    setError(null);
    try {
      const result = await generateText({ prompt });
      return result.text;
    } catch (err) {
      setError(err);
      throw err;
    } finally {
      setGenerating(false);
    }
  };

  return { generate, generating, error };
}

// Usage
function AISummary({ text }) {
  const { available } = useWebLLM();
  const { generate, generating, error } = useGenerate();
  const [summary, setSummary] = useState('');

  if (!available) return null;

  const handleSummarize = async () => {
    const result = await generate(`Summarize: ${text}`);
    setSummary(result);
  };

  return (
    <div>
      <button onClick={handleSummarize} disabled={generating}>
        {generating ? 'Summarizing...' : 'Summarize'}
      </button>
      {error && <div className="error">{getUserFriendlyError(error)}</div>}
      {summary && <div className="summary">{summary}</div>}
    </div>
  );
}

Use composables:

// useWebLLM.js
import { ref, onMounted } from 'vue';
import { isAvailable, generateText } from 'webllm';

export function useWebLLM() {
  const available = ref(false);

  onMounted(() => {
    available.value = isAvailable();
  });

  return { available };
}

export function useGenerate() {
  const generating = ref(false);
  const error = ref(null);

  const generate = async (prompt) => {
    generating.value = true;
    error.value = null;
    try {
      const result = await generateText({ prompt });
      return result.text;
    } catch (err) {
      error.value = err;
      throw err;
    } finally {
      generating.value = false;
    }
  };

  return { generate, generating, error };
}
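
To mirror the React usage example, a minimal component sketch (the component file name and markup are hypothetical; it relies only on the composables above):

<!-- AISummary.vue (hypothetical) -->
<script setup>
import { ref } from 'vue';
import { useWebLLM, useGenerate } from './useWebLLM';

const props = defineProps({ text: String });
const { available } = useWebLLM();
const { generate, generating } = useGenerate();
const summary = ref('');

async function handleSummarize() {
  summary.value = await generate(`Summarize: ${props.text}`);
}
</script>

<template>
  <div v-if="available">
    <button :disabled="generating" @click="handleSummarize">
      {{ generating ? 'Summarizing...' : 'Summarize' }}
    </button>
    <p v-if="summary">{{ summary }}</p>
  </div>
</template>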

Use stores:

// webllm.js
import { writable } from 'svelte/store';
import { isAvailable, generateText } from 'webllm';

export const webllmAvailable = writable(isAvailable());

export async function generate(prompt) {
  const result = await generateText({ prompt });
  return result.text;
}

// Component.svelte
<script>
  import { webllmAvailable, generate } from './webllm';

  let generating = false;
  let summary = '';

  async function handleSummarize() {
    generating = true;
    summary = await generate('Summarize this...');
    generating = false;
  }
</script>

{#if $webllmAvailable}
  <button on:click={handleSummarize} disabled={generating}>
    Summarize
  </button>
  {#if summary}
    <p>{summary}</p>
  {/if}
{/if}

Before shipping WebLLM features:

  • Handle extension not installed gracefully
  • Test with extension disabled
  • Add loading states for all AI operations
  • Implement error handling with user-friendly messages
  • Don’t assume specific providers or models
  • Don’t log user prompts to analytics
  • Sanitize user input
  • Add retry logic for transient errors
  • Use streaming for better UX when appropriate
  • Cache responses when applicable
  • Document which features require WebLLM
  • Test on supported browsers
  • Implement progressive enhancement

Common mistakes to avoid:

// BAD: Will break if extension not installed
const result = await generateText({ prompt: 'Hello' });

// GOOD: Check availability first
if (await webLlmReady()) {
  const result = await generateText({ prompt: 'Hello' });
} else {
  showInstallPrompt();
}

// BAD: Errors crash the app
const result = await generateText({ prompt });
showResult(result.text);

// GOOD: Handle errors gracefully
try {
  const result = await generateText({ prompt });
  showResult(result.text);
} catch (error) {
  showError(getUserFriendlyError(error));
}

// BAD: Blocks UI during generation with no feedback
const result = await generateText({ prompt });
updateUI(result.text);

// GOOD: Show a loading state
showLoading();
const result = await generateText({ prompt });
hideLoading();
updateUI(result.text);

// BAD: Forces a specific model
const result = await generateText({
  model: 'gpt-4', // User might not have this!
  prompt: 'Hello'
});

// GOOD: Describe your intent with task/hints
const result = await generateText({
  task: 'general',
  hints: { speed: 'fast' },
  prompt: 'Hello'
  // WebLLM intelligently selects the best model
});

Build great AI experiences that respect user control and privacy.