Vercel AI Provider

Overview

The webllm-ai-provider package lets you use Vercel AI SDK with intelligent model selection instead of hardcoding model names. This means:

Zero server costs - AI runs on user’s machines
Complete privacy - Data never leaves the browser
No model names needed - Describe what you need, WebLLM picks the best model
User choice - Let users configure their preferred providers

Key Concept: Task-Based Selection

Unlike traditional providers where you specify exact models like 'claude-3-5-sonnet' or 'gpt-4', WebLLM uses task types and hints to automatically select the best available model.

You describe what you want to do (task + hints), and WebLLM intelligently routes to the best provider based on user configuration.

Quick Start

npm install webllm

npm install webllm-ai-provider ai

Basic Usage

Compare approaches for browser-native AI execution:

WebLLM
Vercel AI SDK

Before (Traditional API with Costs):

import Anthropic from '@anthropic-ai/sdk';

// Costs you money, requires API key management
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_KEY });
const result = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  messages: [{ role: 'user', content: 'Explain TypeScript' }]
});

After (WebLLM - Zero Cost, Intelligent Selection):

import { generateText } from 'webllm';

// Zero cost, intelligent model selection
const result = await generateText({
  task: 'qa',
  prompt: 'Explain TypeScript',
  hints: { quality: 'high' }
});

Before (Traditional Provider with API Costs):

import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

// Costs you money, uses your API key, requires server
const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Explain TypeScript'
});

After (WebLLM - Zero Cost, Intelligent Selection):

import { webllm } from 'webllm-ai-provider';
import { generateText } from 'ai';

// Zero cost, intelligent model selection
const result = await generateText({
  model: webllm({ task: 'qa', hints: { quality: 'high' } }),
  prompt: 'Explain TypeScript'
});

How It Works

Architecture Flow

Your App (Vercel AI SDK)
       ↓
webllm({ task: 'qa', hints: { quality: 'high' } })
       ↓
WebLLM Extension (intelligent selection)
       ↓
Selected Provider:
  - Anthropic (user's API key)
  - OpenAI (user's API key)
  - Local models (WebGPU)
       ↓
Response back to your app

Request Routing Process

Developer specifies task type and hints
WebLLM receives the request in the user’s browser
Analyzes requirements - task type, speed/quality preferences, capabilities needed
Scores available models - based on task compatibility and user configuration
Selects best match - highest-scoring available model
Executes request via selected provider
Returns response with selection metadata

Task Types

Describe what you want to do:

general - General conversation, Q&A
summarization - Text summarization
translation - Language translation
qa - Question answering
coding - Code generation/assistance
creative - Creative writing
extraction - Structured data extraction

Model Hints

Performance Preferences

Guide selection with speed and quality preferences:

WebLLM
Vercel AI SDK

import { generateText } from 'webllm';

const result = await generateText({
  task: 'coding',
  hints: {
    speed: 'balanced',
    quality: 'high'
  },
  prompt: 'Write a React component'
});

import { webllm } from 'webllm-ai-provider';
import { generateText } from 'ai';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      speed: 'balanced',
      quality: 'high'
    }
  }),
  prompt: 'Write a React component'
});

Speed Options:

fastest - Smallest, fastest models
fast - Quick with decent quality
balanced - Balance of speed and quality
quality - Prioritize quality over speed

Quality Options:

draft - Good enough for prototyping
standard - Production quality
high - High quality results
best - Best available, regardless of speed

Capability Requirements

Specify required capabilities:

WebLLM
Vercel AI SDK

const result = await generateText({
  task: 'coding',
  hints: {
    capabilities: {
      codeGeneration: true,
      reasoning: true,
      longContext: true
    }
  },
  prompt: 'Refactor this large codebase...'
});

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      capabilities: {
        codeGeneration: true,
        reasoning: true,
        longContext: true
      }
    }
  }),
  prompt: 'Refactor this large codebase...'
});

Available Capabilities:

multilingual - Good multilingual support
codeGeneration - Strong coding abilities
reasoning - Chain-of-thought reasoning
longContext - Needs 32k+ context window
math - Mathematical problem solving
functionCalling - Native function calling support

Resource Constraints

Limit model size for mobile or resource-constrained devices:

const result = await generateText({
  model: webllm({
    task: 'qa',
    hints: {
      maxModelSize: 2,  // Max 2GB model
      maxMemory: 4      // Max 4GB RAM
    }
  }),
  prompt: 'Quick question...'
});

Common Use Cases

Fast Customer Support

Prioritize speed for quick customer interactions:

WebLLM
Vercel AI SDK

import { generateText } from 'webllm';

const result = await generateText({
  task: 'general',
  hints: { speed: 'fastest' },
  prompt: 'How do I reset my password?'
});

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'general',
    hints: { speed: 'fastest' }
  }),
  prompt: 'How do I reset my password?'
});

Complex Code Analysis

Prioritize quality for deep technical analysis:

WebLLM
Vercel AI SDK

import { generateText } from 'webllm';

const result = await generateText({
  task: 'coding',
  hints: {
    quality: 'best',
    capabilities: {
      reasoning: true,
      codeGeneration: true
    }
  },
  prompt: 'Analyze this microservices architecture...'
});

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'coding',
    hints: {
      quality: 'best',
      capabilities: {
        reasoning: true,
        codeGeneration: true
      }
    }
  }),
  prompt: 'Analyze this microservices architecture...'
});

Creative Writing

Balance capability and creativity:

WebLLM
Vercel AI SDK

import { generateText } from 'webllm';

const result = await generateText({
  task: 'creative',
  hints: { quality: 'high' },
  prompt: 'Write a short story about AI...',
  temperature: 0.9
});

import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({
    task: 'creative',
    hints: { quality: 'high' }
  }),
  prompt: 'Write a short story about AI...',
  temperature: 0.9
});

Streaming Responses

import { webllm } from 'webllm-ai-provider';
import { streamText } from 'ai';

const { textStream } = await streamText({
  model: webllm({
    task: 'creative',
    hints: { quality: 'best' }
  }),
  prompt: 'Write a story about robots...'
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
}

React Integration

Client Component

'use client';

import { useState } from 'react';
import { generateText } from 'ai';
import { webllm } from 'webllm-ai-provider';

export default function ChatPage() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  const handleGenerate = async () => {
    setIsLoading(true);

    const result = await generateText({
      model: webllm({ task: 'qa' }),
      prompt: 'Explain quantum computing simply'
    });

    setResponse(result.text);
    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={handleGenerate} disabled={isLoading}>
        {isLoading ? 'Generating...' : 'Generate'}
      </button>
      {response && <p>{response}</p>}
    </div>
  );
}

With useChat Hook

'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

API Route (Next.js App Router)

import { streamText } from 'ai';
import { webllm } from 'webllm-ai-provider';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: webllm({
      task: 'general',
      hints: { speed: 'balanced', quality: 'high' }
    }),
    messages,
  });

  return result.toDataStreamResponse();
}

Migration Guide

From Anthropic

Before:

import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});

After:

import { webllm } from 'webllm-ai-provider';

const result = await generateText({
  model: webllm({ task: 'general', hints: { quality: 'high' } }),
  prompt: 'Hello!',
  temperature: 0.7,
  maxTokens: 1000
});

From OpenAI