Running Phi-3 Locally in the Browser: A WebGPU Revolution
The landscape of Artificial Intelligence is shifting from cloud-heavy APIs to on-device execution. With the release of Transformers.js v3, developers can now run powerful models like Microsoft’s Phi-3 Mini directly in the browser. By leveraging WebGPU, we achieve near-native performance without the latency or privacy concerns of traditional server-side inference.
Real-World Application: AI-Powered PDF Analysis
The transition to on-device AI isn't just theoretical. Modern platforms are already integrating these capabilities to enhance user experience. For instance, the AI Assistant on pdfaigen.com demonstrates how local AI can be used to interact with documents, providing instant summaries and insights directly within the interface.
Why Phi-3 with WebGPU?
No Network Latency: Requests never make a round trip to a server; generation is bounded only by local hardware.
Privacy by Design: User data never leaves the local machine.
Cost Efficiency: Eliminate expensive GPU cloud hosting costs.
Hardware Acceleration: WebGPU provides direct access to the user's graphics card for lightning-fast token generation (see the detection sketch below).
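Because WebGPU is not yet universally supported, it is worth probing for it before building the pipeline. Below is a minimal detection sketch using the standard navigator.gpu API; pickDevice is a hypothetical helper name, and 'wasm' is the WebAssembly (CPU) fallback backend in Transformers.js.

// Minimal WebGPU feature detection with a WASM fallback.
// pickDevice is a hypothetical helper; pass its result as the `device` option.
async function pickDevice() {
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu'; // a usable GPU adapter was found
  }
  return 'wasm'; // fall back to the WebAssembly (CPU) backend
}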
Implementation: The Code Snippet
To integrate Phi-3 into your web application, use the quantized ONNX build of the model. The snippet below uses @huggingface/transformers, the package name under which Transformers.js v3 is published (the older @xenova/transformers package predates WebGPU support).
import { pipeline } from '@huggingface/transformers';

/**
 * Initialize and run Phi-3 Mini 4K Instruct via WebGPU.
 * Model: Xenova/phi-3-mini-4k-instruct-onnx
 */
async function runPhi3Demo(prompt) {
  // 1. Create a text-generation pipeline with WebGPU acceleration
  const generator = await pipeline('text-generation', 'Xenova/phi-3-mini-4k-instruct-onnx', {
    device: 'webgpu', // run inference on the local GPU
  });

  // 2. Set up the chat message format
  const messages = [
    { role: 'system', content: 'You are an expert AI assistant.' },
    { role: 'user', content: prompt },
  ];

  // 3. Generate a response
  const output = await generator(messages, {
    max_new_tokens: 256,
    temperature: 0.6,
    do_sample: true,
  });

  // With chat input, generated_text holds the whole conversation;
  // the assistant's reply is the last message.
  return output[0].generated_text.at(-1).content;
}
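A minimal usage sketch, assuming the function above lives in an ES module and the page contains a #result element (both assumptions, not part of the snippet above):

// Example invocation (top-level await works inside an ES module)
const reply = await runPhi3Demo('Explain WebGPU in one paragraph.');
document.querySelector('#result').textContent = reply;

Note that the first call downloads and caches the model weights, so expect a significant delay before the first token appears.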
Technical Requirements
Note: WebGPU support in Transformers.js is still experimental, so ensure your environment meets the following:
Browser: Chrome 113+, Edge 113+, or Firefox Nightly.
Model Size: The Phi-3 Mini ONNX weights are approximately 2.3GB. Ensure your application handles loading states gracefully (a progress-callback sketch follows this list).
VRAM: A minimum of 4GB VRAM is recommended for a smooth experience.
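To handle the 2.3GB download gracefully, Transformers.js accepts a progress_callback option on pipeline(). The sketch below is a minimal example of wiring it up; updateLoadingBar is a hypothetical stand-in for your own UI code.

import { pipeline } from '@huggingface/transformers';

// Hypothetical UI hook: replace with your own progress-bar logic.
function updateLoadingBar(file, pct) {
  console.log(`Downloading ${file}: ${pct.toFixed(1)}%`);
}

async function loadWithProgress() {
  return pipeline('text-generation', 'Xenova/phi-3-mini-4k-instruct-onnx', {
    device: 'webgpu',
    progress_callback: (info) => {
      // Each event reports per-file status; 'progress' events carry a 0-100 percentage.
      if (info.status === 'progress') {
        updateLoadingBar(info.file, info.progress);
      }
    },
  });
}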
Staksoft - Leading the way in on-device AI integration and smart document solutions.