Running Phi-3 Locally in the Browser: A WebGPU Revolution
The landscape of Artificial Intelligence is shifting from cloud-heavy APIs to on-device execution. With the release of Transformers.js v3, developers can now run powerful models like Microsoft’s Phi-3 Mini directly in the browser. By leveraging WebGPU, we achieve near-native performance without the latency or privacy concerns of traditional server-side inference.
Real-World Application: AI-Powered PDF Analysis
The transition to on-device AI isn't just theoretical. Modern platforms are already integrating these capabilities to enhance user experience. For instance, the AI Assistant on pdfaigen.com demonstrates how local AI can be used to interact with documents, providing instant summaries and insights directly within the interface.
Why Phi-3 with WebGPU?
No Network Latency: Requests never make a round trip to a server; generation is bounded only by local hardware.
Privacy by Design: User data never leaves the local machine.
Cost Efficiency: Eliminate expensive GPU cloud hosting costs.
Hardware Acceleration: WebGPU provides direct access to the user's graphics card for lightning-fast token generation (see the detection sketch below).
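Because WebGPU is not yet universally supported, it is worth probing for it before building the pipeline. Below is a minimal detection sketch using the standard navigator.gpu API; pickDevice is a hypothetical helper name, and 'wasm' is the WebAssembly (CPU) fallback backend in Transformers.js.

// Minimal WebGPU feature detection with a WASM fallback.
// pickDevice is a hypothetical helper; pass its result as the `device` option.
async function pickDevice() {
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu'; // a usable GPU adapter was found
  }
  return 'wasm'; // fall back to the WebAssembly (CPU) backend
}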
Implementation: The Code Snippet
To integrate Phi-3 into your web application, use the quantized ONNX build of the model. The snippet below uses @huggingface/transformers, the package name under which Transformers.js v3 is published (the older @xenova/transformers package predates WebGPU support).
import { pipeline } from '@huggingface/transformers';

/**
 * Initialize and run Phi-3 Mini 4K Instruct via WebGPU.
 * Model: Xenova/phi-3-mini-4k-instruct-onnx
 */
async function runPhi3Demo(prompt) {
  // 1. Create a text-generation pipeline with WebGPU acceleration
  const generator = await pipeline('text-generation', 'Xenova/phi-3-mini-4k-instruct-onnx', {
    device: 'webgpu', // run inference on the local GPU
  });

  // 2. Set up the chat message format
  const messages = [
    { role: 'system', content: 'You are an expert AI assistant.' },
    { role: 'user', content: prompt },
  ];

  // 3. Generate a response
  const output = await generator(messages, {
    max_new_tokens: 256,
    temperature: 0.6,
    do_sample: true,
  });

  // With chat input, generated_text holds the whole conversation;
  // the assistant's reply is the last message.
  return output[0].generated_text.at(-1).content;
}
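A minimal usage sketch, assuming the function above lives in an ES module and the page contains a #result element (both assumptions, not part of the snippet above):

// Example invocation (top-level await works inside an ES module)
const reply = await runPhi3Demo('Explain WebGPU in one paragraph.');
document.querySelector('#result').textContent = reply;

Note that the first call downloads and caches the model weights, so expect a significant delay before the first token appears.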
Technical Requirements
Note: WebGPU support in Transformers.js is still experimental, so ensure your environment meets the following:
Browser: Chrome 113+, Edge 113+, or Firefox Nightly.
Model Size: The Phi-3 Mini ONNX weights are approximately 2.3GB. Ensure your application handles loading states gracefully (a progress-callback sketch follows this list).
VRAM: A minimum of 4GB VRAM is recommended for a smooth experience.
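To handle the 2.3GB download gracefully, Transformers.js accepts a progress_callback option on pipeline(). The sketch below is a minimal example of wiring it up; updateLoadingBar is a hypothetical stand-in for your own UI code.

import { pipeline } from '@huggingface/transformers';

// Hypothetical UI hook: replace with your own progress-bar logic.
function updateLoadingBar(file, pct) {
  console.log(`Downloading ${file}: ${pct.toFixed(1)}%`);
}

async function loadWithProgress() {
  return pipeline('text-generation', 'Xenova/phi-3-mini-4k-instruct-onnx', {
    device: 'webgpu',
    progress_callback: (info) => {
      // Each event reports per-file status; 'progress' events carry a 0-100 percentage.
      if (info.status === 'progress') {
        updateLoadingBar(info.file, info.progress);
      }
    },
  });
}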
Staksoft - Leading the way in on-device AI integration and smart document solutions.