TinyWhale logo

On-device AI for every platform.

Run open-source LLMs in your browser, on your phone, and on your desktop — no cloud required. Your data never leaves your device. Free, private, and fast.

One Monorepo, Four Platforms

Runs everywhere your users are.

The same on-device AI experience — adapted to each platform's strengths. WebGPU for browsers, Metal for iOS, Rust for desktops.

Browser Extension screenshot

Browser Extension

AI sidebar for Chrome, Firefox, and Safari. Powered by Transformers.js and WebGPU — inference runs directly in the extension's service worker.

Plasmo · Transformers.js · ONNX · WebGPU

Chrome Web Store
Web App screenshot

Web App

Chat with LLMs right in your browser tab. A Web Worker keeps the UI responsive while the model runs on WebGPU.

Next.js · Transformers.js · ONNX · WebGPU

Try Demo
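The worker hand-off described above boils down to a streaming message protocol: the UI thread posts a prompt, and the worker posts tokens back as they are generated. A minimal sketch of that pattern — the message shapes, handler name, and placeholder "model" here are illustrative assumptions, not TinyWhale's actual API:

```typescript
// Sketch of a worker-side handler streaming tokens to the UI thread.
// NOTE: message shapes and the echo "model" are illustrative assumptions.
type WorkerRequest = { type: "generate"; prompt: string };
type WorkerResponse = { type: "token"; text: string } | { type: "done" };

function handleRequest(
  req: WorkerRequest,
  post: (msg: WorkerResponse) => void, // stands in for self.postMessage
): void {
  // Placeholder model: echo the prompt one word at a time, like token streaming.
  for (const word of req.prompt.split(/\s+/)) {
    post({ type: "token", text: word });
  }
  post({ type: "done" });
}

// UI side: accumulate streamed tokens; the main thread never blocks on the model.
const received: string[] = [];
handleRequest({ type: "generate", prompt: "hello local llm" }, (msg) => {
  if (msg.type === "token") received.push(msg.text);
});
// received is now ["hello", "local", "llm"]
```

Because only small serializable messages cross the boundary, the UI can render each token as it arrives while the heavy WebGPU work stays off the main thread.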
Mobile screenshot

Mobile

On-device LLMs on iOS and Android. Native GPU acceleration via Metal (iOS) and OpenCL (Android) through llama.cpp bindings.

Expo · llama.rn · GGUF · Metal

View on GitHub
Desktop screenshot

Desktop

Lightweight native app with a Rust backend. Loads GGUF models from Hugging Face and runs inference via llama.cpp.

Tauri · Rust · llama.cpp · GGUF

View on GitHub
Private by Design

Your data never leaves your device.

All AI inference runs locally — in the browser via WebGPU, on your phone via Metal, or on your desktop via llama.cpp. No servers, no API calls, no telemetry.

1
100% Local Inference
Models run entirely on your device. No data is transmitted anywhere.
2
Cached for Speed
Model files are cached locally. Subsequent launches load in seconds.
3
Works Offline
Once the model is downloaded, everything works without internet.
0 bytes
sent to any server
Text summarization demo showing the Chrome extension summarizing a research paper
Image captioning demo showing the extension describing a Grand Canyon photo
Vision + Text

Understand images and text together.

Multimodal models process images and text locally on your device. Upload a photo and ask questions — no cloud needed.

1
Image Understanding
Describe photos, read handwriting, analyze charts and diagrams.
2
Multi-Image Support
Upload multiple images in a single conversation for comparison and analysis.
3
Drag & Drop Upload
Simply drag images into the chat or use the file picker.
0.8B params
compact yet capable
Full Control

Tune the model to your needs.

Adjust temperature, top-p, top-k, repetition penalty, and more. A local AI playground with fine-grained control over generation behavior.

1
Generation Settings
Temperature, top-p, top-k, min-p, repetition penalty — all adjustable in real-time.
2
Real-Time Metrics
See tokens per second, time-to-first-token, and total generation time.
3
Stop & Reset
Interrupt generation at any time. Clear the conversation and start fresh.
~40 tok/s
on modern laptops
Code generation demo showing the extension writing Python code with generation settings
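The knobs above map onto standard sampling math: temperature rescales the logits before softmax, and top-p (nucleus) filtering keeps only the smallest set of tokens whose cumulative probability reaches the threshold. A minimal sketch of those two steps — function names and default values are assumptions for illustration, not TinyWhale's implementation:

```typescript
// Temperature scaling + nucleus (top-p) filtering over raw logits.
// Illustrative sketch; names and defaults are assumptions, not TinyWhale's code.
function softmax(logits: number[], temperature = 1.0): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the smallest set of tokens whose cumulative probability >= topP,
// zero out the rest, and renormalize.
function topPFilter(probs: number[], topP = 0.9): number[] {
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const kept = new Set<number>();
  let cum = 0;
  for (const i of order) {
    kept.add(i);
    cum += probs[i];
    if (cum >= topP) break;
  }
  const filtered = probs.map((p, i) => (kept.has(i) ? p : 0));
  const sum = filtered.reduce((a, b) => a + b, 0);
  return filtered.map((p) => p / sum);
}

// Low temperature sharpens the distribution; top-p then drops the long tail.
const probs = topPFilter(softmax([2.0, 1.0, 0.1, -1.0], 0.7), 0.9);
```

Lowering temperature below 1 makes the highest-logit token dominate; raising top-p admits more of the tail. In this example the two lowest-probability tokens fall outside the nucleus and end up with probability zero.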
4
Platforms Supported
~40
Tokens per Second
0
Data Sent to Servers
100%
Local & Private

Ready to dive in?

No sign-up required. No data collected. Pick your platform, load a model, and start chatting.