TinyWhale logo

On-device AI for every platform.

Run open-source LLMs in your browser, on your phone, and on your desktop — no cloud required. Your data never leaves your device. Free, private, and fast.

One Monorepo, Four Platforms

Runs everywhere your users are.

The same on-device AI experience — adapted to each platform's strengths. WebGPU for browsers, Metal for iOS, Rust for desktops.

Browser Extension screenshot

Browser Extension

AI sidebar for Chrome, Firefox, and Safari. Powered by Transformers.js and WebGPU — inference runs directly in the extension's service worker.

Plasmo · Transformers.js · ONNX · WebGPU

Chrome Web Store
Web App screenshot

Web App

Chat with LLMs right in your browser tab. A Web Worker keeps the UI responsive while the model runs on WebGPU.

Next.js · Transformers.js · ONNX · WebGPU

Try Demo
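The worker hand-off described above boils down to a streaming message protocol: the UI thread posts a prompt, and the worker posts tokens back as they are generated. A minimal sketch of that pattern — the message shapes, handler name, and placeholder "model" here are illustrative assumptions, not TinyWhale's actual API:

```typescript
// Sketch of a worker-side handler streaming tokens to the UI thread.
// NOTE: message shapes and the echo "model" are illustrative assumptions.
type WorkerRequest = { type: "generate"; prompt: string };
type WorkerResponse = { type: "token"; text: string } | { type: "done" };

function handleRequest(
  req: WorkerRequest,
  post: (msg: WorkerResponse) => void, // stands in for self.postMessage
): void {
  // Placeholder model: echo the prompt one word at a time, like token streaming.
  for (const word of req.prompt.split(/\s+/)) {
    post({ type: "token", text: word });
  }
  post({ type: "done" });
}

// UI side: accumulate streamed tokens; the main thread never blocks on the model.
const received: string[] = [];
handleRequest({ type: "generate", prompt: "hello local llm" }, (msg) => {
  if (msg.type === "token") received.push(msg.text);
});
// received is now ["hello", "local", "llm"]
```

Because only small serializable messages cross the boundary, the UI can render each token as it arrives while the heavy WebGPU work stays off the main thread.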
Mobile screenshot

Mobile

On-device LLMs on iOS and Android. Native GPU acceleration via Metal (iOS) and OpenCL (Android) through llama.cpp bindings.

Expo · llama.rn · GGUF · Metal

View on GitHub
Desktop screenshot

Desktop

Lightweight native app with a Rust backend. Loads GGUF models from Hugging Face and runs inference via llama.cpp.

Tauri · Rust · llama.cpp · GGUF

View on GitHub
Private by Design

Your data never leaves your device.

All AI inference runs locally — in the browser via WebGPU, on your phone via Metal, or on your desktop via llama.cpp. No servers, no API calls, no telemetry.

1
100% Local Inference
Models run entirely on your device. No data is transmitted anywhere.
2
Cached for Speed
Model files are cached locally. Subsequent launches load in seconds.
3
Works Offline
Once the model is downloaded, everything works without internet.
0 bytes
sent to any server
Text summarization demo showing the Chrome extension summarizing a research paper
Image captioning demo showing the extension describing a Grand Canyon photo
Vision + Text

Understand images and text together.

Multimodal models process images and text locally on your device. Upload a photo and ask questions — no cloud needed.

1
Image Understanding
Describe photos, read handwriting, analyze charts and diagrams.
2
Multi-Image Support
Upload multiple images in a single conversation for comparison and analysis.
3
Drag & Drop Upload
Simply drag images into the chat or use the file picker.
0.8B params
compact yet capable
Full Control

Tune the model to your needs.

Adjust temperature, top-p, top-k, repetition penalty, and more. A local AI playground with fine-grained control over generation behavior.

1
Generation Settings
Temperature, top-p, top-k, min-p, repetition penalty — all adjustable in real-time.
2
Real-Time Metrics
See tokens per second, time-to-first-token, and total generation time.
3
Stop & Reset
Interrupt generation at any time. Clear the conversation and start fresh.
~40 tok/s
on modern laptops
Code generation demo showing the extension writing Python code with generation settings
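The knobs above map onto standard sampling math: temperature rescales the logits before softmax, and top-p (nucleus) filtering keeps only the smallest set of tokens whose cumulative probability reaches the threshold. A minimal sketch of those two steps — function names and default values are assumptions for illustration, not TinyWhale's implementation:

```typescript
// Temperature scaling + nucleus (top-p) filtering over raw logits.
// Illustrative sketch; names and defaults are assumptions, not TinyWhale's code.
function softmax(logits: number[], temperature = 1.0): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the smallest set of tokens whose cumulative probability >= topP,
// zero out the rest, and renormalize.
function topPFilter(probs: number[], topP = 0.9): number[] {
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const kept = new Set<number>();
  let cum = 0;
  for (const i of order) {
    kept.add(i);
    cum += probs[i];
    if (cum >= topP) break;
  }
  const filtered = probs.map((p, i) => (kept.has(i) ? p : 0));
  const sum = filtered.reduce((a, b) => a + b, 0);
  return filtered.map((p) => p / sum);
}

// Low temperature sharpens the distribution; top-p then drops the long tail.
const probs = topPFilter(softmax([2.0, 1.0, 0.1, -1.0], 0.7), 0.9);
```

Lowering temperature below 1 makes the highest-logit token dominate; raising top-p admits more of the tail. In this example the two lowest-probability tokens fall outside the nucleus and end up with probability zero.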
4
Platforms Supported
~40
Tokens per Second
0
Data Sent to Servers
100%
Local & Private

Ready to dive in?

No sign-up required. No data collected. Pick your platform, load a model, and start chatting.