Download OrionPod
Currently available for macOS
v0.2.3-beta
Faye (Latest): Suggested Models and Onboarding
macOS
Universal (Apple Silicon + Intel) · ~24 MB
Changelog
Changes
- "Surprise Me" model discovery — one-click random suggestions from a curated list of small, high-quality models
- Curated model list: TinyLlama 1.1B, Qwen 2.5 (0.5B/1.5B/3B), Phi 3.5 Mini, Gemma 2 2B, StableLM 2 1.6B, SmolLM2 1.7B, Llama 3.2 (1B/3B)
- "Try Another" re-roll button — skips already-seen and already-downloaded models
- Inline download with progress tracking directly from the suggestion card
- HuggingFace search result caching (5-minute TTL) — fewer API calls, faster repeat searches
- Quantization variant badges on search result cards — see available quants at a glance
- Sort search results by downloads, likes, or recent activity
- Download pause/resume — pause active downloads, resume later (supports HTTP Range)
- Cancel download button with proper cleanup
- Disk space check before downloading — warns if there isn't enough free space
- Download complete toast with a "Load now?" action button
- Active download indicator in the footer bar
- First-run welcome wizard — guided setup: download a starter model, auto-load it, start chatting
- Partial download recovery — detects incomplete downloads after an app crash and offers to resume
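Pause/resume and partial-download recovery both boil down to asking the server for the remainder of the file via an HTTP `Range` header. A minimal Rust sketch of the resume-offset logic, assuming a `.gguf.part` partial file on disk; the helper name and error handling here are illustrative, not OrionPod's actual implementation:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Build the HTTP Range header value for resuming a partial download.
/// Returns None when there is nothing to resume (fresh download).
fn resume_range_header(part_path: &Path) -> io::Result<Option<String>> {
    match fs::metadata(part_path) {
        // Resume from the byte after the last one already on disk.
        Ok(meta) if meta.len() > 0 => Ok(Some(format!("bytes={}-", meta.len()))),
        Ok(_) => Ok(None), // empty partial file: restart from scratch
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let part = std::env::temp_dir().join("model.gguf.part");
    fs::write(&part, vec![0u8; 1024])?; // simulate 1 KiB already downloaded
    // A server that honors this header responds with 206 Partial Content
    // starting at byte 1024.
    assert_eq!(resume_range_header(&part)?, Some("bytes=1024-".to_string()));
    fs::remove_file(&part)?;
    println!("resume header ok");
    Ok(())
}
```

A server that ignores `Range` replies with `200 OK` and the full body, so a real downloader would also check the status code before appending to the partial file.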
For Geeks
- `src/lib/curatedModels.ts` — maintainable curated model list as a typed constant array
- `SurpriseCard` component with full download lifecycle (progress, pause/resume/cancel)
- `SearchCache` with TTL-based expiry in `HuggingFaceClient`
- `DownloadManagerState` with `DownloadHandle` for cancel/pause control
- `DownloadSidecar` metadata JSON written alongside downloaded GGUFs
- `available_disk_space()` using `sysinfo::Disks` for volume-aware space check
- `cancel_download`, `pause_download`, `resume_download`, `list_partial_downloads`, `get_available_disk_space` IPC commands
- Download resume via HTTP `Range` header with `.gguf.part` file detection
- `WelcomeWizard` component with a 4-step flow (welcome → download → loading → ready)
- `useDownloads` hook now tracks download completion transitions for toast notifications
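The `SearchCache` TTL behavior can be sketched with nothing but the standard library. The struct below is a simplified stand-in for the cache inside `HuggingFaceClient`; the field names and in-memory `HashMap` layout are assumptions, not the shipped code:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal TTL cache: entries expire after `ttl`; a miss (or expired hit)
/// tells the caller to query the HuggingFace API again.
struct SearchCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> SearchCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Fresh hit → Some(value); expired or absent → None.
    fn get(&self, query: &str) -> Option<V> {
        self.entries
            .get(query)
            .filter(|(stored_at, _)| stored_at.elapsed() < self.ttl)
            .map(|(_, v)| v.clone())
    }

    fn put(&mut self, query: &str, value: V) {
        self.entries.insert(query.to_string(), (Instant::now(), value));
    }
}

fn main() {
    let mut cache = SearchCache::new(Duration::from_secs(300)); // 5-minute TTL
    cache.put("tinyllama", vec!["TinyLlama-1.1B-Chat-GGUF".to_string()]);
    assert!(cache.get("tinyllama").is_some()); // fresh hit, no API call needed
    assert!(cache.get("qwen").is_none());      // miss: caller hits the API
    println!("cache ok");
}
```

Expired entries are simply skipped on read here; a production cache would also evict them to bound memory.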
v0.2.2-alpha
Ed: Your models, your rules
macOS
Universal (Apple Silicon + Intel) · ~24 MB
Changelog
Changes
- GGUF metadata extraction — model cards now show parameter count, context length, and architecture
- Runtime controls — functional thread count, temperature, and context length sliders in Settings
- Chat template auto-detection from GGUF metadata, with a manual override dropdown (ChatML, Llama 3, Mistral, Gemma, Phi-3, DeepSeek, etc.)
- Context overflow handling — the oldest messages are automatically pruned when a conversation exceeds the context window
- Model status events — real-time loading/ready/error/unloaded status via Tauri events
- Update notification toast with a download button when a new version is available
- Actionable toast notifications — toasts can now have clickable action buttons
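The context-overflow behavior can be illustrated with a small sketch: drop the oldest user/assistant pair until the history fits, never touching the system prompt. The chars-divided-by-4 token estimate and the function shape below are illustrative assumptions, not the real `truncate_to_fit` (which would use the model's tokenizer):

```rust
/// Crude token estimate: ~4 characters per token (rule of thumb only;
/// the real engine would count with the model tokenizer).
fn est_tokens(text: &str) -> usize {
    text.chars().count() / 4 + 1
}

/// Prune the oldest user/assistant pair until the history fits `max_tokens`.
/// Index 0 is assumed to hold the system prompt and is never removed;
/// the most recent exchange is always kept.
fn truncate_to_fit(history: &mut Vec<(String, String)>, max_tokens: usize) {
    let total = |h: &[(String, String)]| h.iter().map(|(_, t)| est_tokens(t)).sum::<usize>();
    while total(history.as_slice()) > max_tokens && history.len() > 3 {
        // Remove the oldest user + assistant pair after the system prompt.
        history.drain(1..=2);
    }
}

fn main() {
    let mut h: Vec<(String, String)> = vec![
        ("system".into(), "You are helpful.".into()),
        ("user".into(), "x".repeat(400)),
        ("assistant".into(), "y".repeat(400)),
        ("user".into(), "z".repeat(400)),
        ("assistant".into(), "w".repeat(400)),
    ];
    truncate_to_fit(&mut h, 250);
    assert_eq!(h.len(), 3);       // oldest pair pruned
    assert_eq!(h[0].0, "system"); // system prompt survives
    println!("pruned to {} messages", h.len());
}
```

Pruning whole pairs keeps the conversation well-formed for chat templates, which generally expect strictly alternating user/assistant turns.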
For Geeks
- `GgufModelInfo` struct with full GGUF header metadata (params, layers, heads, embedding dim, architecture, chat template)
- `InferenceEngine::format_prompt()` uses `apply_chat_template()` from llama.cpp with ChatML fallback
- `InferenceEngine::truncate_to_fit()` for pair-wise context pruning
- `model-status` Tauri event channel with `ModelStatusEvent` payload
- `AppConfig` extended with `chat_template` option for manual override
- `generate()` accepts configurable `n_threads` and `context_length` from config
- `ModelMetadata` enriched with `context_length`, `architecture`, `chat_template` fields (backward-compatible via `#[serde(default)]`)
- Auto-update check via `https://orionpod.com/api/latest.json` on app launch
- `check_for_updates` Rust IPC command with semver-aware version comparison
- `useUpdateCheck` hook (5s delayed, silent fail, non-blocking)
- `update-web-release.cjs` script for automated release metadata updates
- Changelog auto-extraction from `CHANGELOG.md` into `releases.js`
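A semver-aware comparison of the kind `check_for_updates` performs can be sketched as below. This simplified version ignores pre-release precedence (full SemVer ranks `0.2.3-beta` below `0.2.3`) and fails silently on unparseable strings; it is an illustration, not OrionPod's actual code:

```rust
/// Parse "MAJOR.MINOR.PATCH", tolerating a leading 'v' and stripping any
/// pre-release tag after '-' (a simplification relative to full SemVer).
fn parse_semver(v: &str) -> Option<(u64, u64, u64)> {
    let core = v.trim_start_matches('v').split('-').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

/// True only when `latest` is strictly newer than `current`.
fn update_available(current: &str, latest: &str) -> bool {
    match (parse_semver(current), parse_semver(latest)) {
        (Some(c), Some(l)) => l > c, // tuples compare lexicographically
        _ => false,                  // unparseable: fail silent, no prompt
    }
}

fn main() {
    assert!(update_available("0.2.2-alpha", "0.2.3"));
    assert!(!update_available("0.2.3-beta", "0.2.3")); // pre-release tag ignored here
    assert!(!update_available("0.3.0", "0.2.9"));
    println!("semver check ok");
}
```

Lexicographic tuple comparison gives the right ordering because major is compared before minor, and minor before patch.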
v0.2.1-rc1
Kusanagi: First public release. Metal GPU acceleration, HuggingFace model browser, real-time observability.
macOS
Universal (Apple Silicon + Intel) · ~30 MB
Changelog
Changes
- Chat interface with streaming responses and markdown rendering
- HuggingFace model browser with hardware compatibility filtering
- GGUF model support — download from HuggingFace or upload local files
- Real-time observability dashboard (tokens/s, memory, latency, GPU usage)
- Metal GPU acceleration on Apple Silicon
- Model parameter controls (temperature, context length, top-p, top-k)
- Toast notifications and user-friendly error messages
- Glassmorphism UI with macOS vibrancy
For Geeks
- Tauri v2 + React + TypeScript + Rust + llama.cpp
- orion-core agent harness crate (backend-agnostic)
- Universal macOS binary (Apple Silicon + Intel), ~30 MB
- Starts in under 2 seconds, <50 MB RAM idle
- Zero telemetry, zero analytics, zero cloud dependencies
System Requirements
- ✓ macOS 10.15 (Catalina) or later
- ✓ Apple Silicon (M-series) recommended for Metal GPU acceleration
- ✓ 8 GB RAM minimum for 7B models
Open the DMG → drag OrionPod to Applications → launch. That's it.