Download OrionPod

Currently available for macOS

v0.4.0-beta

Yuki (Latest)
2026-06-05

Realigning with a focus on UX

macOS

Universal (Apple Silicon + Intel) · ~25 MB

OrionPod_0.4.0_universal.dmg
Changelog

Changes

  • Faster multi-turn chat — the model's context (KV cache) is now reused across turns instead of rebuilt on every message, lowering per-message latency and removing the redundant Metal warm-up after the first response
  • Observability now reports the real time-to-first-token per request (previously an estimate derived from tokens/sec)
  • Context Window card on Observability page — gauge ring showing token usage ratio, used/max/available counts, messages in context, and pruned message count
  • Keyboard shortcuts: `⌘N` new chat, `⌘K` quick model switcher, `⌘,` open settings, `Esc` stop generation
  • Quick model switcher — command palette (`⌘K`) with search, keyboard navigation, load/unload
  • Dynamic window title — shows "OrionPod — ModelName" when a model is loaded
  • Window size and position remembered across launches
  • Skeleton loader for model list (replaces plain text spinner)
  • Smooth animations: message appearance, sidebar active indicator, modal transitions
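
To illustrate the time-to-first-token change listed above, here is a minimal Rust sketch contrasting the old tokens/sec-derived estimate with a directly measured value. It is not OrionPod's code: `estimated_ttft_ms` and `TtftTimer` are hypothetical names, and the `1000 / tps` formula is the approximation these release notes attribute to the previous behaviour.

```rust
use std::time::{Duration, Instant};

/// Old-style estimate (assumed form): average milliseconds per token.
/// It ignores prompt processing, so it can understate the real TTFT.
fn estimated_ttft_ms(tokens_per_sec: f64) -> Option<f64> {
    if tokens_per_sec > 0.0 {
        Some(1000.0 / tokens_per_sec)
    } else {
        None
    }
}

/// Hypothetical timer: started when a request is submitted,
/// stopped when the first token arrives.
struct TtftTimer {
    started: Instant,
    first_token: Option<Duration>,
}

impl TtftTimer {
    fn start() -> Self {
        Self { started: Instant::now(), first_token: None }
    }

    /// Call once per generated token; only the first call records a value.
    fn on_token(&mut self) {
        if self.first_token.is_none() {
            self.first_token = Some(self.started.elapsed());
        }
    }

    /// Measured time-to-first-token, if a token has been seen.
    fn ttft(&self) -> Option<Duration> {
        self.first_token
    }
}
```

A caller would create the timer just before submitting the prompt and call `on_token()` from the streaming callback; `ttft()` then holds the measured latency for that request.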

For Geeks

  • Persistent inference context — `InferenceEngine` keeps one `llama_context` per loaded model (created lazily, reused across turns; recreated only when context size or thread count changes). `generate()` is now `&mut self`
  • Incremental KV cache — each new prompt is reconciled against the tokens resident in the cache via longest-common-prefix; stale entries past the shared prefix are cleared (`clear_kv_cache_seq`), only the diverging suffix is decoded, and generated tokens are appended for reuse on the next turn
  • KV cache reconciles with orion-core pruning automatically (truncates at the divergence point, system-prompt prefix retained) and resets on `clear_conversation`, model switch/unload, and context overflow via `InferenceEngine::reset_context()`
  • Metal compute pipelines now compile once per loaded model instead of once per turn
  • `AgentEvent::GenerationStats` — new event carrying real tokens generated, tokens/sec, time-to-first-token, and generation time; `MetricsCollector` records actual TTFT instead of approximating it as `1000/tps`
  • Frontend `AgentEvent` union extended with the `generation_stats` variant
  • `tauri-plugin-window-state` added for persistent window geometry
  • `useKeyboardShortcuts` hook for global shortcut handling
  • `ChatWindow` converted to `forwardRef` to expose `clearChat()` for programmatic reset
  • Context budget metrics now recorded to `MetricsCollector` — wired `ContextBudget` agent event data through to session metrics
  • `SessionMetrics` TypeScript type extended with `context_used_tokens`, `context_max_tokens`, `context_messages_in_context`, `context_messages_pruned`
  • `ChatTemplate` trait in orion-core — pluggable prompt formatting with `format()`, `format_system()`, `format_message()`, `assistant_prefix()`
  • `ChatMLTemplate` default implementation
  • `detect_template()` — auto-selects template from GGUF metadata string, falls back to ChatML
  • Pair-wise context pruning — user+assistant turns are pruned as units, so a question or answer is never orphaned
  • Template-aware token accounting — budget counts template overhead (`<|im_start|>`, `<|im_end|>`, etc.) per message, not just raw content
  • System prompt + tool schema tokens deducted from context budget before conversation pruning
  • `CoreError::Context` on overflow — clear error when system prompt or latest message exceeds context budget
  • `prepare_context()` replaces separate `prune_messages()` + `format_chatml()` — single function for prune, format, and budget accounting
  • `Agent::with_template()` constructor and `set_template()` for runtime template switching
  • 17 new tests in `orion-core/tests/context_tests.rs` (pair-wise pruning, overflow errors, template overhead, tool budget, ChatML formatting, detect_template)
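
The longest-common-prefix reconciliation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the `InferenceEngine` implementation: `CacheState` is a hypothetical stand-in that tracks only token IDs, whereas a real engine would also clear the matching KV cells in the `llama_context`.

```rust
/// Number of leading tokens shared by the cached sequence and the new prompt.
fn common_prefix_len(cached: &[i32], prompt: &[i32]) -> usize {
    cached.iter().zip(prompt).take_while(|(a, b)| a == b).count()
}

/// Hypothetical bookkeeping kept next to a KV cache: the tokens it holds.
struct CacheState {
    tokens: Vec<i32>,
}

impl CacheState {
    /// Truncates the record at the divergence point and returns the suffix
    /// of `prompt` that still has to be decoded.
    fn reconcile<'a>(&mut self, prompt: &'a [i32]) -> &'a [i32] {
        let keep = common_prefix_len(&self.tokens, prompt);
        // A real engine would also drop the KV cells past `keep` here.
        self.tokens.truncate(keep);
        self.tokens.extend_from_slice(&prompt[keep..]);
        &prompt[keep..]
    }

    /// Generated tokens are appended so the next turn can reuse them.
    fn append_generated(&mut self, generated: &[i32]) {
        self.tokens.extend_from_slice(generated);
    }
}
```

With a cache holding `[1, 2, 3, 4]` and a new prompt `[1, 2, 9, 10]`, only `[9, 10]` is returned for decoding. When the prompt is already fully cached the returned suffix is empty, and a real engine may need to re-decode at least the last token to obtain fresh logits.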

v0.3.0-alpha

Daru
2026-03-24

Hello World, orion-core: a built-in agent harness

macOS

Universal (Apple Silicon + Intel) · ~24 MB

OrionPod_0.3.0_universal.dmg
Changelog

Changes

  • Structured agent event system — chat now receives rich lifecycle events (start, delta, end, error, warning) instead of raw token strings
  • Token budget bar in chat toolbar — shows context window usage (e.g. "1,200 / 4,096 tokens") with visual warning when >80% full
  • Discord community link added to About modal
  • System prompt support — backend accepts custom system prompts via `set_system_prompt` command
  • Inference parameter tuning — `set_inference_params` command for runtime temperature, context size, and thread count updates
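
The token budget bar's label and warning threshold could be computed roughly as below. This is an illustrative Rust sketch only (the actual bar lives in the frontend), and `with_commas` and `budget_label` are hypothetical helper names.

```rust
/// Formats an integer with comma thousands separators, e.g. 1200 -> "1,200".
fn with_commas(n: u32) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma whenever the remaining digit count is a multiple of 3.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Returns the label text and whether the warning style should apply
/// (usage strictly above 80% of the context window).
fn budget_label(used: u32, max: u32) -> (String, bool) {
    let ratio = if max == 0 { 0.0 } else { used as f64 / max as f64 };
    let label = format!("{} / {} tokens", with_commas(used), with_commas(max));
    (label, ratio > 0.8)
}
```

For example, `budget_label(1200, 4096)` yields the label from the bullet above with the warning flag unset.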

For Geeks

  • `orion-core` integration: `LlamaCppBackend` implements `LlmBackend` trait, wrapping `InferenceEngine` for backend-agnostic agent loop
  • `AgentState` replaces `InferenceState` — all inference commands route through `orion_core::Agent` for conversation state, context pruning, and ChatML prompt formatting
  • `agent-event` Tauri event channel replaces `token-stream` — emits all 13 `AgentEvent` variants (`agent_start`, `message_delta`, `message_end`, `context_budget`, `error`, etc.)
  • `set_system_prompt` and `set_inference_params` IPC commands
  • `InferenceParams` extended with `n_threads` field (defaults to `available_parallelism - 2`)
  • Removed `inference/streaming.rs` (`TokenEvent`, `ChatMessage`, `format_chatml` superseded by orion-core types)
  • Removed dead `format_prompt` and `truncate_to_fit` from `InferenceEngine` (context pipeline now in orion-core)
  • Frontend: `AgentEvent` discriminated union type (13 variants), `AgentMessage`, `ToolCall`, `ToolResultData` types added to `lib/types.ts`
  • Frontend: `setSystemPrompt()` and `setInferenceParams()` IPC wrappers in `lib/tauri.ts`
  • Frontend: `ChatWindow.tsx` migrated from `token-stream` listener to `agent-event` with full event handling
  • Zero compiler warnings (Rust), zero TypeScript errors
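
The `n_threads` default of `available_parallelism - 2` might be derived along these lines. This is a sketch rather than the actual `InferenceParams` code: the function names are hypothetical, and clamping to at least one thread is an assumption added so that machines with two or fewer cores still get a usable value.

```rust
use std::thread;

/// Leaves `reserve` cores free, but never returns fewer than one thread.
fn default_n_threads(available: usize, reserve: usize) -> usize {
    available.saturating_sub(reserve).max(1)
}

/// Queries the OS; falls back to a single core if the count is unavailable.
fn detect_n_threads() -> usize {
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    default_n_threads(available, 2)
}
```

Using `saturating_sub` avoids an unsigned underflow when fewer than two cores are reported.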

v0.2.3-beta

Faye
2026-03-21

Suggested Models and Onboarding

macOS

Universal (Apple Silicon + Intel) · ~24 MB

OrionPod_0.2.3_universal.dmg
Changelog

Changes

  • "Surprise Me" model discovery — one-click random suggestions from a curated list of small, high-quality models
  • Curated model list: TinyLlama 1.1B, Qwen 2.5 (0.5B/1.5B/3B), Phi 3.5 Mini, Gemma 2 2B, StableLM 2 1.6B, SmolLM2 1.7B, Llama 3.2 (1B/3B)
  • "Try Another" re-roll button — skips already-seen and already-downloaded models
  • Inline download with progress tracking directly from the suggestion card
  • HuggingFace search result caching (5-minute TTL) — fewer API calls, faster repeat searches
  • Quantization variant badges on search result cards — see available quants at a glance
  • Sort search results by downloads, likes, or recent activity
  • Download pause/resume — pause active downloads, resume later (supports HTTP Range)
  • Cancel download button with proper cleanup
  • Disk space check before downloading — warns if insufficient space
  • Download complete toast with "Load now?" action button
  • Active download indicator in footer bar
  • First-run welcome wizard — guided setup: download a starter model, auto-load, start chatting
  • Partial download recovery — detects incomplete downloads after app crash and offers resume
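
Resuming a paused or interrupted download over HTTP `Range` comes down to asking the server for the bytes that are not yet on disk. The sketch below shows that calculation in Rust. The function names are hypothetical and no HTTP client is involved; the header format and the 206 status code follow standard HTTP range-request semantics.

```rust
/// Builds the value of an HTTP `Range` request header asking for everything
/// from `bytes_on_disk` to the end of the file.
fn resume_range_header(bytes_on_disk: u64) -> Option<String> {
    if bytes_on_disk == 0 {
        None // nothing saved yet: request the whole file
    } else {
        Some(format!("bytes={}-", bytes_on_disk))
    }
}

/// Appending to the partial file is only safe if the server honoured the
/// range (206 Partial Content); a 200 means the full body is being resent.
fn should_append(status: u16) -> bool {
    status == 206
}
```

If the server answers with 200 instead of 206, the partial file should be overwritten from the start rather than appended to.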

For Geeks

  • `src/lib/curatedModels.ts` — maintainable curated model list as a typed constant array
  • `SurpriseCard` component with full download lifecycle (progress, pause/resume/cancel)
  • `SearchCache` with TTL-based expiry in `HuggingFaceClient`
  • `DownloadManagerState` with `DownloadHandle` for cancel/pause control
  • `DownloadSidecar` metadata JSON written alongside downloaded GGUFs
  • `available_disk_space()` using `sysinfo::Disks` for volume-aware space check
  • `cancel_download`, `pause_download`, `resume_download`, `list_partial_downloads`, `get_available_disk_space` IPC commands
  • Download resume via HTTP `Range` header with `.gguf.part` file detection
  • `WelcomeWizard` component with 4-step flow (welcome → download → loading → ready)
  • `useDownloads` hook now tracks download completion transitions for toast notifications
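
A TTL-based cache like the `SearchCache` mentioned above can be sketched in a few lines of Rust. `TtlCache` is a hypothetical, simplified stand-in keyed by query string. The real type's fields and methods are not documented here, so treat this as an illustration of the technique only.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache keyed by search query.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores a value together with the time it was inserted.
    fn insert(&mut self, key: &str, value: V) {
        self.entries.insert(key.to_string(), (Instant::now(), value));
    }

    /// Returns a cached value if it is younger than the TTL;
    /// expired entries are evicted on access.
    fn get(&mut self, key: &str) -> Option<V> {
        let fresh = match self.entries.get(key) {
            Some((stored_at, _)) => stored_at.elapsed() < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(key).map(|(_, v)| v.clone())
        } else {
            self.entries.remove(key);
            None
        }
    }
}
```

A 5-minute TTL would be created with `TtlCache::new(Duration::from_secs(300))`; a repeat search inside that window is served from memory instead of calling the API again.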

v0.2.2-alpha

Ed
2026-03-19

Your models, your rules

macOS

Universal (Apple Silicon + Intel) · ~24 MB

OrionPod_0.2.2_universal.dmg
Changelog

Changes

  • GGUF metadata extraction — model cards now show parameter count, context length, and architecture
  • Runtime controls — functional thread count, temperature, and context length sliders in Settings
  • Chat template auto-detection from GGUF metadata with manual override dropdown (ChatML, Llama 3, Mistral, Gemma, Phi-3, DeepSeek, etc.)
  • Context overflow handling — oldest messages automatically pruned when conversation exceeds context window
  • Model status events — real-time loading/ready/error/unloaded status via Tauri events
  • Update notification toast with download button when a new version is available
  • Actionable toast notifications (toasts can now have clickable action buttons)

For Geeks

  • `GgufModelInfo` struct with full GGUF header metadata (params, layers, heads, embedding dim, architecture, chat template)
  • `InferenceEngine::format_prompt()` uses `apply_chat_template()` from llama.cpp with ChatML fallback
  • `InferenceEngine::truncate_to_fit()` for pair-wise context pruning
  • `model-status` Tauri event channel with `ModelStatusEvent` payload
  • `AppConfig` extended with `chat_template` option for manual override
  • `generate()` accepts configurable `n_threads` and `context_length` from config
  • `ModelMetadata` enriched with `context_length`, `architecture`, `chat_template` fields (backward-compatible via `#[serde(default)]`)
  • Auto-update check via `https://orionpod.com/api/latest.json` on app launch
  • `check_for_updates` Rust IPC command with semver-aware version comparison
  • `useUpdateCheck` hook (runs after a 5 s delay, fails silently, non-blocking)
  • `update-web-release.cjs` script for automated release metadata updates
  • Changelog auto-extraction from `CHANGELOG.md` into `releases.js`
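
Pair-wise context pruning of the kind `truncate_to_fit()` is described as performing can be sketched as below. This is an assumption-laden illustration, not the shipped function: `Msg` and `prune_pairs` are hypothetical, token counts are taken as precomputed, and the history is assumed to alternate user and assistant messages.

```rust
/// Hypothetical message with a precomputed token count.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: &'static str, // "user" or "assistant"
    tokens: usize,
}

/// Drops the oldest user+assistant pairs until the history fits `budget`.
/// Assumes `history` alternates user, assistant, user, ... and may end
/// with a user message that is still waiting for its answer.
fn prune_pairs(history: &[Msg], budget: usize) -> Vec<Msg> {
    let mut start = 0;
    let mut total: usize = history.iter().map(|m| m.tokens).sum();
    // Only drop a pair if at least one message would remain afterwards.
    while total > budget && start + 2 < history.len() {
        total -= history[start].tokens + history[start + 1].tokens;
        start += 2;
    }
    history[start..].to_vec()
}
```

Because messages are removed two at a time from the front, the kept history always starts with a user message, so no answer is left without its question. If the remainder still exceeds the budget, the caller has to handle the overflow separately.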

v0.2.1-rc1

Kusanagi (Deprecated)
2026-03-18

First public release. Metal GPU acceleration, HuggingFace model browser, real-time observability.

macOS

Universal (Apple Silicon + Intel) · ~30 MB

OrionPod_0.2.1_universal.dmg
Changelog

Changes

  • Chat interface with streaming responses and markdown rendering
  • HuggingFace model browser with hardware compatibility filtering
  • GGUF model support — download from HuggingFace or upload local files
  • Real-time observability dashboard (tokens/s, memory, latency, GPU usage)
  • Metal GPU acceleration on Apple Silicon
  • Model parameter controls (temperature, context length, top-p, top-k)
  • Toast notifications and user-friendly error messages
  • Glassmorphism UI with macOS vibrancy

For Geeks

  • Tauri v2 + React + TypeScript + Rust + llama.cpp
  • orion-core agent harness crate (backend-agnostic)
  • Universal macOS binary (Apple Silicon + Intel), ~30 MB
  • Starts in under 2 seconds and uses less than 50 MB of RAM when idle
  • Zero telemetry, zero analytics, zero cloud dependencies

System Requirements

  • macOS 10.15 (Catalina) or later
  • Apple Silicon (M-series) recommended for Metal GPU acceleration
  • 8 GB RAM minimum for 7B models

Open the DMG → drag OrionPod to Applications → launch. That's it.