NO CLOUD • NO LATENCY • SECURE

Voice AI that runs on device.

Wake words, keyword spotting, speech-to-intent and more. Set up in hours.

Explore products →
No credit card required · Design partners welcome
Runs on
iOS · Mac · Android · Microcontrollers · Automotive · Wearables · Smart TV · Embedded & IoT
Why us

Three things our SDK does differently.

Speed and battery, on real devices.

Inference runtime written from scratch in Rust — hand-tuned for ARM NEON and x86 SIMD. Production binaries in the hundreds of kilobytes, not megabytes. INT8 quantization by default. Predictable latency. Bounded battery cost.
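To make the INT8 claim concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization: each f32 weight is mapped to an i8 plus a shared scale, cutting weight storage to a quarter. The runtime's actual quantization scheme is not public, so the scale choice below is purely an illustrative assumption.

```rust
// Illustrative sketch of symmetric INT8 quantization — not the SDK's scheme.

/// Quantize f32 weights to i8 with a single symmetric scale.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Largest magnitude sets the scale so the range maps onto [-127, 127].
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q: Vec<i8> = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 values from the quantized weights.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_int8(&w);
    println!("quantized: {:?} (scale {})", q, scale);
    println!("dequantized: {:?}", dequantize(&q, scale));
}
```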

Private by design.

Audio never leaves the user's phone. Models are encrypted at rest and bound to your license through per-customer key derivation. No cloud round-trip. No per-request fees. Works offline by default.

Trained to your domain.

Wake words, keyword vocabularies, and intent contexts are trained per customer from your spec. No generic assistant retrofitted to your product — a model that understands what your users actually say.

Products

Five voice primitives. One on-device runtime.

Compose them, ship them, and get the privacy story enterprise customers expect — without giving up latency or battery.

Speech-to-Intent · end-to-end

Available

Write a context spec — intents and slots in YAML — and we train a model that maps speech directly to structured intents. Audio to intent in one inference, no transcript stage. Lower latency, lower memory, better accuracy on your domain.

```yaml
# your context
intents:
  set_temperature:
    slots: [value, unit]

# at runtime:
#   "Set the temperature to seventy-two degrees"
#   { intent: "set_temperature", slots: { value: 72, unit: "degrees" } }
```

Voice Activity Detection

Available

Always listening, with sub-millisecond inference per frame on modern phones. The foundational primitive — it gates wake-word and KWS for power efficiency, drives interruption logic, and supports VAD-only use cases.
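As a rough illustration of what frame-level gating means, here is a toy energy-based VAD gate. The shipped detector is model-based; the RMS threshold, frame size, and function name below are assumptions made only for this sketch.

```rust
// Toy sketch: an energy-based VAD gate over 16-bit PCM frames.
// The real SDK uses a trained model, not an energy threshold.

/// Returns true when a frame's RMS energy exceeds `threshold`.
fn is_speech(frame: &[i16], threshold: f64) -> bool {
    if frame.is_empty() {
        return false;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    let rms = (sum_sq / frame.len() as f64).sqrt();
    rms > threshold
}

fn main() {
    let silence = vec![0i16; 320]; // one 20 ms frame at 16 kHz
    let tone: Vec<i16> = (0..320)
        .map(|i| if i % 2 == 0 { 8000 } else { -8000 })
        .collect();
    // Downstream wake-word / KWS inference only runs on speech frames.
    println!("silence gated through: {}", is_speech(&silence, 500.0));
    println!("tone gated through: {}", is_speech(&tone, 500.0));
}
```

The design point the sketch mirrors: a cheap always-on check decides whether the heavier detectors run at all, which is where the power savings come from.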

Custom Wake Word

Available

Train any wake phrase — "Hey YourBrand", "OK Product", multi-word activations. Robust to noise, distance, and accents through synthetic data and augmentation. Tunable confidence scoring.

Custom Keyword Spotting

Available

Detect a fixed vocabulary of voice commands — play, pause, next, louder, stop. Multi-class classifier with per-keyword confidence. Lower latency and compute than full ASR for closed-vocabulary control.
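One way to picture per-keyword confidence: a single classifier pass produces a logit per keyword, a softmax turns the logits into confidences, and a threshold decides whether to fire. The keyword names match the example above, but the logit values and the 0.8 threshold are illustrative assumptions, not SDK API.

```rust
// Sketch of closed-vocabulary scoring: softmax over per-keyword logits,
// then a confidence threshold. Values and threshold are illustrative.

/// Numerically stable softmax over raw classifier logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let keywords = ["play", "pause", "next", "louder", "stop"];
    let logits = [0.2, 4.1, 0.3, -0.5, 0.1]; // one classifier pass
    let confidences = softmax(&logits);

    // Fire only when the best class clears the confidence threshold.
    let (best, &conf) = keywords
        .iter()
        .zip(confidences.iter())
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .unwrap();
    if conf > 0.8 {
        println!("detected: {} ({:.2})", best, conf);
    }
}
```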

Streaming & batch ASR

Coming v1.1

Real-time English transcription with low-latency partial results, plus a higher-accuracy batch mode with punctuation and capitalization. Same on-device runtime — no cloud.

Multilingual training

Post v1

Pipeline supports English, French, German, Portuguese, Italian, and Spanish out of the box. Additional languages roll out as voice corpora and customer demand align.

Text-to-Speech

Planned · v2

On-device speech synthesis with multiple voices. For in-app responses, accessibility, and offline assistant experiences.

Speaker ID & Diarization

Planned · Conditional

Recognize enrolled speakers for personalization and access control. Diarization ("who spoke when" segmentation) will be pursued where a quality on-device model can be identified.

Devices

Mobile today. Everywhere voice runs, next.

Same model artifacts, same Rust runtime — new platform adapters. Tell us what you need to ship on and we'll prioritize.

Mobile

  • iOS 16+ · iPhone & iPad
  • Android 8.0+ (API 26)
Available v1

Apple ecosystem

  • macOS · Apple Silicon & Intel
  • tvOS · watchOS · visionOS
Coming v2

Desktop

  • Windows · x64 & ARM64
  • Linux · x64 & ARM64
  • Web · WebAssembly
Coming v2

Embedded & IoT

  • Raspberry Pi 3 / 4 / 5
  • NVIDIA Jetson
  • ARM SoC boards · kiosks
Coming v2

Microcontrollers

  • ARM Cortex-M4 / M7
  • Cortex-M33 / M55 / M85
  • no_std-compatible
Coming v2

Automotive

  • Android Automotive
  • Automotive Linux (AGL)
  • QNX
Coming v2

Wearables & TV

  • Wear OS · watchOS
  • Android TV / Google TV
  • tvOS
Coming v2

Server / hybrid

  • Linux x64 / ARM64
  • Same artifact, server-side
Coming v2

Most voice SDKs make you choose. We don't.

Either ship a large generic engine that bloats your app and drains battery, or send audio to the cloud and pay per request. Our runtime is purpose-built for tiny binary footprint and predictable latency on real ARM CPUs — every model shipped on-device, trained to your domain, licensed per app.

  • Hundreds of KB binaries — not megabytes.
  • No cloud round-trip. No per-detection fees.
  • Trained per customer from your spec.
Pre-launch

Onboarding design partners now.

Tell us what you want to build and which devices it has to run on. We'll get back to you within a business day.

No credit card required.

Ship voice features your users can rely on.

No privacy compromises, no network dependency, no per-detection costs.

No credit card required