Benchmark


evoglyph's pitch is speed and locality, not adjectives. This page shows the numbers behind that pitch and how they compare to the named alternatives developers usually evaluate. evoglyph's numbers come from our own evaluation harness. Competitor numbers are pulled from the vendor's site or independent reviews, with retrieval dates so you can verify them yourself.

How evoglyph compares

Four products, seven rows. evoglyph's numbers come from our internal eval run on 2026-05-11; each competitor number is cited to the source we pulled it from, with the retrieval date listed under Sources.

Latency and accuracy figures are not measured on a common harness: evoglyph rows are our measurements; competitor rows are figures reported elsewhere, cited below.

| Metric | evoglyph | Wispr Flow | Aqua Voice | Apple Dictation |
|---|---|---|---|---|
| p50 latency | 85ms (engine-time) [1] | ~700ms (end-to-end, as reported by Voibe review) [3] | 450ms (end-to-end, as reported by Voibe/Aqua) [4] | 150ms to 400ms [5] |
| p95 latency | 124ms (engine-time) [1] | unconfirmed | unconfirmed | unconfirmed |
| WER on LibriSpeech test-clean | 2.41% (our harness) [1] | unconfirmed | 2.7% (as reported by Aqua; AISpeak benchmark, different methodology) [6] | ~8% WER (as reported by MacRumors; different benchmark, different methodology) [7] |
| Empty-output rate on speech | 0% (0/100) [1] | unconfirmed | unconfirmed | unconfirmed |
| Silence hallucination rate | 0% (0/5 clean) [1] | unconfirmed | unconfirmed | unconfirmed |
| Local-only? | Yes [2] | No [8] | No [9] | Configurable (per Apple's Settings check) [10] |
| Cloud round-trip required? | No [2] | Yes [8] | Yes [9] | Not when on-device processing is enabled [10] |

When AI cleanup is enabled on evoglyph, the cleanup pass adds 207ms p50 and 1019ms p95 on top of the speech-to-text numbers above. Cleanup runs on-device too, so both stages stay local. [11]

Methodology

evoglyph's latency, WER, empty-output, and silence-hallucination numbers come from a 105-clip public evaluation set: 75 clips from LibriSpeech test-clean, 20 from the VCTK corpus, 5 silence clips, and 5 vocabulary-dense clips. Hardware is an Apple M4 Max with 64 GB of unified memory. The reference transcripts and the model output are both passed through OpenAI's EnglishTextNormalizer before WER is computed, which prevents punctuation and contraction artefacts from inflating the error rate. [1]
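As a rough illustration of the normalize-then-score step described above, the sketch below computes a corpus-level WER in Python. It rests on several assumptions: `normalize` is a simplified stand-in (lowercasing and punctuation stripping only), not OpenAI's EnglishTextNormalizer; the function names are hypothetical; and pooling errors across clips rather than averaging per-clip WER is our guess, since this page does not say which aggregation the harness uses.

```python
import re


def normalize(text: str) -> str:
    """Stand-in normalizer (assumption): lowercase, drop punctuation, collapse spaces.

    This is NOT OpenAI's EnglishTextNormalizer, which additionally handles
    contractions, number words, and spelling variants.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over all clips."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = normalize(reference).split()
        hyp = normalize(hypothesis).split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    return errors / ref_words if ref_words else 0.0
```

Normalizing both sides first means that a pair such as "Hello, world!" and "hello world" scores zero errors, which is the effect the methodology relies on to keep punctuation differences out of the error rate.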

The cleanup-stage numbers come from a separate 117-fixture editorial-cleanup evaluation, run on the same hardware on 2026-05-11 against the production prompt and LoRA bundle that ship in the app. [11]

How we measure

The evoglyph p50 and p95 latency rows above measure speech-to-text engine time on the Apple Neural Engine. The user-facing performance metric in everyday use is the total pipeline time from hotkey release to text appearing in the focused app: that combines the speech-to-text stage, the cleanup stage if AI cleanup is on, and the text-injection stage. The 85ms figure covers the speech-to-text stage only; the competitor latency figures in the table are reported as end-to-end, so read the latency rows with that difference in mind. The full pipeline figure including cleanup is what you actually feel, and evoglyph's worst case on long inputs sits in the seconds range, like every other AI-cleanup tool in the category. Turn cleanup off and evoglyph returns to the 85ms regime.
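To make the distinction between engine time and total pipeline time concrete, here is a minimal Python sketch. The function names are hypothetical, the nearest-rank percentile definition is an assumption (this page does not state which definition the harness uses), and the sample timings in the usage note are invented for illustration, not measurements.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def pipeline_totals(stt_ms: list[float],
                    cleanup_ms: list[float],
                    inject_ms: list[float],
                    cleanup_enabled: bool = True) -> list[float]:
    """Per-dictation total: speech-to-text + optional cleanup + text injection."""
    return [
        stt + (cleanup if cleanup_enabled else 0.0) + inject
        for stt, cleanup, inject in zip(stt_ms, cleanup_ms, inject_ms)
    ]


# Illustrative values only (assumption), one entry per dictation.
stt = [80.0, 85.0, 90.0, 120.0]
cleanup = [200.0, 1000.0, 210.0, 190.0]
inject = [5.0, 5.0, 5.0, 5.0]

totals = pipeline_totals(stt, cleanup, inject)
p50_total = percentile(totals, 50)
sum_of_stage_p50s = (percentile(stt, 50)
                     + percentile(cleanup, 50)
                     + percentile(inject, 50))
```

With these made-up samples, the p50 of the per-dictation totals differs from the sum of the per-stage p50s, because the slow dictation in one stage is not necessarily the slow one in another. Stage percentiles are therefore best read as stage-level figures rather than added together into an exact end-to-end percentile.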

Why the cloud round-trip matters

Wispr Flow and Aqua Voice both run transcription on their own servers. That is a structural design decision, not a tradeoff that can be patched away. Every dictation pays for the network: TLS handshake, audio upload, server inference, text download. An independent review puts Wispr Flow at roughly 700 milliseconds end-to-end; for Aqua Voice, the review we cite reports 450 milliseconds, while Aqua's own site says "sub-second latency". evoglyph runs the entire pipeline on the Apple Neural Engine inside your process, so there is no round trip to pay for and no privacy story to debate. The numbers above are the consequence of that architectural choice, not a marketing claim.

Sources

  1. evoglyph speech-to-text numbers (latency, WER, empty-output rate, silence hallucination): internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md and README Evaluation section (#parakeet-vs-whisperkit-105-public-clips). Hardware: Apple M4 Max, 64 GB. Eval set: 105 public clips (75 LibriSpeech test-clean + 20 VCTK + 5 silence + 5 vocab-dense). 2026-05-11.
  2. evoglyph architecture (local-only, proprietary): Privacy and local-first docs page. 2026-05-11.
  3. Wispr Flow latency, ~700ms end-to-end (cloud round trip): Voibe Wispr Flow review. Retrieved 2026-05-11.
  4. Aqua Voice latency, 450ms response time: Voibe Aqua Voice pricing review; the Aqua site itself uses "sub-second latency". Retrieved 2026-05-11.
  5. Apple Dictation / SpeechAnalyzer latency, 150ms to 400ms: Dictato engine benchmark. Retrieved 2026-05-11.
  6. Aqua Voice "Avalon" model accuracy on AISpeak benchmark (97.3% accuracy, i.e. ~2.7% WER): aquavoice.com homepage. Note this is a different benchmark than LibriSpeech test-clean and is the vendor's own published number, not an independent measurement. Retrieved 2026-05-11.
  7. Apple SpeechAnalyzer WER ~8% (CER ~3%): MacRumors transcription benchmark. The benchmark dataset is not LibriSpeech test-clean and the methodology differs, so this row is a like-with-caveats comparison. Retrieved 2026-05-11.
  8. Wispr Flow cloud transcription: wisprflow.ai/privacy: "transcription always happens in the cloud to provide the best speed and accuracy". Retrieved 2026-05-11.
  9. Aqua Voice cloud architecture: Hacker News launch thread for Aqua Voice 2, where the developers said local models could not yet meet their quality bar at speed. Retrieved 2026-05-11.
  10. Apple Dictation on-device processing: Apple support page. Apple notes you can check Keyboard settings to see whether voice input is "processed on your device and not sent to Siri servers". Retrieved 2026-06-06.
  11. evoglyph cleanup-stage latency (p50 207ms, p95 1019ms), measured against the 117-fixture editorial-cleanup eval set on the production prompt and LoRA bundle that ship in the app: internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md. Hardware: Apple M4 Max, 64 GB. 2026-05-11.

Methodology availability

The evaluation methodology is documented on this page. Raw per-clip results from the 105-clip speech-to-text eval and the 117-fixture cleanup eval are available on request via hello@evoglyph.com. We will not update this page silently when results change: new baselines get a new dated entry in our internal eval/results/ archive and a new citation link from this page.