Model

Voxtral TTS

Family
Voxtral
Context window
Open weights
Yes
Release date
2026-03-26

Benchmark notes

4B-parameter multilingual text-to-speech model supporting 9 languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, Arabic). Supports zero-shot voice cloning from ~3 seconds of reference audio. Reported time-to-first-audio is ~90 ms. Reported human evaluations indicate parity with ElevenLabs v3 and higher naturalness than ElevenLabs Flash v2.5. API pricing: $0.016 per 1,000 characters.
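To put the listed API price in concrete terms, here is a minimal cost-estimate sketch. It assumes billing is strictly linear in input characters, with no minimums, rounding, or per-request fees. The actual billing rules may differ. The function names are hypothetical and are not part of any Voxtral or Mistral API.

```python
# Hypothetical cost estimator based on the listed Voxtral TTS API price.
# Assumption: billing is linear in characters (no minimums, rounding,
# or per-request fees); real billing rules may differ.

PRICE_PER_1K_CHARS_USD = 0.016  # listed price per 1,000 characters


def estimate_cost_for_chars(num_chars: int,
                            price_per_1k: float = PRICE_PER_1K_CHARS_USD) -> float:
    """Estimated USD cost for synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * price_per_1k


def estimate_cost_for_text(text: str,
                           price_per_1k: float = PRICE_PER_1K_CHARS_USD) -> float:
    """Estimated USD cost for synthesizing `text` (counts Python characters)."""
    return estimate_cost_for_chars(len(text), price_per_1k)


if __name__ == "__main__":
    # Example: a 500,000-character manuscript.
    print(f"${estimate_cost_for_chars(500_000):.2f}")  # prints "$8.00"
```

Under these assumptions, cost scales directly with text length. Each 100,000 characters costs about $1.60.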