Model

Voxtral TTS

by Mistral AI

Family

Voxtral

Context window

—

Open weights

Yes

Release date

2026-03-26

Benchmark notes

4B-parameter multilingual text-to-speech model supporting 9 languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, Arabic); zero-shot voice cloning from ~3 seconds of reference audio; reported time-to-first-audio of ~90 ms; human evaluations indicate parity with ElevenLabs v3 and higher naturalness than ElevenLabs Flash v2.5; API priced at $0.016 per 1k characters.
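
To make the listed price concrete, the following is a minimal sketch of a cost estimate based only on the $0.016 per 1k characters figure above. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes billing is strictly linear in character count (every character counted, no minimum charge, no rounding), which the notes do not confirm.

```python
from decimal import Decimal

# Listed price from the notes above: $0.016 per 1,000 characters.
PRICE_PER_1K_CHARS_USD = Decimal("0.016")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate synthesis cost in USD for a number of input characters.

    Assumption (not confirmed by the source): billing is linear in the
    character count, with no minimum charge and no rounding.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_1K_CHARS_USD * num_chars / 1000


if __name__ == "__main__":
    text = "Hello from a text-to-speech cost estimate."
    print(f"{len(text)} characters -> ${estimate_cost_usd(len(text)):.6f}")
```

Under these assumptions, 250,000 characters of input would come to $4.00. `Decimal` is used rather than `float` so that currency arithmetic stays exact.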