COMPANY PROFILE · /mistral-voxtral

Mistral Voxtral

Voxtral is a family of open-weight audio AI models developed by Paris-based Mistral AI, covering speech-to-text (automatic speech recognition, ASR), speech understanding, and text-to-speech (TTS). First released in July 2025 as Mistral's debut audio model line, the family has since expanded to include Voxtral Transcribe 2 (February 2026) and Voxtral TTS (March 2026), and is positioned as a lower-cost open alternative to closed providers such as OpenAI Whisper and ElevenLabs.

speech-to-text text-to-speech audio-ai open-weight-models Foundation Models / LLM Providers

mistral.ai · Paris, France · founded 2023 (reported)

Latest valuation

$13.8B

Annual Recurring Revenue

~$100M-$200M (Mistral AI parent, reported)

Headcount

~500-1,093 (Mistral AI parent)

Funding total

$3B+

OVERVIEW

Voxtral originally launched on July 15, 2025 with Voxtral Small (24B) and Voxtral Mini (3B), both under Apache 2.0. Voxtral Transcribe 2 (Mini Transcribe V2 + Realtime) launched on February 4, 2026, expanding language support to 13 languages and adding speaker diarization, context biasing, and sub-200ms streaming latency. Voxtral TTS launched on March 26, 2026 as a 4B-parameter multilingual text-to-speech model under CC BY-NC 4.0. Parent Mistral AI raised a €1.7B Series C in September 2025, led by ASML (€1.3B for a ~11% stake), at an €11.7B/$13.8B valuation, and added $830M in debt financing in March 2026. CMA CGM is a notable customer, using Voxtral for media workflows.

Status

active

Tier

master-list

Profitability

unprofitable

Pricing

Usage-based API: Voxtral transcription from $0.001/min; Voxtral Mini Transcribe V2 $0.003/min; Voxtral Realtime $0.006/min; Voxtral TTS $0.016 per 1k characters. Open weights are free to download under Apache 2.0 or, for Voxtral TTS, CC BY-NC 4.0.
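The listed rates can be turned into a rough spend estimate. The sketch below is illustrative only: it assumes billing is strictly linear in minutes or characters, with no minimum increments, rounding rules, free tier, or volume discounts, none of which are stated in this profile. The dictionary keys and function names are hypothetical labels, not Mistral API model identifiers.

```python
from decimal import Decimal

# Per-minute rates in USD, copied from the pricing line in this profile.
# Keys are illustrative labels only, NOT official API model IDs.
PER_MINUTE_USD = {
    "voxtral-transcription-floor": Decimal("0.001"),  # the "from $0.001/min" entry
    "voxtral-mini-transcribe-v2": Decimal("0.003"),
    "voxtral-realtime": Decimal("0.006"),
}

# Voxtral TTS rate in USD per 1,000 input characters.
TTS_PER_1K_CHARS_USD = Decimal("0.016")


def transcription_cost(label: str, minutes: float) -> Decimal:
    """Estimated USD cost of transcribing `minutes` of audio (assumes linear billing)."""
    return PER_MINUTE_USD[label] * Decimal(str(minutes))


def tts_cost(characters: int) -> Decimal:
    """Estimated USD cost of synthesizing `characters` of text (assumes linear billing)."""
    return TTS_PER_1K_CHARS_USD * Decimal(characters) / Decimal(1000)


if __name__ == "__main__":
    # 10 hours of audio on Mini Transcribe V2, and 250,000 characters of TTS.
    print(f"${transcription_cost('voxtral-mini-transcribe-v2', 600):.2f}")
    print(f"${tts_cost(250_000):.2f}")
```

Under these assumptions, 600 minutes of audio at the Mini Transcribe V2 rate comes to about $1.80, and 250,000 characters of TTS to about $4.00. `Decimal` is used rather than floats so that sub-cent rates do not accumulate binary rounding error.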

FUNDING HISTORY · 0

No funding rounds on file.

PRODUCTS · 2

Mistral Studio Audio Playground

developer-tooling

Web-based playground in Mistral Studio for testing Voxtral transcription and TTS models, including diarization, timestamps and context biasing.

Le Chat Voice Mode (Voxtral-powered)

consumer-assistant

Voice interaction mode within Mistral's Le Chat assistant, allowing users to record or upload audio, get transcriptions, ask questions, or generate summaries.

MODELS · 5

Voxtral Small

Voxtral · 2025-07-15