What it does
Transcribes audio into text across 90+ languages with speaker diarization, word-level timestamps, and non-speech event tagging.
Key features
- Sub-150ms realtime latency (v2 Realtime)
- Speaker diarization
- Word-level timestamps
- Automatic language detection
- EU/India data residency
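Diarization and word-level timestamps are typically consumed together, by merging consecutive words from the same speaker into turns. The sketch below shows one minimal way to do that. The `Word` and `Turn` shapes and the `group_into_turns` helper are hypothetical and chosen for illustration; they are not the service's actual response schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical per-word record: text, start/end in seconds, speaker label.
    text: str
    start: float
    end: float
    speaker: str


@dataclass
class Turn:
    # A contiguous run of words spoken by one speaker.
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words with the same speaker label into turns."""
    turns: List[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].end = w.end
            turns[-1].text += " " + w.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns
```

Each resulting turn keeps the start time of its first word and the end time of its last word, which is usually enough to render a speaker-labelled transcript or captions.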
Integrations
- REST API
- WebSocket streaming
- ElevenAgents