CATEGORY · 48 COMPANIES

Voice / Audio / Music

Text-to-speech, speech-to-text, voice agents, and music generation.

# · Company · Valuation · ARR · Headcount · Fresh

01 · Google DeepMind (Gemini)
Alphabet's AI division producing the Gemini model family plus Veo (video), Imagen (image), and Lyria (music), integrated across Google products.
Not separately broken out · Not separately reported

02 · Suno
AI music generation platform creating full songs from prompts, with the Suno Studio multitrack workstation.
$200M ARR (at round) · Not separately reported

03 · Sierra
Enterprise voice and chat customer-experience agent platform using outcome-based pricing (~$1.50 per resolution).
$150M+ ARR (Jan 2026), from $26M end-2024 · Not separately reported

04 · Otter.ai
Meeting notes and summaries

05 · Stability AI
Open-source generative AI company developing Stable Diffusion and multimodal models for text-to-image, video, audio, and 3D generation. Raised $80M Series A extension (June 2024) at $1B valuation. Total funding $181M+ with leadership transition to Prem Akkaraju (ex-Weta Digital CEO, June 2024). 150M+ Stable Diffusion downloads.
$1B · 184-192 · green

06 · Gradium
Gradium is developing next-generation expressive artificial intelligence voice models capable of conveying emotion, nuance, and personality in speech synthesis. The company focuses on building voice models that sound more natural and human-like than existing text-to-speech technologies.
green

07 · Variational
Variational develops AI infrastructure and optimization technologies for large-scale model training and deployment. The company focuses on efficiency improvements and cost reduction for AI workloads.
green

08 · AssemblyAI
Universal streaming speech recognition

09 · David AI
David AI is an audio data research company building the data layer for voice AI. The startup creates high-quality, full-duplex channel-separated speech training datasets that address the critical bottleneck in voice AI model development, supporting cutting-edge voice production systems and frontier research.
green

10 · Deepgram
Speech-to-text and voice agents

11 · Speechmatics
Multilingual speech recognition

12 · WIZ.AI
Enterprise GenAI platform specializing in multilingual voice agents and conversational AI for Southeast Asia. Develops localized LLMs for Bahasa Indonesia and regional languages, enabling Southeast Asian businesses to deploy culturally-tuned AI customer engagement solutions.
~109 · green

13 · Retell AI
Voice agent platform

14 · PolyAI
Enterprise call-center voice

15 · Cartesia
Sonic low-latency text-to-speech

16 · Sign AI
Sign AI is a Deaf-led company founded and guided by Deaf professionals with co-founders having disabilities, ensuring cultural accuracy and community trust are at the heart of everything built. Sign AI explores real-time AI for sign language recognition and generation, including sign-to-text and text-to-sign technology.
green

17 · Murf AI
Murf AI is an enterprise-grade AI voice generation platform offering 120+ professional voices in 20+ languages with advanced customization controls. The platform includes video dubbing, voice cloning, and AI video generation capabilities tailored for business content creation.
green

18 · Newo.ai
Newo.ai provides AI voice agents that answer missed calls and capture customer leads for small-to-medium businesses. The platform uses Zero-Hallucination Architecture with multiple verification agents and integrates seamlessly into existing systems via the business's current phone number, email, and CRM integrations.
green

19 · iFlytek
iFlytek is a partially state-owned Chinese information technology company established in 1999. It creates voice recognition software and 10+ voice-based internet/mobile products covering the education, communication, music, and intelligent-toy industries. The company has pivoted into large language models (iFlytek Spark/Xinghuo) and multimodal agents, with widespread deployment in healthcare diagnostics, education, and smart devices.
Market cap varies; substantial public enterprise · Undisclosed; multi-billion yuan revenue at enterprise scale · 3,000+ · green

20 · Giga
Giga builds enterprise voice AI agents for customer support automation with real-time orchestration capabilities that handle listening, understanding, reasoning, database queries, and response generation within sub-500ms latency. The platform enables companies to deploy production-grade voice agents in under two weeks by ingesting support knowledge bases and historical transcripts to generate policy-compliant, emotionally aware agents.
50+ · green

21 · Udio
AI music generation

22 · Resemble AI
Voice cloning platform

23 · Rime
Rime develops ultra-realistic text-to-speech and speech synthesis models for enterprise applications. The company builds its proprietary Arcana and Mist models trained on a massive dataset of conversational speech, enabling natural voices with emotional expressiveness, laughter, breathing patterns, and code-switching capabilities that power tens of millions of conversations monthly across customer support, healthcare, and food service sectors.
~20 · green

24 · Bland AI
Phone call automation

25 · Rev.com
Hybrid human and AI transcription

26 · Splice (with Kits AI acquisition)
Music creation platform combining royalty-free sample library (3M+ licensed sounds) with AI tools. Recent acquisitions (Spitfire Audio $50M Apr 2025, Kits AI Jan 2026) position it as a comprehensive music production ecosystem with ethical AI voice cloning, stem generation, and sample variation tools trained on licensed data.
$500M (2021 Series D) · 349 · green

27 · Sesame
Natural conversational voice models

28 · Riffusion
AI music generation platform using spectrogram-based image-to-audio generation. Enables users to create short 'riffs' with vocals by entering lyrics and musical styles. Recently launched subscription membership for fast generation features and community access.
6 · green

29 · Wispr Flow
Wispr makes Flow, an AI voice-dictation app that turns natural speech into formatted text across desktop and mobile applications. The company positions Flow as the foundation of a broader voice operating system for hands-free computing.
$700M · ~$10M ARR (estimated) · green

30 · AISpeech
AISpeech is a high-tech startup specialized in computer speech recognition and analysis, and a prominent speech technology platform company in China. It provides advanced speech interaction solutions for enterprises and developers, with products including the DUI platform, enterprise business assistants, and a human-machine dialogue operating system. Its clientele ranges from large enterprises seeking to enhance their customer service capabilities to developers.
Private · 1,000 · green

31 · iFlytek Enterprise (讯飞智企)
Speech recognition and AI assistant platform for enterprise. Listed on Shanghai Stock Exchange. Government-designated 'national AI team' by State Council. Provides NLP, voice, and customer service AI agents for finance, healthcare, and government.
~$8B+ (market cap, 2024) · ~$1.2B+ (2024) · 5000+ · green

32 · Signapse
Generative AI sign language translation platform using photo-realistic avatars to translate video, web, and live content into British Sign Language (BSL) and American Sign Language (ASL). Deployed at major transport hubs including LNER train stations (UK) and Cincinnati/Northern Kentucky International Airport.
$6.6M · ~15 · green

33 · Ava
AI-powered live transcription and captioning app for deaf and hard-of-hearing users, converting speech to text in real-time across calls, meetings, and events. Supports speaker identification, multiple languages, and integrates with Zoom and Google Meet.
~25 · green

34 · Iconic
Voice-first interactive entertainment platform powered by on-device AI. Converts natural voice input into immersive gameplay experiences without visual rendering, enabling audio-only or lightweight AI game worlds.
15-25 (estimated) · green

35 · Aiva Technologies
AI music composition platform that generates emotive soundtracks for films, games, commercials, and TV. Trained on classical composers like Mozart and Beethoven, Aiva achieved SACEM composer status recognizing its AI-composed work as professionally viable music.
$5M · ~9 (as of 2021) · green

36 · Q Concierge
Voice AI digital concierge system for hotels that acts as a 24/7 hotel front desk handling guest calls and inquiries. Uses natural language understanding to handle booking inquiries, answer complex questions, and process reservations with human-like voice interaction. Captures revenue from missed calls and improves guest response times.
green

37 · Endel Sound GmbH
Generative AI platform that creates personalized, adaptive soundscapes for focus, relaxation, sleep, and wellness. Uses circadian rhythms, biometric data (heart rate, movement), location, weather, and time of day to generate unique, real-time soundscapes. Partners with major labels (Warner, Universal, Spinnin' Records) and consumer platforms (Alexa, Apple Watch).
~$1.5M+ monthly listening hours (1M+ active users reported) · ~30 · green

38 · Intella
Arabic speech-recognition and analytics platform specializing in dialect transcription (Egyptian, Gulf, Levantine). Trains models on regional audio from call centres and media archives. Offers real-time transcription, sentiment/intent analysis, voicebots, and agent-assist via APIs.
green

39 · Botlhale AI
Speech and language model builder for Southern Africa specializing in understanding code-switching, slang, and colloquial speech. Offers a contact-centre AI stack (transcription, intent detection, routing, analytics) via APIs for banking/telecom.
green

40 · Respeecher
Respeecher provides professional AI voice synthesis and voice conversion solutions for entertainment, broadcast, and enterprise applications. The platform uses deep neural networks to transform voices while preserving emotional nuance, with a focus on consent-verified, rights-cleared voice models. Recently launched a live Text-to-Speech API with 200ms latency.
30+ · green

41 · Endel
Generative AI wellness soundscape platform that creates personalized, adaptive audio for focus, relaxation, and sleep based on real-time environmental inputs (weather, heart rate, time of day). Signed distribution deals with Warner Music Group and Universal Music Group; first algorithm to release music on a major label.
not publicly disclosed · ~$2-3M (estimated from 4M users) · 50-70 (estimated) · green

42 · Soundful
AI music generation platform enabling creators to produce royalty-free background music across genres like EDM, hip-hop, and pop. Uses AI ethically trained on licensed or in-house-produced sounds, with exclusive licensing options.
green

43 · Beatoven.ai
AI music generation platform for video/podcast creators using mood-based and multimodal prompts. Recently launched Maestro model trained on fully licensed data with artist-payout infrastructure via partnership with Musical AI. Supports sound effects and stems export.
~$83K (INR 83.3L, as of Mar 2025) · 16 · green

44 · Boomy
AI-powered music creation platform enabling users to generate original songs by selecting genres, with direct distribution to Spotify and other DSPs. Freemium model with commercial licensing tiers. Expanded product line to include LoopMagic (professional sample/loop generation with producer !llmind).
~$4.1M (as of 2025) · 16 · green

45 · SoundHound AI
SoundHound AI (Nasdaq: SOUN) is a publicly traded leader in voice and conversational AI with 20+ years of proprietary technology. The company delivers voice agents for customer service, automotive, restaurants, smart devices, and financial services, with Amelia 7.0 providing full agentic AI capabilities and billions of annual interactions.
public market cap · $170M run-rate (Q2 2025: $42.7M, 217% YoY growth) · green

46 · RiiD
RiiD is a South Korean AI EdTech company providing adaptive language learning through its Santa AI tutor platform, focusing on English proficiency tests (TOEIC). The platform uses AI to analyze weakness areas, predict test scores in minutes, and recommend personalized study curricula with video lectures and practice questions.
green

47 · Voiceitt
Voiceitt specializes in creating patented automatic speech recognition (ASR) technology designed specifically for people facing challenges with speech disabilities, aging voices, and accents, leveraging cutting-edge machine learning methods and a unique database of non-standard speech patterns. Voiceitt is a stand-alone Web app supporting communication with people and with technology.
20-50 · green

48 · SuperDial
SuperDial builds voice AI agents to automate high-friction phone calls in healthcare billing and provider operations. The company emerged from SuperBill, a revenue cycle management startup, and replaces hours of manual insurer calls with end-to-end AI agents. SuperDial navigates phone trees, processes holds, and conducts complex negotiations between providers and insurance companies.
green
Voice / Audio / Music · AIDB