CATEGORY · 48 COMPANIES

Voice / Audio / Music

Text-to-speech, speech-to-text, voice agents, and music generation.

01 · Google DeepMind (Gemini)
Alphabet's AI division, which produces the Gemini model family plus Veo (video), Imagen (image), and Lyria (music), integrated across Google products.
ARR: Not separately broken out · Headcount: Not separately reported

02 · Sierra
Enterprise voice and chat customer-experience agent platform with outcome-based pricing (~$1.50 per resolution).
ARR: $150M+ (Jan 2026), up from $26M at end-2024 · Headcount: Not separately reported

03 · WIZ.AI
Enterprise GenAI platform specializing in multilingual voice agents and conversational AI for Southeast Asia. Develops localized LLMs for Bahasa Indonesia and other regional languages, enabling Southeast Asian businesses to deploy culturally tuned AI customer-engagement solutions.
Headcount: ~109

04 · David AI
David AI is an audio data research company building the data layer for voice AI. The startup creates high-quality, full-duplex, channel-separated speech training datasets that address a critical bottleneck in voice AI model development, supporting production voice systems and frontier research.

05 · Splice (with Kits AI acquisition)
Music creation platform combining a royalty-free sample library (3M+ licensed sounds) with AI tools. Recent acquisitions (Spitfire Audio, $50M, Apr 2025; Kits AI, Jan 2026) position it as a comprehensive music-production ecosystem with ethical AI voice cloning, stem generation, and sample-variation tools trained on licensed data.
Valuation: $500M (2021 Series D) · Headcount: 349

06 · Wispr Flow
Wispr makes Flow, an AI voice-dictation app that turns natural speech into formatted text across desktop and mobile applications. The company positions Flow as the foundation of a broader voice operating system for hands-free computing.
Valuation: $700M · ARR: ~$10M (estimated)

07 · Newo.ai
Newo.ai provides AI voice agents that answer missed calls and capture customer leads for small and medium-sized businesses. The platform uses a "Zero-Hallucination Architecture" with multiple verification agents and connects to existing systems through the business's current phone number, email, and CRM.

08 · iFlytek
iFlytek is a partially state-owned Chinese information technology company established in 1999. It creates voice recognition software and 10+ voice-based internet and mobile products covering the education, communication, music, and intelligent-toy industries. The company has pivoted into large language models (iFlytek Spark/Xinghuo) and multimodal agents, with widespread deployment in healthcare diagnostics, education, and smart devices.
Valuation: Publicly listed; market cap varies · ARR: Undisclosed (multi-billion-yuan revenue) · Headcount: 3,000+

09 · Beatoven.ai
AI music generation platform for video and podcast creators, using mood-based and multimodal prompts. Recently launched the Maestro model, trained on fully licensed data, with artist-payout infrastructure via a partnership with Musical AI. Supports sound effects and stem export.
ARR: INR 83.3 lakh (~$96K), as of Mar 2025 · Headcount: 16

10 · SuperDial
SuperDial builds voice AI agents that automate high-friction phone calls in healthcare billing and provider operations. The company emerged from SuperBill, a revenue-cycle-management startup, and replaces hours of manual insurer calls with end-to-end AI agents. SuperDial navigates phone trees, waits through holds, and conducts complex negotiations between providers and insurance companies.

11 · Rime
Rime develops ultra-realistic text-to-speech and speech synthesis models for enterprise applications. Its proprietary Arcana and Mist models are trained on a large dataset of conversational speech, producing natural voices with emotional expressiveness, laughter, breathing patterns, and code-switching; they power tens of millions of conversations monthly across customer support, healthcare, and food service.
Headcount: ~20

12 · Stability AI
Open-source generative AI company developing Stable Diffusion and multimodal models for text-to-image, video, audio, and 3D generation. Raised an $80M Series A extension (June 2024) at a $1B valuation. Total funding is $181M+; leadership transitioned to Prem Akkaraju (ex-Weta Digital CEO) in June 2024. 150M+ Stable Diffusion downloads.
Valuation: $1B · Headcount: 184-192

13 · Gradium
Gradium is developing next-generation expressive AI voice models that convey emotion, nuance, and personality in synthesized speech. The company focuses on voice models that sound more natural and human-like than existing text-to-speech technology.

14 · Murf AI
Murf AI is an enterprise-grade AI voice generation platform offering 120+ professional voices in 20+ languages with advanced customization controls. The platform includes video dubbing, voice cloning, and AI video generation capabilities tailored for business content creation.

15 · Deepgram
Speech-to-text and voice agents.

16 · Suno
AI music generation platform that creates full songs from prompts, with the Suno Studio multitrack workstation.
ARR: $200M (at round) · Headcount: Not separately reported

17 · Boomy
AI-powered music creation platform that lets users generate original songs by selecting genres, with direct distribution to Spotify and other DSPs. Freemium model with commercial licensing tiers. The product line has expanded to include LoopMagic (professional sample and loop generation with producer !llmind).
ARR: ~$4.1M (as of 2025) · Headcount: 16

18 · Cartesia
Low-latency text-to-speech (Sonic).

19 · SoundHound AI
SoundHound AI (Nasdaq: SOUN) is a publicly traded voice and conversational AI company with 20+ years of proprietary technology. It delivers voice agents for customer service, automotive, restaurants, smart devices, and financial services, with Amelia 7.0 providing full agentic AI capabilities across billions of annual interactions.
Valuation: Public (market cap) · ARR: $170M run-rate (Q2 2025: $42.7M, 217% YoY growth)

20 · Variational
Variational develops AI infrastructure and optimization technologies for large-scale model training and deployment. The company focuses on efficiency improvements and cost reduction for AI workloads.

21 · Giga
Giga builds enterprise voice AI agents for customer-support automation, with real-time orchestration that handles listening, understanding, reasoning, database queries, and response generation at sub-500 ms latency. Companies can deploy production-grade voice agents in under two weeks: the platform ingests support knowledge bases and historical transcripts to generate policy-compliant, emotionally aware agents.
Headcount: 50+

22 · Sign AI
Sign AI is a Deaf-led company founded and guided by Deaf professionals, with co-founders who have disabilities, putting cultural accuracy and community trust at the heart of everything it builds. Sign AI explores real-time AI for sign language recognition and generation, including sign-to-text and text-to-sign technology.

23 · AssemblyAI
Universal streaming speech recognition.

24 · Iconic
Voice-first interactive entertainment platform powered by on-device AI. It converts natural voice input into immersive gameplay without visual rendering, enabling audio-only or lightweight AI game worlds.
Headcount: 15-25 (estimated)

25 · Q Concierge
Voice AI digital concierge for hotels that acts as a 24/7 front desk for guest calls and inquiries. Uses natural language understanding to handle booking inquiries, answer complex questions, and process reservations with human-like voice interaction. Captures revenue from missed calls and improves guest response times.

26 · Endel
Generative AI wellness soundscape platform that creates personalized, adaptive audio for focus, relaxation, and sleep based on real-time inputs (weather, heart rate, time of day). Signed distribution deals with Warner Music Group and Universal Music Group; the first algorithm to release music on a major label.
Valuation: Not publicly disclosed · ARR: ~$2-3M (estimated from 4M users) · Headcount: 50-70 (estimated)

27 · Resemble AI
Voice cloning platform.

28 · Endel Sound GmbH
Generative AI platform that creates personalized, adaptive soundscapes for focus, relaxation, sleep, and wellness. Uses circadian rhythms, biometric data (heart rate, movement), location, weather, and time of day to generate unique real-time soundscapes. Partners with record labels (Warner, Universal, Spinnin' Records) and consumer platforms (Alexa, Apple Watch).
Usage: ~1.5M+ monthly listening hours (1M+ active users reported) · Headcount: ~30

29 · Riiid
Riiid is a South Korean AI EdTech company providing adaptive language learning through its Santa AI tutor, focused on English proficiency tests (TOEIC). The platform uses AI to analyze weak areas, predict test scores within minutes, and recommend personalized study curricula with video lectures and practice questions.

30 · Ava
AI-powered live transcription and captioning app for deaf and hard-of-hearing users that converts speech to text in real time across calls, meetings, and events. Supports speaker identification and multiple languages, and integrates with Zoom and Google Meet.
Headcount: ~25

31 · PolyAI
Enterprise call-center voice AI.

32 · Botlhale AI
Speech and language model builder for Southern Africa, specializing in understanding code-switching, slang, and colloquial speech. Offers a contact-centre AI stack (transcription, intent detection, routing, analytics) via APIs for banking and telecom.

33 · AISpeech
AISpeech is a Chinese speech technology platform company specializing in computer speech recognition and analysis and in speech interaction solutions for enterprises and developers. Its products include the DUI platform, enterprise business assistants, and a human-machine dialogue operating system; its clients range from large enterprises seeking to improve customer service to individual developers.
Valuation: Private · Headcount: 1,000

34 · Speechmatics
Multilingual speech recognition.

35 · Otter.ai
Meeting notes and summaries.

36 · Aiva Technologies
AI music composition platform that generates emotive soundtracks for films, games, commercials, and TV. Trained on works by classical composers such as Mozart and Beethoven, Aiva obtained composer status with SACEM (the French music rights society), recognizing its AI-composed work as professionally viable music.
Valuation: $5M · Headcount: ~9 (as of 2021)

37 · Rev.com
Hybrid human and AI transcription.

38 · Bland AI
Phone call automation.

39 · Retell AI
Voice agent platform.

40 · Udio
AI music generation.

41 · Respeecher
Respeecher provides professional AI voice synthesis and voice conversion for entertainment, broadcast, and enterprise applications. The platform uses deep neural networks to transform voices while preserving emotional nuance, with a focus on consent-verified, rights-cleared voice models. Recently launched a live text-to-speech API with 200 ms latency.
Headcount: 30+

42 · Riffusion
AI music generation platform using spectrogram-based image-to-audio generation. Users create short "riffs" with vocals by entering lyrics and musical styles. Recently launched a subscription membership with fast generation and community access.
Headcount: 6

43 · Sesame
Natural conversational voice models.

44 · Voiceitt
Voiceitt creates patented automatic speech recognition (ASR) technology designed for people with speech disabilities, aging voices, and accents, using machine learning and a unique database of non-standard speech patterns. Voiceitt is a stand-alone web app supporting communication with people and with technology.
Headcount: 20-50

45 · Intella
Arabic speech recognition and analytics platform specializing in dialect transcription (Egyptian, Gulf, Levantine). Trains models on regional audio from call centres and media archives. Offers real-time transcription, sentiment and intent analysis, voicebots, and agent assist via APIs.

46 · iFlytek Enterprise (讯飞智企)
Enterprise speech recognition and AI assistant platform. Listed on the Shanghai Stock Exchange. Designated part of the "national AI team" by the State Council. Provides NLP, voice, and customer-service AI agents for finance, healthcare, and government.
Valuation: ~$8B+ (market cap, 2024) · ARR: ~$1.2B+ (2024) · Headcount: 5,000+

47 · Signapse
Generative AI sign language translation platform using photo-realistic avatars to translate video, web, and live content into British Sign Language (BSL) and American Sign Language (ASL). Deployed at major transport hubs including LNER train stations (UK) and Cincinnati/Northern Kentucky International Airport.
Valuation: $6.6M · Headcount: ~15

48 · Soundful
AI music generation platform that lets creators produce royalty-free background music across genres such as EDM, hip-hop, and pop. Its AI is ethically trained on licensed and in-house-produced sounds, with exclusive licensing options.