Deepgram’s Aura AI: Text-to-Speech for Voice AI Agents
Deepgram, a San Francisco, CA-based company, has today launched its text-to-speech model, Deepgram Aura AI. Aura offers a range of realistic, human-like voices with low latency, which developers can access through an API to build conversational AI agents and applications.
“We are thrilled to launch Aura, our text-to-speech API, to the public after seeing the overwhelming demand for our early access product in the fall. Aura is the result of years of research and development by our team of world-class AI scientists and engineers, who have leveraged the latest advances in deep learning and GPU technology to create a state-of-the-art TTS solution that outperforms anything else on the market,” said Scott Stephenson, CEO and co-founder of Deepgram.
In December 2023, Deepgram announced Aura, saying that it would soon release a faster and more efficient text-to-speech AI model. At that time, however, no one expected Deepgram to announce Aura AI for general use.
Aura AI: New Voice Platform
With Aura, Deepgram has introduced a complete set of APIs that developers can use to create powerful voice AI platforms. A conversational interaction is handled by three main components: Listen, Think, and Speak.
Listen: According to Deepgram, a perceptive AI model accurately transcribes the audio into text, which helps the system understand what the user is asking.
Think: Abstractive LLMs interpret the transcribed request, which helps the system understand it and retrieve the relevant information.
Speak: Aura then voices the LLM's response in one of its human-like voices, so that users feel as if they are speaking to another person.
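The Listen, Think, and Speak stages can be pictured as a simple chain in which each stage feeds the next. The following is a minimal Python sketch of that flow, not Deepgram's implementation: `run_turn` and the three stand-in stage functions are hypothetical names invented for illustration, and a real system would replace them with calls to speech-to-text, LLM, and text-to-speech services.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    listen: Callable[[bytes], str],
    think: Callable[[str], str],
    speak: Callable[[str], bytes],
) -> bytes:
    """Handle one conversational turn: audio in, audio out."""
    transcript = listen(audio)  # Listen: speech-to-text
    reply_text = think(transcript)  # Think: an LLM drafts the reply
    return speak(reply_text)  # Speak: text-to-speech


# Hypothetical stand-in stages so the sketch runs without any network access.
def fake_listen(audio: bytes) -> str:
    # Pretend the audio bytes are already the words that were spoken.
    return audio.decode("utf-8")


def fake_think(text: str) -> str:
    # Pretend the LLM simply echoes the request back.
    return f"You said: {text}"


def fake_speak(text: str) -> bytes:
    # Pretend the synthesized audio is just the encoded reply text.
    return text.encode("utf-8")


result = run_turn(b"hello", fake_listen, fake_think, fake_speak)
```

Passing the stages in as functions keeps the turn logic independent of any particular vendor, so a single stage can be swapped without touching the other two.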
“Aura is already being used in production by several of our customers,” said Josh Fox in his blog post.
Deepgram’s Aura AI Opens for General Use
Aura is now available to everyone who has a Deepgram API key. Developers can now test the API to build more interactive text-to-speech applications.
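For developers who want to try the API with their key, a request might be prepared as in the sketch below. The article does not give the endpoint, the model name, or the header format, so all three are assumptions here (the `/v1/speak` URL, the `aura-asteria-en` model, and the `Token` authorization scheme) and should be checked against Deepgram's current documentation. The helper `build_speak_request` is a hypothetical name, and the code only builds the request without sending it.

```python
import json
import urllib.request

# Assumption: endpoint path, not stated in the article.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(
    api_key: str,
    text: str,
    model: str = "aura-asteria-en",  # assumption: example voice/model name
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech HTTP request."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SPEAK_URL}?model={model}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumption: auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speak_request("YOUR_DEEPGRAM_API_KEY", "Hello from Aura.")

# To send it, with a real key in place of the placeholder:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any usage is billed.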
“Customers will initially have a choice amongst 12 English-speaking voices (7 male, 5 female) with additional voices planned for future releases. All of our voices are trained on high-quality conversational datasets and have average response times below 250 ms for typical dialogue sequences. Aura will follow Deepgram’s standard usage-based pricing scheme and starts at just $0.015/1K characters,” said Josh Fox.
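To put the quoted starting price in perspective, the cost of a piece of text can be estimated from its character count. This is a rough sketch that assumes billing is strictly proportional to characters at $0.015 per 1,000, with no minimum charge, rounding rule, or volume tier; actual invoices may differ. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Starting price quoted in the article: $0.015 per 1,000 characters.
PRICE_PER_1K_CHARS = Decimal("0.015")


def estimate_cost(text: str) -> Decimal:
    """Estimate the synthesis cost in US dollars, assuming linear pricing."""
    return Decimal(len(text)) / Decimal(1000) * PRICE_PER_1K_CHARS


# A 2,000-character reply costs about three cents under this assumption.
cost = estimate_cost("x" * 2000)
```

`Decimal` is used instead of floating point so that small per-character amounts add up without binary rounding error.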
The company claims that Aura can replicate human dialogue and can also produce hesitation sounds such as “uh” and “um,” along with emotion in its tone, to make the conversation more interactive.
The company said that some of its partners already use Deepgram’s Aura AI in their own products. One of those partners, Humach, has shared feedback on using Aura.
Tim Houlne, CEO at Humach, said: “When we switched from a cloud vendor’s transcription service to Deepgram’s Nova-2, we saw a notable leap in transcription accuracy and responsiveness. Now, with Aura’s text-to-speech, we’re achieving speeds 2-5 times faster than competitors, while delivering the voice quality and latency needed for low handle times and first-call resolutions. Deepgram’s robust infrastructure serves high-quality, reliable models that excel in supporting our seasonal traffic for retail, utility, travel, and healthcare use cases.”
The company is now inviting developers to use Aura for better speed and performance at a relatively low price.