Amazon is rushing to transplant Alexa’s ‘brain’ using generative artificial intelligence
Amazon is preparing to relaunch its Alexa voice assistant as an artificial intelligence “agent” that can perform practical tasks, as the tech group races to resolve the challenges that have dogged its AI overhaul of the system.
The $2.4 trillion company has spent the past two years working to redesign Alexa, its conversational system embedded in 500 million consumer devices worldwide, transplanting the software’s “brain” with generative artificial intelligence.
Rohit Prasad, who leads the Artificial General Intelligence (AGI) team at Amazon, told the Financial Times that the voice assistant still has several technical hurdles to overcome before it can be rolled out.
These include addressing “hallucinations”, or fabricated responses, as well as response speed, or “latency”, and reliability. “Hallucinations have to be close to zero,” Prasad said. “It’s still an open issue in the industry, but we’re working extremely hard on it.”
Amazon’s leaders envision transforming Alexa, which is still used mainly for a narrow set of simple tasks such as playing music and setting alarms, into an “agent” that acts as a personalized concierge. That could include anything from suggesting restaurants to configuring the bedroom lights based on a person’s sleep cycle.
A redesign of Alexa has been under way since the launch of OpenAI’s Microsoft-backed ChatGPT in late 2022. While Microsoft, Google, Meta and others have quickly built generative AI into their computing platforms and improved their software services, critics have questioned whether Amazon can solve its technical and organizational problems in time to compete with its rivals.
According to multiple employees who have worked on Amazon’s voice assistant teams in recent years, the effort has been fraught with complications and follows years of AI research and development.
Several former employees said the long wait for the rollout was largely due to the unexpected difficulty of switching from, and combining, the simpler, predefined algorithms on which Alexa was built with more powerful but unpredictable large language models.
In response, Amazon said it is “working hard to enable even more proactive and capable assistance” from its voice assistant. It added that a technical implementation at this scale, into a live service and a suite of devices used by customers around the world, was unprecedented, and not as simple as overlaying an LLM onto the Alexa service.
Prasad, the former chief architect of Alexa, said last month’s release of the company’s internal Amazon Nova models — led by his AGI team — was motivated in part by the specific needs for optimal speed, cost and reliability to help AI applications like Alexa “get to the last mile, which is really difficult”.
To work as an agent, Alexa’s “brain” must be able to call hundreds of third-party software and services, Prasad said.
“Sometimes we underestimate how many services are integrated into Alexa, and it’s a huge number. These apps get billions of requests per week, so when you’re trying to make reliable actions fast . . . you have to be able to do it in a very cost-effective way,” he added.
The complexity stems from the fact that Alexa users expect fast responses as well as extremely high levels of accuracy. Such qualities are at odds with the inherently probabilistic nature of today’s generative AI, statistical software that predicts words based on speech and language patterns.
Some former employees also point to struggles to preserve the assistant’s original attributes, including its consistency and functionality, while imbuing it with new generative features such as creativity and free dialogue.
Because of the more personalized, chatty nature of LLMs, the company also plans to hire experts to shape the AI’s personality, voice and diction so that it remains familiar to Alexa users, according to one person familiar with the matter.
One former senior member of the Alexa team said that while LLMs are very sophisticated, they come with risks, such as giving answers that are “sometimes completely made up”.
“At the scale Amazon operates, that could happen a large number of times a day,” they said, damaging its brand and reputation.
In June, Mihail Eric, a former machine learning scientist at Alexa and a founding member of its “conversational modeling team”, said publicly that Amazon had “dropped the ball” on becoming “the unequivocal market leader in conversational AI” with Alexa.
Eric said that despite strong scientific talent and “enormous” financial resources, the company was “riddled with technical and bureaucratic problems”, suggesting that “data was poorly labelled” and that “documentation was either non-existent or out of date”.
According to two former employees who worked on Alexa-related AI, the legacy technology underlying the voice assistant was inflexible and difficult to change quickly, burdened by a clunky and disorganized code base and an engineering team that was “spread too thin”.
The original Alexa software, built on top of technology acquired from British start-up Evi in 2012, was a question-answering machine that worked by searching within a defined universe of facts to find the right answer, such as the weather forecast for the day or a particular song in your music library.
The new Alexa uses a number of different AI models to recognize and translate voice queries and generate responses, as well as to flag rule violations, such as inappropriate responses and hallucinations. Building software to translate between the legacy systems and the new AI models has been a major hurdle in the Alexa-LLM integration.
The models include Amazon’s own in-house software, including the latest Nova models, as well as Claude, an AI model from start-up Anthropic, in which Amazon has invested more than $8 billion over the past 18 months.
“The most challenging thing about AI agents is making sure they are safe, reliable and predictable,” Anthropic chief executive Dario Amodei told the FT last year.
Agent-like artificial intelligence software needs to get to a point “where . . . people can actually trust the system,” he added. “When we get to that point, we’ll release these systems.”
One current employee said additional steps are needed, such as rolling out child safety filters and testing custom integrations with Alexa, such as smart lights and the Ring doorbell.
“Reliability is an issue — getting it to work almost 100 percent of the time,” the employee added. “That’s why you see us . . . or Apple or Google, shipping slowly and incrementally.”
A number of third parties developing “skills”, or features, for Alexa said they were unsure when the new generative AI-enabled assistant would be introduced, or how to create new functions for it.
“We are waiting for details and understanding,” said Thomas Lindgren, co-founder of Swedish content developer Wanderword. “When we started working with them, they were much more open . . . then they changed over time.”
Another partner said that after an initial period of “pressure” on developers by Amazon to start preparing for the next generation of Alexa, things quieted down.
An ongoing challenge for Amazon’s Alexa team — which was hit by major layoffs in 2023 — is how to make money. Figuring out how to make assistants cheap enough to operate at scale will be a big task, said Jared Roesch, co-founder of generative AI group OctoAI.
Options being discussed include creating a new Alexa subscription service, or taking a cut of sales of goods and services, a former Alexa employee said.
Prasad said Amazon’s goal was to create a variety of AI models that could act as “building blocks” for applications beyond Alexa.
“What we always anchor on is customers and practical AI; we don’t do science for science’s sake,” Prasad said. “We’re doing this to . . . deliver value and impact to customers, which in this era of generative AI is more important than ever, as customers want to see a return on investment.”