The Secret Origins of Amazon’s Alexa (2024)

This story is adapted from Amazon Unbound: Jeff Bezos and the Invention of a Global Empire, by Brad Stone.

Jeff Bezos first sketched out the device that would become the Amazon Echo on a conference room whiteboard in early 2011. He wanted it to cost $20 and be controlled entirely by voice. Its brains would live in the cloud, exploiting the company’s Web Services offerings and allowing Amazon to constantly improve it without requiring owners to upgrade their hardware.

The first-ever depiction of a device with Alexa—the artificially intelligent virtual assistant that Bezos would name after the ancient library of Alexandria—showed the speaker, a microphone, and a mute button. It wouldn’t be able to understand commands right out of the box, so the sketch identified the act of configuring the device to a wireless network as a challenge requiring further thought.

Greg Hart, who was Bezos’ technical adviser, or “TA,” at the time, was the other person in the meeting, and he was listening closely. Bezos said he wanted Hart to lead the group that would turn this somewhat outlandish notion for a voice computer into an actual product. Hart snapped a photo of the drawing with his phone.

“Jeff, I don’t have any experience in hardware, and the largest software team I’ve led is only about 40 people,” he recalls saying.

“You’ll do fine,” Bezos replied.

Hart thanked him for the vote of confidence and said, “OK, well, remember that when we screw up along the way.”

Jeff Bezos first sketched an Alexa device on a whiteboard in 2011.

Courtesy of Amazon

For the next three years, Bezos would remain intimately involved in the project. He authorized the investment of hundreds of millions of dollars before the first Echo was ever released, made detailed product decisions, and met with the team as frequently as every other day. Using the German superlative, employees referred to him as the über product manager.

But it was Hart who ran the effort, just across the street from Bezos’ office, in a building that housed the team working on the Kindle. Over the next few months, Hart hired a small group from inside and outside the company. Like his boss, he was obsessed with secrecy. He sent out vague emails to prospective hires with the subject line “Join my mission” and asked interview questions like “How would you design a Kindle for the blind?” He declined to specify what product candidates would be working on. One interviewee recalls guessing that it was Amazon’s widely rumored smartphone and says that Hart replied, “There’s another team building a phone. But this is way more interesting.”

The initial Alexa crew worked with a feverish sense of urgency. Unrealistically, Bezos wanted to release the device in six to 12 months. He would have a good reason to hurry. On October 4, 2011, just as the Alexa team was coming together, Apple introduced the Siri virtual assistant in the iPhone 4S. It was the last passion project of cofounder Steve Jobs, who died of cancer the next day. Hart and his team felt validated by the news that a resurgent Apple was also working on a voice-activated personal assistant, but they were discouraged by the fact that Siri was first to market and initially garnered some negative reviews.

The Amazon team tried to reassure themselves that their product was unique, since it would be independent from smartphones. They were also attempting to pull off a much more technically complex feat. Siri’s users spoke commands directly into microphones. Amazon was trying to build a service capable of understanding language spoken from across a noisy room, using a relatively immature technology called far-field speech recognition.

To speed up development, Hart and his crew went looking for startups to acquire. It was a nontrivial challenge, since Nuance, the Boston-based speech giant whose technology Apple had licensed for Siri (and which was recently acquired by Microsoft), had grown over the years by gobbling up the top American speech companies. Alexa execs tried to learn which of the remaining startups were promising by asking prospective targets to voice-enable the Kindle digital book catalog, then studying their methods and results. The search led to several rapid-fire acquisitions over the next two years, including the Polish startup Ivona.

Ivona was founded in 2001 by Lukasz Osowski, a computer science student at the Gdańsk University of Technology. Osowski had the notion that so-called text-to-speech, or TTS, could read digital texts aloud in a natural voice and help the visually impaired in Poland. With a younger classmate, Michal Kaszczuk, he took recordings of an actor’s voice and selected fragments of words, called diphones, and then blended or “concatenated” them together in different combinations to approximate natural-sounding words and sentences that the actor might never have uttered.
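The concatenation idea can be illustrated with a toy sketch. This is not Ivona's code: the phoneme symbols, the `unit_db` dictionary, and the integer "samples" below are all hypothetical stand-ins, and a real system would also smooth the joins between fragments. It only shows how fragments cut from a few recorded words can be recombined into a word that was never recorded.

```python
# Toy illustration of concatenative synthesis; every name and value is hypothetical.

def to_diphones(phonemes):
    """Pair each phoneme with its successor, padding with silence ('_') at both ends."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def synthesize(phonemes, unit_db):
    """Concatenate the stored fragment for each diphone into one sample list."""
    samples = []
    for unit in to_diphones(phonemes):
        if unit not in unit_db:
            raise KeyError(f"no recording for diphone {unit!r}")
        samples.extend(unit_db[unit])
    return samples


# Stand-in "recordings": short integer lists instead of real audio.
# Imagine the first row was cut from a recording of "cat" and the second from "tack".
unit_db = {
    "_-k": [0, 1], "k-a": [2, 3], "a-t": [4, 5], "t-_": [6, 0],
    "_-t": [0, 7], "t-a": [8, 9], "a-k": [10, 11], "k-_": [12, 0],
}

# "tat" was never recorded, but every diphone it needs is in the database.
tat = synthesize(["t", "a", "t"], unit_db)
print(to_diphones(["t", "a", "t"]))
print(tat)
```

The new word borrows its opening from "tack" and its ending from "cat", which is the sense in which the synthetic voice can say things the actor never uttered.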

The Ivona founders got an early glimpse of how powerful their technology could be when they paid a popular Polish actor named Jacek Labijak to record hours of speech to create a database of sounds. The resulting product, which they called Spiker, quickly became the top-selling computer voice in Poland. Over the next few years, it was used widely in subways, elevators, and for robocall campaigns. Labijak subsequently began to hear himself everywhere, and regularly received phone calls in his own voice urging him, for example, to vote for a candidate in an upcoming election. Pranksters manipulated the software to have him say inappropriate things and posted the clips online, where his children discovered them. The Ivona founders then had to renegotiate the actor’s contract after he angrily tried to withdraw his voice from the software. (Today “Jacek” remains one of the Polish voices offered by AWS’ Amazon Polly computer voice service.)

In 2006, Ivona began to enter and repeatedly win the annual Blizzard Challenge, a competition for the most natural computer voice, organized by Carnegie Mellon University. By 2012, Ivona had expanded into 20 other languages and offered more than 40 voices. Hart and Al Lindsay, the first engineering manager on the project, visited them in Gdańsk on a trip they were taking through Europe to look for acquisition targets. “From the minute we walked into their offices, we knew it was a culture fit,” Lindsay says, pointing to Ivona’s progress in a field where researchers often get distracted by high-minded pursuits and have a difficult time shipping actual products. “Their scrappiness allowed them to look outside pure academia and not be blinded by science.”

The purchase, for around $30 million, was completed in 2012 but kept secret for a year. The Ivona team and the growing number of speech engineers Amazon would hire for its new Gdańsk R&D center were put in charge of crafting Alexa’s voice. The program was micromanaged by Bezos himself and subject to the CEO’s usual curiosities and whims.

At first, Bezos said he wanted dozens of distinct voices to emanate from the device, each associated with a different goal or task, such as listening to music or booking a flight. When that proved impractical, the team considered lists of characteristics they wanted in a single personality, such as trustworthiness, empathy, and warmth, and determined those traits were more commonly associated with a female voice.

To develop this voice and ensure it had no trace of a regional accent, the team in Poland worked with an Atlanta-area-based voice-over studio, GM Voices, the same outfit that had helped turn recordings from a voice actress named Susan Bennett into Apple’s agent, Siri. To create synthetic personalities for its customers, GM Voices gives voice actors hundreds of hours of text to read, from entire books to random articles, a mind-numbing process that could stretch on for months.

Believing that the selection of the right voice for Alexa was critical, Hart and colleagues spent months reviewing the recordings of various candidates that GM Voices produced for the project, and they presented the top picks to Bezos. The Amazon team ranked the best ones, asked for additional samples, and finally made a choice. Bezos signed off on it. Characteristically secretive, Amazon has never revealed the name of the voice artist behind Alexa. I learned her identity after canvassing the professional voice-over community: voice actress and singer Nina Rolle, who is based in Boulder, Colorado. Her professional website contains links to old radio ads for products such as Mott’s Apple Juice and the Volkswagen Passat—and the warm timbre of Alexa’s voice is unmistakable. Rolle said she wasn’t allowed to talk to me when I reached her on the phone in February 2021. When I asked Amazon to speak with her, they declined.

Alexa now had a voice, but it soon became clear that she needed a new brain. In early 2013, Amazon began moving a prototype of the original Echo into the homes of hundreds of employees, who were asked to sign confidentiality agreements and fill out surveys about their experiences with the product.

The experimental devices were, by all accounts, slow and dumb. Perhaps the most harrowing review came from Bezos himself. The CEO was apparently testing a unit in his Seattle home, and in a fit of frustration over its lack of comprehension, he told Alexa to go “shoot yourself in the head.” One of the engineers who heard the comment while reviewing interactions with the test device said, “We all thought it might be the end of the project, or at least the end of a few of us at Amazon.”

In the months that followed, Amazon’s ongoing efforts to make its product smarter would become embroiled in a battle between dueling AI dogmas and would lead to its biggest challenge yet.

Thanks to the acquisition of an artificial intelligence company in Cambridge, England, called Evi, Alexa was already proficient in the culturally common chitchat called phatic speech. If a user said to the device, “Alexa, good morning, how are you?” Alexa could make the right connection and respond. It could also handle factual queries, such as requests to name the planets in the solar system. These qualities, the result of a programming technique called knowledge graphs, gave the impression that Alexa was smart. But was it? Proponents of another method of natural language understanding, called deep learning, believed that Evi’s method was too regimented to give Alexa the kind of authentic intelligence that would satisfy Bezos’ dream of a versatile assistant that could talk to users and answer any question. If a user said, “Play music by Sting,” for instance, they feared a knowledge-graph-based system could think the user was trying to say “bye” to the artist and get confused.

In the deep learning method, machines were fed large amounts of data about how people converse and what responses proved satisfying, and then were programmed to train themselves to offer the best answers. In other words, the more Alexa was used, the smarter it would get.

The chief proponent of this approach was an Indian-born engineer named Rohit Prasad. Prasad and his colleagues had to solve the paradox that confronts all companies developing AI: If they launch a system that is dumb, customers won’t use it, and therefore won’t generate enough data to improve the service. But companies need that data to train the system to make it smarter. Google and Apple solved the paradox in part by licensing technology from Nuance, using its results to train their own speech models and then afterward cutting ties with the company. For years, Google also collected speech data from a toll-free directory assistance line, 800-Goog-411. Amazon had no such services it could mine, and Hart was against licensing outside technology—he thought it would limit the company’s flexibility in the long run. But the meager training data from beta tests in employees’ homes amounted to speech from a few hundred white-collar workers, usually uttered from across a noisy room in the mornings and evenings when they weren’t at the office. The data was lousy, and there wasn’t enough of it.

Rohit Prasad is the head scientist of Alexa Artificial Intelligence at Amazon.

Photograph: Joe Buglewicz/Bloomberg/Getty Images

Meanwhile Bezos grew impatient. “How will we even know when this product is good?” he kept asking. Hart, Prasad, and their team created graphs that projected how Alexa would improve as data collection progressed. The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent increase in Alexa’s accuracy.
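The doubling rule described above implies a logarithmic relationship between data volume and accuracy, which a short sketch can make concrete. The function names are hypothetical, and the code assumes the rule holds exactly at every scale and that "3 percent" means three percentage points per doubling. The article gives only the rule of thumb, so these figures illustrate the arithmetic and are not the team's actual projections.

```python
import math

GAIN_PER_DOUBLING = 3.0  # accuracy gain per doubling of data, as reported in the article


def projected_gain(data_multiplier):
    """Gain implied by growing the data set by `data_multiplier`,
    assuming (hypothetically) that the doubling rule holds at every scale."""
    return GAIN_PER_DOUBLING * math.log2(data_multiplier)


def multiplier_needed(gain):
    """How many times more data a given accuracy gain would require under the same rule."""
    return 2 ** (gain / GAIN_PER_DOUBLING)


print(projected_gain(2))      # one doubling
print(projected_gain(8))      # three doublings
print(multiplier_needed(6))   # data growth needed for two steps of improvement
```

Because each step costs twice as much data as the one before, the required data grows exponentially with the target accuracy, which helps explain why a few hundred employee testers could not supply enough.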

That spring, only a few weeks after Prasad had joined the company, the team brought a six-page narrative to Bezos that laid out these facts, and they proposed to double the size of the speech science team and postpone a planned launch from the summer into the fall. The meeting did not go well. “You are going about this the wrong way,” Bezos said after reading about the delay, according to someone who was present. “First tell me what would be a magical product, then tell me how to get there.”

Bezos’ technical adviser at the time, Dilip Kumar, then asked if the company had enough data. Prasad, who was calling into the meeting from Cambridge, replied that they would need thousands more hours of complex, far-field voice commands. According to an executive who was in the room, Bezos apparently factored in the request to increase the number of speech scientists and did the calculation in his head in a few seconds. “Let me get this straight. You are telling me that for your big request to make this product successful, instead of it taking 40 years, it will only take us 20?”

Prasad tried to dance around it. “Jeff, that is not how we think about it.”

“Show me where my math is wrong!” Bezos said, according to a person who was in the room. Hart jumped in. “Hang on, Jeff, we hear you, we got it.” Prasad and other Amazon executives would remember that meeting, and the other tough interactions with Bezos during the development of Alexa, differently. But according to a person who was there, the CEO stood up and said, “You guys aren’t serious about making this product,” and abruptly ended the meeting.

After Jeff Bezos walked out on them, the Alexa executives working on the prototype retreated with their wounded pride to a nearby conference room and reconsidered their solution to the data paradox. Their boss was right. Internal testing and training with Amazon employees was too limited. They would need to massively expand the Alexa beta while somehow still keeping it a secret from the outside world.

The resulting program would put the Alexa effort on steroids and answer a question that later vexed speech experts: How did Amazon come out of nowhere to leapfrog Google and Apple in the race to build a speech-enabled virtual assistant?

To execute its plan, internally called AMPED, Amazon contracted with an Australian data collection firm called Appen and went on the road with Alexa, in disguise. Starting in Boston, Appen rented homes and apartments, and then Amazon littered several rooms with all kinds of “decoy” devices: pedestal microphones, Xbox gaming consoles, televisions, and tablets. There were also some 20 Alexa devices planted around the rooms at different heights, each shrouded in an acoustic fabric that hid them from view but allowed sound to pass through. Appen then contracted with a temp agency, and a stream of contract workers filtered through the properties, eight hours a day, six days a week, reading scripts from an iPad with canned lines and open-ended prompts like “Ask to play your favorite tune” and “Ask anything you’d like an assistant to do.”

The speakers were turned off, so the Alexas didn’t make a peep, but the seven microphones on each device captured everything and streamed the audio to Amazon’s servers. Then another army of workers manually reviewed the recordings and annotated the transcripts, classifying queries that might stump a machine, like “turn on Hunger Games,” as a request to play a movie, so that the next time Alexa would know. The Boston test showed promise, so Amazon expanded the program, renting more homes and apartments in Seattle and 10 other cities over the next six months to capture the voices and speech patterns of thousands more paid talkers. It was a mushroom-cloud explosion of data about device placement, acoustic environments, background noise, regional accents, and all the gloriously random ways a human being might phrase a simple request to hear the weather, for example, or play a Justin Timberlake hit.

The constant flood of random people into homes and apartments repeatedly provoked suspicious neighbors to call the police. In one instance, a resident of a Boston condo complex suspected a drug-dealing or prostitution ring was next door and called the cops, who asked to enter the apartment. The nervous staff gave them an evasive explanation and a tour and afterward hastily shut down the site. Occasionally, temp workers would show up, consider the bizarre script and vagueness of the entire affair, and simply refuse to participate. One Amazon employee who was annotating transcripts later recalled hearing a temp worker interrupt a session and whisper to whoever he suspected was listening: “This is so dumb. The company behind this should be embarrassed!”

Amazon was anything but embarrassed. By 2014 it had increased its store of speech data by a factor of 10,000 and largely closed the data gap with rivals like Apple and Google. Bezos was giddy. Alexa was being fed the equivalent of a brain-boosting superfood. By the fall, it was ready for launch.

The introduction of the Amazon Echo on November 6, 2014, was molded by the failure of the company’s Fire Phone only months before it. There was no press conference or visionary speech by Bezos—he was seemingly done forever with his half-hearted impression of the late Steve Jobs, who had unveiled new products with such verve. Instead, Bezos appeared more comfortable with a new, understated approach: The team announced the Echo with a press release and two-minute explanatory video on YouTube that showed a family cheerfully talking to Alexa. Amazon execs did not tout the new device as a fully conversational computer, but they carefully highlighted several domains where they were confident it was useful, such as delivering the news and weather, setting timers, creating shopping lists, and playing music.

Then they asked customers to join a waiting list to buy an Echo and reviewed the list carefully, considering factors like whether applicants were users of Amazon Music and owned a Kindle. Recognizing that it was an untested market, they also ordered an initial batch of only 80,000 devices, compared to a preliminary order of more than 300,000 Fire Phones, and distributed them gradually over the next few months. “The Fire Phone certainly made folks a little cautious,” says Hart. “It led us to revisit everything.”

More than one Alexa veteran suspected that the Amazon Echo might leave another smoking crater in the consumer technology landscape, right next to the Fire Phone’s. On launch day, they huddled over their laptops in a “war room” to watch as the waiting list swelled past even their most hyperbolic projections. It turned out that the notion of a cloud-connected computer that listens and responds from across a room was just as tantalizing and novel as Jeff Bezos had hoped it would be when he first sketched it out on that conference room whiteboard, nearly four years before.

In the midst of the vigil, someone realized they were letting a significant accomplishment slide by unappreciated. So a hundred or so employees headed to a nearby bar for a long-awaited celebration, and a few of the longtime executives and engineers on the project closed it down that night.

From AMAZON UNBOUND: Jeff Bezos and the Invention of a Global Empire, by Brad Stone. Copyright © 2021 by Brad Stone. Reprinted by permission of Simon & Schuster, Inc.
