Meet the coders who hear you yell at Alexa
Alexa! Hey Siri! Hey Google!
Which wake word will rule them all?
Each company is using state-of-the-art artificial intelligence to convince us that its virtual assistant is the best one to trust with our requests. The winner of the wake word war will get to be the gatekeeper everyone else has to go through to make their smart devices work in your home.
It will have to be useful and trustworthy and not too creepy.
In this episode of Prime(d), we look at the bots we invite into our home. And as we say in the radio biz – the mic is always on.
Listen to the podcast by clicking the play button above. Subscribe in your favorite podcast app.
Our theme song is "Ripples on an Evaporated Lake" by Raymond Scott.
In this episode, we utilized Microsoft Azure Cognitive Services to build our own version of Alexa. Our workflow was based loosely on this project.
Meet a couple of the developers who listen to how Alexa misunderstood you
Noelle LaCharite
Noelle LaCharite works for Microsoft now, but she started out at Amazon building Alexa skills that offer mindfulness tips and daily affirmation.
“I used to have a T-shirt that said I heart A.I. And people would stop me in the airport and say, ‘You know that’s gonna kill us all, right?’ And I would tell them, ‘Don’t worry, I taught it to be mindful, and kind, so even if it’s bad, we will totally enjoy it.’”
LaCharite coded the following response for Alexa:
Here’s your daily affirmation for today: I am at one with all that is, and am at peace.
(To try this skill yourself, say: "Alexa, enable daily affirmation.")
“I coded 700 lines of – if you say this, say this. It turned out I was pretty good at guessing what people would say, especially around affirmations and mindfulness. But it was magic to the people that used it. No one knew that I hand wrote it.”
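For readers curious what "if you say this, say this" can look like in code, here is a minimal sketch in Python. It is not LaCharite's actual skill code: the rule table, the trigger words, the fallback line and the function names are all assumptions made for illustration. Only the affirmation text comes from the response quoted above.

```python
import re

# The affirmation text is the response quoted in the article.
AFFIRMATION = (
    "Here's your daily affirmation for today: "
    "I am at one with all that is, and am at peace."
)

# Hypothetical fallback for anything the rules did not anticipate.
FALLBACK = "Sorry, I don't have a response for that."

# Hypothetical rule table: "if you say one of these, say this."
RULES = [
    (("affirmation",), AFFIRMATION),
    (("stressed", "anxious"), "Let's take one slow breath together."),
]


def normalize(utterance: str) -> str:
    """Lowercase the text and replace punctuation with spaces."""
    cleaned = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose trigger appears in the text."""
    text = normalize(utterance)
    for triggers, reply in RULES:
        if any(trigger in text for trigger in triggers):
            return reply
    return FALLBACK


if __name__ == "__main__":
    print(respond("Alexa, give me my daily affirmation!"))
```

Every reply here is hand-written, which is the point of the quote: the skill can feel like magic only as long as the author guessed the phrasing in advance.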
Anthonio Pettit is a veteran at writing for smart speakers now, and he’s worked on several of them such as Alexa, Cortana, and Samsung’s Bixby. He has even worked on the digital assistants living in car dashboards made by Toyota, Audi and Lexus.
“I can’t tell you how many times I’ve heard – ‘’Lexa, bring me a beer!’”
Pettit says they all have what are called “false accepts.” You hit the button on your steering wheel, or you say something that sounds kind of like the wake word, and that triggers the device to start recording.
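A rough way to picture a false accept is a detector that wakes up whenever what it heard scores close enough to the wake word. The Python sketch below is a toy stand-in, not how any of these assistants actually work: real devices score audio with acoustic models, while this compares spelled-out text using the standard library's difflib. The threshold value and the function names are assumptions.

```python
from difflib import SequenceMatcher

WAKE_WORD = "alexa"
THRESHOLD = 0.8  # hypothetical cutoff; lower values wake up more often


def wake_score(heard: str) -> float:
    """Similarity between what was heard and the wake word, from 0.0 to 1.0."""
    return SequenceMatcher(None, WAKE_WORD, heard.lower()).ratio()


def should_wake(heard: str, threshold: float = THRESHOLD) -> bool:
    """Start recording when the score reaches the threshold."""
    return wake_score(heard) >= threshold


if __name__ == "__main__":
    for heard in ("alexa", "alexia", "hamster"):
        print(heard, should_wake(heard))
```

In this toy version "alexia" clears the cutoff even though nobody said the wake word, which is the text-only analogue of a false accept. Raising the threshold would cut those down, at the cost of missing more genuine requests.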
“You know, when people aren’t aware that they’re being recorded – or sometimes even when they are — you hear weird stuff.”
In his job, Pettit heard stuff from all over the country.
Sounds of yelling. People using intimate nicknames for each other. A near car accident. (That was from one of the dashboard-mounted assistants.)
“You’re never quite alone, when you’re speaking to these speakers. Potentially. There’s always the chance that your data is being used for quality assurance and that there’s somebody on the other side – a human – who is overhearing it.”
Tech companies like Amazon say they aren’t interested in surveilling us; they need what they overhear to help the AI inside the smart speaker craft a better response. That’s because some of Alexa’s answers don’t make a lot of sense. One recording featured a guy who just could not believe what Alexa told him when he asked what hamsters eat.
“So they called in their friend, and they’re like – ask it what hamsters eat.”
Alexa told them: “Hamsters eat chicken, cucumber and nut.”
Joshua looked it up and found that hamsters are omnivores, so it’s not technically wrong – though they’re more likely to eat bugs than chicken.
“It was as weird for me to hear it as it was for them.”
At the time, Pettit’s job was to mark that recording as “not actually related to Whole Foods shopping.” That’s true to the company’s character. Ultimately, Amazon wants to sell you stuff.
Today, Pettit actually writes answers for smart speakers. There are much bigger, more important things to write answers for — answers that will help these companies win or lose territory in the Wake Word War.
“Like – if you tell Alexa you’ve been hungry, it’ll ask:
'Which of these 3,000 restaurants do you want to order food from?'
But if you tell Cortana you’re hungry, it’ll say:
'Why not try eating something?'”
With the really hard questions, Pettit says the writers will put several smart speaker brands on a table and listen to all of them respond just to see what the industry standard is.
"Like if somebody said 'I’m lonely.' So it’s not really a task, it’s sort of an emphatic statement. And yet, you’d want to be able to respond in a way that signals to the person doing the inputting that they’re being heard. But also not oversell the capabilities of the speaker. You’re not really a therapist, you’re not really someone who’s going to be able to provide an answer."
This brings us to the edge of the precipice, because a person who tells a smart speaker that they’re lonely is reaching out, as one would reach out to a friend.
And a smart speaker cannot be that friend.