From ‘witnessing’ a murder to allowing six-year-olds to place Amazon orders, voice-activated devices have generated unexpected situations. Undaunted, the sector continues to boom, and Nokia has launched MIKA, dedicated to making telcos more efficient.
Nokia’s MIKA (for Multi-purpose Intuitive Knowledge Assistant) offers voice-based access to information, which it says will give highly skilled engineers faster access to critical info and recommendations. The company reckons it could save them up to an hour a day.
MIKA is underpinned by the Nokia AVA cognitive services platform and “combines augmented intelligence with automated learning to provide access to a range of tools, documents and data sources,” according to the company. They include the Nokia AVA knowledge library, a repository of best practice gathered from Nokia projects around the world.
Time will tell how helpful MIKA turns out to be, but clearly voice-activated interfaces and devices will increasingly be the way humans interact with machines. What lessons can we deduce from our experiences so far?
Put the available safeguards in place for everyone’s sake – there are reports of an Amazon device having a dodgy ‘conversation’ with a small boy after misinterpreting what he said.
On a lighter note, in early January, a six-year-old in Dallas asked her family’s Amazon Echo Dot smart speaker for a doll’s house and several pounds of cookies. Alexa, Amazon’s artificial assistant, helpfully ordered them for her, resulting in a surprise package for her parents, costing $160.
Even more amusingly, a TV station in San Diego picked up the story, and when one of the anchors said, “I love the little girl, saying ‘Alexa ordered me a dollhouse’”, Amazon devices in several San Diego homes tried to place orders for dolls’ houses too, activated by the word Alexa.
There is an off button – Alexa could be witness to a murder – an offbeat reminder that whatever you’re up to at home, the chance that something is listening will become stronger as voice-activated interfaces become more pervasive. Indeed, as we reported, Hello Barbie was launched as an always-on device, designed to listen to – privacy groups say snoop on – children, the better to understand and sell to them. You can watch Barbie listening here.
Hearing isn’t understanding – we’ve already mentioned the inappropriate conversation with a little lad, but there are lots of challenges around accents and cultural use of language too. My German brother-in-law (who spoke very good English) took a while to recognize that when someone in the English half of his family said, “Would you like to do X”, sometimes they weren’t asking his opinion or preference; rather it was a request for help or a veiled command (depending on who said it, in what context). Any native British English speaker would get that, but how would a machine?
And of course, there are hundreds of languages apart from English, which might be the most widely spoken language but is not the one with the most native speakers – and the same effort* has to be replicated for each language.
An article in The Conversation (appropriately enough) summed up progress nicely:
Over the past few years, voice interfaces have become much better at understanding everyday or ‘natural’ speech rather than only stilted and carefully worded commands. They are still better at handling simple queries, like ‘who’s playing in the Australian Open?’, and tend to struggle with more complicated requests, like ‘who’s playing in the Australian Open for the first time this year?’, and follow-up questions, like ‘will it rain during the finals?’.
Despite all this, Gartner still reckons that by next year, 30 percent of our (those of us in developed economies in the English-speaking world, presumably) interactions with technology will be through ‘conversations’ with smart machines.
Hence, interfaces and devices are optimized to deliver certain outcomes – Alexa for ordering stuff from Amazon, Apple’s Siri for search – while “Google Now is good at giving relevant responses to a wide range of requests because it benefits from Google’s troves of data about the web, and your personal activities, if you use Google services,” according to The Conversation piece. As users, we need to remember that.
Closely coupled with the point above is being realistic: you’re not having a conversation with someone who knows you well, but with a machine designed to produce particular outcomes. A small but interesting study by Microsoft found that people who persevere with voice interfaces tend to start off with low expectations and are patient – and many machine interfaces ‘learn’ and improve.
*Remember, again from The Conversation article, it’s an interface and this is what it has to go through to carry out the simplest request:
- ‘hear’ your voice, and distinguish it from background noise
- figure out where each word begins and ends, ignoring ums and ahs
- match the sound of each word to a word in the dictionary, picking the right one from context if there are words that sound alike but have different meanings
- interpret the meaning of the whole sentence
- generate a meaningful and useful response that matches your request.
And let’s face it, a lot of the time, you don’t get that from your nearest and dearest!
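The steps above can be sketched as a toy pipeline. Everything in this snippet is invented for illustration – the filler list, the homophone rule and the intent table are hand-written stand-ins for the trained statistical models a real assistant would use, and the first step (separating your voice from background noise) is skipped by starting from already-transcribed tokens.

```python
# Toy sketch of the pipeline steps above. All tables and rules here are
# illustrative inventions, not how any real assistant works internally.

FILLERS = {"um", "uh", "er", "ah"}  # step 2: ums and ahs to ignore

# Step 3: an ambiguous sound resolved by a surrounding context word
# (purely illustrative rule).
HOMOPHONES = {
    "flour/flower": lambda ctx: "flour" if "bake" in ctx else "flower",
}

# Steps 4-5: keywords that identify the request, and a canned response
# (again, invented examples drawn from the article).
INTENTS = {
    frozenset({"order", "dollhouse"}): "Adding a dollhouse to your basket.",
    frozenset({"rain", "finals"}): "No rain is forecast for the finals.",
}

def understand(tokens):
    """Turn transcribed tokens into a response, stage by stage."""
    # Step 2: drop the ums and ahs (step 1, the audio, is assumed done).
    words = [t.lower() for t in tokens if t.lower() not in FILLERS]
    # Step 3: pick the right word from context where sounds collide.
    ctx = set(words)
    words = [HOMOPHONES[w](ctx) if w in HOMOPHONES else w for w in words]
    # Steps 4-5: interpret the whole sentence and generate a response.
    for keywords, reply in INTENTS.items():
        if keywords <= set(words):
            return reply
    return "Sorry, I didn't catch that."

print(understand(["um", "Alexa", "order", "me", "a", "dollhouse"]))
# prints "Adding a dollhouse to your basket."
```

Even this caricature shows why the simple queries in The Conversation’s examples work while follow-ups fail: nothing here carries context from one request to the next.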