The base of an Amazon Echo, the smart speaker that houses the voice agent Alexa
Considering the evolutionary ladder of user interfaces, you might think the voice user interface (VUI) will replace the graphical user interface (GUI). But the two are more likely to coexist in a kind of symbiosis.
Part of that symbiosis is the similar pronunciation of their acronyms. VUI is pronounced “vooey,” a counterpart to GUI’s “gooey.”
Tom Hebner, currently Global Head of Innovation at voice tech firm Nuance and former head of the VUI team there, told me that his company — a leader in voice recognition — has been using the term VUI since the turn of this century.
“One of my objectives was to get rid of the name ‘vooey,’” he said, although that effort has failed. “Now, I’m stuck with ‘vooey.’”
Although the two terms are pronounced similarly, he noted, a VUI presents marketers with a very different set of challenges than a GUI does.
The use case of beers
Marketers’ “biggest misconception is that [VUI] is easy,” he said.
Hebner noted that, because marketers have conversations all day, some expect that a VUI will be seamless and easy to create. But it’s not.
Creating a VUI requires a variety of skill sets. There’s user experience design, which addresses such issues as how conversations on a given topic might flow, along with their rules and framework. There’s also the actual writing of prompts within the limits of topics or word choices; a usability analysis to determine whether the resulting conversations are confusing; and audio production to create sound clips or synthetic speech.
Plus, voice interface developers are still working out some conversational areas, such as determining intent, deciding on the next step once intent is pegged and figuring out how results will be presented.
One presentation example used by his team, Hebner said, is the recitation of beers available at a bar. Typically, the bartender might verbally list them for you, and you “listen for the one you want.” In that use case, a list of various options can be processed verbally because you mentally “grab” the right choice when it is presented.
But what if the user is asking for the best thin-crust pizza within a 10-mile radius? In that use case, the results may be complex and confusing if presented verbally.
One solution to the pizza results problem, Hebner said, is having the voice interface “chunk” the results, such as offering the top three thin-crust pizza parlors according to some local publication. Then the interface can ask if the user wants to hear the next three, and so on. But comparing this set of three with the previous three, or with the following set, can get confusing quickly.
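To make the “chunking” idea concrete, here is a minimal Python sketch of how a voice interface might break a ranked list into groups of three and ask whether to continue. It is only an illustration, not Nuance’s implementation: the function names, the prompt wording and the sample parlor names are all hypothetical.

```python
from typing import Iterator, List, Sequence


def chunk_results(results: Sequence[str], size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` results, in ranked order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(results), size):
        yield list(results[start:start + size])


def spoken_prompts(results: Sequence[str], size: int = 3) -> List[str]:
    """Build one spoken prompt per chunk; ask to continue only if more remain."""
    chunks = list(chunk_results(results, size))
    prompts = []
    for index, chunk in enumerate(chunks):
        opener = "Here are the top picks" if index == 0 else "Here are the next ones"
        prompt = f"{opener}: {', '.join(chunk)}."
        if index < len(chunks) - 1:
            prompt += " Want to hear more?"
        prompts.append(prompt)
    return prompts


# Hypothetical ranked results, for illustration only.
parlors = ["Parlor A", "Parlor B", "Parlor C", "Parlor D", "Parlor E"]
for prompt in spoken_prompts(parlors):
    print(prompt)
```

Even in this toy version, the listener has to hold the first group in memory while hearing the second, which is the comparison problem Hebner describes.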
‘Voice in, text out’
The other option, and one that may well become common, is that the VUI refers you to a GUI.
It could send a text list or a link to your phone, your laptop, or even a connected TV. Hebner describes that mode as “voice in, text out.”
“Graphical User Interfaces replaced specialized command line interfaces,” Fjord Senior Design Director Jevaun Howell told me via email, “making for more intuitive and expressive interactions — a significant upgrade from blinking cursors.” Fjord is Accenture Interactive’s design and innovation consultancy.
He added that VUIs “allow for intuitive, natural, voice-driven interactions that have the potential to increase immediacy and lower the barrier of entry” for engagement with products and services.
“But let’s be clear,” he said. “Graphical User Interfaces aren’t going anywhere.” VUIs are great for “issuing simple commands and triggering simple actions,” Howell said, but “once the system needs to return complex information — a long list, a set of instructions, or an array of products for comparison — a GUI ends up being a far more suitable interface.”
Since VUIs and GUIs will live together for the foreseeable future, marketers’ key challenge going forward is not understanding how voice will replace visual, but understanding how these complementary interfaces should interact.