Emergence of Voice UI for Enterprise Software

If you have seen the movie Iron Man, you already know Jarvis, Tony Stark’s virtual AI assistant. Jarvis is the secret sauce that keeps Tony’s business, personal, and superhero lives running seamlessly. The interaction with Jarvis is natural, a blend of the best of a human and a computer assistant. It is every sci-fi fan’s dream to have a Jarvis-like assistant.

With the rapid proliferation of voice-controlled smart assistants, led by Amazon Echo (Dot, Show and other variants) and Google Home, and rapid advances in technology, everyone may soon have a Jarvis of their own. Since 2015, voice-control technologies have seen disruptive growth in the form of voice-enabled speakers and digital assistants. This is happening first in the home with consumer applications, with business applications as the next logical step. 2019 might well be the year when the use of voice for enterprise tech takes off.

Voice will become the future UI of enterprise systems such as the CRM, HRMS and so on. Voice-control technologies are increasingly able to understand, interpret, and answer conversations like a professional. That makes speech a very good input device. Voice is intuitive, fast and allows nuances in language that text can’t replicate. The technology is not yet where it needs to be, but it is getting there rapidly.

Most bots today are text-based, but enterprise-grade smart assistants combined with voice activation, like the Outwork Sales Intelligence Bot, can disrupt the way you work. From a business point of view, voice-controlled bots humanize enterprises and add a layer of personalization. We have already seen this in the consumer world with the popularity of Amazon Echo, Siri, and systems with Microsoft’s Cortana.

Voice-activated smart assistants in the workplace enable you to multi-task: you don’t need to physically interact with or see the devices you’re using. Major companies are backing this trend. Salesforce Einstein (launched in the past year) and Oracle Digital Assistant (launched in October 2018) are some prominent examples.

The spurt in the adoption of voice-control technologies, smart speakers and so on, also referred to as voice user interfaces (VUIs) or Voice-Powered UIs, is not merely a trend but a disruption.

What Is Driving Voice-Powered UIs

Speech Processing Technologies

Speech processing technologies such as automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) have reached high accuracy levels and keep improving. They are now widely available in real time and at scale. When these technologies join forces with AI that helps recognize, learn, and remember patterns and preferences, VUIs turn intelligent.


The hardware has progressed too. Technologies such as far-field voice input processing (FFVIP) enable users to access VUI-enabled technology from a distance. Amazon Echo and Google Home with their multiple microphone configurations are good examples.

New Platforms

New platforms that are well suited to VUIs are opening up through cloud/web services and IoT. For example, in a smart home where IoT controls the gadgets around the house, it is far more convenient to set the timer on the microwave by voice than by pushing buttons.

Apps are Fine but not Enough

While mobile has become the center of an individual’s world, the crowded app space does not augur well for user experience. The usage data is loud and clear: an average phone holds 42 apps, yet only 9 to 10 of them account for 90% of users’ time; 75% of users quit an app within the first three months of installation; and 25% of installed apps are used only once. Apps are great, but they are not enough. In contrast, the dominance of messaging apps like WhatsApp, Facebook Messenger and WeChat, and the time the average user spends in them, shows where users are heading. Hence the need to marry apps with a messaging interface, be it voice or chat. This need is leading directly to voice bots and chatbots.

Character-Based Languages

According to eMarketer, users in China, Taiwan, and Japan have adopted voice assistants particularly readily, because speaking saves them from typing.

Ease of Use for Children, Non-literates, and Differently-Abled

Voice UI makes technology more inclusive by widening adoption among children, differently-abled users, non-literate users, and others who cannot be trained to use conventional interfaces.

Return of Natural Communication

The VUIs bring back what was taken away by text-based UIs and GUIs, that is, natural communication.

Why the Euphoria about Voice-Based Interaction

Technology is More Human

As we all know, a VUI engages our voice and ears but keeps the eyes and hands free. However, it also does something that is fundamentally human: it lends a personality to technology through voice.

At a subconscious level, humans equate voice communication with human-to-human interaction. Though there are other means for humans to communicate such as eye-contact, touch, gestures, body posture and so on, speech is fundamental in human-to-human interaction.

Now technology is placing itself in an enviable position: it asks to be talked to and wants to speak with users. When implemented well, this is bound to create a fundamental shift in how humans perceive machines and in how deeply they engage with them. Hence the enormous interest in VUIs.

The Voice-Powered Digital Assistant understands context, derives intent, and identifies and learns user behaviors and patterns to automate routine tasks such as expenses and meetings on behalf of the user.

Querying Goes Multi-Modal

Voice interfaces need not be application-centric. A voice interface can access any application that is voice-enabled. That brings down the barrier to query an application.

Speed of Engagement

Humans can speak about four times as fast as they type (150 words vs. 40 words per minute). That makes speech a very good input device.
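
The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: it assumes the 150 and 40 figures are words per minute, and the 120-word note length is a hypothetical value chosen for the example.

```python
# Figures quoted in the text, assumed here to be words per minute.
SPEAKING_WPM = 150
TYPING_WPM = 40


def speedup(speaking_wpm: float, typing_wpm: float) -> float:
    """Return how many times faster speaking is than typing."""
    return speaking_wpm / typing_wpm


def minutes_to_enter(words: int, wpm: float) -> float:
    """Return the minutes needed to enter `words` words at `wpm` words per minute."""
    return words / wpm


ratio = speedup(SPEAKING_WPM, TYPING_WPM)
print(f"Speaking is about {ratio:.2f}x faster than typing")

# Hypothetical example: a 120-word call note entered into a CRM.
note_words = 120
print(f"Spoken: {minutes_to_enter(note_words, SPEAKING_WPM):.1f} min")
print(f"Typed:  {minutes_to_enter(note_words, TYPING_WPM):.1f} min")
```

With these numbers the ratio is 3.75, which is where the rounded "four times" comes from.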

Place of Engagement

People primarily spend their time in 4 places: work, home, car, and mobiles. These are major ecosystems for companies to reach users now. And voice-control technologies beautifully slip into home, car, and mobile where it was difficult to engage customers earlier. Now, voice technologies are knocking on the doors of our workplaces as well, leading to the rise of voice-enabled enterprise technology products.

Voice Technologies for Enterprise Tech: The New Equation

Enterprise use of voice assistants is a rapidly growing area. The opportunities for enterprises to take advantage of voice assistants through smart speakers are wide-ranging.

Intelligent voice assistants can act as additional resources for organizations. They leverage conversational technology and AI-enabled, cloud-based processing power to perform tasks that would otherwise be assigned to administrative assistants. So, every employee in the organization gets an executive assistant, albeit a virtual one.

Voice assistants’ administrative use cases, such as scheduling meetings, setting reminders, and assisting with conference calls, are well demonstrated. Amazon Alexa for Business integrates with company calendars and conferencing systems to allow attendees to start and control meetings using their voice (Alexa for Business was released a year ago). Companies are also looking at using voice assistants to connect employees with internal systems and departments, such as CRM, HRMS or IT, to facilitate internal support. Likewise, a Sales Intelligence Bot becomes a virtual Sales Analyst & Assistant.

Businesses are now looking to integrate voice assistants with their systems to allow voice-based queries and conversational interaction with enterprise information resources and corporate data.

Voice technologies are primarily taking two forms:

  • Voice-enabled digital assistants on smartphones (Google Assistant, Siri) and laptops (Cortana).
  • Voice-enabled smart speakers such as Amazon Alexa and Google Home.

While the above are mostly used by individuals today, Amazon is focusing on adapting voice technologies to meet enterprise needs. In his blog post titled Unlocking Enterprise Systems Using Voice, Werner Vogels, CTO of Amazon, puts it succinctly:

To use voice in the workplace, you really need three things. The first is a management layer, which is where Alexa for Business plays. Second, you need a set of APIs to integrate with your IT apps and infrastructure, and third is having voice-enabled devices everywhere.

There is a revolution in the edge devices that access the Internet and corporate networks.

  • The intelligent digital assistants on phones and desktops are providing a layer of speech-based abstraction to interact with enterprise applications.
  • The desktop or tablet at home is giving way to the smart speaker or smart digital assistant for querying. Google says that 20% of all searches are voice-based, and on Bing, 25% of all searches are voice-based. Gartner’s prediction that by 2020, 30 percent of browsing will be done without a screen is in line with these current trends.
  • Devices without a keyboard are multiplying at a rapid pace. According to Statista, the number of connected devices (Internet of Things) will increase from 23.14 billion in 2018 to 75.44 billion in 2025.

These devices are significant in two ways. Firstly, they signify machine-to-machine knowledge transfer. Secondly, none of these devices has an input or human-readable output mechanism, as they are built to collect and share information among their peers intelligently. Consequently, voice emerges as a suitable mechanism to query these devices.

All of the above points to a fundamental shift in business technology. Devices like Echo Show even bring a screen to present the results of a voice search. So, it is high time that enterprises implemented voice technologies for their users.

While voice-enablement of enterprise technology is inevitable, the industry understands that voice augments the current interfaces rather than replacing them. Sundar Pichai, CEO of Google, echoes this view:

“We expect voice to work from many different contexts. We are thinking about it across phones, homes, TVs, cars and trying to drive the ecosystem that way, and we want it to be there for users when they need it.”

Some Concerns Remain

Bill Buxton, Principal Researcher, Microsoft Research, summarizes beautifully:

“Everything is best for something and worst for something else. The problem is, when someone hits a home-run with one technology in one area, people try to ride on the coat-tails of that success, and indiscriminately deploy the same technology in the too-often misguided blind hope that the new deployment will achieve the same success. We have seen this with touch interfaces, non-contact gestures, and will see it with speech. The large number of failures that result are as inevitable as they are avoidable. Without an equally solid understanding of both the strengths and weaknesses of the technology – when, where, why, how, for what, and for whom it is and isn’t suitable – one is gambling rather than practicing design (much less acting in the best interests of users, shareholders, or employees).”

Many voice UIs are built on narrow AI that operates within limits. They are rule-based. Such systems do not scale well to meet user expectations because once the gates of voice interaction are thrown open, only the user controls where the conversation with the system goes.
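
To make the limitation concrete, here is a minimal sketch of a rule-based intent matcher in Python. The intent names and patterns are hypothetical, invented purely for illustration; they are not taken from any real assistant. The sketch assumes speech has already been transcribed to text.

```python
import re
from typing import Optional

# Hypothetical rules: each intent is tied to a hand-written pattern.
RULES = [
    ("schedule_meeting", re.compile(r"\b(schedule|set up|book)\b.*\bmeeting\b")),
    ("log_expense", re.compile(r"\b(log|add|record)\b.*\bexpense\b")),
]


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None if no rule applies."""
    text = utterance.lower()
    for intent, pattern in RULES:
        if pattern.search(text):
            return intent
    return None


# A phrasing the rule author anticipated is recognized.
print(match_intent("Schedule a meeting with the sales team"))

# The same request phrased differently falls outside the rules.
print(match_intent("Can you get everyone together on Friday?"))
```

The second request means much the same as the first, yet no rule covers it. Every new phrasing needs another hand-written pattern, which is why such systems struggle once users speak freely.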

Enterprise tech has been built with a focus on the GUI, and for good reason, considering the complexity of products in this class.

According to Nielsen Norman, a screen is an efficient output modality as it can present large volumes of data to reduce the load on the human brain, make for efficient visual scanning, convey system status, and help speed up task execution through visual cues.


According to eMarketer, a whopping 63% are concerned about smart speakers and virtual assistants spying on them. Organizations are eyeing AI voice assistant devices cautiously and, to some extent, distrustfully.

Enterprise security officers are concerned about always-on microphones that snoop and send audio information outside the enterprise. This means that skills developed for voice assistants have to be generic enough that any user can invoke them without raising security or privacy concerns, which relegates the devices to use with only lower-value information. Once better authentication can be put in place, more significant use of the technology will be possible.

Voice Morphing

Voice morphing, the substitution of one voice for another, can pave the way for piracy and false news, which may lead to the emergence of voice piracy laws.

Outwork AI – Voice Powered Sales Bots

Outwork AI provides an Alexa-integrated Sales Intelligence and Sales Assistant Bot. It allows for human-like conversations with your enterprise software. It works with Outwork, Salesforce and other major CRMs, with data input and retrieval capabilities.
