Lex-powered voice recognition apps lack voice in enterprise IT

Enterprises that want a consumer-facing chatbot can incorporate Amazon Lex technology into their apps. But uses remain murky, and it's not a fit for every industry.

Voice user interfaces and chatbots, like Alexa, Siri and Cortana, attract a lot of attention in the consumer space. And many enterprises plan to incorporate this technology into voice recognition apps, hoping the consumer market shifts toward those types of apps, just as the rise of smartphones spurred mobile application development.

Speech recognition and natural language processing capabilities are still in their infancy. Design patterns will emerge as more enterprises research and deploy voice recognition apps, which should lead to more long-term adoption.

Amazon Alexa capabilities are available for developers to build consumer and enterprise apps via Amazon Lex, which enables voice recognition and natural language processing for those apps. Amazon limits each Lex request to 15 seconds of speech or 1,024 characters of text. AWS developers integrate Lex with Amazon Polly for text-to-speech, AWS Lambda for business logic, Amazon Cognito for user authentication and direct connections back to enterprise apps. Enterprises that experiment with Amazon Lex today should be in a better position to uncover future use cases and applications.
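The request limits mentioned above can be checked on the client before anything is sent to Lex. What follows is a minimal sketch that assumes the limits as stated in this article; the function and constant names are hypothetical and are not part of any AWS SDK.

```python
# Limits as stated in the article; confirm against current AWS documentation.
MAX_TEXT_CHARS = 1024
MAX_SPEECH_SECONDS = 15.0


def check_text_request(text: str) -> str:
    """Return the trimmed text if it fits in one request, else raise ValueError."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("empty utterance")
    if len(cleaned) > MAX_TEXT_CHARS:
        raise ValueError(
            f"utterance is {len(cleaned)} characters; limit is {MAX_TEXT_CHARS}"
        )
    return cleaned


def check_speech_request(num_samples: int, sample_rate_hz: int) -> float:
    """Return the clip duration in seconds if it fits in one request."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    duration = num_samples / sample_rate_hz
    if duration > MAX_SPEECH_SECONDS:
        raise ValueError(
            f"clip is {duration:.1f} seconds; limit is {MAX_SPEECH_SECONDS:.0f}"
        )
    return duration
```

Rejecting oversized input early lets the app prompt the user to shorten or split a request instead of surfacing a service error.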

Early uses of voice recognition apps, for example, could improve productivity for drivers and employees operating dangerous equipment. Voice interfaces enable workers to retrieve information while keeping their attention on other tasks. Other early uses include quickly finding information buried in customer relationship management (CRM) and ERP systems or filling in forms on small screens without having to type.

Properties of voice interfaces

People can speak much faster than they can type, especially on the small keyboard of a mobile device and particularly when capturing long sentences. Voice and natural text interfaces have unique benefits and limitations compared with other interaction patterns, such as typing on a keyboard or mobile device, reading, or scrolling and clicking with a mouse.

End users can also navigate large menu hierarchies faster with natural language processing, and they can make complex requests with a single sentence. But speech-to-text transcription is less accurate than typed input, and noisy environments reduce accuracy further. As a result, speech vocabularies work best when they are constrained to a few choices and when the app gives users visual or verbal feedback. End users also tend to listen much more slowly than they read. That speed difference is most apparent with a large menu of choices, which takes far longer to read aloud than to display on a screen.
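The advice to constrain the vocabulary and confirm what was heard can be sketched in a few lines. This example is illustrative only: it uses Python's standard `difflib` for fuzzy matching rather than anything Lex provides, and the function names, menu choices and cutoff value are assumptions.

```python
from difflib import get_close_matches


def match_choice(transcript, choices, cutoff=0.6):
    """Map a possibly noisy transcript onto one of a few allowed choices."""
    normalized = transcript.strip().lower()
    lookup = {choice.lower(): choice for choice in choices}
    hits = get_close_matches(normalized, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def respond(transcript, choices):
    """Build verbal feedback: confirm a match or restate the short menu."""
    choice = match_choice(transcript, choices)
    if choice is None:
        return "Sorry, I didn't catch that. You can say: " + ", ".join(choices) + "."
    return f"You selected {choice}. Is that right?"
```

With a menu of `["Inventory", "Contacts", "Orders"]`, a slightly garbled transcript such as "inventry" still resolves to "Inventory", and the confirmation prompt gives the user a chance to correct a wrong match.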

Test-driving Amazon Lex

Amazon Lex enables developers to build chatbots, but several other components go into speech recognition and natural language processing apps. Amazon Lex supports a variety of tools for building rich apps, including:

  • text-only APIs for Java, JavaScript, Python, .NET, Ruby, PHP, Go and C++;
  • software development kits for iOS and Android that support text and speech input;
  • AWS Lambda blueprints that provide building blocks for Lex apps;
  • AWS Mobile Hub enterprise connectors for Salesforce, Microsoft Dynamics, Marketo, Zendesk, QuickBooks and HubSpot;
  • integration with Polly for text-to-speech synthesis;
  • integration with Cognito for user authentication;
  • sample chatbot code that developers can use as a template; and
  • SMS and messaging service endpoint integration with Facebook Messenger, Slack and Twilio.
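To show how the Lambda piece in the list above fits in, here is a minimal sketch of a fulfillment function. It assumes the original Lex (V1) Lambda event and response format; the intent name `CheckInventory`, the slot name `Product` and the in-memory `INVENTORY` table are hypothetical stand-ins for a real bot definition and back-end lookup.

```python
# Hypothetical in-memory stand-in for an ERP or database lookup.
INVENTORY = {"widget": 42, "gadget": 0}


def close(session_attributes, fulfillment_state, text):
    """Build a Lex V1 'Close' response that ends the conversation turn."""
    return {
        "sessionAttributes": session_attributes,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": fulfillment_state,
            "message": {"contentType": "PlainText", "content": text},
        },
    }


def lambda_handler(event, context):
    """Fulfill the hypothetical CheckInventory intent from a Lex V1 event."""
    session = event.get("sessionAttributes") or {}
    intent = event["currentIntent"]
    if intent["name"] != "CheckInventory":
        return close(session, "Failed", "Sorry, I can't help with that yet.")
    # Slot values are None until Lex has filled them.
    product = (intent["slots"].get("Product") or "").lower()
    if product not in INVENTORY:
        return close(
            session, "Failed", f"I couldn't find {product or 'that product'}."
        )
    count = INVENTORY[product]
    return close(
        session, "Fulfilled", f"There are {count} units of {product} in stock."
    )
```

The same response text can be returned to a text channel or synthesized as speech, so the business logic stays independent of how the user reached the bot.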

The properties of voice interfaces suggest that voice recognition apps will be most widely adopted where a worker's hands and eyes are occupied, such as while driving or operating equipment. Voice interfaces could also be used in hospital settings, where sanitation is a concern. Because speaking is fast but listening is slow, most successful Amazon Lex applications will likely adopt a hybrid strategy in which voice provides the input and screens provide the output. That approach could come in handy in the following scenarios:

  • pulling up client contact information that appears on the screen using CRM integrations;
  • requesting inventory status for a product that shows up on a screen using ERP integrations; or
  • populating a help center rep's screen with useful information based on complex text queries in conjunction with Zendesk and Freshdesk.
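The hybrid strategy in these scenarios amounts to keeping the spoken reply short and sending the detail to the display. A minimal sketch of that split follows, using a contact lookup as the example; the `HybridResponse` type, the function name and the record fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HybridResponse:
    spoken: str  # short confirmation suitable for text-to-speech
    screen_rows: list = field(default_factory=list)  # full detail for the display


def contact_lookup_response(query, contacts):
    """Answer a spoken contact query with a brief prompt plus on-screen rows."""
    needle = query.strip().lower()
    matches = [c for c in contacts if needle in c["name"].lower()]
    if not matches:
        return HybridResponse(spoken=f"No contacts found for {query}.")
    noun = "contact" if len(matches) == 1 else "contacts"
    return HybridResponse(
        spoken=f"Found {len(matches)} {noun}. Details are on your screen.",
        screen_rows=matches,
    )
```

Reading every matching record aloud would be slow, so only the count is spoken and the rows are left for the user to scan visually.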

Early enterprise adopters of Amazon Lex include business software company Infor, which used Lex to help build its Coleman artificial intelligence and machine learning service. Insurance provider Liberty Mutual uses the AWS technology to enable employees to search through complex databases. Other enterprises are developing consumer-facing chatbots on top of Lex, including Kelley Blue Book, the American Heart Association and Capital One.
