Dreaming about what else Voice UI could bring to your product with custom engineering

These days, it seems like everything is voice enabled. It’s an exciting step toward truly natural user interfaces (UIs), allowing us to interact with technology (and each other) in a more seamless way than tapping at a screen. Most voice-enabled products use off-the-shelf assistants like Google Assistant or Alexa, and it’s amazing how easy it is for product developers to bring basic voice UI capability to their devices. But while it’s easy to get started, actually making one of these off-the-shelf voice assistants perform really well requires some difficult engineering, like noise and echo cancellation and far-field speaker isolation. These are challenges we love to tackle with our clients.

This is an exciting subject for those of us who are always looking for ways to push the envelope with technology and design, and below we explore some ideas for how voice UIs could enable completely new user experiences.

Optimize for specific words or environments

If your product is commonly used in settings with a unique ambient noise signature, say a factory floor, your audio processing could be optimized to filter out that specific noise, a task that may otherwise prove challenging or power-hungry for a general-purpose voice UI.
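To make that concrete, here’s a minimal sketch of one classic approach, spectral subtraction, assuming you can capture a noise-only recording of the environment ahead of time. The function names and parameters here are illustrative, not tuned for any real factory:

```python
import numpy as np
from scipy.signal import stft, istft

def estimate_noise_profile(noise_clip, fs, nperseg=512):
    """Average magnitude spectrum of a noise-only recording (e.g., the factory floor)."""
    _, _, Z = stft(noise_clip, fs=fs, nperseg=nperseg)
    return np.mean(np.abs(Z), axis=1, keepdims=True)

def spectral_subtract(audio, noise_profile, fs, nperseg=512, floor=0.05):
    """Subtract the ambient noise estimate from the signal's magnitude spectrum."""
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Keep a small spectral floor to avoid the "musical noise" artifacts
    # that aggressive subtraction introduces.
    cleaned = np.maximum(mag - noise_profile, floor * mag)
    _, out = istft(cleaned * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return out
```

Because the noise profile is measured once for a known environment, this runs far cheaper than the general-purpose noise suppression a cloud assistant has to apply to every possible setting.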

Similarly, your use case may demand reliable recognition of specific words. Notable, for example, provides a system that uses voice recognition and AI for medical purposes, an environment with a lot of specialized vocabulary. A general-purpose voice UI will likely get confused by medical terms like “rhinorrhea” (a runny nose) or “sphenopalatine ganglioneuralgia” (“brain freeze”), but a custom system could be trained to handle them.
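As a rough illustration of one way to bias a recognizer toward domain vocabulary, here’s a sketch that rescores a recognizer’s N-best hypotheses against a made-up medical lexicon; a real system would do this inside the decoder with much richer language models:

```python
# Hypothetical rescoring step: boost hypotheses that contain domain terms,
# so "rhinorrhea" beats an acoustically similar generic phrase.
MEDICAL_LEXICON = {"rhinorrhea", "sphenopalatine", "ganglioneuralgia", "tachycardia"}

def rescore(nbest, boost=2.0):
    """nbest: list of (transcript, acoustic_score) pairs from the recognizer."""
    def score(item):
        text, acoustic = item
        hits = sum(1 for w in text.lower().split() if w in MEDICAL_LEXICON)
        return acoustic + boost * hits   # reward domain vocabulary
    return max(nbest, key=score)

# Two hypotheses with near-identical acoustic scores:
print(rescore([("rhino rear", 10.1), ("rhinorrhea", 10.0)]))  # -> ("rhinorrhea", 10.0)
```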

Or you may decide that off-the-shelf wake words like “Okay Google” and “Hey Siri” don’t suit your use case or complicate your brand experience. Training a system on a new wake word is a challenge, but definitely possible.
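For a flavor of what’s involved, here’s a toy keyword-spotting model in PyTorch: a tiny CNN that classifies a roughly one-second MFCC window as “wake word” or “background.” The architecture and shapes are purely illustrative; a production wake-word detector needs thousands of recorded examples and careful tuning for false accepts and misses:

```python
import torch
import torch.nn as nn

class WakeWordNet(nn.Module):
    """Tiny binary classifier: does this ~1 s audio window contain the wake word?"""
    def __init__(self, n_mfcc=40, n_frames=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (n_mfcc // 4) * (n_frames // 4), 2),  # wake word vs. background
        )

    def forward(self, mfcc):            # mfcc: (batch, 1, n_mfcc, n_frames)
        return self.net(mfcc)

model = WakeWordNet()
window = torch.randn(1, 1, 40, 100)     # stand-in for a real MFCC window
probs = torch.softmax(model(window), dim=-1)
```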

Optimize for low power

At Synapse, we’ve been developing small form-factor wearable devices for years and have become experts in low-power design. The currently available voice assistants aren’t optimized to run on battery power, but with a well-understood use case we could develop a low-power voice UI appropriate for battery operation, bringing us all a little closer to being a bunch of Dick Tracys.

Our Cambridge team recently announced “Ecoutez,” a technology that enables Voice Activity Detection with lower power consumption than other available options. This could allow the “always listening” functionality of an Echo to run on a small battery rather than wall power.
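We won’t unpack Ecoutez’s internals here, but to show the kind of cheap computation an always-listening front end can run, here’s a minimal energy-based voice activity detector; real VAD layers spectral features and smoothing on top of something like this:

```python
import numpy as np

def frame_energy_vad(samples, fs, frame_ms=20, threshold_db=-40.0):
    """Flag frames whose short-term energy exceeds a threshold.
    Assumes float samples in [-1, 1]. Cheap enough to run continuously,
    waking the full recognizer only when something loud is present."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    # RMS energy per frame, in dB relative to full scale.
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1) + 1e-12)
    db = 20 * np.log10(rms + 1e-12)
    return db > threshold_db   # boolean mask: True where sound is present
```

The power win comes from keeping everything downstream of this gate asleep most of the time.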

Hear more than just words from speech

What if you need to recognize something about a person’s speech other than the words themselves? One area where we’ve been making progress is identifying the speaker by voice, so the UI can respond differently to different people. Our Aksent system can detect your accent, and our research shows that we could determine a user’s gender or even age from voice alone, which could be valuable for market research purposes.
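The details of Aksent are a story for another post, but the general pattern for “who is speaking?” is to compare a voice embedding against enrolled profiles. Here’s a deliberately naive sketch using mean MFCC vectors as a stand-in for the learned speaker embeddings a real system would use:

```python
import numpy as np
import librosa

def naive_voice_embedding(path):
    """Very rough stand-in for a learned speaker embedding: the mean MFCC vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.mean(mfcc, axis=1)

def identify(utterance_path, enrolled):
    """enrolled: dict of name -> embedding captured during an enrollment session."""
    emb = naive_voice_embedding(utterance_path)
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    # Return the enrolled speaker whose voice is closest to this utterance.
    return max(enrolled, key=lambda name: cosine(emb, enrolled[name]))
```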

Or what if your device could detect a person’s emotion and respond differently depending on whether the speaker is scared, angry, or happy? Maybe I want the music to turn down a lot faster if I angrily yell “volume down” than if I say it happily.

Hearing beyond speech

Imagine a system that could recognize what’s going on around the user through ambient noises, then pair that with speech to provide better-informed responses. Apple is scratching the surface of this when Siri answers the question “What song is this?”, but it’s possible to interpret so much more about our surroundings through audio.

In a smart home application, your voice assistant could recognize the sound of a knock at the door, so that when you ask “who is it?” your video doorbell turns on. It’s also possible to detect sounds like breaking windows for security, or the sound of a person falling for elderly care. Our team has done some groundbreaking work in non-verbal audio recognition, including complex tasks like recognizing music and categorizing it by genre using deep learning, as shown in the video below.
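To illustrate how ambient-sound context could inform a voice response, here’s a simple sketch that pairs the most recent detected sound event with the user’s next query; the event labels and action names are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class AmbientEvent:
    label: str          # e.g. "knock", "glass_break", "fall"
    timestamp: float    # time.time() when the sound detector fired

RECENT_WINDOW_S = 30.0  # how long an ambient event stays relevant

def answer(query: str, last_event: Optional[AmbientEvent]) -> str:
    """Route a voice query using recent ambient-sound context."""
    fresh = last_event is not None and \
        (time.time() - last_event.timestamp) < RECENT_WINDOW_S
    if fresh and last_event.label == "knock" and query == "who is it?":
        return "turn_on_video_doorbell"
    if fresh and last_event.label == "glass_break":
        return "trigger_security_alert"   # no query needed for safety events
    if fresh and last_event.label == "fall":
        return "check_on_resident"
    return "default_assistant_response"
```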

As a new dad, I would love a system that could listen to my baby’s crying and tell me whether she’s hungry or has just had another blowout!

Combining speech with other sensing modalities for more natural UI

Along with speech, we humans communicate through physical actions. A more natural UI could incorporate input from a camera or other sensors to recognize and understand facial expressions and hand gestures, for example.

Dallas-based startup KinTrans is using computer vision to translate sign language into text. An assistant could use that kind of gesture recognition to allow more natural interactions. A machine could better diagnose ailments by recognizing voice and hand movement together, pinpointing exactly where on the body pain is coming from. A flying drone could respond to gestures as well as voice control, as Amazon has envisioned in a recent patent.
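As a toy example of this kind of multimodal fusion, here’s a sketch that combines a spoken intent with a recognized gesture into a single drone command; all of the intent, gesture, and command names are made up:

```python
def fuse(voice_intent: str, gesture: str) -> str:
    """Combine two modalities: the gesture refines or overrides the spoken command."""
    if voice_intent == "land" or gesture == "palm_down":
        return "LAND"              # safety-critical: either modality alone suffices
    if voice_intent == "move" and gesture == "point_left":
        return "MOVE_LEFT"         # gesture supplies the direction speech omitted
    if voice_intent == "move" and gesture == "point_right":
        return "MOVE_RIGHT"
    return "HOVER"                 # ambiguous input: hold position

print(fuse("move", "point_left"))  # -> MOVE_LEFT
```

The interesting design question is the arbitration policy: when the modalities disagree, which one wins, and when do you simply ask the user to clarify?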

Credit: USPTO

What would you want your assistant to do?

Currently available voice assistants are amazing solutions and have already changed the way we interact with technology, but even greater personalization and utility are possible. If you’re developing a device that needs a really well-executed voice assistant, or if you want to push the envelope and create a new kind of user experience, please get in touch—we’d love to geek out!