The human interface that connects us with machines — the way we interact and control them — has changed a lot over the years. From tactile methods like knobs, buttons, keyboards, pads and touch screens to more recent voice and visual command capabilities, we’ve adapted our devices to become more user-friendly and more humanlike by using more intuitive input techniques. We’ve all grown accustomed to the swipe, the pinch, the “Hey, Google,” and the hand gesture to tell our devices what to do. But they still require the human element, a proactive direction by a person. That, too, is changing.
A new generation of devices, indeed a new ecosystem, will be driven by interfaces that perceive your wants and needs. Welcome to the future of IoT and perceptive intelligence, where user interaction is optional and contextual awareness is enabled by machine learning. When devices transition from collecting and transferring information to using that information intelligently on their own, computing becomes ambient.
Although based on some level of human interaction, ambient computing doesn’t require active participation. Artificial intelligence and deep learning can now power entire integrated ecosystems of devices to learn about users, their environments and their preferences, and then adjust accordingly to provide the optimal response or action. This kind of perceptive intelligence is enabled by sensors and vision and is embedded in our living and working spaces in a way that lets us use it without being fully aware that we are doing so.
This level of intelligence is a result of the progression of AI and machine learning to deep neural networks that change the paradigm from sensing to perception and, ultimately, recognition of intent. Recent breakthroughs in deep learning are creating a revolution in the application of AI to speech recognition, visual object recognition and object detection. Connected devices provide the data, and the AI learns from that data to perform certain tasks without human intervention.
Best of all, perceptive intelligence doesn’t even require a connection to the internet. Edge-based processing now has the performance and accuracy required (as well as the energy efficiency and small form factors to fit in battery-powered consumer products) to run sophisticated AI and machine learning algorithms locally, sparing users the cost, bandwidth, latency and privacy challenges of a cloud-based model. Now, devices can collect and analyze video and audio data and respond intelligently in near real time, without the risk of compromising user privacy or security or the cost of transmitting zettabytes of data to cloud-based data centers.
Voice, Then Video
Voice-enabled systems are already having a major impact on the move toward perceptive intelligence. This goes far beyond simply asking your voice assistant a direct question or issuing it a specific command. Performance and feature breakthroughs in far-field voice interfaces bring more natural interaction, convenience and usefulness to voice-enabled devices. More and more, smart devices are becoming contextually and conversationally aware, sensing needs, preferences or relationships between information without requiring direct commands.
This level of functionality has benefitted from deep neural networks that drive adaptive machine learning. In a voice-enabled system, this means an expansion of the system’s functionality, such as a larger vocabulary or voice biometrics for security and identity purposes. It allows a broader range of input styles or terms so that users are not reliant on just a few trigger words (e.g., “Hey, Siri”). The result is a more natural interface that can also recognize intent based on contextual events, previous behavior or commands.
Advancements in computer vision, as well as the ability to enable vision on the edge, are broadening the possibilities of ambient computing. Vision is a linchpin in a true multimodal approach to the IoT interface, where voice, gestures, gaze and touch will all play a role.
Such systems are taking advantage of much more humanlike, neuromorphic approaches that mimic how the human brain and eye work. As with voice, deep neural networks in machine vision power new levels of intelligence and contextual awareness. Examples include facial recognition that can then interpret intent or preferences based on prior knowledge; a TV or set-top box that serves up content you typically watch on a Saturday night; a smart speaker/display device that can recognize you as soon as you walk in and deliver your personal updates, recommendations and schedule; a security system that distinguishes a legitimate delivery from a porch thief; or a coffee maker that knows just how you like your morning brew.
Thanks to more efficient neural networks that can run on the edge, devices can achieve richer, more accurate visual awareness to drive machine decision making, in many cases without needing to connect to the cloud.
Such automation has many potential uses in the workplace as well, including security and access systems, perceptive controls for heating and lighting, and productivity-oriented tools for automated collaboration — all of which can use voice, gestures or other nonverbal interfaces to infer intent in an office or other work environment. Before adopting such systems, however, companies will want to understand important issues around security, integration with existing systems and the specific use model for each tool.
The human-machine interface (HMI) is an important component in improving the user experience of connected devices. Enhancements in how machines collect audio and visual data and use it to understand and predictively respond to our actions are a game-changer for the future of IoT. Understanding intent, not just commands, will transform devices into truly helpful assistants.