Enterprise developers can use AWS' audio and video recognition hardware to get their feet wet with machine learning and IoT.
AWS DeepLens is still in its early phases, which has made for some growing pains. For example, some users have reported problems with basic elements of the deep learning-enabled camera, such as Wi-Fi connectivity. But even while the platform is in its infancy, developers can use it to experiment with and build their skills in other AWS offerings, such as Greengrass and SageMaker, which can respond to events in the physical world.
AWS DeepLens offers a variety of experimental projects that developers can use to learn the basics of video, image and audio recognition. Some of the more interesting ones include a customer counter that measures the line at a business, a tool that tracks audience engagement with a presentation and a Simon Says game that evaluates a person's ability to mimic a movement. The basic principles behind these apps could be extended into enterprise apps, too.
Developers can use machine learning frameworks, such as Apache MXNet -- and, eventually, TensorFlow and Caffe -- to create custom multimedia recognition models built in Python. They can then use SageMaker, AWS' managed machine learning platform, to further refine and customize those models. DeepLens can also work with Kinesis Video Streams to build and deploy more sophisticated analytics apps in the cloud.
Once these models are optimized with SageMaker, developers can deploy them to DeepLens to create apps that run independently of the cloud. There are also tools to integrate DeepLens apps with other native services, including S3, Lambda, DynamoDB and Rekognition.
Improving restaurant experience
Developers can use integration with Lambda to identify basic events in the recorded video or audio and chain them together to create higher-order events. They can then explore programming constructs similar to the synthetic sensor framework developed at Carnegie Mellon University.
For example, an event could consist of a person who walks into a restaurant and then walks out. Higher-order events might log the time a customer spent in the restaurant. A developer could correlate that data with other events, such as emotional expression, menu selections, weather or restaurant specials, to produce more sophisticated customer experience analytics.
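The enter-and-exit example above can be sketched in plain Python. This is a minimal illustration and not DeepLens or Lambda API code: the BasicEvent and Visit records, the person_id tracking field and the derive_visits helper are all hypothetical names, and the sketch assumes an on-device model already emits "enter" and "exit" detections with timestamps.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BasicEvent:
    """A low-level detection (hypothetical schema, not an AWS type)."""
    person_id: str    # assumed tracking ID assigned by the recognition model
    kind: str         # "enter" or "exit"
    timestamp: float  # seconds


@dataclass(frozen=True)
class Visit:
    """A higher-order event derived from a matching enter/exit pair."""
    person_id: str
    entered_at: float
    dwell_seconds: float


def derive_visits(events: Iterable[BasicEvent]) -> List[Visit]:
    """Pair each 'enter' with the next 'exit' seen for the same person."""
    open_entries: Dict[str, float] = {}
    visits: List[Visit] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "enter":
            open_entries[event.person_id] = event.timestamp
        elif event.kind == "exit" and event.person_id in open_entries:
            entered_at = open_entries.pop(event.person_id)
            visits.append(
                Visit(event.person_id, entered_at, event.timestamp - entered_at)
            )
        # An exit with no recorded entry is ignored rather than guessed at.
    return visits
```

Each resulting Visit could then be joined with other data, such as the weather or the day's specials, by matching on entered_at.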
To reduce privacy concerns, Carnegie Mellon researchers created a framework that correlates events directly on the device. AWS DeepLens can do the same kind of correlation on the camera, which should ease the privacy issues that can arise when a video analytics app is deployed in jurisdictions such as the European Union.
Get an ergonomic coach
Another developer might build a DeepLens app to track employees' posture and breathing at work. This app could provide audio cues when someone slouches, tightens their shoulders or bangs the keyboard too hard. The developer could customize pose estimation models that define how someone sits as events.
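As a rough sketch of how pose estimates might become posture events, the Python below flags a slouch when the line from shoulder to ear tilts too far from vertical for several consecutive frames. It assumes a pose model has already produced ear and shoulder keypoints in pixel coordinates. The 25-degree threshold, the three-frame window and the function names are illustrative assumptions, not values from DeepLens or from any ergonomic standard.

```python
import math
from typing import Iterable, Iterator, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward


def neck_angle_degrees(ear: Point, shoulder: Point) -> float:
    """Angle between the shoulder-to-ear line and vertical (0 = upright)."""
    dx = abs(ear[0] - shoulder[0])
    dy = shoulder[1] - ear[1]  # positive when the ear is above the shoulder
    return math.degrees(math.atan2(dx, dy))


def slouch_events(
    frames: Iterable[Tuple[Point, Point]],
    max_angle: float = 25.0,  # assumed threshold, tune per camera placement
    min_frames: int = 3,      # assumed debounce window
) -> Iterator[int]:
    """Yield the frame index at which a slouch has lasted min_frames frames."""
    streak = 0
    for index, (ear, shoulder) in enumerate(frames):
        if neck_angle_degrees(ear, shoulder) > max_angle:
            streak += 1
            if streak == min_frames:
                yield index  # fire once per sustained slouch
        else:
            streak = 0
```

Requiring several consecutive frames keeps a single noisy keypoint from triggering an audio cue.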
A developer would have to record sample videos of people slouched and upright and feed that data into SageMaker to train the models. They might also want to create a library of appropriate audio cues to remind someone to take a break. Then, they could use Greengrass to deploy those models to AWS DeepLens.
The DeepLens device has a microphone, which developers could use to build a separate model that correlates keyboard noise with how hard someone bangs on the keys. DeepLens doesn't natively support Alexa or audio analytics, but developers can create and deploy custom audio analytics models built with MXNet. This could seem a little clunky, but developers could use that same code with laptop cameras and microphones as Amazon enhances its machine learning ecosystem.
Enterprises could also install AWS DeepLens in conference rooms to create apps that measure and improve meeting engagement. One Lambda function could correlate emotional expression and posture with a measure of participant engagement. A separate Lambda function could use an image or audio recognition model to identify who spoke. The outputs of these two Lambda functions could then be correlated to create an app that coaches participants on their engagement with co-workers and improves their communication skills.
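One way to picture that correlation step is the Python sketch below, which averages listeners' engagement scores over the time each person was speaking. It is an assumption-laden illustration, not Lambda handler code: the EngagementSample and SpeakerSegment records, the 0-to-1 score scale and the audience_engagement_by_speaker function are hypothetical stand-ins for whatever the two functions would actually emit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class EngagementSample(NamedTuple):
    """Hypothetical output of the engagement function."""
    timestamp: float
    participant: str
    score: float  # assumed scale: 0.0 (disengaged) to 1.0 (fully engaged)


class SpeakerSegment(NamedTuple):
    """Hypothetical output of the speaker identification function."""
    start: float
    end: float
    speaker: str


def audience_engagement_by_speaker(
    samples: Iterable[EngagementSample],
    segments: Iterable[SpeakerSegment],
) -> Dict[str, float]:
    """Average listener engagement while each person was speaking."""
    samples = list(samples)
    scores: Dict[str, List[float]] = defaultdict(list)
    for segment in segments:
        for sample in samples:
            in_segment = segment.start <= sample.timestamp < segment.end
            # Skip the speaker's own score; only listeners count here.
            if in_segment and sample.participant != segment.speaker:
                scores[segment.speaker].append(sample.score)
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```

A coaching app might surface these averages privately to each speaker, in keeping with the goal of meaningful feedback.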
Developers could use this kind of app to create feedback loops to empower employees and build trust, though it's important that these apps help people to grow with meaningful feedback instead of criticism. Developers might want to review the ongoing work at Carnegie Mellon on Sensei, which analyzes audio and video to capture, isolate and analyze voices and behavior patterns to improve a professor's classroom performance.