
What Goes Into Delivering a Drink?

The list of assignments you can give Misty is long and varied, and delivering drinks is on it.

While there are several different ways you could combine Misty’s capabilities for this job, this post focuses on an approach that uses Misty’s audio localization capabilities to search for someone specific and deliver their drink of choice.

When this skill runs, Misty starts listening to sound and turns toward speech she detects. If the speaker is the person she’s supposed to deliver a drink to, Misty approaches and hands it off. What’s more? In this version of the skill, when you touch Misty’s chin, she calls out to Azure’s Cognitive Services to generate a brief description of what she can see.

All of this happens in a matter of minutes, but there’s a lot going on that’s worth a deeper look. This article describes, at a high level, how Misty leverages the capabilities at play. We’ll start with what happens when Misty hears a voice.

Audio localization & life-like movement

When Misty records sound, she returns audio localization data that you can use in your skills. This data includes the degree of arrival for the loudest voice Misty can hear, relative to the current orientation of her head, so she can calculate the speaker’s position. We use this information in Misty’s beverage delivery skill to determine where she should move.
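To make that arithmetic concrete, here’s a minimal sketch in Python. (The names are illustrative, not the exact Misty API: the voice’s angle arrives relative to the head, the head’s yaw is relative to the torso, and their sum is where the speaker sits relative to the body.)

```python
def speaker_heading(degree_of_arrival, head_yaw):
    """Combine the voice's angle of arrival (relative to the head) with
    the head's current yaw (relative to the torso) to get the speaker's
    heading relative to the torso, normalized to (-180, 180] degrees."""
    heading = degree_of_arrival + head_yaw
    # Wrap into (-180, 180] so Misty always takes the short way around.
    while heading > 180:
        heading -= 360
    while heading <= -180:
        heading += 360
    return heading
```

The normalization step matters: without it, a voice just behind Misty’s left shoulder could send her spinning the long way around.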

Coding a life-like response to environmental stimuli is part of what’s fun about programming Misty. It’s also an important part of human-robot interaction. Robots who act alive endear themselves to us, and we feel an emotional connection when we spend time with them. This is an important consideration when we code Misty’s movement in the beverage delivery skill. We could have Misty rotate her head and torso together, and Misty would end up facing the right direction, but her movement would appear rigid and, well, robotic.

We can achieve a more life-like response by having Misty turn her head before her torso, similar to the way a human would move. To do this, we use data from the actuator for the yaw (or horizontal side-to-side) movement of Misty’s head. Misty calculates the relative angle of the speaker’s voice and calls a head movement command to look in the right direction. Data from the inertial measurement unit (IMU) in Misty’s torso provides the information Misty needs to calculate how much to rotate her body. When Misty’s head is facing the right direction, she holds it in position while her torso completes the turn.
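The head-first, torso-second choreography can be sketched as a simple planning step. This is a hypothetical helper, not Misty’s actual skill code, and the ±81° head-yaw limit is an assumption you should check against your robot’s specs:

```python
HEAD_YAW_LIMIT = 81  # assumed head yaw range in degrees; verify for your robot

def plan_turn(speaker_heading):
    """Split a turn toward a speaker into a quick head move and a slower
    body rotation. The head snaps toward the voice first (clamped to its
    yaw range); the torso then rotates through the full angle, tracked
    via the IMU, while the head holds its gaze on the speaker."""
    head_yaw = max(-HEAD_YAW_LIMIT, min(HEAD_YAW_LIMIT, speaker_heading))
    body_turn = speaker_heading  # the torso covers the whole angle
    return head_yaw, body_turn
```

In the real skill, the IMU’s yaw reading tells Misty when `body_turn` is complete, at which point her head and torso are aligned again.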

It’s not ballet, but it’s close.

Hardware hacks & hospitality

This is all quite impressive, but all the audio localization data in the world doesn’t help Misty deliver a drink unless she can carry it. And while Misty’s default arms are highly useful when programming her personality or inventing new gestures, they’re not quite right for carrying cans of soda. Luckily, Misty is hackable hardware, and all you need to build a new arm is an interesting idea and access to a 3D printer.

When delivering drinks, Misty swaps out her right arm for a modified version of a limb originally designed to carry an external battery pack. Turns out, this battery pack is roughly the same size as a soda can.

It doesn’t take long for Misty to become the life of the party.

So what happens when Misty arrives with the drink? In this skill, we program Misty to use facial recognition to see if the speaker is the designated recipient of the beverage. Once Misty confirms the speaker’s identity, she can deliver the beverage with confidence. Her hazard avoidance system prevents her from bumping into obstacles on the way.
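The identity check boils down to a small gate in the skill logic. Here’s a hedged sketch; the event field names (`label`, `confidence`) and the confidence threshold are illustrative, not Misty’s exact face-recognition payload:

```python
def should_deliver(face_event, recipient, min_confidence=0.7):
    """Return True only when the recognized face matches the designated
    recipient with enough confidence to hand off the drink."""
    return (face_event.get("label") == recipient
            and face_event.get("confidence", 0.0) >= min_confidence)
```

Anyone else who tries to flag Misty down simply fails this check, and she keeps searching.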

Going deeper with Azure Cognitive Services

When delivering drinks isn’t enough, Misty shows off a bit of what’s possible when you combine her capabilities with third-party services – in this case, Microsoft Azure.

A touch on Misty’s chin triggers a block of code that has Misty take a picture with the camera in her visor. She then sends a web request to upload the photo to Azure’s Cognitive Services, which returns a text description of the scene. We use Azure’s Speech Services to convert that text into an audio file, which Misty downloads and plays through her speakers.
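The cloud side of that round trip can be sketched with two REST calls: one to Computer Vision’s `describe` operation, one to Speech Services’ text-to-speech endpoint. This is a minimal sketch, not the skill’s actual code; the region, keys, API version, and voice name are placeholders you’d swap for your own Azure resource’s values:

```python
import json
import urllib.request

REGION = "westus"            # assumption: your Azure resource's region
VISION_KEY = "<vision-key>"  # placeholder subscription keys
SPEECH_KEY = "<speech-key>"

def describe_photo(image_bytes):
    """POST the photo to Computer Vision's 'describe' operation and
    return the top caption from the response."""
    req = urllib.request.Request(
        f"https://{REGION}.api.cognitive.microsoft.com/vision/v2.0/describe",
        data=image_bytes,
        headers={"Ocp-Apim-Subscription-Key": VISION_KEY,
                 "Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return result["description"]["captions"][0]["text"]

def build_ssml(text, voice="en-US-JessaRUS"):
    """Wrap the caption in the SSML body Speech Services expects.
    The voice name is an assumption; pick any voice your region offers."""
    return (f"<speak version='1.0' xml:lang='en-US'>"
            f"<voice name='{voice}'>{text}</voice></speak>")

def synthesize(text):
    """Send SSML to the text-to-speech endpoint and return WAV bytes
    that Misty can download and play."""
    req = urllib.request.Request(
        f"https://{REGION}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=build_ssml(text).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY,
                 "Content-Type": "application/ssml+xml",
                 "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Chaining them is the whole trick: `synthesize(describe_photo(photo))` turns a snapshot into something Misty can say out loud.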

You can see how all this comes together in this video of Misty debuting her beverage delivery service at Microsoft BUILD.

Misty’s scene descriptions are candid and unfiltered, which can lead to some entertaining results.

Cloud services like those Azure provides are hugely beneficial for developers who want to use AI, machine learning, analytics, and other advanced tech in Misty’s skills, but who may not have the expertise required to set up those systems on their own.

We’re only just beginning to see what’s possible when you bring robotics development platforms into the broader tech landscape. There are a lot of big things in Misty’s future, and we look forward to being amazed by what the community of Misty developers decides to create.
