This post walks through building a skill that allows Misty to leverage the power of the cloud and external libraries to extract and read text from an image.
It covers how to:
• Set up a Microsoft Azure Function
• Send a request from Misty to this freshly created serverless endpoint
• Use Microsoft Cognitive Services to extract text from an image Misty captures, and
• Return an encoded .wav file that Misty can save and play.
The first step is to set up an Azure account here: https://azure.microsoft.com/en-us/free/. At the time of writing, you can get $200 of free credits to use with this and other services. (You should definitely look at all the AI Platform Services that Microsoft offers.) After you get your account set up, follow these steps to get up and running:
1. Log in to the Azure Portal with your fresh credentials and click on the Create a resource link in the top right corner.
2. Browse the resources to find and create a new Function App. Fill out the form to set up the app.
3. Give the Function App a name. For this skill, we use TeachMistyToRead.
4. Select your subscription level. If you have a new account, your level will be called free tier, but if you have a subscription to Azure it should be called Pay-As-You-Go.
6. Click Create. (Note: The creation of a new resource can take a few minutes. If it doesn’t show up, just wait a little bit and click refresh.)
7. Now that we have a new Function app, let’s create the function itself by clicking on the + button next to the Functions list.
If you want to debug your function locally, you’ll want to use Visual Studio Code. Follow these instructions on how to make use of Azure Functions locally.
8. Let’s select the option to edit our cloud function In-portal.
9. Select Webhook + API.
All right! Now we can add some code to our Function. Before we proceed, it’s important to note that what we’re building will be accessible from anywhere. It’s a good idea not to share your URL with anyone. There are authentication processes you’ll want to implement when you’re ready to create a production-ready skill (check out Microsoft’s Documentation for more information on how to do this).
We’re going to start with the basics. Let’s create and test the Text-To-Speech component to make sure we have the validation and token process working.
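As a rough illustration of the token process mentioned above, here is a minimal sketch of the two HTTP requests involved: one that trades the subscription key for a short-lived access token, and one that sends SSML text to be synthesized into a .wav file. This is not the code from AzureTextToSpeech.js. The region, the voice name, and the helper names (escapeXml, buildTokenRequest, buildSpeechRequest) are assumptions made for the example, and the endpoint URLs and headers follow the Speech service REST API as I understand it, so check them against Microsoft's documentation for your resource.

```javascript
// Sketch only: REGION, the default voice, and all helper names are assumptions.
const REGION = "westus"; // assumed; use the region of your own Speech resource

// Escape characters that are special in XML so arbitrary text is safe inside SSML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Step 1: describe the request that exchanges the subscription key for a token.
function buildTokenRequest(subscriptionKey) {
  return {
    method: "POST",
    url: `https://${REGION}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
  };
}

// Step 2: describe the synthesis request that uses the token from step 1.
// The output format below asks for 16 kHz, 16-bit mono PCM in a RIFF (.wav) container.
function buildSpeechRequest(token, text, voice = "en-US-JennyNeural") {
  const ssml =
    `<speak version="1.0" xml:lang="en-US">` +
    `<voice name="${voice}">${escapeXml(text)}</voice></speak>`;
  return {
    method: "POST",
    url: `https://${REGION}.tts.speech.microsoft.com/cognitiveservices/v1`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
    },
    body: ssml,
  };
}
```

The functions only build request descriptions, so they can be inspected without calling Azure. Any HTTP client can then send them in order, passing the token returned by the first request into the second.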
10. Copy the code found in the AzureTextToSpeech.js file, which can be found in the Misty Reads Tutorial repository under the ‘Azure Files’ Directory. Open this code in your favorite text editor.
11. You’ll notice that at the top of the code there is a placeholder for your subscription key: “<Put Subscription Key Here>”. To get these keys, we’ll need to create a Cognitive Services resource. To do this:
• Open the Azure Portal.
• Go to Create a Resource, then search for and select Speech (not Speech API).
• Click Create.
12. We’ll call the asset MistySpeech. Be sure to select the same resource group we created earlier. (Remember, creating a new resource can take a few minutes. Be patient at this step if your resource doesn’t immediately show up.)
13. Once the resource is created, get the keys by navigating to the Resource Management → Keys group. Back in the code file, replace the areas marked “<Put Subscription Key Here>” with your actual subscription key (Key 1).
14. We’re nearly ready! Now we need a skill on Misty to utilize the endpoint we just created and verify that everything is working. You can download the AnnounceKnownPerson.js and AnnounceKnownPerson.json code files to install on Misty and test your function. Before you load this sample skill to your robot, edit the files to replace the “<Put Azure Endpoint Here>” placeholder string with the endpoint for the Azure speech service.
15. Next, use Misty’s Skill Runner to install the skill on your robot. Drag and drop your updated AnnounceKnownPerson.js and AnnounceKnownPerson.json onto the Install area to install the code on Misty. (Be sure to connect to your robot first!)
16. Now, when we run the Announce Known Person skill, we can verify that everything is working correctly. When Misty sees a known person, she will greet them – in my case, she says, “Hi There Chris!”.
NOTE: For this skill to work, Misty must already be trained to recognize at least one person. The skill will not trigger with unknown people. To train Misty to recognize a person, or to see who Misty is already trained to recognize, you can use the Command Center.
Now that we have the first half of the functionality ready, we’re going to go the rest of the way in teaching Misty to read!
17. So, for Misty to be able to read, we’ll create a new service in Azure that allows us to use the Cognitive Services Vision APIs. To do this, navigate back to the Microsoft Azure portal and select Create a resource (the same process we used above).
18. Select AI + Machine Learning and then click on Computer Vision.
19. You can title the resource whatever you’d like. I’m going to call ours MistyRead.
20. Select the subscription type, confirm your pricing tier, select the Resource Group we used before (TeachMistyToRead), and then click Create! (Keep in mind that deploying a new resource can take a little bit of time.)
21. Now we’ll need the keys required to make use of our new service. These can be found by going to ‘All resources’, selecting the resource you created, selecting Keys, and copying the one you’d like to use.
IMPORTANT NOTE: Do not share your keys with anyone! Since we are using HTTP calls, anyone with your key can use your service, which can cost you $$$.
Now we’re ready to link everything together. We’re going to use the Recognize Text endpoint within the Cognitive Services Vision 2.0 APIs. This endpoint is asynchronous: submitting an image does not return the extracted text directly, but a URL where the result will become available. Our function will take in an image, submit it to the text extraction endpoint, and then poll the returned URL until we receive the text that has been extracted from the image Misty gave us.
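The submit-then-poll pattern described above can be sketched as follows. This is a minimal illustration, not the code from the tutorial repository. The helper name pollForText, the injected getJson function, and the retry limits are assumptions, and the status values ("NotStarted", "Running", "Succeeded", "Failed") reflect the Recognize Text operation result as I understand it, so confirm them against the API reference.

```javascript
// Sketch only: pollForText, getJson, and the default limits are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `operationUrl` until the service reports a terminal status.
// `getJson(url)` is a caller-supplied async function that GETs the URL
// (sending the subscription key header) and resolves to the parsed JSON body.
async function pollForText(operationUrl, getJson, options = {}) {
  const { intervalMs = 1000, maxAttempts = 10 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await getJson(operationUrl);
    if (result.status === "Succeeded") return result;
    if (result.status === "Failed") throw new Error("Text recognition failed");
    // Any other status means the operation is still queued or running:
    // wait before asking again, unless this was the last allowed attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`No result after ${maxAttempts} attempts`);
}
```

Capping the number of attempts matters inside a serverless function, since an operation that never finishes would otherwise keep the function running until the platform times it out.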
For simplicity, we’re going to create a new Azure function that does everything for us. It will handle the image from Misty, call and poll the Vision API, parse the results, then generate and return the Text-To-Speech sound file to Misty. This will complete the full loop for teaching Misty how to read.
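To make that loop concrete, here is a hedged sketch of how the pieces might fit together inside one function: decode the image, get the recognition result, flatten it into a sentence, synthesize speech, and return the audio as base64. It is not the ReadMistyRead code from the repository. The names extractText, encodeAudio, and readImageAloud are hypothetical, the recognize and synthesize steps are passed in by the caller, and the result shape (recognitionResult.lines[].text) is my understanding of the Vision 2.0 response, which you should verify.

```javascript
// Sketch only: all function names and the `steps` object are assumptions.

// Flatten the recognition result into one string, skipping empty lines.
// Assumes the result looks like { recognitionResult: { lines: [{ text }] } }.
function extractText(operationResult) {
  const recognition = operationResult.recognitionResult || {};
  const lines = recognition.lines || [];
  return lines
    .map((line) => String(line.text || "").trim())
    .filter(Boolean)
    .join(" ");
}

// Encode raw audio bytes as base64 so they can travel in a text response.
function encodeAudio(wavBytes) {
  return Buffer.from(wavBytes).toString("base64");
}

// Run the whole loop. `steps.recognize(imageBytes)` should resolve to the
// finished recognition result; `steps.synthesize(text)` should resolve to
// the bytes of a .wav file. Both are supplied by the caller.
async function readImageAloud(imageBase64, steps) {
  const imageBytes = Buffer.from(imageBase64, "base64");
  const operationResult = await steps.recognize(imageBytes);
  const text = extractText(operationResult);
  if (!text) return { text: "", audioBase64: null }; // nothing readable found
  const wavBytes = await steps.synthesize(text);
  return { text, audioBase64: encodeAudio(wavBytes) };
}
```

Passing the two service calls in as functions keeps the orchestration easy to test with stand-ins before any Azure keys are involved.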
22. Back in the Azure Portal, go to ‘All Resources’ and select the Function App you created earlier (TeachMistyToRead). In the dropdown, click the + button next to Functions and create a new endpoint.
23. We’re going to create a new HTTP trigger. You can call it whatever you’d like; I’m going to call ours ReadMistyRead.
24. Leave the Authorization level as Function.
25. Copy and paste the code from our GitHub repository (ReadMistyRead) into the new function.
NOTE: You can test your newly created function call by sending a post command with an image encoded as a base64 string.
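One way to put together such a test request from Node.js is sketched below. The JSON field name image is an assumption (check the ReadMistyRead code for the field it actually reads), as is the helper name buildTestRequest. Because the Authorization level is Function, the request also needs your function key, which Azure Functions accepts in the x-functions-key header or as a code query parameter.

```javascript
// Sketch only: the `image` field name and buildTestRequest are assumptions.
const FUNCTION_URL = "https://TeachMistyToRead.azurewebsites.net/api/ReadMistyRead";

// Build a POST request that carries the image as a base64 string.
// `imageBytes` is a Buffer (for example, the contents of a .jpg file) and
// `functionKey` is the key shown for the function in the Azure Portal.
function buildTestRequest(imageBytes, functionKey) {
  return {
    method: "POST",
    url: FUNCTION_URL,
    headers: {
      "Content-Type": "application/json",
      "x-functions-key": functionKey,
    },
    body: JSON.stringify({ image: Buffer.from(imageBytes).toString("base64") }),
  };
}
```

With a recent Node.js version you could load a picture with fs.readFileSync and send the result with the built-in fetch, for example fetch(request.url, request). Replace the URL with your own Function App's endpoint.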
26. Now that you have your new function call, let’s bring it all together by installing the skill to run on Misty. Open up the MistyReads.js and MistyReads.json files (remember, you can find these in the Misty Reads Tutorial GitHub repo) and replace the AzureEndpoint value with your new endpoint, which should look something like this: https://TeachMistyToRead.azurewebsites.net/api/ReadMistyRead
27. Finally, load the Misty Skill that brings it all together. Install the .js and .json files using Misty’s Skill Runner.
28. And now for the end-to-end test. Point Misty at a piece of written text – a bit of writing on a whiteboard, for example. The Misty Reads skill will activate when you tap the top of Misty’s head to activate her cap touch sensors. Misty will pause for 2 seconds, then snap a picture to send along to Cognitive Services to read and speak the text that she is able to extract!
If you get any errors, you can use the Command Center to see what Misty is able to see (try snapping a picture to ensure the text is visible).
So where do we go from here? Really, the possibilities are endless – this only scratches the surface of how we can make Misty a cloud-connected node that can interact with our environment. Here are some challenges (you can make use of the referenced code from this tutorial to get started):
• Create an attendance bot that will note the time that Misty recognized a person, store it in Microsoft SQL DB, and export a daily report to your email using an email service like Sendgrid.
• Use Cognitive Services to periodically snap a picture and announce the description of what Misty sees (future blog post pending!).
• Use Azure Translator to have Misty be your interpreter.
• Run sentiment analysis on any text Misty is able to read and have her emote based on what she’s reading.
We hope this inspires you as a starting point to create a fascinating use case for a robot in your home or office. Let us know what you’re working on over in the Community Forum.