Messaging services set the stage for humans to interact with programmable robots using the same devices we already use to talk with each other. That kind of interaction feels a little like magic, but it’s magic that anyone who codes can conjure. To show you what I mean, we need look no further than Misty’s Photo Booth skill, which we demoed at Twilio SIGNAL 2019.
When this skill runs, you can send an SMS to ask your Misty robot to take your picture. When Misty gets your text, she stops what she’s doing, turns to look at you, and snaps your portrait with the camera in her visor. She then sends that picture right back to your phone via MMS.
Let’s start by looking at Twilio Autopilot, the service your text visits first.
SMS -> Twilio Autopilot
To get Misty’s attention, the user texts a phone number hooked up to Twilio’s Autopilot service. If you’re new to Autopilot, here’s how Twilio describes it:
Autopilot is a conversational AI platform to build omnichannel bots and virtual assistants with natural language understanding and complete programmability with Autopilot Actions.
When you text this special number, Twilio forwards your message to the Programmable Messaging channel associated with a unique Autopilot “bot”. Each Autopilot bot you create can perform one or more “tasks” (a.k.a. actions that trigger when your bot receives a certain kind of message). You customize tasks in the Twilio Console to program what they should do when triggered.
For the Photo Booth skill, the core function of our Autopilot bot lives in a task we call take_picture. We “train” our bot to trigger the take_picture task when it receives one of the following phrases:
The next step is to program what the bot does when the take_picture task triggers. Possible actions include sending a message back to the user, listening for responses, or collecting and storing information. Or, if the built-in actions aren’t enough, you can program tasks to redirect to other services. That’s what we do in the Photo Booth skill. When the take_picture task triggers, it calls a redirect to run a Twilio Function.
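For illustration, a task that hands off to a Twilio Function could be defined with a single redirect action. The sketch below writes the task definition as a JavaScript object so it can be checked and serialized; the Function URL is a placeholder, not the address used in the actual skill.

```javascript
// Hypothetical Actions definition for the take_picture task.
// The URL is a placeholder; substitute the address of your own Twilio Function.
const takePictureTask = {
  actions: [
    { redirect: 'https://example-1234.twil.io/take-picture' }
  ]
};

// Serialize the object to get JSON for the task definition.
const takePictureTaskJson = JSON.stringify(takePictureTask, null, 2);
console.log(takePictureTaskJson);
```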
Autopilot -> Twilio Function
Twilio Functions are serverless functions for handling inbound Twilio communications. These functions are a quick way to introduce the communications you’re processing in Twilio to the rest of the web (a medium of communication in which robots like Misty are exceptionally well-versed).
The Twilio Function we use in our Photo Booth skill does a few things. First, it assigns the user’s phone number to a variable. Then, it uses the Twilio API to reply to the user with an SMS: [••] Time to Pose. Finally, it POSTs the user’s phone number (and the nature of the action they’ve asked for) to the PubNub channel Misty is listening to for new messages. (More detail on this in the next section.)
The Twilio Function code for the Photo Booth skill looks something like this:
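A minimal sketch of such a function follows. Several details are assumptions rather than the skill's actual code: that Autopilot passes the sender's number as event.UserIdentifier, that the PubNub publish URL is stored in an environment variable named PUBNUB_PUBLISH_URL, that the runtime provides a global fetch (Node 18 or later), and that the phone number travels in a field called phoneNumber. The sketch also replies by returning a say action to Autopilot instead of calling the Messaging API directly, which is one simple way to text the user back.

```javascript
// Pure helper: turn the incoming event into a reply and a PubNub message.
// Assumption: Autopilot supplies the sender's number as event.UserIdentifier.
function buildPhotoRequest(event) {
  const phoneNumber = event.UserIdentifier;
  return {
    reply: { actions: [{ say: '[••] Time to Pose' }] },
    message: { type: 'photo', phoneNumber }
  };
}

async function handler(context, event, callback) {
  const { reply, message } = buildPhotoRequest(event);
  try {
    // Publish the request to the channel Misty is listening to.
    // PUBNUB_PUBLISH_URL is a hypothetical environment variable.
    const response = await fetch(context.PUBNUB_PUBLISH_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(message)
    });
    if (!response.ok) {
      throw new Error(`PubNub publish failed with status ${response.status}`);
    }
    // Returning Actions JSON lets Autopilot text the reply back to the user.
    callback(null, reply);
  } catch (err) {
    callback(err);
  }
}

// Twilio Functions look for the handler on the module's exports.
if (typeof module !== 'undefined') {
  module.exports.handler = handler;
}
```

Keeping the message-building logic in its own helper makes it easy to check without deploying anything or touching the network.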
With the Twilio Function and Autopilot Bot set up, we’re ready to look at PubNub, the service that notifies Misty to take a picture.
Twilio Function -> PubNub
Hacking a robot to react to an SMS is pretty cool. Even cooler? When that robot responds with hardly any latency. That’s where PubNub comes in.
PubNub provides a real-time messaging API that developers can leverage via HTTP communication protocols, allowing for quick communication between all kinds of machines. When you use PubNub’s messaging API, you create a data channel – sort of like a chatroom for devices – that multiple devices (like Twilio servers and robots) can subscribe and publish messages to.
While PubNub provides SDKs for several different languages, we get along just fine in the Photo Booth skill with basic HTTP requests. When you create a new “app” in PubNub, you get unique API Keys for publishing and subscribing to that app. To publish data (as we do in the Twilio Function above), we send a POST request to:
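As a rough guide, a publish URL can be assembled as in the sketch below. It assumes the path layout of PubNub's REST publish endpoint (/publish/{pub_key}/{sub_key}/0/{channel}/{callback}), with the client name passed as the uuid query parameter. The keys, channel name, and client name are placeholders.

```javascript
// Build a PubNub publish URL for a POST request.
// Assumption: the "0" segments stand for "no signature" and "no JSONP callback".
function buildPublishUrl({ publishKey, subscribeKey, channel, clientName }) {
  const path = [publishKey, subscribeKey, '0', channel, '0']
    .map((segment) => encodeURIComponent(segment))
    .join('/');
  return `https://ps.pndsn.com/publish/${path}?uuid=${encodeURIComponent(clientName)}`;
}

// Placeholder values; use the keys from your own PubNub app.
const publishUrl = buildPublishUrl({
  publishKey: 'pub-c-demo',
  subscribeKey: 'sub-c-demo',
  channel: 'photo-booth',
  clientName: 'twilio-function'
});
console.log(publishUrl);
```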
You’ll notice that the PubNub URL includes publish/subscribe keys, a channel name (sort of like the name of a chatroom), and a client name (a unique name that identifies this device in the PubNub channel). When we send this request, we pass along a JSON body with the message we want to publish. In our case, that message resembles:
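A message of this kind can be sketched as a small JSON object. The type field and its photo value come from the skill as described here; the phoneNumber field name and the sample number are assumptions made for the example.

```javascript
// Hypothetical shape of the published message.
// `type: 'photo'` tells the robot which action was requested;
// `phoneNumber` (an assumed field name) tells it where to send the result.
function buildPhotoMessage(phoneNumber) {
  return { type: 'photo', phoneNumber };
}

const exampleBody = JSON.stringify(buildPhotoMessage('+15555550100'));
console.log(exampleBody);
```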
You can read more about this in the PubNub developer docs, but the high-level view is that this request publishes a message to the PubNub app we created for the Photo Booth skill. External devices (like a programmable robot) that are listening to that app can then read those messages and make use of them on their own.
PubNub -> Misty
Misty listens for new messages by sending subscription requests to the PubNub channel. Each of these requests times out if it doesn’t get a response within twenty seconds, and there’s no guarantee that someone will text Misty within that window. We work around this in our skill by pairing each subscription request with a request that publishes an empty message to the PubNub channel, and we run the pair on a loop to keep the lines of communication open. When our Twilio Function publishes a message to PubNub, Misty receives it in the response to her subscription request and passes it into the _pubNubSubscribe() callback function.
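The pairing can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions, not the skill's code: it assumes PubNub's v1 REST subscribe endpoint (/subscribe/{sub_key}/{channel}/{callback}/{timetoken}) answers with an array of the form [[message, ...], "nextTimetoken"], and that keep-alive messages can be recognized because they carry no type field. The sendRequest parameter and the keepAliveUrl setting are hypothetical stand-ins for however the skill issues HTTP requests on the robot.

```javascript
// Build the URL for one long-running subscribe request.
function buildSubscribeUrl({ subscribeKey, channel, timetoken, clientName }) {
  const path = [subscribeKey, channel, '0', timetoken]
    .map((segment) => encodeURIComponent(segment))
    .join('/');
  return `https://ps.pndsn.com/subscribe/${path}?uuid=${encodeURIComponent(clientName)}`;
}

// Split a subscribe response into real messages and the next timetoken,
// dropping keep-alive messages (anything without a `type` field).
function readSubscribeResponse(body) {
  const [messages, timetoken] = JSON.parse(body);
  const realMessages = messages.filter(
    (m) => m !== null && typeof m === 'object' && 'type' in m
  );
  return { messages: realMessages, timetoken };
}

// One pass of the loop: open a subscription, then publish an empty message
// so the subscription returns before the twenty-second timeout.
function pollOnce(sendRequest, config, timetoken) {
  sendRequest('GET', buildSubscribeUrl({ ...config, timetoken }));
  sendRequest('POST', config.keepAliveUrl, JSON.stringify({}));
}
```

Each response supplies the timetoken for the next subscribe request, so the loop picks up where the previous request left off.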
The outputExt() function (shown above) extracts the phone number and the value of the type parameter from the message the Twilio Function sends. Misty stores the phone number and checks that the value of type is equal to photo. If it is, she runs a block of code that has her move her head (and camera) to face the user, change her display image, and play sounds to let the user know what’s going on. Here’s an example of how that can look:
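The sketch below illustrates that flow in plain JavaScript. The robot object is a hypothetical wrapper around Misty's movement, display, audio, and camera commands. Its method names, the asset file names, and the phoneNumber field are all invented for this example.

```javascript
// Phone number of the person who asked for the picture.
let requesterNumber = null;

// React to one message pulled from PubNub.
// Returns true when the message was a photo request and was handled.
function handlePhotoMessage(message, robot) {
  if (!message || message.type !== 'photo') {
    return false; // Ignore anything that is not a photo request.
  }
  requesterNumber = message.phoneNumber; // Remember where to send the picture.
  robot.lookAtUser();                    // Point head and camera at the user.
  robot.showImage('camera-face.png');    // Swap the display image.
  robot.playSound('shutter.wav');        // Audio cue that a photo is coming.
  robot.takePicture();                   // Capture the portrait.
  return true;
}
```

Passing the robot in as a parameter keeps the decision logic separate from the hardware commands, so it can be exercised with a stub before it ever runs on a real robot.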
Misty -> Imgur
When you code Misty to take pictures, you can pass base64-encoded strings of the picture data into callback functions for additional processing. By default, those callback functions use the same name as the misty.TakePicture() method, prefixed with an underscore: _TakePicture(). In our skill, we use this _TakePicture() callback function to pass the base64-encoded string with our picture data into an uploadImage() function. This callback resembles the following:
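A callback along these lines might look like the sketch below. It assumes the base64 string arrives at data.Result.Base64 in the callback payload. The uploadImage() body here is only a placeholder for the upload step.

```javascript
// Pull the base64 string out of the callback payload.
// Assumption: the picture data arrives at data.Result.Base64.
function getBase64Image(data) {
  return data && data.Result ? data.Result.Base64 : undefined;
}

// Placeholder for the skill's upload step.
function uploadImage(imageData) {
  console.log(`Uploading ${imageData.length} base64 characters`);
}

// Callback that runs when the picture data is ready.
function _TakePicture(data) {
  const imageData = getBase64Image(data);
  if (!imageData) {
    return; // Nothing to upload.
  }
  uploadImage(imageData);
}
```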
When we call the uploadImage() function, Misty posts the picture to a private Imgur album. There are several image-sharing services we could use to host these pictures, but Imgur’s API does two things that make it ideal for the Photo Booth Skill. Thing One: it accepts base64-encoded strings, and Thing Two: it returns the URL for the uploaded image in the response body. By passing this returned URL back into Twilio’s MMS API, Misty can send the picture directly to the person who asked for it.
The code for managing this in the Photo Booth skill looks something like this:
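The request and response handling can be sketched like this. The sketch assumes uploads go to Imgur's https://api.imgur.com/3/image endpoint as a JSON body, authorized with an OAuth bearer token for the account that owns the album, and that the response carries the hosted URL at data.link. The token and album ID are placeholders.

```javascript
// Build the pieces of an Imgur upload request for a base64 image.
function buildImgurUpload(imageData, { accessToken, albumId }) {
  return {
    url: 'https://api.imgur.com/3/image',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ image: imageData, type: 'base64', album: albumId })
  };
}

// Read the hosted image URL out of Imgur's JSON response.
function getImageLink(responseBody) {
  const parsed = JSON.parse(responseBody);
  if (!parsed.success || !parsed.data || !parsed.data.link) {
    throw new Error('Imgur upload did not return an image link');
  }
  return parsed.data.link;
}
```

The link returned by getImageLink() is the value the skill would hand to its MMS step.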
Misty -> MMS
When we call the sendPicture() function, Misty sends a request to the Twilio API, which drops the image at the given URL into our user’s messaging inbox. It goes a bit like this:
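As a sketch, the request could be assembled as shown below. It targets the Messages resource of Twilio's REST API with HTTP Basic authentication and a form-encoded body. The Account SID, auth token, and phone numbers are placeholders, and the use of Buffer for base64 encoding assumes a Node.js environment.

```javascript
// Build a request that asks Twilio to send an MMS with the picture attached.
function buildMmsRequest({ accountSid, authToken, from, to, mediaUrl }) {
  // Basic auth: base64 of "AccountSid:AuthToken".
  const credentials = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  // Twilio expects form-encoded parameters; MediaUrl points at the hosted image.
  const form = new URLSearchParams({ From: from, To: to, MediaUrl: mediaUrl });
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Messages.json`,
    method: 'POST',
    headers: {
      Authorization: `Basic ${credentials}`,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: form.toString()
  };
}
```

Twilio fetches the image from MediaUrl itself, which is why the picture needs to live at a reachable URL before this request goes out.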
So, to recap:
- Someone sends a message to our Twilio phone number
- Twilio passes the message to our Twilio Autopilot bot
- Our Autopilot bot reads the message, identifies the task, and redirects to our Twilio Function
- The Twilio Function posts the user’s phone number to our PubNub channel
- Misty, who’s been running the Photo Booth skill all the while, pulls down the message from PubNub
- Misty repositions her camera and takes a picture
- Misty uploads the picture to a private Imgur album and calls out to the Twilio API to send it as an MMS to our user
Sending your robot a text is a pretty sociable way to ask it to do something. When the robot replies with a picture of its favorite person? That’s downright chummy.
Join the Misty community to see what other developers are teaching Misty to do.