The future of robotics with LLMs and Generative AI

Denis Korneev
Sep 18, 2023
3 min read

Updated: Oct 2, 2023

In the realm of generative artificial intelligence (GenAI), we are currently witnessing an astounding rate of growth, which doesn't come as a complete surprise. Since the invention of the calculator, we've been steadily advancing towards the creation of more intelligent machines and while recent GenAI breakthroughs may appear sudden, they represent the culmination of a gradual and painstaking journey.

Large Language Models (LLMs), with OpenAI's ChatGPT as a prime example, mark the latest milestone in AI's evolutionary path. These LLMs have reached a juncture akin to when the internet became accessible to the general public in 1993, poised to revolutionize every facet of human society, from our professional endeavors to our interactions with both each other and machines.

So, what do these latest LLM developments mean for the world of robotics?

To grasp the full implications of LLMs like ChatGPT, it's essential to delve into their mechanics. LLMs utilize Natural Language Processing (NLP) to generate the remarkably human-like responses they've become renowned for. Essentially, LLMs are trained on vast datasets, such as the internet, and employ a network of machine learning algorithms that mimic the functioning and storage of information in the human brain. Within this expansive framework, NLP algorithms and models enable LLMs to understand and use language in a human-like manner.

While the inner workings of LLMs like ChatGPT involve numerous layers and intricacies, their ability to communicate like humans is particularly pivotal. This capability democratizes the analytical and generative powers of AI, enabling seamless communication between people and machines through simple language.

It's hardly surprising that GenAI tools built upon LLMs are now proliferating. LLMs have made it feasible for humans to communicate with machines as effortlessly as they communicate with one another, thus granting AI a newfound accessibility. We can now all speak the same language.

However, this brings us back to the realm of robotics. Does this enhanced accessibility of GenAI, coupled with advances in LLMs, broaden the accessibility of robots beyond the confines of research and computer science? Could this herald the dawn of widespread robot adoption in society?

Integration of ChatGPT with Misty at Microsoft Technology Center in Bengaluru, India.

Indeed, LLMs like ChatGPT have made robots, such as the Misty and Furhat robot, considerably more autonomous. Interactions no longer necessitate meticulous scripting and design by developers for the robot to engage in fluid conversations with individuals. Thanks to ChatGPT, Bard, or any other AI chatbot, robots can be placed anywhere and are primed for human-like conversations on a myriad of topics. Even those lacking programming expertise can now enjoy a functional robot capable of meaningful interactions. This development is a game-changer, rendering robots far more appealing to a broader demographic and potentially accelerating their widespread integration into society.

Nevertheless, it's vital to recognize that a robot differs substantially from a mere chatbot or virtual avatar. As embodied agents, particularly robots like Misty, they must possess a broader skill set than what current AI chatbots offer.

Beyond generating responses to questions or prompts, a social robot must be adept at performing gestures and interpreting the gestures of others. It must convey emotions and sound genuinely human, not just in its voice but also in its speech patterns, incorporating pauses, interjections like "aha" or "hmmm," and responding appropriately with nods and eye contact. These non-verbal behaviors, collectively known as backchanneling, play a pivotal role in ensuring interactions with a robot feel natural and engaging.

ChatGPT enabled Misty (AIleen) at Twilio Signal 2023

It matters little how exceptional a chatbot's responses are if a robot is staring at the ground, blinking excessively, or incessantly nodding while delivering them. In the realm of human-robot interaction integrating them with AI chatbots certainly enhances their conversational autonomy. However, this autonomy must be complemented with proficient backchanneling to achieve a truly human-like interaction. Until LLMs can be trained on data from spoken interactions to automatically generate backchanneling behaviors, their capabilities in social robotics interactions will be limited.

Nevertheless, current iterations of AI chatbots undeniably expedite and streamline robot skill development like never before. While much work lies ahead, there's no doubt that this is an exceptionally exciting era for robotics companies like ours. We are entering a phase in which numerous AI firms, emerging across various domains, will be exploring novel ways for AI to interact with people and the world. Robots emerge as the natural choice, as they offer AI a more embodied and direct connection to the world, a critical step toward achieving greater levels of intelligence and comprehension.

In summary, robots are now poised to become true social agents, taking LLMs and NLP to unprecedented heights by encompassing an understanding of body language, facial expressions, and social nuances. The prospect of providing AI with such a multi-dimensional experience is indeed exciting and a goal we are fully prepared to pursue.

The future of robotics with LLMs and Generative AI

Subscribe to our newsletter

Misty - a part of Furhat Robotics

Visit furhatrobotics.com