The GPT moment for robots is coming

By Rafael Hostettler

ChatGPT is now more than one year old – a good moment to observe in which ways the world has changed and, even more importantly, in which ways it has not.

Generative text has found its way into many tools. In many ways its use has normalised – I have a keyboard shortcut to talk to ChatGPT directly – and I use it many times a day for all sorts of things: translations, text corrections, rewrites, code snippets and more – a bit like a micro-assistant or extended brain.

Our Confluence now has AI-powered search and a writing assistant as well. But beyond a few early adopters, companies are still trying to figure out how to really extract value from this amazing technology.

Image generation is one step behind in this regard: the models still struggle with consistency, and with giving artists the level of control they need. That being said, with the integration of GenAI into Photoshop, for example, Adobe is on a great path to looping in the artists.

From my network, I hear it has already disrupted asset generation for games in a major fashion, and it will likely keep doing so. Once the consistency problems are solved, I expect a far larger impact on anything graphics-related.

Finally, we are starting to see multi-modality in the models. Opening the ChatGPT app, taking a photo of a confusing parking sign in a foreign language, asking whether I can park here right now, and getting a good answer clearly feels like the future has arrived.

But all of this has not fundamentally changed my work (yet). It has made me a lot faster in some places and made the work more enjoyable here and there, especially as a lot of boring tasks now need only minimal touch points. But the true power still feels locked away behind interfaces – for me as a user, and for the AI trying to reach the tools and programs I use.

So in essence, the reason these capabilities have not yet found their way into more productive use boils down to integration and usability – and because integration is fragmented and usability takes human feedback to iterate on, this takes time. And it has also only been a year.

But robots

Largely unbeknownst to the broader community, robotics has made great strides in control thanks to generative AI, and we are moving towards previously impossible autonomy at remarkable speed. It did so in an obvious way – and a surprising one.

The obvious one first: LLMs are great at transforming text – longer, shorter, different language, different format. It is thus no surprise that they are also good at translating human intent into something a robot can understand and act on.

Google DeepMind gave an excellent demonstration last spring with PaLM-E, using LLMs to break vague instructions like “Clean the table” down into a sequence of actions a robot could execute. This is also where all the progress on multi-modal models shines, because they make it easy to bridge between vision and language.
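To make that pattern concrete, here is a minimal sketch of the idea, assuming a generic chat-completion API (the OpenAI Python client as a stand-in) and a hypothetical set of robot primitives – an illustration, not PaLM-E's actual interface:

```python
# Minimal sketch: an LLM decomposes a vague instruction into robot primitives.
# The primitive set, prompt, and model name are hypothetical illustrations.
import json
from openai import OpenAI

PRIMITIVES = ["move_to(object)", "pick(object)", "place(object, location)", "wipe(surface)"]

SYSTEM_PROMPT = (
    "You control a household robot. Decompose the user's instruction into a "
    f"JSON list of steps, using only these primitives: {', '.join(PRIMITIVES)}. "
    "Respond with the JSON list only."
)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def plan(instruction: str) -> list[str]:
    """Ask the LLM to turn a natural-language instruction into executable steps."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
    )
    # Sketch assumes the model returns valid JSON; production code would validate.
    return json.loads(response.choices[0].message.content)

# plan("Clean the table") might yield something like:
# ["pick(cup)", "place(cup, sink)", "wipe(table)"]
```

The hard part – grounding “cup” and “table” in actual camera images – is exactly what the multi-modal models are starting to make tractable.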

The second, less obvious, breakthrough is the application of diffusion models to robotic control. These models have proven instrumental in creating versatile and robust control policies, letting robots handle a variety of scenarios with unprecedented dexterity and adaptability. This marks a substantial leap from previous limitations, where it was not even clear how to state a task in a way a robot could execute.
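The core idea, roughly: instead of predicting a single action, the policy starts from random noise and iteratively denoises it into an action trajectory, conditioned on what the robot observes. Below is a minimal sketch of that sampling loop in PyTorch, with a hypothetical noise-prediction network `eps_model` and placeholder dimensions – real systems add observation encoders, receding-horizon execution, and much more:

```python
# Minimal sketch of diffusion-based action sampling (DDPM-style), the idea
# behind diffusion policies in robotics. `eps_model` and the dimensions are
# hypothetical placeholders.
import torch

T = 50                                  # number of denoising steps
betas = torch.linspace(1e-4, 0.02, T)   # noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_actions(eps_model, obs, horizon=16, action_dim=7):
    """Denoise Gaussian noise into an action trajectory, conditioned on obs."""
    a = torch.randn(1, horizon, action_dim)        # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(a, obs, t)                 # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (a - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(a) if t > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[t]) * noise    # one reverse-diffusion step
    return a                                       # executable action sequence
```

Because the same network is trained on many demonstrations, the sampled trajectories adapt to whatever the observation shows – which is where the dexterity and robustness come from.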

Combine this with the current flurry of humanoids entering the market and it is no surprise that we are seeing the first impressive demos of these humanoids doing real tasks – Tesla handling eggs and Figure making coffee, for example. (There seems to be an affinity for breakfast among roboticists…)

The impact of these advancements should not be underestimated; it is akin to the early days of the first GPT models. Just as those initial models represented significant strides for a specialized group of experts to build on, the current developments in robotics are setting the stage for broader accessibility and application.

This is not yet the ChatGPT moment. A universally accessible robot control kit for the general public is tantalizingly close, but still 2-3 years away. From here on, though, the roadmap has cleared rapidly, and it is now very clear that it will be less than a decade before advanced robots become an integral part of everyday life.

For us, these advancements are particularly exhilarating, as they mean we will be able to reduce the need for human operators in various tasks much sooner than anticipated. Household tasks without direct user contact in particular will be automated quickly – think handling the dishwasher, cooking, etc.

In conclusion, the breakthroughs brought about by GenAI in robotics mark a significant shift from conceptual possibilities to practical applications. As the technology continues to evolve and become more accessible, we stand on the brink of a new era in robotics, one that promises to reshape our interaction with technology and expand the horizons of what is achievable in automation and artificial intelligence. At Devanthro, we are leading the way, leveraging this progress in AI to allow our elderly relatives to age in dignity.

Devanthro is a Munich-based robotics and AI business, building Robodies – robotic avatars for the elderly care market. Their partners include Charité Berlin, University of Oxford, and Diakonie. An early prototype is part of the permanent exhibition at Deutsches Museum in Munich. For more information, please visit https://devanthro.com/