Language to rewards for robotic skill synthesis
🔬 ANALYSEUR SCIENCE & TECH
🤖 Artificial Intelligence
✍️ Author(s)
Wenhao Yu and Fei Xia
📅 Published
2023-08-22T11:47:00.004-07:00
📖 Length
800 words
Source: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgacqpwjLAyY...
📋 Article excerpt
Posted by Wenhao Yu and Fei Xia, Research Scientists, Google

Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. Recent advances in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents. While these methods impart new modes of compositional generalization, they use language to link together new behaviors from an existing library of control primitives that are either manually engineered or learned a priori. Despite having internal knowledge about robot motions, LLMs struggle to directly output low-level robot commands because relevant training data is scarce. As a result, the expressiveness of these methods is bottlenecked by the breadth of the available primitives, whose design often requires extensive expert knowledge or massive data collection.

In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach that enables users to teach robots novel actions through natural language input. To do so, we use reward functions as an interface that bridges the gap between language and low-level robot actions. We posit that reward functions are an ideal interface for such tasks given their semantic richness, modularity, and interpretability. They also provide a direct connection to low-level policies through black-box optimization or reinforcement learning (RL).
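To make the claim about semantic richness, modularity, and interpretability concrete, here is a minimal Python sketch. It is not the authors' reward code: it assumes a toy robot state with hypothetical fields `torso_height` and `forward_velocity`, and uses hypothetical helpers `target_term` and `compose_reward`. Each term is a named, weighted penalty that a reader can inspect, and the total is a single scalar that any black-box optimizer or RL algorithm could maximize.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RobotState:
    # Hypothetical, heavily simplified robot state for illustration only.
    torso_height: float      # meters
    forward_velocity: float  # meters per second


# A reward term maps a state to a non-positive penalty (0 is the best score).
RewardTerm = Callable[[RobotState], float]


def target_term(attr: str, target: float) -> RewardTerm:
    """Penalize the squared deviation of one state field from a target value."""
    def term(state: RobotState) -> float:
        return -(getattr(state, attr) - target) ** 2
    return term


def compose_reward(terms: List[Tuple[float, RewardTerm]]) -> Callable[[RobotState], float]:
    """Combine weighted terms into a single scalar reward function."""
    def reward(state: RobotState) -> float:
        return sum(weight * term(state) for weight, term in terms)
    return reward


# "Stand tall and walk forward at 0.5 m/s", expressed as readable, modular terms.
reward_fn = compose_reward([
    (1.0, target_term("torso_height", 0.3)),
    (0.5, target_term("forward_velocity", 0.5)),
])

# Both targets met, so no penalty is applied.
print(reward_fn(RobotState(torso_height=0.3, forward_velocity=0.5)))
# Too low and standing still: both terms contribute a penalty.
print(reward_fn(RobotState(torso_height=0.2, forward_velocity=0.0)))
```

Changing a target or adding a term alters the requested behavior without touching whatever optimizer consumes the reward, which is the kind of separation that lets a language model edit the reward rather than the controller.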
We developed a language-to-reward system that uses LLMs to translate natural language user instructions into reward-specifying code and then applies MuJoCo MPC to find the optimal low-level robot actions that maximize the generated reward function. We demonstrate our language-to-reward system on a variety of robotic control tasks in simulation using a quadruped robot and a dexterous manipulator robot, and we further validate the method on a physical robot manipulator.

The language-to-reward system consists of two core components: (1) a Reward Translator and (2) a Motion Controller. The Reward Translator maps natural language instructions from users to reward functions represented as Python code. The Motion Controller optimizes the given reward function using receding horizon optimization to find the optimal low-level robot actions, such as the torque to apply to each robot motor.

LLMs cannot directly generate low-level robot actions due to a lack of such data in their pre-training datasets. We propose to use reward functions to bridge the gap between language and low-level robot actions, enabling novel, complex robot motions from natural language instructions.

Reward Translator: Translating user instructions to reward functions

The Reward Translator module was built with the goal of mapping natural language user instructions to reward functions. Reward tuning is highly domain-specific and requires expert knowledge, so we were not surprised to find that LLMs trained on generic language datasets are unable to directly generate a reward function for specific hardware. To address this, we apply the in-context learning ability of LLMs. Furthermore, we split the Reward Translator into two sub-modules: Motion Descriptor and Reward Coder.

Motion Descriptor

First, we design a Motion Descriptor that interprets input from a user and expands it into a natural language description of the desired robot motion following a predefined template.
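The split of the Reward Translator into a Motion Descriptor and a Reward Coder can be sketched as two chained prompt-and-call steps. This is a minimal sketch under stated assumptions: `MOTION_TEMPLATE`, the prompt wording, and the `llm` callable are hypothetical placeholders rather than the prompts used in the work, and `fake_llm` simply returns canned text so the example runs offline. What it illustrates is the data flow: the Reward Coder receives the templated motion description, not the raw user instruction, and both stages call the same LLM.

```python
from typing import Callable

# Hypothetical template that the Motion Descriptor asks the LLM to fill in.
MOTION_TEMPLATE = (
    "[start of description]\n"
    "The torso of the robot should be at a height of [NUM: 0.3] meters.\n"
    "The robot should move forward at [NUM: 0.0] meters per second.\n"
    "[end of description]"
)


def describe_motion(llm: Callable[[str], str], instruction: str) -> str:
    """Stage 1 (Motion Descriptor): expand a vague instruction via the template."""
    prompt = (
        "Describe the desired robot motion by filling in this template:\n"
        f"{MOTION_TEMPLATE}\n"
        f"User instruction: {instruction}"
    )
    return llm(prompt)


def code_reward(llm: Callable[[str], str], description: str) -> str:
    """Stage 2 (Reward Coder): turn the motion description into reward code."""
    prompt = (
        "Write a Python reward function for this motion description:\n"
        f"{description}"
    )
    return llm(prompt)


def translate(llm: Callable[[str], str], instruction: str) -> str:
    """Reward Translator: the same LLM is used for both stages, in order."""
    return code_reward(llm, describe_motion(llm, instruction))


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call; returns canned responses."""
    if prompt.startswith("Describe"):
        return "The torso of the robot should be at a height of 0.3 meters."
    return "def reward(state):\n    return -(state.torso_height - 0.3) ** 2"


print(translate(fake_llm, "Stand up tall"))
```

Because the intermediate description is plain text, it can be shown to the user before any reward code is generated.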
This Motion Descriptor turns potentially ambiguous or vague user instructions into more specific and descriptive robot motions, making the reward coding task more stable. Moreover, users interact with the system through the motion description field, so it also provides a more interpretable interface for users than directly showing the reward function. To create the Motion Descriptor, we use an LLM to translate the user input into a detailed description of the desired robot motion. We design prompts that guide the LLM to output the motion description with the right level of detail and format. By translating a vague user instruction into a more detailed description, our system can generate the reward function more reliably. This idea could potentially be applied more generally beyond robotics tasks, and is related to Inner-Monologue and chain-of-thought prompting.

Reward Coder

In the second stage, the Reward Coder uses the same LLM as the Motion Descriptor to translate the generated motion description into the reward function. Reward functions are represented as Python code to benefit from the LLM's knowledge of rewards, coding, and code structure. Ideally, we would like to use an LLM to directly generate a reward function R(s, t) that maps the robot state s and time t to a scalar reward value. However, generating the correct reward function from...
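Whatever reward R(s, t) is produced, the Motion Controller must turn it into actions using receding horizon optimization. The sketch below is not MuJoCo MPC: it swaps in a hypothetical 1-D point mass and a naive random-shooting planner that samples action sequences, scores each by its summed reward over a short horizon, executes only the first action of the best sequence, and then re-plans from the new state. The dynamics, the example reward, `DT`, `HORIZON`, and `SAMPLES` are all illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[float, float]  # (position in m, velocity in m/s) of a toy 1-D point mass

DT = 0.1       # integration step in seconds (assumed)
HORIZON = 20   # planning horizon in steps (assumed)
SAMPLES = 200  # candidate action sequences per re-plan (assumed)


def step(state: State, force: float) -> State:
    """Hypothetical dynamics: unit mass, force clipped to [-1, 1]."""
    pos, vel = state
    force = max(-1.0, min(1.0, force))
    vel = vel + force * DT
    return (pos + vel * DT, vel)


def reward(state: State, t: float) -> float:
    """Example R(s, t): reach position 1.0 and come to rest (t is unused here)."""
    pos, vel = state
    return -(pos - 1.0) ** 2 - 0.1 * vel ** 2


def rollout_return(state: State, actions: List[float], t: float) -> float:
    """Sum the reward along the trajectory produced by an action sequence."""
    total = 0.0
    for i, a in enumerate(actions):
        state = step(state, a)
        total += reward(state, t + (i + 1) * DT)
    return total


def plan_first_action(state: State, t: float, rng: random.Random) -> float:
    """Random shooting: score sampled sequences, return the best one's first action."""
    best_actions: Optional[List[float]] = None
    best_value = float("-inf")
    for _ in range(SAMPLES):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(HORIZON)]
        value = rollout_return(state, actions, t)
        if value > best_value:
            best_actions, best_value = actions, value
    assert best_actions is not None
    return best_actions[0]


def run_mpc(steps: int = 100, seed: int = 0) -> State:
    """Receding horizon loop: plan, apply only the first action, repeat."""
    rng = random.Random(seed)
    state: State = (0.0, 0.0)
    for k in range(steps):
        action = plan_first_action(state, k * DT, rng)
        state = step(state, action)
    return state


final_pos, final_vel = run_mpc()
print(f"final position {final_pos:.2f} m, velocity {final_vel:.2f} m/s")
```

Re-planning at every step lets the controller correct for the noise in any single sampled plan; a production planner would use a far more sample-efficient optimizer, but the plan, execute-first-action, re-plan structure is the same.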
📖 READ THE FULL ARTICLE AT THIS LINK:
🔗 http://ai.googleblog.com/2023/08/language-to-rewards-for-robotic-skill.html
Click the link above to read the full article.
🏷️ Keywords:
🤖 Artificial Intelligence
📊 Statistics:
800-word excerpt
🤖 Automatically published by Analyseur Science | Original source: http://ai.googleblog.com/2023/08/language-to-rewards-for-robotic-skill...