Google LLC (20250144795). CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS
Inventor(s)
Peter Raymond Florence of San Francisco CA US
Danny Michael Driess of Berlin DE
Igor Mordatch of Oakland CA US
Seyed Mohammad Mehdi Sajjadi of Berlin DE
This abstract first appeared for US patent application 20250144795, titled 'CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS'.
Original Abstract Submitted
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment. In one aspect, a method comprises: receiving one or more observations of an environment; receiving an input text sequence that describes a task to be performed by a robot in the environment; generating an encoded representation of the input text sequence in an embedding space; generating a corresponding encoded representation of each of the one or more observations in the embedding space; generating a sequence of input tokens that comprises the encoded representation of the input text sequence and the corresponding encoded representation of each observation; processing the sequence of input tokens using a language model neural network to generate an output text sequence that comprises high-level natural language instructions; and determining, from the high-level natural language instructions, one or more actions to be performed by the robot.
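The data flow described in the abstract can be illustrated with a minimal Python sketch. Every component here is an assumption made for illustration and is not the claimed method: the toy encoders, the EMBED_DIM size, the text-then-observations token order, the stub_language_model placeholder (which returns a canned plan instead of running a neural network), and the SKILLS table mapping instruction verbs to robot primitives are all hypothetical. The sketch only shows the shape of the pipeline: task text and observations are encoded into one shared embedding space, combined into a single input sequence, turned into high-level instructions, and parsed into actions.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

EMBED_DIM = 4  # hypothetical size of the shared embedding space
Vector = List[float]


def embed_text(text: str, dim: int = EMBED_DIM) -> List[Vector]:
    """Toy text encoder (assumption): one vector per word, built from character codes."""
    vectors = []
    for word in text.lower().split():
        vec = [0.0] * dim
        for i, ch in enumerate(word):
            vec[i % dim] += ord(ch) / 1000.0
        vectors.append(vec)
    return vectors


def embed_observation(features: Sequence[float],
                      projection: Sequence[Sequence[float]]) -> Vector:
    """Toy observation encoder (assumption): a linear projection into the same space."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]


def build_input_tokens(task: str, observations: Sequence[Sequence[float]],
                       projection: Sequence[Sequence[float]]) -> List[Vector]:
    """Combine the encoded task text and one encoded vector per observation."""
    # Text-then-observations ordering is an assumption; the abstract only says the
    # sequence comprises both kinds of encoded representation.
    tokens = embed_text(task)
    tokens += [embed_observation(obs, projection) for obs in observations]
    return tokens


def stub_language_model(tokens: List[Vector]) -> str:
    """Hypothetical placeholder for the language model: returns a fixed plan."""
    if any(len(t) != EMBED_DIM for t in tokens):
        raise ValueError("all tokens must live in the same embedding space")
    return "1. move to the red block\n2. pick up the red block\n3. place it in the bowl"


# Hypothetical mapping from instruction verbs to low-level robot primitives.
SKILLS: Dict[str, str] = {"move to": "navigate", "pick up": "grasp", "place": "release"}


def instructions_to_actions(plan: str,
                            skills: Dict[str, str] = SKILLS) -> List[Tuple[str, str]]:
    """Parse numbered high-level instructions into (primitive, argument) pairs."""
    actions = []
    for line in plan.splitlines():
        step = re.sub(r"^\s*\d+\.\s*", "", line).strip()  # drop "1. " style numbering
        for verb, primitive in skills.items():
            if step.startswith(verb):
                actions.append((primitive, step[len(verb):].strip()))
                break  # unmatched steps are silently skipped in this sketch
    return actions


def control_robot(task: str, observations: Sequence[Sequence[float]],
                  projection: Sequence[Sequence[float]],
                  language_model: Callable[[List[Vector]], str] = stub_language_model
                  ) -> List[Tuple[str, str]]:
    """Encode inputs, query the (stub) language model, and derive robot actions."""
    tokens = build_input_tokens(task, observations, projection)
    plan = language_model(tokens)
    return instructions_to_actions(plan)


if __name__ == "__main__":
    # 4x3 projection: maps a 3-feature observation into the 4-dim embedding space.
    projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
    observations = [[0.2, 0.4, 0.6]]  # hypothetical pooled camera features
    print(control_robot("put the red block in the bowl", observations, projection))
    # prints [('navigate', 'the red block'), ('grasp', 'the red block'), ('release', 'it in the bowl')]
```

Passing the language model in as a callable keeps the encoding and action-parsing stages independent of any particular model, so the stub could in principle be swapped for a real multi-modal model that accepts embedding sequences.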