Google LLC (20250144795). CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS

From WikiPatents



Organization Name

Google LLC

Inventor(s)

Peter Raymond Florence of San Francisco CA US

Danny Michael Driess of Berlin DE

Igor Mordatch of Oakland CA US

Andy Zeng of Stanford CA US

Seyed Mohammad Mehdi Sajjadi of Berlin DE

Klaus Greff of Berlin DE

CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS

This abstract first appeared for US patent application 20250144795, titled 'CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS'.

Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment. In one aspect, a method comprises: receiving one or more observations of an environment; receiving an input text sequence that describes a task to be performed by a robot in the environment; generating an encoded representation of the input text sequence in an embedding space; generating a corresponding encoded representation of each of the one or more observations in the embedding space; generating a sequence of input tokens that comprises the encoded representation of the input text sequence and the corresponding encoded representation of each observation; processing the sequence of input tokens using a language model neural network to generate an output text sequence that comprises high-level natural language instructions; and determining, from the high-level natural language instructions, one or more actions to be performed by the robot.
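
The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the encoders, the stand-in language model, the skill table, and every name below (`embed_text`, `embed_observation`, `build_input_tokens`, `stub_language_model`, `plan_to_actions`, `control_step`, `EMBED_DIM`, `SKILLS`) are hypothetical and chosen only to show how text and observations might share one embedding space, be combined into a single input sequence, and yield text instructions that are then mapped to robot actions. The ordering of text tokens before observation tokens is also an assumption; the abstract only says the sequence comprises both.

```python
from typing import Callable, List, Sequence, Tuple

EMBED_DIM = 4  # assumed toy embedding width, shared by text and observations
Vector = List[float]

# Hypothetical mapping from instruction phrases to low-level robot skills.
SKILLS = {"pick up": "PICK", "place": "PLACE"}


def embed_text(text: str) -> List[Vector]:
    """Toy text encoder: one deterministic vector per whitespace-separated word."""
    vectors = []
    for word in text.lower().split():
        total = sum(ord(c) for c in word)
        vectors.append([(total * (i + 1)) % 97 / 97.0 for i in range(EMBED_DIM)])
    return vectors


def embed_observation(obs: Sequence[float]) -> Vector:
    """Toy observation encoder: average strided chunks down to EMBED_DIM values."""
    vector = []
    for i in range(EMBED_DIM):
        chunk = list(obs[i::EMBED_DIM])
        vector.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return vector


def build_input_tokens(text: str, observations: Sequence[Sequence[float]]) -> List[Vector]:
    """Combine text and observation embeddings into one input sequence."""
    return embed_text(text) + [embed_observation(o) for o in observations]


def stub_language_model(tokens: List[Vector]) -> str:
    """Stand-in for a language model neural network; returns a canned plan."""
    return "1. pick up the red block\n2. place it in the bin"


def plan_to_actions(plan: str) -> List[Tuple[str, str]]:
    """Map each high-level instruction line to a (skill, argument) pair."""
    actions = []
    for line in plan.splitlines():
        step = line.strip().lstrip("0123456789. ").lower()
        for phrase, skill in SKILLS.items():
            if step.startswith(phrase):
                actions.append((skill, step[len(phrase):].strip()))
                break
    return actions


def control_step(
    task: str,
    observations: Sequence[Sequence[float]],
    language_model: Callable[[List[Vector]], str] = stub_language_model,
) -> List[Tuple[str, str]]:
    """Run the full sketch: encode inputs, query the model, derive actions."""
    tokens = build_input_tokens(task, observations)
    plan = language_model(tokens)
    return plan_to_actions(plan)


if __name__ == "__main__":
    print(control_step("put the block in the bin", [[0.1, 0.2, 0.3, 0.4, 0.5]]))
```

In this sketch the language model is passed in as a callable, so the canned stub could be swapped for a real model without changing the encoding or action-parsing steps.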
