Patent Applications by DeepMind Technologies Limited on June 19, 2025
DeepMind Technologies Limited: 3 patent applications
DeepMind Technologies Limited has applied for patents in the areas of G06N3/092 (reinforcement learning; 2 applications) and B25J9/1661 (robot control characterised by task planning or object-oriented languages; 1 application).
Keywords such as "more", "describes", "this", "computers", "specification", "systems", "methods", "implemented", "computer", and "programs" appear frequently in the application abstracts.
Top Inventors:
- Martin Riedmiller of Balgheim, DE (1 patent)
- Roland Hafner of Balgheim, DE (1 patent)
- Tim Hertweck of Lauchringen, DE (1 patent)
- Hubert Josef Soyer of London, GB (1 patent)
- Feryal Behbahani of London, GB (1 patent)
Patent Applications by DeepMind Technologies Limited
20250196347. DISPATCHER-EXECUTOR SYSTEMS FOR MULTI-TASK LEARNING (DeepMind Technologies Limited)
Abstract: This specification describes systems and methods, implemented as computer programs on one or more computers in one or more locations, for controlling an agent to perform multiple different tasks in an environment. The described techniques partition the architecture of a controller into a dispatcher that understands the environment and an executor that understands how to control the agent, with a control channel between them that structures the partitioning. This allows implementations of the controller to generalize better.
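For illustration, here is a minimal PyTorch sketch of the dispatcher-executor partitioning the abstract describes. The module names, layer sizes, and the vector-valued control channel are assumptions made for the example, not details taken from the application:

```python
import torch
import torch.nn as nn

class Dispatcher(nn.Module):
    """Understands the environment: maps an observation and a task id
    to a message on the control channel (format is an assumption)."""
    def __init__(self, obs_dim: int, num_tasks: int, channel_dim: int):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, 32)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 32, 128), nn.ReLU(),
            nn.Linear(128, channel_dim),
        )

    def forward(self, obs: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, self.task_embed(task_id)], dim=-1))

class Executor(nn.Module):
    """Understands how to control the agent: maps a control-channel
    message and the agent's own state to an action."""
    def __init__(self, channel_dim: int, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(channel_dim + state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, message: torch.Tensor, agent_state: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([message, agent_state], dim=-1))

# The controller is the composition of the two modules across the channel.
dispatcher = Dispatcher(obs_dim=64, num_tasks=10, channel_dim=16)
executor = Executor(channel_dim=16, state_dim=8, action_dim=4)
obs, agent_state = torch.randn(1, 64), torch.randn(1, 8)
action = executor(dispatcher(obs, torch.tensor([3])), agent_state)
```

Because only the dispatcher sees the raw environment observation and only the executor sees the agent's own state, swapping tasks or embodiments would, under this reading, touch just one side of the channel.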
20250200379. HIERARCHICAL REINFORCEMENT LEARNING AT SCALE (DeepMind Technologies Limited)
Abstract: The invention describes a system and a method for controlling an agent interacting with an environment to perform a task, the method comprising, at each of a plurality of first time steps from a plurality of time steps: receiving an observation characterizing a state of the environment at the first time step; determining a goal representation for the first time step that characterizes a goal state of the environment to be reached by the agent; processing the observation and the goal representation using a low-level controller neural network to generate a low-level policy output that defines an action to be performed by the agent in response to the observation, wherein the low-level controller neural network comprises: a representation neural network configured to process the observation to generate an internal state representation of the observation, and a low-level policy head configured to process the internal state representation and the goal representation to generate the low-level policy output; and controlling the agent using the low-level policy output.
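A rough sketch of the low-level controller described above, in PyTorch. The dimensions, the MLP layers, and the randomly drawn placeholder goal (standing in for whatever high-level controller produces the goal representation) are illustrative assumptions, not the application's design:

```python
import torch
import torch.nn as nn

class LowLevelController(nn.Module):
    """Representation network plus low-level policy head, per the abstract."""
    def __init__(self, obs_dim: int, goal_dim: int, repr_dim: int, action_dim: int):
        super().__init__()
        # Representation network: observation -> internal state representation.
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, repr_dim),
        )
        # Policy head: (internal state repr, goal repr) -> low-level policy output.
        self.policy_head = nn.Sequential(
            nn.Linear(repr_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        z = self.representation(obs)
        return self.policy_head(torch.cat([z, goal], dim=-1))

# At each first time step: receive an observation, determine a goal
# representation (a random placeholder here), and control the agent.
controller = LowLevelController(obs_dim=32, goal_dim=8, repr_dim=64, action_dim=6)
obs, goal = torch.randn(1, 32), torch.randn(1, 8)
action = controller(obs, goal)
```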
20250200380. REINFORCEMENT LEARNING TO EXPLORE ENVIRONMENTS (DeepMind Technologies Limited)
Abstract: The invention describes a method performed by one or more computers for training a base policy neural network that is configured to receive a base policy input comprising an observation of a state of an environment and to process the base policy input to generate a base policy output that defines an action to be performed by an agent in response to the observation, the method comprising: generating training data for training the base policy neural network by controlling an agent using (i) the base policy neural network and (ii) an exploration strategy that maps, in accordance with a set of one or more parameters, base policy outputs generated by the base policy neural network to actions performed by the agent to interact with an environment, the generating comprising, at each of a plurality of time points: determining that criteria for updating the exploration strategy are satisfied at the time point; and in response to determining that the criteria are satisfied: generating a meta policy input that comprises data characterizing a performance of the base policy neural network in controlling the agent at the time point; processing the meta policy input using a meta policy to generate a meta policy output that specifies respective values for each of the set of one or more parameters that define the exploration strategy; and controlling the agent using the base policy neural network and in accordance with the exploration strategy defined by the respective values for the set of one or more parameters specified by the meta policy output.
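A minimal sketch of the base-policy/meta-policy loop in PyTorch, assuming a discrete action space, a single softmax temperature as the exploration-strategy parameter, a scalar recent-return statistic as the meta policy input, and a fixed-interval update criterion. All of these are assumptions for illustration, not details from the application:

```python
import torch
import torch.nn as nn

class MetaPolicy(nn.Module):
    """Maps data characterizing recent base-policy performance (here a
    single scalar, assumed to be a mean recent return) to the exploration
    parameters (here one softmax temperature)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, performance: torch.Tensor) -> torch.Tensor:
        # Softplus keeps the temperature strictly positive.
        return nn.functional.softplus(self.net(performance)) + 1e-3

def explore(base_logits: torch.Tensor, temperature: torch.Tensor) -> torch.Tensor:
    """Exploration strategy: maps a base policy output to an action by
    sampling from a temperature-scaled softmax."""
    return torch.distributions.Categorical(logits=base_logits / temperature).sample()

base_policy = nn.Linear(16, 4)   # stand-in for the base policy neural network
meta_policy = MetaPolicy()
temperature = torch.tensor([[1.0]])

for step in range(1000):
    obs = torch.randn(1, 16)                 # placeholder observation
    action = explore(base_policy(obs), temperature)
    # ... act in the environment, record returns, train the base policy ...
    if step % 100 == 0:                      # hypothetical update criterion
        recent_return = torch.randn(1, 1)    # placeholder performance statistic
        temperature = meta_policy(recent_return).detach()
```

Under this reading, the meta policy never picks actions itself; it only retunes how much the exploration strategy perturbs the base policy's outputs whenever the update criteria fire.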