DeepMind Technologies Limited patent applications on June 6th, 2024

From WikiPatents

Patent Applications by DeepMind Technologies Limited on June 6th, 2024

DeepMind Technologies Limited: 6 patent applications

DeepMind Technologies Limited has applied for patents in the following IPC classification areas: G06N3/092 (6), G06F40/20 (2), G06N3/08 (2), G06N7/01 (2), and G10L15/16 (2).

Keywords appearing in the patent application abstracts include: network, methods, policy, training, latent, computer, demonstrator, environment, action, and trajectory.



Patent Applications by DeepMind Technologies Limited

20240184982. HIERARCHICAL TEXT GENERATION USING LANGUAGE MODEL NEURAL NETWORKS (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Kory Wallace Mathewson of Montreal (CA), Piotr Wojciech Mirowski of London (GB), and Richard Andrew Evans of London (GB), all for DeepMind Technologies Limited

IPC Code(s): G06F40/20



Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating long textual works using language model neural networks. For example, the textual works can be generated hierarchically by performing a hierarchy of generation steps using the same language model neural network.
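The abstract outlines a two-level scheme in which the same language model first drafts an outline and then expands each outline entry. The sketch below illustrates that structure; `lm_generate`, the prompt wording, and the five-section outline are illustrative assumptions, not the patent's implementation.

```python
def lm_generate(prompt: str) -> str:
    """Stand-in for a language-model completion call (prompt in, text out).
    A real system would call the same trained language model at both levels."""
    if prompt.rstrip().endswith("sections:"):
        return "\n".join(f"{i}. Section {i} heading" for i in range(1, 6))
    return "Expanded passage text..."

def generate_long_text(premise: str, num_sections: int = 5) -> str:
    # Level 1: ask the language model for a high-level outline.
    outline_prompt = (f"Premise: {premise}\n"
                      f"Write a numbered outline with {num_sections} sections:\n")
    outline = lm_generate(outline_prompt)
    entries = [line for line in outline.splitlines() if line.strip()]

    # Level 2: reuse the *same* model to expand each outline entry,
    # conditioning on the premise and the full outline for coherence.
    passages = []
    for entry in entries[:num_sections]:
        passages.append(lm_generate(
            f"Premise: {premise}\nOutline:\n{outline}\n"
            f"Write the full text for this section: {entry}\n"))
    return "\n\n".join(passages)

print(generate_long_text("A story about a lighthouse keeper."))
```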


20240185070. TRAINING ACTION SELECTION NEURAL NETWORKS USING LOOK-AHEAD SEARCH (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Karen Simonyan of London (GB), David Silver of Hitchin (GB), and Julian Schrittwieser of London (GB), all for DeepMind Technologies Limited

IPC Code(s): G06N3/08, G06N7/01



Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network. One of the methods includes: receiving an observation characterizing a current state of the environment; determining a target network output for the observation by performing a look-ahead search of possible future states of the environment, starting from the current state, until the environment reaches a possible future state that satisfies one or more termination criteria, wherein the look-ahead search is guided by the neural network in accordance with current values of the network parameters; selecting an action to be performed by the agent in response to the observation using the target network output generated by performing the look-ahead search; and storing, in an exploration history data store, the target network output in association with the observation for use in updating the current values of the network parameters.
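A minimal, self-contained sketch of the loop the abstract describes: observe, run a look-ahead search guided by the network, derive a target output, act, and store the (observation, target) pair for later parameter updates. The toy environment, the stand-in network, and the particular search (a short greedy rollout scored by the network's value estimate) are assumptions for illustration only.

```python
import math

def network(params, state):
    """Stand-in policy/value network: returns (per-action scores, state value)."""
    scores = [params["w"] * (state + a) for a in range(params["num_actions"])]
    return scores, params["w"] * state

def simulate(state, action):
    """Toy transition model standing in for possible future environment states."""
    return state + action - 1

def look_ahead_search(state, params, depth=3):
    """Score each action by a short look-ahead guided by the network and
    return a target action distribution derived from the search."""
    returns = []
    for action in range(params["num_actions"]):
        s = simulate(state, action)
        for _ in range(depth - 1):                  # greedy rollout guided by the network
            scores, _ = network(params, s)
            s = simulate(s, scores.index(max(scores)))
        _, value = network(params, s)
        returns.append(value)
    exps = [math.exp(r) for r in returns]           # softmax over look-ahead returns
    return [e / sum(exps) for e in exps]

exploration_history = []                            # (observation, target output) store
params = {"w": 0.5, "num_actions": 3}
state = 0.0
for _ in range(10):
    observation = state                             # observation of the current state
    target = look_ahead_search(observation, params) # target network output from search
    action = max(range(len(target)), key=target.__getitem__)
    exploration_history.append((observation, target))
    state = simulate(state, action)
# `exploration_history` would then be used to update the network parameters.
```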


20240185082. IMITATION LEARNING BASED ON PREDICTION OF OUTCOMES (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Andrew Coulter Jaegle of London (GB), Yury Sulsky of London (GB), Gregory Duncan Wayne of London (GB), and Robert David Fergus of New York, NY (US), all for DeepMind Technologies Limited

IPC Code(s): G06N3/092



Abstract: A method is proposed of training a policy model to generate action data for controlling an agent to perform a task in an environment. The method comprises: obtaining, for each of a plurality of performances of the task, a corresponding demonstrator trajectory comprising a plurality of sets of state data characterizing the environment at each of a plurality of corresponding successive time steps during the performance of the task; using the demonstrator trajectories to generate a demonstrator model, the demonstrator model being operative to generate, for any said demonstrator trajectory, a value indicative of the probability of the demonstrator trajectory occurring; and jointly training an imitator model and a policy model. The joint training is performed by: generating a plurality of imitation trajectories, each imitation trajectory being generated by repeatedly receiving state data indicating a state of the environment, using the policy model to generate action data indicative of an action, and causing the action to be performed by the agent; training the imitator model using the imitation trajectories, the imitator model being operative to generate, for any said imitation trajectory, a value indicative of the probability of the imitation trajectory occurring; and training the policy model using a reward function which is a measure of the similarity of the demonstrator model and the imitator model.
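The joint-training structure can be sketched as follows, assuming toy stand-ins throughout: both the demonstrator model and the imitator model are reduced to simple statistics of a scalar trajectory feature, the policy is a single scalar weight updated by hill climbing, and the reward is one possible similarity measure between the two models (negative squared difference of their means). None of these choices are specified by the patent.

```python
import random
import statistics

def fit_model(trajectories):
    """Toy trajectory model: mean and spread of a scalar trajectory feature
    (a stand-in for a model assigning probabilities to trajectories)."""
    feats = [sum(t) / len(t) for t in trajectories]
    return statistics.mean(feats), statistics.pstdev(feats) or 1.0

def rollout(policy_weight, steps=10):
    """One imitation trajectory: repeatedly read state, act with the policy."""
    state, traj = 0.0, []
    for _ in range(steps):
        state += policy_weight * state + random.gauss(0.0, 0.1)  # policy-chosen action
        traj.append(state)
    return traj

def similarity_reward(model_a, model_b):
    """One possible similarity measure between the two models."""
    return -(model_a[0] - model_b[0]) ** 2

# Demonstrator model fitted to recorded demonstrator trajectories (toy data here).
demo_model = fit_model([[0.1 * t for t in range(10)] for _ in range(20)])

policy_weight = 0.0
for _ in range(50):                                   # joint training loop
    imitator_model = fit_model([rollout(policy_weight) for _ in range(20)])
    reward = similarity_reward(demo_model, imitator_model)
    # Crude, gradient-free policy update: keep a perturbation if it improves the reward.
    candidate = policy_weight + random.gauss(0.0, 0.05)
    candidate_model = fit_model([rollout(candidate) for _ in range(20)])
    if similarity_reward(demo_model, candidate_model) > reward:
        policy_weight = candidate
```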


20240185083. LEARNING DIVERSE SKILLS FOR TASKS USING SEQUENTIAL LATENT VARIABLES FOR ENVIRONMENT DYNAMICS (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Steven Stenberg Hansen of London (GB) and Guillaume Desjardins of London (GB), both for DeepMind Technologies Limited

IPC Code(s): G06N3/092



Abstract: This specification relates to methods for controlling agents to perform actions according to a goal (or option) comprising a sequence of local goals (or local options), and to corresponding methods for training. As discussed herein, environment dynamics may be modelled sequentially by sampling latent variables, each latent variable relating to a local goal and being dependent on a previous latent variable. These latent variables are used to condition an action-selection policy neural network to select actions according to the local goal. This allows the agents to reach more diverse states than would be possible through a fixed latent variable or goal, thereby encouraging exploratory behavior. In addition, specific methods described herein model the sequence of latent variables through a simple linear and recurrent relationship that allows the system to be trained more efficiently. This avoids the need to learn a state-dependent higher-level policy for selecting the latent variables, which can be difficult to train in practice.
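A compact sketch of the sequential latent-variable idea: each latent depends on the previous one through a simple linear, recurrent relationship, and the action-selection policy is conditioned on the current latent. The latent dimension, transition matrix, toy policy, and toy environment below are illustrative assumptions rather than the patent's parameterization.

```python
import random

LATENT_DIM = 2

def next_latent(z, scale=0.1):
    """Linear, recurrent latent update: z_{t+1} = A z_t + noise."""
    A = [[0.9, 0.1], [0.0, 0.9]]
    mean = [sum(A[i][j] * z[j] for j in range(LATENT_DIM)) for i in range(LATENT_DIM)]
    return [m + random.gauss(0.0, scale) for m in mean]

def policy(state, z):
    """Toy latent-conditioned action-selection policy."""
    return 0.5 * state + z[0] - z[1]

state = 0.0
z = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]   # initial latent (local goal)
for _ in range(20):
    action = policy(state, z)   # actions pursue the current local goal
    state += action             # toy environment transition
    z = next_latent(z)          # move to the next local goal's latent
```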


20240185084. MULTI-OBJECTIVE REINFORCEMENT LEARNING USING WEIGHTED POLICY PROJECTION (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Abbas Abdolmaleki of London (GB), Sandy Han Huang of London (GB), and Martin Riedmiller of Balgheim (DE), all for DeepMind Technologies Limited

IPC Code(s): G06N3/092



Abstract: Computer-implemented systems and methods for training an action selection policy neural network to select actions to be performed by an agent, to control the agent to perform a task. The techniques are able to optimize multiple objectives, one of which may be to stay close to a behavioral policy of a teacher. The behavioral policy of the teacher may be defined by a predetermined dataset of behaviors, and the systems and methods may then learn offline. The described techniques provide a mechanism for explicitly defining a trade-off between the multiple objectives.
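One way to picture a weighted policy projection, in a simplified discrete-action setting: each objective proposes a target action distribution (here, an exponentiated action-value reweighting of the current policy for the task objective, and the teacher's behavioral policy for the stay-close objective), and the new policy minimizes a weighted sum of KL divergences to those targets, which for discrete distributions is their normalized weighted geometric mean. The explicit weights express the trade-off between objectives. This is a hedged sketch under those assumptions, not the patent's algorithm.

```python
import math

def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def improved_policy(policy, q_values, temperature=1.0):
    """Reweight the current policy by exponentiated action values (task objective)."""
    return normalize([p * math.exp(q / temperature) for p, q in zip(policy, q_values)])

def weighted_projection(targets, weights):
    """argmin_q sum_k w_k KL(q || target_k): normalized weighted geometric mean
    of the target distributions (weights assumed to sum to 1)."""
    n = len(targets[0])
    combined = [
        math.exp(sum(w * math.log(t[a]) for t, w in zip(targets, weights)))
        for a in range(n)
    ]
    return normalize(combined)

current_policy = [0.25, 0.25, 0.25, 0.25]
task_target = improved_policy(current_policy, q_values=[1.0, 0.2, -0.5, 0.0])
teacher_policy = [0.4, 0.4, 0.1, 0.1]   # e.g., estimated from an offline behavior dataset

# Explicit trade-off between the task objective and staying near the teacher.
new_policy = weighted_projection([task_target, teacher_policy], weights=[0.7, 0.3])
print(new_policy)
```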


20240185842. INTERACTIVE DECODING OF WORDS FROM PHONEME SCORE DISTRIBUTIONS (simplified abstract, DeepMind Technologies Limited)

Inventor(s): Ioannis Alexandros Assael of London (GB), Brendan Shillingford of London (GB), and Misha Man Ray Denil of London (GB), all for DeepMind Technologies Limited

IPC Code(s): G10L15/16, G06V40/20, G10L15/187



Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for interactive decoding of a word sequence.
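The abstract gives little detail, so the following is only a rough guess at the general idea suggested by the title: score candidate words from a small lexicon against per-time-step phoneme score distributions and surface the top candidates for a user to confirm or correct interactively. The lexicon, scoring rule, and example data are entirely illustrative assumptions.

```python
import math

# Hypothetical lexicon mapping words to phoneme sequences.
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "at": ["ae", "t"]}

def word_score(phoneme_scores, phonemes):
    """Sum of log-scores of a word's phonemes against successive distributions."""
    if len(phonemes) > len(phoneme_scores):
        return float("-inf")
    return sum(math.log(dist.get(ph, 1e-8))
               for dist, ph in zip(phoneme_scores, phonemes))

def top_candidates(phoneme_scores, k=2):
    scored = {w: word_score(phoneme_scores, phs) for w, phs in LEXICON.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Example per-time-step phoneme score distributions (e.g., from a neural network).
scores = [{"k": 0.7, "ae": 0.2, "t": 0.1},
          {"ae": 0.8, "k": 0.1, "t": 0.1},
          {"t": 0.6, "p": 0.4}]
print(top_candidates(scores))  # a user could then confirm or correct the top choice
```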

