18589910. DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS simplified abstract (Robert Bosch GmbH)

From WikiPatents

DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS

Organization Name

Robert Bosch GmbH

Inventor(s)

Felix Berkenkamp of Muenchen (DE)

Gaurav Manek of Pittsburgh PA (US)

Jeremy Zieg Kolter of Pittsburgh PA (US)

Melrose Roderick of Pittsburgh PA (US)

DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18589910, titled 'DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS'.

The abstract describes a computer-implemented method for learning a policy for an agent using neural networks and auxiliary parameters.

  • Receiving an initialized first neural network (Q-function or value-function), an initialized second neural network, auxiliary parameters, and an initialized policy.
  • Sampling tuples of states, actions, rewards, and new states from a storage.
  • Sampling actions for current states and actions for new sampled states.
  • Computing features from a penultimate layer of the first neural network based on sampled states and actions.
  • Updating the second neural network, auxiliary parameters, and parameters of the first neural network using a re-weighted loss.
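
The data flow in the steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the storage format, the policy, or the network architecture, so `sample_batch`, `policy_sample`, and `TinyQNetwork` are hypothetical names, and the one-hidden-layer network merely stands in for the first neural network. The sketch covers sampling from storage, sampling actions, and reading features from the penultimate layer; it does not perform any update.

```python
import math
import random

def sample_batch(storage, batch_size, rng):
    """Uniformly sample transitions (s, a, r, s') from the storage (assumed format)."""
    return [rng.choice(storage) for _ in range(batch_size)]

def policy_sample(state, rng):
    """Placeholder stochastic policy: Gaussian noise around a linear function of the state."""
    return 0.5 * sum(state) + rng.gauss(0.0, 0.1)

class TinyQNetwork:
    """One hidden layer; its activations play the role of the penultimate-layer features."""

    def __init__(self, in_dim, hidden_dim, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def features(self, state, action):
        # Concatenate state and action, then apply the hidden layer.
        x = list(state) + [action]
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]

    def q_value(self, state, action):
        # The final layer is a linear read-out of the penultimate features.
        phi = self.features(state, action)
        return sum(w * f for w, f in zip(self.w2, phi))

rng = random.Random(0)

# Toy storage: 2-D states, scalar actions and rewards (all values are made up).
storage = [((rng.random(), rng.random()), rng.random(), rng.random(),
            (rng.random(), rng.random())) for _ in range(50)]
q_net = TinyQNetwork(in_dim=3, hidden_dim=4, rng=rng)

batch = sample_batch(storage, batch_size=8, rng=rng)

# Actions for the current states and for the new states, drawn from the policy.
current_actions = [policy_sample(s, rng) for (s, _, _, _) in batch]
next_actions = [policy_sample(s2, rng) for (_, _, _, s2) in batch]

# Penultimate-layer features for stored (s, a) pairs and for (s', a') pairs.
phi = [q_net.features(s, a) for (s, a, _, _) in batch]
phi_next = [q_net.features(s2, a2)
            for (_, _, _, s2), a2 in zip(batch, next_actions)]
```

Each entry of `phi` and `phi_next` is a vector of length `hidden_dim`; how the second neural network and the auxiliary parameters consume these features is not described in the abstract.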

Potential Applications:

  • Reinforcement learning algorithms
  • Autonomous agents in robotics
  • Game-playing AI

Problems Solved:

  • Learning optimal policies for agents
  • Improving decision-making processes
  • Enhancing the efficiency of neural network training

Benefits:

  • Faster convergence of learning algorithms
  • Improved performance of agents
  • Adaptability to various environments

Commercial Applications:

Title: "Enhanced Reinforcement Learning for Autonomous Systems"

This technology can be used in autonomous vehicles, smart home systems, and industrial automation to optimize decision-making processes and enhance overall performance.

Questions about the technology:

  1. How does this method improve the learning process for agents compared to traditional algorithms?
  2. What are the potential limitations of using neural networks in learning policies for agents?

Frequently Updated Research: Stay updated on advancements in reinforcement learning algorithms and neural network optimization techniques to enhance the efficiency of this method.


Original Abstract Submitted

A computer-implemented method of learning a policy for an agent. The method includes: receiving an initialized first neural network, in particular a Q-function or value-function, an initialized second neural network, auxiliary parameters, and the initialized policy; repeating the following steps until a termination condition is fulfilled: sampling a plurality of pairs of states, actions, rewards and new states from a storage; sampling actions for the current states, and actions for the new sampled states; computing features from a penultimate layer of the first neural network based on the sampled states and actions; and updating the second neural network and the auxiliary parameters as well as updating parameters of the first neural network using a re-weighted loss.
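
The abstract does not say how the re-weighting is computed or which loss is re-weighted. As a rough illustration only, the sketch below assumes a squared temporal-difference (TD) error and per-sample weights that are simply passed in; the function name `reweighted_td_loss`, the discount factor `gamma`, and the normalization of the weights are all assumptions, not details from the application.

```python
def reweighted_td_loss(q_values, rewards, next_q_values, weights, gamma=0.99):
    """Weighted mean of squared TD errors.

    Assumption: weights are non-negative and are normalized here to sum to 1.
    How the weights would be produced is not specified in the abstract.
    """
    total = sum(weights)
    loss = 0.0
    for q, r, q_next, w in zip(q_values, rewards, next_q_values, weights):
        td_error = r + gamma * q_next - q  # one-step TD error for this sample
        loss += (w / total) * td_error ** 2
    return loss

# Toy values (made up): two samples with TD errors of -1 and 2.
q = [1.0, 0.0]
r = [0.0, 2.0]
q_next = [0.0, 0.0]

uniform_loss = reweighted_td_loss(q, r, q_next, [1.0, 1.0])
skewed_loss = reweighted_td_loss(q, r, q_next, [1.0, 3.0])
```

With equal weights the function reduces to the ordinary mean squared TD error; shifting weight toward the second sample raises the loss because that sample has the larger error.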