18589910. DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS simplified abstract (Robert Bosch GmbH)
DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS
Organization Name
Robert Bosch GmbH
Inventor(s)
Felix Berkenkamp of Muenchen (DE)
Gaurav Manek of Pittsburgh PA (US)
Jeremy Zieg Kolter of Pittsburgh PA (US)
Melrose Roderick of Pittsburgh PA (US)
DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS - A simplified explanation of the abstract
This abstract first appeared in US patent application 18589910, titled 'DEVICE AND METHOD FOR IMPROVED POLICY LEARNING FOR ROBOTS'.
The abstract describes a computer-implemented method for learning a policy for an agent using two neural networks and a set of auxiliary parameters. The method involves:
- Receiving an initialized first neural network (in particular a Q-function or value function), an initialized second neural network, auxiliary parameters, and an initialized policy.
- Repeating the steps below until a termination condition is fulfilled.
- Sampling a plurality of tuples of states, actions, rewards, and new states from a storage.
- Sampling actions for the current states and actions for the newly sampled states.
- Computing features from the penultimate layer of the first neural network based on the sampled states and actions.
- Updating the second neural network and the auxiliary parameters, and updating the parameters of the first neural network using a re-weighted loss.
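The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the weights of the re-weighted loss are obtained or how the second network and the auxiliary parameters are updated. Here the weights are assumed to come from the second network applied to the penultimate-layer features, and gradient updates, the auxiliary parameters, and the policy update (which would consume the actions sampled for the current states) are left out. All names (`init_mlp`, `sample_batch`, `reweighted_td_loss`, the placeholder Gaussian policy, the synthetic storage) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HIDDEN, GAMMA = 3, 1, 8, 0.99


def init_mlp(in_dim, hidden, out_dim):
    """Initialise a one-hidden-layer MLP; the hidden layer is its penultimate layer."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def hidden_layer(params, x):
    """Penultimate-layer activations of the MLP."""
    return np.tanh(x @ params["W1"] + params["b1"])


def output_layer(params, h):
    """Scalar output per sample, computed from penultimate-layer activations."""
    return (h @ params["W2"] + params["b2"]).squeeze(-1)


def sample_actions(states):
    """Placeholder policy (assumption): clipped Gaussian noise, independent of the state."""
    return np.clip(rng.normal(0.0, 0.5, (len(states), ACTION_DIM)), -1.0, 1.0)


def sample_batch(storage, batch_size):
    """Draw transitions (s, a, r, s_next) uniformly at random from the storage."""
    idx = rng.integers(0, len(storage["r"]), size=batch_size)
    return {key: value[idx] for key, value in storage.items()}


def reweighted_td_loss(q_params, w_params, batch):
    """Weighted squared TD error; the weighting scheme is an assumption of this sketch."""
    a_next = sample_actions(batch["s_next"])  # actions for the new states
    feats = hidden_layer(q_params, np.concatenate([batch["s"], batch["a"]], axis=1))
    q_sa = output_layer(q_params, feats)
    next_in = np.concatenate([batch["s_next"], a_next], axis=1)
    q_next = output_layer(q_params, hidden_layer(q_params, next_in))
    # Assumed: the second network maps features to positive weights with mean 1.
    logits = output_layer(w_params, hidden_layer(w_params, feats))
    weights = np.exp(logits - logits.max())
    weights = weights / weights.mean()
    td_error = q_sa - (batch["r"] + GAMMA * q_next)
    loss = float(np.mean(weights * td_error ** 2))
    return loss, feats, weights


# Synthetic storage of 256 random transitions, for illustration only.
n = 256
storage = {
    "s": rng.normal(size=(n, STATE_DIM)),
    "a": rng.uniform(-1.0, 1.0, size=(n, ACTION_DIM)),
    "r": rng.normal(size=n),
    "s_next": rng.normal(size=(n, STATE_DIM)),
}
q_params = init_mlp(STATE_DIM + ACTION_DIM, HIDDEN, 1)  # first network (Q-function)
w_params = init_mlp(HIDDEN, HIDDEN, 1)                  # second network
batch = sample_batch(storage, 32)
loss, feats, weights = reweighted_td_loss(q_params, w_params, batch)
print(f"re-weighted TD loss on one batch: {loss:.4f}")
```

In a full training loop, this loss would be differentiated with respect to the first network's parameters (for example with an automatic-differentiation library), and the sampling and update steps would repeat until the termination condition is met.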
Potential Applications
- Reinforcement learning algorithms
- Autonomous agents in robotics
- Game playing AI
Problems Solved
- Learning optimal policies for agents
- Improving decision-making processes
- Enhancing the efficiency of neural network training
Benefits
- Faster convergence of learning algorithms
- Improved performance of agents
- Adaptability to various environments
Commercial Applications
Title: "Enhanced Reinforcement Learning for Autonomous Systems"
This technology can be used in autonomous vehicles, smart home systems, and industrial automation to optimize decision-making processes and enhance overall performance.
Questions about the technology
1. How does this method improve the learning process for agents compared to traditional algorithms?
2. What are the potential limitations of using neural networks in learning policies for agents?
Frequently Updated Research
Stay updated on advancements in reinforcement learning algorithms and neural network optimization techniques to enhance the efficiency of this method.
Original Abstract Submitted
A computer-implemented method of learning a policy for an agent. The method includes: receiving an initialized first neural network, in particular a Q-function or value-function, an initialized second neural network, auxiliary parameters, and the initialized policy; repeating the following steps until a termination condition is fulfilled: sampling a plurality of pairs of states, actions, rewards and new states from a storage; sampling actions for the current states, and actions for the new sampled states; computing features from a penultimate layer of the first neural network based on the sampled states and actions; and updating the second neural network and the auxiliary parameters as well as updating parameters of the first neural network using a re-weighted loss.