18268664. LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM simplified abstract (NEC Corporation)
LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM
Organization Name
Inventor(s)
LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM - A simplified explanation of the abstract
This abstract first appeared for US patent application 18268664 titled 'LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM'.
Simplified Explanation
The patent application describes a method that takes as input a reward function whose features are set to satisfy a Lipschitz continuity condition. An estimation process then estimates a trajectory that minimizes the Wasserstein distance between two probability distributions: that of an expert's trajectories and that of trajectories determined from the reward function's parameters. Finally, an update process adjusts the parameters of the reward function to maximize the same Wasserstein distance, based on the estimated trajectory.
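For reference, the distance can be written out explicitly. The abstract does not state which order of Wasserstein distance or which formulation the application uses, so the block below is a sketch that assumes the order-1 distance with some trajectory metric d. The symbols P_E (expert trajectory distribution), P_θ (distribution of trajectories determined by the reward parameters θ), γ, and f are notation introduced here, not taken from the application.

```latex
W_1(P_E, P_\theta)
  = \inf_{\gamma \in \Pi(P_E, P_\theta)}
      \mathbb{E}_{(\tau, \tau') \sim \gamma}\big[ d(\tau, \tau') \big]
  = \sup_{\|f\|_{L} \le 1}
      \Big( \mathbb{E}_{\tau \sim P_E}[f(\tau)]
          - \mathbb{E}_{\tau \sim P_\theta}[f(\tau)] \Big)
```

The second equality is the Kantorovich-Rubinstein duality, whose supremum ranges over 1-Lipschitz functions f. This suggests one plausible reason the reward function is required to be Lipschitz continuous: under that condition the reward itself can play the role of f.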
- The patent application describes a method for learning the parameters of a reward function from an expert's trajectories.
- The reward function's features are set so that it satisfies a Lipschitz continuity condition.
- The estimation process aims to find a trajectory that minimizes the Wasserstein distance.
- The Wasserstein distance represents the difference between the probability distributions of expert and estimated trajectories.
- The update process adjusts the parameters of the reward function to maximize the Wasserstein distance based on the estimated trajectory.
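The alternation between the estimation and update steps can be sketched in code. This is a minimal illustration under assumptions that are not stated in the abstract: the reward is linear in the features, the Wasserstein distance is handled through its Kantorovich-Rubinstein dual (so a 1-Lipschitz reward scores the gap between expert and estimated trajectories), trajectories are chosen from a small finite candidate set, and the Lipschitz bound is enforced by keeping the parameter vector inside the unit ball. All function names are hypothetical.

```python
import math

def reward(theta, features):
    """Linear reward r_theta(tau) = theta . f(tau) (assumed form)."""
    return sum(t * f for t, f in zip(theta, features))

def project_to_unit_ball(theta):
    """Rescale theta so ||theta||_2 <= 1, which keeps the linear
    reward 1-Lipschitz with respect to the feature vector."""
    norm = math.sqrt(sum(t * t for t in theta))
    return theta if norm <= 1.0 else [t / norm for t in theta]

def dual_gap(theta, expert_mean, features):
    """Dual-form distance estimate: expert reward minus trajectory reward."""
    return reward(theta, expert_mean) - reward(theta, features)

def estimate_trajectory(theta, candidates):
    """Estimation step: the highest-reward candidate minimizes the gap."""
    return max(candidates, key=lambda f: reward(theta, f))

def learn_reward(expert_features, candidates, steps=50, lr=0.1):
    dim = len(expert_features[0])
    n = len(expert_features)
    expert_mean = [sum(f[i] for f in expert_features) / n for i in range(dim)]
    theta = [0.0] * dim
    for _ in range(steps):
        tau = estimate_trajectory(theta, candidates)
        # Update step: gradient ascent on the gap; for a linear reward
        # the gradient with respect to theta is (expert_mean - f(tau)).
        theta = [t + lr * (e - f) for t, e, f in zip(theta, expert_mean, tau)]
        theta = project_to_unit_ball(theta)
    return theta

# Toy data: each trajectory is summarized by a 2-dimensional feature vector.
expert = [[1.0, 0.0], [1.0, 0.0]]
candidates = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
theta = learn_reward(expert, candidates)
best = estimate_trajectory(theta, candidates)
print(theta, best, dual_gap(theta, [1.0, 0.0], best))
```

In this toy setting the loop stops changing once the estimated trajectory has the same features as the expert, because the gradient of the gap is then zero. A real system would replace the finite candidate set with a planner or policy optimizer.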
Potential Applications:
- Reinforcement learning: This method can be applied in reinforcement learning algorithms to optimize reward functions and improve the learning process.
- Robotics: The method can be used in robotics to enhance trajectory planning and control by optimizing reward functions.
- Autonomous vehicles: By optimizing reward functions, this technology can improve the decision-making process of autonomous vehicles.
Problems Solved:
- Reward function optimization: The method solves the problem of efficiently optimizing reward functions to improve learning and decision-making processes.
- Trajectory estimation: By minimizing the Wasserstein distance, the method solves the problem of accurately estimating trajectories based on expert data.
Benefits:
- Improved learning performance: By optimizing reward functions, the method can enhance the learning performance of reinforcement learning algorithms.
- Enhanced decision-making: The technology can improve the decision-making capabilities of autonomous systems by accurately estimating trajectories.
- Efficient optimization: The method is intended to provide an efficient way to optimize reward functions, which may reduce the computational cost of the process.
Original Abstract Submitted
A function input means accepts input of a reward function whose features are set to satisfy a Lipschitz continuity condition. An estimation means estimates a trajectory that minimizes Wasserstein distance, which represents distance between probability distribution of a trajectory of an expert and probability distribution of a trajectory determined based on parameters of the reward function. An update means updates the parameters of the reward function to maximize the Wasserstein distance based on the estimated trajectory.
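The first sentence of the abstract requires the reward function's features to satisfy a Lipschitz continuity condition. As a rough illustration of what that means, the sketch below estimates a Lipschitz constant empirically from sample points. The helper and the two feature functions are hypothetical examples, not part of the application, and a sampled check can only give a lower bound on the true constant: it can show that a feature map violates a bound, but it cannot prove that the bound holds everywhere.

```python
import itertools
import math

def empirical_lipschitz_constant(feature_fn, points):
    """Largest observed ratio ||f(x) - f(y)|| / ||x - y|| over all pairs."""
    worst = 0.0
    for x, y in itertools.combinations(points, 2):
        dx = math.dist(x, y)
        if dx == 0.0:
            continue  # identical points carry no information
        worst = max(worst, math.dist(feature_fn(x), feature_fn(y)) / dx)
    return worst

def clipped_features(state):
    """Clip each coordinate to [-1, 1]; clipping is 1-Lipschitz."""
    return [max(-1.0, min(1.0, s)) for s in state]

def squared_features(state):
    """Square each coordinate; not globally Lipschitz on unbounded inputs."""
    return [s * s for s in state]

points = [[-2.0, 0.5], [0.0, 0.0], [0.5, -0.5], [3.0, 1.5]]
print(empirical_lipschitz_constant(clipped_features, points))
print(empirical_lipschitz_constant(squared_features, points))
```

The clipped features stay within a ratio of 1 on these samples, while the squared features exceed it, which is the kind of feature map the Lipschitz condition would rule out.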