LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM

Organization Name

Inventor(s)

LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM - A simplified explanation of the abstract

This abstract first appeared for US patent application 18268664 titled 'LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM'.

Simplified Explanation

The patent application describes a learning method built around a reward function whose features are set to satisfy a Lipschitz continuity condition. The reward function is accepted as input, and an estimation step then finds a trajectory that minimizes the Wasserstein distance. Here the Wasserstein distance measures how far apart two probability distributions are: the distribution of an expert's trajectories and the distribution of trajectories determined from the reward function's parameters. An update step then adjusts the reward function's parameters to maximize the Wasserstein distance, evaluated on the estimated trajectory. The two steps therefore pull in opposite directions: estimation minimizes the distance over trajectories, while the update maximizes it over reward parameters.

The patent application describes a method for optimizing a reward function.
The method takes as input a reward function whose features are set to satisfy a Lipschitz continuity condition.
The estimation process finds a trajectory that minimizes the Wasserstein distance.
The Wasserstein distance measures the gap between the probability distribution of the expert's trajectories and that of trajectories determined from the reward function's parameters.
The update process adjusts the parameters of the reward function to maximize the Wasserstein distance based on the estimated trajectory.
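The alternating estimation and update steps summarized above can be sketched in a few lines of Python. This is only an illustration under assumptions that the abstract does not state: the reward is taken to be linear in a trajectory feature vector (`theta . f`), the Lipschitz condition is enforced by keeping `theta` inside the unit ball, the Wasserstein distance is approximated by the reward gap between expert and estimated trajectories, and the estimation step searches a small finite set of candidate trajectories. All function names and data values are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_unit_ball(theta):
    """Keep ||theta||_2 <= 1 so the linear reward theta . f is 1-Lipschitz in f."""
    norm = math.sqrt(dot(theta, theta))
    return theta if norm <= 1.0 else [t / norm for t in theta]

def mean_features(feature_list):
    """Average feature vector over a list of trajectories."""
    n = len(feature_list)
    return [sum(col) / n for col in zip(*feature_list)]

def dual_gap(theta, expert_mean, learner_features):
    """Assumed stand-in for the Wasserstein distance:
    expected expert reward minus reward of the estimated trajectory."""
    return dot(theta, expert_mean) - dot(theta, learner_features)

def estimate_trajectory(theta, candidates):
    """Estimation step: the gap is smallest for the candidate with the
    highest reward under the current parameters."""
    return max(candidates, key=lambda f: dot(theta, f))

def update_reward(theta, expert_mean, learner_features, lr=0.1):
    """Update step: one projected gradient-ascent step on the gap w.r.t. theta."""
    grad = [e - l for e, l in zip(expert_mean, learner_features)]
    stepped = [t + lr * g for t, g in zip(theta, grad)]
    return project_to_unit_ball(stepped)

def learn_reward(expert_features, candidates, steps=200, lr=0.1):
    """Alternate estimation (min over trajectories) and update (max over theta)."""
    expert_mean = mean_features(expert_features)
    theta = [0.0] * len(expert_mean)
    for _ in range(steps):
        chosen = estimate_trajectory(theta, candidates)
        theta = update_reward(theta, expert_mean, chosen, lr)
    return theta

if __name__ == "__main__":
    # Made-up feature vectors, one per trajectory, for illustration only.
    expert = [[1.0, 0.0], [0.9, 0.1]]
    candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    theta = learn_reward(expert, candidates)
    best = estimate_trajectory(theta, candidates)
    print("theta:", theta)
    print("gap:", dual_gap(theta, mean_features(expert), best))
```

The sketch makes no convergence claim. With a finite candidate set the parameters can oscillate between steps, so a practical version would likely need averaging or a smoother estimation step.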

Potential Applications:

Reinforcement learning: The method can be applied where a reward function has to be learned from expert trajectories, a setting usually called inverse reinforcement learning.
Robotics: The method can be used in robotics to enhance trajectory planning and control by optimizing reward functions.
Autonomous vehicles: By optimizing reward functions, this technology can improve the decision-making process of autonomous vehicles.

Problems Solved:

Reward function optimization: The method addresses the problem of fitting a reward function's parameters to expert trajectories.
Trajectory estimation: The method estimates a trajectory from expert data by selecting the one that minimizes the Wasserstein distance to the expert's trajectory distribution.

Benefits:

Improved learning performance: By optimizing reward functions, the method can enhance the learning performance of reinforcement learning algorithms.
Enhanced decision-making: The technology can improve the decision-making capabilities of autonomous systems by accurately estimating trajectories.
Efficient optimization: The method is presented as an efficient way to optimize reward functions, although the abstract does not quantify any reduction in computational cost.

Original Abstract Submitted

A function input means accepts input of a reward function whose features are set to satisfy a Lipschitz continuity condition. An estimation means estimates a trajectory that minimizes Wasserstein distance, which represents distance between probability distribution of a trajectory of an expert and probability distribution of a trajectory determined based on parameters of the reward function. An update means updates the parameters of the reward function to maximize the Wasserstein distance based on the estimated trajectory.
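The abstract gives no formulas. One common way to read it, assuming the Wasserstein-1 distance and its Kantorovich-Rubinstein dual form (an assumption, not something the application states), is shown below. The symbols are illustrative: p_E is the expert's trajectory distribution, p_θ the distribution of trajectories determined by the reward parameters θ, r_θ the reward function, and τ̂ the estimated trajectory.

```latex
% Kantorovich-Rubinstein dual of the Wasserstein-1 distance
% (the supremum runs over 1-Lipschitz functions r).
W_1(p_E, p_\theta) = \sup_{\|r\|_{L} \le 1}
  \Big( \mathbb{E}_{\tau \sim p_E}[r(\tau)] - \mathbb{E}_{\tau \sim p_\theta}[r(\tau)] \Big)

% Assumed reading of the two means in the abstract:
% inner minimization = estimation means, outer maximization = update means.
\max_{\theta} \; \min_{\hat{\tau}} \;
  \Big( \mathbb{E}_{\tau \sim p_E}[r_\theta(\tau)] - r_\theta(\hat{\tau}) \Big)
  \quad \text{subject to } \|r_\theta\|_{L} \le 1
```

Under this reading, the Lipschitz continuity condition on the reward function is what allows the reward gap to stand in for the Wasserstein distance, which would explain why the update step maximizes the distance rather than minimizing it.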

18268664. LEARNING DEVICE, LEARNING METHOD, AND LEARNING PROGRAM simplified abstract (NEC Corporation)
