LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM

Organization Name

Inventor(s)

LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240161009 titled 'LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM'.

Simplified Explanation

The abstract of the patent application describes a learning device that acquires a next state and a reward as a result of an action, calculates a state value of the next state using a teacher model, generates a shaped reward from the state value, updates a policy of a student model using the shaped reward and a discount factor, and updates the discount factor.

Acquisition means acquires next state and reward
Calculation means calculates state value using teacher model
Generation means generates shaped reward from state value
Policy updating means updates student model policy using shaped reward and discount factor
Parameter updating means updates discount factor
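
The five steps above can be sketched as a small training loop. This is only an illustration under stated assumptions, not the method claimed in the application: the toy chain environment, the linear teacher_value function, the potential-based form of the shaped reward (which also uses the teacher's value of the current state, something the abstract does not mention), tabular Q-learning as the policy update, and the fixed schedule that raises the discount factor are all hypothetical choices made to keep the example runnable.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (-1, 1)     # move left / move right


def step(state, action):
    """Acquisition: return (next_state, reward, done) for the toy environment."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def teacher_value(state):
    """Calculation: stand-in for the teacher model's state value function."""
    return state / (N_STATES - 1)


def shaped_reward(reward, state, next_state, gamma):
    """Generation: potential-based shaping (an assumption, not from the abstract)."""
    return reward + gamma * teacher_value(next_state) - teacher_value(state)


def update_gamma(gamma, step_size=0.01, cap=0.95):
    """Parameter update: assumed schedule that raises gamma toward a cap."""
    return min(gamma + step_size, cap)


def train(episodes=200, alpha=0.5, epsilon=0.2, gamma=0.5, seed=0):
    """Train a tabular student and return its Q-table and final discount factor."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection by the student.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            r = shaped_reward(reward, state, next_state, gamma)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Policy update: tabular Q-learning stands in for the unspecified method.
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
        # The student's discount factor is itself updated, here once per episode.
        gamma = update_gamma(gamma)
    return q, gamma
```

Calling train() returns the student's Q-table together with the final discount factor. Updating the discount factor once per episode is a design choice of this sketch; the abstract does not say when or how the update happens.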

Potential Applications

The technology described in this patent application could be applied in various fields such as:

Reinforcement learning systems
Autonomous vehicles
Robotics
Gaming industry

Problems Solved

This technology helps in addressing the following issues:

Improving learning efficiency
Enhancing decision-making processes
Optimizing resource allocation

Benefits

The benefits of this technology include:

Faster learning and adaptation
Increased accuracy in decision-making
Improved performance in complex environments

Potential Commercial Applications

The potential commercial applications of this technology could be seen in:

Education technology
Healthcare systems
Financial services
Manufacturing industry

Possible Prior Art

Possible prior art for this technology could include:

Q-learning algorithm
Deep reinforcement learning models

What is the specific algorithm used in the policy updating means?

The specific algorithm used in the policy updating means is not mentioned in the abstract. It would be helpful to know the exact method or approach employed for updating the policy of the student model.

How does the discount factor affect the learning process in this technology?

The abstract mentions the updating of the discount factor, but it does not elaborate on how this factor impacts the learning process. Understanding the role and significance of the discount factor in this technology would provide more insights into its functionality.
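
As general reinforcement learning background (not taken from the application), the discount factor sets how strongly future rewards count relative to immediate ones. The sketch below computes the standard discounted return for a hypothetical reward sequence under two discount factors; the function name and the sample rewards are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, the standard discounted return."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical episode: the only reward arrives at time step 3.
rewards = [0.0, 0.0, 0.0, 1.0]
low = discounted_return(rewards, 0.5)    # short-sighted learner
high = discounted_return(rewards, 0.9)   # far-sighted learner
```

With the smaller discount factor the delayed reward contributes far less to the return than with the larger one, so changing the discount factor during training changes how far ahead the student effectively plans.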

Original Abstract Submitted

In a learning device, the acquisition means acquires a next state and a reward as a result of an action. The calculation means calculates a state value of the next state using the next state and a state value function of a teacher model. The generation means generates a shaped reward from the state value. The policy updating means updates a policy of a student model using the shaped reward and a discount factor of the student model to be learned. The parameter updating means updates the discount factor.

NEC Corporation (20240161009). LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM simplified abstract
