NEC Corporation (20240161009). LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM simplified abstract

LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM

Organization Name

NEC Corporation

Inventor(s)

Yuki Nakaguchi of Tokyo (JP)

Dai Kubota of Tokyo (JP)

LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240161009, titled 'LEARNING DEVICE, LEARNING METHOD, AND RECORDING MEDIUM'.

Simplified Explanation

The abstract of the patent application describes a learning device that acquires a next state and a reward as a result of an action, calculates a state value of the next state using a teacher model, generates a shaped reward from the state value, updates a policy of a student model using the shaped reward and a discount factor, and updates the discount factor.

  • The acquisition means acquires the next state and the reward that result from an action
  • The calculation means calculates the state value of the next state using the teacher model's state value function
  • The generation means generates a shaped reward from that state value
  • The policy updating means updates the student model's policy using the shaped reward and the student's discount factor
  • The parameter updating means updates the discount factor
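
The five steps above can be sketched as a single training step. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not specify the shaping formula, the policy update algorithm, or the rule for updating the discount factor. The sketch assumes potential-based shaping, a tabular Q-learning student, and a fixed schedule that raises the discount factor. The chain environment and all names (`env_step`, `teacher_value`, `Student`, `train_step`) are hypothetical.

```python
from dataclasses import dataclass, field

N_STATES = 5          # hypothetical chain environment: states 0..4, goal at 4
ACTIONS = (-1, +1)    # move left or move right


def env_step(state, action):
    """Acquisition: return (next_state, reward) for an action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def teacher_value(state):
    """Calculation: stand-in for the teacher's state value function (assumed)."""
    return state / (N_STATES - 1)


def shape_reward(reward, state, next_state, gamma):
    """Generation: potential-based shaping, an assumed form of the shaped reward."""
    return reward + gamma * teacher_value(next_state) - teacher_value(state)


@dataclass
class Student:
    gamma: float = 0.5                      # discount factor to be learned
    lr: float = 0.5                         # learning rate
    q: dict = field(default_factory=dict)   # tabular action values

    def value(self, state):
        return max(self.q.get((state, a), 0.0) for a in ACTIONS)

    def update_policy(self, state, action, shaped, next_state):
        """Policy update: one tabular Q-learning step (assumed algorithm)."""
        target = shaped + self.gamma * self.value(next_state)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (target - old)

    def update_gamma(self, step=0.05, ceiling=0.99):
        """Parameter update: raise gamma on a fixed schedule (assumed rule)."""
        self.gamma = min(self.gamma + step, ceiling)


def train_step(student, state, action):
    """Run the five steps once and return (next_state, shaped_reward)."""
    next_state, reward = env_step(state, action)
    shaped = shape_reward(reward, state, next_state, student.gamma)
    student.update_policy(state, action, shaped, next_state)
    student.update_gamma()
    return next_state, shaped


student = Student()
next_state, shaped = train_step(student, state=0, action=+1)
print(next_state, shaped, student.gamma)
```

The sketch omits terminal-state handling and exploration. A real student model would likely use a function approximator rather than a table.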

Potential Applications

The technology described in this patent application could be applied in various fields such as:

  • Reinforcement learning systems
  • Autonomous vehicles
  • Robotics
  • Gaming industry

Problems Solved

This technology may help address the following issues:

  • Improving learning efficiency
  • Enhancing decision-making processes
  • Optimizing resource allocation

Benefits

The benefits of this technology include:

  • Faster learning and adaptation
  • Increased accuracy in decision-making
  • Improved performance in complex environments

Potential Commercial Applications

The potential commercial applications of this technology could be seen in:

  • Education technology
  • Healthcare systems
  • Financial services
  • Manufacturing industry

Possible Prior Art

Possible prior art for this technology includes:

  • Q-learning algorithm
  • Deep reinforcement learning models

What is the specific algorithm used in the policy updating means?

The specific algorithm used in the policy updating means is not mentioned in the abstract. It would be helpful to know the exact method or approach employed for updating the policy of the student model.

How does the discount factor affect the learning process in this technology?

The abstract mentions the updating of the discount factor, but it does not elaborate on how this factor impacts the learning process. Understanding the role and significance of the discount factor in this technology would provide more insights into its functionality.
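
As general reinforcement learning background (not taken from the abstract), the discount factor sets how strongly future rewards count toward the return. A value near 0 makes the learner short-sighted, while a value near 1 makes it weigh distant rewards almost as much as immediate ones. The sketch below illustrates this with the standard discounted return. The function names are hypothetical, and the effective-horizon formula 1 / (1 - gamma) is a common rule of thumb, not something the application states.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last reward backward."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of steps that matter for a given gamma (gamma < 1)."""
    return 1.0 / (1.0 - gamma)


rewards = [0.0, 0.0, 1.0]   # a single reward arriving two steps ahead
for gamma in (0.5, 0.9):
    print(gamma, discounted_return(rewards, gamma), effective_horizon(gamma))
```

For a reward of 1 arriving two steps ahead, the return is gamma squared: 0.25 at a discount factor of 0.5 and about 0.81 at 0.9. A student whose discount factor changes during training therefore shifts how much weight it gives to delayed rewards.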


Original Abstract Submitted

In a learning device, the acquisition means acquires a next state and a reward as a result of an action. The calculation means calculates a state value of the next state using the next state and a state value function of a teacher model. The generation means generates a shaped reward from the state value. The policy updating means updates a policy of a student model using the shaped reward and a discount factor of the student model to be learned. The parameter updating means updates the discount factor.