REINFORCEMENT LEARNING DEVICE, REINFORCEMENT LEARNING METHOD, AND REINFORCEMENT LEARNING PROGRAM

Abstract: Provided is a reinforcement learning device that performs reinforcement learning for a continuous behavior space. The reinforcement learning device stores predetermined settings for a simulation and an agent model. The device includes: an agent model estimation unit that inputs a state acquired by the simulation to the agent model and acquires a measure; a behavior determination unit that calculates a behavior based on the measure and a search amount defined in advance; and a search amount estimation unit that estimates the search amount. The agent model estimation unit updates the agent model, according to the setting of the agent model, based on the state, a reward, a flag, and the behavior. The search amount estimation unit updates the search amount based on a predicted reward obtained for the reward and the search amount in a previous trial. The calculation of the behavior, the update of the agent model, and the update of the search amount are repeated until a predetermined condition according to the flag and the setting is satisfied.
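The loop described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the toy one-dimensional simulation, the linear agent model (measure = gain * state), the perturbation-style agent update, the running-average reward predictor, and the multiplicative rule that shrinks or grows the search amount. The names simulate, run_trials, gain, and clamp are hypothetical.

```python
import random


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def simulate(state, behavior):
    """Hypothetical simulation: drive a 1-D error signal toward zero."""
    next_state = clamp(state + behavior, -5.0, 5.0)
    reward = -abs(next_state)        # smaller error gives a higher reward
    flag = abs(next_state) < 0.05    # end-of-episode flag
    return next_state, reward, flag


def run_trials(max_steps=200, seed=0):
    rng = random.Random(seed)
    gain = 0.0               # agent model (assumed linear): measure = gain * state
    search_amount = 0.5      # search amount defined in advance
    predicted_reward = 0.0   # assumed predictor: running average of rewards
    state, steps = 1.0, 0

    for _ in range(max_steps):       # step budget plays the role of the "setting"
        # Agent model estimation: state in, measure out.
        measure = gain * state

        # Behavior determination: measure plus noise scaled by the search amount.
        noise = rng.gauss(0.0, 1.0)
        behavior = measure + search_amount * noise

        next_state, reward, flag = simulate(state, behavior)
        advantage = reward - predicted_reward

        # Agent model update from state, reward, and behavior (assumed rule,
        # not the application's; the flag is only used to stop the loop here).
        gain = clamp(gain + 0.01 * advantage * noise * state, -2.0, 0.0)

        # Search amount update from the predicted reward and the previous
        # search amount (assumed rule): search less after a reward that beats
        # the prediction, search more otherwise.
        factor = 0.95 if advantage >= 0.0 else 1.05
        search_amount = clamp(search_amount * factor, 0.01, 1.0)
        predicted_reward += 0.1 * advantage

        state, steps = next_state, steps + 1
        if flag:                     # stop condition according to the flag
            break

    return {"gain": gain, "search_amount": search_amount, "steps": steps}
```

Calling run_trials(seed=0) returns the final gain, the final search amount, and the number of steps taken. The clamps keep the toy numerically bounded. The sketch makes no claim about convergence or about how the application actually computes the predicted reward.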

Inventor(s): Midori KODAMA, Sotaro MAEJIMA, Ryohei MATSUYAMA, Takahiro HATA, Masato KAMIYA

CPC Classification: F24F11/63 (AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING (removing dirt or fumes from areas where they are produced; vertical ducts for carrying away waste gases from buildings; tops for chimneys or ventilating shafts, terminals for flues))

Application number: 20250189158