20230177348. REINFORCEMENT LEARNING METHOD AND APPARATUS USING TASK DECOMPOSITION simplified abstract (RESEARCH & BUSINESS FOUNDATION SUNGKYUNKWAN UNIVERSITY)
REINFORCEMENT LEARNING METHOD AND APPARATUS USING TASK DECOMPOSITION
Organization Name
RESEARCH & BUSINESS FOUNDATION SUNGKYUNKWAN UNIVERSITY
Inventor(s)
Gwang Pyo Yoo of Suwon-si (KR)
REINFORCEMENT LEARNING METHOD AND APPARATUS USING TASK DECOMPOSITION - A simplified explanation of the abstract
This abstract first appeared for US patent application 20230177348, titled 'REINFORCEMENT LEARNING METHOD AND APPARATUS USING TASK DECOMPOSITION'.
Simplified Explanation
The patent application describes a reinforcement learning method that uses a task decomposition inference model in a time-variant environment. Here are the key points:
- The method selects paired transitions from a dataset, where each pair shares a common characteristic that does not change over time (time-invariant) but differs in an environmental characteristic that does change over time (time-variant).
- A cycle generative adversarial network (cycle GAN) is used to identify these paired transitions.
- An autoencoder is trained to embed the time-variant and time-invariant parts of each paired transition into a latent space.
- Reinforcement learning is then performed on transitions collected in the time-variant environment, using the trained autoencoder.
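The steps above can be illustrated with a toy sketch of a split-latent autoencoder. This is a hypothetical illustration only: the patent does not disclose an architecture, so the linear encoder/decoder, the dimensions, and the pairing loss below are assumptions. The key idea shown is that the latent space is partitioned into a time-invariant code `z_inv` and a time-variant code `z_var`, and that paired transitions (found, per the abstract, via a cycle GAN) are pushed to agree on `z_inv`.

```python
import numpy as np

rng = np.random.default_rng(0)


class SplitLatentAutoencoder:
    """Toy linear autoencoder whose latent space is split into a
    time-invariant part (z_inv) and a time-variant part (z_var).
    Hypothetical sketch; the patent does not specify this architecture."""

    def __init__(self, obs_dim, inv_dim, var_dim):
        # Randomly initialized linear maps stand in for trained networks.
        self.W_inv = rng.normal(size=(inv_dim, obs_dim)) * 0.1
        self.W_var = rng.normal(size=(var_dim, obs_dim)) * 0.1
        self.W_dec = rng.normal(size=(obs_dim, inv_dim + var_dim)) * 0.1

    def encode(self, x):
        # Embed a transition into the two latent parts.
        return self.W_inv @ x, self.W_var @ x

    def decode(self, z_inv, z_var):
        # Reconstruct the transition from the combined latent code.
        return self.W_dec @ np.concatenate([z_inv, z_var])

    def reconstruction_loss(self, x):
        z_inv, z_var = self.encode(x)
        return float(np.mean((self.decode(z_inv, z_var) - x) ** 2))


def pairing_loss(ae, x_a, x_b):
    """Paired transitions share a time-invariant characteristic, so their
    time-invariant codes are encouraged to agree (assumed training signal)."""
    z_inv_a, _ = ae.encode(x_a)
    z_inv_b, _ = ae.encode(x_b)
    return float(np.mean((z_inv_a - z_inv_b) ** 2))
```

In a full pipeline, gradient descent would minimize the reconstruction and pairing losses jointly, and the resulting latent codes would serve as the state representation for the downstream reinforcement-learning step.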
Potential applications of this technology:
- This method can be applied to various reinforcement learning tasks in dynamic environments, such as robotics, autonomous vehicles, and game playing.
- It can be used to improve the efficiency and effectiveness of learning in time-variant environments.
Problems solved by this technology:
- Traditional reinforcement learning methods struggle in time-variant environments because they do not model the environment's changing characteristics.
- This method addresses the problem by decomposing the task into time-invariant and time-variant parts, allowing more effective learning in dynamic environments.
Benefits of this technology:
- By considering both time-invariant and time-variant characteristics, this method enables more accurate and efficient learning in dynamic environments.
- It allows for better adaptation to changing environmental conditions, leading to improved performance in tasks that require real-time decision making.
Original Abstract Submitted
according to an exemplary embodiment of the present invention, a reinforcement learning method using a task decomposition inference model in a time-variant environment includes selecting a plurality of paired transitions having a time-invariant common characteristic and a time-variant different environmental characteristic from a dataset including a plurality of transition data, based on a cycle generative adversarial network (gan), training an auto encoder to embed each of the time-variant part and the time-invariant part with respect to the plurality of paired transitions into a latent space, and performing reinforcement learning on a transition corresponding to data collected in the time-variant environment, using the trained auto encoder.