18439222. Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning simplified abstract (The Regents of the University of California)

From WikiPatents

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Organization Name

The Regents of the University of California

Inventor(s)

Minglei Huang of Novi MI (US)

Wei Zhan of Berkeley CA (US)

Masayoshi Tomizuka of Berkeley CA (US)

Chen Tang of Berkeley CA (US)

Jinning Li of Berkeley CA (US)

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning - A simplified explanation of the abstract

This abstract first appeared for US patent application 18439222, titled 'Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning'.

Simplified Explanation:

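The patent application describes a method and system for controlling a device. A low-level policy and a goal-conditioned value function are trained from a static data set of goal-conditioned episodes. A high-level goal planner is then trained to pick sub-goals that the low-level policy can reach. At run time, an observation of the device is used together with the planner and the policy to generate an executable action that operates the device.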

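  • Training a low-level policy and a goal-conditioned value function on a static data set of goal-conditioned episodes
  • Training a high-level goal planner whose high-level goals are broken into sub-goals for a plurality of future time steps
  • Using the low-level value function to maximize cumulative reward over those sub-goals, so that each sub-goal is reachable by the low-level policy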
  • Generating executable actions based on observations and goals
  • Operating the device with the generated actions
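
The original abstract says training uses a static data set of goal-conditioned episodes, but it does not say how those episodes are built. One common approach, assumed here purely for illustration, is hindsight relabeling: each logged transition is paired with states reached later in the same trajectory, which then serve as goals. The `Transition` type, the `relabel` function, and the 0 / -1 reward convention below are hypothetical and are not taken from the patent application.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One goal-conditioned training sample (hypothetical layout)."""
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def relabel(states, actions):
    """Turn one logged trajectory into goal-conditioned transitions.

    `states` holds len(actions) + 1 entries. Every state visited after
    step t is used as a goal for the transition taken at step t.
    Assumed reward convention: 0 when the transition lands on the goal,
    -1 otherwise.
    """
    samples = []
    for t, action in enumerate(actions):
        next_state = states[t + 1]
        for k in range(t + 1, len(states)):
            goal = states[k]
            reward = 0.0 if next_state == goal else -1.0
            samples.append(Transition(states[t], action, next_state, goal, reward))
    return samples


# A logged 1-D trajectory: the device moved from position 0 to position 3.
dataset = relabel(states=[0, 1, 2, 3], actions=[1, 1, 1])
print(len(dataset))   # 6
print(dataset[0])     # Transition(state=0, action=1, next_state=1, goal=1, reward=0.0)
```

No new interaction with the device is needed to produce these samples, which is what makes the setting offline: the same fixed log yields many (state, goal) pairs for fitting a policy and a value function.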

Key Features and Innovation:

  • Utilizes goal-conditioned episodes for training
  • Integrates low-level policy and value function with high-level goal planner
  • Maximizes cumulative reward over future time steps for goal achievement
  • Enables efficient device control based on observations and goals
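
How the planner, the value function, and the low-level policy fit together can be sketched on a toy problem. The example below is a minimal sketch under stated assumptions, not the claimed implementation: the device is a point on a line at integer positions 0 to 10, `value` is a hand-written stand-in for a trained goal-conditioned value function (negative distance), `low_level_policy` is a stand-in for a trained policy (one unit per step), and the planner exhaustively scores every sub-goal sequence instead of being learned. The `reach` threshold and all function names are hypothetical.

```python
import itertools


def value(state, goal):
    """Stand-in for a trained goal-conditioned value function V(s, g).

    Assumption: negative distance, so values near 0 mean "easy to reach".
    """
    return -abs(goal - state)


def low_level_policy(state, subgoal):
    """Stand-in for a trained low-level policy: one unit toward the sub-goal."""
    if subgoal > state:
        return 1
    if subgoal < state:
        return -1
    return 0


def plan_subgoals(state, goal, horizon=3, reach=-3.0, candidates=range(0, 11)):
    """Pick the sub-goal sequence with the highest cumulative score.

    A sequence is kept only if every sub-goal is reachable from the
    previous one, judged by the low-level value function (V >= reach).
    Each sub-goal is scored by how close it brings the device to the goal.
    """
    best_plan, best_score = None, float("-inf")
    for plan in itertools.product(candidates, repeat=horizon):
        prev, score, reachable = state, 0.0, True
        for subgoal in plan:
            if value(prev, subgoal) < reach:  # too far for the low-level policy
                reachable = False
                break
            score += value(subgoal, goal)
            prev = subgoal
        if reachable and score > best_score:
            best_plan, best_score = plan, score
    return best_plan


def control(state, goal, steps=12):
    """Observe, replan, act: follow the first sub-goal of a fresh plan each step."""
    trajectory = [state]
    for _ in range(steps):
        plan = plan_subgoals(state, goal)
        state += low_level_policy(state, plan[0])
        trajectory.append(state)
    return trajectory


print(plan_subgoals(0, 10))   # (3, 6, 9)
print(control(0, 10)[-1])     # 10
```

The reachability check is where the two levels meet: the planner never proposes a sub-goal that the value function rates as out of range for the low-level policy, so a distant goal is approached through a chain of short, achievable hops.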

Potential Applications:

  • Robotics control systems
  • Autonomous vehicles
  • Industrial automation
  • Smart home devices

Problems Solved:

  • Efficient device control based on high-level goals
  • Seamless integration of low-level and high-level control strategies
  • Optimization of actions for goal achievement

Benefits:

  • Enhanced device performance
  • Improved goal achievement
  • Streamlined control processes

Commercial Applications:

The technology can be applied in various industries such as robotics, autonomous vehicles, industrial automation, and smart home devices to optimize control processes and enhance performance.

Questions about the technology:

  1. How does the training process for the low-level policy and value function work?
  2. What are the specific benefits of using goal-conditioned episodes for training?

Frequently Updated Research:

Stay updated on advancements in reinforcement learning algorithms and control systems to enhance the efficiency and effectiveness of the technology.


Original Abstract Submitted

A method and system for controlling a device includes training a low-level policy to form a trained low-level policy and a low-level value function to form a trained goal conditioned value function, wherein training is performed using a static data set using goal conditioned episodes, training a high-level goal planner having high level goals having high-level sub-goals corresponding to a plurality of future time steps using the low-level value function to maximize a cumulative reward over the sub-goals for the plurality of future time steps so that the sub-goals are reachable by the low-level policy, obtaining an observation of a device, and generating an executable action using the low-level policy and the high-level goal planner and operating the device with the executable action.