17551708. STEPWISE UNCERTAINTY-AWARE OFFLINE REINFORCEMENT LEARNING UNDER CONSTRAINTS simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)


STEPWISE UNCERTAINTY-AWARE OFFLINE REINFORCEMENT LEARNING UNDER CONSTRAINTS

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Akifumi Wachi of Tokyo (JP)

Takayuki Osogami of Yamato-shi (JP)

STEPWISE UNCERTAINTY-AWARE OFFLINE REINFORCEMENT LEARNING UNDER CONSTRAINTS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17551708 titled 'STEPWISE UNCERTAINTY-AWARE OFFLINE REINFORCEMENT LEARNING UNDER CONSTRAINTS'.

Simplified Explanation

The patent application describes a computer-implemented method for offline reinforcement learning using a dataset. Here are the key points:

  • The method trains a neural network that takes a state-action pair as input and outputs a Q-function for the reward and for each of one or more safety constraints.
  • The neural network has a linear output layer, with the remaining non-linear layers represented by a feature mapping function.
  • The training first constructs Q-functions from the dataset using an offline reinforcement learning algorithm, which yields the feature mapping function.
  • The training then uses the feature mapping function to tune a weight between the reward and the safety constraints.
  • During both the obtaining and tuning steps, an estimate of the Q-function is computed by subtracting an uncertainty from the expected value of the Q-function (see the illustrative sketch after this list).
  • The uncertainty is given by a function that maps the state-action pair to an error size.
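The Python sketch below is one possible reading of this structure, assuming a PyTorch implementation: a non-linear feature map phi(s, a), a linear output layer producing one Q-value for the reward and one per safety constraint, and a pessimistic estimate that subtracts an uncertainty term. The covariance-based bonus, the parameter beta, and all names here are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn


class ConstrainedQNetwork(nn.Module):
    """Q-network with a non-linear feature map and a linear output layer."""

    def __init__(self, state_dim, action_dim, feature_dim=64, num_constraints=1):
        super().__init__()
        # Non-linear layers: together they act as the feature mapping function phi(s, a).
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, feature_dim), nn.ReLU(),
        )
        # Linear output layer: one Q-value for the reward plus one per safety constraint.
        self.q_head = nn.Linear(feature_dim, 1 + num_constraints)

    def forward(self, state, action):
        features = self.phi(torch.cat([state, action], dim=-1))
        return features, self.q_head(features)


def pessimistic_q(features, q_values, inv_cov, beta=1.0):
    """Estimate Q by subtracting an uncertainty from its expected value.

    The uncertainty used here, beta * sqrt(phi^T Lambda^{-1} phi), is an assumed
    error-size function common in linear offline RL; the patent only states that
    the uncertainty maps the state-action pair to an error size.
    """
    bonus = beta * torch.sqrt(
        torch.einsum("bi,ij,bj->b", features, inv_cov, features)
    ).unsqueeze(-1)
    return q_values - bonus


# Example usage with random data (batch of 4 state-action pairs).
net = ConstrainedQNetwork(state_dim=3, action_dim=2)
phi, q = net(torch.randn(4, 3), torch.randn(4, 2))
inv_cov = torch.eye(phi.shape[-1])      # placeholder for the inverse dataset covariance
q_lcb = pessimistic_q(phi, q, inv_cov)  # pessimistic (lower-confidence) Q-estimates
```

In this sketch the uncertainty grows for state-action pairs that the dataset covers poorly, which is one standard way to realize a function mapping the state-action pair to an error size.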

Potential applications of this technology:

  • Autonomous driving systems: The method could be used to train a neural network to make decisions based on both reward and safety constraints, improving the safety of autonomous vehicles.
  • Robotics: The method could be applied to train robots to perform tasks while considering both reward and safety factors, ensuring safe and efficient operation.
  • Healthcare: The method could be used to train AI systems that make medical decisions, taking into account both patient well-being and safety considerations.

Problems solved by this technology:

  • Balancing reward and safety: The method addresses the challenge of training AI systems to consider both reward and safety constraints, ensuring that decisions are made with a balance between the two.
  • Offline reinforcement learning: The method allows for training the neural network using a dataset, eliminating the need for real-time interaction with the environment, which can be time-consuming and costly.

Benefits of this technology:

  • Improved safety: By considering safety constraints during the training process, the method helps to ensure that AI systems make decisions that prioritize safety.
  • Efficient training: The use of a dataset for offline reinforcement learning allows for more efficient training of the neural network, reducing the need for real-time interaction and speeding up the learning process.
  • Versatility: The method can be applied to various domains and tasks, allowing for the training of AI systems that consider both reward and safety factors.


Original Abstract Submitted

A computer-implemented method is provided for offline reinforcement learning with a dataset. The method includes training a neural network which inputs a state-action pair and outputs a respective Q function for each of a reward and one or more safety constraints, respectively. The neural network has a linear output layer and remaining non-linear layers being represented by a feature mapping function. The training includes obtaining the feature mapping function by constructing Q-functions based on the dataset according to an offline reinforcement algorithm. The training further includes tuning, using the feature mapping function, a weight between the reward and the one or more safety constraints, wherein during the obtaining and the tuning steps, an estimate of a Q-function is provided by subtracting an uncertainty from an expected value of the Q-function. The uncertainty is a function to map the state-action pair to an error size.
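As a rough illustration of the tuning step mentioned in the abstract, the sketch below sweeps candidate weights and returns the most reward-favoring weight whose greedy action still keeps the pessimistic safety Q-estimate above a threshold. The candidate-weight list, the threshold d, and the function name tune_weight are assumptions made for this example; the abstract does not specify how the weight is selected.

```python
import torch


def tune_weight(q_reward, q_safety, d=0.0, candidate_weights=(0.0, 0.5, 1.0, 2.0, 5.0)):
    """Sweep candidate weights (smallest first) and return the first one whose
    greedy action keeps the pessimistic safety estimate above the threshold d."""
    for w in candidate_weights:
        combined = q_reward + w * q_safety      # weighted trade-off between reward and safety
        action = torch.argmax(combined)         # greedy action under this weight
        if q_safety[action] >= d:               # pessimistic safety estimate still acceptable?
            return w, int(action)
    # If no weight satisfies the threshold, fall back to the safest action.
    return candidate_weights[-1], int(torch.argmax(q_safety))


# Example usage: pessimistic Q-estimates for three candidate actions in one state.
q_r = torch.tensor([1.0, 0.8, 0.2])   # reward Q-estimates
q_c = torch.tensor([-0.5, 0.1, 0.4])  # safety Q-estimates (>= 0 means "safe enough")
weight, best_action = tune_weight(q_r, q_c)
```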