MODEL AGGREGATION FOR FITTED Q-EVALUATION

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

MODEL AGGREGATION FOR FITTED Q-EVALUATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17546629 titled 'MODEL AGGREGATION FOR FITTED Q-EVALUATION'.

Simplified Explanation

The abstract describes a computer-implemented method for policy evaluation, that is, for estimating the utility of a given decision-making policy. The estimate is computed from a dataset of state-action-reward-state tuples, using a set of candidate bootstrapping estimators of the fitted Q-evaluation (FQE) algorithm and a criterion function. The criterion function is used to automatically select the best estimator from the candidates. When the criterion function is appropriately designed, the resulting policy-value estimate has a small estimation error (below a threshold).

The method estimates the utility of a decision-making policy.
It uses a dataset of state-action-reward-state combinations.
A set of candidate bootstrapping estimators is used.
A criterion function is employed to select the best estimator.
The method aims to produce a policy-value estimate with a small estimation error.
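
The steps listed above can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the candidate estimators differ or what the criterion function is. The sketch assumes a tabular setting, candidates that differ only in their number of FQE iterations, and a mean squared Bellman residual as a stand-in criterion. All names (fqe_tabular, bellman_residual, select_and_evaluate) and the toy two-state dataset are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed; not given in the abstract)


def fqe_tabular(dataset, policy, n_iters, gamma=GAMMA):
    """Tabular fitted Q-evaluation.

    Each iteration regresses Q(s, a) onto the bootstrapped target
    r + gamma * Q_prev(s', policy(s')). In the tabular case the
    least-squares fit is the mean target per (s, a) pair.
    """
    q = defaultdict(float)
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in dataset:
            target = r + gamma * q[(s_next, policy(s_next))]
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def bellman_residual(q, dataset, policy, gamma=GAMMA):
    """Stand-in criterion (an assumption): mean squared Bellman residual."""
    errors = [
        (q[(s, a)] - (r + gamma * q[(s_next, policy(s_next))])) ** 2
        for s, a, r, s_next in dataset
    ]
    return sum(errors) / len(errors)


def select_and_evaluate(dataset, policy, candidates, criterion, start_states):
    """Fit every candidate, keep the one with the lowest criterion score,
    and report its policy-value estimate averaged over the start states."""
    fitted = {name: fit(dataset, policy) for name, fit in candidates.items()}
    scores = {name: criterion(q, dataset, policy) for name, q in fitted.items()}
    best = min(scores, key=scores.get)
    q_best = fitted[best]
    value = sum(q_best[(s, policy(s))] for s in start_states) / len(start_states)
    return best, value, scores


# Toy dataset of (state, action, reward, next_state) tuples:
# state 0 pays reward 1 and moves to state 1; state 1 pays 0 and moves back.
dataset = [(0, 0, 1.0, 1), (1, 0, 0.0, 0)]


def policy(state):
    """Hypothetical policy under evaluation: always take action 0."""
    return 0


# Hypothetical candidate set: FQE run for 1, 10, and 200 iterations.
candidates = {
    f"fqe_{k}": (lambda d, p, k=k: fqe_tabular(d, p, k)) for k in (1, 10, 200)
}

best, value, scores = select_and_evaluate(
    dataset, policy, candidates, bellman_residual, start_states=[0]
)
print(best, round(value, 3), scores)
```

In this toy problem the true value of the policy from state 0 is 1 / (1 - 0.9^2), so the selected estimate can be checked against it directly. In practice the true value is unknown, which is why the choice of criterion function matters.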

Potential Applications

This technology has potential applications in various fields, including:

Reinforcement learning algorithms
Artificial intelligence systems
Decision-making optimization
Policy evaluation in complex systems

Problems Solved

The technology addresses the following problems:

Evaluating the effectiveness of decision-making policies
Selecting the best bootstrapping estimator for policy evaluation
Reducing estimation errors in policy-value estimates

Benefits

The technology offers the following benefits:

Automated selection of the best estimator
Accurate estimation of policy values
Improved decision-making optimization
Enhanced performance of reinforcement learning algorithms

Original Abstract Submitted

A computer-implemented method is provided for policy evaluation. In the method, the utility of the given decision-making policy is estimated based on a dataset of state-action-reward-state tuples, a set of candidate bootstrapping estimators of the fitted Q-evaluation (FQE) algorithm, and a criterion function. The method automatically selects the best bootstrapping estimator from the candidates based on the criterion function and, when the criterion function is appropriately designed, it produces a good policy-value estimate such that the estimation error is small (below a threshold).
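
The quantities named in the abstract can be written compactly as follows. This is a sketch in assumed notation, not taken from the application: \mathcal{D} is the dataset, \pi a deterministic policy, \gamma a discount factor, \mathcal{F} a function class, d_0 a start-state distribution, C the criterion function, M the number of candidates, and \varepsilon the error threshold. The first line is the standard FQE update; minimizing C rather than maximizing it is also an assumption.

```latex
% Standard FQE update: regress onto bootstrapped targets
\hat{Q}_{k+1} = \arg\min_{f \in \mathcal{F}}
  \sum_{(s,a,r,s') \in \mathcal{D}}
  \bigl( f(s,a) - r - \gamma\, \hat{Q}_k(s', \pi(s')) \bigr)^2

% Policy-value estimate produced by candidate m
\hat{V}^{(m)} = \mathbb{E}_{s_0 \sim d_0}
  \bigl[ \hat{Q}^{(m)}(s_0, \pi(s_0)) \bigr]

% Selection by the criterion function, and the stated goal
\hat{m} = \arg\min_{m \in \{1,\dots,M\}} C\bigl(\hat{Q}^{(m)}; \mathcal{D}\bigr),
\qquad
\bigl| \hat{V}^{(\hat{m})} - V^{\pi} \bigr| \le \varepsilon
```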
