17546629. MODEL AGGREGATION FOR FITTED Q-EVALUATION simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

From WikiPatents

MODEL AGGREGATION FOR FITTED Q-EVALUATION

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

KOHEI Miyaguchi of Tokyo (JP)

MODEL AGGREGATION FOR FITTED Q-EVALUATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17546629, titled 'MODEL AGGREGATION FOR FITTED Q-EVALUATION'.

Simplified Explanation

The abstract describes a computer-implemented method for evaluating a decision-making policy. The method estimates the utility of a given policy (its policy value) from a dataset of state-action-reward-state tuples. It takes a set of candidate bootstrapping estimators for the fitted Q-evaluation (FQE) algorithm and uses a criterion function to automatically select the best candidate. When the criterion function is appropriately designed, the resulting policy-value estimate has a small estimation error (below a threshold).

  • The method estimates the utility of a given decision-making policy.
  • It uses a dataset of state-action-reward-state tuples.
  • It takes a set of candidate bootstrapping estimators of the fitted Q-evaluation (FQE) algorithm.
  • A criterion function is used to automatically select the best candidate.
  • With an appropriately designed criterion function, the policy-value estimate has a small estimation error (below a threshold).
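
The steps above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say what the candidate estimators or the criterion function look like, so everything concrete here is an assumption made for illustration. The candidates are tabular FQE runs that differ only in a hypothetical `shrinkage` value, the criterion is a plain empirical squared Bellman residual, and the names `fitted_q_evaluation`, `bellman_residual`, and `select_and_evaluate` are invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# One logged transition: (state, action, reward, next_state).
Transition = Tuple[int, int, float, int]
QTable = Dict[Tuple[int, int], float]
Policy = Callable[[int], int]


def fitted_q_evaluation(data: List[Transition], policy: Policy, gamma: float,
                        n_iters: int, shrinkage: float) -> QTable:
    """Tabular FQE: repeatedly regress Q(s, a) onto r + gamma * Q(s', policy(s')).

    `shrinkage` is a hypothetical ridge-like knob (an assumption of this
    sketch); it is only here so that candidates differ from one another.
    """
    q: QTable = {}
    for _ in range(n_iters):
        sums: Dict[Tuple[int, int], float] = {}
        counts: Dict[Tuple[int, int], int] = {}
        for s, a, r, s_next in data:
            # Bootstrapped target built from the previous iterate.
            target = r + gamma * q.get((s_next, policy(s_next)), 0.0)
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        q = {key: sums[key] / (counts[key] + shrinkage) for key in sums}
    return q


def bellman_residual(q: QTable, data: List[Transition], policy: Policy,
                     gamma: float) -> float:
    """Assumed criterion function: mean squared Bellman residual (lower is better)."""
    total = 0.0
    for s, a, r, s_next in data:
        target = r + gamma * q.get((s_next, policy(s_next)), 0.0)
        total += (q.get((s, a), 0.0) - target) ** 2
    return total / len(data)


def select_and_evaluate(data: List[Transition], policy: Policy, gamma: float,
                        candidates: List[float], start_state: int,
                        n_iters: int = 50) -> Tuple[float, float]:
    """Run every candidate, keep the one with the lowest criterion score,
    and return (chosen shrinkage, policy-value estimate at start_state)."""
    best_score = None
    best_q: QTable = {}
    best_candidate = candidates[0]
    for shrinkage in candidates:
        q = fitted_q_evaluation(data, policy, gamma, n_iters, shrinkage)
        score = bellman_residual(q, data, policy, gamma)
        if best_score is None or score < best_score:
            best_score, best_q, best_candidate = score, q, shrinkage
    value = best_q.get((start_state, policy(start_state)), 0.0)
    return best_candidate, value


# Toy deterministic chain 0 -> 1 -> 2 (absorbing); the policy always takes action 0.
data: List[Transition] = [
    (0, 0, 1.0, 1),
    (0, 1, 0.0, 2),  # a logged action that the evaluated policy never takes
    (1, 0, 2.0, 2),
    (2, 0, 0.0, 2),
]
chosen, value = select_and_evaluate(
    data, policy=lambda s: 0, gamma=0.9, candidates=[0.0, 5.0], start_state=0
)
print(chosen, value)
```

On this toy dataset the unshrunk candidate reproduces the Bellman targets exactly, so it has zero residual and is selected. Scoring candidates on the same data used to fit them is a simplification; a real criterion function would need to be designed more carefully, which is the condition the abstract attaches to its error guarantee.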

Potential Applications

This technology has potential applications in various fields, including:

  • Reinforcement learning algorithms
  • Artificial intelligence systems
  • Decision-making optimization
  • Policy evaluation in complex systems

Problems Solved

The technology addresses the following problems:

  • Evaluating the effectiveness of decision-making policies
  • Selecting the best bootstrapping estimator for policy evaluation
  • Reducing estimation errors in policy-value estimates

Benefits

The technology offers the following benefits:

  • Automated selection of the best estimator
  • Accurate estimation of policy values
  • Improved decision-making optimization
  • Enhanced performance of reinforcement learning algorithms


Original Abstract Submitted

A computer-implemented method is provided for policy evaluation. In the method, the utility of the given decision-making policy is estimated based on a dataset of state-action-reward-state tuples, a set of candidate bootstrapping estimators of the fitted Q-evaluation (FQE) algorithm, and a criterion function. The method automatically selects the best bootstrapping estimator from the candidates based on the criterion function and, when the criterion function is appropriately designed, it produces a good policy-value estimate such that the estimation error is small (below a threshold).
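
For orientation, the bootstrapping step of FQE is usually written as the recursion below. This is the standard textbook form, not text from the application. The symbols are assumptions introduced here: $D$ is the dataset of tuples, $\pi$ the evaluated policy, $\gamma$ a discount factor, $\mathcal{F}$ a function class, $\mu_0$ an initial-state distribution, and $C$ a criterion function for which lower scores are taken to be better.

```latex
% FQE iteration: regress onto bootstrapped targets built from the previous iterate
\hat{Q}_{k+1} = \arg\min_{f \in \mathcal{F}}
  \sum_{(s, a, r, s') \in D}
  \bigl( f(s, a) - r - \gamma\, \hat{Q}_k\bigl(s', \pi(s')\bigr) \bigr)^2

% Policy-value estimate after K iterations
\hat{V}(\pi) = \mathbb{E}_{s_0 \sim \mu_0}
  \bigl[ \hat{Q}_K\bigl(s_0, \pi(s_0)\bigr) \bigr]

% Selection among candidates j = 1, ..., M by the criterion function
\hat{\jmath} = \arg\min_{j \in \{1, \dots, M\}} C\bigl(\hat{Q}^{(j)}; D\bigr)
```

Each candidate $j$ yields its own estimate $\hat{Q}^{(j)}$, and the reported policy value is the one computed from the selected candidate $\hat{\jmath}$.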