MODEL AGGREGATION FOR FITTED Q-EVALUATION
Organization Name
INTERNATIONAL BUSINESS MACHINES CORPORATION
MODEL AGGREGATION FOR FITTED Q-EVALUATION - A simplified explanation of the abstract
This abstract first appeared for US patent application 17546629, titled 'MODEL AGGREGATION FOR FITTED Q-EVALUATION'.
Simplified Explanation
The abstract describes a computer-implemented method for evaluating decision-making policies. The method estimates the utility of a given policy from a dataset of state-action-reward-next-state tuples. It runs the fitted Q-evaluation (FQE) algorithm with a set of candidate bootstrapping estimators and uses a criterion function to automatically select the best one, with the goal of producing a policy-value estimate whose estimation error is small. A minimal FQE sketch follows the summary list below.
- The method estimates the utility of a decision-making policy.
- It uses a dataset of state-action-reward-next-state tuples.
- A set of candidate bootstrapping estimators is used.
- A criterion function is employed to select the best estimator.
- The method aims to produce a policy-value estimate with a small estimation error.
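The core estimator referenced in the abstract is fitted Q-evaluation: starting from an arbitrary Q-function, repeatedly regress toward one-step Bellman targets computed with the evaluated policy's actions, using only the logged tuples. The sketch below is a minimal, tabular version of that loop under stated assumptions (discrete states and actions, a deterministic target policy); the function name, the toy dataset, and the averaging "fit" step are illustrative choices, not the patent's actual function approximators or aggregation scheme.

```python
import numpy as np

def fitted_q_evaluation(dataset, target_policy, n_states, n_actions,
                        gamma=0.99, n_iters=100):
    """Estimate Q^pi for a deterministic target policy from logged tuples.

    dataset       : iterable of (state, action, reward, next_state) tuples
    target_policy : array of shape (n_states,), the action the evaluated
                    policy takes in each state
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next in dataset:
            # One-step Bellman target using the evaluated policy's next action.
            targets[s, a] += r + gamma * q[s_next, target_policy[s_next]]
            counts[s, a] += 1
        # "Fit" step: in the tabular case the regression is just an average.
        visited = counts > 0
        q[visited] = targets[visited] / counts[visited]
    return q

# Toy usage: two states, two actions, a handful of logged transitions.
data = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0), (1, 0, 1.0, 1)]
policy = np.array([0, 1])                      # hypothetical target policy
q_hat = fitted_q_evaluation(data, policy, n_states=2, n_actions=2)
print(q_hat[0, policy[0]])                     # policy-value estimate at state 0
```

In practice the regression step would use a function approximator rather than a table; the patent's contribution concerns how several such bootstrapping estimators are aggregated and selected, which the sketch does not cover.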
Potential Applications
This technology has potential applications in various fields, including:
- Reinforcement learning algorithms
- Artificial intelligence systems
- Decision-making optimization
- Policy evaluation in complex systems
Problems Solved
The technology addresses the following problems:
- Evaluating the effectiveness of decision-making policies
- Selecting the best bootstrapping estimator for policy evaluation
- Reducing estimation errors in policy-value estimates
Benefits
The technology offers the following benefits:
- Automated selection of the best estimator
- Accurate estimation of policy values
- Improved decision-making optimization
- Enhanced performance of reinforcement learning algorithms
Original Abstract Submitted
A computer-implemented method is provided for policy evaluation. In the method, the utility of the given decision-making policy is estimated based on a dataset of state-action-reward-state tuples, a set of candidate bootstrapping estimators of the fitted Q-evaluation (FQE) algorithm, and a criterion function. The method automatically selects the best bootstrapping estimator from the candidates based on the criterion function and, when the criterion function is appropriately designed, it produces a good policy-value estimate such that the estimation error is small (below a threshold).
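The distinguishing step in the abstract is the automatic selection among candidate bootstrapping estimators via a criterion function. The sketch below illustrates one plausible shape of that step under stated assumptions: each candidate is a callable that runs FQE on the logged data, and the criterion is an empirical Bellman error. That criterion, the helper names, and the usage at the end (which reuses the tabular FQE sketch above) are illustrative assumptions; the abstract does not disclose the actual criterion function or the error threshold.

```python
import numpy as np

def bellman_error(q, dataset, target_policy, gamma=0.99):
    """Illustrative criterion: mean squared empirical Bellman error of q."""
    errs = [
        (r + gamma * q[s_next, target_policy[s_next]] - q[s, a]) ** 2
        for s, a, r, s_next in dataset
    ]
    return float(np.mean(errs))

def select_estimator(candidates, dataset, target_policy, criterion=bellman_error):
    """Run every candidate FQE estimator and keep the criterion minimizer.

    candidates : dict mapping a name to a callable that takes the dataset
                 and target policy and returns a fitted Q-value table
    """
    scores = {
        name: criterion(run_fqe(dataset, target_policy), dataset, target_policy)
        for name, run_fqe in candidates.items()
    }
    best_name = min(scores, key=scores.get)
    return best_name, scores

# Hypothetical usage, reusing `fitted_q_evaluation`, `data`, and `policy`
# from the earlier sketch; different iteration budgets stand in for
# distinct candidate bootstrapping estimators.
candidates = {
    "fqe_short": lambda d, p: fitted_q_evaluation(d, p, 2, 2, n_iters=10),
    "fqe_long":  lambda d, p: fitted_q_evaluation(d, p, 2, 2, n_iters=200),
}
best, scores = select_estimator(candidates, data, policy)
print(best, scores)
```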