Reinforcement Learning with Information Retrieval Feedback

This abstract first appeared for US patent application 18348687 titled 'Reinforcement Learning with Information Retrieval Feedback'.

Original Abstract Submitted

In one example aspect, the present disclosure provides an example computer-implemented method for generating feedback signals for training a machine-learned agent model. The example method can include obtaining an output of a machine-learned agent model, the output including a next state feature generated by the machine-learned agent model based on a sequence of preceding states. The example method can include processing, using a machine-learned reward model, the output and the sequence of preceding states to generate a quality indicator indicating a quality of the next state feature in view of the preceding states. The machine-learned reward model could be trained by retrieving reference data from a reference data source and computing one or more quality indicators in view of a respective training input and output(s), and the reference data. The example method can include outputting the quality indicator to a model trainer for updating the machine-learned agent model.
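
The flow described in the abstract can be illustrated with a minimal sketch. Nothing below comes from the application itself. The word-overlap retriever, the bag-of-words reward model, and every name (retrieve, retrieval_label, train_reward_model, generate_feedback, Feedback) are hypothetical stand-ins chosen only to make the four steps concrete: retrieve reference data, compute quality indicators for training examples, fit a reward model on them, then score a new agent output and hand the result to a trainer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[List[str], str]  # (preceding states, candidate next state)
RewardModel = Callable[[Sequence[str], str], float]


def tokens(text: str) -> List[str]:
    return text.lower().split()


def retrieve(query: str, corpus: Sequence[str]) -> str:
    """Toy retriever (assumption): the passage sharing the most words with the query."""
    q = set(tokens(query))
    return max(corpus, key=lambda doc: len(q & set(tokens(doc))))


def retrieval_label(states: Sequence[str], output: str, corpus: Sequence[str]) -> float:
    """Training-time quality indicator: share of output words found in the retrieved reference."""
    reference = set(tokens(retrieve(" ".join([*states, output]), corpus)))
    words = tokens(output)
    return sum(w in reference for w in words) / len(words) if words else 0.0


def train_reward_model(examples: Sequence[Example], corpus: Sequence[str]) -> RewardModel:
    """Fit a toy reward model: each word's weight is the mean label of examples containing it."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for states, output in examples:
        label = retrieval_label(states, output, corpus)
        for w in set(tokens(" ".join([*states, output]))):
            totals[w] += label
            counts[w] += 1
    weights = {w: totals[w] / counts[w] for w in totals}

    def reward_model(states: Sequence[str], output: str) -> float:
        # Scores the preceding states and the output together; unseen words count as 0.
        words = tokens(" ".join([*states, output]))
        return sum(weights.get(w, 0.0) for w in words) / len(words) if words else 0.0

    return reward_model


@dataclass
class Feedback:
    """Record passed to the model trainer (hypothetical structure)."""
    states: List[str]
    next_state: str
    quality: float


def generate_feedback(agent: Callable[[Sequence[str]], str],
                      reward_model: RewardModel,
                      states: Sequence[str]) -> Feedback:
    next_state = agent(states)                   # output of the agent model
    quality = reward_model(states, next_state)   # quality indicator from the reward model
    return Feedback(list(states), next_state, quality)


# Illustrative data only; not taken from the application.
CORPUS = ["paris is the capital of france", "berlin is the capital of germany"]
EXAMPLES: List[Example] = [
    (["what is the capital of france"], "paris"),
    (["what is the capital of france"], "london"),
    (["what is the capital of germany"], "berlin"),
    (["what is the capital of germany"], "madrid"),
]

reward_model = train_reward_model(EXAMPLES, CORPUS)
agent = lambda states: "paris"  # stand-in for a machine-learned agent model
trainer_queue: List[Feedback] = []  # stand-in for the model trainer's input
trainer_queue.append(generate_feedback(agent, reward_model, ["what is the capital of france"]))
print(trainer_queue[0])
```

In this sketch the reference data is consulted only while building training labels, and the fitted reward model scores new outputs without a retrieval step. That is one reading of the abstract, not a statement of how the claimed method must be implemented. A real system would replace the word-count heuristics with learned retrieval and reward models.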