ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE

Organization Name

Inventor(s)

Djallel Bouneffouf of Poughkeepsie NY (US)

This abstract first appeared for US patent application 18627702, titled 'ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE'.

Original Abstract Submitted

A multi-armed bandit (MAB) problem is obtained and a per-round regret lower bound is determined, wherein a corresponding regret is measured against a benchmark. The multi-armed bandit problem is provided to an algorithm that has a per-round regret that is close to the determined per-round regret lower bound, wherein the algorithm dynamically adapts to changes and discards irrelevant past information by alternating between recently pulled arms and unpulled arms having potential, wherein the alternating comprises updating an estimate of an expected reward of each arm within each epoch and an estimate for an error bound that captures an amount of error contained in the estimate of the expected reward for each arm within each epoch based on the auto-regressive temporal structure with trend components, and restarting the algorithm.
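
The abstract does not give formulas, so the following Python sketch is only an illustration of the general idea, not the claimed algorithm. It assumes each arm's reward follows a zero-mean AR(1) process with a known coefficient alpha and noise level sigma, and it omits the trend components entirely. Under those assumptions, the estimate for a recently pulled arm is the standard k-step-ahead AR(1) forecast, the error bound is the standard deviation of that forecast, an unpulled arm keeps the stationary mean with the widest bound (its "potential"), and all estimates are discarded at the end of each epoch. The class name, the parameters (alpha, sigma, epoch_length, width) and the simulate helper are hypothetical.

```python
import math
import random


class RestartingARBandit:
    """Illustrative restart-based bandit for zero-mean AR(1) rewards.

    Assumption: alpha and sigma are known. This is a sketch, not the
    algorithm claimed in the patent application.
    """

    def __init__(self, n_arms, alpha, sigma, epoch_length, width=1.0):
        if not abs(alpha) < 1.0:
            raise ValueError("alpha must satisfy |alpha| < 1")
        self.n_arms = n_arms
        self.alpha = alpha
        self.sigma = sigma
        self.epoch_length = epoch_length
        self.width = width  # hypothetical scaling of the error bound
        self.restart()

    def restart(self):
        """Discard all past information and begin a new epoch."""
        self.last_reward = [None] * self.n_arms
        self.age = [0] * self.n_arms  # rounds since the arm was last observed
        self.steps = 0

    def estimate(self, arm):
        """k-step-ahead AR(1) forecast; stationary mean 0 if unpulled."""
        if self.last_reward[arm] is None:
            return 0.0
        return (self.alpha ** self.age[arm]) * self.last_reward[arm]

    def error_bound(self, arm):
        """Std. dev. of the forecast error; widest for unpulled arms."""
        stationary_var = self.sigma ** 2 / (1.0 - self.alpha ** 2)
        if self.last_reward[arm] is None:
            return self.width * math.sqrt(stationary_var)
        k = self.age[arm]
        forecast_var = stationary_var * (1.0 - self.alpha ** (2 * k))
        return self.width * math.sqrt(forecast_var)

    def select_arm(self):
        """Pick the arm with the largest optimistic score (first on ties)."""
        return max(
            range(self.n_arms),
            key=lambda a: self.estimate(a) + self.error_bound(a),
        )

    def update(self, arm, reward):
        """Record a reward, age the older observations, restart if due."""
        for a in range(self.n_arms):
            if self.last_reward[a] is not None:
                self.age[a] += 1
        self.last_reward[arm] = reward
        self.age[arm] = 1  # the next decision is a one-step-ahead forecast
        self.steps += 1
        if self.steps >= self.epoch_length:
            self.restart()


def simulate(rounds=200, n_arms=3, alpha=0.8, sigma=1.0, seed=0):
    """Run the sketch on simulated AR(1) arms and return the total reward."""
    rng = random.Random(seed)
    state = [0.0] * n_arms
    bandit = RestartingARBandit(n_arms, alpha, sigma, epoch_length=20)
    total = 0.0
    for _ in range(rounds):
        # Every arm evolves each round, whether or not it is pulled.
        state = [alpha * x + rng.gauss(0.0, sigma) for x in state]
        arm = bandit.select_arm()
        bandit.update(arm, state[arm])
        total += state[arm]
    return total


if __name__ == "__main__":
    print("total reward:", simulate())
```

In this sketch the error bound of a pulled arm grows with the number of rounds since it was last observed, so an arm with a stale or low forecast eventually loses to an unpulled arm. That is one simple way to obtain the alternation between recently pulled arms and unpulled arms that the abstract describes.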