18627702. ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE (Massachusetts Institute of Technology)
ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE
Organization Name
Massachusetts Institute of Technology
Inventor(s)
Qinyi Chen of Cambridge MA (US)
Negin Golrezaei of Cambridge MA (US)
Djallel Bouneffouf of Poughkeepsie NY (US)
This abstract first appeared for US patent application 18627702 titled 'ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE'.
Original Abstract Submitted
A multi-armed bandit (MAB) problem is obtained and a per-round regret lower bound is determined, where the regret is measured against a benchmark. The problem is provided to an algorithm whose per-round regret is close to the determined lower bound. The algorithm dynamically adapts to changes and discards irrelevant past information by alternating between recently pulled arms and unpulled arms that have potential. The alternating comprises (i) updating, within each epoch, an estimate of the expected reward of each arm and an estimate of an error bound that captures the amount of error in that reward estimate, based on the auto-regressive temporal structure with trend components, and (ii) restarting the algorithm.
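The abstract gives no formulas, so the following Python is only a rough sketch of the general idea and not the claimed algorithm. It assumes each arm's reward follows a zero-mean AR(1) process with a known coefficient `alpha` and noise level `sigma`, and it omits the trend components entirely. The class name `EpochARBandit`, the confidence multiplier `c`, the fixed epoch length, and the even/odd alternation rule are all hypothetical choices made for illustration.

```python
import math
import random


class EpochARBandit:
    """Toy restart-based policy for zero-mean AR(1) rewards (illustrative only)."""

    def __init__(self, n_arms, alpha, sigma, epoch_len, c=2.0):
        assert n_arms >= 2, "sketch needs at least two arms to alternate between"
        assert 0.0 <= alpha < 1.0, "sketch assumes a stable AR(1) process"
        self.n_arms = n_arms
        self.alpha = alpha          # assumed-known AR(1) coefficient
        self.sigma = sigma          # assumed-known noise standard deviation
        self.epoch_len = epoch_len  # hypothetical fixed epoch length
        self.c = c                  # hypothetical confidence multiplier
        self.restart(0)

    def restart(self, t):
        # Discard all past observations at the start of a new epoch.
        self.epoch_start = t
        self.last_reward = [None] * self.n_arms
        self.last_pull = [None] * self.n_arms

    def estimate(self, arm, t):
        # AR(1) forecast: the last observation decays by alpha per elapsed round.
        if self.last_pull[arm] is None:
            return 0.0  # stationary mean of a zero-mean AR(1) process
        tau = t - self.last_pull[arm]
        return (self.alpha ** tau) * self.last_reward[arm]

    def error_bound(self, arm, t):
        # c times the standard deviation of the tau-step-ahead forecast error.
        denom = 1.0 - self.alpha ** 2
        if self.last_pull[arm] is None:
            var = self.sigma ** 2 / denom  # stationary variance
        else:
            tau = t - self.last_pull[arm]
            var = self.sigma ** 2 * (1.0 - self.alpha ** (2 * tau)) / denom
        return self.c * math.sqrt(var)

    def select(self, t):
        if t - self.epoch_start >= self.epoch_len:
            self.restart(t)
        pulled = [a for a in range(self.n_arms) if self.last_pull[a] is not None]
        best = max(pulled, key=lambda a: self.estimate(a, t)) if pulled else None
        # Even steps of the epoch: stay with the best arm seen in this epoch.
        if best is not None and (t - self.epoch_start) % 2 == 0:
            return best
        # Otherwise: try the other arm with the highest optimistic estimate.
        others = [a for a in range(self.n_arms) if a != best]
        return max(others, key=lambda a: self.estimate(a, t) + self.error_bound(a, t))

    def update(self, arm, reward, t):
        self.last_reward[arm] = reward
        self.last_pull[arm] = t


def simulate(n_arms=3, alpha=0.8, sigma=0.5, horizon=200, epoch_len=20, seed=0):
    """Run the toy policy on synthetic AR(1) rewards; returns (total reward, pulls)."""
    rng = random.Random(seed)
    stationary_sd = sigma / math.sqrt(1.0 - alpha ** 2)
    state = [rng.gauss(0.0, stationary_sd) for _ in range(n_arms)]
    policy = EpochARBandit(n_arms, alpha, sigma, epoch_len)
    total, pulls = 0.0, []
    for t in range(horizon):
        arm = policy.select(t)
        total += state[arm]
        policy.update(arm, state[arm], t)
        pulls.append(arm)
        # Every arm evolves each round, whether or not it was pulled.
        state = [alpha * x + rng.gauss(0.0, sigma) for x in state]
    return total, pulls
```

In this sketch the error bound grows with the time since an arm was last pulled, which is what makes a long-unobserved arm look like it has "potential" again. The periodic restart is the crudest possible way to discard stale information.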