A learning agent with the ability to wait or terminate a process to maximize reward.
— Sharoff (@onsharoff) March 16, 2020
- with theoretical regret bound guarantees!
- with empirical performance comparison
My work is now available on arXiv! Give it a read: https://t.co/ZD1Qd83YjT