Reinforcement Learning
Borahan Tümer
Lecture Slides
  1. Introduction
  2. Evaluative Feedback
  3. Reinforcement Learning Problem
  4. Dynamic Programming
  5. Monte Carlo Methods
  6. Temporal-Difference Learning
  7. Eligibility Traces (ETs)
  8. Generalization and Function Approximation (will be added)
  9. Planning and Learning
  10. Hierarchical RL (HRL) (will be added)

Main References
  • Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
  • Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112.1-2 (1999): 181-211.
Reference Books
  • Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. (The second edition is in progress; a draft is available online.)
Related Papers
  • RL in Non-Stationary Environments
    • Da Silva, Bruno C., et al. "Dealing with non-stationary environments using context detection." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
    • Hadoux, Emmanuel, Aurélie Beynier, and Paul Weng. "Sequential decision-making under non-stationary environments via sequential change-point detection." Learning over Multiple Contexts (LMCE), 2014.
    • Jaulmes, Robin, Joelle Pineau, and Doina Precup. "Learning in non-stationary partially observable Markov decision processes." ECML Workshop on Reinforcement Learning in Non-Stationary Environments. Vol. 25. 2005.
    • Choi, Samuel P. M., Dit-Yan Yeung, and Nevin L. Zhang. "Hidden-mode Markov decision processes for nonstationary sequential decision making." Sequence Learning. Springer, Berlin, Heidelberg, 2000. 264-287.
    • Choi, Samuel P. M., Dit-Yan Yeung, and Nevin L. Zhang. "An environment model for nonstationary reinforcement learning." Advances in Neural Information Processing Systems, 2000.
  • Hierarchical RL
    • Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112.1-2 (1999): 181-211.
    • Yücesoy, Yigit E., and M. Borahan Tümer. "Hierarchical Reinforcement Learning with Context Detection (HRL-CD)." International Journal of Machine Learning and Computing 5.5 (2015): 353.
    • Stolle, Martin, and Doina Precup. "Learning options in reinforcement learning." International Symposium on Abstraction, Reformulation, and Approximation. Springer, Berlin, Heidelberg, 2002.
    • Botvinick, Matthew M., Yael Niv, and Andrew C. Barto. "Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective." Cognition 113.3 (2009): 262-280.
    • Şimşek, Özgür, and Andrew G. Barto. "Skill characterization based on betweenness." Advances in Neural Information Processing Systems, 2009.
    • McGovern, Amy, and Andrew G. Barto. "Automatic discovery of subgoals in reinforcement learning using diverse density." ICML. Vol. 1. 2001.
  • Transfer Learning in RL
    • Taylor, Matthew E., and Peter Stone. "Transfer learning for reinforcement learning domains: A survey." Journal of Machine Learning Research 10 (2009): 1633-1685.
  • Multi-Agent RL
    • Goncu, Burak, and M. Borahan Tümer. "Reinforcement learning in non-stationary environments using spatiotemporal analysis." (Under review.)
  • RL with Continuous State Spaces
    • Whiteson, Shimon Azariah. Adaptive Representations for Reinforcement Learning. PhD dissertation, The University of Texas at Austin, 2007.
  • Other
    • Moore, Andrew W., and Christopher G. Atkeson. "Prioritized sweeping: Reinforcement learning with less data and less time." Machine Learning 13.1 (1993): 103-130.