Continuous state Q-learning





Texas Tech University


Q-learning is a solution technique developed for classical Markov Decision Processes (MDPs). MDPs model sequential decision-making problems and cover many classical control problems. Chapter I discusses the MDP model, some standard solution techniques for MDPs, and their limitations [6].
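Value iteration is one standard MDP solution technique. The abstract does not say which techniques Chapter I covers, so the following is only an illustrative sketch, not the thesis's own method. It shows the key limitation that motivates Q-learning: the solver needs the full transition model `P`. The two-state MDP and the names `value_iteration` and `P` are hypothetical.

```python
def value_iteration(P, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP whose model is fully known (illustrative sketch).

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns approximate optimal state values V and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a, V):
        # Expected one-step return: sum over s' of p * (r + gamma * V[s'])
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    for _ in range(max_iter):
        # Bellman optimality backup applied to every state
        V_new = [max(q_value(s, a, V) for a in range(len(P[s])))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the converged values
    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical two-state, two-action MDP used only for illustration.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
# State 1: action 0 stays (reward 1), action 1 moves to state 0 (reward 0).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],
]
V, policy = value_iteration(P)
print([round(v, 3) for v in V], policy)  # → [9.0, 10.0] [1, 0]
```

Every backup reads `P` directly. If the transition probabilities are unknown or poorly modeled, this approach cannot be applied as written.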

Q-learning was developed by Watkins to broaden the range of problems that dynamic programming (MDP) techniques can solve. Classical Q-learning is a model-free solution technique. It can therefore address a variety of poorly modeled decision problems that standard MDP techniques could not solve. Watkins's development of Q-learning is based on MDPs with discrete state and action spaces. Chapter II describes the model and algorithm associated with classical Q-learning. To extend the set of problems that Q-learning can address, Chapter III presents solution techniques for poorly modeled problems with continuous state and/or action spaces. In these techniques, the model is slightly altered and the algorithm is adjusted to account for the continuous spaces. Numerical examples show that continuous Q-learning does determine the optimal policy over time. Ongoing research aims both to improve the current classical Q-learning method and to prove convergence in the continuous case.
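For the discrete case, the following is a minimal sketch of classical tabular Q-learning using Watkins's update Q(s, a) ← Q(s, a) + α[r + γ max_b Q(s', b) − Q(s, a)]. It illustrates the model-free property: the learner only samples transitions from an environment function and never reads transition probabilities. It does not reproduce the thesis's continuous-state algorithm, which the abstract does not specify. The toy environment `toy_step` is hypothetical, and so are the parameter values (α = 0.1, ε = 0.2, γ = 0.9, 20,000 steps).

```python
import random


def toy_step(state, action):
    """Hypothetical deterministic two-state environment; returns (next_state, reward).

    State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
    State 1: action 0 stays (reward 1), action 1 moves to state 0 (reward 0).
    """
    if state == 0:
        return (0, 0.0) if action == 0 else (1, 0.0)
    return (1, 1.0) if action == 0 else (0, 0.0)


def q_learning(env_step, n_states, n_actions, gamma=0.9, alpha=0.1,
               epsilon=0.2, n_steps=20_000, seed=0):
    """Tabular Q-learning along one continuing trajectory (illustrative sketch)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration: random action with probability epsilon
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s2, r = env_step(s, a)  # sample a transition; no model is consulted
        # Watkins's update toward the target r + gamma * max_b Q(s2, b)
        target = r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
    greedy = [max(range(n_actions), key=lambda b: Q[x][b]) for x in range(n_states)]
    return Q, greedy


Q, greedy = q_learning(toy_step, n_states=2, n_actions=2)
print(greedy)  # → [1, 0]
```

With a constant learning rate and deterministic transitions, the table settles near Q(0,1) ≈ 9 and Q(1,0) ≈ 10. The greedy policy is therefore "move" in state 0 and "stay" in state 1, found without ever seeing a transition model.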



Learning models, Markov processes, Dynamic programming