An Overview of ADPRL

    by Derong Liu, derongliu@gmail.com

    
    

    TABLE OF CONTENTS

    1. Adaptive/Approximate Dynamic Programming
    2. Reinforcement Learning
    3. Remarks on ADPRL for Controls
    4. ADPRL Related Activities
    5. ADPRL Related Books

    
    

    1. Adaptive/Approximate Dynamic Programming

    The very first journal article Bellman published on dynamic programming was his 1952 PNAS paper. He then published frequently on the topic every year until 1957, when his first book on dynamic programming [1] appeared.

    • [1] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

    Not long after, it was recognized that DP requires too much computation because of the well-known "curse of dimensionality". That is, for a problem of even moderate size, the computational burden of exact dynamic programming exceeds what most computers can handle (even nowadays). In other words, we cannot implement true dynamic programming in practice when the problem size is large, and therefore we need to approximate the solutions of dynamic programming. In a 1958 PNAS paper, Bellman himself presented a "successive approximation" method for dynamic programming, which can be regarded as the first work on approximate dynamic programming. In a 1967 TAC survey paper, Larson summarized several methods for approximating dynamic programming solutions, including value iteration (approximation in function space) and policy iteration (approximation in policy space), in addition to successive approximation, among others (a small code sketch contrasting the two iterations follows the references below). Howard, Bellman, and Dreyfus [2, 3, 4] were the first to propose policy iteration in the early 1960s. In a 1977 PIEE paper on optimizing water resources, Jamshidi discussed the limitations of several approximation techniques for DP, including successive approximation (Larson and Keckler), incremental dynamic programming [5] (Larson), and corridoring (Heidari, Chow, Kokotovic, and Meredith).

    • [2] R. A. Howard, Dynamic Programming and Markov Processes. New York: Wiley, 1960.
    • [3] R. Bellman, Adaptive Control Processes. Princeton, NJ: Princeton University Press, 1961.
    • [4] R. Bellman and S. Dreyfus, Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962.
    • [5] R. E. Larson, State Increment Dynamic Programming. New York: Elsevier, 1968.
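
    The distinction between approximation in function space (value iteration) and in policy space (policy iteration) is easiest to see on a toy problem. The sketch below runs both iterations on a small, randomly generated finite MDP; the state and action counts, transition matrices P, and rewards R are hypothetical stand-ins, not taken from any of the papers cited above.

        import numpy as np

        # A minimal sketch on a small finite MDP with hypothetical data:
        # P[a, s, :] is the transition distribution under action a, R[a, s] the reward.
        n_states, n_actions, gamma = 3, 2, 0.9
        rng = np.random.default_rng(0)
        P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
        R = rng.standard_normal((n_actions, n_states))

        def value_iteration(tol=1e-8):
            """Successive approximation in function (value) space."""
            V = np.zeros(n_states)
            while True:
                Q = R + gamma * P @ V          # Q[a, s] = R[a, s] + gamma * E[V(s')]
                V_new = Q.max(axis=0)          # Bellman optimality backup
                if np.max(np.abs(V_new - V)) < tol:
                    return V_new, Q.argmax(axis=0)
                V = V_new

        def policy_iteration():
            """Iteration in policy space: exact evaluation + greedy improvement."""
            policy = np.zeros(n_states, dtype=int)
            while True:
                # Evaluate the current policy by solving (I - gamma * P_pi) V = R_pi.
                P_pi = P[policy, np.arange(n_states)]
                R_pi = R[policy, np.arange(n_states)]
                V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
                # Greedy improvement with respect to V.
                new_policy = (R + gamma * P @ V).argmax(axis=0)
                if np.array_equal(new_policy, policy):
                    return V, policy
                policy = new_policy

        print(value_iteration()[1], policy_iteration()[1])

    Both routines return the same optimal policy on this toy example; value iteration sweeps the value function until it converges, while policy iteration alternates exact policy evaluation with greedy improvement.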

    The acronym "ADP" stands for either "adaptive dynamic programming" or "approximate dynamic programming." The term "adaptive dynamic programming" was probably mentioned for the first time in a 1975 paper published in The Quarterly Journal of Economics studying optimal solutions for consuming depletable natural resources. Then, "adaptive dynamic programming" was formally used in the title of a PhD thesis referenced by a 1977 MS paper on inventory control. Another 1976 REE paper also mentioned it for fault detection. These are the earliest mentioning of "adaptive dynamic programming" in the literature.

    In a 1987 SMC paper, Paul Werbos presented an approximate dynamic programming approach for factory automation and brain research. His approach to solving dynamic programming problems was named "heuristic dynamic programming/dual heuristic programming" and was first outlined in a paper published in the General Systems Yearbook back in 1977. Later, Werbos called the approach "adaptive critic designs" (ACD). See Chapter 3 in [6] and Chapter 13 in [7].

    • [6] W. T. Miller, R. S. Sutton, and P. J. Werbos, Editors, Neural Networks for Control. Cambridge, MA, USA: MIT Press, 1990.
    • [7] D. A. White and D. A. Sofge, Editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York: Van Nostrand Reinhold, 1992.

    Over the years, in addition to "adaptive/approximate dynamic programming" and "adaptive critic designs", related terms used in the literature have included

    • "Neuro-dynamic programming" (Bertsekas [8]),
    • "Asymptotic dynamic programming" (Saeks),
    • "Neural dynamic programming" (Si),
    • "Relaxing dynamic programming" (Rantzer).

    • [8] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.

    No matter what it is called (ADP, ACD, NDP, or RDP), in every case the goal is to approximate the solutions of dynamic programming. Because of this, many people prefer the term "approximate dynamic programming", especially in management science. I use "adaptive dynamic programming" since I work on control applications. ADP has potential applications in many fields, including controls, management, logistics, aerospace, economics, military, games, and more. Many researchers have started to use the term "adaptive dynamic programming". Google Scholar currently shows 18,300 hits for "adaptive dynamic programming", compared to 27,900 hits for "approximate dynamic programming". The number of hits for ACD is 2,570 and for NDP is 9,522, which includes neuro-dynamic programming (7,730), neurodynamic programming (823), and neural dynamic programming (969).

    The 2002 SMC paper by Murray et al. developed an adaptive dynamic programming algorithm for optimal control of continuous-time affine nonlinear systems, with the complete proof of its main theorem given later in 2003. Since then, more and more researchers have joined the effort of applying adaptive dynamic programming to control problems.

    2. Reinforcement Learning

    On the other hand, Barto and Sutton, who won the 2024 Turing Award, have led the study of reinforcement learning, developing its conceptual and algorithmic foundations. In a series of papers beginning in the 1980s, they introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning - one of the most important approaches for creating intelligent systems. The first edition of their book on reinforcement learning was published in 1998 [9], with the second edition in 2018.

    • [9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998, 1st Edition, 322 pages. 2018, 2nd Edition, 552 pages.

    The term "reinforcement learning" appeared in the literature as early as 1930s. Barto and his students began their work on reinforcement learning in the 1980s. One of their most important contributions was the SMC paper in 1983 (see Matlab codes here). They also had several papers to show the relationship between reinforcement learning and dynamic programming, such as the 1990 paper by Sutton and the 1995 paper by Barto et al.

    The two most widely cited RL algorithms are Temporal-Difference (TD) Learning by Sutton (1988) and Q-Learning by Watkins (1992); a small code sketch of the Q-learning update is given below. Several survey papers on reinforcement learning have also been published, e.g., Kaelbling (1996), Gosavi (2009), and Littman (2015).
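
    To make the cited algorithms concrete, here is a minimal sketch of tabular Q-learning on a hypothetical 1-D chain environment; the chain, the reward of +1 at the right end, and the hyperparameters are illustrative assumptions, not taken from Watkins' or Sutton's papers. The TD error (target minus current estimate) inside the loop is the same quantity that drives TD learning; Q-learning simply bootstraps from the greedy value of the next state.

        import numpy as np

        # A minimal sketch of tabular Q-learning on a hypothetical 1-D chain:
        # states 0..N-1, actions {0: left, 1: right}, reward +1 for reaching
        # the rightmost state. All numbers here are illustrative choices.
        N, gamma, alpha, eps = 6, 0.95, 0.1, 0.1
        rng = np.random.default_rng(0)
        Q = np.zeros((N, 2))  # tabular action-value estimates Q[state, action]

        def step(s, a):
            """Deterministic chain dynamics; the episode ends at the right end."""
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, N - 1)
            reward = 1.0 if s_next == N - 1 else 0.0
            return s_next, reward, s_next == N - 1

        for episode in range(500):
            s, done = 0, False
            while not done:
                # Epsilon-greedy behavior policy (ties broken at random).
                if rng.random() < eps or Q[s, 0] == Q[s, 1]:
                    a = int(rng.integers(2))
                else:
                    a = int(Q[s].argmax())
                s_next, r, done = step(s, a)
                # Q-learning update: bootstrap from the greedy value of the next state.
                target = r + (0.0 if done else gamma * Q[s_next].max())
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next

        print(np.round(Q, 2))  # greedy policy should move right along the chain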

    3. Remarks on ADPRL for Controls

    After two years working at General Motors R&D Center in Warren, Michigan (thanks to Man-Feng Chang and Mark Costin), I found a teaching job at Stevens Institute of Technology (thanks to Professor Stanley Smith) in 1995. I started to look for a new research area to begin my academic career.

    A very promising research area called adaptive critic designs caught my attention! The two edited books [6, 7] were recommended by Paul Werbos. The 1997 TNN survey paper by Prokhorov and Wunsch was a big help: it summarized all the major literature on adaptive critic designs up to 1997. I started to work on adaptive critic designs with PhD students. It was a surprise to find Fei-Yue Wang's earlier work, published in 1992 and in 1994, which he said grew out of a homework assignment on dynamic programming in an optimal control course at RPI. He presented some very interesting ideas on how to solve dynamic programming problems approximately. That work also referenced an earlier 1967 paper by Leake and Ruey-Wen Liu on the construction of suboptimal control sequences for dynamic programming.

    Most of my recent papers have referenced the works of Frank Lewis (1, 2, 3) and Anders Rantzer (1, 2, 3). I have done a few applications in the past supported by the NSF (thanks to Paul Werbos) and General Motors (thanks to Hossein Javaherian) that include call admission control, engine control, and residential energy system control and management.

    In an early 1965 TAC paper, Waltz and K. S. Fu studied reinforcement learning for control systems. Jerry Mendel also had a paper on a similar topic in 1970. It has been a long way to achieve what we have today.

    4. ADPRL Related Activities

    In 2002, a workshop on learning and approximate dynamic programming was held in Mexico. The US National Science Foundation sponsored the workshop by paying all expenses for everyone who attended. Twenty-nine researchers from around the globe came to the workshop. Its main product was an edited book, the Handbook of Learning and Approximate Dynamic Programming [10], published in 2004. Bernard Widrow from Stanford also spoke at the workshop; Widrow was the one who coined the term "adaptive critics" back in the 1970s. More info about the 2002 workshop can be found at Grantome (Jennie Si).

    • [10] J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Editors, Handbook of Learning and Approximate Dynamic Programming. Piscataway, NJ: IEEE, 2004.

    Again in 2006, NSF sponsored another workshop in Mexico on approximate dynamic programming. This time, 42 researchers were invited to attend, including Dimitri Bertsekas. The main objectives of the 2006 workshop included outreach to Mexican students and researchers. See Grantome (Jennie Si and Warren Powell).

    For many years, special (invited) sessions on topics related to ADPRL have been organized at IJCNN/WCCI. In 2007, IEEE started a symposium on ADP and RL, which was held every two years until 2013. The first ADPRL symposium took place in Honolulu, Hawaii, in 2007, and the second in Nashville, Tennessee, in 2009. The symposium is part of the IEEE Symposium Series on Computational Intelligence and, since 2014, has become an annual event held in winter.

    • IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, Apr. 2007, Honolulu, HI, USA. Chair: Derong Liu.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Mar. 2009, Nashville, TN, USA. Chair: Derong Liu.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Apr. 2011, Paris, France. Chair: Jagannathan Sarangapani.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Apr. 2013, Singapore. Chair: Marco Wiering.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Dec. 2014, Orlando, FL, USA. Chair: Huaguang Zhang.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Dec. 2015, Cape Town, South Africa. Chair: Madalina Drugan.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Dec. 2016, Athens, Greece. Chair: Dongbin Zhao.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Dec. 2017, Honolulu, HI. Chair: Dongbin Zhao.
    • IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nov. 2018, Bengaluru, India. Chair: Jagannathan Sarangapani.
    • ......

    Special/invited sessions on ADPRL are still organized at IJCNN/WCCI every year.

    In July 2007, a special issue on Neural Networks for Feedback Control was published by the IEEE Trans. Neural Networks; a couple of its 20 papers were on ADP and RL. In August 2008, a special issue on Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control was published by IEEE Trans. Systems, Man and Cybernetics-B, with guest editors Frank Lewis, Derong Liu, and George Lendaris. In August 2011, a special issue on Approximate Dynamic Programming and Reinforcement Learning was published by the Journal of Control Theory and Applications. In June 2018, a special issue on Deep Reinforcement Learning and Adaptive Dynamic Programming was published by IEEE Trans. Neural Networks and Learning Systems, with guest editors Dongbin Zhao, Derong Liu, Frank Lewis, Jose Principe, and Stefano Squartini.

    In 2008, a technical committee on ADPRL was formed within the IEEE Computational Intelligence Society. The founding chair of the committee was Derong Liu. Click here and here for info about the TC. In 2016, a technical committee on ADPRL was formed within the Chinese Association of Automation. The founding chair of the committee was also Derong Liu. Click here for info about the TC.

    In 2009, two survey articles on ADPRL were published, one in the IEEE CIS Magazine and one in the IEEE CAS Magazine. IEEE Control Systems Magazine published another survey paper in 2012. I have also written short articles introducing ADPRL, including an IEEE CIS Newsletter feature article (2005) and an Acta Automatica Sinica review paper (2005). Several other survey papers on ADPRL have been published since then, including IJAC (2015), IEEE Trans. Cybernetics (2017), IEEE Trans. SMC-Systems (2021), CAAI Artificial Intelligence Research (2022), and Acta Automatica Sinica (2025).

    5. ADPRL Related Books

    To the best of my knowledge, the following is a list of books published on ADPRL (please feel free to email me your book info).

    • [11] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd Edition. Hoboken, NJ: Wiley, 2011. 1st Edition, 2007.
    • [12] L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, FL: CRC Press, 2010.
    • [13] X. Xu, Reinforcement Learning and Approximate Dynamic Programming. Beijing: Science Press, 2010 (in Chinese).
    • [14] D. P. Bertsekas, Dynamic Programming and Optimal Control: Approximate Dynamic Programming, Vol. II, 4th Edition. Belmont, MA: Athena Scientific, 2012.
    • [15] D. Vrabie, K. G. Vamvoudakis, and F. L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. London: IET, 2013.
    • [16] H. Zhang, D. Liu, Y. Luo, and D. Wang, Adaptive Dynamic Programming for Control: Algorithms and Stability. London: Springer-Verlag, 2013.
    • [17] F. L. Lewis and D. Liu, Editors, Approximate Dynamic Programming and Reinforcement Learning for Feedback Control. Hoboken, NJ: Wiley, 2013.
    • [18] Q. Wei, R. Song, and Q. Sun, Theory and Applications of Iterative Adaptive Dynamic Programming. Beijing: Science Press, 2015 (in Chinese).
    • [19] D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Programming with Applications in Optimal Control. Cham, Switzerland: Springer, 2017.
    • [20] Y. Jiang and Z.-P. Jiang, Robust Adaptive Dynamic Programming. Hoboken, NJ: Wiley, 2017.
    • [21] Q. Wei, R. Song, B. Li, and X. Lin, Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach. Singapore: Springer, 2018.
    • [22] D. P. Bertsekas, Reinforcement Learning and Optimal Control. Belmont, MA: Athena Scientific, 2019.
    • [23] R. Song, Q. Wei, and Q. Li, Adaptive Dynamic Programming: Single and Multiple Controllers. Singapore: Springer Nature, 2019.
    • [24] D. Wang, Intelligent Critic Learning and Control of Uncertain Dynamic Systems. Beijing: Science Press, 2020 (in Chinese).
    • [25] Q. Wei and F.-Y. Wang, Reinforcement Learning. Beijing: Tsinghua University Press, 2021 (in Chinese).
    • [26] Y. Yang, Data-Driven Optimal Control Methods Based on Reinforcement Learning. Beijing: Science Press, 2022 (in Chinese).
    • [27] S. Jagannathan, V. Narayanan, and A. Sahoo, Optimal Event-Triggered Control Using Adaptive Dynamic Programming. Boca Raton, FL: CRC Press, 2023.
    • [28] B. Lian, W. Xue, F. L. Lewis, H. Modares, and B. Kiumarsi, Integral and Inverse Reinforcement Learning for Optimal Control Systems and Games. Cham, Switzerland: Springer, 2024.
    • [29] D. Wang, M. Zhao, M. Ha, and J. Ren, Intelligent Control and Reinforcement Learning: Advanced Value Iteration Critic Designs. Beijing: Posts and Telecom Press, 2024 (in Chinese).
    • [30] D. Zhao, Y. Zhu, Z. Tang, and K. Shao, Game Artificial Intelligence Methods. Beijing: Science Press, 2024 (in Chinese).
    • [31] Q. Wei, R. Song, and H. Li, Iterative Adaptive Dynamic Programming for Self-Learning Optimal Control. Singapore: World Scientific, 2025.

    We are now working on a brochure for ADPRL. Take a first glance at the brochure's cover page.
