Not long after, it was recognized that DP involves too much computation due to the well-known "curse of dimensionality". That is, for a problem of even moderate size, the computational complexity of the original dynamic programming approach cannot be handled by most computers (even nowadays). In other words, we really cannot implement true dynamic programming in practice when the problem size is large, and therefore we need to approximate the solutions of dynamic programming. In a 1958 PNAS paper, Bellman himself presented a "successive approximation" method for dynamic programming, which was the first work on approximate dynamic programming. In a 1967 TAC survey paper, Larson summarized several methods for approximating dynamic programming solutions, including value iteration (approximation in function space) and policy iteration (approximation in policy space), in addition to successive approximation, among others. Howard, Bellman, and Dreyfus [2, 3, 4] were the first to propose policy iteration in the early 1960s. In a 1977 PIEE paper on optimizing water resources, Jamshidi discussed the limitations of several approximation techniques for DP, including successive approximation (Larson and Keckler), incremental dynamic programming [5] (Larson), and corridoring (Heidari, Chow, Kokotovic, and Meredith).
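For readers who have not seen them, here is a minimal sketch of the two classical schemes mentioned above, value iteration (iterating on the value function) and policy iteration (iterating on the policy), on a tiny finite MDP; the states, transition probabilities, rewards, and discount factor below are made up purely for illustration.

```python
import numpy as np

# A tiny hypothetical MDP: 3 states, 2 actions (all numbers invented for illustration).
# P[a][s, s'] = transition probability, R[a][s] = expected one-step reward.
P = [np.array([[0.7, 0.3, 0.0],
               [0.0, 0.4, 0.6],
               [0.2, 0.0, 0.8]]),
     np.array([[0.1, 0.9, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.3, 0.7]])]
R = [np.array([1.0, 0.0, 2.0]),
     np.array([0.5, 1.5, 0.0])]
gamma = 0.9
n_states, n_actions = 3, 2

def value_iteration(tol=1e-8):
    """Approximation in function space: iterate on the value function."""
    V = np.zeros(n_states)
    while True:
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

def policy_iteration():
    """Approximation in policy space: evaluate a policy exactly, then improve it."""
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi for the current policy
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the current value function
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

print(value_iteration())
print(policy_iteration())
```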
The acronym "ADP" stands for either "adaptive dynamic programming" or "approximate dynamic programming." The term "adaptive dynamic programming" was probably mentioned for the first time in a 1975 paper published in The Quarterly Journal of Economics studying optimal solutions for consuming depletable natural resources. Then, "adaptive dynamic programming" was formally used in the title of a PhD thesis referenced by a 1977 MS paper on inventory control. Another 1976 REE paper also mentioned it for fault detection. These are the earliest mentioning of "adaptive dynamic programming" in the literature.
In a 1987 SMC paper, Paul Werbos presented an approximate dynamic programming approach for factory automation and brain research. His approach to solving dynamic programming problems was named "heuristic dynamic programming/dual heuristic programming" and was first outlined in a paper published in the General Systems Yearbook back in 1977. Later, Werbos called the approach "adaptive critic designs" (ACD). See Chapter 3 in [6] and Chapter 13 in [7].
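To give a rough flavor of the adaptive critic idea (this is only a toy sketch, not Werbos's original formulation), the following trains a quadratic critic and a linear actor on a hypothetical scalar linear plant with a quadratic utility; the plant parameters, learning rates, and sampling scheme are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical scalar plant x_{t+1} = a*x + b*u with utility U = x^2 + u^2.
a, b, gamma = 0.9, 0.5, 0.95
w = 0.0      # critic weight: J(x) ~ w * x^2
k = 0.0      # actor gain:    u(x) = k * x
alpha_c, alpha_a = 0.05, 0.01

rng = np.random.default_rng(0)
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)           # sample a training state
    u = k * x                             # actor output
    x_next = a * x + b * u                # model prediction of the next state
    U = x**2 + u**2                       # one-step utility (cost)

    # Critic update: push J(x) toward the Bellman target U + gamma * J(x_next)
    target = U + gamma * w * x_next**2
    td_error = w * x**2 - target
    w -= alpha_c * td_error * x**2

    # Actor update: descend the gradient of U + gamma * J(x_next) with respect to u
    dQ_du = 2 * u + 2 * gamma * w * b * x_next
    k -= alpha_a * dQ_du * x              # chain rule: du/dk = x

print(f"critic weight w = {w:.3f}, actor gain k = {k:.3f}")
```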
Over the years, in addition to "adaptive/approximate dynamic programming" and "adaptive critic designs", related terms used in the literature have included
No matter what it is called (ADP, ACD, NDP, or RDP), in all cases we are trying to approximate the solutions of dynamic programming. Because of this, many people like to use the term "approximate dynamic programming", especially in management science. I use "adaptive dynamic programming" since I work on control applications. ADP has potential applications in many fields, including controls, management, logistics, aerospace, economics, military, games, and more. Many researchers have started to use the term "adaptive dynamic programming". Google Scholar currently shows 18,300 hits for "adaptive dynamic programming", compared with 27,900 hits for "approximate dynamic programming". The number of hits for ACD is 2,570, and for NDP it is 9,522, which includes "neuro-dynamic programming" (7,730), "neurodynamic programming" (823), and "neural dynamic programming" (969).
The 2002 SMC paper by Murray et al. developed an adaptive dynamic programming algorithm for optimal control of continuous-time affine nonlinear systems, with the complete proof of its main theorem given later in 2003. Since then, more and more researchers have joined the effort of using adaptive dynamic programming for control applications.
The term "reinforcement learning" appeared in the literature as early as 1930s. Barto and his students began their work on reinforcement learning in the 1980s. One of their most important contributions was the SMC paper in 1983 (see Matlab codes here). They also had several papers to show the relationship between reinforcement learning and dynamic programming, such as the 1990 paper by Sutton and the 1995 paper by Barto et al.
The two most widely cited RL algorithms are temporal-difference (TD) learning by Sutton (1988) and Q-learning by Watkins (1992). Several survey papers on reinforcement learning have been published over the years, e.g., Kaelbling (1996), Gosavi (2009), and Littman (2015).
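As a concrete reference point, here is a minimal tabular sketch of both update rules on a small random-walk task; the environment, step size, exploration rate, and episode counts are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, gamma, alpha = 5, 0.9, 0.1
# A hypothetical 5-state random walk: start in the middle, move left/right at random,
# reward +1 only when stepping off the right end; both ends are terminal.

def episode():
    s, traj = n_states // 2, []
    while True:
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == n_states else 0.0
        done = s_next < 0 or s_next >= n_states
        traj.append((s, s_next, r, done))
        if done:
            return traj
        s = s_next

# TD(0): learn the value function of the random policy (Sutton, 1988)
V = np.zeros(n_states)
for _ in range(2000):
    for s, s_next, r, done in episode():
        target = r if done else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
print("TD(0) state values:", np.round(V, 2))

# Q-learning: learn optimal action values off-policy (Watkins)
Q = np.zeros((n_states, 2))              # actions: 0 = left, 1 = right
for _ in range(2000):
    s = n_states // 2
    while True:
        a = rng.integers(2) if rng.random() < 0.2 else int(Q[s].argmax())
        s_next = s + (1 if a == 1 else -1)
        r = 1.0 if s_next == n_states else 0.0
        done = s_next < 0 or s_next >= n_states
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        if done:
            break
        s = s_next
print("Q-learning greedy actions:", Q.argmax(axis=1))
```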
A very promising research area called adaptive critic designs caught my attention! The two edited books [6, 7] were recommended by Paul Werbos. The 1997 TNN survey paper by Prokhorov and Wunsch was a big help! They summarized all the major literature regarding adaptive critic designs up to 1997. I started to work on adaptive critic designs with PhD students. It was a surprise to find Fei-Yue Wang's earlier works published in 1992 and 1994, which he said were written after a homework assignment in an optimal control course at RPI covering dynamic programming. He presented some very interesting ideas on how to solve dynamic programming approximately. That paper also referenced an earlier paper by Leake and Ruey-Wen Liu, published in 1967, on the construction of suboptimal control sequences for dynamic programming.
Most of my recent papers have referenced the works of Frank Lewis (1, 2, 3) and Anders Rantzer (1, 2, 3). I have done a few applications in the past supported by the NSF (thanks to Paul Werbos) and General Motors (thanks to Hossein Javaherian) that include call admission control, engine control, and residential energy system control and management.
In an early 1965 TAC paper, Waltz and K. S. Fu studied reinforcement learning for control systems. Jerry Mendel also had a paper on a similar topic in 1970. It has been a long journey to reach what we have today.
Again in 2006, NSF sponsored another workshop in Mexico on approximate dynamic programming. This time, 42 researchers were invited to attend the workshop, including Dimitri Bertsekas. The main objective of the 2006 workshop included outreach to Mexican students and researchers. See Grantome (Jennie Si and Warren Powell).
Over the past many years, special sessions (invited sessions) on topics related to ADPRL have been organized at IJCNN/WCCI each year. In 2007, IEEE started a symposium on ADP and RL, which was organized every two years until 2013. The first ADPRL symposium was held in Honolulu, Hawaii, in 2007, and the second in Nashville, Tennessee, in 2009. The symposium is part of the IEEE Symposium Series on Computational Intelligence. Since 2014, it has been an annual event held in winter.
Even now, special/invited sessions on ADPRL are organized at IJCNN/WCCI every year.
In July 2007, a special issue on Neural Networks for Feedback Control was published by the IEEE Trans. Neural Networks. There were a couple of papers on ADP and RL among the 20 papers published. In August 2008, a special issue on Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control was published by the IEEE Trans. Systems, Man, and Cybernetics-Part B. The guest editors were Frank Lewis, Derong Liu, and George Lendaris. In August 2011, a special issue on Approximate Dynamic Programming and Reinforcement Learning was published by the Journal of Control Theory and Applications. In June 2018, a special issue on Deep Reinforcement Learning and Adaptive Dynamic Programming was published by the IEEE Trans. Neural Networks and Learning Systems. The guest editors were Dongbin Zhao, Derong Liu, Frank Lewis, Jose Principe, and Stefano Squartini.
In 2008, a technical committee on ADPRL was formed within the IEEE Computational Intelligence Society. The founding chair of the committee was Derong Liu. Click here and here for info about the TC. In 2016, a technical committee on ADPRL was formed within the Chinese Association of Automation. The founding chair of the committee was also Derong Liu. Click here for info about the TC.
In 2009, two survey articles on ADPRL were published, one in the IEEE CIS Magazine and one in the IEEE CAS Magazine. IEEE Control Systems Magazine published another survey paper in 2012. I have also written short articles introducing ADPRL, including an IEEE CIS Newsletter feature article (2005) and an Acta Automatica Sinica review paper (2005). Several other survey papers on ADPRL have been published since then, including IJAC (2015), IEEE Trans. Cybernetics (2017), IEEE Trans. SMC-Systems (2021), CAAI Artificial Intelligence Research (2022), and Acta Automatica Sinica (2025).
To the best of my knowledge, the following is a list of books published on ADPRL (please feel free to email me your book info).
We are now working on a brochure for ADPRL. Take a first glance at the brochure's cover page.