Richard Sutton, the father of RL, answered this long ago: R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control," in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, April 1992.
doi: 10.1109/37.126844