Richard Sutton, the father of RL, answered this long ago: R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control," in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, April 1992.
doi: 10.1109/37.126844