
Reinforcement Learning Progress

Sam Altman · 25 June 2018

Today, OpenAI released a new result. We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros. To me, this is the game that feels closest to the real world and to complex decision making (combining strategy, tactics, coordination, and real-time action) of any game AI has made real progress against so far. The agents we train consistently outperform two-week-old agents with a win rate of 90-95%. We did this without training on human-played games. We did design the reward functions, of course, but the algorithm figured out how to play by training against itself. This is a big deal because it shows that deep reinforcement learning can solve extremely hard problems whenever you can throw e

Read the full article at Sam Altman →