Very good explainer from the Economist:
AlphaGo uses some of the same technologies as those older programs. But its big idea is to combine them with new approaches that try to get the computer to develop its own intuition about how to play—to discover for itself the rules that human players understand but cannot explain. It does that using a technique called deep learning, which lets computers work out, by repeatedly applying complicated statistics, how to extract general rules from masses of noisy data.
Deep learning requires two things: plenty of processing grunt and plenty of data to learn from. DeepMind trained its machine on a sample of 30m Go positions culled from online servers where amateurs and professionals gather to play. And by having AlphaGo play against another, slightly tweaked version of itself, more training data can be generated quickly.
Those data are fed into two deep-learning algorithms. One, called the policy network, is trained to imitate human play. After watching millions of games, it has learned to extract features, principles and rules of thumb. Its job during a game is to look at the board’s state and generate a handful of promising-looking moves for the second algorithm to consider.
This algorithm, called the value network, evaluates how strong a move is. The machine plays out the suggestions of the policy network, making moves and countermoves for the thousands of possible daughter games those suggestions could give rise to. Because Go is so complex, playing all conceivable games through to the end is impossible. Instead, the value network looks at the likely state of the board several moves ahead and compares those states with examples it has seen before. The idea is to find the board state that looks, statistically speaking, most like the sorts of board states that have led to wins in the past. Together, the policy and value networks embody the Go-playing wisdom that human players accumulate over years of practice.
As I write this, the score in the best-of-five games between AlphaGo and Lee Sedol, who is generally reckoned to be the world’s best player, stands at 2-nil in favour of AlphaGo.
LATER AlphaGo won the third match. Game over.