"The Bitter Lesson": a scientist says 70 years of AI research were spent almost in vain

The biggest lesson to learn from 70 years of AI research is that general methods that leverage computation ultimately prove the most effective, and by a large margin. The ultimate reason is Moore's law, or rather its generalization: the continued, exponential fall in the cost of computation. This "bitter lesson" was described by Richard Sutton, a Canadian computer scientist. What follows is in his own words.

Why has artificial intelligence research been at an impasse for 70 years?

Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance). But over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the short term, researchers try to leverage their human knowledge of the domain, but the only thing that matters in the long run is leveraging computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investing in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods that leverage computation.

Conclusion: attempts to solve the AI problem "with the head" should be abandoned right away, because in time it will be solved much faster and more easily thanks to growing computing power.

There are many examples of AI researchers learning this bitter lesson belatedly. It is instructive to review some of the most prominent.

In computer chess, the methods that defeated world champion Kasparov in 1997 were based on massive, deep search. At the time, this was viewed with dismay by most computer chess researchers, who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, the researchers who had built on human understanding of chess did not accept defeat gracefully. They said: "This time brute force may have won, but it is not a general strategy, and in any case it is not how people play chess." These researchers wanted methods based on human input to win, and were very disappointed when they did not.

Conclusion: sooner or later, plain brute-force computation prevails
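The search-based approach described above can be illustrated on a toy game. The sketch below is not the 1997 chess program's algorithm; it is a plain negamax search with alpha-beta pruning over a hypothetical game (remove 1 to 3 stones, whoever takes the last stone wins), chosen only because its rules fit in one line. The names `negamax`, `best_move` and `MOVES` are illustrative assumptions. The point is that the program contains no knowledge of good play: the only lever is `depth`, that is, how much computation it is allowed to spend.

```python
from typing import Optional

# Hypothetical toy game: remove 1-3 stones; taking the last stone wins.
MOVES = (1, 2, 3)


def negamax(stones: int, depth: int, alpha: int = -1, beta: int = 1) -> int:
    """Score the position for the player to move: +1 win, -1 loss, 0 unknown."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the mover has lost
    if depth == 0:
        return 0  # search budget exhausted; no built-in heuristic, just "unknown"
    best = -1
    for take in MOVES:
        if take > stones:
            break
        # The opponent's score after our move, negated to get our own score.
        score = -negamax(stones - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # alpha-beta cutoff: the remaining moves cannot change the result
    return best


def best_move(stones: int, depth: int) -> Optional[int]:
    """Return the number of stones to take, or None if there is no legal move."""
    best_take, best_score = None, -2
    for take in MOVES:
        if take > stones:
            break
        score = -negamax(stones - take, depth - 1)
        if score > best_score:
            best_take, best_score = take, score
    return best_take
```

With a shallow budget the search cannot tell who is winning (`negamax(8, 2)` returns 0), while a budget of 8 plies resolves the same position exactly. Nothing about the game was added in between, only more search.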

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self-play to learn a value function (as in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self-play, and learning in general, is like search in that it allows massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed toward utilizing human understanding (so that less search was needed), and only much later was much greater success achieved by embracing search and learning.

Conclusion: search and learning, powered by computation, far outperform attempts to solve the problem through "unconventional thinking"

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge: knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the methods based on human knowledge. This led to a major change in all of natural language processing, gradually over decades, until statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the latest step in this consistent direction. Deep learning methods rely even less on human knowledge and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems.

Richard Sutton, Canadian computer scientist

As in the games, researchers always tried to build systems that worked the way they thought their own minds worked. They tried to put that knowledge into their systems, but it proved ultimately counterproductive and a colossal waste of researchers' time once, through Moore's law, massive computation became available and a means was found to put it to good use.

Conclusion: the same mistake has been repeated for decades

A similar pattern played out in computer vision. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT (scale-invariant feature transform) features. But today all of this has been discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and they perform much better.
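To make the notion of convolution concrete, here is a minimal pure-Python sketch, not taken from any particular library. The function name `conv2d_valid` and the hand-picked `EDGE_KERNEL` are assumptions for illustration; in a trained network the kernel weights would be learned from data rather than written by hand. The example also shows the kind of regularity convolution provides: shifting the input by one pixel shifts the response by one pixel.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1), summing elementwise products.

    As in most deep-learning libraries, this is technically cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-picked horizontal-difference kernel (illustrative only; normally learned).
EDGE_KERNEL: Matrix = [[-1.0, 1.0]]

# A dark-to-bright vertical edge, and the same edge shifted one pixel to the right.
image: Matrix = [[0, 0, 1, 1, 1],
                 [0, 0, 1, 1, 1]]
shifted: Matrix = [[0, 0, 0, 1, 1],
                   [0, 0, 0, 1, 1]]

response = conv2d_valid(image, EDGE_KERNEL)
shifted_response = conv2d_valid(shifted, EDGE_KERNEL)
```

The same small kernel is applied at every position, so the edge is detected wherever it appears, without any position-specific rule being written in.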

This is a great lesson.

Wherever we look, we continue to make the same kind of mistakes. To see this, and to resist it effectively, we have to understand why these mistakes are so appealing. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that: 1) AI researchers have often tried to build knowledge into their agents; 2) this always helps in the short term and is personally satisfying to the researcher; 3) in the long run it plateaus and even inhibits further progress; 4) breakthrough progress eventually arrives through the opposite approach, based on scaling computation through search and learning. The eventual success is tinged with bitterness and is often incompletely digested, because it is the success of computation over a favored, human-centric approach.

One thing that should be learned from the bitter lesson is the great power of general-purpose methods, methods that continue to scale with increased computation even as the available computation becomes very large. The two methods that seem to scale arbitrarily in this way are search and learning.

The second thing to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All of these are part of the arbitrary, intrinsically complex outside world. They are not what should be built in, because their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be done by our methods, not by us. We want AI agents that can discover as we can, not agents that contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

Conclusion: trust computation rather than trying to trace human thought or to explain complex discovery and search methods with simple schemes; in the long run the former will work, and the latter will not.
