DeepMind's AI fails a school math test

Reports of rapid progress in artificial intelligence appear in the media more and more often. In some areas AI already shows outstanding results and, to some extent, even superiority over humans. Examples are easy to find. This site has written more than once about AI defeating human players at the board game Go and at chess, and more recently about its impressive progress in the computer strategy game StarCraft. There are many more such examples, and not all of them come from games.

To a layperson (someone with no connection to the IT industry) it may seem that real, “big” artificial intelligence, the kind that science fiction writers describe and filmmakers put on screen, is about to appear. The reality is less rosy. The online repository of scientific papers arXiv hosts an article entitled “Analyzing Mathematical Reasoning Abilities of Neural Models”, which describes how DeepMind’s artificial intelligence failed to cope with a standard mathematics test that British high school students usually take.

The reasons for the failure are easy to explain. When solving a mathematical problem, a person draws on the following abilities:

  • Parses the characters into entities such as numbers, arithmetic operators, variables (which together form functions) and words (which define the question and the meaning of the problem);
  • Plans (for example, arranges the functions in the order required to solve the problem);
  • Uses auxiliary algorithms to compose functions (addition, multiplication);
  • Uses short-term memory to store intermediate values (for example, h(f(x)));
  • Applies previously acquired knowledge of rules, transformations, processes and axioms.
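The planning, composition and short-term memory steps in the list above can be illustrated with a small sketch. The functions f and h and the helper compose_with_trace are hypothetical, chosen only for illustration; they do not come from the DeepMind paper.

```python
def f(x):
    # Hypothetical inner function: f(x) = 2x + 3
    return 2 * x + 3


def h(x):
    # Hypothetical outer function: h(x) = x^2
    return x * x


def compose_with_trace(functions, x):
    """Apply the functions in the given order and keep every intermediate value.

    The order of `functions` plays the role of the "plan"; the `trace` list
    plays the role of short-term memory.
    """
    trace = [x]
    for fn in functions:
        x = fn(x)
        trace.append(x)
    return x, trace


# Evaluate h(f(4)): first f, then h.
result, trace = compose_with_trace([f, h], 4)
print(result)  # 121
print(trace)   # [4, 11, 121]
```

A person does this bookkeeping without noticing it; a sequence model that reads the problem as plain text has to learn each of these steps from examples.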

DeepMind trained and tested its AI on a collection of mathematical problems of various types. Rather than crowdsourcing the problems, the developers synthesized the data set, which let them generate a large number of test problems, control their difficulty, and so on. The team used a “free form” text format for the data.
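As a rough illustration of what synthesizing such a data set might look like, here is a minimal sketch that generates free-form text questions with a known answer. It is not DeepMind's generator: the function make_linear_problem, the wording of the question and the difficulty parameter are all assumptions made for this example.

```python
import random


def make_linear_problem(rng, difficulty):
    """Return a (question, answer) pair of plain-text strings.

    The problem has the form a*x + b = c with an integer solution.
    `difficulty` is a hypothetical knob: it only widens the range of
    the numbers used, so larger values give bigger coefficients.
    """
    limit = 10 * difficulty
    a = rng.randint(1, limit)        # non-zero coefficient, so x is unique
    x = rng.randint(-limit, limit)   # the answer is chosen first
    b = rng.randint(-limit, limit)
    c = a * x + b                    # the right-hand side follows from it
    sign = "+" if b >= 0 else "-"
    question = f"Solve {a}*x {sign} {abs(b)} = {c} for x."
    return question, str(x)


# A fixed seed makes the generated set reproducible.
rng = random.Random(0)
for level in (1, 2, 3):
    question, answer = make_linear_problem(rng, level)
    print(level, question, "->", answer)
```

Choosing the answer first and deriving the question from it guarantees that every generated problem is solvable, and it makes the number of problems and their difficulty easy to control.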

The data were modeled on problem sets for British schoolchildren up to the age of 16. The problems came from areas such as arithmetic, algebra, probability theory, and others.

When choosing a neural network architecture for solving math problems, the DeepMind team focused on LSTM (long short-term memory) and the Transformer (a neural network architecture for working with sequences). The researchers tested two LSTM models: a simple LSTM and an Attentional LSTM. The architecture of the latter is shown in the figure below.

Figure: Attentional LSTM architecture

Figure: Transformer architecture

The paper notes that the results were not very good. The AI gave correct answers to only 35 percent of the 40 questions it was given, that is, 14 of them. By the standards of any school, that is unsatisfactory.
