Which is cleaner for the environment: training an AI model or five cars?

The field of artificial intelligence is often compared to the oil industry: once extracted and refined, data, like oil, can become a highly profitable commodity. It is now becoming clear that the metaphor extends further. Like fossil fuels, deep learning has a substantial environmental impact. In a new study, researchers from the University of Massachusetts Amherst carried out a life-cycle assessment of training several common large AI models.

It turned out that this process can emit more than 626,000 pounds (about 300,000 kg) of carbon dioxide equivalent, which is almost five times the lifetime emissions of a typical car (including the manufacture of the car itself).

How AI models are trained

This is a striking quantification of something artificial intelligence researchers have long suspected.

“Although many of us have thought about this at an abstract, vague level, the figures show the real scale of the problem,” says Carlos Gómez-Rodríguez, a computer scientist at the University of A Coruña in Spain, who did not participate in the study. “Neither I nor the other researchers I have discussed them with thought the environmental impact would be this significant.”

The carbon footprint of natural language processing

The paper specifically considers the process of training a model for natural language processing (NLP), the AI subfield that teaches machines to handle human language. Over the past two years, the NLP community has reached several important milestones in machine translation, sentence completion, and other standard benchmark tasks. OpenAI's notorious GPT-2 model, for example, succeeded in writing convincing fake news stories.

But such achievements have required training ever larger models on sprawling datasets of sentences scraped from the Internet. This approach is computationally expensive and highly energy-intensive.

The researchers examined four models in the field responsible for the biggest jumps in performance: the Transformer, ELMo, BERT, and GPT-2. They trained each of them on a single GPU for up to a day to measure its power consumption.

They then took the total number of training hours reported in each model's original paper to calculate the total energy consumed over the full training process. That amount was converted into pounds of carbon dioxide equivalent, based on the average energy mix of AWS from Amazon, the largest cloud service provider.
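The conversion described above is simple arithmetic: average power draw times training hours gives energy, which a data-center overhead factor and a grid emission factor turn into CO2 equivalent. The sketch below illustrates the idea; the two constants (a power usage effectiveness of 1.58 and a grid intensity of 0.954 lbs CO2 per kWh) are illustrative assumptions close to figures the study cites, not authoritative values, and the example workload is invented.

```python
# Back-of-the-envelope estimate of training emissions.
# The PUE and grid-intensity constants below are illustrative
# assumptions, not figures to rely on for real accounting.

US_GRID_LBS_CO2_PER_KWH = 0.954  # assumed average U.S. grid intensity
PUE = 1.58                       # assumed data-center overhead factor

def training_co2_lbs(avg_power_watts: float, hours: float,
                     num_gpus: int = 1) -> float:
    """Estimate CO2-equivalent emissions (in lbs) for a training run."""
    energy_kwh = avg_power_watts * num_gpus * hours / 1000.0
    total_kwh = energy_kwh * PUE  # add cooling/infrastructure overhead
    return total_kwh * US_GRID_LBS_CO2_PER_KWH

# Hypothetical example: 250 W average draw on 8 GPUs for 96 hours
print(round(training_co2_lbs(250, 96, num_gpus=8), 1))  # ~289.4 lbs
```

The same three-step structure (energy, overhead, grid mix) underlies the study's much larger estimates; only the measured inputs differ.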

It turned out that the computational and environmental costs of training grew in proportion to model size, and then increased many times over when extra steps were taken to tune the model's final accuracy. In particular, neural architecture search, which attempts to optimize a model by incrementally changing the network's structure through trial and error, incurs extremely high costs for a small performance gain. Without it, the most expensive model, BERT, left a carbon footprint of about 1,400 pounds (635 kg), close to a round-trip trans-American flight.

Moreover, these figures should be considered only baselines.

"Learning one model is the minimum amountthe work you can do, ”says Emma Strubell, lead author of the article. In practice, it is much more likely that AI researchers will develop a new model from scratch or adapt an existing one, which will require many more training and tuning cycles.

By the scientists' estimate, the process of building and testing a final model worthy of publication required training 4,789 models over six months. In CO2 equivalent, that is about 35,000 kg.

The significance of these numbers is enormous, especially given current trends in AI research. On the whole, AI research neglects efficiency: large neural networks have proven useful across a range of tasks, and companies with effectively unlimited computing resources will use them to gain a competitive advantage.

But for the climate, this will not be good news.