Adversarial attacks: why are neural networks so easy to fool?

In recent years, as deep learning systems have become more common, scientists have demonstrated how adversarial examples can affect anything from a simple image classifier to cancer diagnostic systems, and can even create life-threatening situations. For all their danger, however, adversarial examples are poorly understood. And scientists are worried: can this problem be solved?

What is an adversarial attack? It is a way of deceiving a neural network so that it produces an incorrect result. Such attacks are mainly used in scientific research to test how robust models are to non-standard data. A real-life example would be changing a few pixels in an image of a panda so that the neural network becomes convinced the image shows a gibbon, even though all the scientists did was add "noise" to the image.

Adversarial attacks: how do you fool a neural network?

New work from the Massachusetts Institute of Technology (MIT) points to a possible way of overcoming this problem. Solving it would let us build far more reliable deep learning models that are much harder to manipulate maliciously. But first, let's look at the basics of adversarial examples.

As you know, the power of deep learning stems from its superior ability to recognize patterns in data. Feed a neural network tens of thousands of labeled photos of animals, and it will learn which patterns are associated with a panda and which with a monkey. It can then use these patterns to recognize new images of animals it has never seen before.

But deep learning models are also very fragile. Because an image recognition system relies only on pixel patterns, and not on a more conceptual understanding of what it sees, it is easy to deceive it into seeing something completely different, simply by disrupting those patterns in a particular way. A classic example: add some noise to a panda image, and the system classifies it as a gibbon with almost 100 percent certainty. This noise is an adversarial attack.
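To make the "add some noise" step concrete, here is a minimal sketch of one well-known way to build such noise, a fast-gradient-sign step. It assumes a toy logistic (linear) classifier rather than a real deep network, and the weights, the image, the bias and the budget `eps` are all invented for illustration.

```python
import numpy as np

def predict_proba(w, b, x):
    """Probability of class 1 ("gibbon") under a logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm_perturb(w, b, x, y_true, eps):
    """One fast-gradient-sign step: move each pixel by at most eps to raise the loss."""
    p = predict_proba(w, b, x)
    grad_x = (p - y_true) * w  # gradient of the cross-entropy loss w.r.t. the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
n_pixels = 10_000
w = rng.normal(0.0, 0.05, n_pixels)  # hypothetical "trained" weights
x = rng.uniform(0.2, 0.8, n_pixels)  # hypothetical flattened "panda" image in [0, 1]
b = -(w @ x) - 3.0                   # bias chosen so the clean image scores as class 0

eps = 0.02                           # assumed per-pixel budget (2% of the pixel range)
x_adv = fgsm_perturb(w, b, x, y_true=0.0, eps=eps)

p_clean = predict_proba(w, b, x)      # low: the model says "panda"
p_adv = predict_proba(w, b, x_adv)    # high: the model now says "gibbon"
max_change = np.max(np.abs(x_adv - x))
print(p_clean, p_adv, max_change)
```

No single pixel moves by more than `eps`, yet thousands of tiny changes that all push in the same direction add up to a large change in the model's score. That is the intuition behind the panda example, at least in this simplified linear setting.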

Scientists have observed this phenomenon for several years, especially in computer vision systems, without really knowing how to get rid of such vulnerabilities. In fact, a paper presented last week at ICLR, a major artificial intelligence research conference, asks whether adversarial attacks are inevitable. It may seem that no matter how many panda images you feed to an image classifier, there will always be some perturbation that breaks the system.

But the new MIT work demonstrates that we have been thinking about adversarial attacks the wrong way. Instead of coming up with ways to collect more, higher-quality data to feed the system, we need to fundamentally rethink how we train it.

The work demonstrates this by revealing a rather interesting property of adversarial examples that helps explain why they are so effective. The trick is this: the seemingly random noise or stickers that confuse a neural network in fact exploit very precise, barely noticeable patterns that the vision system has learned to associate strongly with specific objects. In other words, the machine is not malfunctioning when it sees a gibbon where we see a panda. It really is seeing a regular arrangement of pixels, imperceptible to humans, that appeared far more often in pictures of gibbons than in pictures of pandas during training.

The scientists demonstrated this with an experiment: they created a dataset of dog images that had all been modified so that a standard image classifier mistakenly identified them as cats. They then labeled these images as “cats” and used them to train a new neural network from scratch. After training, they showed the new network real images of cats, and it correctly identified them all as cats.
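The logic of this experiment can be imitated on synthetic data. The sketch below is not the researchers' setup: it assumes a toy world with one strong "visible" feature and many weak ones, a simple difference-of-class-means linear classifier instead of a neural network, and invented values for `D`, `MU` and `EPS`. Unlike the dog-to-cat description above, it flips both classes for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, MU, EPS = 1000, 0.1, 0.3  # hypothetical: feature count, weak signal, attack budget

def make_data(n):
    """Labels in {-1, +1} (say dog / cat); feature 0 is strong, the rest are weak."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * MU + rng.normal(0.0, 1.0, (n, D))
    x[:, 0] = y + rng.normal(0.0, 0.1, n)  # the "visible" feature a human would rely on
    return x, y

def fit_centroid(x, y):
    """Weights of a difference-of-class-means linear classifier."""
    return x[y > 0].mean(axis=0) - x[y < 0].mean(axis=0)

def accuracy(w, x, y):
    return float(np.mean(np.sign(x @ w) == y))

x_train, y_train = make_data(2000)
x_test, y_test = make_data(2000)

w_std = fit_centroid(x_train, y_train)  # the "standard" classifier

# Push every training point toward the opposite class, at most EPS per feature.
target = -y_train
x_adv = x_train + EPS * target[:, None] * np.sign(w_std)

fooled = accuracy(w_std, x_adv, target)      # how often the attack hits its target
w_new = fit_centroid(x_adv, target)          # train only on relabeled adversarial data
clean_acc = accuracy(w_new, x_test, y_test)  # evaluate on untouched data
print(fooled, clean_acc)
```

In this toy world the strong feature of every modified sample still points to its original class, so the new labels look wrong to a "human". The second classifier should nevertheless score well above chance on clean data, because the weak features carried by the perturbation are genuinely predictive.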

The researchers hypothesized that every dataset contains two types of correlations: patterns that actually correlate with the meaning of the data, like whiskers in pictures of cats or fur coloring in pictures of pandas, and patterns that exist in the training data but do not carry over to other contexts. It is these latter correlations, which we will call “misleading”, that adversarial attacks exploit. A recognition system that has learned to rely on “misleading” patterns finds them and concludes that it is looking at a monkey.

This tells us that if we want to eliminate the risk of adversarial attacks, we need to change the way we train our models. Currently, we let the neural network choose whichever correlations it wants to use to identify objects in an image. As a result, we have no control over the correlations it finds, or over whether they are real or misleading. If we instead trained our models to remember only the real patterns, the ones tied to what the pixels actually mean, it would in theory be possible to build deep learning systems that could not be misled in this way.

When the scientists tested this idea, using only real correlations to train their model, they did reduce its vulnerability: it could be manipulated in only 50% of cases, whereas a model trained on both real and misleading correlations could be manipulated in 95% of cases.
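The comparison can be illustrated with a small synthetic sketch. It does not reproduce the 50% and 95% figures or the researchers' method. It assumes a linear classifier, one strong stable feature plus many weak ones, and invented values for `D`, `MU` and `EPS`, then compares a model that uses every feature with one restricted to the stable feature.

```python
import numpy as np

rng = np.random.default_rng(1)
D, MU, EPS = 1000, 0.1, 0.3  # hypothetical: feature count, weak signal, attack budget

def make_data(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * MU + rng.normal(0.0, 1.0, (n, D))  # weak, easily shifted features
    x[:, 0] = y + rng.normal(0.0, 0.1, n)               # one strong, stable feature
    return x, y

def attack_success(w, x, y):
    """Fraction of points misclassified after a worst-case shift of at most EPS per feature."""
    x_adv = x - EPS * y[:, None] * np.sign(w)  # optimal bounded attack on a linear model
    return float(np.mean(np.sign(x_adv @ w) != y))

x_train, y_train = make_data(2000)
x_test, y_test = make_data(2000)

# Model trained on all correlations: difference of class means over every feature.
w_all = x_train[y_train > 0].mean(axis=0) - x_train[y_train < 0].mean(axis=0)

# Model restricted to the stable feature only (all other weights zeroed out).
mask = np.zeros(D)
mask[0] = 1.0
w_robust = w_all * mask

rate_all = attack_success(w_all, x_test, y_test)
rate_robust = attack_success(w_robust, x_test, y_test)
print(rate_all, rate_robust)
```

In this toy setting the restricted model is almost impossible to flip within the budget, while the unrestricted one is flipped nearly every time. Real image models do not separate this cleanly, which is consistent with the partial improvement reported above.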

In short, it is possible to defend against adversarial attacks. But more research is needed to eliminate them completely.
