
You know it immediately when you see porn. But does a computer?

At the beginning of last month, Tumblr announced that it would ban porn. When the new content policy came into force about two weeks later, on December 17, it became obvious that there would be problems. After deploying an artificial intelligence system that was supposed to banish all pornography from the site, the network mistakenly flagged innocent posts (vases, witches, fish, and so on) among the 168.2 billion posts on the site's 455.4 million blogs.

Pornography for artificial intelligence

Although it is not clear which automatic filter Tumblr used, or whether the company built its own (it did not respond to requests for comment), it is obvious that the social network is stuck between its own policies and its technology. The site's inconsistent position on, for example, "women showing nipples" and artistic nudity has led to contextual decisions that show even Tumblr does not know what to prohibit on its platform. How does a private company determine what it considers obscene?

First, blocking risky content is difficult because it is hard to define what that content is in the first place. The definition of obscenity is a bear trap more than a hundred years old: the United States first passed laws regulating obscenity back in 1896. In 1964, in Jacobellis v. Ohio, a case about whether Ohio could ban the showing of a Louis Malle film, the Supreme Court produced what is perhaps the most famous description of hardcore pornography to date. "I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so," said Justice Potter Stewart. "But I know it when I see it, and the motion picture involved in this case is not that."

Machine learning algorithms have the same problem. It is a problem that Brian DeLorge, CEO of Picnix, a company that sells specialized artificial intelligence technology, is trying to solve. One of its products, Iris, is a client-side application that detects pornography "to help people," as DeLorge puts it, "who don't want porn in their lives." He notes that the particular problem with porn is that it can be anything, a bunch of different things, and images that are not pornographic can share elements with it. A picture of a beach party might get blocked not because it shows more skin than a photo of an office, but because it sits on the borderline. "That's why it's very difficult to train an image recognition algorithm to do it all at once," says DeLorge. "When the definition becomes difficult for people, the computer faces difficulties too." If people cannot agree on what is porn and what is not, can a computer ever hope to learn the difference?

In order to teach an AI to detect porn, the first thing you need to do is feed it porn. A lot of porn. Where do you get it? "Well, the first thing people do is download a bunch of videos from Pornhub or XVideos," says Dan Shapiro, co-founder of the startup Lemay.ai, which builds AI filters for its clients. "That's one of those legal gray areas: if you train on other people's content, does it belong to you?"
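A minimal sketch of that first step, under the assumption that the raw material arrives as video files: pull individual frames out of each clip so they can be labeled later. The file paths and sampling rate here are illustrative, not anyone's actual pipeline.

```python
# Hypothetical data-collection step: save every n-th frame of a video as a
# JPEG so that humans can later label the frames. Paths are examples only.
import os
import cv2  # pip install opencv-python

def extract_frames(video_path: str, out_dir: str, every_n_frames: int = 30) -> int:
    """Save every n-th frame of a video; return how many frames were written."""
    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of video (or a read error)
            break
        if index % every_n_frames == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    capture.release()
    return saved

# Usage: extract_frames("clip.mp4", "frames/clip", every_n_frames=30)
```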

After programmers download tons of porn, they cut out the video frames that are not pornographic, to make sure the frames they use do not end up blocking, say, a pizza delivery guy. Platforms pay people, mostly outside the US, to tag such content; the work is low-paid and boring, like filling in an endless captcha. They just sit and say: this is porn, this is not. There is relatively little filtering to do, because porn effectively arrives already labeled. And training works better when you use not just photographs but large data samples.
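Once the frames are tagged, those human labels usually become machine-readable training data in a very mundane way: one folder per class. A sketch of what that looks like with a generic deep learning toolkit (PyTorch/torchvision is assumed here, and the folder names are invented for illustration):

```python
# Hypothetical layout:  dataset/nsfw/*.jpg  and  dataset/safe/*.jpg
# torchvision's ImageFolder turns the folder names into class labels.
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # a common input size for image models
    transforms.ToTensor(),
])

train_data = datasets.ImageFolder("dataset", transform=preprocess)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

print(train_data.classes)  # e.g. ['nsfw', 'safe'] -> integer labels 0 and 1
```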

"Often it’s not just filtering porn,but rather a companion material, ”says Shapiro. "Like fake profiles with a photo of a girl and a telephone." He means sex workers looking for clients, but it could be anything, not entirely legal. “This is not porn, but you don’t want to watch these kind of things on your platform, right?” A good automated moderator learns millions — if not tens of millions — of examples of content, which means it can save a lot of man-hours.

"You can compare this with the difference between a child andadults, ”says Matt Zeiler, CEO and founder of Clarifai, a computer vision startup who does this kind of image filtering for corporate clients. “I can tell you for sure - a couple of months ago we had a child. They do not know anything about the world, everything is new for them. ” We have to show the child (algorithm) a lot of things to make it clear. “Millions and millions of examples. But as adults, when we created so much context about the world and understood how it works, we can learn something new from just a couple of examples. ” (Yes, teaching AI to filter adult content is like showing a lot of porn to a child). Companies like Clarifai are growing fast today. They have a good database of the world, they can distinguish dogs from cats dressed from naked. Zeiler’s company uses its models to train new algorithms for its customers — since the original model has processed a lot of data, personal versions will require only new data sets to work.

However, it is hard for the algorithm to get everything right. It does well with content that is obviously pornographic, but the classifier may incorrectly mark an underwear ad as forbidden because the picture has more skin in it than, say, an office photo. (Bikinis and underwear, according to Zeiler, are very difficult.) This means that the people doing the labeling should focus on these borderline cases in their work, giving priority to what the models find hard to classify.
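One simple way to implement that prioritization, sketched here as plain uncertainty sampling (the 0.15 band and the example scores are illustrative assumptions, not anything from the article): send humans the images whose predicted probability sits closest to the decision boundary.

```python
# Route the model's most uncertain predictions to human labelers first.
import numpy as np

def pick_for_human_review(probs: np.ndarray, band: float = 0.15) -> np.ndarray:
    """Return indices of images whose NSFW probability is near the 0.5 boundary."""
    uncertainty = np.abs(probs - 0.5)
    return np.where(uncertainty < band)[0]

probs = np.array([0.02, 0.48, 0.97, 0.55, 0.60, 0.10])
print(pick_for_human_review(probs))  # -> [1 3 4]: the borderline, bikini-type cases
```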

And what is the hardest thing?

"Anime porn," says Zayler. "The first version of our nudity detector did not use cartoon pornography for training." Many times the AI ​​miscalculated because hentai did not recognize. “Having worked on this for the client, we introduced a bunch of his data into the model and significantly improved the accuracy of the filter of animated images, while maintaining the accuracy of real photos,” says Zayler.

Technology that is taught to sniff out porn can be turned on other things. The technologies underlying such a system are surprisingly flexible. This is about more than anime breasts. Jigsaw, an Alphabet subsidiary, is widely used as an automatic comment moderator at newspapers, for example. Its software works similarly to the image classifiers, except that it sorts by toxicity rather than nudity. (Toxicity in text comments is as difficult to define as pornography in pictures.) Facebook uses similar automatic filtering to detect suicidal messages and content related to terrorism, and it has tried to use the technology to detect fake news on its massive platform.
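The classifier pattern carries over to text almost unchanged. As a deliberately simple illustration (a bag-of-words model, not Jigsaw's actual system, with a tiny invented training set):

```python
# Toy toxicity scorer: TF-IDF features + logistic regression.
# The comments and labels below are fabricated purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "thanks for the thoughtful article",
    "you are an idiot and should shut up",
    "interesting point, I disagree politely",
    "go away, nobody wants trash like you here",
]
labels = [0, 1, 0, 1]  # 0 = fine, 1 = toxic

toxicity_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
toxicity_model.fit(comments, labels)

# Score a new comment: probability of the "toxic" class.
print(toxicity_model.predict_proba(["what a stupid take"])[:, 1])
```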

It all still depends on human supervision; we are better at dealing with ambiguity and ambiguous context. Zeiler says he does not think his product has put anyone out of work. It solves the problem of scaling the Internet. People will still teach the AI, sorting and labeling content so that the AI can learn to tell it apart.

This is the future of moderation: individual, off-the-shelf solutions provided by companies whose entire business is training ever more advanced classifiers on ever more data. In the same way that Stripe and Square offer ready-made payment solutions for businesses that do not want to process payments themselves, startups like Clarifai, Picnix, and Lemay.ai will handle online moderation.

Dan Shapiro of Lemay.ai is hopeful. "As with any other technology, it is still being invented. So I don't think we give up when it fails." But will AI ever be able to act autonomously, without human supervision? Unclear. "There is no little man in a snuffbox filtering every picture," he says. "You need data from everywhere to train an algorithm on it."

Zeiler, on the other hand, thinks that one day artificial intelligence will moderate everything on its own. In the end, the amount of human intervention will shrink to zero, or to insignificant effort. Gradually, human effort will shift to what AI cannot do now: high-level reasoning, self-awareness, everything that people have.

Recognizing pornography is part of that. Identification is a relatively trivial task for people, but it is much harder to train an algorithm to recognize nuance. Determining the threshold at which a filter marks an image as pornographic or not is also a difficult, partly mathematical task.
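That threshold question can be made concrete. A sketch, assuming a policy target of catching roughly 95% of pornographic images even at the cost of more false alarms, using a standard precision/recall sweep over toy scores:

```python
# Pick a decision threshold from the precision/recall trade-off instead of
# defaulting to 0.5. The ground truth and scores below are toy values.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                     # 1 = pornographic
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.75])   # model outputs

precision, recall, thresholds = precision_recall_curve(y_true, scores)

target_recall = 0.95                 # assumed policy target, not from the article
ok = recall[:-1] >= target_recall    # recall[:-1] aligns with the thresholds array
best = thresholds[ok][-1] if ok.any() else thresholds[0]
print(f"flag images scoring above {best:.2f}")
```

Raising the threshold cuts false alarms (fewer blocked vases and underwear ads) but lets more porn through; lowering it does the opposite. Where to put the cutoff is a policy decision dressed up as a number.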

Artificial intelligence is an imperfect mirror of how we see the world, just as pornography is a reflection of what happens between people when they are alone. There is some truth in it, but it is not the full picture.
