Will neural networks kill the Internet?

Based on materials from The Verge

In recent months, bad omens have been accumulating at an alarming rate. Google is trying to kill the 10 blue links. Twitter* has been left to the mercy of bots and blue ticks. Amazon is cluttered with junk and TikTok keeps getting worse. Layoffs are draining the lifeblood of online media. A job posting for an "AI editor" calls for output of "200 to 250 articles per week." ChatGPT is being used to create entire spam sites. Etsy is flooded with AI-generated garbage. Chatbots quote each other, creating a misinformation ouroboros that bites its own tail. LinkedIn* uses artificial intelligence to stimulate tired users. Snapchat and Instagram* hope that bots will talk to you when your friends won't. Reddit users are staging blackouts. Stack Overflow moderators are on strike. AI is tearing Wikipedia apart. The old Internet is dying, and the new one is struggling to be born.

Of course, the Internet dies all the time. It has been dying for years, with apps stealing traffic from websites and algorithms rewarding ever-shorter attention spans. But in 2023 it is dying again, and, as the list above suggests, a new factor has come into play: artificial intelligence.

In the most general terms, the problem is this. Many years ago, the Internet was a place where people made things. They created homepages, forums, and mailing lists, and made a little money along the way. Then companies decided they could do it better. They built convenient, feature-rich platforms and opened their doors to everyone. It was as if they put boxes in front of us, we filled those boxes with text and images, and people came to see what was inside. The companies chased scale, because when a lot of people gather somewhere, there is usually a way to make money from them. But AI changes those premises.

Given money and computing resources, artificial intelligence systems, especially the generative models now in fashion, scale easily. They produce text and images in abundance, and music and video will soon follow. Their output could overrun or outcompete the platforms we rely on for news, information, and entertainment. But the quality of these systems is often poor, and they are built in a way that, so far, makes them parasitic on the web. The models are trained on datasets laid down during the last age of the Internet, which they recreate imperfectly. Companies extract information from the open web and turn it into machine-made content that is cheap to create but less reliable. That product then competes for attention with the platforms and people that came before it. Sites and users have to reckon with these changes and work out how to adapt, and whether adapting is even possible.

Google reformatted search by placing AI-generated results above sources

In recent months, discussions and experiments at some of the most popular and useful sites on the Internet, including Reddit, Wikipedia, Stack Overflow, and Google itself, have revealed the tension caused by the arrival of artificial intelligence systems.

Reddit moderators staged blackouts after the company announced that it would dramatically increase the fees for accessing its API. Company executives say the changes are, in part, a response to AI companies scraping its data. “The Reddit dataset is really valuable,” Reddit founder and CEO Steve Huffman told The New York Times. “But we shouldn’t be giving away all of this value to some of the world’s largest companies for free.” That is not the only reason, though: Reddit is trying to squeeze more revenue out of the platform ahead of its planned IPO later this year. But it shows how this use of data is both a threat and an opportunity for the current web, forcing companies to rethink the openness of their platforms.


Wikipedia, for example, is familiar with having its data used in this way. Its information has long been used by Google to build "knowledge panels," and in recent years the search giant has begun paying for that information. But Wikipedia's moderators are debating how to use the new AI language models to write articles for the site itself. They are well aware of the problems with these systems, which fabricate facts and sources with deceptive fluency. At the same time, they know that these systems offer clear advantages in speed and scale. “The risk to Wikipedia is that people can reduce the quality by adding content they haven't reviewed,” says Amy Bruckman, a specialist in online communities and author of Should You Believe Wikipedia? “I don’t think there is anything wrong with using them as a first draft, but every point should be checked.”

Stack Overflow is a similar but perhaps even more extreme case. Like Reddit's, its moderators are on strike, and like Wikipedia's editors, they worry about the quality of AI-generated content. When ChatGPT launched last year, Stack Overflow was the first major platform to ban its output. The moderators wrote: "The main problem is that while the answers ChatGPT produces have a high rate of being incorrect, they typically look like they might be correct and are very easy to produce." Sorting through the results takes too long, so the moderators decided on a complete ban.

However, the site's management had other plans. The company has since essentially lifted the ban by raising the burden of evidence moderators need before they can act against users who post AI-generated content, and it has announced that it wants to take advantage of the technology instead. Like Reddit, Stack Overflow plans to charge firms that take its data to build their own AI tools, presumably so that it can compete with them. The fight with its moderators is over the site's standards and who gets to enforce them. Moderators argue that AI output cannot be trusted, while executives say the risk is worth taking.

However, all these difficulties pale in comparison to the changes taking place at Google. Google Search underpins the economy of today's Internet by distributing attention, and therefore revenue, across much of the web. Spurred on by the popularity of Bing and ChatGPT as alternative search engines, the search giant is experimenting with replacing its traditional 10 blue links with AI-generated summaries. If the company goes through with this plan, the changes will hit like an avalanche.

An article about Google's AI search beta by Avram Piltch, editor-in-chief of the tech site Tom's Hardware, highlights a number of issues. Piltch says that Google's new system is essentially a "plagiarism engine." Its AI-generated summaries often copy text from websites verbatim but place that content above the original links, depriving them of traffic. Google has been pushing in this direction for a long time, but the screenshots in Piltch's article show how far the balance has shifted in favor of AI-mined content. If this new search model becomes the norm, it could damage the entire web, Piltch writes. Sites with limited revenue are likely to be forced out of business, and Google itself will run out of human-generated content for its AI to reshuffle.

This is the essence of AI development: creating cheap content based on someone else's work. If Google sticks with its current version of AI-powered search, the consequences will be hard to predict. It could hurt entire segments of the web that most of us find useful, from product reviews to recipe blogs, hobby pages, news aggregators, and wikis of all kinds. Sites could protect themselves by blocking access and charging fees, but that would also mean a huge overhaul of the web's economy. Ultimately, Google may kill the ecosystem that gave rise to its own value, or change it so irrevocably that the company's own existence is put in jeopardy.

But what happens if we let AI take the helm and start feeding information to the masses? What will change?

Alas, the evidence available so far indicates that this will degrade the quality of the web as a whole. As Piltch points out in his review, for all of AI's vaunted ability to recombine text, it is humans who ultimately create the underlying data, whether journalists making phone calls and checking facts, or Reddit users who have had a specific battery issue with a specific device and are happy to tell you how they solved it. In contrast, the information produced by AI language models and chatbots is often incorrect. The catch is that when it is wrong, the error is hard to spot.

Here is an example: AI agents, systems that use language models such as ChatGPT to connect to web services and act on the user's behalf, ordering groceries or booking flights. In one of the many viral Twitter* threads extolling the potential of this technology, the author presents a scenario in which a waterproof footwear company wants to commission market research and uses AutoGPT (a system built on OpenAI's language models) to generate a report on potential competitors. The resulting write-up is basic and predictable. It lists five companies, including Columbia, Salomon, and Merrell, along with bullet points that purportedly describe the pros and cons of their products. “Columbia is a well-known and respected brand of outdoor gear and footwear,” we are told. "Their waterproof shoes come in a variety of styles" and "their prices are competitive in the market." You might look at this and think it is so trite as to be basically useless (and you would be right), but the information is also subtly wrong.

A moderator of the r/hiking subreddit named Chris, whom the author of the original article consulted as an expert, said that the AI's output was essentially filler. "A lot of words, but there is no real value in what is written." Important factors such as the difference between men's and women's shoes or the types of fabric used are not mentioned. The AI gets facts wrong and ranks brands with a larger online presence as more worthy. Overall, Chris says, there is no expertise in the information, only guesswork. “If I had been asked the same question, I would have given a completely different answer,” he said. “Following the AI's advice is likely to result in hurt feet on the trail.”

This is exactly what the Stack Overflow moderators are complaining about: AI-generated misinformation is insidious because it is often invisible. It is produced instantly but is not grounded in real-world experience, so it takes time and expertise to evaluate. If machine content replaces human authorship, it will be difficult, perhaps impossible, to fully assess the damage. And yes, humans are also plentiful sources of misinformation, but if AI systems crowd out the platforms where human expertise currently thrives, we will have fewer opportunities to correct our collective mistakes.

More and more sites are being littered with cheap and junk AI content

The effects of AI on the Internet are not easy to generalize. Even in the few examples above, many different mechanisms are at work. In some cases, the perceived threat of AI seems to be used to justify changes that are wanted for other reasons (as with Reddit), while in others, AI is a weapon in a struggle between those who create a site's value and those who run it (Stack Overflow). There are other areas where AI's ability to fill the boxes mentioned at the start of this text has different effects, from social networks experimenting with AI-driven engagement to marketplaces where AI-generated junk competes with other goods.

In each case, there is something about AI's ability to scale, the simple fact of its raw abundance, that changes a platform. Many of the most successful websites are those that use scale to their advantage, either by multiplying social connections or product choices, or by sorting through the vast conglomerate of information that makes up the Internet itself. But this scale depends on masses of people creating the underlying value, and people cannot outdo AI when it comes to mass production (even though a great deal of human labor goes into creating AI behind the scenes). There is a famous essay in the field of machine learning, "The Bitter Lesson," which notes that decades of research show that the best way to improve AI systems is not clever engineering but simply applying more computing power and data to the task. The lesson is bitter because it shows that the scale of machines outstrips human curation. The same may be true of the web.

But is this necessarily a bad thing? What if the web as we know it simply changes in the face of artificial abundance? Some will say that this is how the world works, noting that the web itself killed what came before it, and in many ways for the better. Printed encyclopedias, for example, are nearly extinct, but most people would prefer the breadth and accessibility of Wikipedia to the heft and authority of Britannica. And for all the problems with AI-generated writing, there are plenty of ways to improve it, from better citation features to more human oversight. Moreover, even if the web is flooded with AI-generated junk, that could prove useful by encouraging the development of better-funded platforms. If Google consistently gives you junk search results, for example, you might prefer to pay for sources you trust and go to them directly.

In fact, the changes AI is currently causing are simply the latest in a long history of struggle on the Internet. It is essentially a battle over information: who creates it, how you access it, and who gets paid for it. But just because the battle is familiar does not mean it does not matter, nor does it guarantee that the new system will be better than the one we have now. The new web is struggling to be born, and the decisions we make now will shape how it develops.

*Meta Platforms, which includes the social networks Facebook and Instagram, has been recognized as an extremist organization and banned in the Russian Federation.