Miscellaneous

Artificial intelligence and copyright: what lies ahead for us?

According to The Verge

The outgoing year was marked by a real flourishing of programs based on artificial intelligence that create works of visual art, music, or code while learning from the work of others. But as the role of these tools grows, it becomes clear that unanswered legal questions may determine the future of the industry.

Generative artificial intelligence has had a very fruitful year. Corporations such as Microsoft, Adobe, and GitHub are integrating this technology into their products, and startups are raising hundreds of millions to compete with them. The programs are also gaining cultural influence, as AI models that convert text into images generate countless memes. However, in any discussion of generative AI, one question lurks in the background, asked by supporters and critics alike: is all this legal?

This question arises from the very principles by which generative AI systems are trained. Like most machine learning programs, they work by identifying and replicating patterns in data. But because these programs are used to generate code, text, music, and drawings, that data is itself created by humans, pulled from the Internet, and in one way or another protected by copyright.

In the distant, hazy past (a.k.a. the 2010s), this wasn't much of a problem for AI researchers. The AI of the day was only capable of generating blurry black-and-white faces the size of a fingernail and posed no obvious threat to anyone. But in 2022, when a lone enthusiast can use a program like Stable Diffusion to copy an artist's style in a matter of hours, or when companies sell AI-generated prints and social media filters that are blatant knockoffs of contemporary designers, questions of legality and ethics are much more acute.

Take the case of Holly Mengert, an illustrator who has worked with Disney, who discovered that her style had been copied by a Canadian student experimenting with artificial intelligence. The student downloaded 32 of Mengert's pieces and spent several hours training a machine learning model that could reproduce her style. As Mengert told tech blogger Andy Baio, who reported the case: "Personally, I feel like someone is taking work that I've done, things that I've learned, and I've been an artist since I graduated from art school in 2011, and using it to create art that I didn't consent to and didn't give permission for."

Is it fair? And can Mengert do something about it?

To answer these questions and survey the legal environment surrounding generative AI, The Verge spoke with a range of experts, including lawyers, analysts, and AI startups. Some said with confidence that these systems are certainly capable of copyright infringement and may face serious legal problems in the near future. Others suggested, just as confidently, that the opposite is true: everything happening now in generative AI is legally in the clear, and any lawsuits are doomed to failure.

"I see people on both sides extremely confident in their positions, but what will happen in reality, no one knows," Baio, who closely follows the development of generative artificial intelligence, tells The Verge. "And anyone who says they know exactly how this will turn out in court is wrong."


Andrés Guadamuz, an academic specializing in AI and intellectual property law at the University of Sussex in the UK, suggests that despite the large number of unknowns, there are only a few key questions from which all the uncertainty around this topic stems. First, can you copyright the output of a generative AI model, and if so, who owns it? Second, if you own the copyright to the input used to train an AI, does that give you any legal claim over the model or the content it creates? Once these questions are answered, an even larger one emerges: what to do with the consequences of this technology? What legal restrictions can or should be placed on data collection? And can the people who build these systems, and those whose data is needed to build them, coexist in peace?

Let's look at these questions in order.

The output question: is it possible to register copyright on what an AI model creates?

The answer to this question is not that difficult. In the United States, there is no practice of registering copyright for works created exclusively by a machine. However, copyright apparently can be claimed in cases where the creator can prove a sufficient degree of human participation.

In September, the United States Copyright Office approved a first-of-its-kind registration for a comic book created using Midjourney, an artificial intelligence that converts text into images. The comic is a complete work: an 18-page narrative with characters, dialogue, and traditional comic book composition. And although it was later reported that the Office is reconsidering its decision, the comic's copyright registration has not yet been canceled. It appears that one factor in the review will be the degree of human involvement in creating the comic. Kristina Kashtanova, the artist who made it, told IPWatchdog that the Office asked her to "provide details of my work to show that there was significant human involvement in the process of creating this graphic novel." (The Office itself does not comment on specific cases.)

According to Guadamuz, this will be a persistent problem for anyone seeking copyright in works created with the help of AI. "If you just type 'Van Gogh cat,' I don't think that's enough to get copyright in the US," he says. "But if you start experimenting with prompts, create several images, refine them, use different sources, and develop the design further, then in my opinion you can certainly claim copyright."

In light of the above, it is likely that the vast majority of works created by generative AI models cannot be copyrighted. Typically this is mass output, generated from a few keywords. But more elaborate processes make a stronger case. Among them may be controversial works, such as the high-profile AI-generated piece that won a local art competition. Its creator said he spent weeks honing his prompts and manually editing the finished work, which suggests a relatively high degree of human intellectual involvement.

Giorgio Franceschelli, a computer scientist who writes about AI copyright issues, says that measuring human input will be "especially important" for cases in the EU. The UK, another major jurisdiction of concern to Western AI startups, has different laws. It is one of the few countries that grants copyright to works generated solely by a computer, but it considers the author to be "the person who takes the steps necessary to create the work." Again, there is room for ambiguity (is this "person" the model's developer or its operator?), but it offers a precedent for some form of copyright protection.

Ultimately, however, registering a copyright is only the first step, as Guadamuz warns. "The US Copyright Office is not a court," he says. "You need registration if you're going to sue someone for copyright infringement, but it's the courts that decide whether it's legally valid."

The input question: can you use copyrighted data to train a neural network?

For most experts, the key question in AI and copyright concerns the data used to train the neural network. Most systems are trained on huge amounts of content scraped from the web, be it text, code, or images. For example, the training dataset for Stable Diffusion, one of the largest and most influential text-to-image systems, contains billions of images from hundreds of different sources. Everything from personal blogs on WordPress or Blogspot to art platforms like DeviantArt and stock photo repositories like Shutterstock and Getty Images has gone into it. The reality is that if you have posted your work online, it has most likely already ended up in some AI's training data. There are even services that let you check who exactly is using your work.

At the same time, AI researchers, startups, and wealthy corporations alike rely on a loophole in the law (again, this is true in the US) created to preserve freedom of expression when using copyrighted material.

The question is what counts as "fair use," explains Vanderbilt University Law School professor Daniel Gervais, who specializes in intellectual property law and writes extensively about how it intersects with AI. "In general, many factors go into the understanding of fair use in intellectual property law, of which two matter most: the purpose or nature of the use, and its impact on the market. In other words, whether the use transformed the material in some way (a 'transformative' use), and whether it affects the income of the original creator by competing with their works."

Weighing these factors, Professor Gervais says the answer to whether using content for AI training counts as fair use is "more likely than not." But the same cannot be said with equal confidence about the content the neural networks create. In other words, you can train your neural network on other people's data as much as you like, but what you create with the AI can still infringe someone's rights. The difference is roughly that between printing counterfeit banknotes as movie props and trying to pay with them in a store.

Compare two use cases for the same text-to-image AI model. If a neural network trained on many millions of images is used to create a graphic novel, it is highly unlikely to cause any copyright controversy: the training data has been transformed in the process, and the result does not affect the market for the original works. But if you fine-tune a model on 100 paintings by a particular artist in order to produce images that mimic their style, the aggrieved author will have very strong arguments against you in court.

"If you give artificial intelligence 10 Stephen King novels and say, 'Make a Stephen King novel,' you're in direct competition with Stephen King. Would that be fair use? Probably not," says Gervais.

Importantly, between these two poles of fair and unfair use lie countless scenarios in which input, purpose, and output mix in different proportions and could tip the scales in court either way.

Ryan Khurana, chairman of the board at Wombo, a generative AI developer, says most companies selling such services are well aware of these distinctions. "Intentionally using prompts based on copyrighted works […] violates the terms of service of every major player," he tells The Verge. But he adds that compliance is difficult to verify, and the companies themselves are more interested in preventing copyright infringement in the use of neural networks than in limiting what data goes into training them. This is especially true for open-source text-to-image models such as Stable Diffusion, which can be trained and used without any oversight or filters. A company may have covered itself legally while still enabling copyright infringement.

Another important factor in determining fair use is whether the training data and the neural network itself were used for academic research rather than for profit. This criterion strongly reinforces a fair use claim, and companies know it. For example, Stability AI, which distributes Stable Diffusion, does not itself collect the training data or train the network. Instead, it funds and coordinates the work of academics, and the Stable Diffusion model itself, according to its license, belongs to a German university. This lets Stability AI offer the neural network as a commercial service (DreamStudio) while distancing itself from everything created with it.

The Verge has called this practice "AI data laundering." Such methods have been used since the early days of face recognition programs. One example is the story of MegaFace, a dataset that University of Washington researchers assembled by simply scraping photos from Flickr. Academic researchers took the data and laundered it, and commercial companies then used it with a clear conscience. What began as research ended with the data, including millions of personal photos, in the hands of the facial recognition company Clearview AI, state law enforcement agencies, and the Chinese government. Such a ready-made, proven data laundering pipeline already serves to shield developers of generative AI from legal claims.


And now let's add another plot twist to this story. Professor Gervais points out that the current definition of "fair use" could change in the coming months as the US Supreme Court hears a case involving Andy Warhol and Prince. The court is considering whether Warhol's use of photographs of Prince in his work was fair use, or copyright infringement.

"The Supreme Court rarely hears fair use cases, and when one gets there, it's usually something significant. I think this is just such a case," says Professor Gervais. "So to say anything definite before the Supreme Court delivers its verdict would be too presumptuous."

How can artists and AI companies find a compromise?

Even if we assume that training generative AI models falls under fair use, that still doesn't solve the industry's problems. It does nothing to soothe artists unhappy that their work feeds commercial neural networks, nor does it cover other generative models that work with code or music. With that in mind, what remedies, technical or otherwise, could be put in place so that generative AI can thrive while still respecting the interests of content creators, or compensating them for their losses? After all, without those creators, the whole field simply cannot exist.

The most obvious proposal is to create a data licensing system and simply pay creators. But on reflection, that would simply kill the industry. Brian Casey and Mark Lemley, authors of the article "Fair Learning," which underpins much of the argument for fair use in generative AI, contend that the datasets needed to train a neural network are so large that licensing every photo, video, audio recording, or text in them is simply not feasible. Any victory for copyright claims, they argue, would result not in remuneration for authors but in an effective ban on the technology. Permitting "fair learning," as they call it, not only encourages innovation but also allows the development of better AI models.

On the other hand, some argue that we have already faced a copyright crisis of similar magnitude and found a workable solution. Several experts contacted by The Verge recalled the era of music piracy, when file-sharing programs were built on massive copyright infringement and flourished only until a series of lawsuits produced new agreements that preserved copyright.

"In the early 2000s, you had everyone's favorite, but completely illegal, Napster. And today you have services like Spotify and iTunes," says lawyer Matthew Butterick, who is handling cases against companies that collected data to train neural networks (The Verge published an interview with him some time ago). "And how did that system come about? Through companies striking licensing deals and bringing content into the legal fold. Yes, all the stakeholders had to take part to make it work, but no catastrophe would follow if a similar process played out with neural networks."

Ryan Khurana of Wombo predicts a similar outcome: "In the music industry, copyright rules are much more complex, because there are different types of licensing, a variety of rights holders, and many intermediaries. Given the nuances [of data legalization for AI], I believe the entire generative AI industry will evolve toward licensing along the lines of the music industry."

There are other options that might work as well. For example, Shutterstock plans to create a fund to compensate people whose work is sold to AI companies for training neural networks, and DeviantArt has created a special metadata tag for images that lets authors warn developers against using them for training. The tag is not yet enforced on DeviantArt itself, but the small social network Cohost has already implemented it and says it will not rule out legal action against scrapers who ignore it. The artistic community, however, has greeted all these initiatives with mixed feelings. Can a one-time license fee compensate for the loss of a source of income? And how does a no-scraping tag help those whose work has already been swept into the training sets of commercial neural networks?
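In practice, DeviantArt's directive takes the form of robots-style `meta` tags in a page's HTML; its announcement describes `noai` and `noimageai` values. As a minimal sketch (the directive names follow that announcement, but treat the details as illustrative), a well-behaved scraper could check for the tag before collecting an image:

```python
from html.parser import HTMLParser

class NoAIDirectiveParser(HTMLParser):
    """Collects robots-style meta directives from a page's markup."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() == "robots":
            # content is a comma-separated directive list,
            # e.g. "noai, noimageai"
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def allows_ai_training(html: str) -> bool:
    """Return False if the page opts out of AI training via a
    'noai' or 'noimageai' directive (names as in DeviantArt's
    announcement; illustrative here)."""
    parser = NoAIDirectiveParser()
    parser.feed(html)
    return parser.directives.isdisjoint({"noai", "noimageai"})

page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(allows_ai_training(page))  # False: the page opts out
```

Of course, as the article notes, the tag only works if scrapers voluntarily honor it; nothing in the HTML itself can stop a crawler that ignores the directive.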

Many authors have already been harmed, but AI developers are at least offering some solutions for the future. One of the simplest, for neural network developers, is to build datasets that cannot infringe copyright, either because the work was properly licensed or because it was created specifically for AI training. One such project, The Stack, already exists: a dataset containing only code under permissive open licenses, along with a straightforward mechanism for removing a developer's data on request. Its creators argue that this model could suit the whole industry.
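The core idea behind a corpus like The Stack can be sketched as a license filter plus an opt-out list. The allow-list of SPDX identifiers and the data shapes below are illustrative, not The Stack's actual pipeline, which is considerably more involved:

```python
# Sketch: build a training corpus from only permissively licensed
# files, honoring repository-level opt-out requests.
# The license allow-list is illustrative, not The Stack's criteria.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

def build_corpus(files, opt_out_repos=frozenset()):
    """files: iterable of (repo, path, spdx_license_id) triples.
    Returns (repo, path) pairs eligible for training."""
    corpus = []
    for repo, path, license_id in files:
        if repo in opt_out_repos:
            continue  # the author asked for removal
        if license_id in PERMISSIVE_LICENSES:
            corpus.append((repo, path))
    return corpus

files = [
    ("alice/utils", "src/main.py", "MIT"),
    ("bob/app", "app.js", "GPL-3.0-only"),   # copyleft: excluded
    ("carol/lib", "lib.rs", "Apache-2.0"),   # opted out below
]
print(build_corpus(files, opt_out_repos={"carol/lib"}))
# [('alice/utils', 'src/main.py')]
```

The opt-out set is the part that distinguishes this approach from plain scraping: consent is checked per repository before any file reaches the training set.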

"The Stack's approach can easily be adapted to other media," says Yacine Jernite, head of Machine Learning & Society at Hugging Face, which helped build The Stack with ServiceNow. "It is an important first step in exploring consent mechanisms, mechanisms that work best when everyone abides by the rules of the platform the training data was taken from." Jernite says Hugging Face wants to help bring about a fundamental shift in how AI developers treat content creators. But for now, such cases are extremely rare.

What will happen next?

Whichever part of this tangle of legal questions around generative AI we touch on, it is clear that the participants have staked out their positions. Companies extracting millions from the technology are digging in, insisting that everything they do is perfectly legal, while in reality hoping that no one will challenge the claim. On the other side of the no man's land, rights holders voice their grievances but hold back from any real action. Getty Images recently banned AI-generated content, citing potential risk to its customers; CEO Craig Peters put it bluntly last month: "I think that would be irresponsible. I think it might be illegal." Meanwhile, the RIAA, the US recording industry association, has declared that AI-generated mixes and data scraping infringe its members' rights, though it has yet to take any legal action.

And the first shot of the AI copyright war has already been fired. Last week, a lawsuit was filed against Microsoft, GitHub, and OpenAI, alleging that all three companies knowingly reproduced open-source code through the Copilot AI coding assistant without the required licenses. Speaking to The Verge, the lawyers handling the case said it could set a precedent for the entire field of generative AI (though other experts dispute that, saying any copyright questions around code are likely to remain separate from those around content such as art and music).

But Guadamuz and Baio both say they are surprised there are still no mass lawsuits. "To be honest, I'm amazed," says Guadamuz. "But I think it's because everyone in the industry is afraid to be first and lose. Once someone breaks the ice, I think lawsuits will start flying left and right."

Baio points to another complication: many of those affected by the technology, artists among them, are simply in no position to sue. "They don't have the means. Such lawsuits are very expensive and time-consuming, so you only file one if you're confident of winning. That's why, for a while, I thought stock image sites would be the first to sue. They seem to lose the most from this technology, they can easily prove that a significant portion of their libraries was used to train generative models, and they have the money to finance such a case."

Guadamuz agrees: "Everyone knows how expensive it will be. Whoever files suit, first a lower court will rule, the loser will appeal, then there will be an appeal of the appeal, and so on until it reaches the Supreme Court."