
How to keep your art out of AI generators


AI-generated imagery feels inescapable. It’s in the video games you play and the movies you watch, and it has flooded social media platforms. It’s even been used to promote the physical hardware that real, human artists use to create digital paintings and illustrations, to the immense frustration of those who already feel displaced by the technology.

The pervasive nature of it seems especially egregious to creators who are fighting to stop their works from being used, without consent or compensation, to improve the very thing that threatens to disrupt their careers and livelihoods. The data pools that go into training generative AI models often contain images that are indiscriminately scraped from the internet, and some AI image generator tools allow users to upload reference images they want to imitate. Many creative professionals need to advertise their work via social media and online portfolios, so simply taking everything offline isn’t a viable solution. And a lack of legal clarity around AI technology has created something of a Wild-West environment that’s difficult to resist. Difficult, but not impossible.

While the tools are often complicated and time-consuming, several AI companies provide creators with ways to opt their work out of training. And for visual artists who want broader protections, there are tools like Glaze and Kin.Art that aim to make works useless for training. Here’s how to navigate the best solutions we’ve found so far.

Opting Out

Generative AI models depend on training datasets, and the companies behind them are motivated to avoid restricting those potential data pools. So while they often do allow artists to opt their work out, the process can be crude and labor-intensive — especially if you have a sizable catalog of work.

Opting out typically requires submitting a request to an AI provider, either via a dedicated form or directly via email, along with copies and written descriptions of the images you want to protect. Additionally, if you’ve agreed to let third parties license your images, the terms may include a license for AI training. It’s worth scanning the user agreements of any platforms hosting your work to check what rights they hold over it. But policies vary between AI tools — here’s how to opt out of some popular ones.

OpenAI DALL-E

OpenAI started allowing creators to remove their work from its training data alongside the launch of its DALL-E 3 generative AI model last September, and it’s one of the easier processes to follow. Content creators or owners just need to submit a form to OpenAI requesting that the work be excluded from future training datasets, including a copy of the image, a description of it, and a ticked checkbox confirming that they hold the rights to it.

Unfortunately, you’ll have to submit a separate form for every image you want excluded from OpenAI’s datasets, which could amount to thousands of works for some people; OpenAI hasn’t disclosed how many artists have undertaken this ordeal.

You have to submit a separate form for every artwork you want opted out of OpenAI’s training, which simply isn’t realistic for creatives with vast portfolios.
Image: OpenAI

If you only host your works on your own website, there might be a more efficient option. You can follow the instructions linked here to block the “GPTBot” web crawler, which OpenAI uses to scrape data from publicly available internet sources; that should protect all the content on your site. A downside to this method, however, is that images posted anywhere outside those walled protections, such as on social media, are still at risk of being scraped. Submitting a form at least casts a wider protective net, provided OpenAI hasn’t already obtained the images via a licensed third party.
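If you manage your own site, the block itself is just a short robots.txt rule. Here’s a minimal sketch, in Python, that appends OpenAI’s documented GPTBot rule to a robots.txt file; the “public/” path is an assumption about where your site serves the file from, so adjust it to your setup.

```python
# Sketch: add OpenAI's documented GPTBot disallow rule to robots.txt.
# The "public/robots.txt" path is an assumption -- point it at wherever
# your site actually serves robots.txt from.
from pathlib import Path

rule = "\nUser-agent: GPTBot\nDisallow: /\n"
robots = Path("public/robots.txt")

robots.parent.mkdir(parents=True, exist_ok=True)
robots.touch(exist_ok=True)
if "GPTBot" not in robots.read_text():
    with robots.open("a") as f:
        f.write(rule)
```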

Both these processes only offer protection against being swept into future training datasets. OpenAI claims that its AI models don’t retain any information they’ve already been trained on, so if you believe your work was already consumed by DALL-E 3 or its previous iterations, it’s too late to have it removed.

DALL-E 3 is also the model used by Image Creator from Designer, the Microsoft tool previously known as Bing Image Creator. As such, the process of opting out with OpenAI directly should also prevent Image Creator from being trained on your works.

Adobe Firefly

Of course, for every AI company that does allow artists to remove their works from training data, there are many others that don’t openly advertise such an option. And if they’re training models on a platform they own, users of that platform may not be allowed to opt out at all. That’s the case with creative software giant Adobe, which uses a model called Firefly across its Creative Cloud suite, including in Photoshop’s generative fill tool.

Adobe proclaims that Firefly is commercially and legally safe because it’s entirely trained on the company’s own stock image platform, Adobe Stock. But there’s no means for Adobe Stock contributors to opt out of training Adobe’s AI models, which has resulted in some existing users criticizing the company for not seeking their permission. If you don’t want your work used to improve Firefly, you can’t put it on Adobe Stock, period.

It doesn’t get much clearer than this line from Adobe’s FAQs. If you don’t want to train Firefly, avoid Adobe Stock.
Image: Adobe

In principle, Adobe’s approach should mean that non-Stock users don’t have to worry about Firefly. But the reality is that there’s plenty of pirated work uploaded to the platform. If you find that someone has fraudulently uploaded your work to Adobe Stock, you can send Adobe an IP infringement notice to get it removed from the platform. 

Meta 

Creatives who want to avoid training Meta’s AI models will have to jump through similar hoops. Meta is using “information from its products and services” to train its generative AI models, so anything personal you upload, or have historically uploaded, to platforms like Facebook, Instagram, and Threads is fair game for AI training. If you don’t have an account on any of those services, you’ve potentially avoided feeding its AI machine; if you do, deleting existing accounts and/or not uploading future works to them is the next best thing.

You can submit a form to Meta to request the company correct or delete personal information that’s being used to train its generative AI models, but only if that information has been supplied by a third party. It won’t let you exclude, for instance, art you’ve been voluntarily showcasing on Instagram. Many artists have also found it to be a frustrating process, criticizing how often the tool is unable to process requests. Conceptual artist Bethany Berg told Wired that the removal form felt like “it was just a fake PR stunt to make it look like they were actually trying to do something.”

Just remember that Meta will hold some rights over any content you upload to its platforms, so the most effective solution is to avoid them entirely.
Image: Meta

Beyond that, you can limit what personal information third parties share with Meta by managing your Off-Facebook Activity. This tool displays which sites and services are giving your data to Meta and lets you sever the connection that ties your identity to that data. It won’t clear data that’s already been uploaded, but it does let you check whether platforms hosting your works are feeding that information back to Meta directly.

That said, Meta also uses “information that is publicly available online” to train its generative AI models, and it doesn’t disclose its datasets. So there’s no way of knowing precisely what’s already in that massive content pool — and no surefire way of staying out.

What about Stability AI, Midjourney, and so on?

Two of the most popular generative AI tools — Midjourney and Stability AI’s Stable Diffusion — will remove copyright-infringing materials under the Digital Millennium Copyright Act (DMCA). But this information is buried in their respective terms of use, and the processes are crude. This isn’t strictly an opt-out mechanism, either, and neither company provides a way to keep work out of future training data pools.

For both services, you’ll need to email the companies directly. Midjourney can be reached at takedown@midjourney.com. For Stability AI, email your requests to both mariya@stability.ai and legal@stability.ai. Stability’s user terms don’t specify what you’d need to provide, but the information required by Midjourney, and most DMCA copyright infringement notices generally, includes a description of the original works, where the image infringing on them is located, your contact information, and a copy of your signature. 
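If it helps to see those pieces together, below is a rough sketch — not a legal template — that assembles the fields listed above into an email body. All the placeholder details (names, URLs, titles) are hypothetical and would need to be replaced with your own.

```python
# Rough sketch: assemble the fields a DMCA-style takedown request typically
# includes into an email body. Every placeholder value below is hypothetical.
def takedown_notice(work_description: str, infringing_location: str,
                    contact_info: str, signature: str) -> str:
    return (
        "To whom it may concern,\n\n"
        "I am requesting the removal of material that infringes my copyright.\n\n"
        f"Description of the original work(s): {work_description}\n"
        f"Location of the infringing material: {infringing_location}\n"
        f"Contact information: {contact_info}\n\n"
        f"Signed,\n{signature}\n"
    )

print(takedown_notice(
    "Digital illustration 'Harbor at Dusk', published on my portfolio in 2022",
    "https://example.com/path-to-infringing-image",
    "Jane Artist, jane@example.com",
    "Jane Artist",
))
```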

Other, smaller AI providers may offer a similar route for removing data that infringes on intellectual property rights, thanks to regulations like the DMCA, with varying degrees of success — if you’re unsure, try contacting the AI provider directly.

How else can I protect my work against generative AI?

With all that laid out, it’s clear that artists’ options when dealing directly with AI companies are pretty limited. Externally, however, several tools and services can grant creators better defenses — or even offenses — when fighting back. The various tools work differently, but in general, they run your visual art through processes that confuse or block effective training. That way, even if your work is scraped for an AI model, that model (ideally) won’t learn to reproduce it.

Glaze

When you launch Glaze, you’ll need to give it some time to download the resources it needs to protect your work.
Image: Sand Lab, University of Chicago

One of the most notable anti-training tools is Glaze, a project launched by a team out of the University of Chicago. The free-to-use tool works as a kind of cloak, making pixel-level changes to images that confuse AI software trying to read them. People typically can’t see these alterations on highly detailed images, so there’s little impact on the human viewing experience, but AI image generators fed the same materials will read the image as something else entirely — meaning anyone who tries to replicate the artist’s specific style will be unable to do so.

Glaze is available for Windows or macOS. There are GPU and non-GPU versions available for Windows, but running the GPU variant specifically requires an Nvidia GPU from this list with at least 3.6GB of memory. (The developers say Glaze generally uses around 5GB of system memory to run.) Using it is straightforward: at first launch, the application will automatically download a number of machine learning libraries and other resources it needs to cloak your images. When that’s complete, head to the “Select” box at the top left and choose which images on your computer you’d like to Glaze. Images can be added in batches, so it’s much quicker than making individual opt-out requests.

You may want to experiment with the strength of the Glaze application — on simple illustrations like this, Glazing at max intensity can distort the results.
Image: Jess Weatherbed / The Verge

You can then adjust the intensity of the Glaze cloaking from “very low” to “very high,” with the latter offering greater protection against AI but increasing the possibility of changes being visible to humans. Render quality, another option, determines the overall quality of the finished image — higher-quality rendering looks better and offers greater protection but will also take much longer to process. Generally, the finished result should look virtually unchanged from your original. But a close inspection will reveal tiny differences, almost like a textured wash has been applied to it.

Nightshade

Nightshade’s UI is very similar to Glaze’s, which is unsurprising considering it’s developed by the same team.
Image: Sand Lab, University of Chicago

Nightshade, from the team behind Glaze, takes a similar but more extreme approach. Images passed through this cloaking tool are intended to “poison” generative AI models that train on them, sabotaging the outputs for text prompts. If you upload a batch of dog pictures, for instance, Nightshade is supposed to fool models into seeing some other object, like cars — rather than just confusing the model the way Glaze does. The idea is that if a model takes in enough poisoned images, it will start building rules based on them, so any dog-related prompt might produce results distorted with wheels and windshields.

You can’t specify what you’d like your poisoned images to masquerade as because Nightshade is built around algorithms that can’t accommodate that kind of personalization. If you want better insight into how it works, check out this breakdown from data scientist Dorian Drost.

Like Glaze, Nightshade applies a filter-like film over the image that shouldn’t massively impact the human viewing experience, depending on the intensity of the protection layer and how detailed the original art is. (You can apply both Glaze and Nightshade to images without them interfering with each other.) Nightshade is also available for Windows and macOS systems, though only machines running Apple’s own silicon are supported for the latter.

At default intensity, Nightshade should produce similar-looking results to Glazed images. The poisoned results on the right are nearly identical to our Glaze tests.
Image: Jess Weatherbed / The Verge

Most of the overall process is the same as Glaze: you wait for the tool to download machine learning libraries, upload your work, and set the intensity and rendering options. But there’s one extra step. Nightshade will analyze the images and fill the “current tag” field with a single-word description identifying the content, like “dog” or “girl.” For the poisoning effect to work, this needs to be accurate — so you can change it if it’s wrong. Then, when you upload the images online, make sure that single-word tag is included in the metadata or alt text. 
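If you publish your images on a page you control, one way to keep that tag attached is to carry it through in the alt text. Here’s a small, hypothetical Python helper illustrating the idea; the function name and file paths are made up for this sketch.

```python
# Hypothetical helper: build an <img> tag whose alt text repeats the
# single-word tag confirmed in Nightshade (e.g. "dog"), so the label a
# scraper reads matches the poisoned content.
from html import escape

def img_tag(src: str, nightshade_tag: str) -> str:
    return (f'<img src="{escape(src, quote=True)}" '
            f'alt="{escape(nightshade_tag, quote=True)}">')

print(img_tag("gallery/husky-portrait-shaded.png", "dog"))
# -> <img src="gallery/husky-portrait-shaded.png" alt="dog">
```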

Some generative AI advocates argue Nightshade won’t be much of a hindrance. AI systems are trained on truly vast amounts of data, so you’d need a lot of poisoning to affect any given prompt. And companies can develop workarounds that detect Nightshade. But most of these workarounds only filter out images that use it, rather than removing the protections — so the end result is just having art excluded from the training data, which is still a win. The Glaze project team is also continually working to update the applications to close any loopholes that are being exploited by workarounds.

Mist

Mist can be tricky to set up, but it’s another option to try if you’re unhappy with results from Glaze and Nightshade.
Image: Mist

Mist is a “preprocessing tool” developed by Psyker Group that, like Glaze and Nightshade, prevents generative AI applications from effectively imitating a creator’s unique style and works. Mist’s approach is more akin to watermarking images. If an AI model is trained on “misted images,” any attempt to mimic them will see the output completely covered in visual distortions that render it unfit for most purposes and generally unpleasant to look at.

Here’s an example of what’s produced by AI generation tools that reference Misted images.
Image: Mist / Sang Delan

Elements of the original image can still be seen in some of these outputs, like similarities in photography or art styles, but the chaotic, noisy filter over the generated image isn’t something that can be easily corrected. Mist requires a graphics card with at least 6GB of VRAM — not a huge amount, but still more than the 3.6GB Glaze requires. Mist has been open-sourced on GitHub so developers can build their own tools around it, and its creators have committed to offering long-term support and continuously improving its function.

There are currently two ways for non-developers to use Mist. Windows PC users running an Nvidia GPU can download Mist for free via this Google Drive package. The software doesn’t require installation and can be used almost immediately after downloading — though it’s a little finicky to set up if you lack any coding or development experience.

Misting images can also produce a faint, swirling filter over the results, but like Glaze, it’s harder to spot on detailed art or photography.
Image: Mist / Sang Delan

A detailed handbook is available that walks you through the entire process, along with a community Discord channel for troubleshooting. First, make sure you’ve installed the .NET desktop runtime. When that’s done, select the “ENG” file inside Google Drive and download the zipped Mist_V2 folder within it. Create a new folder called “IMG” inside mist-v2 > src > data, and drop any images you plan to Mist into it. Then go back to the main folder (which should be titled “mist-v2_gui_free_version”) and run the Mist GUI booter. Mist lets you adjust the strength of the protection applied to images and choose between using your device’s GPU or CPU, which may prove useful if you’re running old or inefficient hardware.
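If you’d rather script the folder-prep step, here’s a minimal sketch that mirrors the handbook’s “mist-v2 > src > data > IMG” description. The exact folder names in your extracted download may differ, so treat the paths as assumptions and adjust them to what you actually see.

```python
# Sketch of the Mist folder-prep step: create the IMG folder and copy your
# originals into it. Paths follow the handbook's description and may need
# adjusting to match your extracted download.
import shutil
from pathlib import Path

mist_root = Path("mist-v2_gui_free_version")       # extracted Mist folder
img_dir = mist_root / "src" / "data" / "IMG"       # where Mist reads inputs
img_dir.mkdir(parents=True, exist_ok=True)

for picture in Path("my-artwork").glob("*.png"):   # your original images
    shutil.copy(picture, img_dir / picture.name)
```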

For anyone who’s using macOS or doesn’t have an Nvidia GPU, you can also run Mist via a Colab notebook — a cloud-based Jupyter Notebook environment that runs in your web browser. Detailed instructions for how to do this are available here, but it’s a much more complicated process to set up than the Windows equivalent. Glaze and Nightshade will generally be much easier to navigate for folks who aren’t familiar with coding.

Kin.Art

Kin.Art isn’t so much an AI protection tool as it is an entire portfolio platform that artists can use to host and sell their work. It goes beyond just banning AI-generated works — though that’s appreciated, given the backlash against sites like DeviantArt and ArtStation — and actively makes AI scraping and training harder.

Kin.Art uses two different techniques to thwart AI companies. The first is image segmentation, which breaks apart images and muddles them into something unrecognizable. The change is undetectable to human eyes but stops generative AI models from reading the image properly. This visual scrambling will also be present if anyone attempts to save or download the image, though it doesn’t block manual screenshots. The second technique involves scrambling the metadata, like the title and description, so any labels an AI model reads won’t accurately reflect the content.

Kin.Art’s AI protections just require users to tick a box when uploading their works to the platform.
Image: Kin.Art

These protections are automatically applied on the Kin.Art platform, so you just need to create an account and upload your works to benefit from them — the process works like practically any social media platform. There are some neat creator-focused features included, like the ability to add a commission status to advertise your availability for requests, and you can link out to external platforms like social media pages directly from your user profile. You can toggle the protections on or off when uploading images, and the service is currently free to use; instead of charging for the protections, Kin.Art will start placing a 5 percent service fee on top of commissions made through the platform in March.

What about music, writing, and other media?

Our guide covers what protections are available for image-based art largely because that format has more tools available than other mediums, and the opting-out processes tend to be clearer (when they are available). That said, creatives in other fields, like writing, voice acting, and music, are also fighting to protect their work. It’s much harder to disrupt how AI models are trained on this kind of data without noticeably affecting the original content, but there are still precautions you can take to reduce the risk of it being swept into AI training datasets.

As with art, always check the user terms of the hosting platform to which you’re uploading your works. Services will generally disclose if they’re handing platform data over to third parties for AI training or using it to develop their own models — if there’s no explicit opt-out process, you may unknowingly be giving consent simply by signing up. Instead, look for platforms like Medium, which have committed to blocking attempts to use content hosted on the site to train AI models. If you’re hosting work on your own site, you can also block GPTBot (as shown earlier) to avoid pages being scraped.

Some rights distributors have made similar commitments, like the Society of Authors, Composers and Publishers of Music (SACEM) — a French association that announced it was exercising its right to opt out on behalf of its members last year. Another tip for writers, courtesy of the Authors Guild, is to place a short warning notice on your published works that clearly states you don’t consent to it being used to train AI. This is the example provided by the guild:

“NO AI TRAINING: Without in any way limiting the author’s [and publisher’s] exclusive rights under copyright, any use of this publication to “train” generative artificial intelligence (AI) technologies to generate text is expressly prohibited. The author reserves all rights to license uses of this work for generative AI training and development of machine learning language models.”

These warnings serve to clearly flag that the work isn’t freely available to use, which may be useful in any future lawsuits against companies that violate your ownership rights. If web-scraping bots are sophisticated enough to filter out results carrying such warnings, this could also provide another layer of proactive protection, but there’s little evidence of how many actually honor them. Otherwise, performers and writers will need to submit copyright takedown notices to AI companies if they believe their works have been infringed.


