by Soren Ryan-Jensen
Last year, the case of Andersen et al v. Stability AI Ltd. was largely dismissed by a federal judge. This class action claimed that generative AI companies had used copyrighted material in their training databases without the artists’ consent, and that doing so constituted copyright infringement.
But what even is generative AI?
It wasn’t that long ago that we saw the “AI-powered” craze, with smart furniture and devices claiming that some form of AI ran in the background. If you really wanted, you could even kickstart an AI-powered alarm clock that would choose when to wake you up.
These AI devices were ultimately “predictive” AIs: programs that use sets of data and conditional statements to predict which preexisting pattern or behavior would be most useful. Generative AIs work in a similar fashion, except that they analyze a large body of training data and learn to mirror the patterns within it. For example, say you ask a generative AI for an image of a cat. During training, it will have analyzed every image labeled “cat” in its dataset and picked up on the patterns those images share; it might recognize that “cat” tends to come with whiskers, pointed ears, and four legs. When prompted, it then produces a new image combining all the concepts it associates with “cat.”
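To make that pattern-mirroring idea concrete, here is a toy sketch, not how any real model is built: a bigram text generator in Python that “trains” on a few invented sentences by counting which word follows which, then generates new text by sampling those counts. Real generative AIs learn far richer patterns at a vastly larger scale, but the loop is the same: absorb patterns from training data, then emit output that mirrors them.

```python
# Toy bigram text generator: illustrative only, not a production model.
import random
from collections import defaultdict

# A tiny, invented training "corpus".
corpus = (
    "the cat has whiskers . the cat has four legs . "
    "the cat has pointed ears . the cat sleeps all day ."
).split()

# "Training": record which words follow which in the data.
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

# "Generation": starting from a prompt, repeatedly sample a plausible
# next word from the learned patterns.
def generate(prompt: str, length: int = 12) -> str:
    word, output = prompt, [prompt]
    for _ in range(length):
        if word not in follows:
            break
        word = random.choice(follows[word])
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the cat has pointed ears . the cat sleeps ..."
```

Notice that everything the generator can say is recombined from its training data; it never produces a word it hasn’t seen. That dependence on the training material is the crux of the copyright dispute.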
Ultimately, generative AI uses preexisting data to create outputs based on the conditions requested of it. Alongside predictive AI, generative AI can be used for all sorts of applications. For artists and writers in particular, it poses an immediate threat to many professional positions, and the threatening programs are often trained on the very data these professionals produce.
Generative AI services require mountains of data. For example, the model behind ChatGPT was reportedly trained on some 570 GB of data collected from countless internet sources, from web pages to books. The problem is that these training databases often incorporate copyrighted material without the consent of its creators. This means that artists and writers can be sampled without ever knowing, and can have their work train a non-human competitor.
To hinder these hulking behemoths of data, Ben Zhao, a professor at the University of Chicago, led a team in creating a piece of software called “Nightshade.” Nightshade lets artists apply small, nearly invisible changes to an image that disrupt the machine-learning models trained on it, ultimately causing those models to output useless or incorrect results. This tactic of “data poisoning” aims to deter software companies from sampling artists’ work when building training databases. However, data poisoning is an uphill battle. While artists may be poisoning their current works, Midjourney’s founder claimed in a Forbes interview that their software was built on “a hundred million images.” With so much data already incorporated into these technologies, it’s uncertain how much of an effect data poisoning can really have.
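For a rough sense of the idea behind tools like Nightshade, here is a minimal sketch of perturbation-based poisoning, assuming PyTorch and torchvision are available. This is not Nightshade’s actual algorithm, which targets text-to-image training specifically; it only illustrates the general adversarial-perturbation technique of nudging pixels, within an invisibility budget, so that a model’s features for the image drift toward an unrelated concept. The file names and the use of ResNet-18 as a stand-in feature extractor are placeholders for illustration.

```python
# Sketch of perturbation-based data poisoning (illustrative, not Nightshade).
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

# Hypothetical files: the artwork to protect, and an unrelated "decoy" image.
source = TF.to_tensor(Image.open("cat_artwork.png").convert("RGB")).unsqueeze(0)
decoy = TF.to_tensor(Image.open("dog.png").convert("RGB")).unsqueeze(0)

# A pretrained network (minus its classifier head) stands in for the
# feature extractor of the model being poisoned.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()
for p in extractor.parameters():
    p.requires_grad_(False)  # only the perturbation gets optimized

with torch.no_grad():
    decoy_features = extractor(decoy).flatten(1)

epsilon = 8 / 255  # per-pixel change budget: small enough to be near-invisible
delta = torch.zeros_like(source, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=1e-2)

for _ in range(200):
    poisoned = (source + delta).clamp(0, 1)
    features = extractor(poisoned).flatten(1)
    # Pull the poisoned image's features toward the decoy concept.
    loss = torch.nn.functional.mse_loss(features, decoy_features)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)  # stay within the invisibility budget

TF.to_pil_image((source + delta).clamp(0, 1).squeeze(0)).save("poisoned.png")
```

To a person, the saved image still looks like the original cat; to a model that scrapes it, the features now lean toward “dog.” Ingest enough of these and a model’s notion of “cat” starts to corrode.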
Others are taking a stand offline, with the Writers Guild of America winning job-security protections against the use of AI by major film studios. However, these protections are up against a wider market that is increasingly invested in generative AI technology. Billions of dollars are being poured into generative AI in the hope of reducing labor costs and increasing efficiency. And while there are concerns that AI could become another dot-com bubble, the technology is already here and already being used.
In the last couple of years, generative AI firms have seen their most threatening legal challenges blunted even as massive investments rolled in. So it seems that, at least for now, the AIs are here to stay. The question that remains: How will these technologies be used, and who will stand to benefit?