I think I'm taking a risk writing this post, because unfortunately it's going to involve nuance. I want you to know up front, I'm probably on the same side of the issue as you. I think generative AI in the 2020s is destroying the public sphere and we need to do a lot of different things to stop it.
If you end up reading this post and thinking that I'm making a slippery-slope argument that says generative AI is inevitable and you should give up and let it take over our culture, please back up and read this. Holy shit no I am not saying anything like that. I do not think we should give up. I think we should target it, say what's bad about it, and stop it.
The thing that's been bouncing around my mind is that if you take a hard-line position like "ban AI" without being able to say what you mean by that, you are not going to be effective enough at opposing it. I understand taking a nuance-free position as a tactic, especially if your position is "I don't understand this and I shouldn't have to understand this, I just know it's bad," but I don't think that works as the only tactic.
Here's the outline of what I mean:
- The definition of AI is constantly changing
- Many AI techniques in the past have been normalized to the point where it sounds really silly to call them "AI"
- If today's generative AI follows that trend and gets normalized, that would be a problem, even though this sounds like recency bias
- Something changed in the 2020s that made generative AI more dangerous and insidious, and if we pin down what changed, we will be able to oppose it better. There were warning signs before, but the call to action is now.
I don't disagree with the arguments being advanced here, particularly the social and moral arguments. The world needs living artists and the art they make that reflects their fundamental personhood. However, I do think something important gets swept under the rug when we talk about AI (and machine learning more generally) only in terms of its use cases and its behaviors.
For example, ChatGPT and Stable Diffusion feel qualitatively different from one another because their outputs are so different. "300 words of Pokémon fan fiction written in the style of Herman Melville" and "uncannily smooth image of a tree with boobs wearing Mario's overalls" aren't creative objects a human being would make in the same way. Drawing doesn't feel like writing, and skill at one doesn't transfer to the other.
However, almost all of the hype around "2020s AI" has, at its roots, variations on a single architectural innovation: the transformer. A lot of the very specific problems with the current AI boom/bubble are direct consequences of how transformer architectures work, what they're good at, and what they need to succeed. And perhaps the most surprising thing about this architecture is how generalizable it is. It has its origins in automated translation, one of the long list of things that "used to be AI" until we got so used to Google Translate doing a passable job for day-to-day purposes (understanding the directions to your hostel, say) that it stopped feeling like magic.
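To make the "single architectural innovation" concrete, here's a minimal sketch of the scaled dot-product attention at the transformer's core. The NumPy code, names, and shapes are illustrative, not pulled from any particular model. The thing to notice is that the operation only ever sees sequences of vectors, which is why the same machinery works on translated sentences, chat transcripts, and image patches alike.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each position blends information from
    every other position, weighted by how similar their Q and K vectors are."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over positions
    return weights @ V                                    # weighted blend of value vectors

# The input is just a sequence of vectors -- embedded text tokens, image
# patches, audio frames -- the architecture doesn't care which.
seq_len, d_model = 6, 8
x = np.random.randn(seq_len, d_model)                     # stand-in for embedded tokens
out = scaled_dot_product_attention(x, x, x)               # self-attention over the sequence
print(out.shape)                                          # (6, 8)
```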
There are a lot of potentially very sneaky ways to leverage this architecture that won't feel like "AI." A good analogy here is pitch correction in the music industry: producers were digitally altering pop music performances as soon as the tools existed for doing so, but audiences only began to notice how sanitized and synthetic the music was becoming when Auto-Tune, developed in the late 90s, became the "vocoder for the 21st century" among artists who overcranked its settings to produce humanly impossible vocal performances. At an inhuman extreme, it's a fun novelty and an aesthetic choice. (Yes, back in the 90s, there were fans of both Cher and the Diva Plavalaguna who insisted that their signature pitch-corrected performances were "entirely authentic.") When kept discreet and subtle, it can instead be used to "cheat" by artists whose fans are led to believe they have superlative singing abilities.
So where's the problem? Who cares if Taylor Swift's producers massage her waveforms to eliminate irregularities? Who cares if your favorite digital artist uses a blending brush to "fake" an oil painting technique? Who cares if content-aware fill is used to remove trash from a photo? I don't present these questions as entirely rhetorical - they aren't obviously trivial to me, but neither are they obviously problems. There are computationally cheap solutions to each of these problems that, when used transparently, feel ethical to me.
What makes this sort of question more concerning in the present context is the cost of applying transformer architectures to any of these problems. Part of what makes transformer architectures so perilous is that the sophistication of their output scales remarkably well with the size of their training data. In order to get better results than the next tech company, you not only have to steal content, you have to steal more content than your competitors. Theft isn't an inevitable outcome for machine learning in general, but it is for architectures with such enormous training set requirements.
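If it helps to see why "more data" is the whole game, empirical scaling-law studies (Kaplan et al., 2020, is the usual citation) report that a transformer's loss falls roughly as a power law in the number of training tokens. The constants in this sketch are made-up placeholders, not measured values; what matters is the shape of the curve: each modest improvement demands another order of magnitude of scraped content.

```python
# Illustrative only: scaling-law studies report test loss falling roughly as a
# power law in dataset size, L(D) ~ (D_c / D) ** alpha. D_c and alpha here are
# placeholder values chosen for the example, not measured constants.
D_c, alpha = 5e9, 0.095

def approx_loss(num_tokens: float) -> float:
    return (D_c / num_tokens) ** alpha

for tokens in (1e9, 1e10, 1e11, 1e12):
    print(f"{tokens:.0e} tokens -> loss ~ {approx_loss(tokens):.3f}")
# Each 10x increase in data buys a smaller and smaller improvement,
# which is exactly the pressure to hoover up ever more content.
```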
The computational costs (and thus, energy and environmental costs) of transformers follow directly from this scalability. You don't have to run entire GPU farms to make a computer generate text in response to prompts (as demonstrated by ELIZA), but you do if you want to train a transformer with billions of parameters on an enormous corpus of data in any reasonable timeframe.
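For a rough sense of scale, a common back-of-the-envelope estimate is that training a dense transformer costs about 6 × parameters × training tokens floating-point operations. The model size, corpus size, and accelerator throughput below are hypothetical round numbers, but they show why this is a data-center problem rather than a laptop problem.

```python
# Rough training-cost estimate using the common approximation
# FLOPs ~= 6 * parameters * training tokens (dense transformer, single pass).
params = 70e9              # hypothetical 70-billion-parameter model
tokens = 1e12              # hypothetical 1-trillion-token training corpus
total_flops = 6 * params * tokens                  # ~4.2e23 FLOPs

gpu_flops_per_sec = 2e14   # optimistic sustained throughput assumed for one modern accelerator
gpu_seconds = total_flops / gpu_flops_per_sec
gpu_years = gpu_seconds / (3600 * 24 * 365)
print(f"~{gpu_years:,.0f} GPU-years on a single accelerator")   # hence the GPU farms
```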
My point in posting my reply is this: yes, we need to push back, to organize, and to build consensus about the values we want art to uphold. However, do not assume that the aesthetics of AI-generated content are a sufficient indicator of whether it is the result of an ethical process. There are definitely already images and text in circulation that you are being fooled by, while some "obviously generated" content doesn't have its roots in the objectionable "content furnace" that is a large-scale transformer architecture, and so doesn't deserve the same condemnation. "Make AI art uncool" is good as far as it goes, and necessary in the present context, but it's not sufficient to undermine the things that make transformer architectures so destructive at scale.
When evaluating art, and organizing to protect artists, we need to care about not just how the art was made, but how the tools were made. Even artists who have no intention of ever using this sort of AI should probably familiarize themselves with the corresponding computer science. Furthermore, we need to encourage a culture of transparency among artists with respect to their creative process, which is hard to balance with a desire to police moral vs. immoral means of production. Nuance is hard in this domain, and misinformation is easy, so in order to keep AI from going back underground and enduring as another "open industry secret," we need to emphasize process more and superficial aesthetics less.