engineeredd
@engineeredd
Posts like the one above rub me the wrong way. They tend to feature superstar artists and create the impression that their artwork is simply fed to a robot, like in that Kenny Who Judge Dredd story, with the machine understanding their style and being able to apply it creatively in the same way a (super)human would, all after seeing only a couple of pages. That's not really how it works. And if people truly want to fight art generated by machine learning, they should understand how it works.

I have some problems with the premise of those stories above, but I'll take it for granted in what follows and discuss what has to have happened for that to be true.

I don't agree that the generative art models really are that good at reproducing artists' styles in a credible manner, to the point that it's detrimental to their career and to the broader art world ecology. For the most part, whatever other flaws and dangers these models present, uncanny mimicry doesn't seem to be one of them. As I've said earlier, the way people react to such examples rather shows a lack of critical thinking: they cede authority to the AI, charmed by the fact that it created images with maths, not by the actual result.

First of all, there are (at least) two components in the datasets that train models like Dall-E 2, Midjourney and Stable Diffusion: the image and the description, which are delivered as a pair. How well the generated image matches the prompt depends on how good the descriptions are. The AI doesn't understand anything about the picture; it (very, very reductively) just matches particular pixel distributions with particular word sequences.

The huge leap forward in generative AI models that we've seen this year comes more or less from the huge quantity of data used to train the models, and from some smart engineering that combines systems of specialized neural networks. But the images aren't really the hard part to come by. We live in a culture saturated with images. For the longest time, the classifying part was the hardest: assembling the pair. That's because it was done by hand, by people paid or incentivized to do it (or who were somewhat aware of what they were doing, like when we solve a Google Captcha). What was innovative about Dall-E 2, and what opened the floodgates when the other labs realized they could do the same, was using publicly available data from the web. The image descriptions written for SEO and accessibility reasons and stored in the alt fields are a treasure trove of readily classified data, for example. These descriptions are sometimes written by the artists, but for the amount of data needed, it's almost certain that the labs have used images described by critics, curators (from museum websites, for example) and especially by anonymous data entry and SEO people (on stock image sites) or by random internet users assembling image boards on Pinterest, blogs and the like.
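To make the alt field point concrete, here is a minimal sketch of how image and description pairs could be pulled out of a web page. It uses only Python's standard library. The page fragment, the `AltTextCollector` class and the `extract_pairs` helper are all made up for illustration; this is not how any particular lab built its dataset.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects (image URL, alt text) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # An attribute without a value is reported as None, so fall back to "".
        alt = (attributes.get("alt") or "").strip()
        # Keep only images that come with a non-empty description.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return every (src, alt) pair found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.pairs


# Hypothetical page fragment, invented for this example.
SAMPLE_PAGE = """
<img src="/img/portrait.jpg" alt="oil portrait of a woman, painterly style">
<img src="/img/spacer.gif" alt="">
<img src="/img/logo.png">
"""

pairs = extract_pairs(SAMPLE_PAGE)
for src, alt in pairs:
    print(src, "->", alt)
```

Only the first image survives, because it is the only one somebody bothered to describe. The description is the part of the pair that a human had to write.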

Secondly, this copious amount of classified image data is much more than what a single artist can produce. The model is fed not only the one artist's works, but also the works of every other artist who tried to understand the style and replicate it (for better or worse).

For example, here are a bunch of artists on ArtStation posting paintings explicitly described as inspired by Karla Ortiz or made as studies after her work. There are hundreds more such examples on ArtStation alone.
The same goes for artists who simply operated in a similar space and get described through a reference to another artist as a shorthand.

I think we all know of the way Moebius, even as he underwent countless stylistic changes just under that pseudonym, not even counting his work as Gir or Jean Giraud, gets used to describe any kind of art featuring stippling, ambitious perspective and detailed backgrounds, and/or highly detailed realistic figures drawn in a clear line.

But the funniest thing to me is how this drawing by Kei Kobayashi (I think) gets passed around as being done by Moebius.

To the point that it itself gets referenced as an emblematic example of Moebius' style.

I think this shows how deep the rabbit hole goes in terms of how many artists can be responsible for constructing one notable style. It probably also explains why no MidJourney image prompted to generate something in the style of Moebius actually looks anything like a Moebius.

If the model were only fed the one artist's images, with people paid to classify and describe them, it would be piss poor, as such models were for years. When they work, they're not "stealing" the labor and ideas of a couple of people, but of hundreds of thousands, maybe millions. Sure, for the vast majority of the people involved it's an infinitesimally small amount of labor time, and that's why the labs can more or less get away with it.

It's also very diffused, starting with the works of a popular artist, then all the other artists that get lumped with them, all the other artists explicitly influenced by them, and finally everyone who posts their attempts at replicating a particular style either through studies or through exercises made for fun. And not even this greatly expanded dataset of images would mean squat, if not for the countless internet users who basically create a raw version of the set of classified images that gets turned into training data. They, we, are the ones doing the labor of understanding the work, of placing it in artistic continuums, of relating pieces of art with styles with artists and so on. The AI just creates probabilistic models based on all this deeply layered human labor of creating, understanding, describing and classifying art.

Sure, the likeliest scenario is that those campaigns will amount to nothing, at best removing the ability to use a few names in a prompt (although this surely can be circumvented).

The training data doesn't have to be stored anywhere, and the model doesn't have to look at the image to "sample" it in the way lots of people think it does. The legality of having assembled the data is at most grey, there are even fewer problems with using the model, and there's no actionable way to easily remove an artist's impact on the network. You'd have to retrain the network with a dataset that doesn't use the offending images. And even in this absurd scenario, the effect would be minimal because of the way the dataset is constructed (you'd remove all the images owned by Sam Yang or Karla Ortiz or whoever, but not those by the people drawing like them, short of a global class action lawsuit that would take forever anyway), but also because there are network effects at play. It's "I drink your milkshake" stuff, since all the data is transposed into multidimensional mathematical spaces. At the scale at which these networks operate, the models can be made to fill a void with something they hallucinate. This is how something like Loab was made.

Expand if it doesn't make sense.
More specifically, it maps the various terms it's trained with and creates a topology between them. So, in a sense, it understands that a bunch of artists should be grouped together because we call them cubists, a couple of artists should be grouped because we call them impressionists, these artworks we call ArtStation and so on. So I can ask it to draw me something in the cubist style of glupitygloop and it won't say that the artist doesn't exist, but will instead interpolate something from what it knows cubist artworks look like.

I can also tell it to generate me something in the cubist style of Renoir. Again, it won't tell me that Renoir is an impressionist painter, not a cubist one, but interpolate between the features it "knows" are specific to cubist artworks and to "Renoir".

I use the quotes around "Renoir" because the model doesn't know who Renoir is or what his style is. For the model, he's just a sort-of position in a particular space. This space can be traversed. We can start at one point and then slowly move to the other side.
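As a rough sketch of what "traversing the space" means, the snippet below walks in a straight line between two positions. The three-number vectors are invented stand-ins; real models use vectors with hundreds of dimensions, and real pipelines may interpolate differently (spherical interpolation is a common choice). The names `lerp` and `walk` are mine.

```python
def lerp(start, end, t):
    """Linear interpolation: t=0 gives start, t=1 gives end."""
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def walk(start, end, steps):
    """Return `steps` evenly spaced positions from start to end (steps >= 2)."""
    return [lerp(start, end, i / (steps - 1)) for i in range(steps)]


# Hypothetical positions, invented for this example.
cubism = [1.0, 0.0, 0.5]
renoir = [0.0, 1.0, 0.5]

path = walk(cubism, renoir, 5)
for position in path:
    print(position)
```

Each intermediate position is a perfectly valid input for the image generator, even though no artist or movement was ever labeled with it.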

And that's not all. It doesn't only learn what something looks like, it also learns what something doesn't look like. When it learns what a Picasso looks like, it learns, to some extent, what everything else doesn't look like. When the model learns what it means to paint in the style of Picasso, it also learns what NOT to paint when it's asked for a Renoir or a Dali or a Greg Rutkowski. When it learns what a photo of a cat looks like, it learns what a drawing by Sam Yang doesn't look like. This is the contrastive part from the CLIP encoder used both by Dall-E 2 and Stable Diffusion.
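Here is a simplified sketch of a contrastive objective of the kind CLIP is known for, written in plain Python so it can be read without any ML library. It is not the actual CLIP code: the vectors are toy values, the fixed `temperature` of 0.1 is my assumption (the real thing learns it), and all function names are made up. The loss is low when each image vector is closest to its own description and far from everyone else's.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed in a numerically stable way."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum - logits[target]


def contrastive_loss(image_vecs, text_vecs, temperature=0.1):
    """Symmetric loss over a batch where image k is paired with text k."""
    images = [normalize(v) for v in image_vecs]
    texts = [normalize(v) for v in text_vecs]
    n = len(images)
    # sims[i][j]: how similar image i is to text j.
    sims = [[dot(img, txt) / temperature for txt in texts] for img in images]
    # Each image should pick out its own text among all texts...
    image_to_text = sum(cross_entropy(sims[k], k) for k in range(n)) / n
    # ...and each text should pick out its own image among all images.
    text_to_image = sum(
        cross_entropy([sims[row][k] for row in range(n)], k) for k in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


# Toy batch: two images, two descriptions.
image_vectors = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(image_vectors, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss(image_vectors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The wrong pairings are part of the loss, which is the point made above: every other item in the batch serves as an example of what a given description does not look like.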

All this has two implications.

  • If the model really can replicate an artist's style, then simply removing their own art pieces from the dataset would be pointless, because the space they occupy is thoroughly mapped out.
  • Every single piece of data that the network was trained with contributed to some degree to the model being able to do that.

But if people try and try, at some point a campaign will succeed, especially if it's coopted by a larger entity that better understands the tech and the nefarious possibilities opened by the "reforms" needed to address these artists' concerns. Think only about the way Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House used Chuck Wendig to sue The Internet Archive, with effects that will surely create precedents affecting other non-profits and the library system.

You don't have to think very hard to imagine some sort of new DRM dictating how images are shared and described online. Let's even invoke a nightmare scenario and bring the blockchain into discussion. Both smart contracts and NFTs could "solve" the problem, in all sorts of ways that I don't want to think too much about because I'll get sick. But some quick examples, just to show I'm not full of shit and paranoid, would be minting a number of NFTs to an artist proportionally to how often their name is used in a prompt; or smart contracts transferring to the original artist a fraction of the money involved in each transaction where an image strongly influenced by them is involved.

I hope it's obvious how such an arrangement would be detrimental to most artists. First of all, as we've established, in order to get the model to do a compelling Kim Jung Gi, it'd need the input of a lot of other people who aren't Kim Jung Gi. His name would absorb the value created by all of them like a sponge. And there's nothing to say that such an arrangement would be restricted to ML generated art. If all this architecture is put into place, it'd be trivial to implement systems to prohibit saying you're inspired by Ortiz, since her name attracts clients and gets better SEO, unless you've got the token received after following her workshop. This is a recipe for cartelization and monopolization.

But to get any of that, you need a legal framework to enforce it and at the moment you still can't patent or copyright a style. Yet!

People think about intellectual property rights as being something universal and eternal, but in fact the field is very much in flux. The doctrine stated above can be, if not abolished, then at least circumvented. The current worldwide intellectual property architecture is actually very modern, formed in the mid '90s with the TRIPS agreement. The term itself basically didn't exist before the mid '80s. There were other regimes governing what we call IP, because it was acknowledged that it wasn't ... property.

[Image: a Google ngram graph showing the use over time of the terms "patent", "copyright", "trademark" and "intellectual property", with "intellectual property" appearing only since the 1980s]

And the way it changed gravely impacted countries in the Global South. Let's not forget that art and media, even though it's what we most often talk about, is only the tip of the iceberg with IP. We're talking about technology, about patents; it affects industry, agriculture and pharmacology. This system enforces dependency relations that not only impoverish developing countries, but can also have deadly consequences, as we've seen during the COVID pandemic with the patents for vaccines.

Evolving what counts as IP on the internet, how it's managed and monetized, isn't a speculative issue, but very much an active concern. Although its goals are noble, GDPR has been an overall mess and actually helped consolidate big tech. The same goes for "link taxes" implemented or proposed in places such as Canada, Australia or the EU. And these are only "small" interventions, lacking a public facing campaign or lawsuit, like the one I mentioned above between Internet Archive and the large publishers or the one between Metallica and Napster, which played out under the then-new DMCA.

I mentioned all these issues to establish three things.

  • The fact that what counts as property shifts all the time both in the intellectual and digital spaces.
  • The fact that what counts as legitimate sharing, using, reusing changes
  • The fact that these changes rarely, if ever, help the little guy or gal, instead entrenching existing hierarchies

The way the neural networks were trained is a problem of the commons, of social ecosystems, of incredibly diffused networks of inspiration, influence and incrementally small transformative work. To adequately address it we need to accept this and work from there.

Any campaign that takes as its starting point the figure of the sole heroic creator exploited by a thinking machine will, first of all, misdiagnose the problem and propose ineffective solutions. But its ideological core is actually dangerous, since it'll propose solutions in the realm of regulations and intellectual property, and it'll think in terms of individuals (be they persons or firms) and ownership, not in terms of social bodies.

The endgame of such a project can't be anything other than a form of rights management over style, method and technique. The ML labs will take it on the chin, the star artists will secure the bag and the vast majority of artists will be hurt, all while opening a Pandora's box that, going by past experiences, ushers in only terrifying developments for the world at large. Image generators and chatbots are probably the most visible products made with ML, but they're hardly the only ones.



in reply to @engineeredd's post:

As an aside, I really like that Cohost is very permissive about HTML and CSS, letting me use the tag to make these little footnotes and asides.

It'll be interesting to see how other people use web technologies to have articles reflect the way they think (in my case with lots of asides and branching off), instead of shaping their thoughts into the limitations of the platform.

Fascinating. Yes, excluding one artist's work won't really change matters, because, if I am understanding this right, what these systems really do is aggregate across everything that seems to fit. Isn't this really more similar to parody, satire and (even more so, even less derivatively) aesthetic "schools"? It is as if these systems are, essentially, studying what is out there, and then creating within an identified style... which is precisely what humans do. Else we would have to have banned Raymond Chandler for being inspired by Dashiell Hammett.

if I am understanding this right, what these systems really do is aggregate across everything that seems to fit. Isn't this really more similar to parody, satire and (even more so, even less derivatively) aesthetic "schools"? It is as if these systems are, essentially, studying what is out there, and then creating within an identified style... which is precisely what humans do.

To a certain point and at a certain level of abstraction. These particular systems are doing something that's called semi-supervised learning or learning with weak supervision. Which is to say they are trained on a small dataset where the humans embed into them some expected results and then they are left to discover patterns on their own through unsupervised learning. Unsupervised learning is actually a branch of data science and statistics. It's not so much that they "learn" something; rather, it's a way to run an immense set of data through an algorithm that might uncover patterns. We're talking here about datasets that are unsuitable for human analysis because of their size. Here's a very nice description of a very simple algorithm conceptually in the same family as unsupervised learning. But even this is just one step in the process, because you'd then need a human to determine if the uncovered patterns are actually meaningful or are an artefact of poor data or are based on pointless correlations and so on.
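To make "an algorithm that might uncover patterns" a bit more tangible, here is a sketch of k-means clustering, a textbook unsupervised method. It's my own pick for illustration, not necessarily the algorithm referred to above, and it is far simpler than anything inside an image generator. The data points, the naive initialization and the `kmeans` function are all invented for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centers, clusters)."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centers = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Step 2: move each center to the mean of its assigned points.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers, clusters


# Made-up data: two loose groups, with no labels attached.
points = [
    (0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
    (1.0, 0.0), (10.0, 11.0), (11.0, 10.0),
]

centers, clusters = kmeans(points, k=2)
for members in clusters:
    print(members)
```

The algorithm hands back two groups, but it has no idea what they are. Deciding whether a group means "impressionism" or is just an accident of the data is still a human's job.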

Now, getting back to what these networks actually aggregate. They're not computer vision models; they don't know that a particular distribution of pixels is a brush stroke or a contour line, or that a particular brush stroke is specific to impressionist painters, another to the old Dutch masters and so on. They aggregate the knowledge generated about those artworks by humans, over history. They're going more like: "oh, the humans call this 256x256 bitmap an impressionist painting, so I'm going to map it onto the impressionist space". So, without a history of critical thought about images, without massive education programs (public and private), without museography and art collectors, critics and aspiring artists, they wouldn't be able to do anything.

Despite the terminology, instead of thinking about them as learning, I think it's more apt to think that they're storing human thought. We've already done the work of classifying all these artworks, at a social level, and they are aggregating that work.

I don't think the theft trope is wholly inadequate. But the scale and its subjects are all wrong. It's not plagiarism, but something more along the lines of the enclosure of the commons. It's not a single artist that's wronged, but a whole social system. The tech itself isn't really the problem, but the way it's been developed to replace websites like Shutterstock to the benefit of private firms.

I think it offers a very good opportunity to reevaluate a lot of notions, starting from authorship to the mechanisms of cultural production. At the same time, I think we should be critical about how it's being developed and used.