LLMs ("large language models") need a lot of input to generate their statistical text models. Orders of magnitude more than is contained in a single webcomic, even one as large as Achewood. The way this works is, roughly, you train a "base model" on approximately the entire internet. Then you take this "base model" and you add additional training on top to give it a specific flavor, such as "Ray Smuckles". The article alludes to this:
> The first challenge was formatting the data in a way the language model could use. Then, there was the matter of picking an underlying language model to train with Ray’s voice. OpenAI’s ChatGPT was a little too sanctimonious for Ray, who likes to color outside of the lines, Hall says. They wound up using a fine-tuned version of OpenAI’s Davinci, which Hall estimates is about 60 times more expensive than ChatGPT.
So, this is not just a matter of "he's only using his own writing so it's fine". The model Onstad is working with is exactly as plagiaristic as anything else OpenAI has put out; it just adds a layer of Smucklesiness on top. Whether you think "training a statistical model on the entire internet without authors' consent" is specifically plagiarism, otherwise exploitative, or totally fine is up to you. But you can't draw a clean line and say "training AI nonconsensually is bad, but what Onstad is doing is okay."
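For the curious, the fine-tuning flow described there ran through OpenAI's legacy fine-tunes API, which has since been deprecated. Here is a minimal sketch of what it looked like, assuming the pre-1.0 openai Python client; the file name and the example prompt/completion pair are invented for illustration:

    import os
    import openai  # legacy pre-1.0 client, which exposed File and FineTune

    openai.api_key = os.environ["OPENAI_API_KEY"]

    # The legacy API expected JSONL, one prompt/completion pair per line, e.g.:
    #   {"prompt": "BEEF: what is the plan tonight\nRAY:", "completion": " ..."}
    # (the pair above is invented for illustration)
    training_file = openai.File.create(
        file=open("ray_smuckles_dialogue.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Kick off a fine-tune of the davinci base model named in the article.
    job = openai.FineTune.create(
        training_file=training_file.id,
        model="davinci",
    )
    print(job.id)

On the cost point: at the time, fine-tuned davinci usage was priced around $0.12 per 1k tokens versus $0.002 for ChatGPT's gpt-3.5-turbo, which is presumably where the "about 60 times more expensive" estimate comes from.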
Once upon a time we were looking at selecting an ML vendor for a project, and the minimum number of data instances the vendors required for decent training was as follows:
10k, 120k, and 500k
And this was for something where the patterns are pretty simple to spot.
One of these things is not like the others, and we pretty quickly rejected the vendor who confidently called 10k "big data". The 500k vendor cautioned that even that much is likely to fall short if the data doesn't cover enough of the problem space, and the 120k vendor's caveat was "if you have perfect data"... which wasn't exactly comforting, either.
Producing the volume of writing a model needs just to function simply isn't on a "single human" scale.
