mtrc
@mtrc

Recently I've been approached by different news organisations to comment on deepfaked images and videos. In most cases they already know whether the thing is a fake or not, but they want to know why. It's been a pretty fascinating thing to be tasked with, honestly, and some of the examples completely caught me by surprise (warning: Daily Mail link). Many of us see faked images on a daily basis now, but there's not a lot of writing about how fakes are detected other than one-liner folk knowledge on places like Twitter. I thought I'd write a bit about how I approach the problem.


I should stress before I go on - I'm not an 'expert' at detecting deepfakes; I don't think anyone really is today, as it's just not something people have particularly had to specialise in. I do end up looking at a lot of AI-generated art as part of my job, so I maybe have a little more experience than average, but I want to be clear that I'm not saying these are foolproof ways to detect fakes. They're just a second opinion, for you to take and think about and combine with your own experience. Ok, enough disclaimers, let's dive in.

Text

Text is still one of the hardest things for AI image generators to do, and even the models that do it well now often struggle in photorealistic settings (also a lot of fakes aren't made with the most cutting-edge commercial models). Text is a good thing to check because it needs to be both structurally correct - the letters need to look correct, the font needs to be consistent - and semantically correct - it needs to mean something in a real language humans speak.
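As an aside, the semantic half of this check is easy to approximate in code. Here's a rough sketch using OCR - it assumes you have pytesseract and Pillow installed and a system wordlist to hand, and the filename and the 50% threshold are invented purely for illustration:

```python
# A rough sketch of automating the 'semantic' half of the text check.
# Assumes pytesseract + Pillow are installed and a wordlist exists at
# /usr/share/dict/words (common on Linux/macOS). The filename and the
# 50% threshold are made up for illustration.
import re

import pytesseract
from PIL import Image

def load_dictionary(path="/usr/share/dict/words"):
    with open(path) as f:
        return {line.strip().lower() for line in f}

def gibberish_ratio(image_path, dictionary):
    """Fraction of OCR'd tokens that aren't real dictionary words."""
    text = pytesseract.image_to_string(Image.open(image_path))
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]{2,}", text)]
    if not tokens:
        return 0.0  # no legible text found; this check tells us nothing
    unknown = [t for t in tokens if t not in dictionary]
    return len(unknown) / len(tokens)

dictionary = load_dictionary()
ratio = gibberish_ratio("suspect_photo.jpg", dictionary)
if ratio > 0.5:
    print(f"{ratio:.0%} of the detected text is gibberish - worth a closer look")
```

Of course, OCR struggles with exactly the sort of warped lettering we're trying to catch, so a high score here is a reason to look closer, not proof on its own.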

The biggest tell in the recent Trump deepfake that hit the news headlines was text - look at the hat being worn in the top-left of the image, and to a lesser extent on the shirt right-of-center:

Printed text in particular has high regularity requirements - letters will be the same size, aligned with one another, all consistently styled. In the hat here we can see the letters don't look like letters, and they aren't centred as we'd expect them to be either. It's obviously possible to make a hat like that (actually making clothing that intentionally has what looks like deepfaked text would be pretty funny) but it's not usual for clothing, so it's a red flag.

Continuity Errors

Check out this image of Joe Biden. It's fake:

There are lots of ways that things in images can be connected to one another. For example, in the image above, we would find it weird if the people in the background were wearing clown costumes instead of US military uniforms. The people are connected through the meaning of the image, there's a consistency we expect. Connections can also be very fine-grained - on the US flag patch on Biden's arm, we expect each stripe on the flag to alternate white and red and be perfectly horizontal. These are examples of things that are loosely connected over a large distance (the people in the image) and more tightly connected to their immediate surroundings (the individual stripes on a flag).

AI image generators can handle both types of context, but they sometimes struggle when the two are combined - if something has to have a consistent connecting detail across a gap, or a big portion of the image. Look at the line on the inside edge of this plate, where it meets the pastry:

If we zoom in really close, we can see there's a line that follows the edge of the plate, curving, disappearing under the pastry. But it doesn't emerge out the other side. In fact, if you look really closely, it stops abruptly just before it meets the pastry.

This is quite a subtle detail to pick up, but it's something you sometimes see when a shape is interrupted by something else. To us, it's obvious the shape should continue underneath the object and appear out of the other side. I like to think of this as the image generator wrestling between making the image more coherent locally, and more coherent globally. It's tried to match the curve of the bottom of the pastry and align the line on the plate with it, instead of realising that the line is part of a circular pattern on the plate itself. You can see a similar effect in the same image in a wider shot. Check out the grain on the table:

Intuitively we know that wooden tables should have fairly consistent lines marking where pieces of wood were joined together. We can accept if they aren't perfectly aligned because the table might have a more rustic look, but the line separating two bits of wood in the top part of the table simply never emerges on the other side of the plate.

Again, it's not a guaranteed, damning bit of evidence, but it's something to look out for - shapes that you know should be completed, but that are obscured or covered by something else. For a more obvious example, I won't embed this here (cw: cats in distress, although it is fake) but here's a fake photo of two cats hugging amidst wreckage in a war. You can see that one of the paws supporting the ginger cat is at completely the wrong angle, because the bodies of both cats are obscuring it. To the AI, it could be connected to either cat - but the cat it appears to be connected to already has two forelegs visible.

Context Interference

AI image generators aren't very good at drawing archers. In fact, I was surprised to find people complaining about this specific use-case online:

There's actually something really interesting about the example images attached to the Reddit post, something that I see a lot with all sorts of deepfakes. Check this out and see what stands out to you (I mean in fairness there's a lot going on, this is not a good bit of AI art):

Many modern AI image generators use a process called diffusion, which starts with a noisy image and slowly removes the noise, replacing it with pixels that make it look more and more like an image that suits the input. We can imagine it like a big array of sensors, each one measuring something different about the image, all trained to look for things we associate with images labelled as 'archers'. Because of the images we've been trained on, we'll probably look for forests and the colour green, hoods, bows, that sort of thing. We can imagine each one of these features gets its own little sensor (this is not how it works, of course, it's just an analogy) that detects its specific archer-related tell.
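If it helps, here's a toy caricature of that analogy in code. To be doubly clear, nothing here resembles a real diffusion model - the 'sensors' are made-up stand-ins for billions of learned parameters - but it captures the flavour of nudging noise towards whatever the detectors score highly:

```python
# A toy caricature of the 'sensor' analogy above. This is NOT how
# diffusion models actually work - it's a hill-climbing sketch where
# made-up 'sensors' score archer-ness and we keep any pixel changes
# that raise the combined score.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensors: each rewards one feature we associate with
# images labelled 'archers'.
def greenness(img):
    return img[..., 1].mean()  # forests are green

def smooth_curves(img):
    # Reward gentle vertical gradients - a crude stand-in for bow-like shapes.
    return -np.abs(np.gradient(img[..., 0], axis=0)).mean()

sensors = [greenness, smooth_curves]

def score(img):
    return sum(s(img) for s in sensors)

image = rng.normal(0.5, 0.5, size=(64, 64, 3))  # start from pure noise

for step in range(1000):
    # Jostle the pixels; keep the change if the sensors like it more.
    proposal = image + rng.normal(0.0, 0.02, size=image.shape)
    if score(proposal) > score(image):
        image = proposal
```

The important part is the tension: every sensor is pulling on the same pixels at once, so an ambiguous blob eventually gets committed to whichever interpretation scores best - which is exactly the bowstring-into-belt failure we're about to see.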

The image generator can't satisfy all of our Archer Detectors at once, but by trying to jostle the pixels around and maximise its score, it ends up finding its way to an image that scores relatively highly (this is linked to why AI image generators are good at finding images which are simultaneously novel and not innovative at all). Anyway, imagine we're halfway through this process and you, the AI, are looking at an image like this:

You know that archers hold bows which have strings on, and you also know they are often wearing fantasy leather armour with lots of belts on. The bit circled in red here is pleasing both aspects of your archer detector right now. But you've got a problem: you need to keep removing noise and making the image more 'final'. Bowstrings and belts look similar in a blurry, noisy image but they don't look the same at all in a finished one. The AI decides to turn it into a belt, and you get what you see in the finished piece - which is a belt that almost perfectly follows the shape of where the bowstring should be. It's not that the bowstring is missing, it's just been turned into something else entirely. I've tried to highlight it here:

This is a really interesting mode of failure to me. It's also vanishingly rare in real photographs, especially candids, where things are unposed and tend to be randomly arranged. In photographs I most often see it in folds of clothing or wires and cables, which often get confused for one another. You can sort of see another example of it here: a phone cable is draped over Biden's arm, seemingly leading to nothing. It's likely that at some point this also looked a bit like a shadow in the folds of his sleeves, or something else that eventually got completed into a cable.

Texture, Lighting and Composition

AI image generators have an interesting tendency to always try and make pretty images. Not accidentally - this is the result of tweaking them towards making 'good' output, because that's often what people want. I've experienced this myself - when I was testing some generators last year to make slides for my New Scientist talk, I tried very hard to create an image of a particular scene without dramatic sunset lighting, but no matter what I did I simply couldn't manage it. Many generators are engineered, fine-tuned or otherwise tweaked to juice the input prompt you give them.

As a result, one of the biggest tells I find in low-effort fakes is the textural qualities of the image. The only problem is that it's quite hard to explain or teach someone what to look for - either it looks right to you or it doesn't. For example, look at this second image from Trump's recent AI deepfake set:

This looks more like a digital painting than a photograph. Most portraiture, especially quick stuff done for the news or political campaigns, or candids shot by the public, does not come out slick and fuzzy like this. There are no textural flaws on anyone's skin or clothes; everything is smooth and evenly lit, with no blemishes, tears or even strange glares. Real photos almost never look like this.

Here's another image, allegedly a portrait photo of two people on their wedding day. Again, the lighting and skin texture are completely off. Even models on a photoshoot do not look like this without extensive photo editing after the fact, which is simply unlikely here. The problem is that not only is this hard to teach someone, it's also not very compelling as evidence. But it's a good thing to train your eye on, as a first-round suspicion of sorts.

Sleight of Hands

You'll notice I didn't mention fingers, hands or arms anywhere in this post. That's because while they're a common trope about AI, thinking about fakes only in terms of these specific examples leaves us open to being tricked by AI organisations who simply focus their efforts on those specific issues - and they've done just that. Prompters began to shift towards prompts which hid hands, and more recent models are much better at rendering them. Having too many fingers or hands is a snappy thing everyone remembers about AI generators, but it's important that we try and learn why they were tells in the first place. If we start to think in more general terms about what AI struggles with and how to spot it, we can keep looking for new examples, even when the old ones get fixed.

AI technology is always shifting, so even these guidelines won't be appropriate forever. But the more you learn about how AI systems work, what they try to optimise for, and the things they cannot do, the better you'll eventually be able to reason about these things yourself. In the meantime, don't be drawn into making generalisations and assumptions about AI technology, because I guarantee you those same assumptions will be used to trick you before long.

Video may or may not become a new field for us soon. Sora made a big splash when OpenAI announced it recently, but video is an order of magnitude harder than images, and comes with a lot of new complications. We can generate audio, but there aren't systems that do both at once yet, and then there are issues of lip syncing and so on. It's definitely possible to compose all of this stuff together right now, but we're not at a point where it's so trivial that the Internet is flooded with it. When that arrives, we'll probably be able to apply some of the ideas we've listed here, but we'll also need new ones too.

Conclusions

You can't ever know if something is really real or not on the Internet any more - but that has, to some extent, been true for a long, long time. It's also very easy to fool most people with things that don't even look particularly real. What we're experiencing now isn't exactly new, it's just a gear shift that we aren't used to. I see very smart people regularly share TikTok content that is clearly staged or fake and believe it's real - the Internet is and always has been full of lies, and that's also where a lot of its charm and playfulness comes from! I'm only saying all this because I don't want you to despair that this is the end of truth - you will adjust, you will learn to spot new things, and you'll also learn to not trust certain sources that you maybe did trust before. Ultimately that might be a good thing, because pre-genAI we probably fell for a lot of lies without even realising, so a bit of a wakeup call might help us in the long run.

In any case, I hope you found this interesting and my examples helpful! They are not clear-cut rules, just some insight into how I think about some of this stuff. Good luck out there and stay safe.

Thanks to Chris and Fed for feedback on an initial draft of this piece.



in reply to @mtrc's post:

The thing I've also started looking out for is kind of adjacent to your example of everything looking too perfect: the hair. A lot of AI generators love to make hair look so perfectly brushed and combed that it's almost totally evenly spaced. Look at the wedding photo here specifically: while the AI is good at stray hairs around the edges of the guy's head, his sideburns and beard are almost mechanically separated, nearly perfect. It's as if 80% of that man's head was hair plugs. The first Biden fake is the same way: lots of stray hairs around the edge highlights, but otherwise combed almost supernaturally even.

Very interesting writeup!
On the texture etc. points, I often wonder whether an image looks so unrealistically clean because it's AI-made or not, because for all I know it could also just be someone overdoing it in Photoshop, and there was a lot of that a decade or so ago. I don't put it past political communication teams to polish a candidate so much that it looks like a self-conscious teenager's selfie.
It also made me think of something that probably has little to do with it: when I draw pictures, I'm liable to make mistakes similar to AI tells sometimes, especially when inking while inattentive. Continuity mistakes, filling spaces with random shapes that just look right, etc.
It's both artificial and kinda human.

This is true! I think Photoshop can create a lot of these effects - especially the soft light glow and smooth texturing - but it's unusual in a lot of cases. Political campaign photos, for example, are usually more raw, unless maybe they're on a big flashy billboard or something. Even then it's quite unusual to gloss them to that level of shine.

You're absolutely right about human error being similar too, by the way! Of course it's hard to sustain that error in a photorealistic painting without realising, which is why it's an easier tell.

What we're experiencing now isn't exactly new, it's just a gear shift that we aren't used to.

This is such a key point IMO. I'm old enough to remember when Photoshop first took off, and the associated concerns about "you won't know what to believe anymore". And yet somehow society did not collapse. We're not used to e.g. audio deepfakes yet, but we will be, and we'll continue to need the same rubrics of "who do you trust? why?" that have always existed.

Yes I mean, we have to be careful not to overgeneralise, but people are being tricked today by simple spam emails or popup ads or whatever, without any deepfake stuff at all. That said, our response to this shouldn't be "Well it'll be fine probably" but instead to think about how we can combine all of these experiences together and improve our odds. How do we stop scams currently? What needs to change or what can we learn from that?

Oh for sure! So many of the "fake photos" that I've seen lately were real photos being presented as something other than they were (like, "here's a photo of the crowd at event <x> in city <y> last week" --> "no actually that is a picture of a crowd attending event <z> in a different city ten years ago").

I think the core is ultimately not "do I trust this image?" in a vacuum, but "do I trust the person presenting this image to me? What is the source of this image? Can I verify it?", and building that skillset is an important first step. Because it might be AI, but it might also be photoshopped or recontextualized.

fantastic guide! and also genuinely interesting--if only ai wasn't coopted for so many malicious purposes, there's some really fascinating tech under there. and in contrast it teaches us a lot about seeing and the logic of how our brains process images. really cool stuff!!

Yes! Something I've been thinking about a lot lately, looking at the (valid, of course) criticism of a lot of AI experiments, is that if the same experiment had been done in like 2014 by a plucky hacker in their bedroom, people would be pretty psyched about it. It's just that because it's being done on top of this layer of capitalism that we know threatens us, it's hard to enjoy even the little bits of fun or excitement to be had.

Another analogy I use for people is to think about an artist painting themselves into a corner. Since AIs have no ability to plan ahead, a line going one way suddenly needs some kind of resolution. Why is this like this? Why is the shadow going over here? And then it does its absolute best to make it retroactively make sense -- but only within the context of what's near it ("near" being variable with the power of the generator you're using).

This is just another way of talking about the bow example, the line on the plate, the phone cord to nowhere, etc. Imagine the most rushed artist in the world, faced with a splotchy painting where paint can never be removed, with such intense tunnel vision that they can only see a sixteenth of the painting at a time, doing their best under intense pressure.

This same sort of retroactive sensemaking is how language models work too. I don't think AIs produce human work, but I do think there are interesting thoughts they can provoke about how we also rationalize our own behaviour, at least. Furiously constructing a complex set of lies and excuses for how it's very reasonable that we ended up in this situation...

Yes, I like this connection to lies, because I think people do understand that the chatbots are lying to them, but it's harder to understand for images (because all images are lies in some way or another, I guess). But I think this is a good way of thinking about it. I avoided talking about locality too specifically in the post because the AI can make connections across larger distances, but the semantic strength of those connections is weaker (and it has to be - it's not a weakness of the system per se; if they made it super strong it would just cause other weird artefacts, I think).

I mean, something we also haven't gotten into in these posts is that the fundamental idea behind image generation is impossible. There's no way to correctly draw an image in response to the prompt "a photo of a wedding". There's too much unsaid! You can have a dialogue about a photo of a wedding, especially if the artist is a human and you understand each other. But the base idea of these tools is kind of philosophically flawed.

Thanks for reading and commenting! :)

For me, ears tend to be a pretty big giveaway, provided the photo has them in shot. I suspect they're a bit of a nightmare case for AI generators: ears have lots of detail, but unless there is something odd about them, no-one takes photos of ears, so they'll be underrepresented in training data.

To me, a lot of AI generated ears look like props or grafted on - for example, the second Biden photo looks like it's halfway to a Spock ear.

Yes, the ear was actually one of the things I ended up highlighting for the people who sent me that photo! It wasn't the slam dunk tell I expected though - Biden does have ears... not unusual ears I guess but like, I didn't expect them to look as similar to that photo as they do. They're not identical but they're not as far off as I thought. Enough to be a tell though :)

This looks more like a digital painting than a photograph.

This is also something I've struggled to articulate. It's as if everyone is under studio lighting, everything's soft and diffuse, but nothing matches up. There are just perfect highlights and glints, and never any shadows.

There are also vibes I get where the depth of field seems off, but neither feels easy to put into words. That, and these are just as often signs of photoshop as they are of midjourney.

I did wonder if you could train AI to detect consistency in things like focus or depth of field - essentially estimating the scene's composition and trying to work out if it's real or not. I notice a lot of the fake photos now use a very narrow depth of field because blurry backgrounds hide most mistakes, too.
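You could get a crude version of the idea with classical tools, even. Here's a sketch using OpenCV's variance-of-Laplacian sharpness measure to map which parts of an image are in focus - the filename, tile size and threshold are all made up, and a real depth-of-field consistency check would need far more than this:

```python
# A crude sketch of mapping local sharpness across an image using the
# variance-of-Laplacian measure, a common blur heuristic. Assumes
# OpenCV is installed; the filename, tile size and 5% threshold are
# invented for illustration.
import cv2
import numpy as np

img = cv2.imread("suspect_photo.jpg", cv2.IMREAD_GRAYSCALE)

tile = 64
rows, cols = img.shape[0] // tile, img.shape[1] // tile
sharpness = np.zeros((rows, cols))

for r in range(rows):
    for c in range(cols):
        patch = img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        # High variance of the Laplacian = lots of fine edges = in focus.
        sharpness[r, c] = cv2.Laplacian(patch, cv2.CV_64F).var()

# If almost every tile is blurry except a small island, the image may
# be hiding its mistakes behind a very narrow depth of field.
blurry = (sharpness < sharpness.max() * 0.05).mean()
print(f"{blurry:.0%} of the image is strongly out of focus")
```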

This is a great write-up - I think you sell yourself short a little! You clearly know what you're talking about, and I learned something about how these models work. Thanks for the information; I think you've framed this in a very useful way.

Ah that's kind of you to say! I prefer to be cautious about my expertise because a) there's a lot of posing and intimidation that goes on with scientists/AI people and it's really toxic and makes people feel bad, and b) I don't want people to think that because I work "in AI" anything I say about AI is gospel, cos that's dangerous too. But I'm glad you think it was useful, truly :)

Another example of the generator not being good at replicating very regimented detail like text, which you can see in the Biden images: the flag! The US flag on his sleeve has only eleven stripes, and the canton with the stars is a complete mess. And since it is missing two stripes at the bottom, the flag visually has the wrong aspect ratio too.

Though I find it interesting that it made the same mistake in both of them.

yeah, i notice a lot in, say, fantasy-themed ai art, it loves to put fancy little filigrees and patterns and emblems and flourishes and baubles on things, but when you actually stop, zoom in and go "hey what is this pattern actually", it's rorschach blob nonsense. you can see it with the plate pattern here: the shapes imply a fancy floral pattern of some kind, but it's really just very detailed, non-regular scribbles.

i don't really know how reliable a rule that is? bullshitting the fancy details is something that real artists sometimes do lol. but there's also a thought process of, "ok why would a real artist completely bullshit this golden swirly filigree to save time, but then also go to the trouble of rendering detailed highlights off the bullshit filigree"

Definitely seems like a more reliable tell on supposed photos. An artist putting the wrong number of stripes on a flag or whatever? Yeah that happens. But any official material (say, a uniform) that is photographed would not be getting such details wrong.

I guess that's an area where we are lucky that one of the countries most likely to be targeted for this due to its importance also has a fairly complex flag.

So yes I agree with all of this actually. One example I didn't use is that the plates in the source of that AI image (which I'm going to write more about) often aren't round. And this sort of defies most of our expectations about plates but... it is possible they're just arty weird plates. So it's hard to tell sometimes, and you want to rely on the most guaranteed tells possible.

Yeah so I actually learned a fun new thing from the Biden exercise - I initially noticed, oh, the flag is backwards. But after doing some research I learned that's actually how patches are placed on arms in the US, stars forward. So that was a fun example of where I thought I'd found a tell but I simply didn't have the right cultural knowledge. I had absolutely not noticed the missing stripes either (I did know that there are thirteen stripes I guess, but only as like, a buried pub quiz fact rather than instinctual knowledge)

Imo this tech is fascinating and something they should be studying in university labs, not something desperately marketed to industry and the public by capitalist ghouls who may be cultists (have you read some of this AGI believer stuff? Utterly bonkers.)

Yeah I mean, it's genuinely honestly sad to me that the tech has ended up where it is. The generative community, pre-large machine learning models, was joyful and inventive, and a lot of us are still keeping that going. There's a lot of interesting and cool things to be learned from these systems still. But as you say, it's being run by sociopaths and people with god complexes.

People talk about the hands and letters. The obvious problem is that if someone really WANTS you to fall for a fake, and has even a small amount of talent, they can load it up in Photoshop and fix those problems. Still a fake, it's just taken a little bit more effort than just typing a line of text in. But not as much as making the whole thing from scratch.

That said, I've not seen any actual PROPER fakes like this, where a skilled human fixed the obviously broken stuff up afterwards. It's just not worth the time, effort and money in most cases, because even the slightest fact-checking (was Biden wearing uniform? When? And why?) reveals it's a fake anyway.

That's the funny thing. Those Trump "photos". Just take some stock photos and paste Trump's head in with MSPaint. Just as unconvincing, but in a different way - at least the text is correct! And just as effective at influencing those who don't even think, or want, to cast a critical eye on them.

Yeah I agree, it's a great point and I think there's a lot to dig into with the psychology of fakes - who makes them, why, what resources do they have and what tradeoffs do they accept. If the CIA want to fake something they aren't gonna use Midjourney. So who is using it, and what things are they not gonna bother doing? The Royal Family photo recently shows that a lot of people just can't be arsed (or don't have the time/money) to check their work in many cases.