mtrc
@mtrc

Recently I've been approached by different news organisations to comment on deepfaked images and videos. In most cases they already know whether the thing is a fake or not, but they want to know why. It's been a pretty fascinating thing to be tasked with, honestly, and some of the examples completely caught me by surprise (warning: Daily Mail link). Many of us see faked images on a daily basis now, but there's not a lot of writing about how fakes are detected other than one-liner folk knowledge on places like Twitter. I thought I'd write a bit about how I approach the problem.




in reply to @mtrc's post:

The thing I've also started looking out for is kind of adjacent to your example of everything looking too perfect: the hair. A lot of AI generators love to make hair look so perfectly brushed and combed that the hair is almost totally evenly spaced. Look at the wedding photo here specifically: while the AI is good at stray hairs around the edges of the guy's head, his sideburns and beard are almost mechanically separated to be nearly perfect. It's as if 80% of that man's head was hair plugs. The first Biden fake is the same way: lots of stray hairs around the edge highlights, but otherwise it's combed almost supernaturally evenly.

Very interesting writeup!
On the texture points, I often wonder whether an image looks unrealistically clean because it's AI-made or because, for all I know, someone just overdid it in photoshop - there was a lot of that a decade or so ago. I wouldn't put it past political communication teams to polish a candidate so much that it looks like a self-conscious teenager's selfie.
It also made me think of something that probably has little to do with it: when I draw pictures, I'm liable to make mistakes similar to AI tells, especially when inking while inattentive - continuity mistakes, filling spaces with random shapes that just look right, and so on.
It's both artificial and kinda human.

This is true! I think photoshop can create a lot of these effects - especially the soft light glow and smooth texturing - but it's unusual in a lot of cases. Political campaign photos for example are usually more raw, unless maybe they're in a big flashy billboard or something. Even then it's quite unusual to gloss it to that level of shine.

You're absolutely right about human error being similar too, by the way! Of course it's hard to sustain that error in a photorealistic painting without realising, which is why it's an easier tell.

What we're experiencing now isn't exactly new, it's just a gear shift that we aren't used to.

This is such a key point IMO. I’m old enough to remember when Photoshop first took off, and the associated concerns about “you won’t know what to believe anymore”. And yet somehow society did not collapse. We’re not used to e.g. audio deepfakes yet, but we will be, and we’ll continue to need the same rubrics of “who do you trust? why?” that have always existed.

Yes I mean, we have to be careful not to overgeneralise, but people are being tricked today by simple spam emails or popup ads or whatever, without any deepfake stuff at all. That said, our response to this shouldn't be "Well it'll be fine probably" but instead to think about how we can combine all of these experiences together and improve our odds. How do we stop scams currently? What needs to change or what can we learn from that?

Oh for sure! So many of the "fake photos" that I've seen lately were real photos being presented as something other than they were (like, "here's a photo of the crowd at event <x> in city <y> last week" --> "no actually that is a picture of a crowd attending event <z> in a different city ten years ago").

I think the core is ultimately not "do I trust this image?" in a vacuum, but "do I trust the person presenting this image to me? What is the source of this image? Can I verify it?", and building that skillset is an important first step. Because it might be AI, but it might also be photoshopped or recontextualized.

fantastic guide! and also genuinely interesting - if only ai wasnt coopted for so many malicious purposes, there is some really fascinating tech under there. and in contrast it teaches us a lot about seeing and the logic of how our brains process images. really cool stuff!!

Yes! Something I've been thinking about a lot lately, looking at (valid, of course) criticism of a lot of AI experiments, is that if the experiment had been done in like 2014 by a plucky hacker in their bedroom, people would be pretty psyched about it. It's just that because it's being done on top of this layer of capitalism that we know threatens us, it's hard to enjoy even the little bits of fun or excitement to be had.

Another analogy I use for people is to think about an artist painting themselves into a corner. Since AIs have no ability to plan ahead, a line going one way suddenly needs some kind of resolution. Why is this like this? Why is the shadow going over here? The model then does its absolute best to make it retroactively make sense - but only within the context of what's near it ("near" being variable with the power of the generator you're using).

This is just another way of talking about the bow example, the line on the plate, the phone cord to nowhere, etc. It's the most rushed artist in the world, faced with a splotchy painting where paint can never be removed, with such intense tunnel vision that they can only see a sixteenth of the painting at a time, doing their best under intense pressure.

This same sort of retroactive sensemaking is how language models work too. I don't think AIs produce human work, but I do think they can provoke interesting thoughts about how we rationalize our own behaviour, at least. Furiously making a complex set of lies and excuses for how it's very reasonable that we ended up in this situation...

Yes, I like this connection to lies, because I think people do understand that the chatbots are lying to them, but it's harder to understand for images (because all images are lies in some way or another, I guess). But I think this is a good way of thinking about it. I avoided talking about locality too specifically in the post because the AI can make connections across larger distances, but the semantic strength of those connections is weaker (and it has to be - it's not a weakness of the system per se; if they made it super strong it would just cause other weird artefacts, I think).
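To make that locality trade-off a bit more concrete - and this is purely a toy illustration of generic scaled dot-product attention in NumPy, my own sketch rather than anything from a real image model - the connection weights are a softmax, so they always sum to one. The more positions there are to attend over, the thinner the strongest individual link tends to get spread:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_weights(n_positions, d=16):
    """Scaled dot-product attention weights for one query over n keys."""
    q = rng.normal(size=d)
    keys = rng.normal(size=(n_positions, d))
    scores = keys @ q / np.sqrt(d)      # similarity of the query to each key
    w = np.exp(scores - scores.max())   # numerically stable softmax
    return w / w.sum()                  # weights sum to 1 by construction

# More positions to attend over -> the strongest single connection shrinks.
for n in (16, 256, 4096):
    print(f"{n:5d} positions, strongest link: {attention_weights(n).max():.3f}")
```

Real models push against this with multiple heads and stacked layers, but the basic "global reach, diffuse strength" trade-off is baked into the mechanism itself.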

I mean, something else we haven't gotten into in these posts is that the fundamental idea behind image generation is impossible. There's no way to correctly draw an image in response to the prompt "a photo of a wedding". There's too much unsaid! You can have a dialogue about a photo of a wedding, especially if the artist is a human and you understand each other. But the base idea of these tools is kind of philosophically flawed.

Thanks for reading and commenting! :)

For me, ears tend to be a pretty big giveaway, provided the photo has them in shot. I suspect they're a bit of a nightmare case for AI generators: ears have lots of detail, but unless there's something odd about them, no-one takes photos of ears, so they'll be underrepresented in training data.

To me, a lot of AI generated ears look like props or grafted on - for example, the second Biden photo looks like it's halfway to a Spock ear.

Yes, the ear was actually one of the things I ended up highlighting for the people who sent me that photo! It wasn't the slam dunk tell I expected though - Biden's real ears aren't unusual I guess, but I didn't expect them to look as similar to the ones in that photo as they do. They're not identical, but they're not as far off as I thought. Enough to be a tell though :)

This looks more like a digital painting than a photograph.

This is also something I've struggled to articulate. It's as if everyone is under studio lighting, everything's soft and diffuse, but nothing matches up. There's just perfect highlights and glints, and never any shadows.

There's also vibes I get where the depth of field seems off, but neither feels easy to put into words. That, and these are just as often signs of photoshop as they are midjourney.

I did wonder if you could train an AI to detect consistency in things like focus or depth of field - essentially estimating the scene's composition and trying to work out if it's real or not. I notice a lot of the fake photos now use a very narrow depth of field because blurry backgrounds hide most mistakes, too.
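As a very rough sketch of the idea - my own toy illustration, with a hypothetical filename and a crude Laplacian-variance measure standing in for a real focus estimator - you could start by mapping where an image is actually sharp and asking whether the blur looks like it came from a single focal plane:

```python
import cv2
import numpy as np

def sharpness_map(image_path, patch=32):
    """Local sharpness via Laplacian variance: high = in focus, low = blurry."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    rows, cols = gray.shape[0] // patch, gray.shape[1] // patch
    smap = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = gray[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            smap[r, c] = cv2.Laplacian(tile, cv2.CV_64F).var()
    return smap

# A real lens blurs things smoothly with distance from the focal plane, so
# isolated sharp islands inside an otherwise blurry region (or vice versa)
# are the kind of inconsistency a trained detector could learn to score.
smap = sharpness_map("suspect.jpg")  # hypothetical input image
outliers = smap > smap.mean() + 2 * smap.std()
print(f"{outliers.sum()} unusually sharp patches out of {smap.size}")
```

A real detector would estimate scene depth and compare it against the blur map, but even something this crude makes "one coherent focal plane" versus "patchwork of sharp islands" fairly visible.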

This is a great write-up - I think you sell yourself short a little! You clearly know what you're talking about, and I learned something about how these models work. Thanks for the information; I think you've framed this in a very useful way.

Ah that's kind of you to say! I prefer to be cautious about my expertise because a) there's a lot of posing and intimidation that goes on with scientists/AI people and it's really toxic and makes people feel bad, and b) I don't want people to think that because I work "in AI" anything I say about AI is gospel, cos that's dangerous too. But I'm glad you think it was useful, truly :)

Another example of the generator not being good at replicating very regimented detail like text, which you can see in the Biden images: the flag! The US flag on his sleeve has only eleven stripes, and the canton with the stars is a complete mess. And since it's missing two stripes at the bottom, the flag visually has the wrong aspect ratio too.

Though I find it interesting that it made the same mistake in both of them.
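Regimented details like this are also easy to check mechanically, for what it's worth. Here's a toy sketch - the crop filename and the red-vs-white threshold are both made up, so treat it as an illustration rather than a tool - that counts colour bands along one vertical scanline through the striped part of a flag crop:

```python
import cv2
import numpy as np

def count_bands(crop_path, column_frac=0.9):
    """Count red/white bands along one vertical scanline of a flag crop."""
    img = cv2.imread(crop_path)  # OpenCV loads channels as BGR
    if img is None:
        raise FileNotFoundError(crop_path)
    x = int(img.shape[1] * column_frac)  # sample to the right of the canton
    line = img[:, x, :].astype(np.float32)
    red, blue = line[:, 2], line[:, 0]
    is_red = red > blue + 40             # crude red-vs-white split; tune per image
    changes = np.count_nonzero(np.diff(is_red.astype(np.int8)))
    return changes + 1                   # n transitions => n + 1 bands

print(count_bands("flag_patch.jpg"), "bands - a US flag should have 13")
```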

yeah, i notice a lot in say, fantasy-themed ai art, it loves to put fancy little filigrees and patterns and emblems and flourishes and baubles on things, but when you actually stop, zoom in and go "hey what is this pattern actually" it's rorschach blob nonsense. you can see it with the plate pattern here, the shapes imply a sort of fancy floral pattern of some kind but it's really very detailed non-regular scribbles.

i dont really know how reliable a rule that is? bullshitting the fancy details is something that real artists sometimes do lol. but theres also a thought process of, "ok why would a real artist completely bullshit this golden swirly filigree to save time, but then also go to the trouble of rendering detailed highlights off the bullshit filigree"

Definitely seems like a more reliable tell on supposed photos. An artist putting the wrong number of stripes on a flag or whatever? Yeah that happens. But any official material (say, a uniform) that is photographed would not be getting such details wrong.

I guess that's an area where we are lucky that one of the countries most likely to be targeted for this due to its importance also has a fairly complex flag.

So yes I agree with all of this actually. One example I didn't use is that the plates in the source of that AI image (which I'm going to write more about) often aren't round. And this sort of defies most of our expectations about plates but... it is possible they're just arty weird plates. So it's hard to tell sometimes, and you want to rely on the most guaranteed tells possible.

Yeah so I actually learned a fun new thing from the Biden exercise - I initially noticed, oh, the flag is backwards. But after doing some research I learned that's actually how patches are placed on arms in the US, stars forward. So that was a fun example of where I thought I'd found a tell but I simply didn't have the right cultural knowledge. I had absolutely not noticed the missing stripes either (I did know that there are thirteen stripes I guess, but only as like, a buried pub quiz fact rather than instinctual knowledge).

Imo this tech is fascinating and something they should be studying in university labs, not something desperately marketed to industry and the public by capitalist ghouls who may be cultists (have you read some of this AGI believer stuff? Utterly bonkers.)

Yeah I mean, it's genuinely honestly sad to me that the tech has ended up where it is. The generative community, pre-large machine learning models, was joyful and inventive, and a lot of us are still keeping that going. There's a lot of interesting and cool things to be learned from these systems still. But as you say, it's being run by sociopaths and people with god complexes.

People talk about the hands and letters. The obvious problem is that if someone really WANTS you to fall for a fake, and has even a small amount of talent, they can load it up in Photoshop and fix those problems. Still a fake, it's just taken a little bit more effort than just typing a line of text in. But not as much as making the whole thing from scratch.

That said, I've not seen any actual PROPER fakes like this, where a skilled human fixed the obviously broken stuff up afterwards. It's just not worth the time, effort and money in most cases, because even the slightest fact-checking (was Biden wearing uniform? When? And why?) reveals it's a fake anyway.

That's the funny thing. Those Trump "photos"? Just take some stock photos and paste Trump's head in with MSPaint. Just as unconvincing, but in a different way - at least the text is correct! And just as effective at influencing those who don't even think to cast a critical eye on them, or don't want to.

Yeah I agree, it's a great point and I think there's a lot to dig into with the psychology of fakes - who makes them, why, what resources they have and what tradeoffs they accept. If the CIA want to fake something they aren't gonna use Midjourney. So who is using it, and what things are they not gonna bother doing? The recent Royal Family photo shows that a lot of people just can't be arsed (or don't have the time/money) to check their work.

in reply to @zaratustra's post:

this, though, seriously; the models might be better at making hands but they still can't do text

as evidenced by that horrible AI-generated Willy Wonka event, even if the models are sharp enough to get the letterforms right instead of mutating them into eye-gouging horrors, they still can't reliably generate anything other than gibberish
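which also suggests a crude mechanical check. This is just a sketch of the idea - pytesseract (which needs the Tesseract binary installed), the system wordlist path, and the filename are all assumptions, and OCR on mangled letterforms is itself unreliable - but you can OCR the image and see what fraction of the "words" are real:

```python
import re
import pytesseract          # assumes the Tesseract binary is installed
from PIL import Image

def gibberish_score(image_path, wordlist_path="/usr/share/dict/words"):
    """Fraction of OCR'd alphabetic tokens that aren't dictionary words."""
    with open(wordlist_path) as f:
        words = {line.strip().lower() for line in f}
    text = pytesseract.image_to_string(Image.open(image_path))
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]{3,}", text)]
    if not tokens:
        return 0.0  # no legible text found at all
    return sum(t not in words for t in tokens) / len(tokens)

# High scores on signage, labels or posters hint the "text" is decorative noise.
print(f"{gibberish_score('suspect.jpg'):.0%} of OCR'd words unrecognized")
```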