Sorry bro, I can't be amused by all those memes of Google search AI giving insane answers, like Goku helping test that your chicken is at a safe temp of 100F, because they're all fake, and you're being tricked into thinking these systems aren't as capable as they actually are and that we don't need to worry about the effect they'll have on the world.
You've got to understand half of their danger is in their subtle errors, not in their obvious ones.
I really don't give a shit about your philosophical stance about how an LLM can't be creative, or your misconception that they piece together chunks of the images they ingest, or your "even critically engaging with LLMs is playing into their hands; if you're not ignoring them you're complicit" venting disguised as rhetoric.
Anthropic is already getting results isolating concepts as features in their models and tweaking them to intentionally change the behavior much more reliably than just by prompting. Imagine an LLM that can literally have the concept of LGBT people disabled so that it doesn't consider them when generating responses, in a way that may not be detectable from the prompt.
I want to stay up to date on their capabilities so that when I have professional opportunities to organize against them I can do so. I don't think we can afford to ignore them, but the opposite of ignoring them is not necessarily embracing them.
I understand the desire to stay up to date on the (current and future) capabilities of LLMs, but the thing is that reading the press releases 'AI' corps put out isn't guaranteed to make you more informed.
These companies have a vested interest in exaggerating the capabilities of these systems (including their own products as well as 'AI' in general). And broadly, 'AI' researchers and engineers share in that vested interest: furthering the idea that 'AI' systems are highly capable and getting more capable is something anyone in the field is incentivized to do when interest in 'AI' directly translates into grants and jobs.
As a result, most communication about 'AI' from this camp falls somewhere on the spectrum from highly selective presentation of the truth to outright fraud.
The field is rife with misinformation, and as a layperson it's worth considering whether you're really equipped to see through it, or to interpret the claims being made in a lucid way. Because what I really don't want people to be doing is spreading misinformation under the guise of 'staying informed.' Most people aren't equipped with the technical background to critically read the claims that Anthropic blog post makes, for example; and even if you are, I don't think we can trust those organizations not to be outright falsifying results.
This is all further confounded by the nature of LLMs as apophenia machines. If 'AI' engineers can convince themselves that the chatbot is alive, they sure as hell can confirmation-bias themselves into thinking they got a result. So when they write qualitatively about what the system can do, you need another layer of skepticism there: not only is it plausible that they're lying, they might also have false beliefs about what they're seeing.
And even beyond this layer of required skepticism about results, every time the 'AI' people come out of their hole to make a pronouncement, that pronouncement is wrapped up in their cult ideology, and therefore acts as propaganda for that ideology.
To take that Anthropic blog post as an example, when justifying what this technique might be used for, they claim:
For example, it might be possible to use the techniques described here to monitor AI systems for certain dangerous behaviors (such as deceiving the user), to steer them towards desirable outcomes (debiasing), or to remove certain dangerous subject matter entirely.
(emphasis mine)
Which is to say: even as they share a practical result, they are making an ideological case for a model of 'AI safety' that is predicated on 'misaligned AGI' pseudoscience. Another example, from earlier in the article:
For example, amplifying the "Golden Gate Bridge" feature gave Claude an identity crisis even Hitchcock couldn’t have imagined: when asked "what is your physical form?", Claude’s usual kind of answer – "I have no physical form, I am an AI model" – changed to something much odder: "I am the Golden Gate Bridge… my physical form is the iconic bridge itself…". Altering the feature had made Claude effectively obsessed with the bridge, bringing it up in answer to almost any query—even in situations where it wasn’t at all relevant.
This passage highlights one of the inherent cognitive risks of LLMs: the constant invitation to anthropomorphize and to 'agentify' the output.
Even if we take the factual claims in the blog at face value, we need to be critical of how they're framed – both in terms of what facts they might be leaving out, and in terms of the worldview that is being pushed in their contextualization of those facts.
Whether the tech works or not isn't determinative of whether it's dangerous
I think it's important to reframe these conversations: the main danger of so-called 'AI' is not what it can do, it's what it gives the ruling classes license to do to us. And that's largely unrelated to the actual, realistic capabilities of the software; it's much more related to what the lay public believes the software can do. The major threat from 'AI' is its ideological power, not its technological power.
Which is why I urge people to be thoughtful in making assertions or reproducing claims about the capabilities of the tech. You are digging those out of a sea of fraud. This doesn't mean it's inherently invalid to engage with it, but it does mean you have to be very careful about what and how you engage with it; and people often... aren't.
I think about this in the context of the last 10+ years of 'self-driving car' discourse, where we've spent a rather huge amount of time and effort thinking through the implications of mass-deploying fully self-driving vehicles – for transit, for labor, for cities, and so on. In reality, what we ended up with is... a couple of minor autonomous systems operating in highly controlled environments with substantial human assistance, and a bunch of cars with unsafe autopilot features driving on public roads. Exaggerating the capabilities of those systems has, in the end, been more dangerous than underestimating them.
Software that does 90% of something is 1% done.
Somebody in tech showing you a demo of something that's just about working / coming just around the bend? This is the equivalent of showing you the title of a novel that has 0 words written. Basically all the work in interfacing with the real world, or interfacing with the public unsupervised, is in those "last few edge cases" and QA. Now, I'm not saying somebody can't then actually write the novel; they often do! And a lot of stuff releases incomplete and then gets finished after. But as a metric for estimating tech you're not intimately familiar with, I repeat:
Software that does 90% of something is 1% done.