You can't read a blog or news source nowadays without someone throwing "AI" in your face, often in articles claiming it's the biggest economic driver in recent memory while also warning that it'll take your job - and everyone else's, too. Understandably, this leads to existential dread for most humans, given that we are forced to exist in a society where work is a prerequisite for survival for all but the obscenely rich. Don't worry though, Op-Ed columnists are here to tell you which jobs will survive while bankers remind you that only the rich will benefit from this new automation.
Have I set off your anxiety yet?
I suppose I should lead with the bad news first, in that every single article I linked above is absolutely correct. Yes, the current wave of "AI" products are driving immense growth in the stock market, particularly for technology companies. Yes, AI will eventually take away your job, whether you're a lowly Associate at a law firm, a prestigious Head of Surgery at a large Hospital, or a billionaire CEO of a Fortune 50 company. Yes, as it presently stands these tools will only benefit the richest of the rich, given the immense processing power and resource consumption required for them to function.
The good news? What we have now isn't AI, and isn't even remotely close to replacing humans outright in any capacity. If anything, it's a very useful warning shot to help us prepare for the inevitability of a post-AI society, hopefully waking up the populace and politicians alike to the very real threat that widespread automation poses to our entire civilization. It's giving us time to prepare - an asset we absolutely cannot afford to squander given the stakes at play.
Root Cause Analysis for Fun (and dismantling existential dread)
To understand why this is a warning and not an imminent threat, we need to pick apart this complex, over-hyped topic into its core components. This will get technical, though I'll try to use common analogies or metaphors when possible so as to appeal to a wider audience. This process is akin to a root cause analysis in Engineering terms, where we try to figure out what caused the problem, why our safeguards didn't stop the problem, and how we can prevent the problem from recurring in the future. For the sake of this piece, we need to answer the following questions:
- What is AI?
- Why don't these tools meet the definition of AI?
- How could these tools or their resources be better utilized?
As we answer these one at a time, the larger picture will become clearer and easier to understand - and rather terrifying, though for different reasons than you might think. So without further ado, let's dive headfirst into the single biggest question of all: What is AI?
Artificial Intelligence: The Machine Mind
We - myself the writer, you the reader, that idiot who annoyed you earlier today - are humans. We can learn new skills, we can make new tools, we can think new thoughts, we can choose whether or not to act upon stimuli. We are, by and large, intelligent animals. Try thinking back to the last time you taught someone a new piece of information, and the expression on their face or change in their mood at learning a new thing. That is intelligence, or at least an expression of it we're (mostly) all familiar with. It's not perfect, and it's often fallible, but it's what has enabled us to build everything around us today.
Put more succinctly by Wikipedia, Intelligence is "...the ability to perceive or infer information; and to retain it as knowledge to be applied to adaptive behaviors within an environment or context." In my own words, Intelligence is three things: the ability to observe, the ability to learn, and the ability to adapt.
Long before we ever taught rocks to think, it was theorized that someday, in the far future, machines could be as intelligent as (or, often, more intelligent than) their human creators. This was eventually dubbed "Artificial Intelligence," reflecting the fact that while the machine was intelligent, it was created through artificial (man-made) means. Over time (and thanks to marketing departments diluting the term by applying it to every conceivable automation ever created with computers), this original concept was re-termed "Artificial General Intelligence," or AGI for short. There are various flavors we're not going to get into here, but there's ample reading on Wikipedia alone for the curious reader.
Before we move on, there's one more term we need to define as it's critical to understand subsequent sections: intentionality. To grossly oversimplify, intentionality is similar to "understanding", and acting upon that understanding; it's how we, as humans, form belief systems and derive purpose. Choosing to become a Doctor in order to help the sick is an example of intentionality: you understand that people are sick, Doctors treat the sick, and that as a human who desires to help others, becoming a Doctor would let you provide effective care for the sick and thus contribute positively back to your community or society. On the opposite side of this is the Chinese Room argument:
You're placed in a room with infinite supplies to write and organize as needed. One at a time, you are given a piece of paper with a Chinese character on it, although you do not know it is Chinese. Your task is to write the correct character that comes after it. Your only feedback is a simple red light to show an incorrect answer, and a green light to show a correct answer. Eventually, after decades of these problems, you're able to correctly predict the next character over 99% of the time. The question is, do you understand Chinese? Obviously you do not, as you were never given context of what the characters meant - only enough data to correctly guess which character would come next a majority of the time. This is also an example of intentionality, or rather the lack of it.
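If it helps to see that feedback loop spelled out, here's a purely illustrative toy in Python - my own sketch, not anything from the original thought experiment - where the "person" only maps symbols to symbols based on red/green lights:

```python
# A toy rendering of the Chinese Room (purely illustrative): the person inside
# only maps meaningless symbols to other symbols, guided by red/green feedback.
import random

SYMBOLS = ["日", "月", "山", "水"]   # opaque squiggles to the person in the room
notes = {}                           # the person's ever-growing pile of notes

def respond(prompt):
    """Guess the 'next character' from the notes, or at random if never seen before."""
    return notes.get(prompt, random.choice(SYMBOLS))

def light(prompt, guess, expected):
    """Green light: jot the pairing down. Red light: cross out a bad note."""
    if guess == expected:
        notes[prompt] = guess        # a memorized pairing, not an understood meaning
    elif notes.get(prompt) == guess:
        del notes[prompt]

light("日", respond("日"), "月")     # one drill: guess, then receive feedback

# After enough drills, respond() is right most of the time, yet at no point
# did anyone inside the room learn what a single symbol actually means.
```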
In summary:
- Intelligence consists of three abilities: observation, learning, and adaptability
- "AI" is generally assumed to mean "AGI", or Artificial General Intelligence
- A substantial part of human intelligence is intentionality
Chatbots Aren't Intelligent
You'd be forgiven for thinking that current "AI" products sold by the likes of Microsoft and Google - "GPTs" - are some brand new revolution in computing or machine learning, but GPTs (short for Generative Pre-trained Transformers) actually have roots dating back to the early 1990s. This is not new technology, though the Transformer architecture underpinning the modern GPT was only introduced by Google researchers in 2017, and OpenAI's first GPT model debuted in June 2018, so this particular iteration is fairly recent. Their ability to process plain language prompts and output comprehensible answers to even complex questions has made them the hot product of the current marketplace, and the latest bubble to prop up Nvidia's stock after the busts of crypto and blockchain.
So what is a GPT? To grossly oversimplify (noticing a trend yet?), GPTs are incredibly complex flowcharts that transform your input into a desired output via a series of adaptable and adjustable steps. To re-frame that a bit more technically: a GPT takes input in the form of tokens, and produces the most probable output token given the context available to it.
Think of it this way: if I write that A=1, and B=2, and C=3, and then ask you to tell me what D equals, you're likely going to say "4". GPTs work in a similar way: they break the prompt up into tokens (A=, 1, B=, 2, C=, 3, D=) and, based on their training data, determine that "4" is the most probable answer to be accepted. In essence, the GPT is observing (via your prompt), processing that observation against what it learned during training, and then adapting its response accordingly.
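If you'd like to watch that next-token guessing happen yourself, here's a minimal sketch. It assumes the Hugging Face transformers package and the small, publicly available GPT-2 checkpoint - chosen purely for illustration, and certainly not whatever model a hosted chatbot actually runs:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "A=1, B=2, C=3, D="
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # a score for every token in the vocabulary

next_token_id = logits[0, -1].argmax()     # take the single most probable next token
print(tokenizer.decode(next_token_id))     # very likely "4" - a statistical guess, not understanding
```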
Observation, learning, adaptation. Wait a second, isn't that what I just defined intelligence as? Now you're starting to figure out how marketing and these technology figureheads keep selling you on the idea that this is "AI", or could even lead to "AGI" with enough data.
Let's look at another example: you are faced with a stove that someone has warned you is hot. You, being an older human with a wealth of experience, know that you can test this statement by holding your hand near - but not on - the stove top. You learn that the stove top is hot by observing the heat radiating against your hand, and adapt by pulling away. Now let's assume you're actually a child with no prior concept of a scalding hot stove top: you still learn that the stove top is hot by observing the pain in your hand from touching it directly, and adapt by pulling away...and then learning that yes, stove-tops can indeed get hot, even though you had no prior context or information on this area of knowledge.
This is where GPTs fall apart as "AI": their "learning" ability is limited exclusively to their training phases and data sets. If the GPT isn't "taught" that stoves get hot, then it has no idea that stoves can get hot, and no way of learning that fact short of more training. GPTs can learn, or they can observe and adapt, but they cannot do all three at once or in the same state, and definitely not in real time. This is why, were you to fire up ChatGPT and ask it about an event that happened today, it would fail to answer you appropriately.
Some Context About Context (Windows)
A workaround to this issue - and several other shortcomings with GPTs - is what's known as a "context window". Think of it like a scratchpad or notebook with a fixed number of pages, one that automatically discards its oldest page once it's full in order to make room for a new, empty one. This is a very handy trick for making these tools useful to consumers, as it allows humans to engage with them more naturally - like in a conversation. Just as we're more likely to bring in related topics, knowledge, information, and skills when discussing something specific, the context window allows the GPT to stay on-topic over a period of time and retain relevant information...although not indefinitely, and not in a way that it "learns" from it.
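Here's a toy illustration of that rolling scratchpad - no vendor implements it this literally, but the forgetting behavior is the same idea:

```python
from collections import deque

CONTEXT_LIMIT = 8                      # real models budget thousands of tokens; 8 keeps the demo readable
context = deque(maxlen=CONTEXT_LIMIT)  # deque silently discards the oldest items once full

def add_turn(tokens):
    """Append one conversation turn, token by token, to the rolling window."""
    for token in tokens:
        context.append(token)

add_turn(["The", "stove", "is", "hot", "."])
add_turn(["Do", "not", "touch", "it", "."])

print(list(context))
# ['is', 'hot', '.', 'Do', 'not', 'touch', 'it', '.']
# "The stove" has already fallen off the page - nothing was "learned", only buffered.
```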
If I were to put my finger on what's going to keep these tools relevant, it's less about the possible parameters (which I promise we'll get to in a moment), and more about their ability to maintain larger and more useful context windows in an efficient footprint (i.e., without gobbling up more RAM than your Chrome browser currently is).
Training isn't Learning
Alright, back on topic: how do GPTs "learn", anyway? Well, the simple answer is that they don't. Instead, GPTs are trained with datasets - increasingly huge datasets - and constantly tested to ensure the resultant model is generating appropriate responses. I'm not going to reinvent the wheel here, so instead I'm just going to link you to this excellent CGP Grey video about machine learning in general and this shorter but important follow-up video about it. Trust me, it's worth your time.
How big is this training data, anyway? Well, the GPT Wikipedia article has some handy citations: OpenAI's GPT-3 - the predecessor to GPT-3.5, which you can play with for free on their site - was trained on some "499 billion tokens" across "175 billion" parameters. In plain English, that's over 570GB of data scraped from the internet (known as Common Crawl), more text from OpenAI's own web crawlers, the entirety of English Wikipedia, and two collections of books (appropriately titled Books1 and Books2), spread across 175 billion parameters (think dials, if you watched the CGP Grey videos I suggested, or tags/categories if you didn't; both are oversimplifications, but helpful metaphors).
If that sounds like a lot of data, that's because it is. Remember, GPTs are essentially predicting the appropriate output based on the input they receive, relying on their context window to try and stay on topic. Even more important is that they're being trained, not learning. Let's go back to the earlier logic problem (A=1, B=2, etc.): a human who has learned enough about logic and reasoning could be expected to extrapolate the remainder of that sequence, all the way to Z=26; however, a GPT trained without that explicit data is increasingly likely to generate an erroneous answer the further into the sequence it goes. Over the twenty-six letters of the English alphabet, that's not a big deal, but in far more complex queries or answers it's highly likely to screw up in ways that a human might not - or at the very least, where the human has learned through experience to say "I don't know." This is why training is often divided into two phases: a general phase, where the model is automatically trained on a huge data set, and a fine-tuning phase, where humans are more involved in massaging the output to fit the intended use case.
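To make the "trained, not learning" distinction concrete, here's a deliberately contrived sketch (nothing like how a GPT actually stores information) contrasting a lookup that only knows its training data with the rule a human would actually learn:

```python
import string

training_pairs = {"A": 1, "B": 2, "C": 3, "D": 4}   # the only data the "model" ever saw

def trained_lookup(letter):
    """Outside its training data, the best it can do is guess (here: give up)."""
    return training_pairs.get(letter, "???")

def learned_rule(letter):
    """A human who grasped the pattern can extrapolate across the whole alphabet."""
    return string.ascii_uppercase.index(letter) + 1

for letter in ["C", "D", "M", "Z"]:
    print(letter, trained_lookup(letter), learned_rule(letter))
# C 3 3
# D 4 4
# M ??? 13
# Z ??? 26
```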
Yup, humans are still involved in getting these things to work correctly. GPTs are not intelligent because they cannot learn. It's the Chinese Room problem all over again: just because these models can predict what the response is with a reasonable degree of accuracy does not mean they have intelligence, no matter how large their context window or datasets may be.
The (Insane) Cost of Training
No doubt some of you are now wondering how many resources it takes to train one of these things, and likely already did some searching or reading and found numbers like "3.1e23 FLOP" for GPT-3 bandied about without context.
Allow me to provide said context.
There's a super handy list of global Supercomputers called the TOP500, which details all kinds of specs and measurements for the technically curious. At the (current) top of that list sits the Frontier Supercomputer at Oak Ridge National Laboratory in Tennessee. This supercomputer, at its peak capacity, can crunch roughly 1.6 exaFLOPS. It is the fastest public supercomputer on the planet, and is primarily used to simulate nuclear physics, materials sciences, and neutron and medical research (or things that go boom, build stuff, and fix people if we're grossly oversimplifying); y'know, stuff generally considered (mostly) good for (most of) humanity.
GPT-3's training cost is quoted at 3,640 petaflop/s-days, which means it needed the equivalent of 3,640 petaFLOPS of compute capacity sustained for one full day to train GPT-3's model. Converted, 3,640 petaFLOPS is roughly 3.6 exaFLOPS.
In other words, training GPT-3 a single time would have consumed the entirety of the world's fastest supercomputer, running at peak capacity, for roughly two and a quarter days.
To really drive home the impracticality of this exercise, GPT-4 is rumored to have cost around 2.1e25 FLOP to train - roughly 65 to 70 times the compute of GPT-3 - which would consume the entirety of Frontier, at peak capacity, for roughly five months.
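For anyone who wants to check the arithmetic, here's the back-of-the-envelope version; the FLOP figures are the publicly quoted and rumored estimates above, not measurements of my own:

```python
FRONTIER_PEAK_FLOPS = 1.6e18        # Frontier's peak: roughly 1.6 exaFLOPS
SECONDS_PER_DAY = 86_400

training_costs = {
    "GPT-3": 3.14e23,               # ~3,640 petaflop/s-days, as quoted
    "GPT-4": 2.1e25,                # rumored estimate
}

for name, total_flop in training_costs.items():
    days = total_flop / FRONTIER_PEAK_FLOPS / SECONDS_PER_DAY
    print(f"{name}: ~{days:.1f} days of Frontier at peak capacity")
# Output:
#   GPT-3: ~2.3 days of Frontier at peak capacity
#   GPT-4: ~151.9 days of Frontier at peak capacity
# (151.9 days is roughly five months.)
```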
To summarize this section:
- GPT-based models don't learn, but are trained, and therefore do not exhibit intelligence under our earlier definition
- GPT training data has a finite cutoff date, meaning the models cannot learn from new information given to them in real time - again showing they do not exhibit intelligence
- GPT models only predict the most probable next token based on their training and the associated context window - similar to a Stochastic Parrot, but not quite - and, by their fundamental workings, they cannot escape the Chinese Room argument
What Are They Good For?
In the interest of fairness, these GPTs are tools - and all tools have a use. These can make great support assistants, for example, by quickly providing answers to technical questions that would otherwise require extensive consultation of documentation. They're incredibly handy as developer aides in that regard, although they cannot replace the human at the task. Ultimately, that's what these tools are good for: supporting existing professional humans and amplifying their productivity, rather than replacing them. That's already obvious, though, given how quickly they've proliferated into work environments and consumer devices.
The real question we need to ask is whether their immense resource cost is worth the productivity gains of the tools in the first place. Given the existential problems facing humanity at present, particularly with regards to Climate Change, I'm of the personal opinion that we should be dedicating this new supercomputer capacity toward solving these sorts of material science or healthcare issues first, before we train fancier chatbots.
The Signal Flare Argument
I would also propose that these chatbots are even more useful in the way I suggested back at the start of this piece: as a signal flare that we're rapidly approaching a dangerous, irreversible point in history where the bulk of all labor can and will be automated by machines, in a society that is not ready for that transition. These chatbots will not replace workers, and likely can never displace humans in any meaningful capacity. That said, Capitalists and the ruling class are already trying to replace human workers with automations, even though it's technically impossible with current technologies to fully supplant the human worker. Anyone who has watched The Expanse or Black Mirror, or read any amount of Science Fiction, knows full well that if we fail to provide humans with the ability to engage in meaningful labor, then what we're left with is not a Utopia so much as a dystopia: a horror where only those who own the servers that run the automations or AI have the money and power to live a prosperous life, and where all the rest of us are forcibly distilled down to cogs in a machine we can never hope to change. As strange and puritan as it sounds, human beings need some sort of meaningful outlet for their energy, just as every animal and plant does; it doesn't have to be manual or creative labor, but it must be something that consumes energy and requires effort, in order for us to continue learning, growing, and thriving as individuals.
That's the real message to take away from all this "AI" nonsense before the bubble pops: we need to fundamentally reshape everything about our current society to prepare for a post-AI future. We need to build a world today that does not rely on work to survive or money to thrive. We need to quash wealth inequality, shareholder primacy, and even Capitalism itself before this inevitable post-work AI future envelops us all. Whatever state our society is in when we finally create AI is likely to be its final state until we have sufficiently controlled every body within our solar system and reached beyond its boundaries.
CGP Grey describes the reality perfectly in his "Humans Need Not Apply" video: it is not a matter of if, but when, human labor is entirely replaced by automations, because automations don't have to be perfect - they just have to be slightly better than humans at the same or less cost.
GPTs are not the automation that will replace us all, but they are very likely the last warning we'll get before we inevitably are replaced.
