attempting to not get into the habit of sharing bad tweets on here, but this one is emblematic of a common, bizarre, and dangerous misapprehension of what "artificial intelligence" is and what it's good for, and I want to write a bit about it:
first and least interesting, under capitalism this is obviously not a way technology would develop. we know for a fact that intuit, the developers of turbotax, lean on the federal government not to provide free tax preparation software and, where they are currently required to offer free service to the poor, engage in as much obstruction as possible to keep from having to fulfill that requirement. not only do they not really have a business interest in making filing your taxes easy (easier taxes don't mean you buy More Tax Preparation), but they have a business interest in making it hard (they would love to figure out some way to robustly prevent you from going to h&r block.)
second, and this is the core of this post: this is not a task that AI is uniquely good at, like, at all.
machine learning is, fundamentally, a process by which you teach a computer a mathematical function: you hand it a set of inputs to that function, each tagged with the output the function is supposed to produce for that input, and the computer uses a particular computational architecture to build an approximation of that function. a couple examples (and a toy code sketch after them):
- "AI art" technologies are built using extremely vast databases of pictures, each tagged by a human1 with what they depict. the models are -- generally, and roughly; I'm no professional machine-learning-ologist -- a function from the space of text prompts onto a space of small, square images, and a second function that upscales these small images to bigger ones.
- AlphaGo consists of two parts. the first part, the "policy network", is a function from go positions onto what next move is Correct to make. the bootstrap version of the policy network was trained on a huge corpus of games played by strong humans on the KGS go server; the policy network was then trained further by having it play a bunch of games against itself. the second part, the "value network", is a function from go positions onto how good they are for each player; this is trained from the play history of the policy network.[2]
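to make the function-approximation framing concrete, here's a toy sketch in python. everything in it is made up for illustration -- the "hidden rule", the numpy-only architecture, the learning rate -- but the shape of the process is the same one the big models use:

```python
# a minimal sketch of "machine learning as function approximation":
# hand the computer (input, tagged-output) pairs, and it tunes the
# numbers inside a fixed architecture until its function fits them.
import numpy as np

rng = np.random.default_rng(0)

# "training data": inputs, each tagged with the output the function
# is supposed to produce. here the hidden rule is y = 3x - 1, plus noise.
x = rng.uniform(-1, 1, size=(200, 1))
y = 3 * x - 1 + rng.normal(0, 0.1, size=(200, 1))

# the "architecture": a single linear layer, y_hat = w*x + b.
w, b = 0.0, 0.0

# training: nudge w and b to shrink the squared error on the examples.
for _ in range(500):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)
    grad_b = 2 * np.mean(y_hat - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

# the result is an approximation: numbers close to, but not exactly,
# the true rule -- and the "model" is nothing but those numbers.
print(w, b)  # roughly 3.0 and -1.0
```

the important part is that last line: what training produces is never the rule itself, just numbers that approximately reproduce its behavior.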
there are a few properties of a system that make it a promising candidate for a machine learning solution:
- the underlying pattern of the system is difficult or impossible to embody in a computer program using other methods. humans have been playing go for something like 3,000 years without coming up with a perfect strategy for it; using prior human input about aesthetic judgments, and learning by example, is arguably the only sensible way to have a machine make aesthetic judgments that humans will find comprehensible.
- training data is available cheaply. you have to have some external anchor for what the correct output is for a given set of inputs. learning the rules by example requires examples -- whether they be examples of humans performing the same task, or a less-powerful computer model of the rules of a system (whether that's a game like go, or a physical system like a self-driving car on a city street.)
- you are willing to accept an approximate solution. systems which are complex enough to meet criterion 1 often have a lot of borderline cases that are difficult or impossible to classify, and generally, the best that machine learning techniques can guarantee is that they'll output the model that makes the fewest possible errors under the architectural priors you gave them at the start of training.
- you are willing to accept an unauditable solution. the inputs and outputs of a machine learning model are legible to humans, and its developers know the broad strokes of its architecture; however, the actual meat of the model is a bunch of empirically-derived numbers that the computer chose to minimize classification error.
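to see what "unauditable" means in practice, a toy demonstration. this uses scikit-learn's bundled digits dataset; the model size and settings are arbitrary:

```python
# train a small model, then look at the only "explanation" it can
# offer for its behavior.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # tagged examples: 8x8 images of digits
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X, y)

# the developers know the broad strokes of the architecture:
# 64 inputs -> 32 hidden units -> 10 outputs.
print([c.shape for c in model.coefs_])  # [(64, 32), (32, 10)]

# but the actual meat of the model is just these empirically-derived
# numbers; not one of them corresponds to a rule anybody wrote down.
print(model.coefs_[0][:2, :4])
```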
tax law fits none of these criteria.
- the underlying pattern is not hard to embody in a program: the rules of the system are explicitly made and maintained by humans, known and written down in full, and published in authoritative form by the Government Publishing Office.
- training data is not available cheaply: actual taxpayer records are held under strict secrecy by the Internal Revenue Service, so a large training set is hard to come by for anyone other than the federal government or the tax preparation incumbents -- and the federal government already has a program by which they certify independent professionals as experts on the tax system worthy of representing taxpayers.
- an approximate solution is not acceptable: an error is liable to mean a monetary penalty, or someone going to prison, and a system that fails just one time for every million tax returns still produces 113 errors every year.
- an unauditable solution is not acceptable either: when the system errs, the best you can do is give the next training pass another example to learn from, and there's no guarantee it'll actually do any better afterwards -- it might inexplicably screw up another taxpayer's return next year, or even the same one.
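for contrast, here's roughly what the deterministic version looks like -- a minimal sketch with made-up brackets (not any real year's schedule; the real one lives in the published code, not in training data):

```python
# tax law is the opposite case: the rules are written down in full,
# so you transcribe them directly and get an exact, auditable answer.
# these brackets are purely illustrative, not a real schedule.
BRACKETS = [
    (0, 10_000, 0.10),
    (10_000, 40_000, 0.12),
    (40_000, 90_000, 0.22),
    (90_000, None, 0.24),
]

def tax_owed(taxable_income: float) -> float:
    """apply each marginal rate to the slice of income in its band."""
    owed = 0.0
    for lower, upper, rate in BRACKETS:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        owed += (top - lower) * rate
    return owed

# every output traces back to a specific written rule: if it's wrong,
# you point at the exact line and fix it. no retraining pass, no hoping.
print(tax_owed(50_000))  # 10000*0.10 + 30000*0.12 + 10000*0.22 = 6800.0
```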
it seems like a lot of people, even people who should know better or who have concerns about the current state of the tech industry, have made it to 2022 with a conception of AI that only amounts to "it's when a computer is smart! even smarter than a human!" and that's straight-up just a dangerous thing to believe.
-
[1] the ethics of AI datasets are fraught in their own way. by way of explanation, generally the process of gathering a dataset goes like this:
- a research lab at a university scrapes a bunch of untagged data, of questionable quality, off the web, under the belief that nobody is going to sue a lil ol research lab over IP rights;
- under the auspices of a research project, this dataset is labeled for chump change by people on a piecework service like mechanical turk, since a lil ol research lab can't afford to pay data workers properly;
- the research lab is eventually spun out into a non-profit that pays everyone in the lab a healthy wage by selling the dataset to tech companies at a price that renders it impractical for use by anyone who's not using it to make money.
-
[2] the successor work, AlphaGo Zero, basically proved that you don't need to have a bunch of hand-prepared go game transcripts to start with, as long as you have a machine that tells you when moves are illegal.