great research with some cool methodology that confirms what everyone should've already suspected: if you ask someone to generate text on amazon mechanical turk, there's an almost 50% chance that they're just going to ask a large language model to do it
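(sketching the idea for myself: one way you could estimate a rate like that is to train a cheap classifier on text whose provenance you actually know, i.e. stuff you're sure humans wrote versus stuff you generated with an LLM yourself, then run it over the crowd responses. this is purely my hypothetical reconstruction with toy data, not the paper's actual pipeline:)

```python
# hypothetical sketch: estimate what fraction of crowd responses are
# LLM-written by training a classifier on text of known provenance.
# the toy strings below are placeholders, not real study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# text you're confident humans wrote (e.g. collected before LLMs existed)
human_texts = [
    "skimmed it, basically they found turkers cheat with chatbots",
    "the study says about half the summaries came from a model",
]
# text you generated with an LLM yourself, on the same task
llm_texts = [
    "This study investigates the prevalence of large language model usage.",
    "The findings indicate that crowd workers frequently employ AI tools.",
]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(human_texts + llm_texts, [0] * len(human_texts) + [1] * len(llm_texts))

# apply the trained classifier to what workers actually submitted
crowd_responses = ["The findings indicate that workers frequently employ AI."]
synthetic_rate = clf.predict(crowd_responses).mean()
print(f"estimated share of LLM-written responses: {synthetic_rate:.0%}")
```

(the obvious weakness is that something like this only catches models that write like the ones you trained against, which is part of why the underlying problem is so nasty)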
haven't really read much of the machine learning literature lately so I don't know how widespread this is anymore, but I suspect we're going to get a lot of follow-ups where people discover that their research using mturk as a proxy for human performance can't be replicated once actual humans do the task
we've been deeply concerned about the way generative language models are destroying a lot of the data sets, such as wikipedia and web scrapes more broadly, that everyone uses as ground truth for all manner of research, which is going to make that research a lot harder
this paper identifies an aspect of that problem we hadn't even thought of, namely that even studies that think they're paying people to do things are often getting LLM output instead
pre-2022 text datasets are now the informational equivalent of low-background steel.
