Daeren

Autistic, Librarian, Writer

  • He/him

Shambling pile of learned responses and special interests in a trench coat. Wrote and done some shit you might've heard of before. I was there when the deep magic was written and honestly I wouldn't recommend the experience.


mogwai-poet
@mogwai-poet

For millennia we relied on human curation to organize our knowledge. In the late 1900s it became feasible to collate a meaningful percentage of human knowledge automatically. This was very exciting. Were the results as good? Maybe not, but they were often good enough, and it was fast, and it scaled, and you could do it at home.

For the last decade, the search engine user experience has gotten progressively worse. This is partly due to search engines optimizing for profitable search results rather than helpful ones, but I would posit that the bigger cause is that the majority of new, useful information on the internet is being created behind closed doors, not on the searchable web.

More and more, the information that is available to search engines is created for search engines to find -- companies paying writers pennies to churn out multi-thousand-word essays that will rank high in Google results, but not paying them enough to do the research that would make the information accurate and useful. This phenomenon is getting worse fast thanks to generative language models like ChatGPT (which aren't even capable of producing accurate information except by accident), and I expect that soon the vast majority of text on the internet will be created by bots like that, writing more and more convincing essays full of useless information.

Maybe search engines will figure out how to discard this deluge, but my sense is that it's an arms race that they're going to inevitably lose. For a long time machine curation felt like the future, but now I think the window of pure machine curation is closing. To the extent that search will still be useful, it'll be useful for searching human-whitelisted content.

This is good news for human experts like editors and librarians, who have been treated as obsolete by the tech community for decades. Remember librarians? They're still around, and their work is more valuable than ever.



in reply to @mogwai-poet's post:

ChatGPT seems like a particularly good example supporting this -- that it gives something resembling accurate information at all is only because there are a bunch of people heavily curating its dataset and input processing to do so. Take that curation away, which is probably inevitable as the technology becomes more widespread and the incentive to impress VCs and get good press gives way to the incentive to maximize profits, and I doubt it'd be much better than Google has become.

I wonder if the search engine could become so obsolete, so overtaken by useless SEO content, that some bubble bursts and the only incentive to keep churning out this word salad garbage, getting eyes on ads, disappears. I like the optimism of a return to human curation. I would just also like to see the ppl who are gleefully ruining the usability of search engines lose billions of dollars and the value of tech like ChatGPT plummet.

This is my hope. At some point in the cycle of machines selling ads on machine-created content for promotion by a third machine, a human has to click on an ad and make a purchase. People are clearly willing to fill the Internet with junk for a low probability of that happening, but as it approaches zero, you run into the fact that web hosting doesn't cost zero. It's not financially feasible for the Internet to become completely useless.

then again I know some people who will click on some real garbage ads, so the pessimistic possibility is "actually, lots of people don't care if their content is accurate or good, and they will continue purchasing galaxy lamps forever, while demanding human-written content will be seen as some fussy nerd shit"

The odd thing is even if the new neural networks do push the Internet's SEO-spam problem past the critical point, I think this is not the thing that kills Google. At this moment I think the thing that kills Google is that their desperate attempts to steal the ChatGPT market by shoving a ChatGPT clone into Google Search kill Google Search by annoying everyone into switching to Bing, and this will happen too quickly for the coming SEOpocalypse to get a chance.

a lot of generative AI models are built around an adversarial setup: a generator tries to fool a discriminator that seeks to distinguish the source distribution from the generator's output, and the two systems grow against each other in an arms race. (that's the classic GAN recipe -- GPT itself isn't trained that way, but the same dynamic applies to spam generators and spam detectors.)

better tools to distinguish AI sludge from genuine human output can get plugged back in as the discriminator, and the generator keeps "improving" until it fools them again. the process of bypassing anti-spam itself becomes basically automated
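
for anyone who wants to see the shape of that loop, here's a toy numpy sketch of the generator-vs-discriminator dynamic -- purely illustrative, nothing to do with how any real model or spam filter is actually built, and every name and number in it is made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def real_batch(n):
    # "genuine human output": samples from a fixed distribution, here N(4, 1)
    return rng.normal(4.0, 1.0, size=n)

mu = 0.0        # generator: G(z) = mu + z -- one knob it can turn to look more "real"
w, b = 0.0, 0.0 # discriminator: D(x) = sigmoid(w*x + b), a tiny logistic "spam filter"

lr, batch = 0.05, 128

for step in range(4000):
    # discriminator step: push D(real) toward 1 and D(fake) toward 0
    x_real = real_batch(batch)
    x_fake = mu + rng.normal(size=batch)
    d_real = sigmoid(w * x_real + b)
    d_fake = sigmoid(w * x_fake + b)
    w -= lr * np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    b -= lr * np.mean(-(1 - d_real) + d_fake)

    # generator step: nudge mu so the discriminator mistakes fakes for real
    z = rng.normal(size=batch)
    d_fake = sigmoid(w * (mu + z) + b)
    mu -= lr * np.mean(-(1 - d_fake) * w)

print(f"generator's mu after training: {mu:.2f} (real mean is 4.00)")
```

run it and the generator's output drifts toward the "real" distribution until the discriminator can't tell them apart anymore. swap "gaussian samples" for "blog posts" and "logistic classifier" for "spam detector" and that's the arms race in miniature.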

i think you're right to be concerned about search engines' ability to keep up in that arms race

There's a corollary to Goodhart's law here, one that's been playing out steadily for years: when search ranking is the target towards which information is created, search ranking ceases to be a useful measure of quality.

(Let's point the finger: search ranking is the target towards which information is created by capital seeking returns.)