cass

assigned catgirl at birth

white, early twenties, disabled, lesbian, plural. a cat that just so happens to be a person. sister of @yrgirlkv. makes things, sometimes.





xeph
@xeph

like a lot of people on this app, i spend a lot of time thinking about lofty things like "the state of the internet" and how it changes over time. for example, how people say "on this app" now instead of "on this website". one of the things i realized recently is that the most valuable source of information right now, in early 2023, is reddit. i know. i'm not happy about it either.

there's a great post about this. if you search some topic you want to know about, especially if you're making a purchasing decision about it, chances are you'll find lots of SEO spam blogs. what do we do in this situation? we add "reddit" and read comment threads. that's where the real information lives.

the value of information is, at least in part, determined by how many knowledgeable people have contributed to it. a wiki is valuable precisely because it's been looked at and revised by its audience. the seo blog was probably not looked at by anyone but one overworked copywriter, and maybe barely skimmed by that person's boss. outside the content itself, that's the difference.

you know what else reddit is like? the old internet. reddit is not all that different from how things were first organized: the yahoo directory, newsgroups, webrings. searching for information directly through web crawling might be the aberrational state, sandwiched chronologically between these community-driven bookends of internet information.

maybe there's a search engine concept in here: provable contributor count as a score for the quality of information. i sure don't know how to make it, though.
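A toy version of the scoring half of that idea is easy to sketch in Python. This assumes a hypothetical revision log of (page, author) pairs and simply trusts the author field; making that count provable (resisting sockpuppets and bots) is the genuinely hard part, and this sketch does not attempt it.

```python
from collections import defaultdict


def contributor_scores(revisions):
    """Map each page to its number of distinct contributors.

    `revisions` is a hypothetical iterable of (page, author) pairs.
    The author field is trusted as-is; proving it is NOT handled here.
    """
    authors = defaultdict(set)
    for page, author in revisions:
        authors[page].add(author)  # a set, so repeat edits by one person count once
    return {page: len(names) for page, names in authors.items()}


def rank_pages(revisions):
    """Return pages sorted from most to fewest distinct contributors
    (ties broken alphabetically so the order is deterministic)."""
    scores = contributor_scores(revisions)
    return sorted(scores, key=lambda page: (-scores[page], page))
```

For example, with the made-up revisions [("wiki/ebikes", "ana"), ("wiki/ebikes", "bo"), ("wiki/ebikes", "ana"), ("seo-blog/ebikes", "copywriter")], the wiki page scores 2 and the blog scores 1, so rank_pages puts the wiki page first.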


DecayWTF
@DecayWTF

Before search engines, we had link directories. Topic-specific, or general, or whatever. We even had stupid link directories. Yahoo started as a really big link directory!

Search engines were the thin end of the wedge, in retrospect. That was the first "algorithm" that decided what you got to see: fully automated, no human in the loop. You had to learn how to query the old search engines like Lycos and AltaVista to get useful results, and you could, but you were still more or less at the mercy of the spider and how it indexed things. Google's big "innovation" was applying more algorithm. We should have seen what was coming.

But yeah if we want to unfuck things: Link directories. Bring links back in general! Put link pages on your websites!


ireneista
@ireneista

they are in direct conflict with search engines because search engines train off them. for younger people who don't know this history, links were the single most important signal in the early history of Google Web Search. the strategy of using links to do citation analysis was the realization that allowed Google to win and kill off the other search engines. (the company's communication guidelines strongly discourage workers from saying things like "kill off", but that's the reality of what happened)

aside: if you're code-inclined we encourage going and implementing the PageRank paper (it ranked pages, and also it was invented in part by Larry Page...). it's a cute trick how it actually collates links. we say "implementing" and not "reading" because frankly it's dense math that makes very little sense if you just read it; when we studied it about twenty years ago, we didn't really understand it until we had written our own implementation. it is a lot simpler than the text makes it seem.
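In that spirit, here is a minimal Python sketch of the core idea, computed by power iteration: each page repeatedly splits its current score evenly among the pages it links to, and a damping factor models a reader who sometimes jumps to a random page instead. This is the common textbook formulation rather than a transcription of the paper; the damping value of 0.85 is the commonly cited choice, and spreading the score of pages with no outgoing links over every page is one of several conventions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Pages that only appear as link targets are treated as having no
    outgoing links ("dangling"); their score is spread over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link acts like a vote: split this page's score among its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score evenly over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For a tiny graph like {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}, "c" ends up with the highest score because three pages link to it, while "d", which nothing links to, keeps only the random-jump share of 0.15 / 4. The scores always sum to 1.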

anyway! search engines derive their value from these manually curated links. this means that if link directories are placed somewhere that search engines can see them, people end up getting the value of the link directory by using the search engine instead, and not even realize where it originally came from. this is a problem because, even if we ignore the monetary aspect of things, site creators find it worthwhile to make stuff when people actually visit their sites!

search algorithms aren't magic, they need ground truth data to work. we're sitting today at the end of a decades-long process whereby search engines essentially killed, ate, and digested their entire food supply, and now there's nothing left. when we build new community stuff these days we have to remember that it is an adversarial process, that corporations will try to eat us, and we need to have a plan in place to deal with that. rel=nofollow is probably sufficient for link directories; we just also want to make sure people are bearing the general case of that problem in mind in everything they come up with.
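For a hand-maintained link directory, the markup side of that is small. Below is a minimal Python sketch, assuming the directory is kept as a list of (url, title) pairs (the function name is hypothetical), that emits an HTML list with rel="nofollow" on every link. Search engines may treat nofollow as a hint rather than a strict rule, so this is a request to crawlers, not a guarantee.

```python
from html import escape


def render_link_directory(links):
    """Render hypothetical (url, title) pairs as an HTML <ul> of nofollow links.

    URLs and titles are HTML-escaped so characters like & and < in them
    cannot break the markup.
    """
    items = []
    for url, title in links:
        items.append(
            f'  <li><a href="{escape(url, quote=True)}" rel="nofollow">{escape(title)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling it with [("https://example.com/", "my favorite site")] produces a one-item list whose anchor carries rel="nofollow"; everything else about the page stays ordinary HTML that visitors can read and follow.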

in case somebody mentions it: we lived through web rings. web rings were a lot of fun, but they were not very practical. we think the little "my personal five favorite sites!" thing that every page used to have was a lot more useful than web rings, in general, and that's the kind of thing we mean when we talk about link directories.


invis
@invis

in reply to @xeph's post:

this reminds me of how, when I'm trying to Google a car for information about it, I have to append "-wikipedia" to it, because otherwise all my results are bullshit about where best to buy and sell, mpg stats, customer reviews, etc. Even worse is if I'm trying to research a semi-obscure neighborhood or locality: no history comes up, but I'll get twenty pages of best hiking trails, homes for sale, bars near, restaurants near, hotels near, etc. Even if it's a town that HASN'T EXISTED in 40 years, because Google Maps pulled a location marker off of GNIS and then SEO spam got auto-generated off that.

Reddit is like one of the last vestiges of the old internet, so naturally they've been redesigning the website to make it worse/more "modern". Reddit's redesigns are clearly based around trying to get people to consume as much content as possible with minimal interaction or discussion around it.

I'm having an "I'm not the only one?" moment here, having to resort to appending "reddit" to searches or meddling with security measures during a continued push for overpersonalization.

Maybe we'll see Cohost join Reddit in cutting through the algorithm's useless search results, considering how we just drop really good posts on this site on a whim.

in reply to @DecayWTF's post:

search services really are such bull these days. Google? We all know its issues. DuckDuckGo? it always gives you porn, and then 20 or 30 results in, it's just results about the area you live in that are wholly unrelated to anything. And everything uses Google databases, so it's useless. Reddit is like the only place I can seem to find what I search for these days.

in reply to @ireneista's post:

Webrings were fun, yeah, but as you say rarely practical. They also kind of violated some of the basic functionality of the web by factitiously turning link collections into ordered lists, with the result that one site being down broke ring navigation in at least one direction. And webrings were either so small that a single small page of links would have been more use, or so large as to be effectively unnavigable in the intended manner (so everyone just frobbed "random" or ignored the ring entirely). Fun, definitely, and a neat community-building exercise and something I support on a small scale! But not actually a solution to any real problems.

yes, absolutely. as you say, it had some value for community building.

we do think that, had web rings worked as a way to drive traffic, they would have wound up becoming uncomfortably hierarchical in a way that everyone having their redundant list of favorites was not. because when everyone has their own list, nobody is in charge of anyone else's.
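The earlier point about one dead site breaking ring navigation can be shown with a small Python sketch. It assumes a naive ring in which each site only links to the next one (real rings often had a central hub and a "random" link, which this ignores); the function names are hypothetical. Walking the ring stops at the first site that is down, while a plain list of favorites loses only the dead link.

```python
def ring_walk(ring, start, is_up):
    """Follow "next" links around a naive webring, beginning at `start`.

    ring: ordered list of site names; each site links only to its successor.
    is_up: function reporting whether a site is reachable.
    Returns the sites visited before hitting a dead site or looping back.
    """
    visited = [start]
    i = ring.index(start)
    while True:
        i = (i + 1) % len(ring)  # wrap around: the last site links to the first
        site = ring[i]
        if site == start or not is_up(site):
            return visited
        visited.append(site)


def favorites_reachable(favorites, is_up):
    """With a plain list of favorite links, each link stands alone,
    so one dead site hides only itself."""
    return [site for site in favorites if is_up(site)]
```

With the ring ["a", "b", "c", "d"] and "b" down, walking from "a" reaches nothing past "a", walking from "c" gets through "c", "d", "a" before stalling, and the favorites list still offers "a", "c", and "d".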

Back c. 1995-1996 somebody at CMU, I can't remember who, remarked after a talk that we thought of WWW pages as being 'for' humans to read, but it might turn out they were more 'for' machines to ingest, in terms of how things were gonna go.

I walked back to my office and had my mind blown for a little while. Because this was both new to me and in hindsight obvious: the machines outscale us. Within just a couple of years, search engines, rather than link references, would become the dominant entry point to any given web page. RIP hypertext.

yeah. the semantic web stuff never happened, search engines wound up just using sheer scale to get the benefits structured data would have given them, but it sure is true that it's more around the needs of machines than people lately.