Osmose
@Osmose

Wired had an op-ed last week that apparently accused Google of artificially modifying search queries on the back end to add keywords that would increase the number of commercial results you see. They've since retracted the entire article:

Editor’s Note 10/6/2023: After careful review of the op-ed, "How Google Alters Search Queries to Get at Your Wallet," and relevant material provided to us following its publication, WIRED editorial leadership has determined that the story does not meet our editorial standards. It has been removed.

There's an archive of the post available. Google's Search Liaison (an employee doing outreach / explanations of how search works) responded with a Tweet saying that the article was conflating ad-matching with matching for organic results, and that the systems were separate.

An ex-Googler asked Google PR to provide the referenced slide that is the source for the original article, which they did.

Slide provided by Google PR.

Slide Contents

Advertisers benefit from closing recall gaps

New matches for keyword*: +kids +clothing

| kids → children                  | kids clothing → kidswear   | clothing → apparel / outfit |
|----------------------------------|----------------------------|-----------------------------|
| clothing for young child         | nikolai kidswear           | creative apparel for kids   |
| children's clothing in singapore | tj maxx kidswear           | kids outfits                |
| kids clothing canada             | kids winter wear for girls | kids apparel in citywalk    |
| best children's clothing brands  | sean jean kids wear        |                             |
| childrens beach clothes          | kids wear online           |                             |
| newborn children's clothing      | kidswear outlet            |                             |

Note: Table is a sample of matches, not exhaustive.
* Includes both S&R, SNE, and SemPhrase & SemBMM matches (all are new).

Without the surrounding slides or accompanying presentation audio, it's hard to tell exactly what this slide is showing. The article seems to have interpreted it as Google effectively adding brand names like "sean jean" into a search that originally didn't have any. The slide's intent seems to be more to show that queries that already happen to have brand names in them, and that would have been missed because they used words like "children" instead of "kids", would now be matched. Whether it's one or the other depends on some baseline understanding of what a "match" is in this system.

Regardless of the answer, Google's assertion is that this whole system is only used for selecting what sponsored ad to use, and isn't at all used for organic results in the first place. The slide isn't specific enough to tell one way or the other.
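To make the second reading of the slide concrete, here's a minimal Python sketch of what "closing a recall gap" could mean. Everything in it is an assumption for illustration: the `SYNONYMS` table, the `tokenize` and `matches` helpers, and the matching rule itself are made up, loosely modeled on the arrows in the slide, and say nothing about how Google's ad matching is actually built.

```python
# Hypothetical synonym table, loosely based on the arrows on the slide.
SYNONYMS = {
    "kids": {"kids", "children", "childrens", "child"},
    "clothing": {"clothing", "clothes", "apparel", "outfit", "outfits"},
}


def tokenize(query):
    """Lowercase, fold possessives ("children's" -> "childrens"), split on whitespace."""
    return query.lower().replace("'s", "s").split()


def matches(keyword_terms, query, synonyms=None):
    """Return True if every required keyword term appears in the query.

    With a synonym table, a term also counts as present when one of its
    synonyms appears. The query itself is never modified.
    """
    tokens = set(tokenize(query))
    for term in keyword_terms:
        accepted = synonyms.get(term, {term}) if synonyms else {term}
        if not tokens & accepted:
            return False
    return True


# An advertiser's keyword, written on the slide as "+kids +clothing".
keyword = ["kids", "clothing"]

for query in [
    "kids clothing canada",
    "best children's clothing brands",
    "creative apparel for kids",
]:
    exact = matches(keyword, query)
    expanded = matches(keyword, query, SYNONYMS)
    print(f"{query!r}: exact={exact}, with synonyms={expanded}")
```

In this toy version the user's query string is never rewritten. The synonym table only widens which existing queries the advertiser's keyword is eligible for, which is the difference between the article's reading and the one the slide seems to intend.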


Tech discourse is rife with this kind of stuff. People assume that being published on Wired.com means the article has been researched and fact-checked, even if it's an op-ed by an external contributor. They see the headline "How Google Alters Search Queries to Get at Your Wallet" and not only do they accept the direct claim, they accept the framework under which the claim was made as well.

The headline implies something a little subtle: your search query is sacred and Google shouldn't be replacing anything in it without your consent. Your searches are a representation of your intent and it's wrong for Google to change that, especially to do so in order to profit. This is, of course, kind of ridiculous, because the "change" is happening within Google's own system for finding matches. Your queries aren't being posted anywhere or shared to some external audience that will misconstrue your intent. Google literally can't do a search in the first place without ingesting and splitting up your query a bunch of different ways.

But most people reading the headline won't be thinking directly about this and will just kind of accept in their head that "modifying my query == bad". Their mental model of searches and how they work (or should work) has shifted in a way that doesn't actually match reality.

This is ultimately the same as those myths and rituals artists believe about posting on Twitter and working with The Algorithm. Stuff like: you will get de-prioritized if you post direct links to your Patreon, so instead you should subtly reference "you know where to get the full art" and only link to Patreon in your bio.

Without a full, accurate understanding of every detail about how feeds are constructed on Twitter, the only way to discover whether things like this are true is through experimentation, and Twitter (and most websites) do not give individual users enough access to run rigorous experiments because there are too many confounding factors. A single test with two different posts on your account can vary wildly depending on the time of day, day of the week, what news was happening that day, what similar users were posting, what country you live in, what language you're posting in, and a billion other factors.

The only way to get reliable data on whether something as small as "post contains a patreon link" has an effect would be to run controlled tests across thousands of accounts with identical post content, and even then that test could be invalidated by Twitter identifying the test itself as a spam attack and de-prioritizing all posts with similar content.
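Here's a rough simulation of why a single pair of posts tells you so little. It's a sketch under loud assumptions: the 10% `link_penalty`, the base reach of 1000, and the log-normal noise standing in for all the confounders are numbers I made up, and none of it reflects how Twitter actually ranks anything.

```python
import random
import statistics


def simulate_post(has_link, rng, link_penalty=0.9):
    """Impressions for one post under a made-up model.

    All confounders (time of day, news cycle, and so on) are lumped into
    one random multiplier. A link scales reach by the hypothetical penalty.
    """
    base = 1000
    noise = rng.lognormvariate(0, 0.8)
    effect = link_penalty if has_link else 1.0
    return base * noise * effect


def link_posts_look_worse(n_posts, rng):
    """Run one experiment with n_posts per arm and compare mean impressions."""
    with_link = [simulate_post(True, rng) for _ in range(n_posts)]
    without_link = [simulate_post(False, rng) for _ in range(n_posts)]
    return statistics.mean(with_link) < statistics.mean(without_link)


def detection_rate(n_posts, trials=200, seed=0):
    """Fraction of repeated experiments that point in the true direction."""
    rng = random.Random(seed)
    hits = sum(link_posts_look_worse(n_posts, rng) for _ in range(trials))
    return hits / trials


for n in (1, 10, 100, 1000):
    print(n, "posts per arm:", detection_rate(n))
```

Even though the penalty is real in this toy model, comparing one post against one other post gets the direction right only a little more often than a coin flip would. It takes on the order of a thousand posts per arm before the comparison becomes dependable, and that's with every confounder behaving as tidy random noise, which is far kinder than reality.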

This does not stop folks from writing tools for testing if your account has been marked as bad by The Algorithm, and other folks with large followings posting the results of those tools and treating them as unimpeachable. And soon it becomes "common knowledge" about how The Algorithm works that everyone knows and works around.


The point is: These systems are built such that external people almost never have the ability to reliably determine how they work, yet we are constantly making claims to know how they work as part of discussing them. We can't trust claims made by the companies building the systems themselves because they have an interest in obscuring or lying about how they work for their own profit. Instead the best we have is the stamp of authority from news outlets like Wired that whatever story is trying to explain how a system works can be trusted because they probably fact checked it, whatever that means.

I'm not saying we should uncritically trust claims from companies or employees that build these systems, because that incentive does totally exist. But I will say that we aren't critical enough of claims from people external to these systems, especially when they match our preexisting views like "Google is a dog chasing advertising bux".

Which, to be clear, is broadly true. We don't need a little factoid like this to see that when we have very visible and obvious things like their moves against ad-blocking in browser extensions, their placement of ads on search pages, their obfuscation of where ads come from and who they're being shown to, etc.

But when we amplify these claims we're chipping away at a proper understanding of how these systems work, which piles up over time until our collective ability to be critical of them is hurt by our misunderstanding of how they work in the first place.



in reply to @Osmose's post:

yeah, we've been burned a few times with these things so we picked up on how the original article pretty much started with a disclaimer that the author didn't really see the slide and the whole thing was speculation. most people don't seem to have noticed that, and went around magnifying it.

the thing is that - as someone who worked at Google on advertising privacy - it wouldn't really surprise us that much at this point if the allegations the article made were true. it's just they are very serious claims and there needs to be some actual evidence for them.

also presumably if they were alleged "for real", we'd have already heard about it from earlier court filings. we're no legal expert, but contrary to television portrayals of legal proceedings, these things aren't normally surprises during the trial. both parties have to outline their entire argument in writing before it even gets to calling witnesses.

thanks for your thorough write-up, there's a lot of value to collecting these resources in one place.

> it wouldn't really surprise us that much at this point if the allegations the article made were true

True. Also what makes me really pause about the likelihood of this absent any evidence is how they would sell being boosted in organic results like this. If it's a service being sold, it self-competes with sponsored results, which are already displayed above organic results anyway. If it's not, the only motivation for doing this I can think of is branded results somehow having a higher success rate, which is a plausible thing you could see in analytics but doesn't seem like it'd be implemented at the "augment search query with extra terms or synonyms" stage.

oh, well, if true the payoff would be that Google gets more money because they change keywords that advertisers bid low amounts for to keywords that advertisers bid more for. while there might well be some contractual carve-out allowing it, to advertisers it would feel like fraud at their expense.

that's one of those things where.... the advertising ecosystem is complicated and you kinda need to have seen it from a few different angles for a few years to get a good sense of who has a financial incentive to do what, but there are definitely financial incentives here.

we shouldn't give those details without also saying -

it's really important when constructing elaborate hypothetical scenarios, as the original article did, to remember that they are hypotheticals and that there are plenty of other abusive behaviors as well. this is one of the lawsuits we haven't gotten around to reading all the filings of yet, so we don't know what the allegations are but there is like a lot of serious malfeasance Google could be doing that has nothing to do with this particular scenario. it's really important to not overly frame our thinking based on one specific fear.

> oh, well, if true the payoff would be that Google gets more money because they change keywords that advertisers bid low amounts for to keywords that advertisers bid more for.

I'm not quite following this. I'm thinking of the suggestion that Google would be selling keywords that affect placement in organic results, and why they would offer that alongside the keywords that affect placement in sponsored results that they already offer. Why would advertisers bid higher for the organic results when the sponsored results are already prioritized above organic results?

(Just clarifying, y'all are right about not hinging anything on this one point, I just wanna make sure I'm reading correctly lol)

oh, yeah, I guess I substituted the allegation in my head into one that made more business sense.

there's still a financial incentive for Google to manipulate organic results due to advertising, but it would be to drive traffic to publishers that display ads from Google's ad networks (Google has several ad networks, which is an antitrust concern in its own right imo, but never mind...)

I recognize that this is naive idealism in the year 2023, but...

It's really fucked that we can't simply trust news outlets. It would be really great if we could, in fact, trust journalists to do the fact-checking because they have more resources at their disposal. It's also much more efficient for one writer to do that work, rather than all of their readers.
