Osmose
@Osmose

Mozilla has been an advertising company since 2004 when Google was first made the default search engine in the browser—where do you think the money Google paid for that came from? The ads on the search results! Mozilla has been almost fully funded by advertising revenue for two decades.

If you're surprised and worried about what this means, you should've switched to a fork back when Mozilla was directly negotiating with advertisers to standardize Do Not Track in 2010. "Advertising is the only thing that anyone has made work at scale, so let's try to make advertising better until we find a better funding model" has been a Mozilla core principle for over 14 years.

jwz's impact in helping to create Mozilla and Firefox is inarguable, but boy do I hate the way he writes about things and how much he loves easy, out-of-context dunks.


Osmose
@Osmose

Like to be clear the post above is not a defense of Mozilla buying Anonym, just my treasured, extremely petty pastime of policing any mention of jwz on the mozilla tag.

This is the closest to a specific webpage describing what they do, and it's... not great. With the context I have from previous projects around attempting to make advertising better, my guess is that they:

  • Provide some sort of library/service that ingests metrics directly from users' browsers and hides/replaces identifying info with consistent anonymous markers, e.g. everyone from "Orlando, FL" gets marked as being from "Large Southern City 7", so you can still glean insights without being able to tell whether any one user is located in Orlando.
  • The services run in some environment that prevents access to the non-anonymized data while it is in transit.
  • They also mix in fake data to make it harder to find legitimate data, but in a way that doesn't change the results of analyzing all the data together (i.e. if the legitimate data says users were 5% more likely to buy your product after seeing an ad, the legitimate data + the faked data will say the same). I dunno how it works but I've been told in the past that this is an area that's had research done and there are techniques that can actually do this reliably.
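For what it's worth, the first and third bullets can be sketched in a few lines of Python. This is entirely my own guesswork at the shape of the thing, not anything from Anonym's docs; the salt, bucket count, and dataset are all invented:

```python
import hashlib
import random

# First bullet: a keyed hash maps every occurrence of a value like
# "Orlando, FL" to the same opaque bucket label, so cohort-level analysis
# still works but the raw location is hidden. SECRET_SALT is a made-up
# operator-held secret, not a real parameter of any product.
SECRET_SALT = b"rotate-me-sometimes"

def pseudonymize(value: str, buckets: int = 16) -> str:
    digest = hashlib.sha256(SECRET_SALT + value.encode()).hexdigest()
    return f"large-city-{int(digest, 16) % buckets}"

# The same input always yields the same marker, so cohorts stay consistent.
assert pseudonymize("Orlando, FL") == pseudonymize("Orlando, FL")

# Third bullet: zero-mean noise added per record obscures any single
# record's true value, but the aggregate (a 5% conversion lift) survives.
random.seed(0)
true_lift = [0.05] * 10_000
noisy = [x + random.gauss(0, 0.02) for x in true_lift]
estimate = sum(noisy) / len(noisy)  # lands very close to 0.05
```

The point of the toy is just that per-record noise and aggregate accuracy aren't in conflict, which is the property the third bullet is describing.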

Does their product actually exist? Who knows. It'd be cool if it did. The two Meta execs are sus given how many former Meta folks have joined Mozilla. It's also possible this is a play for influence in the ad business, something Mozilla needs if it wants to get into talks like the ones that led to Do Not Track, in a world where it has an order of magnitude less browser share than it did back then.

The main thing I'm confident in is that this is not a meaningful shift in Mozilla's stance on advertising. It has always thought that advertising isn't great, but that it's worth making better in the absence of anything else that can match the funds it provides.



in reply to @ann-arcana's post:

Thing is, management keeps killing everything that might've helped them get the upper hand again, so they can line their pockets instead.

They basically handed an entire programming language over to Amazon ffs.

in reply to @Osmose's post:

Without reading the details, your summary of the system sounds like Differential Privacy. Which if that's what it is, then that's good. The principle is mathematically sound and provably optimal. Just, y'know, the implementation.

https://en.wikipedia.org/wiki/Differential_privacy

The main idea is that any data query always leaks some amount of private info. Even if the individual in question isn't in the dataset! But Cynthia Dwork was able to formalize a mathematical lower bound on the information leakage and design a mechanism that achieves that bound. The primary approach is that some subset of the underlying data is replaced with random noise, so the contribution of any individual to the results is statistically indistinguishable. What you described about their system sounds like details of how this gets implemented, similar to other differentially private systems that are publicly available.
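If it is differential privacy, the canonical building block is the Laplace mechanism: a counting query has sensitivity 1 (one person joining or leaving changes the count by at most 1), so adding Laplace(1/ε) noise gives ε-differential privacy. Here's a textbook sketch with an invented dataset, not anyone's production code:

```python
import math
import random

def dp_count(records, predicate, epsilon: float) -> float:
    """Epsilon-DP count via the Laplace mechanism. Sensitivity of a
    count is 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, 1/epsilon) by inverting its CDF.
    u = random.random() - 0.5
    noise = -(1 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

random.seed(42)
# Hypothetical ad-conversion data: 5% of 10,000 users bought something.
purchases = [{"bought": i % 20 == 0} for i in range(10_000)]
noisy = dp_count(purchases, lambda r: r["bought"], epsilon=0.5)
# True count is 500; the noisy answer is off by a few units at most,
# and no single user's presence is statistically detectable.
```

Smaller ε means more noise and stronger privacy; that trade-off is exactly the configuration burden complained about below.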

In practice... some systems have designs that are formally verified against the definition of differential privacy, and some do not. Some merely cargo-cult the implementation details of more-robust systems. And honestly even the "good" ones strike me as requiring too much configuration, so that either their privacy or their reliability ends up subject to operator error, and the target audience is generally non-experts.

Like, usually the query has to specify up front the expected range or distribution of the data so that the random noise can be scaled properly. Uh, part of the point of querying the data is to find out whether your guess about the range is correct or not.
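To make that concrete, here's a toy ε-DP mean (again my own sketch, with made-up numbers): the caller has to declare the data's bounds up front because the noise scale depends on them, and if the guess is too low everything above it gets clipped, silently biasing the answer.

```python
import math
import random

def dp_mean(values, lo: float, hi: float, epsilon: float) -> float:
    """Epsilon-DP mean over a known number of values. Clipping to the
    declared bounds [lo, hi] fixes the sensitivity at (hi - lo) / n,
    but a wrong guess at the range distorts the true mean."""
    clipped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / len(values)
    u = random.random() - 0.5
    noise = -(sensitivity / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return sum(clipped) / len(clipped) + noise

random.seed(7)
spend = [random.uniform(0, 200) for _ in range(5_000)]  # true mean ~100
good = dp_mean(spend, 0, 200, epsilon=1.0)  # correct range: answer near 100
bad = dp_mean(spend, 0, 50, epsilon=1.0)    # range guessed too low
# `bad` collapses toward ~44 because everything above 50 was clipped,
# and nothing in the output tells the operator their guess was wrong.
```

That silent-bias failure mode is the operator-error problem in a nutshell: the mechanism stays private either way, but the wrong configuration quietly wrecks the answer.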

There's also the separate issue that for most of these systems the "true" data still has to be recorded somewhere and the random noise is added at query time. So the privacy guards only apply to unprivileged operators, but privileged operators are unimpeded, meaning e.g. no resistance to warrants.

I mean, don't get me wrong, I like Differential Privacy and think it should be expected and even legally required in some areas. But due to these practical concerns it may not be a panacea, in the way that provably-mathematically-optimal solutions often aren't.