This is a graph of Discord’s algorithmically inferred gender (extracted from “request your data” JSON; axes are probability and days) for a user whose display name is “Tiffany”, whose bio is “she/her”, whose pfp is a drawing of a girl, and whose profile theme color is pink.
Algorithmically inferred gender is worse than useless. Presumably the issue is that she talks about programming, and all the deliberate “I am explicitly telling you I am a girl” signaling in the world can’t convince a computer. I sometimes watch a livecoding streamer whose YouTube stats claim his audience is 99% male, even though you can see fem-coded chat participants regularly. Algorithms like this are deleting the women.
Here's my Discord gender graph. Fuck you, Discord.
Also, it's not super easy to extract this data, but here's what I did:
- Did a full data export (including messages)
- Waited a few days for it to arrive
- Noticed that my `activity/analytics/events-2024-00000-of-00001.json` file was about 3GB, which is really difficult for most JSON tools to process
- Ran the following command to filter out just the rows with the gender information: `jq 'select(.prob_male)' activity/analytics/events-2024-00000-of-00001.json > gender.json`
- Found out that `jq` doesn't produce parseable JSON after all this, so I opened the file in a text editor, replaced `}` with `},`, and added a `[` and `]` to the beginning and end, respectively (a way to skip this hand-editing is sketched below)
- Ran this bit of Python to get a CSV file:
```python
import csv
import json

outfile = open('gender.csv', 'w')
writer = csv.writer(outfile)
data = json.load(open('gender.json'))
for item in data:
    # day_pt looks like an ISO timestamp; keep just the date part
    writer.writerow([item['day_pt'][:10], item['prob_male'], item['prob_female'], item['prob_non_binary_gender_expansive']])
outfile.close()
```
and then I had something I could import into any given spreadsheet (in this case I used Apple Numbers).
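For what it's worth, the jq and hand-editing steps can probably be skipped entirely: the jq invocation above treats the events file as a stream of one-object-per-line JSON, so (assuming that's really how the export is laid out) you can read it line by line in Python and write the CSV in a single pass, without ever holding 3GB in memory. This is just a sketch under that assumption; the field names and input path are the ones from the steps above.

```python
import csv
import json

# Single-pass sketch: stream the ~3GB newline-delimited events file and keep
# only the rows carrying the gender predictions. Field names (day_pt,
# prob_male, ...) are the ones used above; adjust the path to your export.
infile = 'activity/analytics/events-2024-00000-of-00001.json'

with open(infile) as events, open('gender.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['day', 'prob_male', 'prob_female', 'prob_non_binary_gender_expansive'])
    for line in events:
        if not line.strip():
            continue
        event = json.loads(line)  # assumes one JSON object per line
        if 'prob_male' not in event:
            continue  # not a gender-prediction event
        writer.writerow([
            event['day_pt'][:10],  # keep just the YYYY-MM-DD part
            event['prob_male'],
            event['prob_female'],
            event['prob_non_binary_gender_expansive'],
        ])
```

If you'd rather keep the jq step, passing `-c` (`--compact-output`) to jq makes it emit one object per line, which a per-line `json.loads` loop can read without any text-editor surgery.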
Anyway, as always, "the algorithm" is a form of bias laundering. Who knows why Discord decided I'm probably male! Fuck you, Discord!
