hey so I wanna say something, it'll prolly get lost for like forever rn but uhhhhh just thinkin abt it right now
mass reporting leading to bans isn't actually a feature of automated report checking systems, because it's so well-known how easily mass reports cause bans that any ready-to-deploy automated report checking system would have other programmed responses for mass reports. this is a feature that is so obviously easy to abuse if it existed, and so incredibly unlikely to be triggered outside of report abuse scenarios, that i don't know anyone genuinely interested in user safety who would implement shit like that. like, this theoretical feature is so stupid that nearly anyone you explain it to who knows about distributed abuse, regardless of cognitive prowess, is going to criticize it.
why would a professional engineer, likely deeply versed in both service security and safe moderation practices, intentionally implement something that obviously bad? they wouldn't. that wouldn't be career-retaining behaviour, for fairly good and obvious reasons.
what you're actually doing when you mass report to get someone banned, is you're flooding the work queues of human support staff and causing them to either hit ban to stop the flood so they can see what the fuck is actually going on, or accidentally approve a ban in response to one of the reports. "wait it could be like, a flood prevention thing in the software" no. it's a flood prevention thing in the brain of the human, you know, the human reading the 300 reports some petty ass redditor just put in to knock that enemy gamer offline. you can't exactly read into the context of six weeks' worth of reports slammed into your queue over the course of 30 minutes.
i feel like, on the internet, we've gotten to a point of blaming imagined robots for every problem that shows up somehow related to our tech. maybe not, but it's feeling like it to me. every problem people talk about seems to come with some made-up piece of software designed to do the thing in an obviously wrong way, when the reality is actually social engineering, or a weird brain fart during some human-checked action, or political tensions in a country they've never heard of causing a datacentre outage, or, rarely, even just some asshole changing something to spite someone else in some fucked workplace drama you'll never hear about.
re the parent post from @mouri-sunlight
what you're actually doing when you mass report to get someone banned, is you're flooding the work queues of human support staff and causing them to either hit ban to stop the flood so they can see what the fuck is actually going on, or accidentally approve a ban in response to one of the reports.
the ways to defend against this kind of horseshit fall, unfortunately, on the backs of platform builders to support. if you find yourself in the unenviable position of writing, maintaining or using software that does ticketing or support queues, or moderation queues, or rules for moderation, in a community-oriented online presence with user-generated content and social features, these are my recommendations on what to do, drawing on the 20+ years of continual experience I've had on the moderator's end.
- You are the actual moderator. Even if your job does not involve "working in the trenches" with user reports, your decisions in making or implementing the mod queue will shape the entire website, and every moderator other than yourself will act in line with the design of your software or report-handling flow.
- The flag button can work against you. First things first, before you even consider the rest of the report-handling flow with your moderation staff, know that floods, mass reports, and "flag-bombing" activity are expected, everywhere, and are almost universally malicious.
  As mentioned in the parent post, the job of flag-bombing activity is to wear down your staff, and against more naïve mod systems and dynamics it is stupendously effective. When a flag bomb lands, it doesn't hit the user directly; it is aimed at your staff, who then effectively blame the user and retaliate. It is an insidious social weapon, because you, the website, the agent of trust in the user's content and credentials, are now shooting at your own user.
  Moderation is a system. Remember: the purpose of a system is what it does.
- When it's obvious, it is obvious. When a single post from one person racks up a queue of user reports quickly, they are either the target of a flag bomb, or they are posting something very obviously illegal or against the site norms.
  It is almost never a thought-provoking decision that needs nuance or very diverse viewpoints: the post is either (for example) lukewarm-to-tepid ranting against a right-wing pundit or Tesla/Elon Musk (the post is being flag-bombed), or it is a clear 11/10 on the pain scale for gore, CSAM, or other shock/illegal content, or spam activity (the post is not being flag-bombed; these are proper reports).
  You, as a moderator, can quickly tell which it is, and if it is shock content or spam, you can swing the hammer as expected.
- The "unban" doesn't exist. For all intents and purposes, you should absolutely have a way to reverse an ill-reasoned, accidental, unintentional, or otherwise unnecessary moderator action, and to communicate to the user in question that it has been reversed, "after careful review" or with whatever flavor fits the site and situation best (there's a small sketch of a reversal flow after this list).
  But be aware that the moment a moderation action lands on a user or their post, especially if it's unjustified, the user almost certainly sees this as a slight against them, and will not even go through the trouble of appealing: "that's it, they lost me as a user."
  Losing users in this way may shift the character of your website negatively, or in a way uncharacteristic of your, or your site's, actual values or expectations.
  Opinion time: this will shift the character of your website away from the people who are likely targets of flag bombs (unionists, activists, queer and ND people, anti- or non-capitalists, socialists, Black people and other racial minorities, and the poor) and towards the people doing the flag-bombing (in my anecdotal observation: often, but not universally, tankies, people who say "marxist-leninist" or "trotskyist" a lot, fascists, conservatives, the right wing, the rich, and white, neurotypical, heterosexual cis men and women).
- This is aside from the flag-bombing topic, but it still matters: the actions and behaviors of the moderation team convey the "vibe" of the website. Moderation activity is a key component of how the people of a website behave. Do the best that you can, in your software, interface, and report flow, to ensure the process carries the vibe you hope to convey on the website.
  It is why cohost is the way it is, on one end, and why 4chan, SomethingAwful, and Reddit are the way they are, on the other.
- Do not make people afraid to report when things need it. Rules like "fraudulent reports will result in a ban" lead to users being extremely careful about clicking the ⚠️ or 🚩 button. You may think this is a good thing, like "maximizing KPI ratios" or something grotesquely capitalistic like that. It is not.
- If the tickets look overwhelming when a flag bomb happens, maybe it is time to change the formula. Rather than a list of single tickets, coalesce reports against one post into a single item onscreen (there's a rough sketch of this grouping after this list). Do what it takes to make the moderation queue look less like your aunt's inbox and more like a streamlined 911 PSAP call-taker/dispatcher dashboard.
- Along with the 911 call-taker metaphor above: segment the tasks of intake, triage, review, and action. The police, fire, and ambulance services don't all answer the telephone simultaneously on a party line; they have dispatchers. Why shouldn't your mod staff?
  This is also a good methodology for software development in general, but in a moderation use case, dividing the responsibilities can reduce the workload on all moderators without resorting to "weird" methodologies such as AI/ML report gating (there's a pipeline sketch after this list).
- If your team has the right shape for it, allow more eyes on a report before it is actioned (sketched after this list, too). Some things do necessitate rapid action; those are obvious, as I mentioned in the first point above. But having more people onboard, especially a diverse and broadly inclusive group, to see who's sending reports, what they're reporting, and what the complaint entails goes a long way towards keeping a malicious, flag-bombing moderation attack from successfully removing a user.
- People will lawyer your rules to make a flag bomb succeed. If the post mentions someone in a negative light, it is now "targeted violence against" that person, according to the flag bomb. If the post says something on the order of "god I wish [name] would stop", that is now a direct threat against that person, according to the flag bomb.
  Your rules, no matter how well-meaning, will be twisted into a weapon against a user, in some imagined battle of a culture war that the flag-bombing party participates in, because they said a certain word or made their feelings about someone known. Do be aware of this; don't let it happen. Laws are the tools of censorship; police are the engineers.
- If you have rules against NSFW content at the behest of advertisers, payment processors, or app platforms like Apple/iOS, you will get flag bombs regularly whenever a group with extremely conservative views on sexuality, language/profanity, the arts, furries, or the human body encounters a post. You aren't doing yourself any favors by culling NSFW works. Instead, rule for strict self-moderation (users tagging and 18+-flagging their own posts), and use the porn-ban moderation workflow for handling mistagged or untagged NSFW content, and illegal content, as it's reported.
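To make the report-coalescing point concrete: here's a minimal sketch, in TypeScript, of grouping raw reports into one queue item per post with a report-velocity signal attached. Every name here (`Report`, `QueueItem`, the thresholds) is invented for illustration and doesn't reflect any real platform's code; note that the "surge" signal only changes how loudly the item is surfaced to a human, and never triggers any automated action.

```typescript
// Hypothetical shapes; field names are invented for illustration.
interface Report {
  reporterId: string;
  targetPostId: string;
  reason: string;
  createdAt: Date;
}

interface QueueItem {
  targetPostId: string;
  reportCount: number;
  distinctReporters: number;
  reasons: Map<string, number>; // reason -> how many reports gave it
  firstReportAt: Date;
  lastReportAt: Date;
  // A routing hint for humans, never an automated action.
  surge: boolean;
}

// Coalesce a raw report stream into one queue item per post.
function coalesce(
  reports: Report[],
  surgeThreshold = 10,
  surgeWindowMs = 30 * 60 * 1000,
): QueueItem[] {
  const byPost = new Map<string, Report[]>();
  for (const r of reports) {
    const bucket = byPost.get(r.targetPostId) ?? [];
    bucket.push(r);
    byPost.set(r.targetPostId, bucket);
  }

  const items: QueueItem[] = [];
  for (const [postId, group] of byPost) {
    group.sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
    const reasons = new Map<string, number>();
    for (const r of group) reasons.set(r.reason, (reasons.get(r.reason) ?? 0) + 1);

    const first = group[0].createdAt;
    const last = group[group.length - 1].createdAt;
    const distinct = new Set(group.map(r => r.reporterId)).size;

    items.push({
      targetPostId: postId,
      reportCount: group.length,
      distinctReporters: distinct,
      reasons,
      firstReportAt: first,
      lastReportAt: last,
      // Many reports landing in a short window is a signal to show a human
      // loudly and early -- it is NOT grounds to act on the post or the poster.
      surge:
        group.length >= surgeThreshold &&
        last.getTime() - first.getTime() <= surgeWindowMs,
    });
  }

  // Surging items float to the top so a human can triage them first.
  return items.sort(
    (a, b) => Number(b.surge) - Number(a.surge) || b.reportCount - a.reportCount,
  );
}
```

Whether "surge" means ten reports in half an hour or something else entirely is a judgment call for your site; the point is that the flood becomes one loud line on the dashboard instead of 300 individual tickets.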
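For the intake/triage/review/action split, here's one hedged way to express it: explicit stages with role-gated transitions, so no single person is playing call-taker, dispatcher, and responder at once. Stage and role names are invented for the sketch.

```typescript
// Invented stage and role names for illustration.
type Stage = "intake" | "triage" | "review" | "action" | "closed";
type Role = "intake" | "dispatcher" | "reviewer" | "actioner";

// Which role is allowed to move an item out of which stage.
const allowedTransitions: Record<Stage, { to: Stage[]; by: Role[] }> = {
  intake: { to: ["triage"], by: ["intake"] },
  triage: { to: ["review", "closed"], by: ["dispatcher"] }, // dispatcher routes or dismisses
  review: { to: ["action", "closed"], by: ["reviewer"] },   // reviewers decide, don't act
  action: { to: ["closed"], by: ["actioner"] },             // actioners carry out the decision
  closed: { to: [], by: [] },
};

interface TicketState {
  id: string;
  stage: Stage;
  history: { from: Stage; to: Stage; byRole: Role; at: Date }[];
}

// Attempt a stage transition; throws instead of silently letting a role overreach.
function advance(ticket: TicketState, to: Stage, byRole: Role): TicketState {
  const rule = allowedTransitions[ticket.stage];
  if (!rule.to.includes(to) || !rule.by.includes(byRole)) {
    throw new Error(
      `role ${byRole} may not move ticket ${ticket.id} from ${ticket.stage} to ${to}`,
    );
  }
  return {
    ...ticket,
    stage: to,
    history: [...ticket.history, { from: ticket.stage, to, byRole, at: new Date() }],
  };
}
```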
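For the "more eyes on a report" point, a sketch of a sign-off rule: destructive outcomes need approval from several distinct reviewers before anyone can act on them. The two-approval default is an example, not a prescription, and the names are made up.

```typescript
// Invented names; the "two reviewers" default is an example, not a prescription.
interface SignOff {
  reviewerId: string;
  decision: "approve" | "reject";
  at: Date;
}

interface ActionProposal {
  ticketId: string;
  proposedAction: "ban" | "remove_post" | "warn" | "no_action";
  signOffs: SignOff[];
}

function canExecute(p: ActionProposal, requiredApprovals = 2): boolean {
  // Non-destructive outcomes can go through on a single decision.
  if (p.proposedAction === "no_action" || p.proposedAction === "warn") {
    return p.signOffs.some(s => s.decision === "approve");
  }
  // Destructive outcomes need N distinct approving reviewers and no rejections.
  const approvers = new Set(
    p.signOffs.filter(s => s.decision === "approve").map(s => s.reviewerId),
  );
  const anyRejection = p.signOffs.some(s => s.decision === "reject");
  return approvers.size >= requiredApprovals && !anyRejection;
}
```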
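And for the "unban doesn't exist" point: the social damage may already be done, but the software should still treat reversal as a first-class, recorded operation that tells the user about it. A sketch, with an invented `notifyUser` standing in for whatever messaging your platform actually has:

```typescript
// Invented types; notifyUser is a placeholder for your platform's messaging.
interface ModerationAction {
  id: string;
  userId: string;
  kind: "ban" | "suspension" | "post_removal";
  reason: string;
  takenAt: Date;
  reversedAt?: Date;
  reversalNote?: string;
}

async function notifyUser(userId: string, message: string): Promise<void> {
  // Placeholder: send a DM / email / inbox notice through whatever you have.
  console.log(`to ${userId}: ${message}`);
}

// Reversal is an explicit, logged operation -- never a silent row deletion.
async function reverseAction(
  action: ModerationAction,
  note: string,
): Promise<ModerationAction> {
  if (action.reversedAt) return action; // already reversed; keep it idempotent
  const reversed: ModerationAction = { ...action, reversedAt: new Date(), reversalNote: note };
  await notifyUser(
    action.userId,
    `After careful review, the ${action.kind} applied to your account on ` +
      `${action.takenAt.toDateString()} has been reversed.`,
  );
  return reversed;
}
```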
Finally, a local anecdote:
I can't speak for the @staff of cohost, and I don't know what the backend of the reports queue looks like; but I can commend them for running one of the only current large-scale social websites I know of that, to my knowledge, does not punish users for being the target of flag-bombing. Thank you.

