Inumo
@Inumo

On January 5th, 2023, the United States Census Bureau released its results for "Week"[1] 52 of the Household Pulse Survey (HPS), collected December 9th-19th, 2022. On February 15th, 2023, USAFacts (a nonprofit that tries to make government data accessible to the lay public) published an article that processed the HPS data and reported about long COVID by gender. On November 24th, 2023, someone tweeted one of USAFacts' graphs with a plea to mask up, and at least one person asked: why are trans people grouped together when cis people aren't, and what does "other" mean anyways?

Fortunately, I'm a huge nerd that likes data. Let's roll it back to the HPS and talk about how we got here.

The data analysts running the HPS have a rough job. They're tasked with finding out & tracking "how the coronavirus pandemic and other emergent issues are impacting households across the country from a social and economic perspective." If you've ever gotten a survey request in the mail or on the phone and discarded or ignored it, you know exactly why this kind of work is difficult. Response rates are low, and if you want something that's checking in with people routinely, convenience is essential. All of this has to be figured out alongside the actual core of your task: what do you even ask people, anyways?

Ultimately, the HPS is a short (read: 20 minutes, so not that short) online survey that was sent electronically to households where the Census Bureau had either a phone number or an email address for the household (and the household had not previously opted out of surveying). For Week 52, this meant sending surveys to 1,049,385 households. 70,685 responded, for a weighted response rate of ~6.7%.
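If you want to check that figure yourself, the arithmetic is short. This is a minimal sketch using only the two counts quoted above; the variable names are mine, and since the published rate is weighted, this unweighted ratio should only be expected to land close to it.

```python
# Counts quoted above for Week 52 of the HPS.
households_contacted = 1_049_385
households_responded = 70_685

# Unweighted ratio of responses to invitations. The official figure is
# weighted, so treat this as a sanity check, not a reproduction.
raw_response_rate = households_responded / households_contacted
print(f"{raw_response_rate:.1%}")  # prints 6.7%
```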

Did I mention these kinds of surveys are rough because response rates are low? Anyways.

The question(s) USAFacts highlighted in their article were structured roughly as follows[2]:

  1. Have you (a person 18 years or older) ever received a positive COVID-19 test or diagnosis from a healthcare provider? (Yes/No)
  2. If yes to 1, have you experienced COVID-19 symptoms lasting 3 months or longer? (Yes/No)
  3. If yes to 1 and 2, have your long-term symptoms reduced your ability to carry out day-to-day activities? (Yes, a lot/Yes, a little/Not at all)

As you can imagine, there's a lot of nuance in these questions alone that depends on exact wording. Do you answer "yes" to question 2 if your symptoms resolved? How do you answer question 3 if your symptoms are slowly improving? Heck, do you even answer "yes" to question 1 if you self-administered a COVID test w/o a healthcare provider's direction or supervision? Just as there is no statement that is impervious to misinterpretation, there is no question that will not have some people answering counter to your intent as a survey designer. Suffice it to say, there is undoubtedly some noise (some random, unintended, and unavoidable variation) in the long COVID response data, despite every intention and desire to limit it.

All right, that's the long COVID response side of things, now let's tackle the other part of the data: demographics. After all, the USAFacts data isn't just reporting "what is the rate of long COVID," it's separating it by gender identity, such as those data exist. If you want to collect that data though, that means you need to ask for it – and that presents another thorny issue. To start, let's think like a data analyst and ask: how many people do you have to ask a question before you think, "Yeah, this is probably representative of a population?" The number obviously goes up as the population goes up; five people may be enough to represent a room of twenty, but five is blatantly insufficient to represent twenty million. Where do you draw the line?

Now, let's apply this to trans people. Last I heard, trans people make up anywhere between two and five percent of the US population, which equates to ~7–17 million people. As I think most people reading this will be aware, though, this is a very internally diverse population. How many are trans women, vs transfeminine, vs transmasculine, vs trans men, vs trans nonbinary, vs trans but not on hormones, vs genderfluid, vs… You can see how this becomes an issue very quickly for a surveyor. How do you divide trans folks up for study? Do you even divide trans folks up for study? Because remember, while there may be as many as 17 million trans people in the US, we only heard from ~71 thousand households. If we assume that "trans people" and "households with trans people" are roughly equal proportions of the general population/households (which is, admittedly, a big assumption), that's at most ~3,600 responses that could inform you about trans people. Do you really wanna split that pool of 3,600 further, in an attempt to talk about up to 17 million people in a more granular fashion?
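To put rough numbers on that worry, here is a back-of-the-envelope sketch. It leans on assumptions the HPS does not actually satisfy: it treats respondents as a simple random sample, uses the worst-case proportion of 50%, and imagines the ~3,600 responses splitting evenly across seven subgroups (a made-up split, purely for illustration). The real survey uses weights, so its true uncertainty will differ.

```python
import math

responding_households = 70_685                    # Week 52 responses quoted above
trans_share_low, trans_share_high = 0.02, 0.05    # rough population share quoted above

# Expected number of responses that could speak to trans people,
# assuming households mirror the general population (a big assumption).
expected_low = responding_households * trans_share_low
expected_high = responding_households * trans_share_high
print(f"expected responses: {expected_low:.0f} to {expected_high:.0f}")

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

pooled = margin_of_error(3_600)
split = margin_of_error(3_600 / 7)   # hypothetical: seven equal-sized subgroups
print(f"pooled: +/-{pooled:.1%}, split seven ways: +/-{split:.1%}")
```

Under those assumptions, the pooled group carries a margin of error of roughly 1.6 percentage points, while each of the seven hypothetical subgroups carries roughly 4.3. Splitting the pool makes every individual estimate noticeably fuzzier.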

To cut to the chase, the HPS team decided, "3,600 is barely enough to make some guesses at the US trans population; let's not divide them further." Thus, we end up with a question that goes something like:

  • What is your gender identity? (Cisgender male/Cisgender female/Transgender[3]/None of these)

Now, you may be wondering: what do you mean, "none of these?" Who falls into that category? And while I have no doubts that Cohost has an above-average number of people willing to declare their gender identity to a self-proclaimed anonymous survey run by an arm of the US government, as you might imagine things get much more complicated at the scale of ~333 million people. Consider a person that declares that as a nonbinary person who has never felt dysphoria, they aren't really "trans" so much as language has allowed them to describe their lifelong gender more accurately. Consider a TERF that bristles at being called "cisgender" and refuses to self-categorize accordingly[4]. Both of them are part of the dataset, yet neither falls under the "cis male/cis female/trans" trifecta. Thus, it's common for survey designers to have an "other" option, even if they plan on ignoring it during analyses & reports, because it helps maintain good will among their survey population & allows them to validate that their responses cover a sufficient, if imperfect, proportion of that population. If they were to get back 50% of surveys where "none of these" had been marked, I would hope that would flag to them that they are woefully behind on current trends in gender identity & need to do some research before the next survey.

All right, with all that said & done, we finally—FINALLY—are done talking about data collection. Frankly that's the hardest part for most data scientists, so the rest of this should be (relatively) quick. The next step is data analysis, which is where USAFacts comes in. The raw data is, frankly, an almost unconscionably dense spreadsheet of numbers. It's not really meant for public consumption, it's meant for use by analysts like the folks at USAFacts, where consistent formatting is more important & useful than human legibility. Since people have something of a hard time processing just how many "576,065/1,250,864 people which are a subset of 127,108,598 people" is, exactly, the job of a data analyst is to turn these raw numbers into something people can understand. Quite often, the easiest route is to make some percentages.
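As a small illustration of that step, here is the example count from above turned into percentages. The variable names are mine, and I'm only treating the three numbers as "part of a subgroup of a larger pool", since that is all the quote tells us.

```python
# The three counts from the quoted example above.
part = 576_065
subgroup = 1_250_864
larger_pool = 127_108_598

# Share of the subgroup that the part represents.
share_within_subgroup = part / subgroup
# Share of the larger pool that the subgroup represents.
subgroup_share_of_pool = subgroup / larger_pool

print(f"{share_within_subgroup:.0%} of the subgroup")
print(f"the subgroup is {subgroup_share_of_pool:.2%} of the larger pool")
```

The first line comes out to about 46%, which is far easier to hold in your head than the fraction it came from. The second shows the subgroup is just under 1% of the larger pool, which is a useful reminder of how small the underlying group is.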

Well, I say that, but it's worth recognizing that percentages have to be used very carefully to actually be accurately informative to people. There's a lot baked into a percentage to make it legible. Consider the statement, "80% of people who enjoy food A also enjoy food B." Seems like a high number. But what if you dig deeper, and find out that only 5% of people enjoy food A? Suddenly the statement is only about 4% of the population, and that's a lot less impressive. This is a relatively easy example, so consider these harder ones: who should be in your baseline population when you're trying to evaluate if Black students have a harder time getting admitted to schools than White students? All applicants? Only Black applicants? What about evaluating queer income? Do you compare these numbers to the general population, or to specifically cishet people? Or should it be specifically cishet White people, to account for potential intersectional effects?
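The food example above is just a multiplication of two shares, sketched here with the numbers from that paragraph (the variable names are mine):

```python
share_enjoying_a = 0.05        # 5% of everyone enjoys food A
share_of_a_enjoying_b = 0.80   # 80% of those people also enjoy food B

# Share of the whole population that the "80%" statement actually describes.
share_enjoying_both = share_enjoying_a * share_of_a_enjoying_b
print(f"{share_enjoying_both:.0%} of the whole population")
```

The harder examples don't reduce to a single line like this, because the hard part is choosing which group goes in the denominator, not doing the division.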

For this discussion of long COVID, both the Census Bureau and USAFacts made the same decision: start with the pool of all people who have ever had COVID, then look at who has long COVID within that population. This makes it less informative if, say, you're trying to figure out how many people might need long COVID disability support in your area, but more informative if you're trying to answer the question, "if I get COVID, what are the odds that I'll develop long COVID as a result of that infection and/or any later infections?" Note that I phrase that question deliberately broadly; these data are about people in December 2022, almost 3 years into the pandemic, with no information about how many times any given person had been infected. The people with long COVID could have been infected once, or could have been infected fifteen times; we just don't know.

Okay, so USAFacts made graphs, and I can quibble about the fact that they didn't put error bars on the graphs (perhaps unsurprisingly, there's a lot more uncertainty around trans data than cis data, making it hard to tell what's a real difference vs random chance), but for now we'll take it at face value: 46% of self-identifying trans people who have reported catching COVID previously also report having COVID symptoms persist for over 3 months, compared to 32% of cis women and 22% of cis men. Does this mean that trans people are inherently more at risk of developing long COVID? Well... no. But it also doesn't prove that they aren't. As one of my favorite websites aptly demonstrates, correlation isn't the same as causation. Just off the top of my head, here are some potential explanations for why trans people might have increased rates of long COVID according to the HPS:

  • Irregular hormones create a less robust internal biology, resulting in more severe/widespread damage from a COVID infection
  • Trans people are less likely to have good healthcare coverage, and thus get inadequate care for COVID infections, resulting in more severe symptoms
  • Trans people are over-represented in retail and similar public-facing jobs, increasing the likelihood of repeat COVID infections that develop into long COVID
  • People who are willing to self-ID on a government survey as trans are also less likely to be "stealth" in their (presumably wealthier, more health-supporting) households, meaning trans people who avoid long COVID are under-represented
  • Trans people are more aware of their "normal" cognitive and biological states, resulting in increased reporting of long COVID symptoms compared to cis people who don't notice that they can't run as far as before
  • Trans people frequently communicate about disability and other related issues like long COVID, resulting in increased reporting of long COVID symptoms compared to cis people who think their brain fog is just them "being tired, maybe burnt out"

As you can hopefully see, there are far more social, economic, and infrastructural explanations for these data than biological ones. Frankly, with just about any correlation of minority status, you're better off assuming it's a result of socioeconomic & infrastructural factors than anything biological. This also indicates that, in all likelihood, "just remember to mask up" probably isn't going to cut it when it comes to avoiding long COVID as a trans person. It will reduce your risk of infection (and, by extension, your risk of repeat infection), to be sure, but it's not going to make your job give you more days off to properly heal from COVID, or give you better healthcare, or, or, or, you see where I'm going.

Of course, saying "here are some possible explanations" won't stop some people from making some bold claims anyway. Transphobes and trans hypochondriacs both will use this to say trans folks have weaker immune systems. Health insurance companies will use this to say being trans is a "preexisting condition" and crank up premiums. Activists will use this to say trans people need more legal or financial support than cis people to deal with long COVID. All of these are drawing conclusions not wholly present in the data, albeit with varying degrees of logical leaps. After all, when it comes to people, raw data isn't enough – we want a good story.

I hope this has been informative for some folks about how the sausage is made when it comes to these kinds of reports, and all the caveats that get ignored due to limits on time, space, or interest. Comment if you have any questions, and I'll see if I can find answers for you. I may not technically be a Professional Data Analyst™, but I at least have a bit of experience looking at US census data, and I think enough about statistics & data integrity to have an above-average awareness of data caveats. Doesn't stop me from being wrong, but it at least gives me better odds of being something approximating correct. Thanks for reading!


  1. Technically the current version of the HPS has a two-weeks-on, two-weeks-off cycle; they call each pair of weeks "one Week" in order to maintain parity with the very first HPS that worked on a one-on, one-off cycle. I capitalize "Week" where relevant to try and make this clear.

  2. I couldn't find the exact text of the questions with a cursory search, though I suspect they might be available somewhere with enough digging.

  3. I hope that I've framed this sufficiently that you can see this is not an attempt to coerce trans people into a simplified "third gender" categorization. It's a simplification of reality on an assumption of shared conditions to provide informative data.

  4. I actually suspect this may be a contributing factor to the "none of these" responses; roughly 2/3 of the "transgender," "none of these," and "did not report" answers apparently come from AFAB people.


in reply to @Inumo's post:

Great post! For the exact questions used, I would recommend checking out the SAMHSA website guide, which stores a lot of these questions in their database. It’s very informative and publicly available, though a little obtuse at times.

Good to know! I suspect there's probably a revision of a revision of a background document somewhere tied to the census that might be a bit easier to find, though. I took a different tack than when I was writing this essay & was pretty easily able to find the full survey text for the current phase of the HPS, for example, so I don't think it'd be too hard to either reverse engineer a URL or just find an archive of past versions. I'll keep the SAMHSA in mind for future digging, though!