

kobi-lacroix
@kobi-lacroix

The paradox of the Dunning-Kruger effect is that it probably doesn’t exist, but those who know that it probably doesn’t exist don’t bring it up as often as those who are certain that it does, thereby proving that it does exist.


belarius
@belarius

Authors Jansen, Rafferty, and Griffiths (2021) have done one of the better recent re-examinations of this now-tired meme, and I wish every person I saw on the Internet name-dropping Dunning-Kruger (DK) as a way of dunking on people they dislike would consider its conclusions seriously.

Conclusion 1: The vast majority of the effect attributed to DK is due to a statistical artifact called regression to the mean (RttM). Simply put, most tests of ability aren't great at measuring that ability, including our own internal self-assessments. When measurement error is applied to both of the scores obtained from your participants, then taking the difference between those scores gives you what looks like DK for free. Put another way, the famous skill-confidence disconnect is almost entirely consistent with nothing more than noisy measures of skill and of confidence. So, big-picture, DK as most people cite it does not exist, because it's a statistical mirage that Dunning and Kruger should have known better than to be fooled by. This is further exacerbated by the quartile maneuver D&K used in their original paper, if anyone's keeping score. Viewed in the cold light of day, the original paper's a bit of a statistical trainwreck.
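The RttM artifact is easy to reproduce. Here's a toy simulation (a sketch with made-up noise levels, not the paper's model): one latent skill per person, two equally noisy readings of it, and the classic quartile plot falls out with no true DK effect at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One latent "true skill" per person; the test score and the
# self-assessment are both noisy readings of that same skill.
true_skill = rng.normal(0.0, 1.0, n)
test_score = true_skill + rng.normal(0.0, 1.0, n)   # noisy ability measure
self_assess = true_skill + rng.normal(0.0, 1.0, n)  # noisy confidence measure

def percentile_rank(x):
    """Convert raw values to 0-100 percentile ranks, as in D&K's plots."""
    return x.argsort().argsort() / (len(x) - 1) * 100

score_pct = percentile_rank(test_score)
conf_pct = percentile_rank(self_assess)

# D&K's "quartile maneuver": bin by measured-skill quartile and compare
# each bin's actual percentile to its self-assessed percentile.
quartile = np.minimum((score_pct // 25).astype(int), 3)
for q in range(4):
    mask = quartile == q
    print(f"quartile {q + 1}: actual ~{score_pct[mask].mean():4.1f}, "
          f"self-assessed ~{conf_pct[mask].mean():4.1f}")
```

The bottom quartile "overestimates" and the top quartile "underestimates" even though, by construction, nobody in this simulation is biased: both columns are just noisy measures of the same latent skill.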

Conclusion 2: Based on new data using a much larger sample of participants, this new trio of authors show an interesting wrinkle, depicted in my reproduction of their figure, above. On the one hand, their best estimate of the RttM effect mostly describes the data, but another model that incorporates an interaction with performance does a slightly better job. This suggests that, at least on average, people with the very highest and very lowest scores tend to be a tiny bit overconfident, whereas those with average-to-slightly-below-average scores actually tend to be slightly underconfident. Of course, across all these points, there's enormous individual variability.

Conclusion 3: Because confidence varies dramatically from person to person, this very small over-to-under-to-overconfident effect barely registers, and wouldn't help you very much in trying to predict someone's skill level from their confidence (or vice versa). Formally, the "effect size" of this underlying effect is quite small. As such, while there is now some evidence of a small (and more complicated) skill-confidence disconnect, that effect is so much smaller than how much people just vary in their confidence in general that it's probably best to proceed through life assuming that DK-like effects don't exist in a way that should impact your judgments of people.



in reply to @belarius's post:

In the classic DK demonstration, people took a brief test of "logic puzzles" and were also asked how good they thought they were at logic puzzles. Here, 4000 such participants are divided up by their "true score" on that brief test, and the average confidence of everyone who got a particular score is shown in blue. These averages have error bars because we should be less confident declaring that we know an average when it's based on only a few observations. For example, very, very few participants got a score of 0 or a score of 19, so the error bars on those averages are correspondingly huge: the tiny handful of confidence reports we got from people with such extreme scores hasn't taught us very much about how future participants with those scores might respond on the confidence scale.
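The error-bar logic above is just the standard error of the mean shrinking with sample size; a quick sketch with an assumed person-to-person spread (my illustrative number, not the paper's):

```python
import numpy as np

# Assumed person-to-person spread of confidence reports, in percentile
# points (an illustrative value, not an estimate from the paper).
spread_sd = 15.0

# The standard error of a bin's average shrinks like 1/sqrt(n), so a
# score bin holding 3 people has an error bar ten times wider than a
# bin holding 300.
sems = {n: spread_sd / np.sqrt(n) for n in (3, 30, 300)}
for n, sem in sems.items():
    print(f"bin with n={n:3d}: standard error ~ {sem:4.1f} percentile points")
```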

If performance and confidence perfectly predicted one another, the blue averages would lie on the red dotted line. Clearly, the spread of the average confidence is much shallower than that, implying that the folks on the left side of the graph are overestimating and the folks on the right of the graph are underestimating. This is the "classic DK effect," but it's very misleading.

Since the simple test people were given isn't a perfect measurement, and since people also vary in how confident they feel, both measures have a degree of error. Since both scales have a top and a bottom bound, if there's going to be an error in measurement, it's probably going to be an error that moves a score toward the average. This leads to "regression to the mean," a statistical phenomenon that's been with us since we first started thinking about averages, and that has been fooling us for just as long.

What Jansen and colleagues have done (which isn't at all trivial) is make a good-faith effort to estimate how strong this effect is for this sort of data, and what they found is that regression to the mean dominates this pattern. This is reflected by the red squares: If we assume only that the logic test and corresponding self-report of ability are prone to error in their measures, with no true DK effect whatsoever, you get the red squares. That's the basis for saying the "true effect" doesn't exist: the famous pattern looks almost the same as what you would expect from noisy measurement alone.
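To see that shrinkage concretely, here's a minimal generative sketch in the spirit of the red squares (assumed ingredients: a 20-item test, Beta-distributed ability, Gaussian noise on the self-report; these are my illustrative choices, not the authors' fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
n_items = 20

# Latent ability, plus two noisy, bounded measurements of it.
ability = rng.beta(4, 4, n)                # true P(correct) per person
score = rng.binomial(n_items, ability)     # observed test score, 0-20
confidence = np.clip(n_items * ability + rng.normal(0, 3, n), 0, n_items)

# Average self-reported confidence at each observed score, mirroring the
# blue points: low scorers land above the diagonal, high scorers below
# it, even though the simulation has zero built-in bias.
for s in range(n_items + 1):
    mask = score == s
    if mask.sum() >= 20:  # skip sparsely populated extreme scores
        print(f"score {s:2d}: mean confidence {confidence[mask].mean():5.1f} "
              f"(n = {mask.sum()})")
```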

However, Jansen and colleagues considered another possibility: Participants have some idea of how well they did on the logic test, right? Like, did it feel easy? Or hard? So they fit a second model that anticipated that perceived ability might depend both on their own beliefs about themselves and on how well they did on the test. This yields the yellow-green circles, which is the model that best describes the data: A curve that mostly follows the RttM pattern, but deviates from it a tiny bit. So their takeaway is that there's a small pattern of bias, but it's more complicated than the traditional "beginners overestimate, experts underestimate" narrative that DK normally predicts. Indeed, there's still a bit of discrepancy between the yellow-green circles and the blue averages, so it's possible that the real relationship is even more complicated, due to some mixture of "kinds of people" being lumped together.
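As a sketch of that kind of model comparison (with made-up effect sizes, and ordinary least squares standing in for the authors' actual fitting procedure): plant a tiny U-shaped bias on top of noisy self-reports, then check whether a model with an extra curvature term earns its keep on AIC.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

ability = rng.uniform(0, 1, n)
# A *tiny* U-shaped bias: overconfident at both extremes, slightly
# underconfident in the middle (illustrative numbers only).
bias = 0.05 * (2 * ability - 1) ** 2 - 0.02
confidence = np.clip(ability + bias + rng.normal(0, 0.15, n), 0, 1)

# Straight line vs. line-plus-curvature, compared by AIC
# (Gaussian-error AIC up to a constant: n * log(RSS / n) + 2k).
aics = {}
for name, X in [
    ("linear", np.column_stack([np.ones(n), ability])),
    ("linear + quadratic",
     np.column_stack([np.ones(n), ability, ability ** 2])),
]:
    _, rss, *_ = np.linalg.lstsq(X, confidence, rcond=None)
    aics[name] = n * np.log(rss[0] / n) + 2 * X.shape[1]
    print(f"{name:>18}: AIC = {aics[name]:.1f}")
```

With a bias this small the curved model still wins the comparison, yet the bias itself is dwarfed by the 0.15 noise term, which is essentially conclusion 3: detectable on average, useless for judging individuals.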

Hope that helps!