54 Comments
WD Lindberg

I was listening to StarTalk the other day (Scientists Who Were Persecuted for Being Right, with Matt Kaplan). They were discussing this topic and noted that science done well is very messy. There should be point-and-counterpoint discussion and argument, since that is what makes the method work. To an outsider, however, it looks like no one knows what they are doing. And because humans are involved, there are also mistakes, misinterpretations, misunderstandings, poor judgement and falsification, which lead outsiders to distrust the process even further.

In my experience in the chemical industry, statistical analysis of the data often produced effects far smaller than the error bound. I'm always skeptical of study results that claim fantastic accuracy about things that can only be measured very inaccurately. They always explain that they corrected for the things that made the measurement inaccurate, and I think, uh-oh, you threw out the inconvenient data. The scientific method is hard to do well, and you have to be good at knowing when to give up on an idea because the answer you got was far smaller than the error bound.

When the research summary says it was a meta-analysis, my alarm bells really ring. I don't understand how any meta-analysis can possibly determine the actual error bound on data taken by many different people in many parts of the world, particularly when the studies it analyses were often attempting to answer very different questions. It seems like a recipe for finding causation where none really exists.

The facetious rule of thumb the StarTalk folks quoted was: "If you need statistics to get the answer, it is not really science." There is an element of truth in that statement.

Your work is always enlightening.

Theodore Whitfield

"If you need statistics to get the answer, it is not really science."

Obviously, this is literally false, but as you observe there is an element of truth in it. When I was a grad student in public health, I attended a lecture by a very distinguished professor who said, "There is no major discovery in epidemiology that cannot be communicated with just simple tables and graphs." I've thought about that statement in the (many) years since then, and it's remarkably true. I'm a biostatistician, so I fundamentally accept the need for some form of statistical analysis, but if someone shows up and announces some big discovery (especially a counterintuitive one) and justifies it by pointing to the results of a semi-parametric partial ridge regression multidimensional Bayesian model, you might want to exercise some caution in accepting this as gospel truth.

Lucas Van Berkel

I have to say, I never bought the ego depletion debunking.

It seems obviously true that people who are mentally exhausted tend to be grumpier and more prone to errors.

The debunking experiments also seem extremely feeble in terms of what they consider ego-depleting. In the first study referenced above, participants did one of two tasks: the control group pressed a button when they saw a word with an 'e' in it, while the other group had to refrain from pressing the button if the 'e' was adjacent to another vowel. The test lasted 7 minutes and 30 seconds. This is so far removed from real-world ego-depleting scenarios (e.g. watching an assembly line for defects for hours at a time in a noisy factory, taking a group of rowdy children on a field trip, having a heated argument with a spouse, or writing complex code while being hit with emails and support tickets) that it has almost no bearing on reality.

Speaking as an outsider who follows but does not work in the field, it seems the correction for bias overshot. Where people were once too quick to believe an attractive theory, they then became too quick to claim it was all false. Debunking became so high-status that a lower standard of evidence was required to claim something had been debunked than to claim that the original hypothesis was true.

Adam Mastroianni

I agree, I think it's obviously true that people get fatigued. I think the reason ego depletion got so much attention is that it made a stronger claim, which is that you can produce those fatigue effects reliably with things like the "crossing out e's" task.

Anon

Maybe the discrepancy comes from trying to isolate too much? As in: it isn’t so much that ego depletion is happening as that the situation keeps changing as time goes on. For example, the urgency of some problems goes up because of delays incurred by earlier choices, so the inputs to the system are rebalancing moment to moment, and you’re not actually measuring the same thing as time goes on. The first example that comes to mind is trying to eat healthier but defaulting to fast food at the end of the day. Earlier in the day you have fewer constraints and more options (i.e., less hunger, more time, more calorie bandwidth), but at the end of the day you might face fewer options and greater urgency, partly because other things got in the way before then. Even if that is what’s happening, though, the effect is functionally the same: exhausted people are more prone to making suboptimal long-term decisions. It really seems like invalidation on a technicality.

E. North

Sometimes the hardest thing is not proving that something no longer works.

It is admitting that it is no longer worth organizing your life around it.

ScienceGrump

Nodding along, a reasonable if provocative take, yessir I -

"If stereotype threat truly doesn’t exist, that means you could never, under any circumstances, run a study that produces results in line with the theory. "

What? No. Even if every study is honest, well-conducted and preregistered, and the null hypothesis is true and the test perfectly calibrated, about 1 in every 20 is gonna have p < 0.05. And none of those things are true of most psychology studies.
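A quick simulation makes that arithmetic concrete. It is only a sketch under assumptions that are not in the thread: each hypothetical study compares two groups of 30 drawn from the same Normal(0, 1) population (so the null is exactly true), and the population SD is treated as known so that a two-sided z-test is exact. The function names are illustrative.

```python
import math
import random


def null_study_p_value(n: int, rng: random.Random) -> float:
    """Simulate one two-group study in which the true effect is zero.

    Both groups come from the same Normal(0, 1) population, and the
    population SD (1) is treated as known, so a two-sided z-test is exact.
    """
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2.0 / n)  # SD of a difference in two means is sqrt(2/n)
    return math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) under the null


def false_positive_rate(studies: int = 10_000, n: int = 30,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of null studies that nonetheless reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(null_study_p_value(n, rng) < alpha for _ in range(studies))
    return hits / studies


rate = false_positive_rate()
print(f"share of null studies with p < 0.05: {rate:.3f}")
print(f"'significant' results expected per 20 studies: {20 * rate:.2f}")
print(f"chance at least one of 20 is significant: {1 - 0.95 ** 20:.2f}")
```

The simulated share should land close to 0.05, i.e. about one "hit" per 20 null studies, and the chance that at least one of 20 such studies clears the bar is 1 - 0.95^20, roughly 0.64. So a lone supportive result, on its own, says little about whether an effect is real.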

Caleb Shenk

Read that paragraph closer. You're misreading that sentence as something Adam believes, when he's actually just presenting the (il)logical extension of an argument other people make to show its fallacy. See the very next two sentences: "That’s a crazy claim to make! We don’t have nearly enough evidence to support such a conclusion, and we never will."

ScienceGrump

That reading doesn't make any sense in light of everything else he argues in the post. The "crazy claim" he's referring to is "no study could produce results in line with stereotype threat," not his own claim, which is "to say stereotype threat doesn't exist *requires* that no study could ever produce results in line with stereotype threat."

Caleb Shenk

Hm, I now actually think maybe we're both right? I think my reading of the sentence is correct in that Adam is saying that theories that don't replicate should not be written off as "not real" because if they truly don't exist at all, then you could never run a study that produces results in line with the theory. In the next paragraph, he talks about how p-hacking can find results in line with these theories: "Instead, they are almost always 'proving' that, given infinitely flexible theories and infinite ways to test them, you can produce some small effect that kind-of sort-of accords with some version of the hypothesis, broadly construed." In my reading, Adam is certainly aware of p-hacking, and uses it to say that theories can never be proven completely false.

But we're both right (if I understand you correctly) that Adam's point is a strawman: People who say an effect is “not real” don't mean that no study could ever find supportive-looking results.

Adam Mastroianni

I wasn't clear. You're right that you can trivially produce any result if you break the rules of inference. But I think we make a stronger claim when we call an effect "not real": if you follow all the rules, you should not be able to reliably produce a "not real" effect. So flukes, p-hacking, etc. don't count.

Kat S

Literally came here to say exactly this! Adam, mate, what??

Leonardo Servan

I recently finished college (and am devouring your Substack's archive; sorry, not feeling quite full yet), and it's so funny to admire science before trying to actually do it, then see it from the inside and notice that it's essentially "orderly infighting". Nonetheless, that only made studying science even wilder, as in: "Wow, we actually got somewhere by doing this!"

While writing this comment, I noticed that it only reinforces your earlier point that science is a strong-link problem, and the proof is that we got where we are almost "despite" the way it works.

Michael Champion

The point that theories are seldom decisively disconfirmed but at some point become embarrassing reminds me of Imre Lakatos's work on scientific research programs (https://plato.stanford.edu/entries/lakatos/). Theories aren't confirmed, but powerful theories drive "progressive research programs" that answer more useful questions and predict more useful facts. "Degenerating research programs" are those in which "the successive theories do not deliver novel predictions or if the novel predictions that they deliver turn out to be false".

Eugene Lichkin

What a ride! Totally agree that even if a theory is technically alive (you can still publish on ego depletion if you really want to), the real cost isn’t the debate -- it’s the opportunity cost of all the brains stuck re‑fighting the same war.

The cringe‑o‑meter is a decent heuristic, but isn't cringe socially assigned? What feels embarrassing to a top‑tier journal editor might feel like heresy to a young researcher who’s genuinely onto something (your examples of continental drift, germ theory, basically any idea that started as a heresy to the wider scientific community). IMO the real skill isn’t just detecting cringe -- it’s distinguishing between “cringe because the field moved on” and “cringe because the idea is too weird for the current priesthood”.

Anyway, thanks for making me laugh and feel existential dread at the same time.

Adam Mastroianni

Yes, I think only you can and should decide for yourself what's cringe. Some may say that being a top-tier journal editor is itself cringe. 🙂

Eugene Lichkin

The Wheel of Samsara Cringe!

Kevin Munger

Good post, but it needs more Feyerabend.

I also think it’s dicey, but we need to be honest about the role of elitism in academia in deciding what is cool vs. cringe. The mere existence of peer-reviewed studies is basically irrelevant if they’re not being written or read by those at the institutional heights of power.

Dan Collison

This reminds me — a deep dive on the etymologies of “belief” and “worship” is instructive.

Belief and love come from the same PIE source, *lewbʰ — “to care, desire, love.” And worship is what you turn to, over and over. So “what you hold to be epistemically true” actually has a lot to do with “what you hold dear” — which is why no amount of facts will dissuade.

And as in your Eddington example, what you turn to, over and over, curves your back and restricts what you’re able to see. For the man with a hammer, everything is a nail.

Tanner Janesky

This reminds me of an old Veritasium video about how most published research is wrong, even with a p-value under 0.05. Experimental and publishing biases combined with randomness produce some funky conclusions!
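The arithmetic behind that kind of claim fits in a few lines. This is a sketch using a simple positive-predictive-value calculation, not a reconstruction of the video: the share of tested hypotheses that are true (`prior_true`), the statistical power, and the crude `bias` term (the share of results that would not otherwise be significant but get reported as significant anyway) are all illustrative assumptions.

```python
def share_of_significant_findings_true(prior_true: float, power: float,
                                       alpha: float = 0.05,
                                       bias: float = 0.0) -> float:
    """Fraction of 'p < alpha' findings that reflect a real effect.

    prior_true: assumed share of tested hypotheses that are actually true.
    power:      assumed chance that a study detects a real effect.
    bias:       assumed share of otherwise non-significant results that get
                reported as significant anyway (flexible analysis, selective
                reporting). A deliberately crude model.
    """
    true_positive = prior_true * (power + (1.0 - power) * bias)
    false_positive = (1.0 - prior_true) * (alpha + (1.0 - alpha) * bias)
    return true_positive / (true_positive + false_positive)


for prior, power, bias in [(0.1, 0.8, 0.0), (0.1, 0.35, 0.0), (0.1, 0.35, 0.2)]:
    ppv = share_of_significant_findings_true(prior, power, bias=bias)
    print(f"prior={prior}, power={power}, bias={bias}: "
          f"{ppv:.2f} of significant findings are true")
```

Under these made-up inputs, a 1-in-10 prior with 80% power gives 64% true findings; dropping power to 35% pushes it below half, and adding a modest bias pushes it below one in five. Randomness alone (alpha) does a lot of the damage once true hypotheses are rare.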

Theodore Whitfield

There's a big problem with arguments about falsification in statistically oriented disciplines like the social sciences.

It's certainly true that a proposition such as "All swans are white" can be falsified by one contrary observation. That's because it's a *universal* proposition: it's making a claim about **all** swans, so finding just one black swan invalidates the statement. The laws of physics are examples of universal propositions -- Newton's Law of Gravitation holds for every pair of massive bodies in the universe, and if we can find one exception, then it's not a physical law.

But the social sciences don't usually make universal statements that apply without restriction to an entire class. Instead, they often make assertions that associate two variables, in the form of "X tends to get bigger when Y gets bigger". These statements aren't universal, because they describe general trends rather than something that is always supposed to happen. The claim "Most swans are white" is like this: it says that swan-ness is associated with white-ness, but as a general trend. In this case, observing a black swan is not dispositive evidence against the hypothesis, because the claim wasn't that there were NO black swans (a universal claim) but rather that if something is a swan, then it is usually white.

These sorts of claims are **much** more difficult to falsify than direct universal statements, and they require all the machinery of statistics and experimental design. So it's not hard to see why claims in the social sciences can linger on in zombie-like states for a long time.
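A toy version of the swan example shows the asymmetry. None of this comes from the comment itself: the swan samples are made up, and the test of "most swans are white" is an exact one-sided binomial test against the boundary case where exactly half of swans are white. The function names are hypothetical.

```python
import math


def universal_claim_survives(colours: list[str]) -> bool:
    """'All swans are white': a single non-white swan refutes it."""
    return all(colour == "white" for colour in colours)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def evidence_against_most_white(colours: list[str]) -> float:
    """One-sided p-value against 'most swans are white'.

    Probability of seeing this few white swans (or fewer) if exactly half of
    all swans were white. Any world where most swans are white makes such a
    sample even less likely, so a small value is evidence against the claim.
    """
    n = len(colours)
    whites = sum(colour == "white" for colour in colours)
    return binomial_cdf(whites, n, 0.5)


one_black = ["white"] * 19 + ["black"]
mostly_black = ["white"] * 5 + ["black"] * 15

print(universal_claim_survives(one_black))         # False: one black swan refutes "all"
print(evidence_against_most_white(one_black))      # close to 1: no evidence against "most"
print(evidence_against_most_white(mostly_black))   # about 0.021: real evidence against "most"
```

One black swan sinks the universal claim outright, but the "most" claim only gets into trouble when a whole sample looks implausible, and that judgment depends on sample size and a chosen threshold, which is exactly where the statistical machinery comes in.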

Christine Whitmarsh

I just had an entertaining visual of a media outlet picking this up with the headline, "PhD scientist says [name of conspiracy theory] 'might' be real!!!"

Seriously though, I enjoyed this and also read the offshoot article on growth mindset, a catchy phrase that seemed to leap from the petri dish of education into mainstream self-help VERY quickly. I'm now tempted to look up every term in self-help/"personal growth" in Google Scholar or the Consensus app, purely for entertainment.

Tóth Csaba Dr

With all the Monty Python, this just needs to be here :) https://youtu.be/s_4J4uor3JE?si=cxLQ5CthDo48AWVn Love the article! I use imagined universes to explain social concepts, but I never thought about Halo this way... I will now!

Peter Tillman

Thanks for this. Nice examples of 20/20 hindsight. "Science advancing via funerals" and the exceptions. Nothing is ever simple... Anyway, very good post! Thank you

Peter Apps

There are two additional influences to consider. First, research with no funding never gets done, so what funders believe determines the research agenda. Second, the aim of research has shifted from generating new, coherent knowledge to graduating master's and doctoral students, for which a rehash of something established serves just as well as genuinely new knowledge, without the unpredictability that makes funders nervous.

Alex Mendelsohn

With regard to the social sciences (and maybe other areas of science), this is an insightful piece.

With regard to the examples from physics, they appear well wide of the mark.

"There are still physics PhDs trying to prove that the sun orbits the Earth" links to a book where one author's PhD appears to be in theology and religious studies; the other author does have a PhD in physics.

Notably, the book was self-published by "Catholic Apologetics International" of which the first author is the president. It is about as far as you can get from an academic article. No one in physics research is talking about whether the sun orbits the Earth.

On the Eddington experiment, likening the dismissal of experimental data to p-hacking misunderstands experimental physics. There were genuine difficulties with the experimental procedure (see this article by Daniel Kennefick: https://physicstoday.aip.org/features/testing-relativity-from-the-1919-eclipse-a-question-of-bias).

At the cutting edge of experimental physics, problems with data collection are frequent; dealing with them is part of the challenge. Papers and theses often point these problems out, along with hypotheses about why the data don't match the theoretical model perfectly, so other researchers can challenge those explanations.

The fact that we know about the problems with the Eddington experiment shows good scientific practice. It is not the same as p-hacking, where samples are deliberately manipulated, with the reasons often not disclosed.