How to drive a stake through your own good heart

OR: The demise of the Optimize Guys

Oct 31, 2023

I was teaching at Columbia last year when the news broke: our esteemed Ivy League university had been lying for years to increase its standing in the US News and World Report college rankings. According to one of its own math professors, Columbia had juiced its stats by, among other things:

Claiming that 96.5% of its faculty are full-time, when the real number was probably more like 74%
Reporting a 6:1 student-faculty ratio, when the real ratio is between 8:1 and 11:1
Counting the $3.1 billion it spent on patient care at its medical center as spending on "instruction"

Those unscrupulous maneuvers were enough to catapult Columbia from #18 to a tie for #2 on the US News list. As an employee of Columbia, I was ashamed. How can you take a school seriously when its own administrators can't uphold the same Honor Code that they make their students sign?[1]

But as a teacher, I was thrilled, because I had used the US News rankings as an example of this exact problem and now I could gloat in front of my students about being an incredibly prescient professor.[2] I was telling them about Goodhart's Law (“Any measure that becomes a target ceases to be a good measure”[3]), and college rankings are a textbook Goodhart situation. Ranking schools may sound like a great idea, but it just encourages every school to game the system, rendering the rankings nearly useless.

Some of the strategies schools used to pump their numbers were technically legal but a bit distasteful. Boost your alumni giving rate by begging them to donate even tiny sums. Increase your average SAT score by paying your freshmen to retake it (props to Baylor University for inventing that one). Offer lots of classes that are capped at 19 students so that they're technically under the 20-student cutoff that US News cares about.

Of course, it's way easier just to cheat, so lots of schools did that. Columbia is merely one of many universities caught cooking the books to raise their rankings, joining a Hall of Shame that includes Claremont McKenna, Temple, Villanova, Clemson, USC, George Washington University and the University of Oklahoma. The US News website currently lists 69 schools who have admitted to providing incorrect information in the past three years alone. Some of these could be honest mistakes, but notice that virtually all of them made the school look better than it should. Surely it was a mere clerical error when the engineering school at the University of California, Riverside reported that it spent $68.3 million on research when the actual number was $30.9 million!

This is the inexorable logic of Goodhart's Law: wherever there's a system, there will be people gaming it. Journals prefer to publish papers with “statistically significant” results, so researchers develop all kinds of clever hacks for slipping their stats under the threshold. The Affordable Care Act legally limited health insurance companies' profits, so those companies started buying clinics and pharmacies, where the profits are unlimited. My local library used to raffle off prizes by giving kids a ticket for every book they read, so my sister and I would check out huge stacks of the shortest books we could find, speed-read them, and turn them back in for tickets, two lil living Goodharts.[4]

That's where the discussion of Goodhart's Law usually ends. Oh, the hubris of the people who design these systems! Oh, the nefariousness of the villains who game them! It's too bad we must live in a never-ending cat-and-mouse game where the good-hearted try to fix incentives and institutions faster than the bad-hearted can Goodhart them. But hey, that's why it's called Goodhart's Law, and not Goodhart's Temporary But Ultimately Solvable Problem.

And look, if we all spent a little more time meditating on the inevitable perversion of all incentives and the perpetual struggle to build and maintain systems that work, that would be great. But ol' Chucky Goodhart's observation has a lot more to give us. Goodhart's Law doesn't just explain how bad actors fool institutions. It also explains how good actors fool themselves. That is, we think we're Goodharting each other, but we're often Goodharting ourselves. To show you how this happens, let me tell you about some stupid stuff I did in college.

HOW TO BE YOUR OWN TERRIBLE BOSS

I went to a low-achieving high school in the middle of nowhere; it's currently ranked between #13,261 and #17,680 nationally on US News and World Report's high school rankings (yes, they have that now). When I got to college, I encountered something I had never seen before: people sitting in the library all day studying. Some folks would be in there until 3am!

“These people really work hard!” I thought to myself. “I better keep up!” So I plonked down in the library and got to work. I soon discovered, however, that I couldn’t stay focused for more than an hour or two, let alone until the wee hours of the night. I’d descend into a screen-induced stupor for hours, flipping between tabs, checking Twitter, firing off some texts, scrolling through Facebook[5], refreshing the New York Times homepage and Googling “how to tell if eyes too far apart,” etc., with a few seven-second bursts of actual work in between.

I figured my classmates must be some kind of super-students capable of infinite focus. Then I peeked at some of the laptop screens around me and beheld—everybody else was doing exactly the same thing! We had all come to the library to study, and now all of us were goofing off. We were Goodharting ourselves. “Am I studying effectively?” is a hard question to answer. “How long did I spend in the library?” is way easier. Hours-in-library is a useful way to measure effectiveness-of-studying right up until it becomes a target—once you try to maximize your library hours, you find lots of ways to spend them doing anything but studying, and suddenly hours-in-library has nothing to do anymore with effectiveness-of-studying.

(I saw the same thing happening at the gym. There was always a guy there wandering around, lifting a weight or two, doing a squat, watching TV for a while, running for five minutes, checking his phone for a bit, and then wandering around some more. That guy thinks of himself as “spending two hours at the gym,” when he should actually think of himself as “spending 15 minutes working out.”)

You know how lots of jobs force you to hang around and look busy until quitting time, even when there's nothing to do, because your dumb boss only cares about you being there from 9 til 5 and doesn't actually know what you're doing with your time? That's obviously a stupid policy, but what's even stupider is inflicting it upon yourself. Such is the terrifying power of Goodhart's Law: you can create your own system and game it too.

CONGRATULATIONS YOU WON A STUPID GAME, HERE IS YOUR STUPID PRIZE

The point of all that studying was to get good grades. Unfortunately, grades are also a Goodhart trap. In fact, there are two Goodhart traps—one everybody talks about, and one nobody talks about.

Obviously, there's a Goodhart-type standoff between teachers and students. You're supposed to measure students' learning with points, but as soon as those points become a target, they cease to be a good measure. If you give points for attendance, for example, students will show up, sit in the back, and shop for shoes online during class. If you give points for participation, students will dutifully contribute nonsense. (“What I found most interesting about War and Peace was the war parts, but the peace parts were also pretty good.”) If you award points for doing well on the tests, students will badger you mercilessly with one question: “Is this on the test?” This is all classic Goodhart and everybody knows it.

But the most devastating Goodharting happens not between teachers and students, but within the students themselves. I got pretty good at figuring out how to game my college classes: discover the professor's pet arguments and then use them in essays, monitor lectures for hints about what's on the test, memorize all the vocab words and then forget them the day after the final exam, etc. Whenever I deployed these strategies and collected my A, I got this thrilling sense that I won. But what did I win? The biggest booby prize of all: a good grade that required zero learning. And I actually did want to learn! I thought I was Goodharting my professors, but I was only Goodharting myself.

That's the tragedy of Goodhart's game: even when you win, you lose, because the game is stupid. As I wrote in “I wanted to be a teacher but they made me a cop,” I tried to stop my own students from attempting to win the stupid game of grades, with little success:

On the first day of class, I would explain to students that, although I don’t see the value in grades and they are apparently secret anyway, I was contractually bound to give them. I told students they shouldn’t care about these grades, and they should instead “define success” for themselves, which also happens to be one of the key tenets of good negotiation. “Grades are Monopoly money,” I explained, “And if you, for instance, claim that you’re sick when you’re really not, all you’re doing is stealing Monopoly money.” This always got a big laugh, and then minutes later I’d have students come up to me after class saying, “But seriously, how do I get some of that Monopoly money?”

So yes, it's usually possible to finagle a good grade in a class without actually learning much. We act as if those students have stolen something from their teachers, when really they've only stolen from themselves, spending a whole lot of money and time in order to avoid getting educated.

STEVE GOES BEAST MODE

Speaking of stupid games, my college friends and I used to play a lot of Halo, which is a video game where you're a bionically enhanced, emotionally stunted super soldier who shoots aliens. I was way better than my friends, because my low-stress high school afforded me a lot of alien-shooting time. I was so much better, in fact, that I won literally every match we played, except one.

On some random Tuesday, my friend Steve went, as they say, beast mode. I watched with increasing desperation as he neared the game-winning 25 kills while I was only at 15, praying for a miraculous comeback that never came. As Steve crossed the finish line, I flew into a fury, leaping up and saying something like, “It doesn't even count! You could never beat me one-on-one!” And everyone was awkwardly like, “Okay, sure!” until I got embarrassed and calmed down.

Why did I care so much? The point of playing Halo is to have fun with your friends. But “have fun with friends” has no number attached to it. “Score points” does, and I had gotten so focused on scoring points that I forgot the only reason to score points is to have fun. Goodhart: 3, me: 0.

It's not just me, of course. I've seen people melt down over lots of dumb games: Monopoly, Scrabble, Mario Kart, mini golf, charades, bowling, trivia, and croquet. Losing your cool after losing a game is mildly mortifying, but if 15 minutes of struggling for made-up points is enough to turn you into a howling, friendship-shredding monster, imagine how low you can sink when there's actual money and prestige on the line. If you take a high-paying job to "provide for your family" and then you never see that family and they end up hating you, if you claw your way to the top only to find that you still feel as empty up there as you did at the bottom, if you get so obsessed with success that you lie and cheat to get it, then guess what: you got Goodharted.

THE DEMISE OF THE OPTIMIZE GUYS

How do we stop Goodharting ourselves?

If you break open a Goodhart situation, you'll find two components inside: quantification (measuring something) and optimization (targeting that measure). Neither of these things is inherently bad, and neither is their combination. If you want to build more fuel-efficient cars, save enough for retirement, or pick the right Airbnb for a family reunion, you need to do some quantification and optimization.

The problem is that no measure is ever perfect, and the more you rely on a measure, the more you multiply its imperfections, and the more consequential they become.

I learned this the hard way when I was ten years old and had what I thought was a life-changing revelation: I should just rush through all the boring things I have to do, so I can spend more time doing fun stuff. If I just scarf down my lunch, run home from school instead of walking, shower in 90 seconds, etc., then I can use all that extra time to play video games and watch TV, and my life will be awesome. It seemed like the most obvious idea in the world—just do whatever it takes to increase the number of minutes spent on Nintendo and cartoons.

This turned out to be a disastrous policy. Hurrying all the time stressed me out. It infused my life with this constant sense of “c'mon let's go already!” that seeped even into the things that were supposed to be fun—as I sat playing Pokémon, I wondered, “But am I having enough fun?”, which is a good way to have no fun at all. Also, showering is perfectly pleasant if you let yourself luxuriate a bit, and 90 seconds wasn't enough to remove my preteen stink anyway. After a frazzled, disappointing, and smelly two days, I went back to normal, unhurried life.

Why didn't it work? The culprits were both quantification and optimization. First, “minutes spent playing Nintendo and watching cartoons” was an imperfect measure of “living a satisfying life.” Not all minutes spent at the TV are actually fun, and not all minutes spent away from it are boring. Second, targeting that measure not only eliminated all of my fun non-TV time; it also sapped some of the fun from the TV time itself.

I was, at 10 years old, trying to be what I now think of as an Optimize Guy. This is the kind of person who's always trying to engineer the best life for themselves, whether it's through replacing their meals with nutritive slurries, or keeping dossiers on their professional contacts so they can network more effectively, or obsessively tracking their sleep and calibrating the dimness and temperature of their bedroom to squeeze out another minute of REM. If you just cobble together enough life hacks, the thinking goes, you'll end up at the Good Life.

The error of the Optimize Guy is believing that life is stupidly simple. They're right that life is full of many stupid, well-defined problems that can be overcome with engineering, but these aren't the important problems. At the center of life is a tangle of mysteries—irreducible woo, if you will—and no amount of quantification nor optimization is going to solve them for you.

But Optimize Guys have a hard time figuring that out, because the engineering approach can fool you into thinking that you're making progress when you're really not. If you try to optimize, say, the number of dollars in your bank account, it will seem like you're succeeding as long as that number is going up. Your investment portfolio can't count the toll you're taking on your relationships, and you might not realize that toll until it's too late and your retirement party is full of people who are glad you're leaving, and would prefer it if you were already dead.

That's why, while Optimize Guys sometimes end up rich and successful, they're never someone I look up to. They always seem like they're solving all of their problems except the ones that really matter. Even as all of their key performance indicators increase, they never get less anxious or more satisfied. They're envied by many and admired by none—except, of course, other aspiring Optimize Guys. And whenever I slip into my ever-latent Optimize Guy persona, I always end up less happy, usually by fretting over whether I'm having the most fun possible.

It's good to have money, it's good to be successful, and it's good for people to like you. It's bad to optimize for any of those things. Optimize for money and you'll never enjoy what you earn, optimize for success and you'll end up powerful but hated, and optimize for social approval and you'll become a spineless people-pleaser with no sense of self.

REJECT THOUSANDS, MAKE MILLIONS

The best way to avoid getting Goodharted, then, is to look at Optimize Guys and learn what not to do. The quintessential Optimize Guy is Sam Bankman-Fried, the former crypto billionaire who is standing trial after making lots of people's money disappear. But his example isn't actually that instructive, because when you see an Optimize Guy fail, you can always reason that he simply didn't optimize correctly. No, if you really want to learn something, you have to see how terrifying it looks when an Optimize Guy succeeds.

For such an example, may I suggest Richard M. Freeland, president of Northeastern University from 1996 to 2006, who successfully maneuvered his institution from the middle of the pack to the top tier of the US News and World Report rankings. To quote his Boston Magazine profile:

Freeland swept into Northeastern with a brand-new mantra: recalibrate the school to climb up the ranks. “There’s no question that the system invites gaming,” Freeland tells me. “We made a systematic effort to influence [the outcome].” He directed university researchers to break the U.S. News code and replicate its formulas. He spoke about the rankings all the time—in hallways and at board meetings, illustrating his points with charts. He spent his days trying to figure out how to get the biggest bump up the charts for his buck. He worked the goal into the school’s strategic plan. “We had to get into the top 100,” Freeland says. “That was a life-or-death matter for Northeastern.”

Freeland succeeded: Northeastern was ranked #162 when he took the job, and it rose to #98 just after he retired. Now it stands at #53. For an Optimize Guy, this is a huge win.[6]

It's tempting to look at Freeland's career with awe, but the appropriate feeling is horror. Everybody agrees that the US News rankings are stupid. Rising in those rankings is also stupid. Freeland himself wrote that a university president's main job is to articulate the institution's values; the value he articulated was “make a magazine give us a lower number.”

And the world is worse for Freeland's efforts. For instance, one way Freeland upped his numbers was by encouraging as many applications as possible, so that he could issue lots of rejections and thus artificially lower Northeastern's acceptance rate.[7] Applications also net the school lots of fees: Northeastern charges each applicant $100, meaning the school made up to $9 million from rejecting 90,393 students in 2023 alone.[8] Literally no one benefits from this scheme in which thousands of 18-year-olds pay to receive rejection letters in the mail, except perhaps Northeastern's current president, who lives in an $8.9 million apartment and gets chauffeured to work.[9]

(Other less-than-virtuous strategies employed by Freeland and his successor: recruiting low-achieving but rich students from abroad who could pay full tuition, networking extremely hard to earn higher ratings from other institutions on US News' annual survey, deliberately responding with junk data to that same survey, and personally lobbying the guy behind the curtain at US News to rejigger the rankings to Northeastern's benefit.)

It's not even clear if Freeland's efforts were good for the students who actually go to Northeastern. As a “former administrator” put it, “Freeland was much more focused on the ratings and not necessarily on what it took to improve the quality of the institution.” Which makes sense, because US News can't actually measure the things that make a college good (things like how much students learn, whether it provides a place for students to become the best version of themselves, etc.), and those are exactly the kind of touchy-feely liabilities that schools have to ditch if they want to supercharge their stats.

(Also, I have to mention this delightfully apropos RateMyProfessor review of Freeland from when he returned to teach a history class: “he is an extremely harsh grader and only once you understand how arbitrary his grading system is will you do decently.”)

I'm picking on Freeland here because he was so brazen and proud about how he gamed the rankings. Most schools do this, of course, just less well, and only some of them get caught. There are no heroes here; it's all dumb.

That's what you have to recognize if you want to bust out of your personal Goodhart hell. People will cheer for you even as you're Goodharting yourself: “Way to go jumping through those hoops!” “Congratulations on being the best at playing the game!” “You made the number go up, wahoo!” I have wasted a good chunk of my life chasing exactly that kind of praise. I thought I was winning, but the only way to win Goodhart's game is to walk away.

And you should believe me on that. I must be very smart and trustworthy because I used to teach at a university that was, until recently, ranked as the second-best in the world!

[1] The deans approving the inflated numbers may well have been the same deans approving the expulsion of students who cheated on their homework. (Just kidding, of course, they’d never have one dean do that much work.)

[2] If you want more from this incredibly prescient professor, you can get my whole crash course on negotiation here.

[3] The original formulation of Goodhart's Law was far less pithy and merely appeared in a footnote in one of his books. According to Wikipedia, the formulation that everyone uses comes from the anthropologist Marilyn Strathern. In Goodhart's words, “It does feel slightly odd to have one's public reputation largely based on a minor footnote.”

[4] My sister and I actually read the books, but our less scrupulous competitors didn't even do that. If you're reading this and you won that inflatable Garfield cupholder that can float in a swimming pool, providing hours of fun and relaxation on a hot summer day, and you got it by stuffing the raffle box with fraudulent tickets, gimme my Garfield, you swine.

[5] This was in the earlier days of Facebook, when it was a place for young adults to hang out, before it became a place for old adults to advocate the overthrow of the government.

[6] This is, however, a drop from its #44 spot last year, thanks to a change in US News methodology. That's the thing about gaming the system: once you start, you can never stop.

[7] US News removed acceptance rates from their rankings a few years ago, but rock-bottom admission rates continue to generate headlines every year, and they almost certainly help schools with their important peer assessment scores.

[8] I can't find data on how many students apply for and receive application fee waivers, so take this as an upper bound.

[9] That was the price in 2007; it's probably worth way more than that now.

Elea

You said such a variety of treasures in here, but I'm just stuck on the visual of a dispirited 18 year old sitting in their shitty car with a rejection letter that they paid to get. Driving home the point with a bulldozer, check!

Sky Fairfield

Nov 1, 2023

Some good points and concepts. I must say, though, you fall into a trap (another one!) that I see a lot of professors fall into: inaccurately believing that grades /do/ not materially impact students' lives just because they /should/ not.

You think of it as Monopoly money--for the students, these arbitrary nonsense numbers literally determine their entire livelihoods. Can they keep their financial aid? Can they graduate? Can they get a good job after school?

Telling them the grades don't matter and are meaningless to /you/ doesn't help /them/ unless the next part of that speech is, "so you'll all get As."
