
Political pollsters are pretending they know what's happening. They don't.


This week I have to explain a little statistics, because the political news media have failed so badly.

If you’re like me, you’re probably exhausted with all the so-called news stories about the latest U.S. presidential polls. After each poll, which seems to be a daily occurrence as the election draws near, news outlets report the results on their front pages with breathless excitement. The race is changing! Harris is pulling ahead! Trump is catching up! Swing states are swingier than ever!

To cite just one example, CNN just reported that “Polls show Harris’ numbers in Pennsylvania have shifted over the past month.” This story included a video featuring polling expert Harry Enten, who stated “there has never been a race this close in the polling since 1972!” OMG! (The “OMG” is from me, not Enten.)

Okay, now let’s get to reality (with a small dose of statistics).

A poll is nothing more than a tiny sample of voters’ opinions. A poll might ask, say, 1000 people who they plan to vote for, and then report the results. If you ask a truly representative sample of people, a poll can give you a pretty good idea of how the candidates stand.

The problem is, this is really hard to do accurately, and it’s become much harder since everyone switched to cell phones. Decades ago, pollsters would phone people, and those people would answer the phone. Not any more: many people won’t answer a call from a number they don’t know, and people who do answer might have a bias towards one party or the other.

So pollsters compile some numbers, adjust them (more on that below), and report the results with a “margin of error,” which works something like this. Suppose that a poll finds Trump leading in Nebraska by 18%, with a margin of error of 4%. That ±4% applies to each candidate’s share of the vote, so the lead itself is uncertain by nearly twice as much: he might really be ahead by anywhere from roughly 10% to 26% – and that’s just one pollster’s estimate. He might be leading by 30%, or even losing, if you ask a different pollster.

But a 4% margin means that if you run the poll again and again, the results will swing back and forth, randomly, in a pretty wide range.
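Where does that margin of error come from? It’s just the textbook formula for a simple random sample – here’s a quick sketch in Python (real polls use weighting and adjustments, so their true uncertainty is larger than this formula suggests):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for one candidate's share in a simple random sample of n voters."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)
print(f"n=1000: +/- {moe:.1%} on each candidate's share")
# The uncertainty on the *lead* (the difference between the candidates) is roughly double:
print(f"n=1000: +/- {2 * moe:.1%} on the lead")
```

So even a perfectly conducted 1,000-person poll can’t pin down a close race.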

That’s exactly what’s happening in Pennsylvania. The voters are split, but every poll shows a slightly different result, because that’s what random sampling does. It’s not news!

So let me give you some actual data. Let’s look at Pennsylvania because it seems to be the closest Presidential race, according to the polls. Here are the actual margins of victory from the past 9 elections - real numbers, not polls:

  • 2020: Biden (D) by 1.2%
  • 2016: Trump (R) by 0.7%
  • 2012: Obama (D) by 5.4%
  • 2008: Obama (D) by 10.3%
  • 2004: GW Bush (R) by 2.5%
  • 2000: Gore (D) by 4.2%
  • 1996: Clinton (D) by 9.2%
  • 1992: Clinton (D) by 9.1%
  • 1988: GHW Bush (R) by 2.3%

Clearly, Pennsylvania has been closely divided in recent years. Suppose the race this year will eventually be won, by either candidate, by less than 2%, as happened in 2016 and 2020. Then what would we expect polls to show us? Given their typical 4% margin of error, we’d expect polls to show a race that flips back and forth from one poll to another, even if no one is changing their mind.
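You can see this with a quick simulation. Suppose, hypothetically, that one candidate truly leads 51% to 49%, and we run ten independent 1,000-person polls of that same unchanging electorate:

```python
import random

random.seed(42)  # for reproducibility

TRUE_SHARE = 0.51  # hypothetical: candidate A's true support, a 2-point race
N = 1000           # respondents per poll

def run_poll():
    """Simulate one poll: sample N voters at random, return A's lead in percentage points."""
    a_votes = sum(random.random() < TRUE_SHARE for _ in range(N))
    return 100 * (2 * a_votes - N) / N  # lead = %A - %B

leads = [run_poll() for _ in range(10)]
print("leads:", [f"{lead:+.1f}" for lead in leads])
print("polls showing A behind:", sum(lead < 0 for lead in leads))
```

With a true 2-point lead and a roughly 3-point standard deviation on the measured lead, about a quarter of such polls will show the trailing candidate ahead – even though not a single voter changed their mind.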

And that, I argue, is just what we’re seeing. The polls aren’t accurate enough, statistically speaking, to tell us anything other than “we don’t know who will win Pennsylvania.” But the media reports each one as if it’s a revelation.

Now back to those “adjustments” that pollsters make. In 2016 and 2020, the polls were off by quite a lot. In 2016, as everyone knows, pollsters were highly confident that Hillary Clinton would win, and they were wrong. They were almost as confident that Joe Biden would win in 2020, and they were right–but the race was closer than they predicted, and their estimates were once again off. In the 2022 midterm elections they were wrong yet again, but in the opposite direction: Democrats did better than forecast.

Did they over-correct after 2020, and is that why they predicted much bigger Republican gains in 2022? And have they fixed that now, so that polls this year will be spot on? Who knows?

The thing is, it’s very hard to figure out who will actually show up to vote, who might answer the phone when a pollster calls, and whether they’re even telling the truth. So pollsters make statistical adjustments based on past experience, weighting some voters more than others. In general, they don’t tell the public precisely how they do this.

Are these adjustments accurate? Well, here’s the kicker: we won’t know until after the election! But one thing is almost certainly true: the swings in the polls overstate how many people are actually changing their minds.

So I have a suggestion to the media: stop reporting every poll as if it’s news. Instead, tell us where the candidates stand on issues that really matter: support for Ukraine, support for Israel, health care policy, immigration, respect for the rule of law, stuff like that. I know, crazy stuff, right?

I realize that my plea will fall on deaf ears. It’s so much easier for The Washington Post, CNN, The New York Times, Fox News, and others to write and talk about polls, and pretend the polls themselves are news. It’s also lazy.

What's the limit of the human lifespan? And what do World War I veterans have to do with it?

Graph showing a lower rate of mortality (blue) in people aged 90-95 versus the rate in people aged 50-55 (orange). Figure from S.J. Newman (2018) Errors as a primary cause of late-life mortality deceleration and plateaus. PLoS Biol 16(12): e2006776. https://doi.org/10.1371/journal.pbio.2006776
An intriguing phenomenon has emerged in recent years: among very old people, the rate at which people die appears to decline once they pass a certain age. In other words, as the authors of one 2011 book claimed, aging slows down and maybe even stops. Or at least the mortality rate levels off past the age of 100, according to another study published earlier this year. This has led some scientists to speculate that the upper limit on human lifespan may be far beyond the age of anyone alive today.

Not so fast, says a new study by Saul Newman in PLoS Biology. Newman looked at the data and found something quite different: it's all just a mistake. Well, perhaps not a mistake exactly, but a consequence of many small errors. Let me explain.

In almost all species, mortality rates increase with age. In other words, as you get older, your likelihood of dying in a given year slowly but inexorably increases. Intuitively, we all know this: if young people die, it's tragic because we don't expect it. When people in their eighties and nineties die, it's sad, but no one is really surprised.

The evidence for decreasing mortality among very old humans has emerged from a number of studies reporting, with seemingly solid support, that people over 100 die at the same or even lower rates than people between 80 and 90, or between 90 and 100.

Not surprisingly, many people would like to believe that human lifespan is unlimited. Indeed, it's one of the hottest topics in Silicon Valley these days. And perhaps someone will invent some true life-extension technology someday. But Newman's analysis pours cold water on the notion that our natural longevity is unlimited.

One difficulty with studying very old people is that there simply aren't that many of them, so the studies tend to be small. Another problem–and this is what Newman zeroes in on–is that we don't have very good birth records for people over 100 years old. They were born a long time ago, when record keeping wasn't always so good. What if there are a few errors?

It might seem that this shouldn't matter, as long as the errors are random–in other words, as long as people's ages are both under- and over-estimated at the same rates. The problem is that even if the errors are random, they don't play out that way. Here's why.

For the sake of argument, let's imagine a group of people whose reported ages are off by 5 years in either direction. (I know that's a lot, but bear with me.) Among those reported to be 100, as Newman points out, virtually no one is alive from the cohort that underestimated their age–these people would have a true age of 105. But many more will be alive from those who overestimated their age–the 95-year-olds who think they're 100.

Newman's paper points out that if even a few people overestimate their age, mortality rates can appear to flatten or decelerate, because these people aren't really as old as we (or they) think they are. He then shows, in considerable detail, that a very small error rate is more than enough to explain all of the apparent decline in mortality rates in recent studies. In other words, the decline in mortality is simply an illusion.
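Newman's point can be illustrated with a toy model. (This is not his actual analysis; the Gompertz parameters and the 1% error rate below are made-up, purely illustrative numbers.)

```python
import math

# Toy Gompertz mortality model: hazard h(t) = a*exp(b*t),
# survival S(t) = exp(-(a/b)*(exp(b*t) - 1)).
# Parameters and error rate are illustrative, not fitted to real data.
a, b = 1e-5, 0.10
err = 0.01  # assumed fraction of records whose reported age is 5 years too high

def hazard(t):
    return a * math.exp(b * t)

def survival(t):
    return math.exp(-(a / b) * (math.exp(b * t) - 1))

def observed_hazard(x):
    """Death rate among people *reported* to be age x, when a fraction are really x-5."""
    w_true = (1 - err) * survival(x)   # correctly-aged people still alive at x
    w_young = err * survival(x - 5)    # age-inflated people (true age x-5) still alive
    return (w_true * hazard(x) + w_young * hazard(x - 5)) / (w_true + w_young)

for age in (100, 105, 110, 115, 120):
    print(f"age {age}: true rate {hazard(age):.2f}, observed rate {observed_hazard(age):.2f}")
```

Because the mislabeled 95-year-olds survive far better than true centenarians, they make up a growing share of each older "age" group, so the observed death rate falls further below the true one with every year of age – an apparent plateau created entirely by a small labeling error.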

What does World War I have to do with any of this? Newman explains:
"approximately 250,000 youths inflated their ages to enter the 1894–1902 birth cohorts and fight for the United Kingdom in World War I."
The same thing happened in the U.S. and other countries: 16- and 17-year-old boys said they were 18 so they could sign up. Coincidentally, these men would have been around 100 years old when many of the recent studies of centenarians were conducted, and it's very likely that some of these men were included in those studies. It wouldn't take many to distort the apparent mortality rates.

Who could have imagined that these brave young men who signed up to fight for their country (my grandfather was one of them), so many years ago, would have this completely unexpected effect on the science of aging, almost exactly 100 years after the war ended? It seems somehow appropriate that today, as the last veterans of the Great War leave us forever, they can still remind us of their sacrifice.

Stop teaching calculus in high school

Math education needs a reboot. Kids today are growing up into a world awash in data, and they need new skills to make sense of it all. 

The list of high school math courses in the U.S. hasn’t changed for decades. My daughters are taking the same courses I took long ago: algebra, geometry, trigonometry, and calculus. These are all fine subjects, but they don’t serve the needs of the 21st century. 

What math courses do young people really need? Two subjects are head-smackingly obvious: computer science and statistics. Most high schools don’t offer either one. In the few schools that do, they are usually electives that only a few students take. And besides, the math curriculum is already so full that some educators have argued for scaling back. Some have even called for getting rid of algebra, as Andrew Hacker did in the NY Times not long ago.

So here's a simple fix: get rid of high school calculus to make way for computer programming and statistics.

Computers are an absolute mystery to most non-geeks, but it doesn’t have to be that way. A basic computer programming class requires little more than a familiarity with algebra. With computers controlling so much of their lives, from their phones to their cars to their online existence, we ought to teach our kids what’s going on under the hood. And programming will teach them a form of logical reasoning that is missing from the standard math curriculum.

With data science emerging as one of the hottest new scientific areas, a basic understanding of statistics will provide the foundation for a wide range of 21st century career paths. Not to mention that a grasp of statistics is essential for navigating the often-dubious claims of health benefits offered by various "alternative" medicine providers. 

(While we're at it, we should require more statistics in the pre-med curriculum. Doctors are faced with new medical science every day, and statistical evidence is the most common form of proof that a new treatment is effective. With so much bad science out there (just browse through my archive for many examples), doctors need better statistical knowledge to separate the wheat from the chaff.) 

Convincing schools to give up calculus won’t be easy. I imagine that most math educators will scream in protest at the mere suggestion, in fact. In their never-ending competition to look good on a blizzard of standardized tests, schools push students to accelerate in math starting in elementary school, and they offer calculus as early as the tenth grade. This doesn’t serve students well: the vast majority will never use calculus again. And those who do need it - future engineers, physicists, and the like - can take it in college. 

Colleges need to adjust their standards too. They can start by announcing that high school programming and statistics courses will be just as important as calculus in admissions decisions. If just a few top universities would take the lead, our high schools would sit up and take notice.

We can leave calculus for college. Colleges teach calculus well, and 18-year-old freshmen are ready for it. Every major university in the country has multiple freshman calculus courses, and they usually have separate courses designed for science-bound and humanities students. Many students who take high school calculus have to re-take it in college anyway, because the high school courses don’t cover quite the same material. 


Let’s get rid of high school calculus and start teaching young students the math skills they really need.

Guest post at Simply Statistics

This week I was invited to do a guest post at Simply Statistics, the popular blog site run by three of my colleagues in Biostatistics.  I wrote about the crisis in R01 funding, the most important source of funding for most biomedical scientists in the U.S.  See the post here.

Have another cup o' joe, it's good for you

My favorite science studies are the ones that tell us that what we're already doing is good for us. This story fits the bill. In the American Journal of Epidemiology this month, Janet Hildebrand and colleagues reported on a large study looking at the effects of coffee on throat cancers.

Let's get right to the good news: drinking coffee seems to reduce your risk of death from oral or pharyngeal cancer by about 50%. Drinking more coffee is better than drinking less, and drinking caffeinated (normal) coffee is better than decaf.

I knew there was something wrong with decaf.

Now for some details. This study is part of an enormous project, the Cancer Prevention Study II, with over 1 million participants who've been followed for 30 years. The participants regularly fill out questionnaires answering a variety of questions, including how much coffee and tea they drink. After excluding people with missing information about coffee drinking and those who already had cancer in 1982, the researchers still had over 950,000 people. Coffee drinking was categorized based on daily consumption: less than a cup (or none), 1-2 cups, 3-4 cups, and more than 4 cups. They asked about decaf coffee and tea drinking as well.

A study like this is hard to do well, because there are so many confounding variables, especially smoking. Smokers have a dramatically higher risk of throat cancer, and smokers also drink a lot of coffee. Hildebrand and colleagues did a good job at separating out this effect, looking at the risk of cancer in nonsmokers separately and adjusting the statistics accordingly.

Perhaps the most encouraging finding is this: in people who have been nonsmokers for at least 20 years, 1-2 cups of coffee per day corresponds to a 32% decrease in the risk of death from throat cancer.  More than 2 cups per day corresponds to a 64% decrease. And among all participants (including former smokers), more than 4 cups a day seemed to provide the greatest benefit.

Decaf coffee also seems to reduce the risk of fatal throat cancer, though not quite as much. Tea drinkers, in contrast, don't seem to get any benefit, not for this type of cancer.

All this comes with some very big caveats.  First, despite the very large size of the study, the number of deaths from oral or pharyngeal cancer was very small, only a few hundred. (Oral/pharyngeal cancer is very common worldwide, but less common in the U.S., where this study was conducted, with about 7850 deaths per year. This includes cancers of the tongue, mouth, and pharynx.)  So the absolute risk is very small.  Another big caveat is that this study only looked at cancer deaths - it did not measure the risk of getting cancer in the first place.
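Some back-of-the-envelope arithmetic shows just how small. (The 400 below is an assumed round number standing in for "a few hundred"; it is not the study's actual figure.)

```python
# Illustrative arithmetic on relative vs. absolute risk.
participants = 950_000
deaths = 400  # assumed stand-in for "a few hundred" deaths over the study period

baseline_risk = deaths / participants
relative_reduction = 0.32  # the reported 32% decrease for 1-2 cups/day in long-term nonsmokers

absolute_reduction = baseline_risk * relative_reduction
print(f"baseline risk of death from this cancer: {baseline_risk:.3%}")
print(f"absolute risk reduction from coffee:     {absolute_reduction:.3%}")
```

A 32% relative reduction sounds dramatic, but applied to a baseline risk of a few hundredths of a percent, the absolute benefit is tiny.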

But skepticism aside, drinking coffee seems to reduce the risk of oral cancer.  This confirms my long-held view that the three major food groups - coffee, chocolate, and red wine - are all good for you.  So the next time you feel like a second cup, or a third: drink up!

How to predict an election? Ask the math geeks.

Mark Newman's rendering of the 2012 U.S. election, weighted by population

It's time for a bit of gloating.  No, not for Democrats over Republicans, though I'm sure that's going on.  It's time for the math geeks to throw a bit of scorn at those insufferable, over-confident frat boys who call themselves political prognosticators, and who spent most of the past two years telling us that they knew how the election would turn out.  They bloviated endlessly on talk shows, explaining why their favored candidate would win, and how he would do it.  

Politicos behave just like promoters of quack treatments when things go wrong: they always have a ready answer, and somehow their "theories" can never be proven wrong.  It seems that the only thing these guys are really expert at is getting themselves onto talk shows. Now that the election is over, let's hope that happens a bit less often.

Instead, pay attention to the math geeks.  The statisticians and analysts who build mathematical models based on multiple polls and other data absolutely nailed this election.  Nailed it!  Nate Silver at the FiveThirtyEight blog predicted the winner of the presidential election in all 50 states.  So did Sam Wang, a biophysics professor at Princeton, over at the Princeton Election Consortium blog.  And so did Simon Jackman, a political science professor at Stanford who writes for HuffPo.  Nate Silver first drew everyone's attention in the 2008 election, when he correctly predicted 49 out of 50 states.  Last week's success shows that this is not an anomaly, although it has the mathematically challenged pundits in a tizzy.

Hopkins statistics professor Jeff Leek wrote a nice explanation of how these models work over at Simply Statistics, so I won't explain it here.  Suffice it to say that mathematical models don't work by chattering with their buddies at political rallies.
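But the core idea is simple enough to sketch. Here's a bare-bones poll average with inverse-variance weights, so bigger samples count more. (This is not Silver's actual model, which also corrects for pollster "house effects," time trends, and correlations between states; the poll numbers below are hypothetical.)

```python
import math

# Hypothetical polls: (candidate's share, sample size)
polls = [(0.52, 800), (0.49, 1200), (0.51, 600)]

def combine(polls):
    """Combine polls with inverse-variance weights: weight = n / (p * (1 - p))."""
    weights = [n / (p * (1 - p)) for p, n in polls]
    est = sum(w * p for w, (p, _) in zip(weights, polls)) / sum(weights)
    se = math.sqrt(1 / sum(weights))  # standard error of the combined estimate
    return est, se

est, se = combine(polls)
print(f"combined estimate: {est:.1%} +/- {1.96 * se:.1%}")
```

The combined estimate is tighter than any single poll – which is why aggregators beat both individual polls and pundit intuition.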

Mathematics delivered the goods.  And make no mistake, this is the way of the future.

That hasn't stopped the punditocracy yet.  On election night, Republican hatchet man Karl Rove was sputtering on Fox News that Romney could still win, after Fox News itself - which is little more than a media arm of the Republican party - called Ohio and the election for Obama.  Rove, who predicted that Romney would win with 285 electoral votes, also orchestrated the spending of over $127 million on Romney, not to mention his spending on 12 Senate candidates, 10 of whom lost.  Has he admitted he did anything wrong?  Nope.

Rove wasn't the only one wrong.  As Techcrunch pointed out:
"every single major pundit was wrong - some comically wrong."  
The Atlantic created a detailed score sheet listing all the pundits and their predictions of the overall winner, the electoral college total, and the winner in all the swing states.  And indeed, even those who predicted correctly that Obama would win got most of the swing states wrong.

Here's what needs to happen.  The television networks need to realize that political expertise is meaningless when it comes to making statistical predictions.  Let's treat political forecasting just like weather forecasting, using models that are demonstrably accurate (such as Silver's).  Television stations can hire attractive political "forecasters" (because physical appearance matters on TV, like it or not) who will describe the latest forecasts just like today's weather forecasters do.  Now that I think of it, why not let the weather forecasters do both jobs?  We already have them in place at every local TV station in the country.  Think of all the money the networks will save.

But what about all that air time they need to fill with talking heads arguing about who will win elections?  Well, this makes about as much sense as having two self-proclaimed experts arguing about whether it's going to snow this weekend.  Maybe they can find real experts who will argue about issues rather than about who's ahead in the polls.

Ha ha, just kidding!  Who wants to hear about issues?  But if you must know how the race is going, ask the math geeks.