STATS ARTICLES 2004
Twilight Of The Pollsters
October 30, 2004
Ana Marie Cox (aka Wonkette)
Polling is inextricable from modern political journalism.
Everybody, all together now: “The only poll that matters is the one on Election Day.” This is the favorite cliché of reporters who write about polls and of the candidates who appear to be losing them. You hear it a lot in the lead-up to any election. And it seems like a true enough statement, except that pre-election polls matter. They influence both how voters perceive candidates and how reporters cover them – and, of course, they influence the candidates. As imperfect a science as polling is, it is inextricable from modern political journalism.
Perhaps, in a perfect world, the only poll at all would be the one on Election Day, and up until then, reporters would merely analyze the candidates’ policy positions and the candidates would just have to defend them. . . And we would have candy for breakfast and pennies would rain from the sky. In any event: The purpose of this guide is to minimize how reporting on polls distorts what they actually represent.
Margin of error
After the first presidential debate, news organizations were eager to show how the face-off changed the landscape of the race. Unfortunately, the polls didn’t cooperate. Or they showed different landscapes.
Poll Results Show Race for President Is Again a Dead Heat (NYT)
Bush Leads Kerry by 5 Percentage Points, ABC-Post Poll Finds (Bloomberg)
At least the Times was being honest. Their poll had the race split at 47/47. The ABC-Post poll is less clear cut: It gives Bush 51 percent and Kerry 46, but in a poll with a margin of error of +/- 3 points, 5 points is also a tie, albeit the headline writer’s least favorite: a “statistical one.” Remember, margin of error applies to each candidate's number, meaning both Bush's and Kerry's scores could be off by as much as 3 points either way. So the poll could just as easily represent a populace that’s 49 percent behind Bush and 48 percent behind Kerry.
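The rule of thumb in that paragraph can be sketched in a few lines of Python. This is only the article's simplified test (do the two candidates' ranges overlap?), not a formal significance test, and the function names are illustrative.

```python
def plausible_range(share, margin):
    """Return the (low, high) range a reported share could represent."""
    return share - margin, share + margin


def is_statistical_tie(leader_share, trailer_share, margin):
    """True when the trailer's high end reaches the leader's low end."""
    leader_low, _ = plausible_range(leader_share, margin)
    _, trailer_high = plausible_range(trailer_share, margin)
    return trailer_high >= leader_low


# The 51-46 poll with a 3-point margin: 46 + 3 = 49 overlaps 51 - 3 = 48.
print(is_statistical_tie(51, 46, 3))  # True
# A 10-point gap with the same margin does not overlap (48 < 52).
print(is_statistical_tie(55, 45, 3))  # False
```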
But the Times is also spinning their poll. They want this “dead heat” to look like a new development. But look at the previous month’s worth of polls:
Ending 9/22 Bush 49 Kerry 41
Ending 9/16 Bush 49 Kerry 41
Ending 9/8 Bush 50 Kerry 42
Ending 9/1 Bush 50 Kerry 42
Figuring in the margin of error, this race starts to look like it’s been very, very close all along. That reporters make slight changes seem significant is understandable: It’s more fun to report on a race where the lead goes back and forth. It’s just not, in this case, what’s actually happening.
Sampling: It’s not just for rap songs
A sample is the group of people who participate in a poll. A poll cannot be accurate unless its sample is drawn from a representative group and drawn randomly from that group.
If, for instance, a pollster wants to determine the attitudes of senior citizens toward a Medicare policy, he’s going to limit the sample to those of retirement age. But beyond that, the sample should include seniors from across the country, in every income bracket, of both genders, etc.
Further, every person in the target group should have an equal chance of being selected. Of all seniors, each would have an equal chance at being selected. In a poll on candidates, every voter would have an equal chance of saying whom they would vote for. In a poll on presidential job approval, every American would have an equal chance of stating their opinion of the President’s performance.
For the probability of any one person being selected to participate in the poll to be equal, the selection must be random. "Random" does not mean haphazard, as in stopping shoppers at a mall. Rather, there must be systematic randomness to ensure that each individual is chosen entirely by chance. Systematic sampling, in mathematical terms, means selecting every Nth person out of a group. For pollsters, it means selecting every Nth person out of a group that matches the target population – likely voters, say, or registered voters – as closely as possible.
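Here is a minimal sketch of "every Nth person" selection in Python. The voter list is hypothetical, and the random starting point is an assumption added so that no one's position on the list guarantees or rules out selection. Real pollsters work from phone exchanges and far messier lists.

```python
import random


def systematic_sample(frame, step, seed=None):
    """Pick a random starting point, then take every `step`-th entry."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start somewhere in [0, step)
    return frame[start::step]


# Hypothetical list of 1,000 people matching the target population.
voter_list = [f"voter_{i}" for i in range(1000)]
sample = systematic_sample(voter_list, step=50)
print(len(sample))  # 20
```

Stopping shoppers at a mall fails this test because there is no list and no fixed interval: the interviewer, not chance, decides who gets asked.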
This is where, in the real world, things get tricky.
For instance, as all major polling firms use the telephone to find and call respondents, anyone who doesn’t have a phone has already been eliminated. Of course, as this is a relatively small percentage of the population, it doesn’t – statistically – have much impact. This year, however, we’ve heard lots of talk about how those without land lines – who use only cell phones – might affect polling. Because they are younger and less likely to vote and less likely to be registered, it’s unclear that the effect has been great.
A more persistent and problematic issue is the time of day when the calls are made. For example, shortly after the 2004 New Hampshire primaries, The Washington Post [Jan 23] criticized polling giant John Zogby for his reliance on day-time polling. According to the Post, “about 30 percent of the people in his samples were called during the day.” This allows Zogby to finish his polls more quickly and to thus be the first to grab a headline, but it also means that his samples contain a disproportionate number of retirees and housewives.
Almost all polls over-represent friendly or sociable people whose disposition makes them more likely to talk to a pollster. In the U.S., respondents refuse to participate in phone surveys at very high rates, sometimes as high as 60 to 80 percent. This is the Achilles heel of polling, because if non-respondents are different in any quantifiable way (aside from their willingness to say “no”) from those who are eventually selected, a fatal bias may slip into the sample. Because we can’t know for sure how these non-respondents are different – they would have to respond to a survey – there’s no way to tell if all polls do suffer from this kind of bias.
"Undecideds" also plague pollsters. If respondents who say they are undecided are members of a political party, pollsters assume they will eventually "come home" to either the Democratic or Republican side. As Pollster Dick Morris asserts, "If there is one axiom which has emerged in modern American politics it is this: In a race involving an incumbent, the challenger always gets virtually all the undecided voters on election day."
But this is a tendency, not a scientific rule. In 1992, Gallup assigned five out of six undecided voters to Bill Clinton, on the assumption that undecideds would favor a challenger. Their final pre-election poll showed a 49-37-14 distribution for Clinton, Bush, and Perot, respectively, giving Clinton a 12-point lead over Bush. But the actual vote split 43-38-19, giving Clinton just a five-point lead over Bush. Bottom line: The undecideds did go for a challenger, just not the one Gallup assumed.
In addition, undecideds can also be “pushed.” That is, a respondent who says he is unsure of whom he will support is asked which candidate he feels he will most likely support (who he’s “leaning toward”), thus eliminating “undecided” as a category, but inevitably creating a poll that can appear to show an exciting race with more changes in who’s in the lead. A sharp reporter should always note the “softness” of polls that include leaners in their tabulations.
Election polls and sampling
For polls that attempt to gauge election outcomes, sampling is more complicated than for mere opinion polls because the voting public that actually shows up on Election Day is very different from the general population. And an election poll whose respondents are drawn from a pool of all adults will not be accurate.
According to the 2000 Census and election results, only 70 percent of those eligible to register to vote did so. Thus, an election poll that draws from a pool of registered voters is more accurate. But only 86 percent of registered voters made it to the polls. (These factors together result in our notoriously low overall voter turnout: Of all those eligible to register, only 60 percent voted.)
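The arithmetic behind that parenthetical is a single multiplication, sketched here in Python. The function name is illustrative, and the rates are the rounded figures cited above.

```python
def overall_turnout(registration_rate, turnout_among_registered):
    """Share of eligible adults who vote: those who register AND then show up."""
    return registration_rate * turnout_among_registered


# Rounded 2000 figures cited in the text.
share = overall_turnout(0.70, 0.86)
print(f"{share:.0%} of eligible adults voted")  # 60% of eligible adults voted
```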
What pollsters really want, then, is a pool of not just those eligible to vote, not just those registered to vote, but those who are LIKELY to vote. This distinction can make a big difference: For instance, the Oct. 19, 1992 Gallup/CNN/USA Today presidential preference poll showed Clinton over Bush by 48 percent to 30 percent, an enormous 18 point gap (with 15 percent for Perot). The race took a dramatic turn on Oct. 26, when the same poll found that Clinton's lead over Bush had shrunk to 42-36, a mere 6 points. A tracking poll a scant three days later found Clinton 41, Bush 40.
What had really changed? While the press played the story as a last-minute Bush surge, the closing of the gap coincided with a sampling switch. After October 19, Gallup/CNN/USA Today stopped sampling all registered voters; starting October 26, they reported the results for "likely voters."
A less dramatic but more recent example occurred in the 2004 Democratic primary in Wisconsin. An American Research Group poll released Feb. 12, and conducted on Feb. 11 and 12, showed Kerry with an apparently insurmountable edge: Kerry 53, Edwards 16, Dean 11. A Zogby poll released on Feb. 15 showed an almost as dramatic lead: Kerry 47, Edwards 20, Dean 23. Zogby conducted this poll Feb 13-15. One way to explain the discrepancy is that Zogby polled “likely primary voters” and ARG simply “registered voters” – the more engaged attitude of likely voters might account for the stronger support of the lesser-known candidates.
But who is a likely voter? To target their real quarry, pollsters use various debatable assumptions. Some use screening questions ("Are you following the news?" "Did you vote in the last election?") to identify likely voters based on their past behavior as well as their stated intention to vote.
But you can’t just take a voter’s word for it. Many people say they are likely voters when in fact they are not. About a third of those who say they will vote never show up. Being a "likely voter" is, after all, based on an expressed intention to do something socially desirable (like going to church or giving to charity), which we all tend to overstate.
And this is particularly true of young people. While 18 to 34-year-olds disproportionately register as Democrats, and 75 percent tell pollsters that they are "extremely likely" to vote, only about 30 percent actually do, even in a presidential year. Pollsters have to weight the sample based on these predictive assumptions. In many polls, likely voters will be a sub-sample, and predictions about their behavior will have a wider margin of error.
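To see why a sub-sample carries a wider margin of error, here is a minimal sketch using the textbook 95 percent formula for a simple random sample. The sample sizes are hypothetical, and the formula ignores the weighting and design adjustments that real pollsters apply.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


full_sample = 1000    # hypothetical: everyone interviewed
likely_voters = 600   # hypothetical: those who pass the likely-voter screen

print(f"{margin_of_error(full_sample):.1%}")    # 3.1%
print(f"{margin_of_error(likely_voters):.1%}")  # 4.0%
```

Fewer respondents means more noise, so a headline number built on the likely-voter slice deserves more caution than the poll's advertised margin suggests.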
None of these methods are foolproof. One way to identify a misfire is to look at what percentage of respondents the surveyor identifies as likely to vote. For example, a New York Times poll on the 2000 senate race between Rick Lazio and Hillary Clinton classified 87 percent of the registered voters it surveyed as "likely voters." But never in New York history have 87 percent of registered voters actually gone to the polls. Out of 10.1 million registered New York voters in the 1996 presidential election, only 5.9 million voted, a mere 58 percent.
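That sanity check is easy to sketch in Python. The figures are the ones cited above; the function names and the 10-point tolerance are illustrative assumptions, not an industry standard.

```python
def turnout_rate(voters, registered):
    """Fraction of registered voters who actually voted."""
    return voters / registered


def screen_looks_inflated(likely_voter_share, historical_turnout, tolerance=0.10):
    """Flag a likely-voter screen that passes far more people than ever vote.

    `tolerance` is a hypothetical cushion chosen for illustration.
    """
    return likely_voter_share > historical_turnout + tolerance


ny_1996 = turnout_rate(5.9, 10.1)  # millions of voters, from the text
print(f"{ny_1996:.0%}")                       # 58%
print(screen_looks_inflated(0.87, ny_1996))   # True
```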
Approval Ratings: Thumbs Up or Thumbs Down?
Next to the percentages generated by the traditional horserace question – “Who would you vote for?” – the most worried-over number in presidential race polling is the approval rating. The approval rating seems like a nice, solid number – a way for the public to give a president a “grade.” But the significance of the approval rating is not nearly as clear as that nice, specific number suggests.
Step back in time to this spring, when several different polling organizations released new approval ratings for President Bush:
A Gallup poll released on May 6 found that 49 percent approved of the job Bush was doing as president and 48 percent disapproved.
A Quinnipiac University poll released on May 5 found that 46 percent approved and 47 percent disapproved.
A CBS/NYT poll released on April 28 also found that 46 percent approved and 47 percent disapproved.
The polls themselves did not contradict each other to any great degree (especially considering that each one has a 3-point margin of error), but the stories about these polls sure did.
Poll: Bush approval rating holds steady – Providence Journal, May 5
Bush Approval on Iraq, Economy, and Terrorism at Low Points – Gallup Poll News, May 5
The blame for these mixed messages does not lie solely with headline writers. Making definitive statements about what an approval rating means is almost impossible for anyone, because an approval rating’s apparent objectivity is derived from a collection of subjective opinions.
Think of it this way: Approval polling is plagued by all the complicating factors that make regular old horserace polling inexact. For example, who responds to the poll affects the significance of the result. Thus, how eligible voters (anyone over 18, as in the sample used by NYT/CBS) feel about the president is of less importance on Election Day than how registered voters feel (the pool for the Quinnipiac survey). To these caveats, approval polling adds a whole other layer of imprecision: the idea of “approval.”
Approval has gradations of intensity: You can strongly approve of someone or you can only barely approve. You can almost disapprove. One person may “approve” of the job President Bush is doing simply because we haven’t had another terrorist attack. Another may disapprove because she disagrees with his position on stem cell research.
One could argue that a sufficiently large sample would eliminate these “opinion outliers,” and that there is some average level of approval to be sussed out of the 48 percent of 1500 or so respondents. But the fact remains that approval cannot be pinned down in the same way that a preference in an election can. You can only vote for one person – you can’t just vote for him a little bit.
So how do pollsters translate “approval” into votes? Usually with a historical sleight of hand. They look at the approval rating of the current president, and then compare it with the approval rating of past presidents at the same point in office. This, supposedly, helps us divine the chances of the current president being re-elected. It sounds so scientific: At six months away from the election, according to an article in the LA Times, the “50% [approval rating] mark . . . historically has separated the presidents who won a second term from those who didn't.”
That Bush for months has hovered around this magic dividing line can drive analysts nuts.
“The thing that makes this so hard to predict is that [Bush] is really on the cusp,” said Alan I. Abramowitz, a professor of political science at Emory University in Atlanta, who developed a model for predicting presidential results, in one typical polling story. “If he goes up a few points in approval, it is going to look a lot better for him; if he drops down below 50 percent, he is really going to be in trouble. If he stays where he is, I think we are going to have another really close election.”
But what, really, are analysts basing their predictions on? Abramowitz’s model is based on data from presidential elections going back to 1956. That’s a sample size of 12. Not especially reliable. No, wait: Of those 12 elections, only eight have involved incumbents running for a second term. And can you even count Johnson in this sample? He wasn’t, after all, elected to office. So that leaves us with a sample size of seven. Of those, four won reelection. Three of them, Nixon in 1972, Reagan in 1984 and Clinton in 1996, had approval ratings similar to Bush’s (hovering at 50%) at the same point in their terms. (Eisenhower’s approval was at 70% - perhaps people were less judgmental then.)
But none of them ran against John Kerry.
How to spot stupid polling questions
Remember when your grade school teacher told the class, “There’s no such thing as a stupid question”? Your teacher was wrong. A poorly phrased question in a public opinion poll can skew results, confuse respondents, or even change a respondent’s mind. When interpreting poll results, the questions are just as important as the answers, and a responsible pollster should provide the full details of a poll’s text to journalists (and, really, to anyone that asks).
A stupid question muddles choices. Any poll that asks people to choose from more than one proposition must make the distinction between the choices clear, and the choices must be mutually exclusive. And if the question is about an issue that has many different solutions or responses, the pollster should endeavor to provide a full range of choices.
In December 2003, The New York Times and CBS conducted a poll on gay marriage that illustrates what happens when a question collapses a set of options too far. The Times poll made news largely because it found that 55 percent of those asked would favor a constitutional amendment banning same-sex marriage – a higher percentage than one would have expected, since polls over the past decade have shown increasing acceptance of homosexuality.
What’s more, a Pew Research Center Poll conducted in October 2003 had significantly different results: Only ten percent of those questioned favored a constitutional amendment defining marriage. How to account for the difference? The New York Times/CBS poll didn’t give respondents opposed to gay marriage any option besides a constitutional amendment. Here’s their question: “Would you favor or oppose an amendment to the U.S. Constitution that would allow marriage ONLY between a man and a woman?” Contrast this to the Pew Center question: “Do you strongly favor, favor, oppose, or strongly oppose allowing gays and lesbians to marry legally? IF OPPOSE GAY MARRIAGE (3,4 IN Q.17), ASK: Q.19 Should the U.S. Constitution be amended to ban gay marriage, or is it enough to prohibit gay marriage by law without changing the Constitution?” Which do you think gives a more accurate picture of respondents’ desires?
A stupid question has more than one interpretation. During the build-up to the invasion of Iraq, public interest pollster Alan Kay pointed to a good example of how a poll can do this. The issue at stake was how the public interpreted the administration’s arguments for the war. The Pew Research Center asked: "In your opinion, which of the following better explains why the United States might use military force against Iraq? Is it more because the U.S. believes that Saddam Hussein is a threat to stability in the Middle East and world peace or is it more because the U.S. wants to control Iraqi oil?" (Pew Research Center/Princeton Survey Research Associates, Nov 4-10, 2002)
Quick: Who is the “U.S.” in this question? Is this question asking what the respondent believes is a good explanation for going to war? Is it asking the respondent what argument the Bush administration made for going to war? Or is it asking what the respondent believes is the Bush administration’s real rationale for going to war? Kay helpfully provided a good counterexample of how you might ask a question on this issue that separates out these options: "Among these 11 objectives, which do you think should be the number one goal of the U.S.?", followed by "Which is the number one goal of President Bush?"
A stupid question assumes knowledge. When asking respondents about policy options, it is important to define terms and issues as clearly as possible while avoiding loaded terms. Shortly after the September 11 attacks, pollster Stanley Greenberg conducted a survey that dramatically illustrated how people’s responses change according to how (and if) terms are defined: When asked simply, “[D]o you think that the United States should increase spending on foreign aid, decrease spending on foreign aid, or keep it about the same?” 14 percent thought it should be increased, 32 percent of those polled said it should be decreased, and 49 percent thought it should remain the same.
But when the interviewer substituted “humanitarian aid” for “foreign aid” in the same question, the numbers shifted noticeably: 17 percent for increasing aid, 23 percent for decreasing it and 56 percent for keeping it about the same.
The shift in attitudes was even more dramatic when the interviewer asked about specific foreign aid initiatives: Respondents favored increasing spending in six of the 10 areas mentioned, and keeping spending the same in four, sentiments that appear at odds with the clear majority that opposed increasing spending on mere “foreign aid.”
A few additional points pertain especially to stupid electoral questions and other polls that attempt to determine support among candidates.
A stupid question is asked in a leading context. While a good polling question will contain enough information in it to allow respondents to understand their range of choices, some polls are designed primarily to impart information to respondents, not to determine the respondents’ opinion.
The most infamous use of this kind of poll is a “push poll”: A poll conducted by an interested party that contains damaging – even false – information about an opponent. It’s a version of asking, “When did you stop beating your wife?”, only directed to voters: “Would you still vote for candidate X, even if you knew that he beats his wife?”
In recent election cycles, push polls seem to be more common, but have been subtler in their accusations. In 2000, for example, George Bush conducted a poll in which respondents were asked if they would be more likely to vote for or against John McCain after learning that he voted for “legislation that proposed the largest tax increase in United States history.”
A stupid question includes choices that may not occur to a respondent. In electoral polls, this means including third party and independent candidates that a respondent might not even realize are in the race. But this is a gray area for pollsters: Third party candidates and primary challengers can influence an election’s outcome – look no further than Ralph Nader’s run in 2000. Don’t discount a poll that includes third party candidates, but be aware that those candidates will probably not receive the number of votes indicated by their polling numbers.