Saturday, September 8, 2012

Desperately seeking trends

Doug Schoen and Pat Caddell's latest piece of advice to Obama suggests that his campaign is in trouble and needs a sudden change. As evidence of Obama's recent troubles, they cherry-pick from a number of polls, noting that Obama is trailing Romney badly in the "battleground" state of Missouri (which isn't actually a battleground this year) and that Romney has closed the gap with Obama nationwide since May. Then they do this:
President Obama now leads by just one point in the latest PPP Florida poll (48%-47%)—down from a four-point lead (50%-46%) in an Aug. 22-26 CNN poll.
Look, I'm no pollster, but isn't comparing results across different polling firms and calling it a trend one of the cardinal sins of polling interpretation? And, given those polls' margins of error, aren't those two results statistically indistinguishable from one another?

4 comments:

Anonymous said...

you are correct -- averaging treats the quality as all the same, and washes out margin of error

andrew said...

If the poll has a margin of error of three points, which would be pretty typical, the difference between a four-point spread and a one-point spread is material enough to care about, although hardly enough to call for an about-face. The difference between a 0.3 SD gap and a 1.3 SD gap makes a decent difference in the relative probabilities.

If one is really cherry-picking, then yes, that is bad practice. But if one is comprehensively reviewing and updating all new horse race data since a previous regular installment during the election season, and the result is robust across all of the new polls that exist and not merely a cherry-picked subset, then the statistical power of the conclusion that there might be a trend rises. From a reader's perspective, this comes down to the author's credibility.

Poli sci models tend to discount the importance of the campaign period, however. Also, since many poli sci models show Colorado as the median state in political preference in the United States, the only polling that matters, if relative partisan inclinations in swing states are stable, is the Presidential race polling in Colorado.

Seth Masket said...

If the poll has a margin of error of three points, which would be pretty typical, the difference between a four-point spread and a one-point spread is material enough to care about, although hardly enough to call for an about-face.

Is this the right way to think about it? Does the margin of error apply to the spread between the candidates, or just to the estimates of the candidates' popularity? That is, if one poll has Obama at 50% ±3, and the second has him at 48% ±3, then we can't really say that any change has occurred at all, right?
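One way to put numbers on that question is the sketch below. It assumes simple random sampling, a 95% confidence level, and a hypothetical sample size of 1,000 (none of these figures come from the polls discussed here). Under those assumptions, the reported margin of error describes each candidate's share, and the margin on the spread between two candidates in the same poll comes out close to twice as large.

```python
import math

Z95 = 1.96  # normal critical value for a 95% interval (assumed confidence level)

def share_moe(p, n, z=Z95):
    """Margin of error for a single candidate's share p in a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_moe(p1, p2, n, z=Z95):
    """Margin of error for the lead p1 - p2 within the same poll.

    The two shares come from one multinomial sample, so they are
    negatively correlated: Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 1000  # hypothetical sample size, for illustration only
print(f"MOE on a 50% share:    {100 * share_moe(0.50, n):.1f} points")
print(f"MOE on a 50-46 spread: {100 * lead_moe(0.50, 0.46, n):.1f} points")
```

So a four-point lead in a poll with a three-point margin of error sits inside the uncertainty on the lead itself, at least under these textbook assumptions.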

andrew said...

Shorter version: No.

The headline number is the peak of the bell curve. If one poll has him at 50% and the next has him at 48%, it is more likely that Obama's actual support has fallen than it is that Obama's support has risen or stayed constant. In particular, a one-SD spread of Obama at 47-53 vs. a one-SD spread of Obama at 45-51 makes a lot of difference at the high end of the former and the low end of the latter result.
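A rough way to quantify "more likely that support has fallen" is sketched below. It assumes the two polls are independent, that sampling error is approximately normal, that the ±3 is a 95% margin of error (treating it as one SD instead would give a smaller probability), and that there are no house effects between pollsters. The function name and defaults are illustrative, not taken from any polling library.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def prob_decline(first, second, moe_first, moe_second, z=1.96):
    """Approximate probability that true support fell between two polls.

    Converts each 95% margin of error to a standard deviation, combines
    them for the difference of two independent estimates, and asks how
    much of the resulting normal curve lies above zero change.
    """
    sd_first = moe_first / z
    sd_second = moe_second / z
    se_diff = math.hypot(sd_first, sd_second)  # sqrt(sd1^2 + sd2^2)
    return normal_cdf((first - second) / se_diff)

# Obama at 50% +/-3 in one poll and 48% +/-3 in the next
print(f"P(support fell) is roughly {prob_decline(50, 48, 3, 3):.2f}")
```

Under these assumptions a decline is more likely than not, but the 95% margin on the difference between the two polls is about 4.2 points, so a two-point drop would not count as significant by the usual standard.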

There is a nice hefty chunk of probability that there is in fact no change (centered around a mean value of 49%). But the results aren't identical, and there is a direction. The more data points you have, the more statistical power you get, although you get much less of a boost in statistical power from pooling the raw data into one megapoll (which is usually the better choice when the question and methodology are highly homogeneous) vs. treating each poll as an independent data point, which is the intuitive thing to do.

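The margin of error shrinks with the square root of the sample size: quadrupling the sample cuts the margin of error in half, and increasing the sample size by a factor of ten cuts it by roughly two-thirds, in the range of values where Presidential polls typically fall.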
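For anyone who wants to check the sample-size arithmetic, here is a minimal sketch. It assumes simple random sampling, a 95% confidence level, and a candidate near 50%; the base sample of 1,000 is hypothetical. Under those assumptions the margin of error is proportional to 1/sqrt(n).

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a share p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

base_n = 1000  # hypothetical starting sample size
base_moe = margin_of_error(base_n)

for factor in (1, 4, 10):
    moe = margin_of_error(base_n * factor)
    reduction = 1 - moe / base_moe
    print(f"n = {base_n * factor:>6}: MOE = {100 * moe:.2f} points "
          f"({100 * reduction:.0f}% smaller than at n = {base_n})")
```

Real polls use weighting and other design adjustments that inflate these textbook figures somewhat, so treat the output as a lower bound on the uncertainty.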