FiveFortyThree: Exactly how good are our pollsters?

To answer this question, we performed an analysis analogous to the one presented by Nate Silver.

Step 1: We performed a literature review of pollster predictions for the Lok Sabha elections of 1971, 1996, 1998, 1999, 2004 and 2009. In total, we databased 202 opinion polls for each party and alliance, both by state and nationwide. Data was collected for AC Nielsen, GfKMODE, CSDS, C-Voter, MDRA and TNS. It should be noted that a majority (57.43%) were for AC Nielsen. We did not database every pollster (ORG-MARG merged with AC Nielsen, so we grouped their results together), and this should be read as a preliminary analysis. We hope to collect more data over time.

Step 2: For our measure of error, we used Mosteller 5, which looks at the margin between winners and losers. To keep the calculations simple, we compared each poll's target against all other groups combined. So, for example, it may have been INC versus all others.
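As a rough illustration of the measure in Step 2, here is a minimal sketch. It assumes each poll and each result is expressed as a percentage share for the target party or alliance, with everyone else taking the remainder; the function name and the figures are hypothetical and not taken from our dataset.

```python
def mosteller5(poll_share: float, actual_share: float) -> float:
    """Absolute error on the margin between a target and all others.

    Both arguments are percentages (0-100) for the target group;
    "all others" is assumed to hold the remaining share.
    """
    poll_margin = poll_share - (100.0 - poll_share)
    actual_margin = actual_share - (100.0 - actual_share)
    return abs(poll_margin - actual_margin)


# Hypothetical example: a poll puts the target on 30%, it finishes on 28%.
print(mosteller5(30.0, 28.0))  # 4.0
```

Because the comparison is always target versus all others, the margin error here works out to twice the error on the target's own share.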

Step 3: The following variables were calculated for each poll: recency of the poll (calculated as a half-life), sample size (recalculated as a weighting), and the number of polls per pollster.
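The weightings in Step 3 could look something like the sketch below. The half-life of 30 days, the reference sample of 1,000 and the square-root scaling are all assumptions made for illustration; they are not the values or the exact formulas used in our ratings.

```python
import math

HALF_LIFE_DAYS = 30.0      # hypothetical half-life, not stated in this post
REFERENCE_SAMPLE = 1000.0  # hypothetical reference sample size


def recency_weight(days_before_election: float,
                   half_life: float = HALF_LIFE_DAYS) -> float:
    """Weight that halves for every half-life between poll and election."""
    return 0.5 ** (days_before_election / half_life)


def sample_size_weight(sample_size: float,
                       reference: float = REFERENCE_SAMPLE) -> float:
    """Weight growing with the square root of the sample size (assumed)."""
    return math.sqrt(sample_size / reference)


# A poll taken 60 days out with 4,000 respondents (hypothetical figures).
print(recency_weight(60.0), sample_size_weight(4000.0))
```

The square-root form is chosen only because sampling error shrinks roughly with the square root of the sample size; any other monotone weighting would fit the description in Step 3 equally well.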

Step 4: The variables determined in Step 3 were used in a regression analysis of the Mosteller 5 measures. The Root MSE for each pollster was then recalibrated so that the mean across pollsters is zero. Negative rawscores therefore indicate better-than-average polling, whereas positive rawscores indicate worse-than-average polling.
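The recalibration at the end of Step 4 amounts to subtracting the average Root MSE from each pollster's value. The sketch below shows only that recentring, not the regression itself; the pollster names and Root MSE values are hypothetical.

```python
from typing import Dict


def recentre(rmse_by_pollster: Dict[str, float]) -> Dict[str, float]:
    """Subtract the mean Root MSE so the resulting rawscores average zero."""
    mean = sum(rmse_by_pollster.values()) / len(rmse_by_pollster)
    return {name: rmse - mean for name, rmse in rmse_by_pollster.items()}


# Hypothetical Root MSE values for three made-up pollsters.
rmse = {"Pollster A": 4.0, "Pollster B": 6.0, "Pollster C": 8.0}
rawscores = recentre(rmse)
print(rawscores)  # Pollster A is below average (good), Pollster C above (bad)
```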

Step 5: We then took the rawscore and reverted it to the mean, which allows us to account for inherent luck, variance and noise. The formula uses both the reversionparameter, which is 1 - (0.06 * sqrt(# pollsters)), and the groupmean, which we take to be -0.12. This is the average of the groupmeans for transparent pollsters and for those conducting polls via telephone in the US elections, and it likely accounts best for the transparency among Indian pollsters.

Step 6: In our final step, we calculated the Pollster-Introduced Error, which is simply the adjusted score plus 2.
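Steps 5 and 6 can be sketched as follows. The reversionparameter and the groupmean of -0.12 are as described above. The way the two are combined (the reversionparameter acting as the weight on the groupmean) and the clamp to the range 0 to 1 are assumptions made for this illustration, as is the argument name n, which stands for the count that enters the square root.

```python
import math

GROUP_MEAN = -0.12  # groupmean used in Step 5


def reversion_parameter(n: int) -> float:
    """1 - 0.06 * sqrt(n); clamping to [0, 1] is an assumption."""
    return min(1.0, max(0.0, 1.0 - 0.06 * math.sqrt(n)))


def adjusted_score(rawscore: float, n: int,
                   group_mean: float = GROUP_MEAN) -> float:
    """Pull the rawscore towards the groupmean.

    Assumption: the reversionparameter is the weight on the groupmean,
    so a larger n means less reversion.
    """
    r = reversion_parameter(n)
    return rawscore * (1.0 - r) + group_mean * r


def pollster_introduced_error(rawscore: float, n: int,
                              group_mean: float = GROUP_MEAN) -> float:
    """Step 6: the adjusted score plus 2."""
    return adjusted_score(rawscore, n, group_mean) + 2.0


# Hypothetical rawscore of 1.0 with n = 100.
print(round(pollster_introduced_error(1.0, 100), 3))
```

With n = 0 the score collapses entirely to the groupmean, and once 0.06 * sqrt(n) reaches 1 the rawscore is left untouched, which is why the clamp is included in this sketch.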

Because so little polling data is available, we ran our ratings under several scenarios. The results were as follows:

1. Use of all polling data

How did the pollsters fare using all of the available predictions?

Pollster	Pollster-Introduced Error
CSDS	-19.53
MDRA	-7.35
GfKMODE	-6.13
TNS	-6.07
C-Voter	-2.19
AC Nielsen	167.79

2. Nationwide-only polling

How did the pollsters fare when predicting the nationwide results only?

Pollster	Pollster-Introduced Error
GfKMODE	-1.63
CSDS	-1.48
TNS	-0.06
MDRA	0.41
C-Voter	4.72
AC Nielsen	27.33

3. Polls conducted within 40 days of the first election

How did the pollsters fare for predictions made within 40 days of the first election?

Pollster	Pollster-Introduced Error
CSDS	-9.30
MDRA	-2.85
TNS	-2.59
GfKMODE	2.88
C-Voter	6.05
AC Nielsen	40.50

4. Polls excluding the 2004 Lok Sabha election

How did the pollsters fare if we exclude the 2004 election, when the pollsters were widely known to have failed?

Pollster	Pollster-Introduced Error
CSDS	-5.33
C-Voter	-5.22
GfKMODE	6.09
AC Nielsen	16.62

5. Polls excluding Nationwide predictions

How did the pollsters fare when predicting at the state level?

Pollster	Pollster-Introduced Error
GfKMODE	-16.15
C-Voter	-6.49
AC Nielsen	77.37

6. Polls conducted for the 2009 election

How did the pollsters fare for the last Lok Sabha election?

Pollster	Pollster-Introduced Error
CSDS	-3.60
C-Voter	-3.29
GfKMODE	7.94
AC Nielsen	16.39

Conclusion

CSDS comes out on top in four of the six scenarios and is a close second in the nationwide-only scenario; it has no rating in the state-level scenario. GfKMODE and C-Voter also generally do well.

Unfortunately, AC Nielsen does poorly. This may partly be due to the large amount of data collected for it, but we suspect it has more to do with the lack of by-state predictions from other pollsters: simply by virtue of having conducted so many polls, AC Nielsen's error was increased. It was also the largest pollster for which we had data in 2004. For CSDS, on the other hand, we had no data for the 2004 elections, which would naturally decrease its error.

Our analyses compare favorably with others. The Indian Republic also had similar findings for CSDS, C-Voter and AC Nielsen.

Finally, we feel the CSDS polls should be classed as the best. The fact that they conduct their polls in a very transparent manner shows in their accuracy. Most of the other pollsters analysed also did well. For those claiming that the polls are too biased and should be banned, our findings suggest otherwise. While there have been some poor predictions, such as in 2004, on the whole the pollsters have done well.
