Tuesday, November 27, 2007

Nobody Knows You're a Dog

The Zogby International poll released yesterday is causing a triumphal buzz on the neo-Naderite left because it confirms their fantasy that not only do their favorites handily top all possible Republican candidates but also Hillary Clinton is topped (and I mean that in every way possible) by those same Republicans. While I'm getting a chuckle out of their self-delusion, I feel duty bound as both a political scientist and a web application developer to nip this idiocy in the bud.

Polls, to be reliable, have to have a demonstrably accurate polling sample. By accuracy, I simply mean that the people you poll are who and what they represent themselves to be, and that their answers to the questions are "honest". This doesn't mean they are rational or commendable, but that their biases are transparent. If you are in favor of X or you dislike candidate Y, your position should be recorded as such in the poll.

Pure online polls are not reliable because, on the Internet, no one knows you're a dog. Or that you claim to be a black 20-something female from a big city while you're actually a white, 40-something male from the wilds of Wyoming. Or that you possess fifteen or twenty different personas. This is something I wrestle with all the time on public-facing web sites where we have to rely on the self-reporting of the registrants for background information, such as age, ethnicity, and unique identity. An ongoing challenge, for example, is how the city government can give citizens stable accounts they can use to conduct business with the city, yet keep those accounts free of the kind of data that would allow a person to be definitively identified. SSN is out, for example. The only authentication method readily available is an email address, and those are free for the taking.

I'm part of the Zogby International polling pool. Given the (non-)controls they have on it, it must be one of the most polluted, unreliable polling pools out there. You can misrepresent yourself at will, you can submit one bogus identity after another, and you can vote multiple times in the same polls if they select a large enough group. After you finish a poll, you are asked to provide names of people who should be asked to become part of the ZI pool. Now, who are you going to recommend, someone who agrees with you or someone who doesn't? Or why invite someone else when you can just invite yourself via a new email address? The ZI poll is made up of self-selected participants who do not have to pass any follow-up confirmation of their unique identities. Given the online bias, the respondent pool is likely to have a higher proportion of dedicated political partisans than a truly random phone poll.
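To make the duplicate-identity problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the email addresses, the 5-5 split, and especially the `person_of` map from address to real person, which is exactly the information an email-only registration system does not have. It only illustrates how one respondent with a handful of extra addresses can move a small poll's topline.

```python
from collections import Counter


def topline(responses):
    """Return each answer's share of all recorded responses, as a percentage."""
    counts = Counter(answer for _, answer in responses)
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


def dedupe_by_person(responses, person_of):
    """Keep the first response per real person.

    person_of is a hypothetical email -> person map; a pollster who only
    collects email addresses has no way to build it.
    """
    seen = set()
    kept = []
    for email, answer in responses:
        person = person_of.get(email, email)
        if person not in seen:
            seen.add(person)
            kept.append((email, answer))
    return kept


# Hypothetical pool: ten real people, split 5-5 between answers X and Y.
responses = [(f"x{i}@example.com", "X") for i in range(5)]
responses += [(f"y{i}@example.com", "Y") for i in range(5)]

# One X supporter registers four extra addresses and answers again each time.
responses += [(f"x0.alt{i}@example.com", "X") for i in range(4)]
person_of = {f"x0.alt{i}@example.com": "x0@example.com" for i in range(4)}

raw = topline(responses)                                 # what the pollster sees
clean = topline(dedupe_by_person(responses, person_of))  # one person, one vote

print(raw)
print(clean)
```

With these made-up numbers, a dead-even pool shows up as a lead of nearly 30 points for X. The cleanup step only works because the sketch cheats and knows who is behind each address.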

This stands in contrast to a tracking poll the AP is running jointly with Yahoo, where a pool of participants was called, screened, and balanced for demographics and geography, but where the actual polling is performed online. The same people are asked the same set of questions, which presents a picture over time. This is different from an opinion poll, but it does highlight the error points of the ZI style. Bottom line, ZI doesn't know who it is polling and the polls themselves are set up poorly. ZI polls are more accurate than the MSNBC "Who won the debate?" open polls that get flooded by partisans, but they cannot adequately control for bias.

I do enough Zogby polls (and the fact that I am constantly selected for polls also says something about their selection process) that I can say they are generally very sloppy. They ask a lot of political opinion questions without testing for relevant biases up front. For example, I am asked who I voted for in the last presidential election, but I am not asked if I have a preferred candidate in the current election, something every good pollster knows to ask. For a long time, I could not indicate that my religious position was "Humanist" rather than "Other". Questions are often worded so that only extreme positions can be selected (Are you for open-ended occupation of Iraq or for immediate withdrawal? Umm, I'm for phased withdrawal, thank you very much...).

I happened to be part of this last "poll" and it was clear to me from the first question that this was a hit job on HRC. They asked questions about which candidates were running negative campaigns, who was the candidate who represented change, and other coded questions that the Obama and Edwards campaigns have been trying to get into the memeage.

I was then given three screens of four questions. I was asked to say who I would vote for if an election matchup was between Democratic candidate A and Republican candidate 1, then A and 2, then A and 3, and then A and 4. The next screen had candidate B and 1,2,3,4, and finally candidate C and 1,2,3,4. However, there were four options for every question:

Democrat A/B/C
Republican 1/2/3/4
Vote for someone else
Won't vote

Hmm, OK, I am not asked who I support, and I can, without penalizing my preferred candidate, vote the third or fourth answer for a challenger to my candidate. Without having to put my cards on the table (be an "honest" participant) by reporting my bias (candidate preference overall), it is not possible to control for strategic voting - such as Obama supporters deliberately saying they would not vote if the choice was Clinton or any Republican.
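Here is a rough sketch, in Python, of the control that was missing. It assumes the poll had also asked each respondent for a preferred candidate, which the Zogby questionnaire did not do. The response counts are invented for illustration. The idea is simply to break the head-to-head answers down by stated preference so that a cluster of "Won't vote" or "Vote for someone else" answers from one rival's supporters becomes visible.

```python
from collections import Counter, defaultdict


def crosstab(rows):
    """Tabulate head-to-head answers by each respondent's stated preference."""
    table = defaultdict(Counter)
    for preferred, answer in rows:
        table[preferred][answer] += 1
    return table


def share(counter, answer):
    """Percentage of a group's responses that gave the specified answer."""
    total = sum(counter.values())
    return round(100 * counter[answer] / total, 1) if total else 0.0


# Hypothetical answers to "Democrat A vs. Republican 1".
# Each row is (stated preferred candidate, answer to the matchup question).
rows = (
    [("A", "Democrat A")] * 40
    + [("B", "Democrat A")] * 10
    + [("B", "Won't vote")] * 20
    + [("B", "Vote for someone else")] * 10
    + [("R1", "Republican 1")] * 40
)

table = crosstab(rows)

# Share of B supporters who pick neither major-party option in this matchup.
defection_b = (
    100
    - share(table["B"], "Democrat A")
    - share(table["B"], "Republican 1")
)

print("A supporters backing A:", share(table["A"], "Democrat A"))
print("B supporters backing A:", share(table["B"], "Democrat A"))
print("B supporters opting out:", defection_b)
```

A breakdown like this cannot prove that anyone is answering strategically, since some of those B supporters may sincerely plan to stay home. What it does is show where the drop is coming from, which is more than a bare topline can tell you.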

The peculiar pattern of votes points to punitive voting by Democrats who don't support Clinton and are trying to present a stronger picture of disapproval than is present in the general population. The fact that no one else's numbers change, only HRC's, and only down and down in a way not recorded in any other poll (compare to the Gallup poll done at the same time, which shows her outpolling all Republicans by larger margins than Obama does), is not a result of general opinion. It is a result of deliberate manipulation of the data by the participants. The Republican respondents have no need to dissemble as all Democratic candidates are almost equally dangerous to all of their candidates. They have no reason to reduce the apparent strength of any Republican in relation to any Democrat, while HRC's close challengers have a strong incentive to reinforce the meme that the other candidates are more electable in the general than she is.

The problem with liar polls, like liar loans, is that at some point you have to produce the goods - you have to cough up the votes or the money to cover the claim.


