Frank Graves, President of EKOS Research Associates, was kind enough to give me a very complete interview over a few emails. In it, he discusses some aspects of polling, and IVR polling in particular, that I would have liked to cover in my article but could not because of length. Below is the full transcript of the email interviews.
308: Several years ago, EKOS moved from traditional live-caller polling to interactive voice response. Why was that decision made?
FG: Actually, we continue to do live interviewing and we have a large random probability panel which we use for survey work. Most, but not all, of our media polling comes from our version of IVR, which is a series of carefully tested protocols we are now calling HD-IVR (high definition IVR). IVR is not a survey methodology; it is a method of reaching respondents. The sampling strategy, call-back regimens, data purification techniques, instrument design, etc. are all crucial and determine the accuracy of the survey results. We turned to IVR several years ago after noting the success of some American IVR pollsters. IVR has certain limitations, but for short surveys, properly designed and administered, it can be an excellent tool. We particularly like the ability to generate large random samples at far lower cost than with live interviewer CATI. In our experiments, HD-IVR gives results that are equivalent to or better than those with live CATI and much more accurate than any opt-in online methods.
308: Having used different
methodologies in the past, what do you consider the strengths of IVR compared
to those other methodologies?
FG: Once again noting that we continue to actively use many other methods (we have our own call centre for live CATI) and we maintain a large random probability panel (PROBIT), I would give the following list of strengths for properly applied IVR techniques. (This includes the application of call-backs, noise detection and elimination, a dual landline and cell phone sampling frame, etc.):
- Accuracy, particularly on simple behavioural and intention measures.
- Speed: large samples can be assembled and analysed very rapidly.
- Large samples, which produce a lower margin of error (particularly for sub-population analysis and tracking; see the sketch after this list).
- Economy (the live interviewer cost is replaced by robotics).
- Minimisation of undesirable mode effects from a live interviewer (particularly important on questions which can produce social desirability bias).
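
[A quick illustration of the relationship between sample size and margin of error that Graves mentions: a minimal Python sketch of the standard simple-random-sample calculation. The sample sizes are mine, for illustration only.]

```python
# Margin of error at 95% confidence for a proportion under simple random
# sampling: MOE = z * sqrt(p * (1 - p) / n), widest at p = 0.5.
from math import sqrt

Z_95 = 1.96  # z-score for a 95% confidence level

def moe(n, p=0.5):
    return Z_95 * sqrt(p * (1 - p) / n)

for n in (1_000, 3_000, 10_000):
    print(f"n = {n:>6}: +/- {moe(n) * 100:.1f} points")
# n =   1000: +/- 3.1 points
# n =   3000: +/- 1.8 points
# n =  10000: +/- 1.0 points
```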
308: What are IVR's
limitations, and what can be done to correct for them?
FG: Once again I want to stress the difference between HD-IVR, or any properly designed IVR system, and “raw” IVR. We get much better results with several important refinements which are often not applied in IVR polls. The biggest limitations of IVR are:
- Length: the survey must be quite short.
- Reputation: the use of IVR is associated with reputational issues, particularly in Canada, where there is less familiarity with properly applied IVR. Sloppy applications of IVR and the nefarious connection to some vote suppression activities have done nothing to help this problem.
- Stricter limitations on calling periods.
- Programming complexity to deal with multiple random versions to
eliminate response set biases and sequencing effects.
- More people are called, so there is a modest increase in the intrusiveness of the research.
- In order to get sufficient representation of younger respondents, we have to engage in call-backs and a judicious sample of cell-phone-only populations.
- Response rates are somewhat lower than those for a live interviewer, but with our techniques only modestly lower, and with less systematic patterns of non-response.
- Our experiments show that there is more random noise in IVR than with a live interviewer. This noise is easily detected with testing and can be purged.
308: What do you mean by
noise?
FG: By noise I mean responses which are not measuring the concept being tested. Noise is random, meaningless data. The analogy is drawn from psychoacoustics but applies to other areas such as this (I believe Nate Silver uses the term in the title of his last book). As an example of random noise, consider the difference between someone answering the questions thoughtfully and accurately (signal) and someone just randomly pushing numbers. We find that the incidence of people answering questions about fictitious events/products is higher in IVR than with a live interviewer. This applies to other unwanted survey behaviour as well: what we used to call anomalous response sets (yea- and nay-saying, and more recently speeding and straight-lining). With the noise detection questions we can identify and remove these sources of noise from the sample.
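
[To make the idea concrete, here is a minimal sketch of what noise screening of this kind might look like. The field names, thresholds, and trap question are hypothetical, not EKOS's actual protocol.]

```python
# Flag respondents whose answers are likely noise: endorsing a fictitious
# product, straight-lining a rating battery, or finishing implausibly fast.
# All field names and thresholds here are hypothetical.

def is_noise(resp):
    # Trap question: claimed awareness of a product that does not exist.
    if resp["knows_fictitious_product"]:
        return True
    # Straight-lining: identical answers across an entire rating battery.
    if len(set(resp["rating_battery"])) == 1:
        return True
    # Speeding: completing the survey faster than plausible.
    if resp["duration_seconds"] < 60:
        return True
    return False

respondents = [
    {"knows_fictitious_product": False, "rating_battery": [4, 2, 5, 3], "duration_seconds": 240},
    {"knows_fictitious_product": True,  "rating_battery": [3, 3, 4, 2], "duration_seconds": 200},
    {"knows_fictitious_product": False, "rating_battery": [5, 5, 5, 5], "duration_seconds": 45},
]

clean = [r for r in respondents if not is_noise(r)]
print(f"Kept {len(clean)} of {len(respondents)} respondents")  # Kept 1 of 3
```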
308: Generally speaking, how
does IVR polling compare to other methodologies in terms of costs and
effort?
FG: The front end programming and database management is more complex, but the obvious savings are in live interviewer time. Long distance charges are higher because of the greater number of calls. Our costs and efforts are perhaps half of what a live interviewer survey would cost and comparable to the costs of our probability panel offerings. Opt-in panels, where respondents volunteer for surveying and have never been spoken to by the survey organization, are cheaper still than HD-IVR.
308: Can you explain how your 'probability panel' is
different from 'opt-in' panels?
FG: Probability methods select each member of a population with an equal probability of selection (EPSEM in Kish’s terminology). This is a canon of good sampling and the foundation of the ability to apply the central limit theorem and the law of large numbers, the foundations of inferential statistics. Each member of the population has a known probability of appearing in the sample. In the case of opt-in or convenience sampling there are (at least) two fundamental problems. The sample is NOT randomly drawn from a frame of all individuals in the population; respondents are invited to join, or come from pre-existing lists covering some other portion of the population. They therefore opt in or volunteer (typically for material incentives), and their relationship to the broader population is unclear. Since the process is not random, inferential statistics are not possible (including calculation of margin of error). The problem is worsened by systematic coverage errors, where those who cannot or will not do surveys online will never appear in the sample.
Now, some say that as response rates decline, the process of random sampling no longer meets the requirements of statistical inference. The hard, third-party research suggests this is not true. While we have selection effects even from a random invitation, these are a much smaller problem than the same effects plus a non-random invitation. The top authorities remain unconvinced that one can achieve scientific accuracy and MOE with non-random samples (MOST online panels are non-random). Under rare and extremely stringent conditions this happens, but in most cases it is wrong. By the way, the response rates with HD-IVR are close to what we get with a live interviewer now. And objections from those using opt-in panels are hard to take seriously, as their response rates are incalculable and, if they could be calculated, would be the percentage of all those who saw the internet ad and didn’t join the panel (maybe 99.9% or higher?).
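
[A small simulation may help illustrate the coverage problem Graves describes above. The population size, the share who are "online", and the opinion rates are all invented for this sketch; it is not a model of any real panel.]

```python
import random

random.seed(1)

# Hypothetical population of 100,000 people. Being "online" (and hence
# reachable by an opt-in panel) is correlated with the opinion being
# measured; that correlation is what creates coverage bias.
population = []
for _ in range(100_000):
    online = random.random() < 0.6
    opinion = random.random() < (0.5 if online else 0.3)
    population.append((online, opinion))

true_share = sum(op for _, op in population) / len(population)

def estimate(frame, n):
    sample = random.sample(frame, n)
    return sum(op for _, op in sample) / n

# EPSEM: every member of the full population has the same chance of selection.
epsem_frame = population
# Opt-in: only the online (would-volunteer) subframe can ever be sampled.
opt_in_frame = [p for p in population if p[0]]

print(f"True share:     {true_share:.3f}")                    # ~0.42
print(f"EPSEM, n=2000:  {estimate(epsem_frame, 2000):.3f}")   # close to truth
print(f"Opt-in, n=2000: {estimate(opt_in_frame, 2000):.3f}")  # biased toward 0.50
```

No increase in the opt-in sample's size fixes the gap; the bias comes from the frame, not from sampling error.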
308: Because of the issues
related to the use of robocalls in political campaigns, whether legitimate or
not, and by telemarketers, there has been increased criticism of this
methodology recently. What kind of problem does this pose for polling firms
that use IVR?
FG: We have spoken to the CRTC on this issue, as well as the MRIA. We certainly would welcome limitations on the less savoury applications of robocalls, as this would lessen our problems with public suspicions. We use a very rigorous code of application that meets the CRTC requirements for automatic dialling devices. We would welcome clarification that would distinguish legitimate, research-based use of IVR from the much more common mass market uses. This distinction does apply to polling and market research in other areas, but it is unclear how it would apply in the context of IVR. We would welcome sound guidelines and a demarcation between legitimate survey research and other areas of use.
308: You have recently
discussed the challenges of building a representative sample of voters. But
what challenges do you face in building a representative sample of the
population, considering falling response rates and increased use of cell phones
over landlines?
FG: We only really know who the voters are after the vote, so this will remain a challenge. In the case of representative samples of known populations, careful sampling, call-backs and weighting can continue to produce scientific accuracy when based on random sampling, even with steeply declining response rates. Coverage errors for cell-only and offline respondents can also be solved, but these subpopulations are not included in much of the current work by others. Experimental testing can identify and calibrate deficiencies and patterns of selection even when using random selection. These patterns can be both demographic and psychographic, but they are correctable.
308: And where does the
challenge come in building a representative sample of voters?
FG: The challenge is not one of modelling a known population but of predicting a future event. We can never know this with certainty, and guesses based on the demographic characteristics of the past vote are very limited solutions. Some things that used to work (e.g. enthusiasm) no longer work, and demographic turnout can and will vary from election to election. Asking people how certain they are to vote is basically useless for predicting who will vote. Past voting patterns are of some assistance, as are questions about whether you know where your polling station is. But these are highly limited aids in those situations where more than half of the eligible population isn't voting and they are systematically different in their vote intentions from those who show up. Increasingly, political campaigns are all about getting out your vote and keeping home the opponents' vote. Mandatory voting would eliminate this problem, but I am not holding my breath on that one.
At the federal level, one should be able to forecast the outcome accurately with sufficient tracking and diagnostic tools. And we have correctly forecast all federal elections save the ‘miss’ on the 2011 election, which got the winner right but not the majority. In fairness, no one else predicted a majority that time, and our poll was within the MOE of all polls for that election. We have been working extensively to understand the issues of turnout (which is the key - NOT late switching or undecided movements, as some have claimed). We are very confident that we will call the next federal election outcome accurately, as we did in all previous attempts.
308: What role does weighting
play in producing accurate results?
FG: Weighting is very important, but it should be a fairly light touch with crystal clear guidelines. It should never be used to correct huge deficiencies (e.g. weighting younger respondents by several times). Our unweighted HD-IVR gives very similar results to our weighted version (age, gender, household size). One should definitely not root around in the weighting bin until things look okay. And pollsters should produce, or have available, both weighted and unweighted results. If weighted results look really different from the unweighted, then something is wrong with the sample.
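
[For readers unfamiliar with weighting, here is a minimal sketch of simple post-stratification by age group. The categories, counts, and population shares are hypothetical, and this is not EKOS's actual weighting scheme.]

```python
# Post-stratification: weight each cell so the sample's composition
# matches known population shares. All figures here are illustrative.
from collections import Counter

# Each respondent record carries an age group (hypothetical data).
respondents = ["18-34"] * 150 + ["35-54"] * 400 + ["55+"] * 450

# Known population shares, e.g. from the census (hypothetical values).
population_shares = {"18-34": 0.28, "35-54": 0.35, "55+": 0.37}

sample_counts = Counter(respondents)
n = len(respondents)

# Weight = population share / sample share for each cell.
weights = {
    group: population_shares[group] / (count / n)
    for group, count in sample_counts.items()
}

for group, w in sorted(weights.items()):
    print(f"{group}: weight {w:.2f}")
# 18-34: weight 1.87
# 35-54: weight 0.88
# 55+:   weight 0.82
```

If any cell's weight strays far from 1 (for instance, weighting younger respondents up by several times), that is the warning sign Graves describes: the problem is in the sample, not the weights.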
308: EKOS has been in the
business for a very long time. How has political polling changed over the
years?
FG: That is an essay in itself, but I would say that the methodological
challenges we have been discussing and the transformation of methodologies are
very important.
I think that the media-pollster relationship is in a state of disrepair. I think there are inadequate budgets, and I think the statistical fluency in the media, and possibly the public, has declined. The role of the aggregators is another new feature, something I find to be a mixed blessing (although I do think you give a really good effort here, Eric). I detest the conflation of polling accuracy with forecasting the next day's election. This yardstick comes from a time when most voted and those who didn't weren't particularly different. The correspondence between the election and final polls was a great way to check a pollster's accuracy. When half or more aren't voting, and those who don't vote have different political preferences, this becomes a lousy yardstick for “polling accuracy”. There is a continued need for forecasting, and this is a related skill, but forecasting and modelling the population should be seen as related but separate tasks.
308: If elections are no longer good ways to gauge a
pollster's accuracy, how else can the accuracy of a pollster's work be tested?
FG: Pollsters should conduct proof-of-concept testing with known external benchmarks to show that they can achieve representativeness. Important polls should at least occasionally include a basic inventory of benchmark indicators of representativeness, such as: Do you smoke? Do you own a valid Canadian passport? Do you rent or own your home, with or without a mortgage? What type of heating fuel do you use? And the unweighted raw data should look like the population on key demographic measures and these external benchmarks.
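
[A minimal sketch of what such a benchmark check might look like; every figure below is invented for illustration, and the tolerance is arbitrary.]

```python
# Compare unweighted sample proportions against known external benchmarks.
# All values are hypothetical placeholders, not real survey or census data.

benchmarks = {"smokes": 0.17, "owns_passport": 0.66, "owns_home": 0.68}

# Unweighted proportions observed in the raw sample (illustrative values).
sample = {"smokes": 0.19, "owns_passport": 0.63, "owns_home": 0.71}

TOLERANCE = 0.05  # an arbitrary threshold for this sketch

for indicator, target in benchmarks.items():
    gap = sample[indicator] - target
    status = "OK" if abs(gap) <= TOLERANCE else "CHECK SAMPLE"
    print(f"{indicator:>14}: sample {sample[indicator]:.2f} "
          f"vs benchmark {target:.2f} ({gap:+.2f}) {status}")
```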
308: If the media
does not have the budget for polling or dedicated poll analysts (and that will
not be changing), what are pollsters to do? Should they back away or do they
have a responsibility of some sort to put out numbers?
FG: They should probably limit their activities to those areas where they can put their best effort forward. The media should pay (as they do in the US), as this is an area which really does generate viewership and readership. The industry could consider a consortium of players to offer this up as an industry service during elections. Or perhaps we could look at alternative models such as Rasmussen, which successfully sells directly to consumers through subscriptions.
308: How has the business of
polling in general changed?
FG: The ‘business’ of polling has changed dramatically. We have discussed some of the methodological and technological transformations. Political polling really isn’t a ‘business’ for any of those doing it in Canada. Historically, we have probably been the largest supplier of polling services to the federal government. The federal polling budget has dropped from over $30M in 2006 to under $4M last year. This is a rather breathtaking elimination of what was non-partisan policy and communications work based on listening to Canadians. Interestingly, while “listening” to Canadians has all but disappeared, “persuading” Canadians has burgeoned. In 2006, the two had roughly similar expenditures. Today, there is probably 30 to 40 times as much spent federally on advertising as there is on polling. Fortunately, our investments in new survey technologies have strengthened our other markets and we are now experiencing growth and profits. While we no longer depend on federal markets, it is our hope that the federal government will return to listening to Canadians again.
308: In your polls, particularly
of the general population, EKOS has tended to have larger proportions of people
supporting the Greens or 'Others' than other firms. Why is that?
FG: Our polls (particularly between elections) are focussed on all eligible voters. We believe that our polls accurately measure the percentage of all eligible voters who support the Green Party on the day of the polls. If one doesn’t prompt for the Green Party, one will get lower incidences, as one would if you dropped any of the other party prompts. The simple fact is that many GPC supporters don’t bother voting. They are younger, and younger voters vote at half the rate of older voters. They also correctly note that under the first-past-the-post voting system they are unlikely to see any electoral results if they do vote, so this is a further de-motivating factor. In 2008, nearly 7 per cent of all voters voted for the GPC. If you don’t mention the GPC in your prompting, you may get a number closer to the election result (or you may well end up lower than that). But I don’t like to mix ad hoc adjustments for the fact that GPC supporters don’t vote as much into a measurement of all eligible voters. We carefully note that GPC support historically translates into fewer actual voters. Other pollsters have their own legitimate views on how this problem should be handled.
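
[The arithmetic behind this point can be shown with a toy example. The support shares and turnout propensities below are hypothetical, not EKOS's estimates.]

```python
# Hypothetical illustration: support measured among ALL eligible voters,
# discounted by an assumed turnout propensity for each group of supporters.
support = {"GPC": 0.10, "All others": 0.90}   # share of eligible voters
turnout = {"GPC": 0.30, "All others": 0.62}   # assumed likelihood of voting

votes = {party: support[party] * turnout[party] for party in support}
total = sum(votes.values())

for party in votes:
    print(f"{party}: {support[party]:.0%} of eligible voters -> "
          f"{votes[party] / total:.1%} of votes cast")
# GPC: 10% of eligible voters -> 5.1% of votes cast
```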
308: What changes, if any,
need to be made to ensure that IVR polling produces good results in the
future?
FG: Our HD-IVR has been refined to provide scientifically accurate models of all eligible voters. We have the experimental evidence to show that. If you separate out the question of how to make better forecasts of turnout, there is lots of work needed there, and we and others are focusing on this challenge. As Yogi Berra noted, ‘prediction is really hard, particularly when it’s about the future’.