May 22, 2018, "2018 Midterm Elections: Thinking about the elections in 2018, if the election for U.S. Congress were held today, would you vote for the Democratic candidate or the Republican candidate in your district where you live?" polling.reuters.com, Reuters weekly poll, week ending May 20, 2018, 1139 Registered voters.
Generic ballot: Republican 40.7, Democrat 34.5. Via online surveys.
About this poll
"This Reuters / Ipsos poll began in January 2012 and has since polled between 2,000 and 3,000 people a week continuously. Over that period, we have asked hundreds of questions ranging from presidential politics to the Oscars, from the Syrian civil war to the perception of social networks, such as Facebook and Twitter.
Unlike almost all mainstream polls, the data is entirely collected via online surveys. Online surveys allow us to collect far more data and to be more flexible and fast-moving than phone research, and online is also where the future of polling lies.
This methodology may be different from the ‘traditional’ (telephone) approach used by others, but it is highly accurate: It was the most accurate national poll of US residents published immediately before the November 2012 general election. [2016 election not mentioned here].
Our data is primarily drawn from online surveys using sampling methods developed in consultation with several outside experts. These involve recruiting respondents from the entire population of US-based Internet users in addition to hundreds of thousands of individuals pre-screened by Ipsos. The responses are then weighted based on demographic information.
Because of the volume of demographic information collected, the poll provides unprecedented insight into the myriad of communities that constitute the United States in the 21st century.
This window into the population allows users to look at the polling results over time and adjust the aggregated interval of results to maintain a reasonable sample size. Those intervals are a five-day rolling average as well as weekly, monthly and overall averages.
The accuracy is measured using Bayesian credibility intervals. For the stats folks among you: the credibility interval assumes that Y has a binomial distribution conditioned on the parameter θ, i.e., Y|θ ~ Bin(n, θ), where n is the size of our sample. In this setting, Y counts the number of “yes”, or “1”, responses observed in the sample, so that the sample mean (ȳ) is a natural estimate of the true population proportion θ. This model is often called the likelihood function, and it is a standard concept in both the Bayesian and the Classical framework.
In the Bayesian framework, one’s knowledge base is one’s Prior Distribution. For the purposes of this document, θ is a proportion which assumes values between 0 and 1. This may reflect the proportion supporting a particular voter initiative or candidate. The family of prior distributions we are considering assumes a beta distribution. In effect, π(θ) ~ β(a, b) is a useful representation of our prior knowledge about the proportion θ. The quantities a and b are called hyper-parameters, and are used to express/model one’s prior opinion about θ.
In other words, a judicious choice of a and b can restate one’s belief that the parameter is nearer to 25% (a=1 and b=3), near 50% (a=1 and b=1) or nearer to 75% (a=3 and b=1). The choices of a and b also define the shape of the probability curve, with a=1 and b=1 denoting a uniform or flat distribution. In effect, this is equivalent to believing that the true value of θ has the same chance of being any value between 0 and 1.
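As a small check on the three example priors above, the mean of a Beta(a, b) distribution is a / (a + b). The sketch below is illustrative only; the helper name `beta_mean` is an assumption and does not come from the poll's documentation.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution, which is a / (a + b)."""
    return a / (a + b)


# The three hyper-parameter choices mentioned in the text.
for a, b in [(1, 3), (1, 1), (3, 1)]:
    print(f"a={a}, b={b}: prior mean = {beta_mean(a, b):.2f}")
# prints prior means of 0.25, 0.50 and 0.75 respectively
```

Note that a=1, b=1 has a mean of 50% only because the flat prior is symmetric; it expresses no preference for values near 50%.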
The hyper-parameters a and b are not limited to a known constant. They too can be modeled as random quantities. This adds flexibility to the model, and it allows for data-based approaches to be considered, such as Empirical and Hierarchical Bayes.
The posterior distribution in Bayesian statistics takes the likelihood function and combines it with our prior distribution. Using our prior Beta distribution, the posterior distribution is also a beta distribution (π(θ|y) ~ β(y + a, n − y + b)). Its parameters are the hyper-parameters of the prior distribution, i.e., one’s knowledge base, updated using the latest survey information. In other words, the posterior distribution represents our opinion on which are the plausible values for θ adjusted after observing the sample data.
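The update described above amounts to adding the observed counts to the prior's hyper-parameters. Here is a minimal sketch, assuming a single yes/no question and ignoring the demographic weighting the poll applies; the function names and the sample counts (400 "yes" out of 1,000) are hypothetical.

```python
def beta_posterior(y: int, n: int, a: float = 1.0, b: float = 1.0) -> tuple[float, float]:
    """Return the parameters of the posterior Beta(y + a, n - y + b).

    y: number of "yes" responses, n: sample size, (a, b): prior hyper-parameters.
    """
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    return y + a, n - y + b


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical sample: 400 "yes" answers out of 1,000, with a flat Beta(1, 1) prior.
alpha, beta = beta_posterior(y=400, n=1000)
print(alpha, beta)             # 401.0 601.0
print(beta_mean(alpha, beta))  # close to the sample proportion 0.40
```

With a flat prior and a sample this large, the posterior mean sits very close to the raw sample proportion; a more informative prior would pull it further.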
Our credibility interval for θ is based on this posterior distribution. As mentioned above, these intervals represent our belief about which are the most plausible values for θ given our updated knowledge base.
There are different ways to calculate these intervals based on π(θ|y). One approach is to create an estimator analogous to what is done within the Classical framework. In this case, the credible interval for any observed sample is based on a prior distribution that does not include information from our knowledge base. This case occurs when we assume that the parameters of the beta distribution are a=1, b=1 and y=n/2.
Essentially, these choices provide a uniform prior distribution where the value of θ is equally likely on the range between 0 and 1. In effect, our knowledge base is equally sure or unsure that the true value is near zero, 25%, 50%, 75% or 100%. The confidence interval is usually calculated assuming a normal distribution.
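To make the a=1, b=1, y=n/2 case concrete, the sketch below computes the half-width of an approximate 95% credibility interval from the posterior Beta(n/2 + 1, n/2 + 1), using a normal approximation to that posterior. This is an illustration under stated assumptions, not the pollster's code: the function name is hypothetical, and weighting and design effects are ignored. The sample size 1,139 is the one quoted at the top of this post.

```python
from math import sqrt
from statistics import NormalDist


def flat_prior_half_width(n: int, level: float = 0.95) -> float:
    """Half-width of an approximate credibility interval for a=1, b=1, y=n/2.

    Uses the exact variance of the posterior Beta(n/2 + 1, n/2 + 1)
    together with a normal approximation for the quantile.
    """
    alpha = n / 2 + 1
    beta = n / 2 + 1
    total = alpha + beta
    variance = alpha * beta / (total ** 2 * (total + 1))  # Beta variance formula
    z = NormalDist().inv_cdf(0.5 + level / 2)             # about 1.96 for 95%
    return z * sqrt(variance)


half_width = flat_prior_half_width(1139)
print(f"+/- {100 * half_width:.1f} percentage points")  # roughly +/- 2.9
```

Because y=n/2 is the worst case for variance, this half-width acts as a single conservative figure for any proportion measured on the same sample.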
However, when we are measuring a proportion, and the estimate of the proportion is close to one or zero, this approach is no longer accurate. Therefore, we use a logit transform of the proportion and estimate its confidence interval, then invert to calculate the confidence interval of the proportion.
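The logit approach can be sketched as follows. The text does not say how the standard error on the logit scale is obtained, so the delta-method value 1 / sqrt(n p (1 − p)) used here is an assumption, as are the function names and the example numbers.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def normal_interval(p: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Plain normal-approximation interval; can fall outside [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def logit_interval(p: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Interval built on the logit scale, then mapped back to a proportion."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = log(p / (1 - p))              # logit transform
    se = 1 / sqrt(n * p * (1 - p))         # delta-method standard error (assumed)
    lo, hi = centre - z * se, centre + z * se
    inv = lambda x: 1 / (1 + exp(-x))      # inverse logit
    return inv(lo), inv(hi)


# Hypothetical example: 1% support in a sample of 100.
print(normal_interval(0.01, 100))  # lower bound is negative
print(logit_interval(0.01, 100))   # both bounds stay inside (0, 1)
```

The back-transformed interval is asymmetric around the estimate, which is the expected behaviour near the boundaries of 0 and 1.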
Please enjoy and share. If you have any questions, please contact me, Maurice Tamman, Reuters Editor in Charge, Data and Computational Journalism"