Understanding the 2020 Presidential Election Polls

Final Update: November 3, 2020

For Election Night

Safe States and Swing States: How to track the 2020 Presidential election November 2, 2020

Why is 2020 Different from 2016?

The simple answer is that there is a smaller percentage of undecided and third-party voters. The table below compares polling averages (in percent) from 2016 and 2020 in potential swing states.

	State	Biden (D) 2020	Trump (R) 2020	Undecided & 3rd Party 2020	Clinton (D) 2016	Trump (R) 2016	Undecided & 3rd Party 2016
0	North Carolina	48.8	46.5	4.7	45.5	46.5	8
1	New Hampshire	53.4	42.4	4.2	43.3	42.7	14
2	Texas	45.5	49.5	5	38	50	12
4	Florida	48.9	46.8	4.3	46.4	46.6	7
5	Arizona	49.5	46.3	4.2	42.3	46.3	11.4
6	Wisconsin	49.3	44.7	6	46.8	40.3	12.9
7	Ohio	46.2	46.8	7	42.3	45.8	11.9
8	Pennsylvania	49.5	44.6	5.9	46.2	44.3	9.5
12	Michigan	50.4	42.6	7	45.4	42	12.6
13	Iowa	47.2	46.4	6.4	41.3	44.3	14.4
14	Virginia	51.7	40.3	8	47.3	42.3	10.4
15	Nevada	49	43.8	7.2	45	45.8	9.2
16	Georgia	47.5	46.3	6.2	44.4	49.2	6.4

TL;DR Scenarios

States with a weighted polling average of 50% or higher

Candidate	Strong	Likely	Total
Biden	173	45	218
Trump	67	44	111

If ALL undecided voters in every state vote for Trump

Candidate	Strong	Likely	Total
Biden	181	37	218
Trump	295	25	320

Randomized distribution of undecided voters in every state

Candidate	Strong	Likely	Total
Biden	227	107	334
Trump	111	93	204

NEW
Undecided voters distributed by Partisan Voter Index

Candidate	Strong	Likely	Total
Biden	227	52	279
Trump	215	44	259

Update

I added a fourth scenario to demonstrate how undecided voters can change the outcome of the election. This one uses the Cook Political Report’s Partisan Voter Index to weight the distribution of undecided voters. Keep in mind that an electoral college tally of 270 or more is a win, so when a scenario has a winner, that total will be in bold.

Strong and Likely States

In the two graphics below, we have states that are considered strong or likely states for the two candidates. The probability is based on the weighted polling average for each state and on whether a candidate can win the state without the support of any undecided voters. Safe states without polling, as designated by the Cook Political Report, are assigned a probability of 1 for the favored candidate and 0 for the opponent.

Biden

Biden Solid States 173
Biden Likely States 45
Biden Solid + Likely States 218

Link to Interactive graphic
Biden Solid and Likely States

Trump

Trump Solid States 67
Trump Likely States 44
Trump Solid + Likely States 111

Link to Interactive graphic
Trump Solid and Likely States

States where undecided voters are most likely to determine the outcome

The table below can be considered the “path to victory” states as both campaigns will pursue several combinations of these states to pass 270 electoral college votes.

	State	ec	cook	PVI	Biden_avg	Trump_avg	undecideds
1	arizona	11	Lean Dem	R+5	47.4	46.8	5.8
2	colorado	9	Likely Dem	D+1	49	39.3	11.7
3	florida	29	Toss Up	R+2	48.1	46.7	5.2
4	georgia	16	Toss Up	R+5	47.1	48.3	4.5
5	iowa	6	Toss Up	R+3	45.8	47.5	6.7
6	michigan	16	Lean Dem	D+1	49.5	46.2	4.3
7	minnesota	10	Lean Dem	D+1	48.2	44.1	7.7
8	nevada	6	Lean Dem	D+1	48.1	46.1	5.8
9	north carolina	15	Toss Up	R+3	48	47.8	4.3
10	ohio	18	Toss Up	R+3	45.6	46.7	7.6
11	pennsylvania	20	Lean Dem	EVEN	48.7	47.2	4.1
12	texas	38	Toss Up	R+8	46.4	47.6	6

If all undecided voters are shy Trump supporters

The graphic below assumes all undecided voters will vote for Trump. This scenario is the “hidden Trump vote” or “shy Trump supporter” in polling.
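The arithmetic behind this scenario can be sketched in a few lines. This is only an illustration: it ignores the margin of error and the simulation step, the function name is hypothetical, and the Wisconsin row comes from the full polling table later in this post.

```python
def all_undecided_to_trump(biden_avg, trump_avg, undecided):
    """Return the leader if every undecided voter breaks for Trump."""
    return "Trump" if trump_avg + undecided > biden_avg else "Biden"

# (Biden average, Trump average, undecided) taken from the polling tables in this post
rows = {
    "arizona": (47.4, 46.8, 5.8),
    "michigan": (49.5, 46.2, 4.3),
    "wisconsin": (51.0, 44.0, 5.1),
}

for state, (biden, trump, undecided) in rows.items():
    print(state, all_undecided_to_trump(biden, trump, undecided))
```

A state only stays with Biden in this scenario when his average already exceeds Trump's average plus the entire undecided share.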

Trump Solid States 295
Trump Likely States 25
Trump Solid + Likely States 320

Biden Solid States 181
Biden Likely States 37
Biden Solid + Likely States 218

Link to Interactive graphic
Hidden Trump Vote Scenario

Random distribution of undecided voters

The graphic below randomly distributes undecided voters for each state across 20,000 simulations per state.
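The post does not spell out how the random split is drawn, so the following is a minimal sketch under one assumption: in each simulation, Biden's share of the undecided vote is uniform between 0 and 1 and Trump gets the rest. The function name and the seed are illustrative only.

```python
import random

def biden_win_rate(biden_avg, trump_avg, undecided, num_sims=20000, seed=0):
    """Fraction of simulations in which Biden finishes ahead of Trump."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    wins = 0
    for _ in range(num_sims):
        # Assumption: Biden's share of the undecided vote is uniform on [0, 1]
        share = rng.random()
        biden = biden_avg + undecided * share
        trump = trump_avg + undecided * (1 - share)
        if biden > trump:
            wins += 1
    return wins / num_sims

# Georgia's weighted averages from the table above
print(biden_win_rate(47.1, 48.3, 4.5))
```

Under this assumption, a candidate who trails narrowly still wins a meaningful share of simulations, which is why close states move between columns in this scenario.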

Trump Solid States 111
Trump Likely States 93
Trump Solid + Likely States 204

Biden Solid States 227
Biden Likely States 107
Biden Solid + Likely States 334

Link to Interactive graphic
Randomized undecided voters

Partisan Voter Index distribution of undecided voters

The graphic below uses the PVI to determine how to distribute undecided voters for each state in 20,000 simulations per state. This is the equivalent of a Polls Plus model.
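How the PVI translates into a split of undecided voters is not described here, so the sketch below rests on a loud assumption: a state rated R+5 sends 55% of its undecided voters to Trump and 45% to Biden, and an EVEN state splits them 50/50. It shows only the expected split, not the 20,000 simulations, and all function names are hypothetical.

```python
def parse_pvi(pvi):
    """Turn 'R+5', 'D+12', or 'EVEN' into (party, points)."""
    if pvi.upper() == "EVEN":
        return None, 0
    party, points = pvi.split("+")
    return party, int(points)

def biden_share_of_undecided(pvi):
    # Assumption (not from the post): each PVI point shifts one percentage
    # point of the undecided vote toward the favored party.
    party, points = parse_pvi(pvi)
    shift = points / 100
    if party == "D":
        return min(0.5 + shift, 1.0)
    if party == "R":
        return max(0.5 - shift, 0.0)
    return 0.5

def allocate(biden_avg, trump_avg, undecided, pvi):
    """Add each candidate's assumed share of undecided voters to their average."""
    share = biden_share_of_undecided(pvi)
    return biden_avg + undecided * share, trump_avg + undecided * (1 - share)

# Texas from the table above: R+8, Biden 46.4, Trump 47.6, undecided 6
print(allocate(46.4, 47.6, 6, "R+8"))
```

A different mapping from PVI to vote share would move the totals, so treat the numbers from this sketch as illustrative rather than as a reproduction of the graphic.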

Trump Solid States 215
Trump Likely States 44
Trump Solid + Likely States 259

Biden Solid States 227
Biden Likely States 52
Biden Solid + Likely States 279

Link to Interactive graphic
PVI undecided voters

Updated Weighted Polls, Cook Political Report Assessment, and Partisan Voter Index

	State	ec	cook	PVI	Biden_avg	Trump_avg	undecideds
0	alabama	9	Solid Rep	R+14	38.3	57.7	4
1	alaska	3	Likely Rep	R+9	41.4	46.2	12.4
2	arizona	11	Lean Dem	R+5	47.4	46.8	5.8
3	arkansas	6	Solid Rep	R+15	39.2	53.7	7.1
4	california	55	Solid Dem	D+12	61.7	30.5	7.8
5	colorado	9	Likely Dem	D+1	49	39.3	11.7
6	connecticut	7	Solid Dem	D+6	50.9	33.9	15.2
7	delaware	3	Solid Dem	D+6	54.8	34.3	10.9
8	florida	29	Toss Up	R+2	48.1	46.7	5.2
9	georgia	16	Toss Up	R+5	47.1	48.3	4.5
10	indiana	11	Likely Rep	R+9	39.5	52	8.5
11	iowa	6	Toss Up	R+3	45.8	47.5	6.7
12	kansas	6	Likely Rep	R+13	40.8	48.2	11
13	kentucky	8	Solid Rep	R+15	38.6	56.4	5
14	louisiana	8	Solid Rep	R+11	36.7	54.8	8.5
15	maine	4	Likely Dem	D+3	52.9	39.3	7.7
16	maryland	10	Solid Dem	D+12	59.7	31.5	8.8
17	massachusetts	11	Solid Dem	D+12	63	28.6	8.4
18	michigan	16	Lean Dem	D+1	49.5	46.2	4.3
19	minnesota	10	Lean Dem	D+1	48.2	44.1	7.7
20	mississippi	6	Solid Rep	R+9	41	56	3
21	missouri	10	Likely Rep	R+9	43.8	51.2	4.9
22	montana	3	Likely Rep	R+11	43.5	51.1	5.4
23	nevada	6	Lean Dem	D+1	48.1	46.1	5.8
24	new hampshire	4	Lean Dem	EVEN	52.2	42.8	5
25	new jersey	14	Solid Dem	D+7	57.3	37	5.7
26	new mexico	5	Solid Dem	D+3	53.8	40.8	5.4
27	new york	29	Solid Dem	D+11	59.2	31.8	9.1
28	north carolina	15	Toss Up	R+3	48	47.8	4.3
29	ohio	18	Toss Up	R+3	45.6	46.7	7.6
30	oregon	7	Solid Dem	D+5	51	39	10
31	pennsylvania	20	Lean Dem	EVEN	48.7	47.2	4.1
32	south carolina	9	Likely Rep	R+8	44	50.3	5.7
33	tennessee	11	Solid Rep	R+14	39	52.8	8.2
34	texas	38	Toss Up	R+8	46.4	47.6	6
35	utah	6	Likely Rep	R+20	38.4	48.6	13
36	virginia	13	Likely Dem	D+1	51.5	40.5	8
37	washington	12	Solid Dem	D+7	57.7	31.9	10.4
38	west virginia	5	Solid Rep	R+19	38.2	56.7	5.1
39	wisconsin	10	Lean Dem	EVEN	51	44	5.1
40	district of columbia	3	Solid Dem	D+43	nan	nan	nan
41	hawaii	4	Solid Dem	D+18	nan	nan	nan
42	idaho	4	Solid Rep	R+19	nan	nan	nan
43	illinois	20	Solid Dem	D+7	nan	nan	nan
44	nebraska	5	Solid Rep	R+14	nan	nan	nan
45	north dakota	3	Solid Rep	R+17	nan	nan	nan
46	oklahoma	7	Solid Rep	R+20	nan	nan	nan
47	rhode island	4	Solid Dem	D+10	nan	nan	nan
48	south dakota	3	Solid Rep	R+14	nan	nan	nan
49	vermont	3	Solid Dem	D+15	nan	nan	nan
50	wyoming	3	Solid Rep	R+25	nan	nan	nan

About this Project

Quite a lot has been written about 2016 election forecasts, polls, and the influence they may have had on voter behavior. A good friend and I were also caught up in the hype around forecast models and which one would be the most accurate. We all know how that turned out.

This project is based on the lessons learned from the 2016 and 2018 elections. The intent is to inform, not influence, the 2020 presidential election.

What’s different?

Plain and simple, there is no prediction of the winner. Instead, the intent is to show which states are close, which are not, and what effect undecided voters can have on the outcome. The goal of data journalism is to leave the reader with a better understanding of a complex subject, and that is precisely what we will try to do this time around.

Sources of Polling Data

State polling data will come from RealClearPolitics and not include their state averages for reasons covered in the approach section below. Not all states or districts may have polling available. In these cases, the Cook Political Report’s Electoral Scorecard will be used to complete the picture. The states or districts not having polling data until much later in the campaign are often safe for one candidate while battleground states have polling data updated more frequently.

The Approach

Weighted Averages

Not all polls are of equal quality, but a simple average would treat them that way. Instead, I weight the polls in two ways: by margin of error and by segment polled, specifically registered voters or likely voters.

A registered voter is someone who is simply registered to vote, while a likely voter is a registered voter who has indicated their intention to vote in the election. Polling firms handle this determination differently, but you can read more about the segments here. Since voter turnout varies from state to state, polls with likely voters are weighted 3 times higher than those with registered voters.

import numpy as np
segment_weight = np.where(segment == 'LV', 3, 1)  # likely-voter polls count 3x

Margin of error rises and falls with the population size, sample size, and confidence level of the survey. Larger samples have a smaller margin of error. Polls are ranked by margin of error, and that rank is the second weight used in the average.

candidate_weighted_average = sum(polls * MoE_rank * segment_weight) / sum(MoE_rank * segment_weight)
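A minimal sketch of how the two weights might combine, using made-up polls rather than real data. Two things are assumptions here: that the weights are multiplied together, and that rank 1 goes to the poll with the largest margin of error so the most precise poll carries the most weight. It is written in plain Python so it runs without NumPy.

```python
def weighted_average(polls):
    """polls: list of dicts with 'pct', 'moe', and 'segment' ('LV' or 'RV')."""
    # Assumption: rank 1 goes to the largest margin of error, so the most
    # precise poll gets the highest rank (and therefore the largest weight).
    order = sorted(range(len(polls)), key=lambda i: polls[i]["moe"], reverse=True)
    moe_rank = {i: rank for rank, i in enumerate(order, start=1)}
    weights = [
        moe_rank[i] * (3 if p["segment"] == "LV" else 1)
        for i, p in enumerate(polls)
    ]
    total = sum(w * p["pct"] for w, p in zip(weights, polls))
    return total / sum(weights)

# Hypothetical polls for one candidate in one state (not real data)
example = [
    {"pct": 48.0, "moe": 4.0, "segment": "RV"},  # rank 1, weight 1
    {"pct": 50.0, "moe": 3.0, "segment": "LV"},  # rank 2, weight 6
    {"pct": 49.0, "moe": 2.5, "segment": "LV"},  # rank 3, weight 9
]
print(round(weighted_average(example), 2))  # 49.31, versus a simple mean of 49.0
```

The likely-voter polls with tighter margins pull the average toward their results, which is the point of weighting in the first place.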

Safe States, Likely States, and Leaning States

We see these terms on a number of electoral college maps, but it may not be apparent as to why a state has such a label.

One of the major issues with the 2016 election was that strong third-party candidate performance led to a number of plurality leads, instances in which no candidate had a majority of 50% or greater. The 2020 election year has a traditional two-candidate race, meaning we will see majority victories. This is the foundation for the definitions of the labels below.

A Safe State is one where a candidate has a majority beyond the margin of error, and all undecided votes going to the opponent would not change the outcome.

A Likely State is one where a candidate has a majority within the margin of error, and all undecided votes going to the opponent would not change the outcome.

A Leaning State is one where a candidate has a majority within the margin of error, and all undecided votes going to the opponent could change the outcome.

A Toss Up is one where no candidate has a majority, and a combination of the margin of error and undecided votes divided between the candidates changes the outcome.

Margin of Error

We are assuming a 3% margin of error for the weighted averages. Why assume a 3% margin of error? The simple answer is that it comes down to sample size based upon population size.

Wyoming is the least populated state with around 600,000 residents. In 2016, there were 258,788 ballots cast for the presidential election. For a population of 250,000 likely voters, a sample size of 782 would have a margin of error of 3.5% and a sample size of 1,527 would have a margin of error of 2.5%.

A population of 100,000,000 has similar sample size requirements: 784 and 1,537 respectively. Individual state polls tend to have sample sizes near 1,000, and the average sample size across a state's polls is also near 1,000, hence the 3% margin of error for the weighted average of state polls.
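The sample sizes quoted above can be checked with the standard margin-of-error formula for a proportion. The sketch below assumes a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and a finite population correction; the lookup table itself is not reproduced here, but these assumptions give the same figures.

```python
import math

def margin_of_error(sample_size, population, z=1.96, p=0.5):
    """Margin of error (as a fraction) for a proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction

for n, population in [(782, 250_000), (1_527, 250_000),
                      (784, 100_000_000), (1_537, 100_000_000),
                      (1_000, 100_000_000)]:
    print(n, population, round(margin_of_error(n, population) * 100, 1))
```

Population size barely matters once it is much larger than the sample, which is why Wyoming and a population of 100,000,000 need nearly the same number of respondents.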

You can read more about sample sizes here. The lookup table for required sample sizes is one I have used since grad school.

Monte Carlo Simulation for Probabilities

I said up front that we are not forecasting a winner, and that is still the case. To make the state labels easier to understand, we use the weighted averages and the 3% margin of error to determine the probability that a candidate can win a state without any undecided voter support.

The margin of error gives us a range of what the actual number could be. For example, let’s say a candidate is polling at 50% with a 4% margin of error. That means the actual number would fall between 46% and 54%, as in the range below.

[Figure: range of poll]

When we do the same for the competitor, we can see who may be winning the race. A Safe State for a candidate would look like this:

[Figure: safe]

Unfortunately, there are a number of races where the margins of error overlap. Likely and Leaning States are our first potential area for misinterpretation.

[Figure: confusion]

This is why we introduce random error within the range. Using a random number generator for each candidate, we can get a simulation result that looks like the one below.

[Figure: single sim]

That is one random outcome. What we need to do is figure out how often we can see one candidate ahead of another.

To run a single Monte Carlo simulation, we input the candidates' respective weighted averages and the margin of error, and assume the errors are normally distributed.

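import random

def moe(x, y, n):
    # Treat the margin of error n as roughly two standard deviations,
    # so the full range (avg - n, avg + n) spans four standard deviations.
    min_x = x - n
    max_x = x + n
    std_x = (max_x - min_x) / 4
    min_y = y - n
    max_y = y + n
    std_y = (max_y - min_y) / 4
    # Draw one simulated result per candidate, rounded to one decimal place.
    return round(random.gauss(x, std_x), 1), round(random.gauss(y, std_y), 1)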

A single run may produce an outcome in which neither candidate has a majority, or in which the two shares do not sum to 100% of votes cast.

Undecided Voters

Whenever a round of a simulation does not total 100%, we capture the remainder as our undecided voters. This number matters because undecided voters determine the winner of close races.

def undecided(x, y, n):
    # Simulate both candidates, then treat whatever is left of 100% as undecided.
    cand1, cand2 = moe(x, y, n)
    u = round(100 - cand1 - cand2, 1)
    return cand1, cand2, u

Now we can run the simulation multiple times in order to generate a probability of how often a candidate can win a majority without any undecided voter support.

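import numpy as np

def sim(x, y, n, num_sims):
    c1 = []
    c2 = []
    u = []
    c1_wins = 0
    c2_wins = 0
    for i in range(num_sims):
        x1, y1, u1 = undecided(x, y, n)
        c1.append(x1)
        c2.append(y1)
        u.append(u1)
        # A win here means a simulated majority without any undecided support.
        if x1 > 50:
            c1_wins += 1
        if y1 > 50:
            c2_wins += 1
    # Win rate and average share for each candidate, then the average undecided share.
    return (c1_wins / num_sims, round(np.average(c1), 1),
            c2_wins / num_sims, round(np.average(c2), 1),
            round(np.average(u), 1))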

This gives us a much better perspective on the state of the election race as we can now identify which states are likely to be determined by undecided voters.
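As a cross-check on the simulated probabilities, the same quantity can be computed directly from the normal distribution, since each draw is centred on the weighted average with a standard deviation of half the margin of error. This is a minimal sketch with hypothetical function names; because the simulation code above rounds each draw to one decimal before comparing it with 50, its estimate can sit slightly below the analytic figure.

```python
import math
import random

def majority_probability(avg, moe):
    """Chance that a normal draw centred on avg exceeds 50, with sd = moe / 2."""
    sd = moe / 2  # same spread as (max - min) / 4 in moe()
    z = (50 - avg) / sd
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

def simulated_probability(avg, moe, num_sims=20000, seed=1):
    """Monte Carlo estimate of the same probability, without rounding the draws."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    hits = sum(rng.gauss(avg, moe / 2) > 50 for _ in range(num_sims))
    return hits / num_sims

# Biden's weighted average in Wisconsin from the table above, with a 3-point margin
print(round(majority_probability(51.0, 3), 3))
print(simulated_probability(51.0, 3))
```

The two numbers should agree to within a point or so at 20,000 simulations, which is a quick way to confirm the simulation is behaving as intended.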

Another challenge in interpreting the data remains: the label for Leaning states often gives the false impression that the state will be awarded to that candidate. We saw this play out in the 2016 election, when unaccounted-for undecided voters tipped several states and the electoral college.
