Exploring allegations of fraud in the 2020 U.S. Election via simulation

Dec 6, 2020

Background

Since the election, there have been numerous claims of electoral fraud from voters and politicians alike. These claims range from good-faith questions asked with an openness to other data perspectives, to arguments built on data that lack context or accuracy (often related to turnout), to bad-faith efforts to deceive.

In this document, I use simulation to explore the allegations of fraud in decisive swing states. Specifically, I resample the shifts in margins from the 2016 to the 2020 election, drawing them from various subsets of states that have not aroused suspicion in these allegations. The data used in this document come from The Cook Political Report, and I identified which states used Dominion software from Dominion's website.

Exploring the data

The data set contains information for each “state” (I’ll use “state” here to refer to any administrative unit that awards electoral votes, thereby including the 50 states as well as the District of Columbia and the separate congressional districts in Nebraska and Maine). In particular, the data set shows which party won the state, whether the state used Dominion software in tabulating results, whether the state’s vote totals are final (i.e., officially certified), accrued vote totals for Biden, Trump, and other candidates, the number of electoral votes, and the 2016 margin (computed as Democrat − Republican) and vote total.

From this, we can compute the margin in 2020 and the margin shifts from 2016.
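As a minimal sketch of this computation, the Python below assumes each margin is expressed in percentage points of all votes cast (Biden + Trump + other) and that the 2016 margin is already on that scale. The `StateResult` record and its field names are hypothetical stand-ins for the data set's columns, not its actual schema.

```python
from dataclasses import dataclass


@dataclass
class StateResult:
    """One row of the data set (hypothetical field names)."""
    name: str
    biden: int          # accrued 2020 votes for Biden
    trump: int          # accrued 2020 votes for Trump
    other: int          # accrued 2020 votes for all other candidates
    margin_2016: float  # 2016 margin, Democrat - Republican, in percentage points


def margin_2020(s: StateResult) -> float:
    """2020 margin (Democrat - Republican) as a percentage of all votes cast."""
    total = s.biden + s.trump + s.other
    return 100.0 * (s.biden - s.trump) / total


def margin_shift(s: StateResult) -> float:
    """Change in margin from 2016 to 2020; positive means a shift toward the Democrat."""
    return margin_2020(s) - s.margin_2016


# Toy numbers, not real results: a 52-47 race after a 2016 Republican margin of 2 points.
example = StateResult("Example", biden=520, trump=470, other=10, margin_2016=-2.0)
print(margin_2020(example))   # 5.0
print(margin_shift(example))  # 7.0
```

Keeping the sign convention fixed (positive favors the Democrat) lets 2016 margins and 2020 shifts be added directly in the steps that follow.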

The states which have certified their results present a broad, nationwide perspective consistent with a 1–5 point swing toward Biden relative to Clinton’s performance in 2016:

Importantly, this shift is notable in states Trump won as well as in states that did not use Dominion software:

While data-based allegations focus almost exclusively on key swing states Biden won or on states that used Dominion software, this exploratory analysis gives no reason to single those states out: the shift toward Biden is broad and nationwide, appearing both in states Trump won and in states that did not use Dominion software.

Simulating the election via resampling

To explore these concepts further, I will undertake an experiment in which Trump is awarded every state he won in 2020, comprising 232 electoral votes, and Biden starts with 0 electoral votes. Because the allegation is that margins in states Biden won are not to be trusted, I pose an alternative universe in which Biden starts from the 2016 margins in these states, and these states see margin shifts randomly sampled from either (1) the margin shifts in states Trump won in 2020 or (2) the margin shifts in states that did not use Dominion software. For example, we might discard the 3.0% Democratic shift in Michigan in 2020 and replace it with the margin shift in another state. If we randomly sample Wyoming’s margin shift of 2.9%, Biden still wins Michigan, but by 2.7% instead of 2.8%.
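The replacement step for a single state can be sketched as follows, under the assumption (not stated in the article) that a shift is drawn uniformly at random, with replacement, from the chosen pool; `resampled_margin` is a hypothetical helper name. The usage line reproduces the Michigan example, taking Michigan's 2016 margin as −0.2 points, which is what the article's 2.8% and 3.0% figures imply.

```python
import random


def resampled_margin(margin_2016: float, shift_pool: list[float],
                     rng: random.Random) -> float:
    """Discard a state's actual 2020 shift and apply one drawn uniformly at
    random (with replacement) from shift_pool to its 2016 margin.
    All values are in percentage points; positive favors the Democrat."""
    return margin_2016 + rng.choice(shift_pool)


rng = random.Random(2020)
# Michigan: 2016 margin of about -0.2, pool holding only Wyoming's +2.9 shift.
mi_2020 = resampled_margin(-0.2, [2.9], rng)
print(round(mi_2020, 1))  # 2.7 -> Biden still wins Michigan
```

Rounding is only for display; floating-point addition of -0.2 and 2.9 lands a hair below 2.7.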

The settings of this experiment are clearly biased toward Trump. There is no risk of Trump losing any state he won, while Biden must defend every state Clinton won in 2016 and also try to flip the states Trump won in 2016 but lost in 2020. In effect, this experiment sets the floor for Trump’s electoral vote total at 232 and the ceiling for Biden’s at 306.

With this setup, Biden automatically wins some states because Clinton’s winning margin there in 2016 was larger than the largest margin shift toward Trump in 2020 among certified states (2.40% in Utah), so even that shift cannot erase her lead. Consequently, Biden always wins at least 218 electoral votes in this setting.
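The automatic-win rule amounts to checking each state's 2016 margin against the most pro-Trump shift in the pool. A minimal sketch, with `is_safe` as a hypothetical helper and Republican shifts written as negative numbers: the pools below are deliberately partial and contain only shifts quoted in this article (Utah's R +2.40% and Wyoming's D +2.9%, plus Hawaii's R +2.72% from the later non-Dominion experiment).

```python
def is_safe(margin_2016: float, shift_pool: list[float]) -> bool:
    """True if the 2016 Democratic margin survives every shift in the pool,
    i.e. even the most pro-Trump (most negative) shift leaves it positive."""
    return margin_2016 + min(shift_pool) > 0


NEVADA_2016 = 2.42  # Clinton's 2016 margin in Nevada, in points

# Partial, illustrative pools built only from values quoted in the article.
pool_a = [-2.40, 2.9]  # Utah and Wyoming, two of the states Trump won
pool_b = [-2.72]       # Hawaii alone, the largest shift toward Trump

print(is_safe(NEVADA_2016, pool_a))  # True
print(is_safe(NEVADA_2016, pool_b))  # False
```

Nevada clears Utah's shift by only 0.02 points, which is why it is safe in the first experiment but becomes vulnerable once Hawaii's shift enters the pool.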

Ultimately, this leaves the following states called for Biden in 2020 in play in the experiment:

Resampling using margin shifts in states Trump won in 2020

First, I will explore simulated outcomes of the election in which we resample margin shifts from the states Trump won in 2020.

To clarify, these are the states up for grabs in the simulation:

These are the states from which we will resample the margin shifts and assign to the open states:

Notably, Idaho and Missouri are not included in this subset, as they had not yet certified their votes at the time of writing this document. According to the Cook Political Report, the current margin shifts in these states are D +0.97% and D +3.27%, respectively. Nonetheless, these states are still given to Trump in this simulation.

I’ve run 1 million simulations in which we reassign margin shifts to the states Biden won from the certified states that Trump won. From these results, we can extract the relative proportions of times that Biden wins each of the open states:
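The resampling loop might look like the sketch below. It is an illustrative reconstruction, not the author's code, and it assumes that every Biden-won state draws its shift independently and with replacement from the pool, that a margin of exactly zero counts as a loss for Biden, and that the two-state input is made-up toy data. It returns the share of simulations in which Biden carries each state along with his electoral-vote total in every simulation.

```python
import random
from collections import Counter


def simulate(biden_states: dict[str, tuple[float, int]],
             shift_pool: list[float],
             n_sims: int,
             seed: int = 0) -> tuple[dict[str, float], list[int]]:
    """biden_states maps each state Biden won in 2020 to
    (2016 Democratic margin in points, electoral votes).
    Trump keeps his own states, so Biden starts every simulation at 0 EV."""
    rng = random.Random(seed)
    wins: Counter = Counter()
    ev_totals: list[int] = []
    for _ in range(n_sims):
        ev = 0
        for name, (margin_2016, votes) in biden_states.items():
            # Replace the real 2020 shift with one drawn from the pool.
            if margin_2016 + rng.choice(shift_pool) > 0:
                wins[name] += 1
                ev += votes
        ev_totals.append(ev)
    win_share = {name: wins[name] / n_sims for name in biden_states}
    return win_share, ev_totals


# Toy inputs (hypothetical states and shifts, not the article's data).
toy_states = {"Safe State": (5.0, 10), "Tossup State": (-1.0, 6)}
toy_pool = [-2.0, 0.5, 3.0]
share, evs = simulate(toy_states, toy_pool, n_sims=30_000)
print(share)  # Safe State always 1.0; Tossup State close to 1/3
```

Drawing each state independently treats the pool as a set of interchangeable outcomes; a variant that draws one correlated national shift per simulation would be a different modeling choice.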

In this experiment, Trump tends to win Georgia and Arizona, with Biden winning them only about 11.5% and 26.9% of the time, respectively. However, Biden tends to win the other open states, including each of the key “blue wall” states of Michigan, Pennsylvania, and Wisconsin about 84.7% of the time.

Here is the distribution of the simulated electoral votes Biden received:

The distribution ranges from a minimum of 218 electoral votes for Biden to a maximum of 306, with the most common outcome being 279 electoral votes (the 306-vote ceiling minus Georgia’s 16 and Arizona’s 11), which occurred in about 23.7% of simulations. Overall, Biden ended up with at least 270 electoral votes in about 67.8% of simulations, even in this setting where we negate all of Biden’s state wins and replace the margin shifts in those states with resampled margin shifts from states Trump won in 2020.
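Summary figures of this kind (minimum, maximum, most common total and its share, and the share reaching 270) can be read directly off the list of simulated totals. A minimal sketch, where `summarize` is a hypothetical helper and the four totals in the usage line are made up for illustration:

```python
from collections import Counter


def summarize(ev_totals: list[int], threshold: int = 270) -> dict[str, float]:
    """Describe the distribution of Biden's simulated electoral-vote totals."""
    n = len(ev_totals)
    mode, mode_count = Counter(ev_totals).most_common(1)[0]
    return {
        "min": min(ev_totals),
        "max": max(ev_totals),
        "mode": mode,
        "mode_share": mode_count / n,
        # Share of simulations in which Biden reaches the winning threshold.
        "share_at_least_threshold": sum(ev >= threshold for ev in ev_totals) / n,
    }


print(summarize([218, 279, 279, 306]))
# {'min': 218, 'max': 306, 'mode': 279, 'mode_share': 0.5, 'share_at_least_threshold': 0.75}
```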

Resampling using margin shifts in states that did not use Dominion software

Here, I repeat the simulation exercise, but resample margin shifts only from states that did not use Dominion software in their tabulation.

To clarify, these are the certified states that did not use Dominion software from which we will resample the margin shifts:

Notably, Hawaii and Idaho, which did not use Dominion software, are not included in this subset, as they had not yet certified their votes at the time of writing. According to the Cook Political Report, the current margin shifts in these states are R +2.72% and D +0.97%, respectively. Because Hawaii has thus far seen the largest shift toward Trump, I include its current margin shift in this simulation anyway, even though its votes have yet to be certified. Doing so adds Nevada to the list of vulnerable states: Clinton won it by just 2.42% in 2016, less than Hawaii’s 2.72% shift toward Trump.
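Building this second pool is a filter plus one exception. The sketch below assumes the data set records, for each state, whether it is certified and whether it used Dominion software; the record layout, `build_pool`, and "State X"/"State Y" are hypothetical, while Hawaii's and Idaho's shifts are the uncertified values quoted above.

```python
def build_pool(states: list[dict],
               also_include: frozenset[str] = frozenset()) -> list[float]:
    """Margin shifts from certified states that did not use Dominion software,
    plus any uncertified non-Dominion states explicitly named in also_include."""
    return [
        s["shift"]
        for s in states
        if not s["dominion"] and (s["certified"] or s["name"] in also_include)
    ]


states = [
    {"name": "State X", "certified": True,  "dominion": False, "shift": 1.5},
    {"name": "State Y", "certified": True,  "dominion": True,  "shift": 4.0},
    {"name": "Hawaii",  "certified": False, "dominion": False, "shift": -2.72},
    {"name": "Idaho",   "certified": False, "dominion": False, "shift": 0.97},
]
print(build_pool(states, also_include=frozenset({"Hawaii"})))  # [1.5, -2.72]
```

Idaho stays out because it is uncertified and not named as an exception, mirroring the choice described above.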

I’ve run 1 million simulations in which we reassign margin shifts to the states Biden won from the certified states that did not use Dominion software plus Hawaii. The relative proportions of times that Biden wins each of the open states are:

Compared to the previous experiment, Biden now wins Georgia and Arizona about 33.4% and 48.1% of the time, respectively, and wins each of Michigan, Pennsylvania, and Wisconsin about 88.9% of the time.

Here is the distribution of the simulated electoral votes Biden received under this simulation design:

The distribution ranges from a minimum of 217 electoral votes for Biden to a maximum of 306, with the most common outcome again being 279 electoral votes, which occurred in about 18.8% of simulations.

In this setting where we negate all of Biden’s state wins and replace the margin shifts in those states with resampled margin shifts from certified states that did not use Dominion software plus Hawaii, Biden ends up winning the election about 85.7% of the time.

Other allegations

It is worth noting that suspicion of the results is not unique to Republicans. Some Democrats have claimed that certain Senate races may have been illegitimate, particularly in South Carolina, after Lindsey Graham called Georgia’s Republican Secretary of State to ask about the legitimacy of some ballots. According to this theory, Graham may have been involved in efforts to discard large numbers of (largely Democratic) votes to secure his re-election. The problem with this argument is twofold:

South Carolina’s 19.5% increase in total votes for president compared to 2016 is above the national average; and
there were around 2,000 more votes in the Senate race than in the presidential race in South Carolina.

In the other close Senate races, the winning party matched the party that carried the state in the presidential race, with the exception of Maine, where incumbent Republican Susan Collins was re-elected even though Biden won the state overall. This result, however, is easily understood given Maine’s history of independent politicians and the fact that, over the previous four years, Collins voted with Trump less often than any other Republican senator.

Discussion

In both simulation settings for the presidential race, I have taken the fairly extreme stance that every state Biden won in 2020 saw a suspicious margin shift, which was then replaced by a shift resampled from a subset of states not raised in allegations of wide-scale electoral fraud. Despite this disadvantage, Biden won at least 270 electoral votes in a majority of simulations, and especially often when resampling from states that did not use Dominion software. This exercise illustrates that suspicions of electoral fraud in key states Biden won are not borne out by the nationwide pattern of margin shifts from 2016.

Usually, careful exploration of data and historical patterns provides key context that debunks claims of illegitimate elections. Other common data-based allegations of fraud involve topics such as Benford’s Law, turnout rates, and voting patterns within cities, and all have been met with rational statistical explanations, such as this YouTube video exploring claims of fraud based on purported violations of Benford’s Law.

Faith in the electoral system is key to its continuing success. Mistrust of election results exists on all sides of politics, which can lead to hopeful “mining” of data to confirm these misgivings. Given the high stakes of electoral outcomes, honest data-based questions deserve careful consideration, and I think statisticians and data scientists have an important role in providing an open but informed data-based perspective. The 2020 U.S. election was unprecedented in a number of ways, from the turnout (a modern record) to the manner in which ballots were cast and counted. Consequently, statisticians and data scientists need to be able to listen to data-based arguments and, where appropriate, provide counterarguments or context to address the claims.

However, it is also incumbent on those making allegations to be willing to listen to appropriate counterarguments rather than dismiss them under the guise of “lies, damn lies, and statistics”. When encountering claims of electoral malfeasance, it is important to consider whether the data used to make such claims are complete and presented in context, particularly in an age of increasing partisanship and proliferating misinformation.