Doctor, validate thyself!

I recently wrote about my long-term fascination with American electoral geography, the way voting patterns are distributed across states, Congressional districts, counties and other areal units.

Pursuing this interest as an undergraduate political science major, I began to explore state-level presidential voting data. During my junior year, I created a large chart that ranked how states had voted in a series of recent presidential elections, from most to least Democratic, concluding with the 1984 presidential election (then the most recent one).

And I noticed that while Ronald Reagan, the incumbent Republican president, had absolutely walloped Democrat Walter Mondale in 1984, winning the popular vote by 18.2 percentage points (58.8-40.6%) and the Electoral College vote 525-13 (Mondale won only his home state of Minnesota [49.7-49.5%] and the District of Columbia [DC]), there were a few states Mondale lost by a much smaller margin than 18.2 percentage points: Massachusetts (-2.8 percentage points), Rhode Island (-3.6), Maryland (-5.5), Iowa (-7.4), Pennsylvania (-7.4), New York (-8.0) and Wisconsin (-9.2).

As usual, all presidential data are from Dave Leip’s indispensable Atlas of U.S. Presidential Elections.

Consider Pennsylvania, the state in which I was born. While the nation was voting for Reagan by 18.2 percentage points, Pennsylvania was voting for Reagan by “only” 7.4 percentage points (53.3-46.0%), a difference of 10.8 percentage points.

That is, Pennsylvania in 1984 was 10.8 percentage points MORE Democratic than the nation as a whole. Had Mondale lost by “only” 10 percentage points, he would (theoretically) have won Pennsylvania’s 25 electoral votes (EV), as well as those of New York (36), Iowa (8), Maryland (10), Rhode Island (4) and Massachusetts (13), for an additional 96 EV; New York, at 10.2 percentage points more Democratic than the nation, would have tipped by a whisker.

And had Mondale lost by “only” 7.7 percentage points, as Democrat Michael Dukakis would to Republican George H. W. Bush in 1988, he would also have theoretically won the combined 17 EV of Wisconsin (11) and West Virginia (6), boosting his total to 126 EV (better, but still 144 EV shy of the 270 needed to win the White House).
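The uniform-swing arithmetic behind these hypotheticals can be sketched in a few lines of Python. This is only an illustration: the function names are my own, the margins and EV counts are the 1984 figures quoted above, and West Virginia is left out because its 1984 margin is not quoted in this post.

```python
# 1984 margins (D% - R%) and electoral votes, as quoted in the text.
NATIONAL_1984 = -18.2
STATES_1984 = {
    "Massachusetts": (-2.8, 13),
    "Rhode Island": (-3.6, 4),
    "Maryland": (-5.5, 10),
    "Iowa": (-7.4, 8),
    "Pennsylvania": (-7.4, 25),
    "New York": (-8.0, 36),
    "Wisconsin": (-9.2, 11),
}


def relative_margin(state_margin, national_margin):
    """How much more Democratic (positive) a state was than the nation."""
    return round(state_margin - national_margin, 1)


def states_won(hypothetical_national, states=STATES_1984,
               actual_national=NATIONAL_1984):
    """Return {state: EV} for states the Democrat would carry if every
    state shifted by the same amount as the nation (a uniform swing)."""
    won = {}
    for name, (margin, ev) in states.items():
        projected = relative_margin(margin, actual_national) + hypothetical_national
        if projected > 0:
            won[name] = ev
    return won


print(relative_margin(-7.4, NATIONAL_1984))  # Pennsylvania vs. the nation
gained = states_won(-7.7)                    # a Dukakis-sized national loss
print(sorted(gained), sum(gained.values()))
```

With a national loss of 7.7 points, all seven listed states flip, for 107 EV; adding West Virginia's 6 EV and the 13 EV Mondale actually won gives the 126 EV cited above.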

Still, that is close to the 112 EV Dukakis won in 1988.[1] As the purple-inked states on this beautiful hand-drawn map[2] show, Dukakis lost seven states (Illinois, Pennsylvania, Maryland, California, Vermont, Missouri, New Mexico) totaling 125 EV by smaller margins (2.1-5.0 percentage points; mean=3.3) than he did nationally. Had Dukakis lost the election by just 2.7 points, he would theoretically have won 237 EV, only 33 shy of the necessary 270.

1988 Presidential map

The conclusion I drew (no pun intended) was that the “relative partisan margin” of a state—how much more or less Democratic it was than the nation as a whole in a given election—was a useful way to think about electoral geography. Of course, other elections in the state (governor, United States Senate, United States House) are of interest as well, as Paul T. David observed in his Party Strength in the United States, 1872-1970; at one point, I even examined the partisan composition of state legislatures.

Good times.

Two decades later, despite having walked away from a doctoral program in political science, I was still interested in these questions, and I began to collect state-level presidential data again.

My primary goal was to get a sense of how EVs would be distributed between the parties in the next presidential election (either 2008 or 2012) given a series of hypothetical national popular votes (e.g., Democrat wins nationally by 3 percentage points), essentially updating the exercises with 1984 and 1988 presidential election data I summarized earlier. I was particularly interested in whether the Democratic or the Republican presidential nominee would win more EV if the national vote were divided evenly between the two major parties.

Having gathered these data, I set about constructing a measure of the relative partisanship of a state, intending to combine data from multiple elections to smooth out any idiosyncratic results.

For example, Democratic presidential nominees won Michigan by an average of 7.4 percentage points from 1992 through 2004, making the state an average 4.3 percentage points more Democratic than the nation. Democrat Barack Obama then won the Wolverine State by 16.4 percentage points in 2008 (9.2 percentage points better than he did nationally). In 2012 and 2016, meanwhile, the average margin in Michigan (with Republican Donald Trump winning by 0.2 percentage points in 2016) dropped to just 4.6 percentage points (only 1.6 percentage points more Democratic than the nation). A reasonable explanation (though not a conclusive one) for the Democratic spike in 2008 is the disproportionate impact of the 2007-08 recession on the automobile industry in Michigan, as voters took out their frustrations with term-limited President George W. Bush on 2008 Republican presidential nominee John McCain.

The questions then became:

How many years do I use?
How, if at all, do I “weight” these elections?

My initial instinct was to use five years of data, with a weighting scheme of 1-2-3-4-5, meaning the least recent of the relative Democratic margins (D%-R% of total state vote minus D%-R% of total national vote) would be weighted 1/15 while the most recent one would be weighted 5/15, or 1/3.

This became my first “weighted relative Democratic margin” (W-RDM).

However, as I was also interested in assessing changes in relative state-level partisanship over time, using five elections meant that, prior to 2016, I only had four W-RDM values for a state—giving me only three election-to-election changes in W-RDM to examine[3].

I finally settled on three years in what I call my 3W-RDM [4] in order to minimize the influence of home-state effects: presidential and vice-presidential nominees tend to fare better, relative to their overall performance, in their home states. It is rare for one person to be on at least three consecutive presidential tickets (only two, George H. W. Bush, 1984-1992, and Al Gore, 1992-2000, of 21 total unique presidential and vice-presidential nominees, 1984-2012).

And that is the measure I have utilized in a series of posts (here, here, here; I do not specifically use 3W-RDM here, but the logic is the same).

As an example, here is how Nevada voted for president in 2004, 2008 and 2012:

Year	State D% – R%	National D% – R%	RDM
2004	-2.4	-2.5	D+0.1
2008	12.5	7.3	D+5.2
2012	6.7	3.9	D+2.8

The weighted average of the RDM values is (0.1 + 2*5.2 + 3*2.8)/6 = D+3.2. This was Nevada’s 3W-RDM prior to the 2016 election, so one would have expected that year’s Democratic nominee to do 3.2 percentage points better in Nevada than nationwide.

The 2016 Democratic presidential nominee, Hillary Clinton, won the national popular vote by 2.1 percentage points. So, my best estimate (based upon Nevada’s recent voting history) was that Clinton would win Nevada by 5.3 percentage points (2.1+3.2). This estimate was too optimistic, however, as she won Nevada by 2.4 percentage points, 2.9 percentage points lower than expected.
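For readers who want to reproduce the Nevada arithmetic, here is a minimal Python sketch of the weighted average and the resulting projection. The function name and the default 1-2-3-... weighting argument are my own shorthand, not a published implementation.

```python
def weighted_rdm(rdms, weights=None):
    """Weighted average of relative Democratic margins, oldest first.

    By default the weights are 1, 2, 3, ... so the most recent election
    counts most (1-2-3 for three elections, 1-2-3-4-5 for five).
    """
    if weights is None:
        weights = range(1, len(rdms) + 1)
    weights = list(weights)
    if len(weights) != len(rdms):
        raise ValueError("need exactly one weight per election")
    return sum(w * r for w, r in zip(weights, rdms)) / sum(weights)


# Nevada (state margin, national margin) for 2004, 2008 and 2012.
nevada_margins = [(-2.4, -2.5), (12.5, 7.3), (6.7, 3.9)]
nevada_rdms = [state - nation for state, nation in nevada_margins]

lean = weighted_rdm(nevada_rdms)
projected_2016 = lean + 2.1  # 2.1 = Clinton's national margin in 2016
print(f"3W-RDM: D+{lean:.2f}; projected 2016 margin: D+{projected_2016:.2f}")
```

Working from the rounded inputs in the table, the weighted average comes out at about 3.15 (reported above as D+3.2) and the projection at about 5.25, so a tenth of a point of rounding noise in either direction is to be expected.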

**********

Just bear with me while I briefly describe two other highly reputable approaches to calculating the relative partisan margin of a state (or other areal unit).

The Cook Political Report, the “independent, non-partisan newsletter that analyzes elections and campaigns for the US House of Representatives, US Senate, Governors and President as well as American political trends” has been essential reading for any serious student of American politics since its founding in 1984 by Charlie Cook, formerly “a staffer on Capitol Hill, a campaign consultant, a pollster, and a staff member for a political action committee.”

In 1997, Cook began to calculate the Partisan Voting Index (PVI) as a way to measure “how each [state or Congressional] district performs at the presidential level compared to the nation as a whole.”

The Cook PVI is simply the difference (state minus nation) between two averages:

The average Democratic share of the state-level two-party vote in the previous two presidential elections
The average Democratic share of the national two-party vote in the previous two presidential elections.

In 2008, Obama and McCain won 52.9% and 45.6% of the national popular vote, respectively, splitting 98.5% of the total vote. Looking only at this two-party vote, Obama received 52.9/98.5 = 53.7% and McCain received 45.6/98.5 = 46.3%, meaning Obama beat McCain nationally by 7.4 percentage points in the two-party vote.

A similar calculation for 2012 (Obama 51.0%, Republican Mitt Romney 47.1%) shows that Obama beat Romney nationally in the two-party vote by 3.9 percentage points.

The average of 7.4 and 3.9 is 5.7.

In Nevada, meanwhile, overall Obama beat McCain 55.1-42.6%, and he beat Romney 52.4-45.7%; in the two-party vote, Obama won by margins of 12.8 (56.4-43.6%) and 6.8 (53.4-46.6%) percentage points.

The average of 12.8 and 6.8 is 9.8.

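Subtracting 5.7 from 9.8 gives you 4.1, meaning that the PVI for Nevada going into 2016 was D+4.1, only a little more Democratic than the 3W-RDM (D+3.2) suggested. Note that this calculation uses two-party margins (D% – R%) rather than the Democratic shares described above; since each point of two-party share is worth two points of margin, the share-based figure would be half as large (about D+2).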
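The two-party calculation walked through above can be sketched as follows. This is a hedged illustration of the margin-based arithmetic used in this post, not Cook's own code; the function names are hypothetical, and the inputs are the rounded percentages quoted in the text.

```python
def two_party_margin(dem_pct, rep_pct):
    """Democratic minus Republican share of the two-party vote, in points."""
    return 100.0 * (dem_pct - rep_pct) / (dem_pct + rep_pct)


def margin_pvi(state_results, national_results):
    """Average two-party state margin minus average two-party national margin.

    Each argument is a list of (dem_pct, rep_pct) pairs, one per election.
    """
    def average_margin(results):
        return sum(two_party_margin(d, r) for d, r in results) / len(results)

    return average_margin(state_results) - average_margin(national_results)


nevada = [(55.1, 42.6), (52.4, 45.7)]  # 2008, 2012
nation = [(52.9, 45.6), (51.0, 47.1)]  # 2008, 2012
print(f"Nevada PVI (margin scale): D+{margin_pvi(nevada, nation):.1f}")
```

Because the inputs here are already rounded to one decimal place, an intermediate value can land a tenth of a point away from the figure in the text (the 2012 national two-party margin, for instance), but the final Nevada value still rounds to D+4.1.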

The other approach is the “partisan lean” calculated by the data journalism website fivethirtyeight.com, a favorite of this blog.

It is even more straightforward than Cook PVI:

(RDM in 2nd-most recent presidential election + 3*RDM in most recent presidential election)/4

Using Nevada again, we have already seen that in 2008 and 2012, Nevada voted 5.2 and 2.8 percentage points more Democratic than the nation; the 538 partisan lean (PL) formula gives you (5.2 +3*2.8)/4 = (5.2+8.4)/4=13.6/4=3.4.

Thus, Nevada’s 538 PL going into 2016 was D+3.4, broadly similar to the Cook PVI of D+4.1 and the 3W-RDM of D+3.2, and the projected Nevada vote based on the 538 PL was D+5.5.
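The same Nevada numbers can be run through the 1-to-3 weighting in a short Python sketch. The function names are illustrative only, and the formula is the one given in this post rather than code published by 538.

```python
def partisan_lean(rdm_previous, rdm_latest):
    """One part second-most-recent RDM, three parts most recent RDM."""
    return (rdm_previous + 3 * rdm_latest) / 4


def projected_margin(lean, national_margin):
    """Projected state margin (D% - R%) given a national margin."""
    return lean + national_margin


nevada_pl = partisan_lean(5.2, 2.8)                # 2008 and 2012 RDM
nevada_2016 = projected_margin(nevada_pl, 2.1)     # Clinton's national margin
print(f"538 PL: D+{nevada_pl:.1f}; projected 2016 margin: D+{nevada_2016:.1f}")
```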

**********

In this post, I assessed the validity of one of my baseball player performance metrics—the Index of Offensive Ability—by comparing it to two other commonly-used statistics, OPS+ and WAR. Here is how I described validity in that post:

Validity is the extent to which an index/measure/score actually measures what it is designed to measure, or “underlying construct”. While now considered a unitary concept, historically, there were three broad approaches to “assessing” validity: content, construct and criterion.

Content validity is the extent to which an index/measure/score includes the appropriate set of components (not too many, not too few) to capture the underlying construct (say, a state’s partisan “lean”). Construct validity is how strongly your index/measure/score relates to other indices/measures/scores of the same underlying construct, including a priori expectations of what values should be (sometimes called face validity). Criterion validity considers how well outcomes “predicted” by the index/measure/score align with the actual outcomes.

As you have probably guessed by now, I will spend the rest of this post comparing my 3W-RDM to the Cook PVI and the 538 PL.

But first, I offer a mea culpa.

Before my “Democratic blue wall thesis” post in February 2017, I had used the 3W-RDM (which did not even have a name until then) only for my own edification and amusement. That, however, does not excuse me for not even attempting to validate this measure until now. Moreover, I should not have started writing data-driven posts using the 3W-RDM—implicitly asserting its validity without empirical evidence—until I had performed that validation.

I now present that empirical validation evidence.

Content validity. All three measures not only use presidential election voting data, but they also compare state and national margins in some way. This makes sense because presidential elections feature one party nominee advocating (theoretically) the same platform in every state. By comparison, other statewide elections (governor, Senate) feature candidates who share a party label yet may have very different policy stances. While this may be less true now for Senate races, which are becoming more nationalized, there is still a vast difference between Democratic Senators like Joe Manchin of West Virginia and Elizabeth Warren of Massachusetts, and between Republican governors like Charlie Baker of Massachusetts and Sam Brownback of Kansas.

Thus, despite differences in number of elections utilized, weighting and margin calculation, all three measures arguably have high content validity.

Construct validity. A correlation coefficient (“r”) is a number between -1.00 and +1.00 indicating how two variables co-relate to each other in a linear way[5]. If the two variables rise together along a perfectly straight line, that would be r=+1.00, and if one variable falls along a perfectly straight line as the other rises, that would be r=-1.00. A value of r=0.00 means there is no linear association between the two variables.
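As a rough illustration of the formula in footnote 5, here is a small Python sketch that computes r from the covariance and the two standard deviations. The function name and the toy numbers are made up for the example; they are not the projected margins analyzed in this post.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = covariance(x, y) / (SD(x) * SD(y)), using sample (n - 1) formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


# Toy data: y rises (or falls) with x along an exact straight line.
xs = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(xs, [3.0, 5.0, 7.0, 9.0]))   # upward line
print(pearson_r(xs, [8.0, 6.0, 4.0, 2.0]))   # downward line
```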

I calculated the projected presidential election margin (D% total vote – R% total vote) in each state (plus DC) in every presidential election from 1996 through 2016 by adding each state’s partisan lean score before that election to the actual national popular vote margin. In other words, I repeated the example of Nevada (projected 2016 presidential vote: Cook PVI=D+6.2, 538 PL=D+5.5, 3W-RDM=D+5.2) for all 306 state-level presidential election margins.

Here are the average correlations (PVI vs. PL, PVI vs. 3W-RDM, PL vs. 3W-RDM) between the three sets of projected margins in each election year:

1996 +0.995

2000 +0.994

2004 +0.997

2008 +0.998

2012 +0.997

2016 +0.999

Clearly, each partisan lean measure is nearly identically capturing the underlying partisan distribution of states from most to least Democratic, indicating that each measure has very high construct validity.

Criterion validity. Building upon the analysis of construct validity, the simplest way to assess criterion validity is to compare the projected presidential election margin in each state in each year to the actual margins.

Table 1 does this for each state in 2016. A negative difference means the state voted less Democratic than expected, and a positive difference means the state voted more Democratic than expected. States are sorted from most “less Democratic” to most “more Democratic.”

Table 1: Differences Between Projected and Actual State-Level Presidential Vote Margin (Democratic % – Republican %), 2016

State	Cook PVI	538 PL	3W-RDM	Mean
West Virginia	-17.8%	-15.8%	-20.0%	-17.8%
North Dakota	-17.6%	-16.2%	-16.6%	-16.8%
Iowa	-13.7%	-13.5%	-13.5%	-13.6%
South Dakota	-12.7%	-11.6%	-12.5%	-12.3%
Maine	-10.2%	-10.2%	-10.1%	-10.2%
Missouri	-10.1%	-8.8%	-10.7%	-9.9%
Indiana	-10.8%	-9.0%	-9.0%	-9.6%
Michigan	-9.8%	-8.8%	-9.1%	-9.3%
Rhode Island	-9.1%	-9.4%	-9.1%	-9.2%
Ohio	-8.4%	-8.8%	-8.9%	-8.7%
Montana	-8.4%	-6.8%	-7.4%	-7.5%
Wisconsin	-7.8%	-6.8%	-7.1%	-7.2%
Hawaii	-9.0%	-8.6%	-3.9%	-7.1%
Kentucky	-6.5%	-6.1%	-7.9%	-6.9%
Vermont	-7.2%	-6.9%	-5.2%	-6.4%
Delaware	-7.2%	-6.3%	-5.8%	-6.4%
Wyoming	-5.0%	-5.0%	-6.7%	-5.6%
Tennessee	-4.5%	-4.3%	-6.6%	-5.1%
Pennsylvania	-5.1%	-4.7%	-5.4%	-5.1%
Minnesota	-4.1%	-4.2%	-4.5%	-4.3%
New Hampshire	-3.8%	-3.6%	-4.0%	-3.8%
Nevada	-3.8%	-3.1%	-2.8%	-3.3%
Alabama	-2.0%	-3.1%	-3.3%	-2.8%
Mississippi	-1.8%	-3.3%	-2.6%	-2.5%
Connecticut	-2.9%	-2.3%	-2.4%	-2.5%
Arkansas	-1.0%	-1.6%	-5.0%	-2.5%
Nebraska	-2.8%	-2.4%	-1.8%	-2.3%
New York	-1.8%	-2.7%	-1.8%	-2.1%
South Carolina	-0.9%	-1.6%	-1.4%	-1.3%
Oklahoma	-0.4%	-0.8%	-2.1%	-1.1%
New Mexico	-1.2%	-0.6%	0.1%	-0.5%
Illinois	-0.9%	0.6%	0.2%	0.0%
Florida	0.5%	0.1%	0.1%	0.2%
New Jersey	0.7%	-0.6%	0.7%	0.3%
Oregon	-0.1%	0.4%	0.6%	0.3%
Louisiana	2.1%	0.5%	-0.6%	0.7%
North Carolina	0.8%	0.4%	1.2%	0.8%
Idaho	1.2%	0.9%	0.7%	1.0%
Colorado	1.2%	1.3%	1.9%	1.4%
Kansas	1.8%	2.1%	1.4%	1.8%
Washington	2.9%	3.0%	3.3%	3.1%
Maryland	3.7%	3.1%	4.6%	3.8%
Virginia	3.7%	3.5%	4.5%	3.9%
DC	4.3%	5.2%	4.8%	4.8%
Georgia	5.1%	4.7%	5.1%	5.0%
Massachusetts	5.8%	6.0%	4.7%	5.5%
Alaska	7.2%	3.8%	5.6%	5.5%
Arizona	9.0%	8.0%	7.4%	8.1%
Texas	8.5%	8.4%	8.5%	8.5%
California	9.4%	9.3%	10.6%	9.8%
Utah	24.8%	27.6%	24.8%	25.7%
*Mean*	*-2.3%*	*-2.1%*	*-2.3%*	*-2.2%*

On average, the measures overestimated Clinton’s performance by a relatively low 2.2 percentage points, with no meaningful difference across measures. Five states—West Virginia, North Dakota, Iowa, South Dakota and Maine—were at least 10 percentage points less Democratic than projected using all three measures; Clinton still won Maine, but by “only” 3.0 percentage points. Four states—Utah, California, Texas and Arizona—were at least seven percentage points more Democratic than projected using all three measures; Clinton won only California of this group, though there are signs that Texas and, especially, Arizona are becoming more Democratic. The massive disparity in Utah results from the presence of unaffiliated presidential candidate Evan McMullin, a Utah native, on the ballot; his 21.3% of the vote cut deeply into Trump’s vote, so the latter “only” won the state by 17.9 percentage points.

As Table 2 shows, the 2016 performance of these measures (using the average of the actual difference in margins) was the worst since 2000, when they also overestimated Democratic performance by an average of 2.2 percentage points. On average, across all six presidential elections, these measures overestimated Democratic performance by just 0.9 percentage points, a solid performance.

Table 2: Average Difference Between Projected and Actual State-Level Presidential Vote Margin (Democratic % – Republican %), 1996-2016

Year	Cook PVI	538 PL	3W-RDM	Mean
1996	-0.7%	-1.0%	-0.9%	-0.9%
2000	-2.0%	-2.2%	-2.5%	-2.2%
2004	0.1%	0.4%	-0.3%	0.1%
2008	0.7%	0.5%	0.4%	0.5%
2012	-0.7%	-0.8%	-0.6%	-0.7%
2016	-2.3%	-2.1%	-2.3%	-2.2%
*Mean*	*-0.8%*	*-0.9%*	*-1.0%*	*-0.9%*

These values can be deceptive, however. Consider the performance of the 3W-RDM in 2016. It overestimated Clinton’s margin in Montana by 7.4 percentage points, and it underestimated her margin in Arizona by an identical 7.4 percentage points. In both states the difference was 7.4 percentage points, but averaging the two (0.0 percentage points) would suggest that the 3W-RDM was spot on.

In fact, averaged across the three measures, the 2016 projections missed the actual presidential election margin by at least five percentage points in 26 states.

Table 3 resolves this problem by displaying the average absolute value of the difference between the projected and actual presidential election margins.
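The difference between the two kinds of average is easy to see in a few lines of Python, using the Montana and Arizona 3W-RDM misses from Table 1. The function names are my own labels for the signed mean and the mean of absolute values.

```python
def mean_error(errors):
    """Average signed difference; misses in opposite directions cancel."""
    return sum(errors) / len(errors)


def mean_absolute_error(errors):
    """Average size of the miss, ignoring its direction."""
    return sum(abs(e) for e in errors) / len(errors)


# 3W-RDM differences (actual minus projected) in 2016, from Table 1.
errors_3w_rdm = {"Montana": -7.4, "Arizona": 7.4}
values = list(errors_3w_rdm.values())

print(mean_error(values))           # the two misses cancel out
print(mean_absolute_error(values))  # the typical size of a miss
```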

Table 3: Average of Absolute Value of Differences Between Projected and Actual State-Level Presidential Vote Margin (Democratic % – Republican %), 1996-2016

Year	Cook PVI	538 PL	3W-RDM	Mean
1996	5.4%	5.1%	5.6%	5.4%
2000	5.5%	5.9%	6.8%	6.1%
2004	3.9%	3.6%	4.2%	3.9%
2008	6.3%	5.7%	6.2%	6.1%
2012	3.3%	3.2%	3.5%	3.3%
2016	5.9%	5.6%	5.9%	5.8%
*Mean*	*5.0%*	*4.8%*	*5.4%*	*5.1%*

On average, the projected and actual presidential election margins differed by 5.1 percentage points in either direction. The 3W-RDM, which differed by an average of 5.4 percentage points, fared slightly worse than the Cook PVI and 538 PL. The best years for these measures were two re-election years, 2004 (3.9 percentage points) and 2012 (3.3), and the worst years were the open seat elections of 2000, 2008 (both 6.1) and 2016 (5.8). The overall worst performance was the 3W-RDM in 2000 (6.8), while the overall best performance was the 538 PL in 2012 (3.2).

I performed identical analyses to those summarized in Tables 2 and 3 using two alternate versions of the 3W-RDM, one which used a 1-3-5 weighting scheme and one which weighted all three years equally. The results were nearly identical to those shown here (though the non-weighted 3W-RDM tended to perform worse on the absolute value differences), suggesting that if the 3W-RDM is slightly less “predictive” than the other two measures, it is not due to the weighting scheme but (most likely) to the inclusion of data from a third election year.

Finally, I counted how many—and which—states were “called” incorrectly by each measure in each presidential election.

Table 4: “Mis-called” States, 1996-2016

Year	Cook PVI*	538 PL	3W-RDM	Average
1996	9 AZ, CO, FL, MT, NV, NH, NC, SD, TX	8 AZ, CO, FL, GA, MT, NC, SD, TX	9 AZ, CO, FL, GA, MT, NH, NC, SD, TX	*8.7*
2000	5 AR, CT, LA, MO, WV	6 AR, CT, LA, MO, NH, WV	5 AR, CT, LA, MO, WV	*5.3*
2004	4 NH, OR, PA, WI	3 NH, OR, WI	3 NH, OR, WI	*3.3*
2008	4 AR, IN, MO, NC	4 AR, IN, MO, NC	7 AR, AZ, IN, MO, NC, VA, WV	*5.0*
2012	0	1 FL	0	*0.3*
2016	5 IA, MI, OH, PA, WI	5 IA, MI, OH, PA, WI	5 IA, MI, OH, PA, WI	*5.0*
*Mean*	*4.5*	*4.5*	*4.8*	*4.6*

*States in boldface were “predicted” Democratic wins, and states in italics were “predicted” Republican wins.

On average, four or five (out of 51) states are “mis-called” in a given presidential election. Again, the 3W-RDM fared slightly worse (4.8) than average (4.6). Of the 83 total misses (out of 918 possibilities), 52 (62.7%) were projected Democratic wins that were actually won by the Republican nominee.

The presidential election of 1996, when Democrat Bill Clinton cruised to an easy reelection, had the most mis-called states, eight or nine; seven states (Arizona, Colorado, Florida, Montana, North Carolina, South Dakota, Texas) were mis-called by all three measures. By contrast, only one state was mis-called in 2012, Florida by the 538 PL: it projected Obama would lose Florida by 0.1 percentage points when he in fact won it by 0.9 percentage points.
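What counts as a “mis-call” can be stated compactly: the projected margin and the actual margin point to different winners. A minimal Python sketch, with a hypothetical function name and with exact ties ignored for simplicity:

```python
def is_miscalled(projected_margin, actual_margin):
    """True when the projected and actual winners differ.

    Both margins are D% - R%; a positive value means a Democratic win.
    Exact ties (a margin of 0.0) are not handled specially in this sketch.
    """
    return (projected_margin > 0) != (actual_margin > 0)


examples = {
    "Florida 2012 (538 PL)": (-0.1, 0.9),  # projected narrow loss, actual win
    "Nevada 2016 (3W-RDM)": (5.2, 2.4),    # margin too high, winner correct
}
for label, (projected, actual) in examples.items():
    print(label, "mis-called" if is_miscalled(projected, actual) else "called correctly")
```

A measure can therefore miss the margin by several points, as in Nevada, and still call the state correctly, while a miss of a single point can flip a close state such as Florida.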

Despite these differences, I would argue that all three measures have high criterion validity, as each does a reasonably good job of “projecting” the actual presidential election margin in a given state and year. My 3W-RDM performed only slightly worse than the other two measures, so I will stick with it for now.

**********

One final note about the utility of partisan lean measures.

The Alabama special Senate election between Republican Roy Moore and Democrat Doug Jones to be held on December 12, 2017 is drawing national attention for two reasons. One, a win by Jones would reduce the Republican Senate majority to 51-49. Two, Moore has been dogged by allegations of sexual misconduct with minors (as well as having been removed twice as Alabama’s Chief Justice for defying federal court orders).

The public polls of this election, which once showed a Moore lead of ~11 percentage points, have tightened considerably since the allegations first appeared on November 9, 2017. As of now, depending on how you aggregate and weight these polls, Moore is somewhere between four percentage points ahead and one percentage point behind; my best estimate is that Moore is ahead by 1.7 percentage points.

But consider this. Following the 2016 presidential election, the average partisan lean for Alabama (using all three measures) is D-28.7. As of this writing, the best estimate of how Democrats will fare in the 2018 Congressional elections is that they are ahead by 7.8 percentage points.

Putting these two values together implies that a generic Republican Senate candidate should be leading a generic Democratic Senate candidate by 20.9 percentage points (28.7 minus 7.8): this should not even be a close contest.

However, the polls suggest that Jones is performing somewhere between 16.9 and 21.9 percentage points better than a generic Democrat—that is a stunning difference, and one that may bode very well for Democrats in 2018.
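The Alabama arithmetic can be checked with a short Python sketch. The names are illustrative; the poll margins are expressed as Jones minus Moore, and the +1.0 endpoint is the Jones lead implied by the 21.9-point figure above.

```python
ALABAMA_LEAN = -28.7    # average partisan lean (D% - R%) after 2016
GENERIC_BALLOT = 7.8    # estimated national Democratic lead


def generic_expectation(lean, national_margin):
    """Expected margin (D% - R%) for a generic Democrat in the state."""
    return round(lean + national_margin, 1)


def overperformance(poll_margin, lean, national_margin):
    """How far a candidate's polled margin runs ahead of the generic expectation."""
    return round(poll_margin - generic_expectation(lean, national_margin), 1)


print(generic_expectation(ALABAMA_LEAN, GENERIC_BALLOT))
for jones_minus_moore in (-4.0, -1.7, 1.0):
    print(jones_minus_moore,
          overperformance(jones_minus_moore, ALABAMA_LEAN, GENERIC_BALLOT))
```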

Until next time…

[1] Technically, he only won 111, as one Democratic elector in West Virginia cast her presidential vote for Lloyd Bentsen, the 1988 Democratic nominee for vice president, and cast her vice presidential vote for Dukakis.

[2] I freely confess to being the artist. This kid-friendly (fine, I had just turned 22) exhortation to vote must have been in the Comics section of the Washington Post (I was living in DC at the time) the Sunday before the 1988 elections.

[3] My data start in 1984, so I would only have 5W-RDM for 1984-2000, 1988-2004, 1992-2008, 1996-2012 and 2000-2016.

[4] I have experimented with adding a weighted linear trend to the 3W-RDM. The logic is that if I want to use the previous three election margins in a state to “forecast” the state margin in the next election, I should account for the fact that, over time, some states are growing relatively more Democratic (e.g., Nevada has become 11.7 percentage points more Democratic relative to the nation since 1984-1992) or less Democratic (e.g., West Virginia, 44.7 percentage points). Adding a weighted average of all previous election-to-election changes in RDM to a 3W-RDM would, theoretically, account for any increased partisanship over the ensuing four years. For the analyses below, however, there was very little difference between the 3W-RDM and the 3W-RDM+weighted linear trend, so I exclude it.

[5] More formally, r = covariance(x,y) / [SD(x) * SD(y)].