Quantifying Biden’s choices for running mate

Presumptive 2020 Democratic presidential nominee Joseph R. Biden Jr. stated in a May 27, 2020 interview he hoped to choose his vice-presidential running mate by August 1. In March, Biden definitively stated he would choose a woman to run with him. Meanwhile, a recent Morning Consult poll tested the relative strength of nine rumored candidates, finding that only three Senators even slightly boosted Biden’s electoral position: United States Senator (“Senator”) from Massachusetts Elizabeth Warren, California Senator Kamala Harris and Minnesota Senator Amy Klobuchar. The other six—Wisconsin Senator Tammy Baldwin, Nevada Senator Catherine Cortez Masto, former Georgia State House Minority Leader Stacey Abrams, Michigan Governor Gretchen Whitmer, New Mexico Governor Michelle Lujan Grisham and United States House of Representatives Member (“Representative”) from Florida Val Demings—all hurt Biden, albeit slightly. Notably, the first three Senators sought the 2020 Democratic presidential nomination, boosting their national profile in the process—and making it difficult to distinguish “actual” electoral boost from name recognition.

In the early summer of 2016, when it became clear former Secretary of State Hillary Clinton would be the Democratic presidential nominee that year, I built a Microsoft Excel spreadsheet listing 200 possible choices: every current Senator, governor, Representative serving as party leader or Committee ranking member, mayor of one of the top 10 cities by population, or Cabinet member, as well as anyone who had served in one of those positions within the last 10 years and a handful of other options. Virginia Senator Tim Kaine, whom Clinton named her running mate on July 22, 2016, just edged out Klobuchar and former Labor Secretary Hilda Solis for the highest score.

Once it became clear Biden would be the nominee, meanwhile, I built an analogous spreadsheet. Along with every current and recent Senator, governor, big-city mayor and Cabinet official, I included all 89 women serving in the House as Democrats, as well as Abrams.

However, I excluded any woman who was:

  • Born outside of the United States, citing the “natural born citizen” requirement of Article II, Section 1 of the Constitution of the United States of America
  • Under the age of 35, citing the same section
  • From the state of Delaware, citing Amendment XII, which requires that at least one of the two candidates an elector votes for “shall not be an inhabitant of the same state with themselves.”
  • A non-political figure such as media titan Oprah Winfrey or former First Lady Michelle Obama

My final list contained 123 candidates, including:

  • 80 current (79) or former (1) House members
  • 21 current (16) or former (5) Senators
  • 7 current (5) or former (2) governors
  • 9 former Cabinet officials
  • 4 mayors: former Houston, TX Mayor Annise Parker, as well as Atlanta, GA Mayor Keisha Lance Bottoms; San Francisco, CA Mayor London Breed and Chicago, IL Mayor Lori Lightfoot
  • Abrams and former presidential candidate Marianne Williamson

And here is a scene our younger daughter drew on the still-popular whiteboard. This has nothing to do with Biden’s selection of a running mate; I just like it.

Nora Drawing May 2020

**********

To assess the candidates for vice president, I examined three broad categories:

  1. Demographic balance
  2. Governmental experience
  3. Electoral strengths and weaknesses.

The first category is both symbolic (women of color overwhelmingly support the Democratic Party) and practical (this year’s vice-presidential nominee could well be the next presidential nominee). It also acknowledges that Biden is a 77-year-old white man, meaning a much younger woman of color would provide the clearest contrast.

The second category speaks to the ability of the Vice President to assume the presidency at a moment’s notice, as stipulated in Amendment XXV. This is especially important in a Biden Administration, as Biden, who would be the oldest president of the United States, suffered two brain aneurysms in 1988.

Finally, the third category reflects the principle that no presidential candidate should ever assume victory, so it is important for a running mate to increase the likelihood of victory by, for example, unifying the party or helping to win key voting blocs or regions.

Ideally, then, Biden’s running mate would be a younger woman of color with sufficient governmental experience who can enhance his chances of defeating President Donald J. Trump in November 2020. Or, at the very least, his running mate should not hurt Biden in any of these categories; above all else, a vice-presidential running mate should do no harm.

I calculated a score for each variable, as follows:

1. Demographic balance.

Age. To balance the fact Biden will be 78 years old on January 20, 2021, I created the following point system, using the somewhat-arbitrary “center point” age of 57 (20 years younger than Biden) and adjusting for someone being too young:

  • 35-46 (n=17): (57-Age)-2*(47-Age)
  • 47-66 (60): 57-Age
  • 67-76 (32): (57-Age)-3*(Age-67)
  • ≥77 (12): (57-Age)-5*(Age-67)

This measure penalizes being older—especially older than Biden—far more than it rewards being younger, and it ranges from 10 for 47-year-old Representative Jahana Hayes of Connecticut to -124 for 86-year-old California Senator Dianne Feinstein.
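The four age brackets can be written as a single piecewise function. What follows is a minimal Python sketch of the formulas exactly as stated above (the function name `age_points` is my own, hypothetical label); it reproduces both endpoints cited for Hayes and Feinstein.

```python
def age_points(age: int) -> int:
    """Age score using the post's bracket formulas (center point 57)."""
    if age < 35:
        raise ValueError("constitutionally ineligible: under 35")
    if age <= 46:
        # Too young: subtract 2 points for each year under 47.
        return (57 - age) - 2 * (47 - age)
    if age <= 66:
        return 57 - age
    if age <= 76:
        # Extra 3-point penalty for each year from 67 on.
        return (57 - age) - 3 * (age - 67)
    # 77 or older (Biden's age or above): 5-point penalty per year from 67 on.
    return (57 - age) - 5 * (age - 67)


print(age_points(47), age_points(86))  # 10 -124
```

Because the penalties for older candidates grow three to five times faster than the youth penalty, the score falls off far more steeply past 66 than below 47, which is the asymmetry described above.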

Race/Ethnicity. To balance the fact Biden is white, I assigned the following points:

  • White (n=77): -50
  • Asian (6): 25
  • Native American (2): 50
  • Latina (10): 75
  • Black (25): 90
  • Black and Asian (1): 100

Harris has a Jamaican father and an Indian mother.

Sexual orientation. I want to think sexual orientation does not matter, but I subtracted 25 points if a listed woman was openly lesbian (Baldwin, Lightfoot, Parker and Minnesota Representative Angela Craig) and 10 points if there were rumors (former Maryland Senator Barbara Mikulski, former Secretary of Homeland Security Janet Napolitano).
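Summing the three demographic measures is simple addition. The sketch below, which assumes the age score has already been computed with the bracket formulas above, encodes the race/ethnicity points and orientation deductions as lookup tables; the dictionary keys and the function name `demographic_balance` are hypothetical labels, not the author's spreadsheet columns.

```python
RACE_POINTS = {
    "White": -50,
    "Asian": 25,
    "Native American": 50,
    "Latina": 75,
    "Black": 90,
    "Black and Asian": 100,
}

# Deductions for sexual orientation, as described in the post.
ORIENTATION_PENALTY = {
    "none": 0,
    "openly lesbian": -25,
    "rumored": -10,
}


def demographic_balance(age_pts: int, race: str, orientation: str = "none") -> int:
    """Sum of age score, race/ethnicity points and orientation deduction."""
    return age_pts + RACE_POINTS[race] + ORIENTATION_PENALTY[orientation]


# Harris: age 55 gives 2 age points; Black and Asian gives 100.
print(demographic_balance(2, "Black and Asian"))  # 102
```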

TOTAL. The sum of these three measures ranges from -124 (Feinstein) to 102 (Harris)—literally the two Senators from California. The top 10 candidates in this category are listed in Table 1:

Table 1: Top 10 2020 Democratic Vice-Presidential candidates by Demographic Balance

| Name | Age | Ethnicity | Lesbian? | TOTAL |
| --- | --- | --- | --- | --- |
| California Senator Kamala Harris | 55 | Black/Asian | No | 102 |
| Connecticut Representative Jahana Hayes | 47 | Black | No | 100 |
| Former Georgia State House Minority Leader Stacey Abrams | 46 | Black | No | 99 |
| Massachusetts Representative Ayanna Pressley | 46 | Black | No | 99 |
| San Francisco Mayor London Breed | 45 | Black | No | 98 |
| Atlanta Mayor Keisha Lance Bottoms | 50 | Black | No | 97 |
| New York Representative Yvette Clarke | 55 | Black | No | 92 |
| Alabama Representative Terri Sewell | 55 | Black | No | 92 |
| Former National Security Advisor Susan Rice | 55 | Black | No | 92 |
| Georgia Representative Lucy McBath | 59 | Black | No | 88 |
| Former EPA Administrator Lisa P. Jackson | 59 | Black | No | 88 |
| Former Attorney General Loretta Lynch | 61 | Black | No | 88 |

The average value of this sum is -21.4, with a standard deviation (SD) of 71.1; the median is -47 (Florida Representative Kathy Castor, New York Senator Kirsten Gillibrand, Nevada Representative Susie Lee).

2. Governmental experience.

I calculated the number of years a candidate held each of these offices—Senate, governor, other statewide office (last 10 years only), House, Citywide office (last 10 years only), Cabinet—up to a maximum of 12 years, the equivalent of two Senate terms, to avoid overlapping with age too much. From this I subtracted the number of years since a candidate held that office. I assigned Abrams 1.75 years for her time as Minority Leader of the Georgia State House.

I weight experience as follows:

  • Senate = 5
  • Governor = 4
  • Other statewide office = 3
  • House = 2
  • Citywide office = 2
  • Cabinet = 1
  • Other (i.e., Abrams) = 1
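The weighting rule can be sketched as a short function. This is a minimal sketch under one assumption about the order of operations: subtract the years since leaving office first, then cap the result at 12 (with a floor of zero, which is also my assumption). That order reproduces the Table 2 figures for Baldwin (52), Klobuchar (60), McCaskill (50) and Shaheen (64); capping first would give Baldwin 48. The 10-year look-back for other statewide and citywide offices is not modeled here, and `experience_points` is a hypothetical name.

```python
WEIGHTS = {
    "Senate": 5,
    "Governor": 4,
    "Other statewide": 3,
    "House": 2,
    "Citywide": 2,
    "Cabinet": 1,
    "Other": 1,
}
CAP = 12  # two Senate terms


def experience_points(service):
    """service: iterable of (office, years_held, years_since_leaving)."""
    total = 0
    for office, years, since in service:
        # Assumed order: subtract time out of office, then apply floor and cap.
        effective = min(max(years - since, 0), CAP)
        total += WEIGHTS[office] * effective
    return total


# Baldwin: 8 Senate years, plus 14 House years that ended 8 years ago.
print(experience_points([("Senate", 8, 0), ("House", 14, 8)]))  # 52
```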

This variable ranges from 0 for Williamson to 64 for New Hampshire Senator (and former Governor) Jeanne Shaheen. The top 10 candidates in this category are listed in Table 2:

Table 2: Top 10 2020 Democratic Vice-Presidential candidates by Governmental Experience

| Name | Office 1 | Office 2 | TOTAL |
| --- | --- | --- | --- |
| New Hampshire Senator Jeanne Shaheen | Senator, 12 years | Governor, 6 years (-5 for time since 2008) | 64 |
| New York Senator Kirsten Gillibrand | Senator, 12 years | None | 60 |
| Minnesota Senator Amy Klobuchar | Senator, 14 years | None | 60 |
| Washington Senator Maria Cantwell | Senator, 20 years | None | 60 |
| Washington Senator Patty Murray | Senator, 28 years | None | 60 |
| Michigan Senator Debbie Stabenow | Senator, 20 years | None | 60 |
| California Senator Dianne Feinstein | Senator, 28 years | None | 60 |
| Wisconsin Senator Tammy Baldwin | Senator, 8 years | House, 14 years (-8 years since 2012) | 52 |
| Former Missouri Senator Claire McCaskill | Senator, 12 years (-2 years since 2018) | None | 50 |
| Massachusetts Senator Elizabeth Warren | Senator, 8 years | Credit 1 year for setting up the Consumer Financial Protection Bureau | 41 |

The average of this weighted sum is 17.8 (SD=15.1); the median is 16 (14 women with 8 years in the House, including former 2020 Democratic presidential nomination candidate Tulsi Gabbard of Hawaii).

3. Electoral strengths and weaknesses.

For this category, I considered eight questions:

  1. Will she help Biden win a key state in the Electoral College?
  2. Does she lack foreign policy or national security experience?
  3. Will she provide ideological balance?
  4. Will her ascension to the Vice Presidency cost Democrats a Senate seat?
  5. Is she a Senator up for reelection in 2020?
  6. Did she run for president in 2020?
  7. Has she ever run for political office?
  8. Is she a first-term member of the House?

Swing state status. On average, a vice-presidential nominee adds 2-3 percentage points to the party’s margin in her/his home state. But for most states—ones that are reliably Democratic or Republican, for example—these extra points mean nothing. In fact, choosing a running mate from one of these states could be considered a lost opportunity.

Using the probability Biden wins a given state in the 2020 presidential election, I determined which states were most likely to be the “tipping point” states—the state that gets him to the necessary 270 Electoral votes (EV) when states are ranked from most to least Democratic.
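The ranking step can be sketched in a few lines: sort states from most to least likely Biden wins, accumulate electoral votes, and report the state that crosses 270. This is a deterministic sketch, not a simulation-based tipping-point probability. For illustration it groups states into the blocks described below, each assigned the lowest probability quoted for its range, and it ignores Maine's and Nebraska's district-level votes; the block labels and the function name `tipping_point` are hypothetical.

```python
from typing import List, Tuple


def tipping_point(states: List[Tuple[str, float, int]], needed: int = 270) -> str:
    """Return the state (or block) whose EV pushes the running total to `needed`."""
    running = 0
    for name, _prob, ev in sorted(states, key=lambda s: s[1], reverse=True):
        running += ev
        if running >= needed:
            return name
    raise ValueError("no combination reaches the threshold")


# (label, Biden win probability, electoral votes), grouped as in the text below.
BLOCKS = [
    ("13 safest, incl. DE and DC", 0.997, 175),
    ("ME/NM/OR", 0.976, 16),
    ("CO/MI/MN/VA", 0.936, 48),
    ("NV/NH/PA", 0.869, 30),
    ("Wisconsin", 0.788, 10),
    ("Florida", 0.711, 29),
    ("AZ/NC/OH", 0.648, 44),
    ("Georgia", 0.427, 16),
    ("Iowa", 0.248, 6),
    ("Texas", 0.178, 38),
]

print(tipping_point(BLOCKS))  # Wisconsin
```

With these figures the running total reaches 269 after Nevada, New Hampshire and Pennsylvania, so Wisconsin's 10 EV are the first to cross 270.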

There are 13 states, including Delaware and the District of Columbia, where Biden is at least a 99.7% favorite, and they total 175 EV. The 54 candidates from these states were assigned -10 points.

In three states—Maine, New Mexico and Oregon—Biden is a 97.6-97.7% favorite; these states total 16 EV, although Maine assigns one EV to each of its two Congressional districts (CD). Thus, while the four candidates from New Mexico and Oregon are assigned -5 points, the two from Maine are assigned 0, because the 2nd CD could be pivotal. This gets us to 191 EV.

In four states—Colorado, Michigan, Minnesota and Virginia—Biden is a 93.6-95.8% favorite; these states total 48 EV, for an overall total of 239. These states could possibly be the tipping point states, though that currently seems very unlikely. Thus, the 14 candidates from these states are assigned 0 points.

In three states—Nevada, New Hampshire and Pennsylvania—Biden is an 86.9-88.3% favorite; these states total 30 EV, for an overall total of 269. These are the first states that could reasonably be called tipping point states, thus the 11 candidates from these states are assigned 2 points.

The bottom line is that, RIGHT NOW, Biden is at least a 5-1 favorite in enough states to earn him 268 or 269 EV, depending on one CD in Maine.

The next most likely states for Biden are Wisconsin (78.8%, 10 EV) and Florida (71.1%, 29 EV), either of which would theoretically secure victory over President Donald J. Trump. Baldwin and Representative Gwen Moore could help secure a victory in Wisconsin. Six House Members, including Demings, could do the same in Florida. These eight candidates each earn 10 points.

There are three states totaling 44 EV, meanwhile, where Biden is roughly a 2-1 favorite (64.8-66.0%): Arizona, North Carolina and Ohio. As they are marginally less likely tipping point states, the nine women from these states each earn 5 points.

Georgia’s 16 EV are close to a toss-up right now (42.7%), but it is even less likely to be a tipping point state. Still, Abrams and McBath each earn 3 points.

The next likeliest states for Biden to win are Iowa (24.8%; 6 EV) and Texas (17.8%; 38 EV). Representative Cindy Axne of Iowa gets 1 point, as do the six female House Members from Texas.

Finally, I gave McCaskill of Missouri (2.3%) -2 points and eight women from the 0-0.9% states of Alabama, Kansas, Louisiana, North Dakota, Oklahoma and West Virginia -3 points.

In other words, I deducted the most points for candidates hailing from states in which Biden is a near-certain winner and fewer points for hailing from reliably Republican states, while adding the most points for the likeliest tipping point states.

I then added one point for candidates from any state west of the Mississippi River, as no Democratic presidential or vice-presidential nominee has come from there since Bill Clinton of Arkansas in 1996, and I subtracted one point for being from the regionally redundant states of New York, New Jersey, Pennsylvania and Maryland.

Foreign policy/national security. I deducted 2.5 points from the 17 candidates who never served in the Senate, House or in a Cabinet-level foreign policy/national security role.

Ideological balance. I assume Biden is in the ideological center of the Democratic Party.

For each member of the House and Senate since January 2017, FiveThirtyEight.com calculates how often that member has voted with President Trump when he has taken a clear public position. House Members vote far less often with Trump (average=13.3%) than Senators (30.2%). Female Senators with the lowest Trump scores are Gillibrand (12.4%), Warren (13.9%) and Harris (16.5%), while Arizona Senator Kyrsten Sinema and former North Dakota Senator Heidi Heitkamp each voted with Trump just over half the time.

Using this score as a proxy for ideology (with the added bonus of specifically reflecting opposition to Trump), I calculated how many SD above or below the mean each candidate is relative to the other members of her chamber; for the 20 women with no Trump scores, I estimated a score based upon age and state. I then assigned points as follows:

  • ≤-1.25 = 10
  • -1.00 to -1.24 = 7.5
  • -0.75 to -0.99 = 5
  • -0.50 to -0.74 = 2.5
  • -0.25 to -0.49 = 0
  • -0.01 to -0.24 = -1
  • 0.00 to 0.24 = -2
  • 0.25 to 0.49 = -3
  • 0.50 to 0.74 = -4
  • 0.75 to 0.99 = -5
  • 1.00 to 1.99 = -7.5
  • ≥2.00 = -10
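Both steps, standardizing a Trump Score within a chamber and mapping the result to points, can be sketched as follows. This is a minimal sketch assuming the population standard deviation (the post does not say which SD it uses); negative z-scores mean voting with Trump less often than the chamber average. The function names are hypothetical.

```python
from statistics import mean, pstdev

# Left of the chamber mean: a z-score at or below the bound earns the points.
NEGATIVE_BANDS = [(-1.25, 10), (-1.00, 7.5), (-0.75, 5), (-0.50, 2.5), (-0.25, 0)]
# At or right of the mean: a z-score strictly below the bound earns the points.
POSITIVE_BANDS = [(0.00, -1), (0.25, -2), (0.50, -3), (0.75, -4), (1.00, -5), (2.00, -7.5)]


def z_score(value: float, chamber_scores: list) -> float:
    """SDs above or below the chamber mean (population SD assumed)."""
    return (value - mean(chamber_scores)) / pstdev(chamber_scores)


def ideology_points(z: float) -> float:
    """Map a Trump Score z-score to the point bands listed above."""
    for bound, points in NEGATIVE_BANDS:
        if z <= bound:
            return points
    for bound, points in POSITIVE_BANDS:
        if z < bound:
            return points
    return -10  # 2.00 SD or more above the mean


print(ideology_points(-1.3), ideology_points(0.1))  # 10 -2
```

The asymmetry of the table is visible in the code: the largest reward (10) and the largest penalty (-10) are equal, but penalties begin as soon as a candidate is even slightly right of the mean, while rewards start only at 0.50 SD to the left.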

Ultimately, I deducted more points for being (relatively) well to the right of Biden than for being ideologically similar, as the former would actually harm Biden’s chances to win over the party’s progressive base, while the latter is effectively “do no harm.”

Loss of Senate seats. The Democrats are currently at a 53-47 disadvantage in the Senate, though they have a solid chance of recapturing it in November. But this means that every Democratic Senate seat is vitally important.

I thus deducted 10 points from Senators Warren, Shaheen, Sinema and Maggie Hassan of New Hampshire because a Republican governor would appoint a replacement for each of them. I also deducted 2.5 points for Senators Baldwin, Cortez Masto and Jacky Rosen of Nevada because, while their home state governors are Democrats, there is a non-trivial chance Democrats could lose a special election in Wisconsin or Nevada. Finally, I deducted 10 points for the two female Democratic Senators facing reelection this year: Shaheen and Tina Smith of Minnesota; both are heavily favored to win reelection, keeping those seats in Democratic hands.

Other considerations. Running for the 2020 Democratic presidential nomination both exposed candidates to extreme public scrutiny and served as a rough test run for campaigning for vice president; I thus added 5 points to Gabbard, Gillibrand, Harris, Klobuchar, Warren and Williamson. Six former Cabinet officials (former Secretary of Health and Human Services Sylvia Mathews Burwell, Jackson, Lynch, former EPA Administrator Gina McCarthy, former Secretary of Commerce Penny Pritzker and Rice) have never run for any political office, let alone the vice presidency, so they each lost 10 points. And, given how hard Democrats worked to recapture the House in 2018, I deducted 5 points from each of the 28 female first-term Representatives.

TOTAL. This measure ranges from -21 for Rice, a Marylander who has never run for political office, to 16 for Arizona Representative Ann Kirkpatrick, whose Trump Score of 3.1% is the lowest of any woman in Congress. The top 10 candidates in this category are listed in Table 3:

Table 3: Top 10 2020 Democratic Vice-Presidential candidates by Electoral Strengths and Weaknesses

| Name | Strengths | Weaknesses | TOTAL |
| --- | --- | --- | --- |
| Arizona Representative Ann Kirkpatrick | Tipping point state; low Trump Score | None | 16 |
| Wisconsin Representative Gwen Moore | Tipping point state | None | 12.5 |
| Florida Representative Donna Shalala | Tipping point state; low Trump Score | First term | 12.5 |
| Former Florida Representative Corrine Brown | Tipping point state | None | 12 |
| Florida Representative Frederica Wilson | Tipping point state | First term | 10 |
| Florida Representative Debbie Wasserman Schultz | Tipping point state | Ideologically similar to Biden | 9 |
| Florida Representative Val Demings | Tipping point state | Ideologically similar to Biden | 8 |
| Florida Representative Kathy Castor | Tipping point state | Ideologically similar to Biden | 8 |
| Florida Representative Lois Frankel | Tipping point state | Ideologically similar to Biden | 8 |
| Wisconsin Senator Tammy Baldwin | Tipping point state | Possible loss of Senate seat | 7.5 |

The average of this sum is -4.5 (SD=7.6); the median is -6 (Representatives Suzanne Bonamici of Oregon, Clarke, Nydia Velázquez of New York and Bonnie Watson Coleman of New Jersey).

**********

That these three sums measure somewhat distinct criteria can be seen in their Pearson correlations:

  • Demographic balance / Governmental experience: -0.31
  • Governmental experience / Electoral strengths and weaknesses: -0.08
  • Demographic balance / Electoral strengths and weaknesses: 0.13

It is thus not surprising that only six women—Senators Cortez Masto and Harris, and Representatives Marcia Fudge of Ohio, Barbara Lee of California, Sheila Jackson Lee of Texas and Moore—have above average scores in all three categories. Harris, in fact, comes closest to being at least 1 SD above the mean in all three categories, being +1.74 SD on Demographic Balance, +0.94 SD on Governmental Experience and +1.0 SD on Electoral Strengths and Weaknesses.

Indeed, when you convert each category sum to a z-score—number of SD above or below the mean—then sum them into an Initial Score, Harris ranks second, at 3.72, behind Moore at 3.95, with Brown (3.53), Gillibrand (3.52) and Klobuchar (3.47) rounding out the top five. Based upon the correlation of this initial sum with the three categories, it is slightly more associated with Electoral Strengths and Weaknesses (r=0.66) than with Demographic Balance (0.53) or Governmental Experience (0.40).
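The Initial Score is a sum of three z-scores. Below is a minimal sketch using the category means and SDs reported above (demographic -21.4/71.1, experience 17.8/15.1, electoral -4.5/7.6); the dictionary keys and function names are hypothetical. Plugging in Harris's demographic total of 102 recovers the +1.74 SD quoted earlier.

```python
# Category means and standard deviations, as reported in the post.
CATEGORY_STATS = {
    "demographic": (-21.4, 71.1),
    "experience": (17.8, 15.1),
    "electoral": (-4.5, 7.6),
}


def z(value: float, mu: float, sd: float) -> float:
    """Number of SDs above (+) or below (-) the mean."""
    return (value - mu) / sd


def initial_score(raw: dict) -> float:
    """raw maps each category name to a candidate's category total."""
    return sum(z(raw[c], *CATEGORY_STATS[c]) for c in CATEGORY_STATS)


print(round(z(102, -21.4, 71.1), 2))  # 1.74
```

Standardizing first keeps the demographic category, whose raw SD (71.1) is nearly ten times the electoral SD (7.6), from swamping the other two in the sum.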

However, I adjusted these scores one final time, adding up to 1 point or subtracting up to 10 points (the latter for Brown, because of her 2017 fraud conviction). For example, I added 1 point to Warren, and 0.5 points each to Harris and Klobuchar, for Morning Consult poll performance. Similarly, Cortez Masto, Baldwin, Demings, Lujan Grisham and Whitmer each lost 0.5 points for their Morning Consult poll performance. That said, I added back 0.5 points to Demings for her service as Orlando Chief of Police, because a woman of color serving in law enforcement could play well in the current climate. Speaking of criminality, I deducted 3 points from Moore for a tire-slashing incident involving her son and 2 points from Fudge for remarks she made about a serious domestic violence incident.

I also made several smaller deductions. It is not clear how the impeachment of President Trump will play in the election, but on the theory it is slightly more likely to rile Trump voters than inspire Biden voters, I deducted 0.5 points from Demings, as well as Texas Representative Sylvia Garcia and California Representative Zoe Lofgren, who served as House Managers during the Senate trial.

Other deductions include 1 point each from Gillibrand for a seeming inauthenticity in her ideology, from California Representative Norma Torres for controversial remarks on the House floor, from Lynch for her questionable tarmac meeting with former President Bill Clinton, from Gabbard for being generally disliked within the Democratic Party, from New York Representative Kathleen Rice for being a former Republican and from Whitmer for an ill-timed “joke” her husband made.

The Final Score is correlated 0.69 with the Initial Score, with an average of -0.40 (SD=1.63); the median is -0.453 (Michigan Representative Debbie Dingell and Parker). Only 24 of the 121 potential 2020 Democratic vice-presidential candidates had Final Scores of 1.00 or higher, as Table 4 shows.

Table 4: Top 2020 Democratic Vice-Presidential candidates by Final Score

| Name | Strengths | Weaknesses | TOTAL |
| --- | --- | --- | --- |
| California Senator Kamala Harris | Black/Asian; 55; Ran for president; To left of Biden; Popular with base | California; Only 4 years in Senate | 4.18 |
| Wisconsin Senator Tammy Baldwin | Wisconsin; 58; 8 years in Senate/14 years in House | White; Lesbian; Possible loss of Senate seat; Ideologically similar to Biden | 2.56 |
| New York Senator Kirsten Gillibrand | 53; 12 years in Senate; Well to left of Biden; Ran for president | White; New York; Disappointing presidential run; Suspected inauthenticity | 2.55 |
| Florida Representative Frederica Wilson | Black; Florida; 10 years in House | 78 | 2.52 |
| Michigan Senator Debbie Stabenow | Michigan; 20 years in Senate | White; 70; Slightly to right of Biden | 2.40 |
| Ohio Representative Joyce Beatty | Black; Ohio | 70; Ideologically similar to Biden | 2.22 |
| Florida Representative Val Demings | Black; 62; Florida; Orlando Chief of Police | Only four years in House; Ideologically similar to Biden | 2.07 |
| North Carolina Representative Alma Adams | Black; North Carolina | 73; Ideologically similar to Biden | 1.92 |
| New York Representative Yvette Clarke | Black; 55; 12 years in House; Left of Biden | New York | 1.77 |
| Former Georgia State House Minority Leader Stacey Abrams | Black; 46; Georgia; Progressive reputation | No foreign policy or national security experience; No office higher than state House | 1.77 |
| Massachusetts Senator Elizabeth Warren | Strong progressive; Ran for president; Very popular with party base; 8 years in Senate | White; 70; Loss of Senate seat; Massachusetts | 1.75 |
| Washington Senator Maria Cantwell | 61; 20 years in Senate | White; Washington; Similar to Biden ideologically | 1.73 |
| Florida Representative Kathy Castor | Florida; 53; 14 years in House | White; Slightly to right of Biden | 1.69 |
| Atlanta Mayor Keisha Lance Bottoms | Black; Georgia; 50 | No foreign policy or national security experience; Never run statewide | 1.63 |
| Georgia Representative Lucy McBath | Black; 59; Georgia; Left of Biden | First-term House Member | 1.57 |
| Texas Representative Veronica Escobar | Latina; 50; Texas; Left of Biden | First-term House Member | 1.56 |
| California Representative Barbara Lee | Black; 21 years in House; Left of Biden | California; 73 | 1.54 |
| Washington Senator Patty Murray | 28 years in Senate | White; Washington; 69 | 1.53 |
| Michigan Representative Brenda Lawrence | Black; Michigan | 66; Ideologically similar to Biden | 1.49 |
| New York Representative Nydia Velázquez | Latina; 28 years in House | New York; 67 | 1.40 |
| Alabama Representative Terri Sewell | Black; 54; 10 years in House | Alabama; Slightly right of Biden | 1.38 |
| California Representative Linda Sanchez | Latina; 51; 14 years in House | California | 1.23 |
| Former Secretary of Labor Hilda Solis | Latina; 62 | California; Out of federal office since 2013 | 1.13 |
| California Representative Karen Bass | Black; 66 | California | 1.02 |

This list includes 14 current House Members, seven Senators, a current mayor, a former Cabinet Secretary (Solis) and Abrams. Thirteen are Black, seven are White and four are Latina. Five are from California; three each are from Florida, Georgia and New York; and two each are from Michigan and Washington. Sixteen are between the ages of 46 and 66, while three are older than 70. Only seven are ideologically to the left of Biden, while just three are (slightly) to his right.

If you eliminate the three House Members over 70, the two first-term House Members, the two white women slightly to the right of Biden, as well as 66-year-old Karen Bass of California, 67-year-old Nydia Velázquez of New York and 69-year-old Patty Murray of Washington, you are left with 14 solid candidates:

14. Former Labor Secretary Hilda Solis

13. California Representative Linda Sanchez

12. Alabama Representative Terri Sewell

11. Michigan Representative Brenda Lawrence

10. Atlanta Mayor Keisha Lance Bottoms

9. Washington Senator Maria Cantwell

8. Massachusetts Senator Elizabeth Warren

7. Former Georgia State House Minority Leader Stacey Abrams

6. New York Representative Yvette Clarke

5. Florida Representative Val Demings

4. Ohio Representative Joyce Beatty

3. New York Senator Kirsten Gillibrand

2. Wisconsin Senator Tammy Baldwin

1. California Senator Kamala Harris

Really, however, one choice jumps out from all the rest: Harris, the 55-year-old, Black/Asian, progressive-voting Senator who ran a solid race for president, is broadly popular with the Democratic Party and has a wealth of criminal justice experience. Were she not from reliably Democratic California (which, on the other hand, means her selection would not cost Democrats a Senate seat), and had she served at least one full Senate term, she would be THE obvious choice.

That said, there are a number of excellent choices Biden could make, including familiar names like Warren, Abrams, Demings, Gillibrand and Baldwin, as well as sleeper choices like the brilliant, Black, 55-year-old, five-term Representative Terri Sewell of Alabama.

Meanwhile, consider who did not make this final cut: Klobuchar (0.97), Cortez Masto (0.50), Lujan Grisham (-0.09) and Whitmer (-1.80). It is unlikely any of these four women makes Biden’s short list, although reports suggest Lujan Grisham remains a leading contender, along with Atlanta Mayor Keisha Lance Bottoms, Demings, Harris, former National Security Advisor Susan Rice and Warren.

Please feel free to quibble with my categories and/or assignment of points; I admit up front that much of the latter was arbitrary. Even so, Harris still comes out the best choice, by far, whichever way you choose to quantify and aggregate strengths and weaknesses.

Until next time…please stay safe and healthy…

Biden vs. Trump: The view from six months out

A note to readers: I have temporarily stopped writing “dispatches” about how my wife Nell, our two daughters and I cope with social distancing and the closure of Massachusetts schools through the end of the 2019-20 school year because they started to feel repetitive. When and if that changes, I will resume dispatching.

**********

As I write this, it is exactly six months until the 2020 United States (U.S.) presidential election, which will conclude on November 3, 2020. On April 8, 2020, U.S. Senator from Vermont Bernie Sanders announced he was suspending his campaign for the 2020 Democratic presidential nomination, making former Vice President Joseph R. Biden, Jr. the presumptive nominee against incumbent Republican president Donald J. Trump.

Using all publicly available polls of the presidential election (both nationally and at the state level, recognizing presidential elections are determined by the Electoral College) conducted since January 1, 2019, I have been tracking the relative performance of contenders for the 2020 Democratic nomination against Trump. When given the choice, I used polls of likely voters over those of registered voters, and the latter over polls of adults only; I also used polls including such possible third-party candidates as former Starbucks CEO Howard Schultz and U.S. House of Representatives Member Justin Amash of Michigan. Table 1 lists the number of national polls conducted each month for each candidate, based upon the midpoint of each poll’s field dates; some polls’ field dates actually spanned two months.

Table 1: Number of National Polls Assessing Hypothetical 2020 Match-ups Between Biden/Sanders and Trump by Month

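| Month | Biden | Sanders |
| --- | --- | --- |
| January 2019 | 1 | 1 |
| February 2019 | 4 | 3 |
| March 2019 | 7 | 6 |
| April 2019 | 6 | 6 |
| May 2019 | 7 | 5 |
| June 2019 | 10 | 9 |
| July 2019 | 8 | 7 |
| August 2019 | 8 | 8 |
| September 2019 | 15 | 11 |
| October 2019 | 18 | 13 |
| November 2019 | 8 | 4 |
| December 2019 | 14 | 9 |
| January 2020 | 20 | 17 |
| February 2020 | 23 | 21 |
| March 2020 | 33 | 23 |
| April 2020 | 41 | 3 |
| TOTAL | 223 | 146 |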

Just seven of 41 total pollsters (average grade: B-/B) account for 54% of Biden versus Trump polls; the values are similar for Sanders:

  • IBD/TIPP (A/B), 10 polls
  • Fox News (A-), 13 polls
  • Harris X (C+), 13 polls
  • Emerson College (B+), 18 polls
  • Ipsos (B-), 18 polls
  • Morning Consult (B/C), 22 polls
  • YouGov (B-), 36 polls

Figure 1, meanwhile, shows how Biden and Sanders fared monthly against the president, using my weighted-adjusted polling averages, or WAPA. Basically, I use data published by FiveThirtyEight.com to adjust each poll for partisan lean (the tendency of a pollster to err more Democratic or Republican than other pollsters in analogous races) and overall quality (using the letter grade assigned by FiveThirtyEight.com). I also weight more recent polls (again using the field-date midpoint) more heavily, using the ratio of the number of days since January 1, 2019 to the total number of days between January 1, 2019 and November 3, 2020. Finally, I average two different versions of WAPA: one treating polls by the same pollster as statistically independent values, and one treating all polls by the same pollster as a single value; differences between the two estimates are generally negligible.
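The WAPA recipe (lean adjustment, quality and recency weights, and the average of two versions) can be sketched as below. This is a minimal sketch under explicit assumptions: the grade-to-weight table is HYPOTHETICAL (the post does not publish its values), partisan lean is simply subtracted from the margin, and each pollster's collapsed value is weighted by the mean weight of its polls. The class and function names are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

START = date(2019, 1, 1)
ELECTION = date(2020, 11, 3)

# HYPOTHETICAL grade-to-weight table; the post does not publish its actual values.
GRADE_WEIGHT = {"A+": 1.00, "A": 0.95, "A-": 0.90, "A/B": 0.875, "B+": 0.85,
                "B": 0.80, "B-": 0.75, "B/C": 0.725, "C+": 0.70, "C": 0.65}


@dataclass
class Poll:
    pollster: str
    midpoint: date        # midpoint of the field dates
    margin: float         # Democratic minus Republican, in points
    partisan_lean: float  # pollster's historical Democratic lean, in points
    grade: str            # FiveThirtyEight letter grade


def recency_weight(midpoint: date) -> float:
    """Days since Jan 1, 2019 divided by days from Jan 1, 2019 to Election Day."""
    return (midpoint - START).days / (ELECTION - START).days


def poll_weight(p: Poll) -> float:
    return GRADE_WEIGHT[p.grade] * recency_weight(p.midpoint)


def adjusted_margin(p: Poll) -> float:
    # Assumed adjustment: remove the pollster's typical lean (positive = Democratic).
    return p.margin - p.partisan_lean


def wapa_independent(polls) -> float:
    """Version 1: every poll counts as its own observation."""
    weights = [poll_weight(p) for p in polls]
    return sum(w * adjusted_margin(p) for w, p in zip(weights, polls)) / sum(weights)


def wapa_by_pollster(polls) -> float:
    """Version 2: each pollster's polls are first collapsed into one value."""
    groups = defaultdict(list)
    for p in polls:
        groups[p.pollster].append(p)
    values = [wapa_independent(g) for g in groups.values()]
    weights = [sum(poll_weight(p) for p in g) / len(g) for g in groups.values()]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def wapa(polls) -> float:
    """Average of the two versions."""
    return (wapa_independent(polls) + wapa_by_pollster(polls)) / 2
```

Under this reading, a poll with a midpoint of January 1, 2019 receives zero weight, and a poll on Election Day receives full weight; any real use would need the actual grade weights and lean estimates.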

Figure 1: Monthly weighted-adjusted average margins for Biden and Sanders versus Trump since January 2019

Only one national poll assessing each hypothetical matchup (Biden or Sanders versus Trump) was conducted in January 2019, so I combined it with the February 2019 polls (four for Biden, three for Sanders) to generate Figure 1. Biden and Sanders have consistently led Trump in head-to-head matchups, never dropping below Sanders’ 2.0 percentage point (“points”) lead in December 2019. Through September 2019, Biden’s margin was typically three to four points higher, though Sanders still led Trump by 4.3 points on average, versus 7.8 points for Biden. From October 2019 through February 2020, though, the two men fared equally well versus Trump, with Biden ahead an average 5.4 points and Sanders ahead 4.9 points. Once Biden’s nomination began to become clear in March 2020, however, Biden again began to fare better versus Trump than Sanders, averaging a 5.7-point lead to Sanders’ 3.4-point lead. Overall, Biden has a 6.1-point lead over Trump, not meaningfully different from his lead over the last two months; Sanders exited the race with an overall national lead of 4.3 points versus Trump, though that lead had begun to drop slightly over the last two months.

**********

Again, however, presidential elections are actually fought across all 50 states and the District of Columbia (“DC”), with the plurality winner in each state/DC winning every electoral vote (“EV”) from that state (the two exceptions, Maine and Nebraska, award some of their EV by congressional district).

To that end, Table 2 lists the number of polls of hypothetical matchups between Biden/Sanders and Trump conducted within each state since January 1, 2019, plus that state’s 3W-RDM, an estimate of how much more or less Democratic than the nation a state tends to vote; 11 states[1] and DC have not yet been polled.

Table 2: Number of state-level polls assessing hypothetical 2020 matchups between Biden/Sanders and Trump since January 1, 2019

State 3W-RDM Biden Sanders
Michigan 2.2 33 23
Wisconsin 0.7 30 26
Texas -15.3 27 21
North Carolina -6.0 23 16
Pennsylvania -0.4 23 17
Florida -3.4 19 11
Arizona -9.7 17 14
California 23.2 14 13
New Hampshire 0.1 10 10
Iowa -4.7 9 8
Georgia -9.6 8 6
Ohio -5.8 7 6
Virginia 1.5 7 6
Nevada 2.0 6 6
Utah -33.1 5 3
South Carolina -15.7 4 4
Maine 5.9 4 3
North Dakota -29.4 4 2
Washington 12.1 4 3
Missouri -15.9 4 3
Connecticut 12.8 4 4
New York 21.6 3 1
Colorado 2.2 3 2
Kentucky -28.7 2 1
Montana -18.6 2 2
New Mexico 6.5 2 1
Alabama -28.4 2 2
Kansas -23.4 2 2
Oklahoma -38.1 2 2
New Jersey 12.0 2 1
Mississippi -18.5 2 1
Minnesota 1.5 1 1
Massachusetts 22.1 1 1
Alaska -19.2 1 1
West Virginia -35.5 1 1
Delaware 12.5 1 1
Tennessee -25.8 1 1
Maryland 22.6 1 1
Indiana -16.3 1 0
TOTAL D-6.2 292 227

It is not surprising that eight of the 14 most-polled states thus far are “swing” states, those with 3W-RDM between -5.0 and +5.0, including the four closest states won by Trump in 2016: Florida (19 Biden, 11 Sanders), Pennsylvania (23, 17), Wisconsin (30, 26) and Michigan (33, 23). In fact, the Pearson correlation between the absolute value of a state’s 3W-RDM and the number of times it has been polled for the 2020 presidential election is -0.47 for Biden and -0.48 for Sanders, meaning the closer a state is to the national average (i.e., a pure toss-up in a dead-even national race), the more often it has been polled. Also heavily polled are large states like California and Texas, red-drifting states like Ohio and Iowa, and emerging Democratic opportunities like Arizona, Georgia and North Carolina.
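The correlation itself is easy to compute. This sketch calculates a Pearson r between the absolute 3W-RDM and the poll counts, using only the 39 polled states listed in Table 2. Whether the original calculation also counted the 11 unpolled states and DC as zero polls is not stated, so I do not assume the printed values match the -0.47 and -0.48 quoted above.

```python
import math

# (state, 3W-RDM, Biden-Trump polls, Sanders-Trump polls), copied from Table 2
TABLE2 = [
    ("Michigan", 2.2, 33, 23), ("Wisconsin", 0.7, 30, 26), ("Texas", -15.3, 27, 21),
    ("North Carolina", -6.0, 23, 16), ("Pennsylvania", -0.4, 23, 17), ("Florida", -3.4, 19, 11),
    ("Arizona", -9.7, 17, 14), ("California", 23.2, 14, 13), ("New Hampshire", 0.1, 10, 10),
    ("Iowa", -4.7, 9, 8), ("Georgia", -9.6, 8, 6), ("Ohio", -5.8, 7, 6),
    ("Virginia", 1.5, 7, 6), ("Nevada", 2.0, 6, 6), ("Utah", -33.1, 5, 3),
    ("South Carolina", -15.7, 4, 4), ("Maine", 5.9, 4, 3), ("North Dakota", -29.4, 4, 2),
    ("Washington", 12.1, 4, 3), ("Missouri", -15.9, 4, 3), ("Connecticut", 12.8, 4, 4),
    ("New York", 21.6, 3, 1), ("Colorado", 2.2, 3, 2), ("Kentucky", -28.7, 2, 1),
    ("Montana", -18.6, 2, 2), ("New Mexico", 6.5, 2, 1), ("Alabama", -28.4, 2, 2),
    ("Kansas", -23.4, 2, 2), ("Oklahoma", -38.1, 2, 2), ("New Jersey", 12.0, 2, 1),
    ("Mississippi", -18.5, 2, 1), ("Minnesota", 1.5, 1, 1), ("Massachusetts", 22.1, 1, 1),
    ("Alaska", -19.2, 1, 1), ("West Virginia", -35.5, 1, 1), ("Delaware", 12.5, 1, 1),
    ("Tennessee", -25.8, 1, 1), ("Maryland", 22.6, 1, 1), ("Indiana", -16.3, 1, 0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


abs_rdm = [abs(rdm) for _, rdm, _, _ in TABLE2]
r_biden = pearson(abs_rdm, [b for _, _, b, _ in TABLE2])
r_sanders = pearson(abs_rdm, [s for _, _, _, s in TABLE2])
print(f"Biden r = {r_biden:.2f}, Sanders r = {r_sanders:.2f} (39 polled states only)")
```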

While U.S. presidential elections are decided on a state-by-state basis, though, national averages are still important. Combined with 3W-RDM, they provide the “expected Democratic-minus-Republican margin” in each state in 2020, all else being equal. Comparing polling averages to this expected value tells us where Biden may currently be under- or over-performing, or which states have drifted Democratic or Republican since 2016.

For example, Biden leads Trump overall by 6.1 points. North Carolina has recently been about 6.0 points less Democratic than the nation as a whole. Adding those two values together (6.1 – 6.0 = +0.1) yields an expected photo-finish in North Carolina in 2020. However, Biden leads Trump by a mean 2.2 points in 23 polls thus far in North Carolina, meaning Biden is “outperforming” expectations there by about 2.1 points.

This could mean any or all of three things:

  1. WAPA is the more accurate reflection of the November election and either
    1. North Carolina has drifted about two points toward the Democrats since 2016, or
    2. The true “expected value” is somewhere between Trump winning by 5.3 points and Biden winning by 5.5 points, based upon an average 3W-RDM error margin of 5.4 points in recent elections.
  2. The “expected” value is the more accurate reflection, and Republican-leaning voters will drift back toward Trump over the next six months, making North Carolina nail-bitingly close on election day.
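The North Carolina arithmetic, including the error band described in the list, fits in a few lines. `expected_margin` and `performance` are illustrative helper names; the inputs are the values quoted above.

```python
NATIONAL_LEAD = 6.1   # Biden's national WAPA lead over Trump, in points
RDM_ERROR = 5.4       # average 3W-RDM error margin in recent elections


def expected_margin(national: float, rdm: float) -> float:
    """Expected Democratic-minus-Republican margin: national margin plus 3W-RDM."""
    return national + rdm


def performance(state_wapa: float, national: float, rdm: float) -> float:
    """Positive = Biden polling ahead of the fundamentals; negative = behind."""
    return state_wapa - expected_margin(national, rdm)


nc_expected = expected_margin(NATIONAL_LEAD, -6.0)
nc_gap = performance(2.2, NATIONAL_LEAD, -6.0)
low, high = nc_expected - RDM_ERROR, nc_expected + RDM_ERROR
print(f"expected {nc_expected:+.1f}, gap {nc_gap:+.1f}, range {low:+.1f} to {high:+.1f}")
# expected +0.1, gap +2.1, range -5.3 to +5.5
```

The same two functions produce the “Expected” and “WAPA-Expected” columns of Table 3 when fed each state’s 3W-RDM and WAPA.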

Table 3 lists every state’s expected value and WAPA; for ease of presentation, I include Biden-Trump values only.

Table 3: Expected and actual polling margins for Biden over Trump in each state in November 2020

State 3W-RDM Expected WAPA WAPA-Expected
DC 82.0 88.2    
Hawaii 34.3 40.4    
Vermont 27.7 33.8    
California 23.2 29.3 27.1 -2.2
Maryland 22.6 28.7 25.0 -3.7
Massachusetts 22.1 28.2 38.0 9.8
New York 21.6 27.7 27.9 0.2
Rhode Island 18.0 24.1    
Illinois 14.7 20.8    
Connecticut 12.8 18.9 16.9 -2.0
Delaware 12.5 18.6 16.4 -2.2
Washington 12.1 18.2 19.8 1.6
New Jersey 12.0 18.1 16.1 -2.0
Oregon 8.7 14.8    
New Mexico 6.5 12.6 10.4 -2.2
Maine 5.9 12.0 9.2 -2.8
Michigan 2.2 8.4 5.9 -2.5
Colorado 2.2 8.3 6.9 -1.4
Nevada 2.0 8.1 3.5 -4.6
Minnesota 1.5 7.6 12.7 5.1
Virginia 1.5 7.6 7.8 0.2
Wisconsin 0.7 6.8 1.7 -5.1
New Hampshire 0.1 6.2 4.5 -1.7
Pennsylvania -0.4 5.7 4.2 -1.5
Florida -3.4 2.7 1.9 -0.9
Iowa -4.7 1.4 -3.5 -4.9
Ohio -5.8 0.3 3.0 2.7
North Carolina -6.0 0.1 2.2 2.1
Georgia -9.6 -3.4 -0.3 3.2
Arizona -9.7 -3.6 2.0 5.6
Texas -15.3 -9.1 -2.0 7.2
South Carolina -15.7 -9.6 -9.6 0.0
Missouri -15.9 -9.8 -8.6 1.3
Indiana -16.3 -10.2 -14.1 -3.9
Mississippi -18.5 -12.4 -12.9 -0.5
Montana -18.6 -12.5 -16.0 -3.5
Alaska -19.2 -13.0 -4.2 8.8
Louisiana -22.2 -16.1    
Kansas -23.4 -17.3 -11.2 6.1
Nebraska -25.8 -19.7    
South Dakota -25.8 -19.7    
Tennessee -25.8 -19.7 -15.3 4.4
Arkansas -28.2 -22.1    
Alabama -28.4 -22.3 -19.6 2.7
Kentucky -28.7 -22.6 -15.9 6.7
North Dakota -29.4 -23.3 -20.6 2.7
Utah -33.1 -27.0 -12.3 14.7
Idaho -34.2 -28.1    
West Virginia -35.5 -29.3 -34.0 -4.7
Oklahoma -38.1 -32.0 -26.1 5.9
Wyoming -45.7 -39.6    
Average D-6.4 Trump+0.05* Biden+0.9 +1.0

        * Only for the 39 states with both measures

The correlation between the expected margin and WAPA is a very reassuring +0.96, meaning the polling is broadly in line with the underlying “fundamentals” of the election. Still, Biden is polling ahead of those fundamentals by an average of about one percentage point, meaning the state-level polling as a whole is even better for Biden than his already-solid national polling.

Nonetheless, there are clearly states where Biden is underperforming expectations, including the vital and heavily polled state of Wisconsin. While Biden leads there by about 1.7 points overall, he “should” be ahead by about 6.8 points. Moreover, he is trailing by about 3.5 points in nearby Iowa, even though he “should” be ahead by about 1.4 points. And while Biden leads Trump by about 3.5 points in Nevada, that is 4.6 points below what the fundamentals suggest.

The story is similar, if less pronounced, in the key states of Michigan, Pennsylvania, New Hampshire and Florida: Biden leads Trump in these states by an average of 4.1 points, though he “should” lead by an average of 5.8 points, a mean “underperformance” of 1.7 points.

Moreover, there appears to be something of a partisan split in Biden’s over- and underperformance: in the 10 states with both measures and 3W-RDM≥5.0, Biden is underperforming by 0.3 points, on average, though once you remove the single poll of Massachusetts, that underperformance grows to 1.6 points. At the same time, in the 19 analogous Republican states with 3W-RDM≤-5.0, Biden is overperforming by 3.2 points, though that drops to 2.6 with the massive outlier of Utah removed.
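The Republican-side averages can be checked directly from Table 3. This sketch takes the states with both measures and negative 3W-RDM, keeps those at or below a -5.0 cutoff, and averages the WAPA-minus-expected column. Because it works from the table's rounded values, it is a check on the reported 3.2 and 2.6 rather than a reproduction of the underlying spreadsheet; `group_gaps` is an illustrative name.

```python
from statistics import mean

# (3W-RDM, WAPA minus expected) from Table 3: states with both measures and 3W-RDM < 0
TABLE3_GAPS = {
    "Pennsylvania": (-0.4, -1.5), "Florida": (-3.4, -0.9), "Iowa": (-4.7, -4.9),
    "Ohio": (-5.8, 2.7), "North Carolina": (-6.0, 2.1), "Georgia": (-9.6, 3.2),
    "Arizona": (-9.7, 5.6), "Texas": (-15.3, 7.2), "South Carolina": (-15.7, 0.0),
    "Missouri": (-15.9, 1.3), "Indiana": (-16.3, -3.9), "Mississippi": (-18.5, -0.5),
    "Montana": (-18.6, -3.5), "Alaska": (-19.2, 8.8), "Kansas": (-23.4, 6.1),
    "Tennessee": (-25.8, 4.4), "Alabama": (-28.4, 2.7), "Kentucky": (-28.7, 6.7),
    "North Dakota": (-29.4, 2.7), "Utah": (-33.1, 14.7), "West Virginia": (-35.5, -4.7),
    "Oklahoma": (-38.1, 5.9),
}


def group_gaps(table, threshold=-5.0, exclude=()):
    """Gaps for states with 3W-RDM at or below the threshold, minus any excluded states."""
    return [gap for state, (rdm, gap) in table.items()
            if rdm <= threshold and state not in exclude]


republican = group_gaps(TABLE3_GAPS)
print(len(republican), round(mean(republican), 1))  # 19 3.2
print(round(mean(group_gaps(TABLE3_GAPS, exclude={"Utah"})), 1))  # 2.6
```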

Let me again stress, however, that there is a lot of “wobble” in the “expected margins,” as well as in the polling averages—especially given that most states have seen very little recent polling. All of this “over- and underperforming” may simply be statistical noise, as we try to read too much into highly stochastic data.

Still, the two values are sufficiently closely aligned to combine them into a single, six-months-out estimate of Biden’s margin over Trump on November 3, 2020, based upon the assumption that polls become more predictive as an election gets closer:

  1. Arbitrarily assign expected value and WAPA equal weight as of January 1, 2020.
  2. If the most recent poll in a state was conducted more than 100 days prior to January 1, 2020, WAPA is weighted just 10%. This only applies to Massachusetts, Alaska and Kentucky, with Minnesota the only other state whose most recent poll was conducted in 2019.
  3. WAPA weight increases, by day, with proximity to November 3, 2020.
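The three weighting rules can be sketched as a single function. The 50/50 starting point and the 10% floor come straight from the list; the shape of the ramp does not, so the linear increase from 0.5 on January 1 to 1.0 on November 3, keyed to the date of each state’s most recent poll (and dipping below 0.5 for polls from late 2019), is an assumption made for illustration.

```python
from datetime import date

JAN1 = date(2020, 1, 1)
ELECTION = date(2020, 11, 3)
SPAN = (ELECTION - JAN1).days  # 307 days


def wapa_weight(last_poll: date) -> float:
    """Weight on WAPA given the date of a state's most recent poll.

    Rule from the post: a last poll more than 100 days before Jan 1, 2020 gets 10%.
    Otherwise an ASSUMED linear ramp: 0.5 on Jan 1, 2020, rising to 1.0 on Nov 3, 2020.
    """
    if (JAN1 - last_poll).days > 100:
        return 0.10
    return 0.5 + 0.5 * (last_poll - JAN1).days / SPAN


def blended_margin(expected: float, wapa: float, last_poll: date) -> float:
    """Weighted average of the fundamentals-based expected margin and WAPA."""
    w = wapa_weight(last_poll)
    return w * wapa + (1.0 - w) * expected


# Massachusetts: expected 28.2, WAPA 38.0; the poll date is a placeholder before the cutoff.
print(round(blended_margin(28.2, 38.0, date(2019, 6, 1)), 1))  # 29.2
```

For Massachusetts the 10% rule gives 0.9 × 28.2 + 0.1 × 38.0 ≈ 29.2, the predicted margin shown in Table 4; the ramp for more recently polled states is only the assumed schedule.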

At the same time, I introduced a probabilistic element into these estimates—rough calculations of how likely Biden is to win the EV from each state, assuming such likelihood is distributed normally:

  1. For expected margins, I used a mean of the expected margin minus 0.8 points and a standard error of 7.1[2]
  2. For WAPA, I used the WAPA itself as the mean and a standard error of 3.0, roughly the margin of error in most quality polls.
  3. The overall probability Biden wins a state’s EV is a weighted average of these two probabilities, using the same weights as for the predicted final margin.
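Each probability in Table 4 is a normal tail area. This sketch uses Python's `math.erf` for the normal CDF, turns a margin into P(Biden wins) with the means and standard errors listed above (assuming the WAPA-based distribution is centered on WAPA itself), then mixes the two probabilities with a WAPA weight `w`. The function names are mine.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_expected(expected_margin: float) -> float:
    """P(Biden margin > 0) when the margin ~ Normal(expected - 0.8, 7.1)."""
    return normal_cdf((expected_margin - 0.8) / 7.1)


def p_wapa(wapa: float) -> float:
    """P(Biden margin > 0) when the margin ~ Normal(WAPA, 3.0)."""
    return normal_cdf(wapa / 3.0)


def p_overall(expected_margin: float, wapa: float, w: float) -> float:
    """Mix the two probabilities with WAPA weight w (same weight as the margin blend)."""
    return w * p_wapa(wapa) + (1.0 - w) * p_expected(expected_margin)


# Wisconsin: expected 6.8, WAPA 1.7; w = 0.686 is back-solved from Table 4 (see below)
print(round(p_expected(6.8), 3), round(p_wapa(1.7), 3), round(p_overall(6.8, 1.7, 0.686), 3))
```

For Wisconsin, `p_expected(6.8)` ≈ 0.80 and `p_wapa(1.7)` ≈ 0.71, close to the 80.2% and 71.5% in Table 4. A WAPA weight of about 0.69, which is what Wisconsin's predicted margin implies (6.8 − 5.1w = 3.3), gives an overall figure of roughly 0.74.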

While the means and standard errors are somewhat arbitrary, albeit broadly defensible, the final EV probabilities shown in Table 4 are in line with what other forecasters are saying.

Table 4: Estimated final state margins and probability of winning EV, Biden vs. Trump, November 2020

State EV P(EV): Expected P(EV): WAPA P(EV): Overall Predicted Margin
DC 3 100.0%   100.0% 88.2
Hawaii 4 100.0%   100.0% 40.4
Vermont 3 100.0%   100.0% 33.8
California 55 100.0% 100.0% 100.0% 27.9
Maryland 10 100.0% 100.0% 100.0% 26.6
Massachusetts 11 100.0% 100.0% 100.0% 29.2
New York 29 100.0% 100.0% 100.0% 27.8
Rhode Island 4 99.9%   99.9% 24.1
Illinois 20 99.8%   99.8% 20.8
Connecticut 7 99.5% 100.0% 99.8% 17.9
Delaware 3 99.4% 100.0% 99.7% 17.5
Washington 12 99.3% 100.0% 99.8% 19.0
New Jersey 14 99.2% 100.0% 99.7% 17.1
Oregon 7 97.6%   97.6% 14.8
New Mexico 5 95.2% 100.0% 97.6% 11.5
Maine 4 94.3% 99.9% 97.7% 10.3
Michigan 16 85.6% 97.5% 93.9% 6.6
Colorado 9 85.5% 99.0% 93.3% 7.5
Nevada 6 84.8% 88.0% 86.7% 5.4
Minnesota 10 83.1% 100.0% 89.4% 9.5
Virginia 13 83.0% 99.5% 93.7% 7.7
Wisconsin 10 80.2% 71.5% 74.3% 3.3
New Hampshire 4 77.7% 93.2% 88.4% 5.0
Pennsylvania 20 75.6% 92.0% 86.9% 4.7
Florida 29 60.7% 73.5% 69.4% 2.2
Iowa 6 53.3% 12.0% 28.4% -1.6
Ohio 18 47.1% 84.1% 72.5% 2.1
North Carolina 15 46.1% 76.5% 67.2% 1.5
Georgia 16 27.5% 46.3% 40.5% -1.3
Arizona 11 26.8% 75.1% 58.7% 0.1
Texas 38 8.1% 25.5% 20.1% -4.2
South Carolina 9 7.2% 0.1% 3.0% -9.6
Missouri 10 6.7% 0.2% 2.9% -9.1
Indiana 11 6.1% 0.0% 2.0% -12.8
Mississippi 6 3.2% 0.0% 1.3% -12.7
Montana 3 3.1% 0.0% 1.3% -14.5
Alaska 3 2.6% 8.1% 3.1% -12.2
Louisiana 8 0.9%   0.9% -16.1
Kansas 6 0.5% 0.0% 0.2% -14.3
Nebraska 5 0.2%   0.2% -19.7
South Dakota 3 0.2%   0.2% -19.7
Tennessee 11 0.2% 0.0% 0.1% -17.5
Arkansas 6 0.1%   0.1% -22.1
Alabama 9 0.1% 0.0% 0.0% -20.9
Kentucky 8 0.0% 0.0% 0.0% -21.9
North Dakota 3 0.0% 0.0% 0.0% -21.6
Utah 6 0.0% 0.0% 0.0% -17.1
Idaho 4 0.0%   0.0% -28.1
West Virginia 5 0.0% 0.0% 0.0% -31.7
Oklahoma 7 0.0% 0.0% 0.0% -29.0
Wyoming 3 0.0%   0.0% -39.6

Six months before election day 2020, and with all of the caveats about what voting will even look like during a pandemic, Biden is clearly in a commanding position to be elected the 46th president of the United States.

  • He is projected to win by at least 3.3 points in enough states to get him to 279 EV, or 278 depending on what happens in Maine, which, along with Nebraska, allocates two EV to the statewide winner and one each to the winner of its Congressional districts.
    • He has narrower leads in Florida, Ohio and North Carolina, which combine for 62 EV, increasing his total to 340 or 341.
    • Arizona’s 11 EV are balanced on a knife’s edge.
  • He is favored at least 86% in enough states to get him to 268 or 269 EV
    • He would then need to win ONLY ONE of Wisconsin (74.3%), Ohio (72.5%), Florida (69.4%) or North Carolina (67.2%) to win the presidency. Assuming Biden’s chances of winning each state are statistically independent from each other (a lousy assumption), he has about a 99% chance of winning AT LEAST one of these states.
  • He has at least a 58% chance in enough states to earn him 351 or 352 EV, at least 81 more than required.
  • And if things truly break Biden’s way, he has a 40.5% chance to win the 16 EV in Georgia, a 28.4% chance to win the 6 EV in Iowa, and a 20.1% chance to win the 38 EV of Texas, upping his total to 411-413 EV, depending on what happens in the 2nd Congressional district of Nebraska, which allocates its EV the same as Maine.
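The “about 99%” figure follows from the independence assumption: the chance of losing all four states is the product of the four loss probabilities. A minimal sketch (`p_at_least_one` is an illustrative name):

```python
from math import prod


def p_at_least_one(probs):
    """P(winning at least one state), ASSUMING the states are independent."""
    return 1.0 - prod(1.0 - p for p in probs)


# Wisconsin, Ohio, Florida, North Carolina ("P(EV): Overall" in Table 4)
print(round(p_at_least_one([0.743, 0.725, 0.694, 0.672]), 3))  # 0.993
```

The same function gives 1 − 0.56² ≈ 0.69 for the two 44% states in the first polling-error scenario below.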

Using the simplistic—perhaps even simple-minded—method of multiplying Biden’s probability of winning each state by its EV and summing yields a “projected” EV total of 335.2, fairly close to the 341 generated by taking the 232 EV won by Hillary Clinton in 2016, adding Michigan and Pennsylvania to get to 268, then adding Wisconsin, Florida, Ohio and North Carolina (and the last EV in Maine).
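The projected EV total is a simple expected value: the sum over states of EV × P(win). This sketch copies the EV and “P(EV): Overall” columns of Table 4; because those probabilities are rounded to a tenth of a percent, the sum (about 335.22) only approximately reproduces the spreadsheet.

```python
# (electoral votes, P(Biden wins)) from Table 4's "P(EV): Overall" column
TABLE4 = {
    "DC": (3, 1.000), "Hawaii": (4, 1.000), "Vermont": (3, 1.000), "California": (55, 1.000),
    "Maryland": (10, 1.000), "Massachusetts": (11, 1.000), "New York": (29, 1.000),
    "Rhode Island": (4, 0.999), "Illinois": (20, 0.998), "Connecticut": (7, 0.998),
    "Delaware": (3, 0.997), "Washington": (12, 0.998), "New Jersey": (14, 0.997),
    "Oregon": (7, 0.976), "New Mexico": (5, 0.976), "Maine": (4, 0.977),
    "Michigan": (16, 0.939), "Colorado": (9, 0.933), "Nevada": (6, 0.867),
    "Minnesota": (10, 0.894), "Virginia": (13, 0.937), "Wisconsin": (10, 0.743),
    "New Hampshire": (4, 0.884), "Pennsylvania": (20, 0.869), "Florida": (29, 0.694),
    "Iowa": (6, 0.284), "Ohio": (18, 0.725), "North Carolina": (15, 0.672),
    "Georgia": (16, 0.405), "Arizona": (11, 0.587), "Texas": (38, 0.201),
    "South Carolina": (9, 0.030), "Missouri": (10, 0.029), "Indiana": (11, 0.020),
    "Mississippi": (6, 0.013), "Montana": (3, 0.013), "Alaska": (3, 0.031),
    "Louisiana": (8, 0.009), "Kansas": (6, 0.002), "Nebraska": (5, 0.002),
    "South Dakota": (3, 0.002), "Tennessee": (11, 0.001), "Arkansas": (6, 0.001),
    "Alabama": (9, 0.0), "Kentucky": (8, 0.0), "North Dakota": (3, 0.0),
    "Utah": (6, 0.0), "Idaho": (4, 0.0), "West Virginia": (5, 0.0),
    "Oklahoma": (7, 0.0), "Wyoming": (3, 0.0),
}

total_ev = sum(ev for ev, _ in TABLE4.values())
projected = sum(ev * p for ev, p in TABLE4.values())  # expected EV = sum of EV x P(win)
print(total_ev, round(projected, 1))  # 538 335.2
```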

This lead looks robust even when you make either of two reasonable assumptions:

All polls are overestimating Biden’s margins by 3.0 points.

In this scenario, Biden’s projected EV drops to 286, still 16 more than required. He would be favored at least 80% in enough states to win 239 EV, and favored at least 64% in three states totaling 30 EV, putting him on the doorstep. He would then have to win one of Wisconsin or Ohio, at 44% each; he would have about a 69% chance to do so.

The point is, even if the polls are consistently off by this much, Biden would still be roughly even money to win the presidency. Note, though, that in this scenario Biden would still be winning by 3.1 points nationally, demonstrating the current Republican bias in the Electoral College.

All polls are underestimating Biden’s margins by 3.0 points.

In this scenario, Biden’s projected EV are a landslide-level 373.7, more than 100 more than necessary. He would be favored at least 80% to win enough states to earn 341 EV, while being a 77.3% favorite in Arizona and a 69.8% favorite in Georgia, for a total of 368 EV. Adding in the states where Biden would be roughly even money—Iowa and Texas—gets us once again to 412.

This appears to be Biden’s upper limit, as even in this scenario, where he is winning nationally by 9.1 points, he has no better than a 9% chance of winning any additional state.

Now, none of this is to say Biden is guaranteed to be the next president of the United States; it would be monumentally foolish for me to conclude that this far from the election, particularly if Amash earns more than, say, three points in the national popular vote. I am simply noting that all indications point very strongly in that direction, based on the data we have right now.

Until next time…please stay safe and healthy…

[1] Hawaii, Vermont, Rhode Island, Illinois, Oregon, Louisiana, Nebraska, South Dakota, Arkansas, Idaho, Wyoming

[2] The former value is the mean arithmetic difference between “expected” and actual D-R margins across 153 state-level contests in 2008, 2012 and 2016, while the latter value is the standard deviation of these values. I recognize this is not a standard error. However, using the value 13.6 (the range of values covering 95% of all values, divided by 1.96), the final EV projection changes by only 1.0.

Updating the Doctors: 13 is not a lucky number for Jodie Whittaker

One of the first data-driven essays to appear on this website was a three-part assessment of every episode of Doctor Who following its revival in March 2005. You may find those three essays here, along with a (frankly, much better written) July 2018 update; you will also find a much longer essay I wrote demonstrating the influence of classic film noir on the revived series.

On December 25, 2017, Jodie Whittaker debuted as the 13th incarnation of the multi-thousand-year-old Doctor. Since then, Whittaker has portrayed the Gallifreyan Time Lord in 21 additional episodes, with the most recent airing on March 1, 2020.

With two seasons of Whittaker’s portrayal of the Doctor behind us, here is an updated assessment of the 165 total episodes of the revived Doctor Who.

**********

Just as I collected ratings data to rank every Charlie Chan film, every film in the Marvel Cinematic Universe and my own guilty pleasures, I collected ratings data to assess the relative popularity of the 165 episodes of the resurrected Doctor Who[1], from “Rose” (March 26, 2005) through “The Timeless Children” (March 1, 2020). Excluding John Hurt’s irascible War Doctor, there have been five incarnations of The Doctor during this time period: 9 through 13. These 165 episodes comprise 12 Series of between 10 and 13 episodes, plus 13 Christmas specials and four stand-alone specials: three featuring the 10th Doctor (David Tennant) and the November 2013 50th anniversary epic, in which Doctors 10 and 11 (Matt Smith) teamed with the War Doctor to save Gallifrey, The Doctor’s home planet.

For each episode, I collected four values:[2]

  1. Its BBC “Audience Appreciation Index” (AI) Score, an integer from 0-100 revealing how much the British audience enjoyed each episode when it first aired. Higher scores indicate greater enjoyment.
  2. Where the episode ranked that week in Great Britain (Chart), with a lower score indicating more viewers.
  3. Its weighted-average Internet Movie Database (IMDB) score on a 0-10 scale, with 10 being the most favorable, and…
  4. The number of IMDB “raters” whose scores were averaged. The higher the number of raters, in principle, the more “compelling” the episode, though a higher number of raters could also simply reflect a longer rating time frame or a trollish desire to “trash” an episode.

Analyzing these data will reveal:

  • How popular individual episodes are now,
  • How an episode’s current popularity compares to how popular each episode was when it first aired,
  • The comparative popularity of individual Series, and
  • The comparative popularity of Doctors 9-13

I decided mostly to set aside “Chart” values as they are difficult to compare over time.

Table 1 provides details on each Series. It excludes the 13 Christmas specials from 2005 through 2017, the three 10th Doctor specials (“Planet of the Dead,” “The Waters of Mars,” “The End of Time: Part Two”) and “The Day of the Doctor.” However, given its chronological and story-arc proximity to the prior 10 episodes, I chose to include the 2019 New Year’s Day special “Resolution” as the 11th and final episode of Series 11.

Table 1: Doctor Who Series (2005-20)

# Dates # Episodes Doctor Primary Companion(s)
1 March 26-June 18, 2005 13 9 Rose Tyler
2 April 15-July 8, 2006 13 10 Rose Tyler
3 March 31-June 30, 2007 13 10 Martha Jones
4 April 5-July 5, 2008 13 10 Donna Noble
5 April 3-June 26, 2010 13 11 Amy Pond/Rory Williams
6 April 23-June 4, 2011; August 27-October 1, 2011 13 (7 + 6) 11 Amy Pond/Rory Williams
7a September 1-29, 2012 5 11 Amy Pond/Rory Williams
7b March 30-May 18, 2013 8 11 Clara Oswald
8 August 23-November 8, 2014 12 12 Clara Oswald
9 September 19-December 5, 2015 12 12 Clara Oswald
10 April 15-July 1, 2017 12 12 Bill Potts
11 October 7, 2018-January 1, 2019 11 13 Yasmin Khan/Graham O’Brien/Ryan Sinclair
12 January 1-March 1, 2020 10 13 Yasmin Khan/Graham O’Brien/Ryan Sinclair

Individual episodes. Overall, the resurrected series has been very well received, with a “global” IMDB rating of 8.6 (192,481 unique raters). Upon first airing, average AI score was a remarkable 84.3, with a small standard deviation (“sd”) of 2.9; all but 12 episodes have an AI Score between 80 and 89. Enthusiasm has only somewhat diminished over time: average IMDB rating is 7.78 (sd=1.1), with 113 episodes (68%) between 7.0 and 8.9. In the previous version of this post, average AI Score was a tick higher (84.9), while average IMDB rating was higher still (8.13). While the former, as we shall see, reflects a decline in the show’s popularity in recent years, the latter suggests more recent IMDB raters are less enamored with these episodes than earlier raters were; only the 2009 special “The Waters of Mars” now has a higher IMDB rating than it did then, rising from 8.7 to 8.8.

Two extremely highly regarded episodes, 2007’s “Blink” (9.8) and “The Day of the Doctor” (9.4), each attracted more than 15,000 raters (median=5,050; 120 [73%] between 3,000 and 5,999), accounting for the discrepancy between the series’ global IMDB rating and the mean across all 165 individual episodes.

Table 2: Most- and least-admired Doctor Who episodes (2005-20) when first aired

Title Series-Episode Doctor AI Score
Journey’s End 4-13 10 91
The Stolen Earth 4-12 10 91
Forest of the Dead 4-9 10 89
Doomsday 2-13 10 89
Silence in the Library 4-8 10 89
Asylum of the Daleks 7a-1 11 89
The Parting of the Ways 1-13 9 89
The Big Bang 5-13 11 89
The End of Time: Part Two 10th Doctor Specials 10 89
14 Episodes 3 to 50th Anniversary 10 (8), 11 (6) 88
5 Episodes* 1,9,11-12 9,12,13 80
Nikola Tesla’s Night of Terror 12-4 13 79
The Battle of Ranskoor Av Kolos 11-10 13 79
The Tsuranga Conundrum 11-5 13 79
Can You Hear Me? 12-7 13 78
Sleep No More 9-9 12 78
Praxeus 12-6 13 78
Orphan 55 12-3 13 77
Rose 1-1 9 76
Love & Monsters 2-10 10 76
The End of the World 1-2 9 76

      * The Unquiet Dead (1), Heaven Sent (9), Demons of the Punjab (11), Resolution (11-NYD), The Haunting of Villa Diodati (12)

The first thing we learn from Table 2 is that British viewers did not immediately warm to Christopher Eccleston as the 9th Doctor upon Doctor Who’s resurrection: the first two new episodes (“Rose,” “The End of the World”) are tied with the execrable Series 2 episode “Love & Monsters” for lowest AI Score. More recently, however, there are signs British audiences may be cooling to the show and, specifically, to the ascension of Chris Chibnall as Doctor Who showrunner. Setting aside the even-more-execrable Series 9 episode “Sleep No More,” the other six episodes with the next-lowest AI Scores date from his tenure: four from Series 12 and two from Series 11. Overall, 13 of the 24 episodes with the lowest AI Scores (54%) feature the 13th Doctor (14, if you include “Twice Upon a Time”); no episode in which Jodie Whittaker portrays The Doctor tops 83.

Meanwhile, four of the five episodes with the highest AI scores came as the 10th Doctor’s song was ending: the spectacular two-part Series 4 finale (“The Stolen Earth/Journey’s End”) and the equally brilliant two-part “Silence in the Library/Forest of the Dead.” The top nine is rounded out by four other “finale” episodes: “The Parting of the Ways” (9th Doctor’s regeneration), “Doomsday” (Rose Tyler [Billie Piper] gets trapped in a parallel universe), “The End of Time: Part Two” (10th Doctor’s regeneration) and “The Big Bang” (Series 5 finale), as well as the first episode of Series 7a, “Asylum of the Daleks.”

But while AI Scores are a fixed starting point, albeit solely with British audiences, the IMDB ratings (flaws and all) in Table 3 signal how attitudes toward Doctor Who episodes have evolved over time, after they have been watched and re-watched, shared with others, and discussed at length.

Table 3: Doctor Who episodes (2005-20) with highest/lowest IMDB ratings

Title Series-Episode Doctor IMDB Rating # User-Raters
Blink 3-10 10 9.8 17,343
Heaven Sent 9-11 12 9.6 8,935
Forest of the Dead 4-9 10 9.5 7,789
Silence in the Library 4-8 10 9.4 7,480
The Day of the Doctor 50th Anniv 10/11 9.4 16,566
Doomsday 2-13 10 9.3 7,291
Vincent and the Doctor 5-10 11 9.3 8,961
The Girl in the Fireplace 2-4 10 9.3 9,064
4 Episodes* 1,3,4,10 9 (1), 10 (2), 12 (1) 9.2 4,001-7,138
3 Episodes† 2,8,11 10 (1), 12 (1), 13 (1) 6.0 4,475-6,787
Can You Hear Me? 12-7 13 5.9 2,154
Sleep No More 9-9 12 5.8 4,185
Resolution 11-NYD 13 5.7 3,751
The Timeless Children 12-10 13 5.6 2,481
The Witchfinders 11-8 13 5.6 4,531
The Battle of Ranskoor Av Kolos 11-10 13 5.2 3,868
Praxeus 12-6 13 5.2 2,517
Arachnids in the UK 11-4 13 5.0 6,048
The Tsuranga Conundrum 11-5 13 4.9 5,582
Orphan 55 12-3 13 4.1 3,778

      * The Empty Child (1), The Family of Blood (3), Journey’s End (4), World Enough and Time (10)

        † Fear Her (2), In the Forest of the Night (8), The Ghost Monument (11)

Twenty-four resurrected Doctor Who episodes have an IMDB rating of 9.0 or higher, topped by “The Day of the Doctor,” “Silence/Forest,” the penultimate Series 9 episode “Heaven Sent” and, of course, “Blink.” The extremely high number of “Blink” raters supports the idea this is the episode most often used by Doctor Who fans to introduce the show to non-fans; if you are wondering, my wife Nell’s and my introduction was the remarkable “The Eleventh Hour” (88, 8.6), the first episode of Series 5. Somewhat less often used this way (ranked 3rd and 4th in raters) are the bittersweet episodes “The Girl in the Fireplace” (Series 2) and “Vincent and the Doctor” (Series 5). The heartbreaking “Doomsday” rounds out the top eight. My personal favorite episode, “A Good Man Goes to War” (Series 6), is in a seven-way tie for 13th with a 9.1 IMDB rating.

Bringing up the rear, by contrast, are 13 episodes with IMDB ratings ≤6.0, all but three from Series 11 and 12. In the previous version of this post, “Sleep No More” ranked lowest at 6.0; even though its IMDB rating dropped to 5.8, fully eight episodes are now ranked below it, including the Series 11 episode “The Tsuranga Conundrum” (4.9) and the wretched Series 12 episode “Orphan 55” (4.1).

There is clear overlap across these three rankings: “Doomsday,” “Silence/Forest,” “Stolen/Journey’s,” “The End of Time: Part Two,” “The Pandorica Opens/The Big Bang,” “A Good Man” and “Day” remain among the most admired and oft-rated episodes, while “Sleep No More” and “Love and Monsters” are still best forgotten. It is likely too soon to know if attitudes toward the two most recent Series will evolve. On the other hand, an episode like “Heaven Sent,” which was relatively poorly received when it first aired in November 2015 (AI score=80), is now the 2nd-highest rated episode on IMDB!

A correlation coefficient (r) measures how well two measures “agree” in a linear way. The value of r ranges between -1.00 and 1.00; if r is negative, then as one measure increases, the other decreases, and if r is positive, as one measure increases, the other measure increases. When r=0.00, there is no linear association between the two measures.

The correlation between AI score and IMDB rating is a very solid 0.61, while that between IMDB rating and number of raters is a solid 0.46. These associations are seen more clearly in Figures 1 and 2 below. The correlation between AI score and number of user-raters was a more modest, though still positive, 0.28 (data not shown).

Figure 1: AI Score vs. IMDB Rating, Doctor Who episodes, 2005-20 (n=165)


Figure 2: IMDB Rating vs. # Raters, Doctor Who episodes, 2005-20 (n=165)


Attitude evolution. Comparing each episode’s AI scores and IMDB ratings reveals which episodes have increased in appeal over time, and vice versa. To do this, I converted each value to its z-score (number of SD above/below average) to account for differing scales; every set of z-scores has average=0 and SD=1. For example, “A Good Man” has an IMDB rating of 9.1. Subtracting the average of 7.78 from 9.1, then dividing by the SD (about 1.1 after rounding), yields a z-score of roughly 1.25, meaning this episode is about 1.25 SD more highly regarded than average based upon its IMDB score.
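A small sketch of the z-score comparison follows. It uses the rounded means and SDs reported earlier (AI 84.3 and 2.9; IMDB 7.78 and 1.1), so its results differ slightly from those in the text, for example about 3.14 SD rather than 3.2 for “Heaven Sent.” The quadrant labels assume Figure 3 puts AI on the horizontal axis, consistent with the quadrant descriptions below; treating a z-score of exactly zero as “below average” is my own choice.

```python
AI_MEAN, AI_SD = 84.3, 2.9        # AI Score at first airing (rounded, from the text)
IMDB_MEAN, IMDB_SD = 7.78, 1.1    # current IMDB rating (rounded, from the text)


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations above (+) or below (-) the mean."""
    return (value - mean) / sd


def regard_shift(ai: float, imdb: float):
    """Return (change in z-score from AI Score to IMDB rating, quadrant label)."""
    z_ai = z_score(ai, AI_MEAN, AI_SD)
    z_imdb = z_score(imdb, IMDB_MEAN, IMDB_SD)
    if z_ai > 0 and z_imdb > 0:
        quadrant = "above average then and now"
    elif z_ai <= 0 and z_imdb <= 0:
        quadrant = "below average then and now"
    elif z_ai > 0:
        quadrant = "fell below average (lower right)"
    else:
        quadrant = "rose above average (upper left)"
    return z_imdb - z_ai, quadrant


for title, ai, imdb in [("Heaven Sent", 80, 9.6), ("The Lazarus Experiment", 86, 6.6)]:
    shift, quadrant = regard_shift(ai, imdb)
    print(f"{title}: {shift:+.2f} SD, {quadrant}")
```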

Figure 3: AI Score vs. IMDB Rating (z-scores), Doctor Who episodes, 2005-20 (n=165)


Two-thirds (66%) of these episodes remain either better regarded than average (both z-scores>0, n=55) or less well regarded than average (both z-scores<0, n=54). Once again, “Blink” and “Stolen/Journey’s” were, and remain, highly regarded, while “Love and Monsters” and “Orphan 55” continue to be episodes best to avoid.

Twenty-seven episodes (16%) went from above average to below average in public esteem (lower right quadrant of Figure 3), most notably the Series 3 episodes “Daleks in Manhattan” and “The Lazarus Experiment.” The latter declined 1.7 SD from a respectable AI score of 86 to a well-below-average IMDB rating of 6.6, while the former dropped 1.6 SD (87 to 7.0). The only other episodes to decline at least 1.5 SD while going from more- to less-well-regarded than average are “The Curse of the Black Spot,” “The Poison Sky” and “Planet of the Dead.” Other than “Curse,” an 11th Doctor episode, all of these feature the 10th Doctor, though nothing else obviously links them.

Finally, 29 episodes (18%) went from below average to above average in regard (upper left quadrant of Figure 3), most notably “Heaven Sent,” which has increased an astonishing 3.2 SD (80 to 9.6) since its November 2015 debut; this episode, the Groundhog Day of Doctor Who, rewards repeat viewing. The next highest increase in SD is 1.85 for “Listen” (82 to 9.0), one of the 12th Doctor’s earliest and most personal adventures. In fact, four of the five episodes to increase at least 1.5 SD to become better regarded than average, including “Hell Bent” and “The Doctor Falls,” feature the 12th Doctor. Perhaps his imminent departure from the series prompted this positive reevaluation; “The Girl in the Fireplace” rounds out the list.

Series. As seen in Table 1, there have actually been 13 resurrected Doctor Who Series, as Series 7 was split into two halves: one with companions Amy Pond (Karen Gillan) and Rory Williams (Arthur Darvill), and one with companion Clara Oswald (Jenna Coleman). While Series 6 featured a nearly three-month gap between the first seven and the final six episodes, I consider it a single Series because it features the same companions and a unifying story arc.

Further complicating the demarcation of individual Series are the 13 Christmas episodes, three 10th Doctor specials and the 50th anniversary special (Table 4). It is not clear into which, if any, Series these episodes should be placed. Christmas episodes were equally admired at initial airing (average AI score=84.1 vs 84.4 for all other episodes) and are slightly better-regarded now (average IMDB rating=7.99 vs. 7.76 for all other episodes). The four stand-alone Specials, however, were—and, excepting “Planet,” are—much better-regarded.

Table 4: AI Scores and IMDB Ratings, Doctor Who Christmas and Special Episodes (2005-17)

Title Year/Date Doctor AI Score IMDB Rating
Christmas Specials
The Christmas Invasion 2005 10 84 8.1
The Runaway Bride 2006 10 84 7.6
Voyage of the Damned 2007 10 85 7.6
The Next Doctor 2008 10 86 7.5
The End of Time: Part One 2009 10 87 8.2
A Christmas Carol 2010 11 83 8.6
The Doctor, The Widow and the Wardrobe 2011 11 84 7.2
The Snowmen 2012 11 87 8.4
The Time of the Doctor 2013 11 83 8.4
Last Christmas 2014 12 82 8.3
The Husbands of River Song 2015 12 82 8.5
The Return of Doctor Mysterio 2016 12 82 7.4
Twice Upon a Time 2017 12 81 8.1
 

10th Doctor Specials (after Series 4, excluding Christmas)
Planet of the Dead April 11, 2009 10 88 7.5
The Waters of Mars November 15, 2009 10 88 8.8
The End of Time: Part Two January 1, 2010 10 89 8.9

50th Anniversary Special
The Day of the Doctor November 23, 2013 War, 10, 11 88 9.4

For simplicity, then, I assessed individual Series using only the 148 episodes listed in Table 1.

Figure 4: Average AI Scores and IMDB Ratings, Doctor Who Series (2005-20)


Series 1 started slowly (Figure 4; AI Scores divided by 10 for apples-to-apples comparison), although four of the final five episodes rank among the most well-regarded now (“The Empty Child/The Doctor Dances,” “Bad Wolf/The Parting of the Ways,” average IMDB score=9.0).

While Series 2 is now slightly less well-regarded than Series 1, and average IMDB rating for Series 3 drops to 7.94 without “Blink,” Series generally became better-regarded through Series 4. This latter Series is the best-regarded of the revived Doctor Who, both when first aired (average AI score=88.1) and now (average IMDB rating=8.42). It started slowly: while “Partners in Crime” through “The Unicorn and the Wasp” (n=7) have a solid AI score average of 87.3, their average IMDB rating is only 7.73. Starting with the brilliant two-part “Silence/Forest,” however, the six episodes through “Journey’s End” have an astonishingly-high average AI score (89.0) and IMDB rating (9.20)! Outside of the three-episode sequence “The Name…” (88, 9.2), “The Day…” (88, 9.4) and “The Time of the Doctor” (83, 8.5), this is the pinnacle of the resurrected Doctor Who, rivaled only by the conclusion to Series 9.

Following the 10th Doctor’s regeneration, however, Series 5 and 6 dropped back to the more-than-respectable levels of Series 1-3. Series 6 had two distinct parts: the seven-episode sequence from “The Impossible Astronaut” through “A Good Man” has a solid average AI score (86.7) and IMDB rating (8.16), which drop to 85.7 and 7.95, respectively, for the final six episodes (“Let’s Kill Hitler” through “The Wedding of River Song”).

Starting in Series 7a, these measures diverge, with average AI score jumping to 87.2 and average IMDB rating dropping to 7.98; the Series started (“Asylum of the Daleks,” 89, 8.6) and ended (“The Angels Take Manhattan,” 88, 9.0) well, though it faltered in between (n=3, 86.3, 7.43). The advent of companion Clara Oswald in Series 7b appeared to spark a further decline in public esteem, which only deepened when she teamed with the 12th Doctor in Series 8 and 9, excepting the average IMDB rating of 8.90 for the three-part Series finale (“Face the Raven/Heaven Sent/Hell Bent”). Series 10, with the first openly lesbian companion (Bill Potts [Pearl Mackie]), then signaled a return to Series-8-level regard.

And then…the popularity of Doctor Who took a nosedive over cliffs as steep as those which dominate Broadchurch, which also starred Tennant and Whittaker.

To be fair, average AI Score did not decline nearly as much, perhaps because Britons wanted to give the first female Doctor a fair chance. Indeed, the first full Whittaker episode—“The Woman Who Fell to Earth”—was the top-rated program of the week, the first time that had happened since “Day” in November 2013. And that episode has an OK 6.9 IMDB rating to go with its respectable 83 AI Score. “Rosa,” which aired two episodes later and features American civil rights icon Rosa Parks, has similar scores of 83 and 7.0. Overall, the first seven episodes averaged 5th place in their respective weeks, a level rivaled only by the 2009-10 Tennant Christmas and standalone specials. Moreover, those seven episodes have been rated by an average of 6,548 IMDB users, rivaling the average 6,740 IMDB raters for the last six episodes of Series 4, which aired a full decade earlier.

For all that attention, however, those seven episodes have a mean IMDB rating of 6.06, which does not materially differ from the Series 11 average of 5.93 and is lower than the Series 12 average of 6.26; the latter series featured the only three other 13th Doctor episodes with IMDB ratings of 7.0 or higher: “Ascension of the Cybermen” (7.0), “The Haunting of Villa Diodati” (7.3) and “Fugitive of the Judoon” (7.7). And every one of these episodes still ranks below the overall average of 7.78. Plus, the 14 episodes which followed “Kerblam!” ranked an average 23rd in their respective weeks, following the historic pattern of a sharp ratings decline over the course of each Series.

Nine of these 21 episodes (43%), meanwhile, have IMDB ratings between 4.1 and 5.9. For context, here are 38 movies in the same range (full disclosure—I have seen each one multiple times, and I genuinely like some of them):

The Adventures of Rocky and Bullwinkle (2000)

Batman Forever (1995)

The Big Mouth (1967)

Bloodhounds of Broadway (1989)

Bright Lights, Big City (1988)

Casual Sex? (1988)

City Heat (1984)

Cookie (1989)

Delirious (1991)

Desperately Seeking Susan (1985)

Doctor Detroit (1983)

Dog Park (1998)

Earth Girls Are Easy (1989)

The Gun in Betty Lou’s Handbag (1992)

Hexed (1993)

The League of Extraordinary Gentlemen (2003)

Legal Eagles (1986)

Mannequin (1987)

The Marrying Man (1991)

Memoirs of an Invisible Man (1992)

The Meteor Man (1993)

Mixed Nuts (1994)

Mr. Saturday Night (1992)

Once Upon a Crime… (1992)

The Opposite Sex, and How to Live With Them (1993)

The Phantom (1996)

The Pick-Up Artist (1987)

Queens Logic (1991)

Random Hearts (1999)

The Spirit (2008)

Summer Lovers (1982)

Sunset (1988)

Tapeheads (1988)

Thank God It’s Friday (1978)

Wholly Moses! (1980)

Who’s Harry Crumb? (1989)

Wild Wild West (1999)

Young Doctors in Love (1982)

It is certainly possible that these 21 episodes, as was the case with the first Eccleston episodes, will be positively reevaluated in later years.

Figure 5: Average AI Scores and IMDB Ratings, Doctor Who Doctors (2005-17)


Doctors. Figure 5 displays average values for all 9th (n=13), 10th (n=47), 11th (n=44), 12th (n=40) and 13th (n=21) Doctor episodes; excluding Christmas episodes and Specials made no appreciable difference.

While websites like WatchMojo.com suggest David Tennant’s 10th Doctor is the best-regarded Doctor ever (rivaling Tom Baker’s 4th Doctor), this is not necessarily borne out by the data. The 10th and 11th Doctors have essentially identical average AI Scores—86.3 and 86.0, while the 12th and 9th Doctors are not that far behind at 82.7 and 82.2, respectively; even the 13th Doctor’s average AI Score of 80.7 is broadly respectable. Moreover, Tennant’s 8.12 average IMDB rating is not appreciably higher than Smith’s 8.04, Eccleston’s 8.01 and Capaldi’s 7.89—though all are considerably higher than the lowly 6.08 for Whittaker’s 21 episodes.

Conclusions. Overall, the resurrected Doctor Who has been enormously popular by all three primary metrics used above. Its 8.6 overall IMDB rating places it in the rarefied heights between Back to the Future and The Dark Knight. Still, the show did not find its footing until late in Series 1. The 10th and 11th Doctors are held in modestly higher regard than the 9th and 12th Doctors, even if the ends of Series 1 and 9 are very highly regarded now. The pinnacle of the revived series is the latter half of Series 4, although the most highly rated episode currently is “Blink” (Series 3), followed by “Heaven Sent” (Series 9) and the 50th-anniversary special “The Day of the Doctor.” “Blink” and “Day” also have received by far the most IMDB user ratings (>15,000 each). By contrast, it is best to avoid the Series 3 episode “Love and Monsters,” the Series 9 episode “Sleep No More” and many episodes in Series 11 and 12, though not “Fugitive of the Judoon” and “The Haunting of Villa Diodati.” While many 10th Doctor episodes have lost stature over time, a similar number of 12th Doctor episodes have done the opposite. Finally, there are extreme warning signs in the dramatic decline in ratings and public esteem following Chibnall’s ascension to showrunner and the arrival of the first female Doctor.

We shall see if this changes in Series 13 in 2021.

If you are interested, here is a PDF of the data compiled for these analyses.

Doctor Who Episode Data, 2005-20

Until next time…please stay safe, sane and healthy…

[1] The “classic” series aired from November 1963 to December 1989, with only one 1996 television movie—intended to be an American series pilot—before its triumphant return in 2005.

[2] As of March 28, 2020

Dispatches from Brookline: Home Schooling and Social Distancing V

On Wednesday, March 25, 2020, Massachusetts Governor Charlie Baker issued an executive order extending the closure of all public schools in the Commonwealth until at least May 4, 2020.

In four previous posts (I, II, III, IV), I described how my wife Nell, our two daughters—one in 4th grade and one in 6th grade—and I were already coping with social distancing and the closure of the public schools in Brookline, Massachusetts until at least April 7, 2020. Besides staying inside as much as possible, we converted our dining room into a functioning classroom complete with workbooks, flip charts and a very popular white board.

**********

To give our daughters something of a break during the week—especially our younger daughter, who has a yet-to-be-formally-diagnosed learning disability and attention deficit disorder—there is no “school” on Wednesday mornings. This means that when I came downstairs on March 25, 2020, Nell had not written a daily schedule on the flip chart. This likely saddened our younger daughter, who was apparently going to have free rein over what the afternoon classes would be called.

To be fair, the girls had done something broadly educational that morning. With Nell, they had watched and discussed two episodes of The Blue Planet.

And they are continuing to produce drawings at a solid clip.

Wall of art March 25

The framed painting in the middle is one of two I bought when I first moved back to the Boston area—Waltham, to be precise—from my native Philadelphia in early September 2005. I do not recall why I entered the Martin Lawrence Galleries on Newbury Street (which appears recently to have closed), but once inside I was quite taken with a collection of paintings by Liudmila Kondakova. Using funds from a recent inheritance, I bought this painting and a smaller one. Both depict Paris street scenes, and both have my last name written somewhere in them.

**********

The break from school work does not extend to the afternoons, so we convened just after 2:45 pm to discuss the history of the American presidential nominating system. My attached notes for this class were a bit more scattershot than usual, but they worked well enough to tell a series of what I hoped would be interesting stories.

March 25.docx

I noted in “Dispatch IV” our daughters’ penchant for assigning monikers to historical figures. Well, they came close to doing that when I came to the 1960 Democratic nomination process, and I explained one of the primary contenders that year was United States Senator (“Senator”) Hubert Humphrey of Minnesota.

“Who names a kid ‘Hubert’?” asked our older daughter. “Did his parents want him to get teased his whole life?”

After observing his middle name was Horatio—he was once erroneously referred to as Hubert Horatio Hornblower—I defended the late Vice President as a good and honorable man, though I never did get around to discussing his groundbreaking speech on civil rights at the 1948 Democratic National Convention.

We concluded with a rapid-fire discussion of how Democrats—proportionally, with a minimum of 15% statewide or in a Congressional district—and Republicans—mostly winner-take-all—differ in the way they apportion nominating convention delegates.
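To make the Democratic rule concrete, here is a minimal Python sketch. It is a simplification, not the party's official procedure: it assumes a single pool of delegates, drops any candidate below the 15% threshold, and splits the delegates among the remaining candidates in proportion to their share of the viable vote, using a largest-remainder rounding rule chosen for illustration (the actual Democratic rounding rules differ in their details). The function name and the sample vote shares are hypothetical.

```python
from math import floor


def allocate_delegates(vote_shares, delegates, threshold=0.15):
    """Split `delegates` proportionally among candidates at or above `threshold`.

    vote_shares maps candidate -> share of the vote (0 to 1). Rounding uses the
    largest-remainder method, an illustrative assumption rather than party rules.
    """
    # Only "viable" candidates (at or above the threshold) can win delegates.
    viable = {c: s for c, s in vote_shares.items() if s >= threshold}
    if not viable:
        return {}
    total = sum(viable.values())
    # Exact fractional entitlement based on each candidate's share of the viable vote.
    quotas = {c: delegates * s / total for c, s in viable.items()}
    result = {c: floor(q) for c, q in quotas.items()}
    # Hand any delegates left over after rounding down to the largest remainders.
    leftover = delegates - sum(result.values())
    by_remainder = sorted(viable, key=lambda c: quotas[c] - result[c], reverse=True)
    for c in by_remainder[:leftover]:
        result[c] += 1
    return result


shares = {"A": 0.45, "B": 0.35, "C": 0.12, "D": 0.08}  # hypothetical results
print(allocate_delegates(shares, 10))
```

With these made-up shares, candidates C and D fall below 15% and receive nothing, while A and B split all 10 delegates 6 to 4. Under a winner-take-all rule, A would instead receive all 10.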

This was followed by easily the most cringeworthy moment I have thus far endured as a parent.

I had been talking about the role “expectations” play in the modern primary and caucus system. One example I used was the way then-Arkansas Governor Bill Clinton used a second-place finish in the 1992 New Hampshire Democratic Primary to label himself “The Comeback Kid.”

They had been vaguely aware of Clinton’s marital indiscretions, and they understood he had been impeached for lying under oath about cheating on his wife while he was president of the United States. What they did not know, though, were the sordid details.

And they very much wanted to know what those details were; in exchange, they essentially promised to listen to the rest of my spiel.

So…after pouring myself a fresh cup of hot black coffee, half-decaffeinated to brace myself…I told them.

I did not use the words “blow job” or “fellatio,” but I described how a government shutdown in 1995 had allowed Clinton to spend time alone in the Oval Office with a young White House intern named Monica Lewinsky. And how one time she had worn a blue dress. And how she kept that dress after it came to have Clinton’s semen on it after a certain action I described…

…at which our older daughter interjected, “Oooo, gross! He peed through that! Why would anyone ever want to do that?!?”—or words broadly to that effect. Our younger daughter, meanwhile, just sat quietly, listening.

They particularly wanted to know why Ms. Lewinsky had kept that dress.

“Well, Clinton kept lying about what they had done. So she kept it as proof.”

And that was that.

Oy.

**********

At just after 4:45 pm, we reconvened for what I had thought would be the most fun part of the afternoon.

I wanted to talk about random sampling—the idea that you could get, for example, a fairly accurate impression of the distribution of attitudes in a very large population by randomly selecting a much smaller subset of that population. However, I should have known things would go awry when I used this example: a group of one million people includes 750,000 (75%) who prefer chocolate ice cream and 250,000 (25%) who prefer vanilla ice cream. Rather than ask every one of those people which flavor they prefer, one could simply randomly select 1,000 of them to ask. Most of the time, if you sample properly, you will come within a few percentage points either way of 75% and 25%.
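The chocolate-versus-vanilla example can be made concrete with a short simulation. This minimal Python sketch (an illustration, not part of the lesson itself) draws many random samples of 1,000 from a population that is 75% chocolate and checks them against the textbook 95% margin of error, 1.96 × sqrt(p(1 − p)/n), which here works out to about 2.7 percentage points. The seed and the number of trials are arbitrary choices, and because 1,000 is tiny relative to one million, each respondent is treated as an independent draw, a standard approximation.

```python
import math
import random


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def simulate_polls(p=0.75, n=1000, trials=2000, seed=1):
    """Return the chocolate share observed in each of `trials` random samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Each sampled person prefers chocolate with probability p.
        chocolate = sum(rng.random() < p for _ in range(n))
        results.append(chocolate / n)
    return results


moe = margin_of_error(0.75, 1000)
shares = simulate_polls()
within = sum(abs(s - 0.75) <= moe for s in shares) / len(shares)
print(f"margin of error: +/-{moe * 100:.1f} points")
print(f"share of samples within that margin: {within:.0%}")
```

Roughly 95% of the simulated samples should land within that margin of the true 75%, which is what "a few percentage points either way" means in practice.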

Well, our younger daughter simply wasn’t having it.

“What if someone doesn’t like either?” she began.

I explained this was merely an example, but that did not work.

“What if you like some other flavor?”

“It is a forced choice,” I weakly noted.

At this point, her sister chimed in.

“Well, which one do you prefer?”

This led to a long pause which ended in a non-answer.

At this point, I simply began talking about the activity we were about to do, one that involved 100 carefully selected cards from a UNO deck.

What I wanted to do was illustrate how multiple random samples drawn from the same population will center around the “true” values within that population. My original conception was to put something like 60 blue and 40 red small objects of the same kind into a hat—Nell’s grandfather’s top hat lives in my home office—and have them draw 15 of those objects from that hat 10 times. We would record those draws to see how close they came to 60% blue and 40% red in the aggregate.

Of course, we did not have quite the objects I was envisioning, and I did not really want to cut up small bits of blue- and red-colored paper. That was when I remembered our bedraggled deck of UNO cards. There were enough cards remaining for me to compile a deck of 100 cards:

  • 50 blue and green cards, with the former “definitely voting Democratic” and the latter “leaning toward voting Democratic”
  • 43 red and yellow cards, with the former “definitely voting Republican” and the latter “leaning toward voting Republican”
  • 7 wild cards, for undecided voters
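As a point of comparison, here is a minimal Python sketch of what a thoroughly shuffled deck should produce. It assumes each draw of 15 cards was dealt from the full 100-card deck. The post does not give the split between blue and green cards, so they are lumped together as “Democratic,” and the 22 red cards are inferred from the 43 red and yellow cards less the 21 yellow ones mentioned later in the post. The seed is arbitrary.

```python
import random
from collections import Counter

# Deck composition from the post. 22 red is inferred (43 red/yellow minus 21 yellow);
# blue and green are combined because their individual counts aren't given.
DECK = ["Democratic"] * 50 + ["red"] * 22 + ["yellow"] * 21 + ["wild"] * 7


def draw_samples(deck, hand_size=15, rounds=15, seed=None):
    """Reshuffle the full deck before each round and deal `hand_size` cards."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(rounds):
        # random.sample draws without replacement, like dealing from a well-shuffled deck.
        totals.update(rng.sample(deck, hand_size))
    return totals


def expected_counts(deck, hand_size=15, rounds=15):
    """Expected totals over all rounds: each card type's share times cards dealt."""
    size = len(deck)
    return {kind: n * hand_size * rounds / size for kind, n in Counter(deck).items()}


print(expected_counts(DECK))
print(draw_samples(DECK, seed=2020))
```

Across 15 deals of 15 cards (225 cards in all), a fair shuffle should average about 112.5 Democratic, 49.5 red, 47.25 yellow and 15.75 wild cards; any single run will scatter around those figures, which is the point the exercise was meant to show.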

What I had not counted on was just how hard it is to shuffle—and I mean really, properly, thoroughly shuffle—a deck of 100 cards. Thus, what I thought would be a fun exercise where the girls alternated which one drew 15 cards and which one tallied the colors on the white board quickly devolved into a “why is this taking so long?” battle of long stretches of card shuffling, slow drawing and slower tallying.

Perhaps I was still reacting to the news we would be home schooling five weeks longer than we had anticipated. Perhaps I was overtired—this is more exhausting than I had expected. Or perhaps I was mad at myself for choosing an overly thick deck of cards I could not properly “randomize.”

Whatever the reason, I snapped multiple times at both daughters, making the older one huffy and the younger one teary. I apologized—again; Nell, who taught elementary school for more than a decade, gently pointed out this is why you do not teach children “at 5:30…they are toast.”

For all the drama, however, we managed to draw 15 sets of cards. As you see, the results were not what I had anticipated. The 21 yellow cards kept making a disproportionate appearance.

Sampling results March 25

Here is a graphical representation of the results. Had I not counted the cards very carefully, I would almost think I simply had the “true” totals reversed; more likely, it is simply very difficult to shuffle a double deck of cards properly…and randomness does not guarantee anything.

Biased sampling March 25

Even teachers have things to learn from their own lessons.

Until next time…please stay safe and healthy…

Dispatches from Brookline: Home Schooling and Social Distancing IV

On Monday, March 23, 2020, Massachusetts Governor Charlie Baker called for the closure of all non-essential businesses and asked residents to stay in their homes as much as possible: to “shelter in place.” The order went into effect at noon on Tuesday, March 24, and it will stay in effect until noon on April 7.

In three previous posts (I, II, III), I described how my wife Nell, our two daughters—one in 4th grade and one in 6th grade—and I were already coping with social distancing and the closure of the public schools in Brookline, Massachusetts until at least April 7, 2020. Besides staying inside as much as possible, we converted our dining room into a functioning classroom complete with workbooks, flip charts and a very popular white board.

**********

After a successful, albeit exhausting, first week of home schooling, we laid low over the weekend.

The highlight of Saturday stemmed from an idea our older daughter had: she desperately wanted a burrito, which she would happily eat at every meal. Choosing not to walk down the street to our preferred takeout joint, we explored delivery options instead…and discovered that our favorite Mexican restaurant—a drive of at least 20 minutes away in Cambridge—would deliver to us. It felt like such a ridiculous treat, and the food was so good, I did not mind they had given us soft, not crunchy, tacos. While I ate my food and worked on my “lectures,” Nell and the girls watched Onward, which emotionally wrecked my wife.

Later that night, I walked our golden retriever up to our local dog park—and I mean “up”; Brookline is renowned for its many streets that slope upward at nearly a 45-degree angle. To be honest, I needed the outing and the exercise more than she did. We stayed about 15 minutes, as she ecstatically chased an increasingly filthy fuzzy ball hurled by a Chuck-It. Returning home, I put her to bed, bathed and settled down to watch the excellent I Wake Up Screaming via Turner Classic Movies OnDemand.

The choice of film, other than its sudden availability, was in keeping with my discussion of film noir with the girls the previous day, during which I used “oneiric” to describe the dream-like quality of many films noir. This spurred a conversation about how we all are having intense, more-anxiety-than-nightmare dreams during our “lockdown.”

Also in keeping with Friday’s “lecture,” our younger daughter and I watched Stranger on the Third Floor on Sunday evening. She very much enjoyed it, patiently allowing me to pause the movie at times to explain the difference between “high-key” and “low-key” lighting.

As to why we watched this particular film, here is an excerpt from Chapter 6 of the book I am writing—and need to finish soon:

Another myth to be exploded was film noir’s origin story. In the traditional telling, first outlined in Schrader’s essay, waves of mostly German émigré filmmakers arrive in Hollywood throughout the 1930s, bringing with them the cinematic techniques of Expressionism and, later, French poetic realism. Vincent Brook, as we saw in the Introduction to Part 1, argues these filmmakers were often deeply and specifically influenced by their Jewish heritage, a primary reason they abandoned Europe, however temporarily, in the first place. Meanwhile, starting in 1931, Universal Studios—aided by German cinematographer Karl Freund, who had arrived in Hollywood two years earlier—makes a series of dark, shadowy horror films (about which more in Chapter 8). That same year, rival studios like Warner-First National, later Warner Brothers, start to produce high-quality gangster films, inspired by the lawlessness of Prohibition, ironically set to be repealed just two years later. Needing to find work for this influx of cinematic talent, studio heads take a long second look at works of hard-boiled crime fiction, ultimately relegating their new talent to the B-movie backlots to turn those works into films. Applying everything they know about filmmaking, and drawing upon the visual style of the popular horror films and the rapid-fire plots of the gangster films, they make films that would later be labeled film noir. The quality of these films is only enhanced throughout the 1940s by a slow loosening of the restrictive Hays Code of “voluntary” censorship, Italian neo-realism and technological advances. And the first of these films is almost certainly a 64-minute-long B-movie directed by an Eastern European émigré named Boris Ingster—and featuring an Eastern European actor named Peter Lorre—called Stranger on the Third Floor. Released on August 16, 1940, it has 33.0 POINTS, tying it for 71st overall—and, if forced to choose, it is what I designate the first film noir as commonly understood today.

For an explanation of POINTS, please see here.

**********

On Monday, March 23, 2020, I came downstairs to find this in the “classroom.”

March 23

The night before, Nell had drawn this homage to author Mo Willems—whom we once met in Maine—on the ever-popular white board.

Happy Monday Gerald and Piggy

Our younger daughter had again had a very rough morning—literally getting no work done even as our older daughter continued to thrive; indeed, on Tuesday, the latter would finish her work by 11:30 am and then ask “Is that it?!?” Still, the younger daughter recovered sufficiently to sit attentively through the first hour of “Pop school,” during which we discussed the history and composition of American political parties.

March 23

For…reasons…our daughters have assigned nicknames to some of our early national leaders. Alexander Hamilton is “Hottie” Hamilton, while his rival Thomas Jefferson is “Smoking Hot” Jefferson. Our seventh president is now, unfortunately, “A**hole Jackson.” Our older daughter thought the name “Martin Van Buren” sounded “nice,” but she did not assign him a nickname.

We used two handouts to explore two ways to understand contemporary political parties:

  1. Elected officials and voters who share a common philosophy of government and policy preferences
  2. Coalitions of groups based on such factors as demographics, socioeconomic status, religiosity and cultural outlook

The first sheet condensed an analysis I performed in August 2017 of issues on which a majority of Democrats—and often Independents—differed from a majority of Republicans. Our older daughter, fully in the throes of puberty and naively exploring her own sexuality, was particularly interested in partisan stances on LGBTQ+ rights.

Issue Differences Democrat v Republican

Whatever makes you happy, kid.

The second sheet, however, provoked the most interest. Less so from our fading younger daughter, but definitely from the older daughter, who delighted in reading aloud for Daddy to note on the white board which groups had voted, on average over the previous four presidential elections, at least 55% for the Democratic nominee or the Republican nominee. The data come from CNN exit polls conducted in 2004, 2008, 2012 and 2016.

How Groups Voted for President 2004-16.docx

You can see how that ended, complete with the tissues I use in lieu of a proper eraser:

Group voting for president

Following a break of an hour or so, we reconvened to begin to learn about probability. This meant we each flipped a penny 30 times; by a neat fluke, in total, we had 45 heads and 45 tails—there was an a priori chance of roughly 8.4% this would happen. Then we rolled a die 30 times—the totals diverged sharply from 1/6 for each number; the number two noticeably received very little love. Our younger daughter asked to record my rolls on the white board, and, regretfully, I grew testy with her when she did not write numbers evenly on the row. I apologized immediately; clearly sheltering in place takes its toll on everyone at some point.
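The a priori probability of an exact 45-45 split can be checked directly with the binomial formula, P = C(90, 45) / 2^90, assuming, as the totals imply, that three of us each flipped a fair penny 30 times (90 flips in all). A minimal Python check:

```python
from math import comb


def prob_exact_heads(flips: int, heads: int) -> float:
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    # Number of ways to place the heads, divided by all 2**flips equally likely sequences.
    return comb(flips, heads) / 2 ** flips


# Three people x 30 flips each = 90 flips (inferred from the 45/45 total).
p = prob_exact_heads(90, 45)
print(f"P(45 heads in 90 flips) = {p:.2%}")
```

The result is just under 8.4%: an even split is the single most likely outcome, yet it still happens less than one time in eleven.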

**********

Knowing the Commonwealth would be shuttering its doors the following day even more than it already had, I was tasked with making a run to our local Star Market. I chose to drive to one ten minutes away on Commonwealth Avenue, a stone’s throw from the main campus of Boston University; not surprisingly, we call it “the BU Star.” It normally closes at midnight, and with the campus all-but-deserted I thought this would be a relatively sane place to search for the 27 items listed in a text message from Nell on my iPhone, mostly varieties of fresh fruit and vegetables.

I never got the chance to determine its sanity, however. When I drove by its lower rear entrance, I could see the vast parking lot to my left was practically empty. Nonetheless, I parked and walked across the street to the locked sliding glass doors. A series of notices taped to those doors informed me this Star now closes at 8 pm every night.

Rather than turn around and drive home, though, I realized I was enjoying being out of the apartment and decided to drive over the nearby Charles River into Cambridge, through Harvard Square—eerily quiet—and north on Massachusetts Avenue to Porter Square. Like the BU Star, the Porter Square Star Market used to be open 24 hours a day; it was my primary grocery store when I lived one block away in Somerville between September 1989 and February 2001. Driving to this Star always feels a bit like traveling back in time, with many landmarks remaining from two or three decades ago.

This Star now closes at 8 pm as well, meanwhile, which did not really surprise me. The silver lining is that a CVS sits in the same Porter Square parking lot; it is mandated by law never to close so that it can dispense emergency medications at any time of the day. When Nell nearly “broke her face” falling into a gate latch four years ago this May, this is where I acquired her pain medications after she was released from the hospital at around 1 am.

The older, deeply freckled, red-haired manager of the CVS wore a blue face mask and darker-blue gloves. There was a strip of duct tape on the carpeting every six feet reminding patrons to observe social distancing. I collected what foodstuffs from the list I could find—including fresh-looking cut strawberries in clear plastic containers—and went to a register to pay. The manager scanned and helped bag my groceries (using the reusable bags I always keep in my car) as we chatted amiably.

As I thanked him for being there, he pointed out a woman I had noticed earlier—heavy-set, a bit unkempt and of indeterminate age—hunched over a wheelchair loaded with items she was pushing slowly around the store.

“I have to worry about thieves,” he said.

“Really? Her?” I responded, or words to that effect.

“Last week she managed to get all the razors…This never happens when George is in charge.”

He may not have been that upset, though, as he cheerfully handed me four dollar bills and some change—“You could have bought one more thing!”—before gently warning me not to forget my iPhone.

My route back to Brookline took me past the 7-Eleven on Market Street in the Boston neighborhood of Brighton, which was also still open. They had respectable-looking bananas, limes, lemons and small red and green apples, so I purchased a handful of each along with a few other items. Returning home five or so minutes later, I thoroughly washed my hands before putting away the four total bags of groceries.

A few hours later, as I was preparing a steaming-hot bath, Nell—who had gone to sleep hours earlier but now was restlessly tossing and turning—informed me she had put her wakefulness to good use by placing an Amazon Fresh order on her iPhone. She added that rather than give the recommended $10 tip, she chose to give $25 instead.

“Was that right?” she asked me as I soaked sleepily.

“Of course it was,” I assured her.

When Nell placed the order, meanwhile, she thought it would arrive Tuesday night at 6 pm, only to realize later that morning it would not arrive until Thursday.

C’est la vie.

**********

The next afternoon, Tuesday, March 24, 2020, I came downstairs to find this in the “classroom.” Apparently there was no “word of the day.”

March 24

“FATHER COLLEGIO” did not start until 2:52 pm, as I was moving slowly this day. Once we assembled, though, after a BRIEF review of political parties, I began to tell the story of the 2000 presidential election by way of introducing American presidential elections generally and the Electoral College specifically. And our younger daughter was riveted.

March 24

The night before, Nell and I had discussed whether our younger daughter should start taking her Ritalin on weekday mornings again. She had not taken any since two Thursdays earlier, her last day at her elementary school before it temporarily closed due to COVID-19; we had stopped in part because we thought it was why she had been having a hard time falling asleep at night.

But despite refusing to take any of “her medicine” that morning, she was fully attentive and engaged as I described watching CNN continually reverse itself on who had won Florida that November night in 2000. Her attention did not wane as I walked through the history and defenses of the Electoral College, breaking more than 200 years of elections into a handful of epochs. We concluded with a discussion of how few states actually appeared to be in play as the 2016 presidential election approached—mooting the argument that repealing the Electoral College would limit campaigning only to the most populous areas. At this point, our older daughter turned to her and said, “You probably don’t even remember that election. You were only [pause for arithmetic] six.”

I reminded them how both had cried the following morning upon learning that Hillary Clinton had not, in fact, been elected the first female president.

Breaking at 3:45 exactly, we reconvened one hour later to do two things as our “applied math” lesson:

Discuss how exactly Clinton lost the Electoral College in 2016 while winning the national popular vote

How Hillary Clinton Lost in 2016.docx

This is where our older daughter perked up again. While both daughters read from the one-page sheet, it was the older daughter who said “Wow!” every time I described how the Republican percentage of the non-urban vote in the pivotal states of Michigan, Pennsylvania and Wisconsin had skyrocketed between 2012 and 2016. And when we were finished, this is what the white board looked like.

Discussing 2016 election

Incidentally, you may find the answer to the question posed in the upper right-hand corner of the white board here.

I also used my wall maps of the 1988 and 1992 presidential elections to help to illustrate why the notion of a Democratic “blue wall” was absurd—voting patterns can clearly change dramatically from one election to the next. Those wall maps, by the way, are covering up an original painting by my maternal first cousin once removed; yes, that really is what my great-aunt and uncle named her. 

Color in a blank map to show current state partisanship

A few years ago, I developed 3W-RDM to assess how much more or less Democratic a state is—at least at the presidential level—than the nation as a whole.

States Ordered from Most to Least Democratic

Using the attached list of states and the District of Columbia, we each colored in our blank map as follows:

  • Dark blue = ≥10 percentage points (“points”) more Democratic
  • Light blue = 3-10 points more Democratic
  • Purple = between 3 points more Republican and 3 points more Democratic
  • Light red/pink = 3-10 points more Republican
  • Dark red = ≥10 points more Republican
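The coloring rule above amounts to a simple threshold function. Here is a minimal Python sketch: it takes a state's margin in points relative to the nation (positive for more Democratic, negative for more Republican) and returns a color. How to treat a state exactly 3 or 10 points from the national figure is an assumption, since the listed ranges share their endpoints, and the sample margins are hypothetical, not actual 3W-RDM values.

```python
def partisan_color(margin: float) -> str:
    """Map a relative Democratic margin (in points) to the map's color scheme.

    Positive = more Democratic than the nation; negative = more Republican.
    Exact boundary handling at 3 and 10 points is an assumption (symmetric).
    """
    if margin >= 10:
        return "dark blue"
    if margin >= 3:
        return "light blue"
    if margin > -3:
        return "purple"
    if margin > -10:
        return "light red/pink"
    return "dark red"


# Hypothetical margins, for illustration only.
for state, margin in {"State A": 14.2, "State B": 4.0, "State C": -1.5, "State D": -12.8}.items():
    print(state, partisan_color(margin))
```

Writing the rule this way makes it symmetric: a state 3 points more Republican gets the same shade of red as a state 3 points more Democratic gets of blue.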

Given how much both our daughters love to draw—they doodle and do other art projects as they sit and listen to me talk—this was easily their favorite afternoon activity so far. Even as our younger daughter was trying to keep up with which states were which—she got there soon enough—our older daughter was touting her “perfectionism” in carefully coloring in each state. She even gently chided me for my blunt-instrument approach to filling in “all those islands off of Alaska,” which she delicately colored one by one.

Hand drawn Democratic strength map

This is what my final map looked like. I may not be as good at drawing as my cousin, or even my wife and daughters, but I still think I produced a solid work of art, despite the single sweep of dark red across the Aleutian Islands.

Until next time…please stay safe and healthy…

Dispatches from Brookline: Home Schooling and Social Distancing III

In two previous posts (I, II), I described how my wife Nell, our two daughters and I were coping with social distancing and the closure of the public schools in Brookline, Massachusetts until at least April 3, 2020. Other than staying inside as much as possible, we converted our dining room into a functioning classroom complete with workbooks, flip charts and a very popular white board.

**********

On Friday, March 20, 2020, I came downstairs to find this in the “classroom.”

March 20

And this homage to the brilliant Netflix series Stranger Things by our 4th-grade daughter was on the always-popular white board; apparently she still retains the obsessive love of the show I instilled in her, an obsession I discussed last December.

Upside Down Nora

While that same daughter had something of a rough morning, our 6th-grade daughter had a terrific morning; the latter girl is genuinely enjoying her workbooks and other projects. Providing ample time for each daughter to exercise and/or FaceTime friends helps immensely as well. That said, it was our younger daughter who, in the evening, asked if we could have “school” again tomorrow (Saturday). I am certainly happy to oblige—I have a review “quiz game” I have been thinking about putting together—but Nell and I suspect her outlook will be different in the morning. Still, to the extent these “classes” are about imposing structure and routine in the era of social distancing, maybe we should do some form of group learning activity every day, including weekends.

As I noted in the first “dispatch,” I planned to teach basic politics/government for an hour and basic applied math for an hour every weekday afternoon—except Friday. To break up the monotony, I will teach a hopefully-more-entertaining form of history on Fridays.

It is no secret I am a massive film noir fan. In October 2018, I had the opportunity to teach a course titled “What Is Film Noir” through Brookline Adult and Community Education. I only had six students, and I had a series of technical glitches trying to show movie clips from my own DVDs on Nell’s ancient laptop, but I nonetheless immensely enjoyed those four Wednesday nights.

Our daughters have actually watched a handful of classic films noir: both girls have seen The Maltese Falcon; Murder, My Sweet; Laura; The Naked City; Strangers on a Train and Rear Window; as well as long chunks of Out of the Past. Our older daughter has also seen Double Indemnity. They each spent some time at the first-ever NOIR CITY Boston in June 2018, watching the aforementioned Murder, My Sweet and helping their father sell Film Noir Foundation merchandise; this was my “reward” for having helped to set up the festival. And they have certainly heard their father talk at great length about the subject.

It thus made perfect sense when it was time for “Daddy Preparatory” yesterday afternoon for me to set up my desktop computer in the “classroom” and open the PowerPoint slides from my first class. While I basically jumped ahead to slide 22 (of 130), in which I begin to tell the history of film noir as an idea, we did linger briefly on two photographs I had used to help to establish my bona fides to teach this class in 2018.

What is Film Noir

The first one I took in July 2017. It shows part of the “film noir” section at the now-defunct Island Video Rentals on Martha’s Vineyard.


The second photograph was of yours truly attending NOIR CITY 16 in San Francisco, California the following winter.


The two-part class went extremely well, with both girls asking insightful questions for the most part; our younger daughter did try to invoke Stranger Things once or twice, along with other more recent bits of pop culture. In the first hour, we focused on how “film noir” was a label first imposed after the fact on a particular set of American crime films, starting with two French film critics in 1946. After a 30-minute break, I told them two different, albeit broadly overlapping, “origin stories” for film noir:

  1. Traditional story: it was an inevitable organic artistic movement
  2. What in my opinion is a more accurate modification: it emerged from economic and creative necessity with the rise of B-movies in the 1930s

To be fair, by the middle of the second origin story, the usual doodling-based fidgeting had become sitting on the floor playing with our golden retriever, so I wrapped up quickly.

And with that we ended—possibly—classes for week one of our necessary experiment in home schooling.

**********

In my previous post, I briefly discussed some thoughts I had about the efficacy of using a designated test to determine whether a person has a condition such as the novel coronavirus. Specifically, I introduced the concepts of sensitivity (the percentage of persons who have the condition who test positive for it) and specificity (the percentage of persons who do not have the condition who test negative for it). And, given how hard it is to have 100% sensitivity and 100% specificity, I asserted epidemiologists generally prefer to have higher specificity (i.e., fewer false positives), which is achieved by tightening the criteria used to identify the condition. This preference stems from the relative rarity of most conditions epidemiologists study, which results in having many more false positives than false negatives.

Being the sort of person who does these sorts of things, though, I decided to use Microsoft Excel to test this idea. I set up a series of 2×2 tables such as the following in which I varied four values: sensitivity, specificity, prevalence (a proxy for whether everyone is tested, or only those persons deemed likeliest to have the condition) and the total number of tests performed.

                     Truth: Positive   Truth: Negative   Total
Observed: Positive           142,500            42,500         185,000
Observed: Negative             7,500           807,500         815,000
Total                        150,000           850,000       1,000,000

Sensitivity 95%
Specificity 95%
Prevalence 15%
# Tested 1,000,000
Ratio FP/FN 5.7
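Each cell above follows from the four inputs. The Python sketch below reproduces that arithmetic as expected counts; it is an illustrative reconstruction, not the workbook's actual formulas, and `two_by_two` is a hypothetical helper name.

```python
def two_by_two(sensitivity, specificity, prevalence, n_tested):
    """Return expected counts for a 2x2 diagnostic-test table.

    Inputs mirror the four values varied in the spreadsheet; the results
    are expected counts, not simulated test outcomes.
    """
    truly_positive = prevalence * n_tested        # people who have the condition
    truly_negative = n_tested - truly_positive    # people who do not
    tp = sensitivity * truly_positive             # cases correctly flagged
    fn = truly_positive - tp                      # cases missed
    tn = specificity * truly_negative             # non-cases correctly cleared
    fp = truly_negative - tn                      # non-cases wrongly flagged
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "FP/FN": fp / fn}


table = two_by_two(0.95, 0.95, 0.15, 1_000_000)
for label, value in table.items():
    print(f"{label}: {value:,.1f}")
```

Calling it with other inputs, for example `two_by_two(0.90, 0.95, 0.15, 1_000_000)`, gives the corresponding row of Table 1.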

What I was primarily interested in, beyond the raw number of false positives (FP) and negatives (FN), was the ratio of the former to the latter. Table 1 summarizes the results; the number of tests administered did not alter these ratios given the same set of sensitivity, specificity and prevalence values, so I omitted it from the table.

Table 1: Ratio of False Positives to False Negatives Using Different Combinations of Sensitivity, Specificity and Prevalence, Based on 1,000,000 Tests

Prevalence Sensitivity Specificity FP/FN #FP #FN
15% 95% 95% 5.7 42,500 7,500
15% 90% 95% 2.8 42,500 15,000
15% 95% 90% 11.3 85,000 7,500
15% 80% 95% 1.4 42,500 30,000
15% 95% 80% 22.7 170,000 7,500
33.3% 95% 95% 2.0 33,350 16,650
33.3% 90% 95% 1.0 33,350 33,300
33.3% 95% 90% 4.0 66,700 16,650
33.3% 80% 95% 0.5 33,350 66,600
33.3% 95% 80% 8.0 133,400 16,650
50% 95% 95% 1.0 25,000 25,000
50% 90% 95% 0.5 25,000 50,000
50% 95% 90% 2.0 50,000 25,000
50% 80% 95% 0.25 25,000 100,000
50% 95% 80% 4.0 100,000 25,000
85% 95% 95% 0.18 7,500 42,500
85% 90% 95% 0.09 7,500 85,000
85% 95% 90% 0.35 15,000 42,500
85% 80% 95% 0.04 7,500 170,000
85% 95% 80% 0.71 30,000 42,500

I did not test values below 80%, on the assumption that a test with sensitivity and/or specificity below 80% should not be used. Also, for any given prevalence, the FP/FN ratio is the same whenever sensitivity equals specificity, albeit with different raw values.
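The second observation follows directly from the definitions. Writing p for prevalence, N for the number tested, and Se and Sp for sensitivity and specificity (notation introduced here, not taken from the workbook):

```latex
\begin{aligned}
\text{FP} &= (1-\text{Sp})\,(1-p)\,N, \qquad \text{FN} = (1-\text{Se})\,p\,N,\\[4pt]
\frac{\text{FP}}{\text{FN}} &= \frac{(1-\text{Sp})\,(1-p)}{(1-\text{Se})\,p},\\[4pt]
\text{Se} = \text{Sp} \;\Rightarrow\; \frac{\text{FP}}{\text{FN}} &= \frac{1-p}{p}.
\end{aligned}
```

Because N cancels, the number of tests never changes the ratio; and at p = 0.15, (1 − 0.15)/0.15 ≈ 5.7, matching the 95%/95% row.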

Here are the primary conclusions from Table 1:

  • The lower the prevalence—or, in the case of COVID-19, the less you restrict testing only to those deemed likeliest to have it—the higher the likelihood you will have many more false positives than false negatives, irrespective of sensitivity and specificity
  • Within a given prevalence level, FP/FN is
    • Lowest when specificity > sensitivity
    • Highest when sensitivity > specificity
    • In the “middle” when sensitivity = specificity
  • The total number of “false” values (FP + FN) is
    • Lowest when both sensitivity and specificity are equal and close to 100%
    • Highest when sensitivity >> specificity, at least when prevalence is below 50% (at higher prevalence, the pattern reverses)
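To check the table and the conclusions above, the following Python sketch recomputes all 20 scenarios from the same expected-count formulas. It is a reconstruction, not the workbook; it assumes 1,000,000 tests and a middle prevalence of 33.3%, which is what the published counts (16,650 and 33,350) imply.

```python
from itertools import product

# Prevalence values used in Table 1 (33.3% is implied by the published counts)
PREVALENCES = [0.15, 0.333, 0.50, 0.85]
# (sensitivity, specificity) pairs, in the table's row order
PAIRS = [(0.95, 0.95), (0.90, 0.95), (0.95, 0.90), (0.80, 0.95), (0.95, 0.80)]
N_TESTED = 1_000_000


def errors(sens, spec, prev, n=N_TESTED):
    """Expected false positives and false negatives for one scenario."""
    fp = (1 - spec) * (1 - prev) * n   # non-cases wrongly flagged
    fn = (1 - sens) * prev * n         # cases missed
    return fp, fn


def sweep():
    """One row per scenario: prev, sens, spec, FP, FN, FP/FN, FP+FN."""
    rows = []
    for prev, (sens, spec) in product(PREVALENCES, PAIRS):
        fp, fn = errors(sens, spec, prev)
        rows.append((prev, sens, spec, fp, fn, fp / fn, fp + fn))
    return rows


if __name__ == "__main__":
    for prev, sens, spec, fp, fn, ratio, total in sweep():
        print(f"{prev:6.1%} {sens:4.0%} {spec:4.0%} FP/FN={ratio:6.2f} "
              f"FP={fp:>9,.0f} FN={fn:>9,.0f} total={total:>9,.0f}")
```

Comparing the last column within each prevalence block shows the worst case flips with prevalence: at 15% the 95%/80% test produces the most errors (177,500), while at 85% the 80%/95% test does (also 177,500).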

I saw a report on Twitter that 33% of persons testing positive were false positives. Based on these 20 scenarios, and computing false positives as a share of all positive results rather than relative to false negatives, that would seem to indicate a situation where a fairly wide swath of the population is being tested (prevalence=15%), both sensitivity and specificity are at least 90%, and sensitivity > specificity. That percentage, which is not THAT meaningful, to be honest, would decrease if specificity were equal to or higher than sensitivity.
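The Twitter figure is a different quantity from FP/FN: it is false positives as a share of all positive results, or one minus the positive predictive value (PPV). A minimal Python sketch of that calculation at 15% prevalence (the function name is illustrative):

```python
def false_positive_share(sens, spec, prev):
    """Fraction of positive results that are false positives (1 - PPV)."""
    true_pos = sens * prev                 # share of everyone who is a true positive
    false_pos = (1 - spec) * (1 - prev)    # share of everyone who is a false positive
    return false_pos / (true_pos + false_pos)


scenarios = [(0.95, 0.95), (0.90, 0.95), (0.95, 0.90), (0.80, 0.95), (0.95, 0.80)]
for sens, spec in scenarios:
    share = false_positive_share(sens, spec, 0.15)
    print(f"sens={sens:.0%} spec={spec:.0%}: {share:.1%} of positives are false")
```

For a 95%/90% test this comes to roughly 37%, and for a 95%/95% test roughly 23%, which is consistent with the reading above.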

If you want to explore other scenarios like this, here is a protected copy of the workbook.

Disease Testing Worksheet

Until next time…please be safe and sensible out there…

Dispatches from Brookline: Social Distancing and Home Schooling I

In response to widespread social distancing being used to slow the spread of the novel coronavirus that causes COVID-19, I plan to increase the frequency of my posts. And with the 2020 Democratic presidential nomination contest having effectively ended, I will not post nearly as often about American politics. Rather, I will describe how my family and I are dealing with the crisis, while presenting what I hope will be entertaining stories about…well, anything.

**********

As of Monday, March 16, 2020, public schools in the suburban Boston town of Brookline are closed until at least Friday, April 3, 2020. I write “at least,” because public schools in Boston closed on Tuesday, March 17, 2020 and will not reopen until at least April 24, 2020; Brookline traditionally follows Boston’s lead in this regard.

My wife Nell—a former elementary school teacher who now works part-time as a children’s librarian at a local Catholic school—saw this coming the previous week. Knowing we would need to implement some classroom structure for our 4th- and 6th-grade daughters, we immediately took the following steps:

  1. We converted our dining room into a classroom, complete with white board and flip charts
  2. Nell ordered teaching supplies, including workbooks for math, science and reading; puzzles and drawing projects
  3. We began to sketch out a teaching schedule, determining that Nell would take the morning shift, and I would take the afternoon shift.

We divided the “school day” this way because while Nell is a morning person, I am an extreme night owl. Since I was laid off from my last professional salaried position in June 2015—and especially after I declared myself a writer in July 2017 and launched my “interrogating memory” book-writing project—I have maintained a distinctly counter-cyclical schedule. Basically, once the household quiets down around 11 pm—and after I have finished cleaning the kitchen and taking out any trash and/or recycling—I settle down at the desk in my home office for a few hours; this is why I publish most posts at around 3 or 4 in the morning EST. Following some quiet down time, I go to sleep close to dawn, awaking well past noon. There are exceptions, however. I awake at 7 am on Tuesdays and Thursdays to take the girls to school, allowing Nell to go to work. After taking the dog to the park for 20 or 30 minutes (and often bathing in the upstairs walk-in shower when we return home), I get back in bed for a few hours until I need to pick up the girls from school…though I spend way too much of that time reading on my iPhone.

Still, this meant that I was already working at home. With Nell working just two days a week, this is far less of an adjustment for us than it might otherwise have been.

Meanwhile, putting my advanced degrees in political science and applied math (biostatistics, epidemiology) to good use, I decided to devote an hour each day to a general introduction to politics and government and an hour to a general introduction to statistics. That is, from Monday to Thursday. On Fridays, I plan to do something different: discuss the history of film noir, or watch a documentary about Hedy Lamarr, or something equally offbeat but still educational.

**********

On Monday, March 16, 2020, I came downstairs to find this in the “classroom.”

March 16

According to Nell, the girls had very much enjoyed their first morning of home-schooling. In fact, given the chaos that has recently descended upon the Brookline public elementary schools (e.g., two principals recently resigned in protest, even as the school district is negotiating a new teachers’ contract), they may learn more in these few weeks than they might otherwise have. I do not mean to disparage the quality of teaching in the Brookline public schools, which is generally very high. Our younger daughter has attention deficit disorder and a yet-to-be-formally-diagnosed learning disability, but with her school-based IEP (individualized education program) she has advanced by leaps and bounds; our older daughter is a voracious reader and diligent student, so she could probably thrive anywhere. It is just very difficult to teach and learn effectively in elementary schools with shaky leadership.

When it was my turn to teach, I used this document as a guide to begin sketching out the notion of politics as power: who has it, who decides who has it, and for how long. The section headings suggest the path our conversation took:

  1. What is politics?
  2. Birth of civilization
  3. Ancient Greece
  4. The Fall of Rome and its aftermath
  5. John Locke and the social contract

After an hour-long break, we reconvened—even as our younger daughter was fading somewhat—and I started talking about statistics, which I described simply as a way to describe a lot of information with only one or a few numbers. These latter sessions are far more interactive. Using a sheet of data about the presidents of the United States (year he first took office, length of term, age upon taking office, height and party affiliation, with Andrew Johnson labeled a Democrat for simplicity), we focused on the most basic statistic—counts, also known as frequencies; this led to the idea of a variable, as opposed to a constant. They genuinely enjoyed seeing how old and how tall the various presidents were—and learning that about two-thirds of them were taller than my five feet, 9¾ inches.
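For anyone who wants to try the same exercise at home, counting frequencies takes one line in Python with `collections.Counter`. The party labels below are a made-up sample for illustration, not the data sheet we used; swapping in the real column would give real counts.

```python
from collections import Counter

# Hypothetical sample for illustration only; NOT the actual presidents data sheet
parties = ["Democrat", "Republican", "Republican", "Whig", "Democrat",
           "Republican", "Democrat", "Republican"]

counts = Counter(parties)  # frequency of each value of the variable "party"
for party, count in counts.most_common():  # most frequent value first
    print(f"{party}: {count}")
```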

**********

This is what greeted me on Tuesday, March 17, 2020, when I came downstairs.

March 17

When “Daddy school” started, we briefly reviewed what we had learned on Monday, then returned to Ancient Greece and Aristotle’s six types of government. This easily filled half of our time before we turned to a broader discussion of two types of modern governments: liberal democratic and authoritarian. The former we grounded in the social contract and John Stuart Mill’s “harm principle”; I pulled out my old paperback copy of On Liberty to read the key passage directly. We drew a distinction between classical liberalism (though I did not use that phrase; as Nell has pointed out more than once, these are not 20-year-old college students) and libertarianism. We then dipped briefly into ideology, contrasting liberalism with nationalism and fascism.

In the “applied math” class, we reviewed frequencies before turning to measures of central tendency (without using that term): mode, median and mean; we also defined range as the arithmetic difference between the maximum and minimum values. The president dataset was once again up to the challenge. And our older daughter got to use the white board to practice adding more than two numbers of two or more digits as well as long division.
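The same four ideas translate directly into Python's standard `statistics` module. The ages below are invented for illustration (chosen so the mode, median and mean all differ), not taken from our presidents sheet.

```python
import statistics

# Hypothetical ages at first taking office, for illustration only
ages = [42, 46, 51, 51, 54, 56, 61, 64, 69]

print("mode:", statistics.mode(ages))      # most frequent value
print("median:", statistics.median(ages))  # middle value once sorted
print("mean:", statistics.mean(ages))      # arithmetic average
print("range:", max(ages) - min(ages))     # maximum minus minimum
```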

When “Daddy school” ended, I ventured out into the world, stopping at our local CVS to pick up prescriptions for Nell and a few other items before driving to a nearby Star Market. The bread shelves were practically empty, as were most of the frozen vegetable freezers, though I was still able to find broccoli and spinach. Neither here nor at CVS could I find a single bottle of rubbing alcohol or can of Lysol disinfectant for our downstairs neighbors. As I waited quietly in the checkout line, the quirky young cashier—who told us she was 19 years old and this was her first job—told the man in front of me the Star was now limiting customers to two containers per day for milk, eggs, and a host of other products. I was only buying one gallon of milk and no eggs, so I was in no danger there. And despite the 19-year-old cashier’s worry people would try to skirt the two-per-day limit or, worse, get into fights, everyone I encountered—which was not very many people—was patient and understanding.

When I came home, I tried to convince Nell to make the following day’s word of the day “rationing.” She chose a different word, though, as you will see.

Much later that night—err, early morning—I was unwinding to various YouTube videos on our big screen HD “smart” television. I have long been a fan of WhatCulture’s take on pop culture, but this video I watched—in which members of the staff present how they are responding to the need for social distancing—is especially remarkable for how they address the new reality both soberly and comically. Malinda Kathleen Reese’s humorous take on how properly to wash your hands is in a similar vein.

**********

This is what greeted me on Wednesday, March 18, 2020, when I came downstairs.

March 18 schedule

This was the first day the strain of being homebound began to show on our daughters. The older one—hormones coursing through her five-foot-six-inch frame—melted down in the morning over a range of issues; for the record, our younger daughter is only a few inches shorter. The latter daughter, meanwhile, perhaps responding to the fact one of her closest friends was having brain surgery that day, felt extremely nauseous.

Nonetheless, despite our older daughter now careening into wild hysterics over the Kool Aid man (your guess is as good as mine on this one), we soldiered on into “Daddy academy.” After another brief review, we turned to the American colonies in the 17th and 18th centuries. Specifically, we wondered how 13 disparate colonies, after rallying around “no taxation without representation” and overthrowing the rule of the tyrannical King George III of Great Britain (tyranny being Aristotle’s term for rule by a single person solely on her/his own behalf), could then fashion themselves into a nation.

It was while reading the first two paragraphs of the Declaration of Independence that our older daughter decided she wanted to visualize the word “usurp.” This is a fairly accurate depiction, actually.

Usurp

Careening rapidly through the Articles of Confederation, we came to the process of writing the Constitution of the United States between May and September 1787. And once again, our older daughter had some thoughts on two unfortunate historical realities of the document as originally drafted: an enslaved person being counted as 3/5 of a person for purposes of apportionment based on the decennial census, and the fact that women were not guaranteed the right to vote until 1920.

Kool Aid man was not happy about these things.

Molly react to Constitution

The applied math was far less dramatic. We reviewed range, mode, median and mean before turning to types of statistical distributions (how often each value of a variable occurs, from lowest to highest value), including normal, Poisson and exponential. I also touched briefly on the idea of variance, or how narrowly or widely the values of a variable are dispersed around the mean.
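Variance is the average squared distance of each value from the mean, and the standard deviation is its square root. A small sketch with two made-up variables that share a mean of 50 but differ in spread; it uses the population versions (`pvariance`, `pstdev`) for simplicity, whereas the sample versions divide by n − 1 instead of n.

```python
import statistics

# Two hypothetical variables with the same mean (50) but different spread
narrow = [48, 49, 50, 50, 51, 52]
wide = [30, 40, 50, 50, 60, 70]

for name, values in [("narrow", narrow), ("wide", wide)]:
    mean = statistics.mean(values)
    var = statistics.pvariance(values)  # average squared distance from the mean
    sd = statistics.pstdev(values)      # square root of the variance
    print(f"{name}: mean={mean}, variance={var:.2f}, SD={sd:.2f}")
```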

Later that night, this appeared on the white board—courtesy of Nell, who once made extra money drawing wall murals; as she says, she cannot draw something original, but she can copy anything.

Welcome to Thursday

Until next time…please be safe and sensible out there…

Ranking every Marvel Cinematic Universe film

My memory is slightly fuzzy on this point, but I believe I had already heard of the excellent British comedy Coupling the night I happened upon the hysterical Series 4 episode “Nightlines” sometime in late 2004 or early 2005; the episode first aired on May 17, 2004. Despite being completely unfamiliar with any of the characters or previous storylines, I have rarely laughed that hard before or since.

And I was hooked, to the point where I have seen all 28 episodes multiple times. In so doing, I learned the name of the man who wrote every episode: Steven Moffat.

A little over five years later, in the late spring of 2010, a friend sent me this short video to watch. This was what finally convinced me to set aside my reticence and watch an episode of Doctor Who; please see here and here to see how THAT turned out.

Among other things, that video marked the advent of Moffat as Doctor Who showrunner, a fitting reward for writing some of the best episodes of the post-2005 revival to that point. It also meant that by the end of 2010, two of my favorite television shows—period—had Moffat’s fingerprints all over them.

This probably made it inevitable, especially given my lifelong obsession with detective fiction, that I—along with my wife Nell—would eventually start watching the television show Moffat co-created and co-wrote with Mark Gatiss,[1] the one which debuted on October 24, 2010, just six months after his tenure as Doctor Who showrunner began:

Sherlock

Though I had seen him act before, in 2013’s Star Trek Into Darkness, Sherlock marked the first time I was aware I was watching an actor named Benedict Cumberbatch.

Flash forward to early January 2020, by which point I had seen every episode of Sherlock, as well as every episode of Coupling and post-revival Doctor Who. Having worked through my obsession with Stranger Things, I was casting about for the next film or television series over which to obsess. I was well aware of the pop culture phenomenon that is the Marvel Cinematic Universe (MCU), but until then I had not been especially interested in watching any of its 23 interconnected films. Curiously, one of our daughters had already seen and loved Guardians of the Galaxy, as well as portions of Avengers: Infinity War, while the other one had seen Captain Marvel on the big screen with one of her best friends. As much as I had enjoyed the Spider-Man trilogy starring Tobey Maguire, though, none of the other characters who seemed to inhabit the MCU—Iron Man, The Incredible Hulk, Captain America and Thor—particularly spoke to me. And that could well be because my primary association with those characters was spending five days as a nine-year-old in early January 1976 staring blankly at Captain America on my Mighty Marvel Bicentennial wall calendar as I recovered from one of the worst flus I have ever had.

Well…there had been one mild exception. When Doctor Strange was released in 2016, starring Cumberbatch in the title role, I was intrigued. Nell and I had loved Cumberbatch in Sherlock, and there was something about his being both a doctor—my Twitter handle is @drnoir33, after all—and a “master of the mystic arts” that felt like a fresh twist on the classic superhero epic battle trope.

Plus, I was really curious about the sparkly golden circles he kept making with his hands.

Which is how I found myself watching—and genuinely enjoying—Doctor Strange roughly six weeks ago. Following a pre-credits fight scene and the opening credits, we meet Doctor Stephen Strange as he prepares for surgery. A short time later, nearing the end of the procedure, he asks a fellow physician to play the “challenge” round in a musical trivia game. After easily identifying “Feels So Good” by Chuck Mangione, along with its correct year of release—1977, not 1978—Strange is asked about all the “useless” knowledge he has.

His flabbergasted response recalls my own immersion in musical esoterica: “Useless? The man charted a top ten hit with a flugelhorn!”

Doctor Strange FunkoPop

As we have been told for years about Pringles, you cannot stop at just one MCU film—especially not when your wife has a massive lifelong crush on Robert Downey, Jr., who portrays Tony Stark/Iron Man in nine MCU films. Subsequent days of film watching culminated with the four of us watching the wholly-satisfying, 3-hour-long Avengers: Endgame on the evening of February 16, 2020; for me, I now only have Spider-Man: Far From Home left to watch. And we are already making plans to see Black Widow when it is released in May 2020.

**********

In two previous posts, I gathered online film rating data to rank…

As I watched the MCU films, I decided to perform a similar analysis of this set of films.[2] Opening a blank Microsoft Excel worksheet, for each film I entered its:

  • Title
  • Date of release (according to the Internet Movie Database, or IMDb)
  • Year of release (ditto)
  • Length in minutes (ditto)
  • Estimated budget (ditto)
  • Gross worldwide earnings (ditto)
  • IMDb score and number of raters
  • Rotten Tomatoes (RT) Tomatometer score (% RT-sanctioned critics deeming film “fresh”), average critic rating (0-10) and number of critics
  • Audience Score (% RT users rating the film 3.5 or higher on 0-5 scale), average user rating and number of user raters

I collected budget and earnings data because I was curious whether, and how much, estimated profit—gross earnings minus budget—was related to perceived quality. Data are current as of 1:30 am EST on February 24, 2020. Analyses were performed using Microsoft Excel (Office Home and Student 2016) and Intercooled Stata 9.2[3].

History of a financial juggernaut.

As Table 1 reveals, the Marvel Cinematic Universe kicked off on May 2, 2008 with the release of Iron Man. Produced for an estimated $140 million, it ultimately earned nearly $585.4 million worldwide; the resulting $445.4 million profit was more than three times what the film cost to make. At the end of the film, in the first MCU post-credits scene, Nick Fury (Samuel L. Jackson) first reveals something called “the Avengers Initiative” to Downey’s Stark.

Table 1: MCU Films by release date and financial status

Title Release date Run time (mins.) Estimated budget Gross worldwide earnings Estimated profit Profit/Budget

Iron Man 5/2/2008 126 $140 million $585,366,247 $445,366,247 3.18
The Incredible Hulk 6/13/2008 112 $150 million $264,770,996 $114,770,996 0.77
Iron Man 2 5/7/2010 124 $200 million $623,933,331 $423,933,331 2.12
Thor 5/6/2011 115 $150 million $449,326,618 $299,326,618 2.00
Captain America: The First Avenger 7/2/2011 124 $140 million $370,569,774 $230,569,774 1.65
Marvel’s The Avengers 5/4/2012 143 $220 million $1,518,812,988 $1,298,812,988 5.90
End of Phase 1
Iron Man 3 5/3/2013 130 $200 million $1,214,811,252 $1,014,811,252 5.07
Thor: The Dark World 11/8/2013 112 $170 million $644,783,140 $474,783,140 2.79
Captain America: The Winter Soldier 4/4/2014 136 $170 million $714,421,503 $544,421,503 3.20
Guardians of the Galaxy 8/1/2014 121 $170 million $772,776,600 $602,776,600 3.55
Avengers: Age of Ultron 5/1/2015 141 $250 million $1,402,805,868 $1,152,805,868 4.61
Ant-Man 7/17/2015 117 $130 million $519,311,965 $389,311,965 2.99
End of Phase 2
Captain America: Civil War 5/6/2016 147 $250 million $1,153,296,293 $903,296,293 3.61
Doctor Strange 11/4/2016 115 $165 million $677,718,395 $512,718,395 3.11
Guardians of the Galaxy Vol. 2 5/5/2017 136 $200 million $863,756,051 $663,756,051 3.32
Spider-Man: Homecoming 7/7/2017 133 $175 million $880,166,924 $705,166,924 4.03
Thor: Ragnarok 11/3/2017 130 $180 million $853,977,126 $673,977,126 3.74
Black Panther 2/16/2018 134 $200 million $1,346,913,161 $1,146,913,161 5.73
Avengers: Infinity War 4/27/2018 149 $321 million $2,048,359,754 $1,727,359,754 5.38
Ant-Man and the Wasp 7/6/2018 118 $162 million $622,674,139 $460,674,139 2.84
Captain Marvel 3/8/2019 123 $175 million $1,128,274,794 $953,274,794 5.45
Avengers: Endgame 4/25/2019 181 $356 million $2,797,800,564 $2,441,800,564 6.86
Spider-Man: Far From Home 7/2/2019 129 $160 million $1,131,927,996 $971,927,996 6.07
End of Phase 3
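The two derived columns are simple arithmetic: estimated profit is gross minus budget, and the profit/budget ratio is that profit divided by the budget. A Python sketch using three rows copied from the table (an illustrative re-creation, not the original spreadsheet):

```python
# A few rows from Table 1: (estimated budget, gross worldwide earnings), in US dollars
films = {
    "Iron Man": (140_000_000, 585_366_247),
    "The Incredible Hulk": (150_000_000, 264_770_996),
    "Avengers: Endgame": (356_000_000, 2_797_800_564),
}

for title, (budget, gross) in films.items():
    profit = gross - budget   # estimated profit
    pbr = profit / budget     # profit/budget ratio
    print(f"{title}: profit ${profit:,}, profit/budget {pbr:.2f}")
```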

Six weeks after Iron Man hit theaters, The Incredible Hulk was released—and while it turned a modest $114.8 million estimated profit, it remains the only MCU film to have a lower estimated profit than estimated budget. Perhaps this is why the third MCU film, Iron Man 2, did not arrive in theaters for nearly two more years; while not as successful as its predecessor, its estimated profit was still more than twice its estimated budget. The next two films, which introduced Thor (and Clint Barton/Hawkeye) and Captain America, were somewhat less profitable relative to cost: Thor’s estimated profit was almost exactly twice its budget, and Captain America: The First Avenger’s was about 1.65 times its budget. We had already met Natasha Romanoff/Black Widow in Iron Man 2.

On May 4, 2012, these six “Avengers” would unite in the most successful MCU film to date: Marvel’s The Avengers. This was not only the first film in the franchise to earn more than $1 billion in estimated profit—a staggering 5.9 times its estimated $220 million budget—it is fully 23 minutes longer than the first five films, on average; it also contains my favorite post-credits scene. Avengers provided a highly profitable end to what is now known as “Phase 1,” with the six films combining for more than $2.8 billion in estimated profit.

Phase 2 launched almost exactly one year later with Iron Man 3, the second consecutive MCU film to top $1 billion in estimated profit and have a profit/budget ratio (PBR) of at least 5.0. The next five films, ending with the more explicitly-comedic Ant-Man, all had a PBR of at least 2.79, with Avengers: Age of Ultron becoming the third MCU film to top $1 billion in estimated profit. Overall, the six Phase 2 films earned nearly $4.2 billion in estimated profit, as the franchise steadily increased in popularity. Besides Ant-Man (and, by implication, The Wasp) this Phase also introduced War Machine/James Rhodes, The Falcon/Sam Wilson, Vision/Jarvis, Scarlet Witch/Wanda Maximoff, Rescue/Pepper Potts, Winter Soldier/James “Bucky” Barnes and the Guardians of the Galaxy: Star-Lord/Peter Quill, Rocket Raccoon, Groot, Drax the Destroyer, Gamora and (though not yet an Avenger) Nebula.

Phase 3, the final Phase of what is known collectively as “The Infinity Saga,” began with the release of Captain America: Civil War on May 6, 2016; this film was the longest film to date, at 2 hours, 27 minutes, and it featured the debut of Spider-Man. The aforementioned Doctor Strange was released six months later, also introducing Wong, with three films—one introducing Mantis and another introducing Valkyrie, Korg and Miek—following in 2017. The release of Black Panther on February 16, 2018 not only signaled the impending showdown with Thanos in the subsequent Avengers: Infinity War, it also introduced four more Avengers: the titular Black Panther/T’Challa, Okoye, Shuri and M’Baku, bringing the total to 31. Black Panther and Infinity War would become the fourth and fifth MCU films to top $1 billion in estimated profit; the latter’s estimated $1.73 billion in profit easily made it the most profitable film in the franchise to date.

Following an Ant-Man sequel and the introduction of Captain Marvel, the interlocking storylines reached their crescendo on April 25, 2019 with the release of Avengers: Endgame. This latter film, the most profitable in the franchise at an estimated $2.44 billion (6.9 times its $356 million estimated budget), was just over three hours long, continuing a trend of increasing run times; the previous nine Phase 3 films average 2 hours, 12 minutes in length. Overall, the 11 Phase 3 films accrued $11.16 billion in estimated profit—meaning the average Phase 3 film netted more than $1 billion—bringing the total estimated profit across all 23 MCU films to $18.15 billion, for an average of nearly $790 million per film.
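The Phase totals and the per-film average can be re-derived from Table 1's estimated profit column. A short Python sketch with the figures copied from the table (the dictionary layout is just one convenient choice):

```python
# Estimated profits (US dollars) from Table 1, grouped by Phase, in release order
phase_profits = {
    1: [445_366_247, 114_770_996, 423_933_331, 299_326_618, 230_569_774,
        1_298_812_988],
    2: [1_014_811_252, 474_783_140, 544_421_503, 602_776_600, 1_152_805_868,
        389_311_965],
    3: [903_296_293, 512_718_395, 663_756_051, 705_166_924, 673_977_126,
        1_146_913_161, 1_727_359_754, 460_674_139, 953_274_794,
        2_441_800_564, 971_927_996],
}

grand_total = 0
n_films = 0
for phase, profits in phase_profits.items():
    total = sum(profits)
    grand_total += total
    n_films += len(profits)
    print(f"Phase {phase}: {len(profits)} films, ${total:,} total, "
          f"${total / len(profits):,.0f} per film")

print(f"All {n_films} films: ${grand_total:,} total, "
      f"${grand_total / n_films:,.0f} per film")
```

Dividing the overall total by 23 gives roughly $789 million per film, the same figure reported as the mean estimated profit in Table 3.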

As for the sheer length of these films, they combine for 2,996 minutes of run time: 2 days, 1 hour and 56 minutes in total. So, you could knock them off in one weekend-long epic marathon, though I would not recommend it.
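The run-time arithmetic is easy to verify with integer division. A Python sketch using the run times listed in Table 1:

```python
# Run times in minutes from Table 1, in release order
run_times = [126, 112, 124, 115, 124, 143,                             # Phase 1
             130, 112, 136, 121, 141, 117,                             # Phase 2
             147, 115, 136, 133, 130, 134, 149, 118, 123, 181, 129]   # Phase 3

total_minutes = sum(run_times)
days, remainder = divmod(total_minutes, 24 * 60)  # whole days, leftover minutes
hours, minutes = divmod(remainder, 60)           # whole hours, leftover minutes
print(f"{total_minutes:,} minutes = {days} days, {hours} hour(s), {minutes} minutes")
```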

Online ratings and increasing public awareness.

Table 2 presents five online ratings and three counts of online raters for the 23 films in the MCU.

Table 2: Ratings Measures for MCU Films

Title IMDb Score (# Raters) Tomatometer Mean Tomatometer Rating (# Critics) RT Audience Score Mean Audience Rating (# Raters)
Iron Man 7.9 (898,514) 94 7.7 (278) 91 4.3 (1,082,398)
The Incredible Hulk 6.7 (416,152) 67 6.2 (231) 70 3.7 (739,115)
Iron Man 2 7.0 (686,963) 73 6.5 (297) 71 3.7 (480,400)
Thor 7.0 (711,939) 77 6.7 (284) 76 3.8 (247,469)
Captain America: The First Avenger 6.9 (701,165) 80 6.9 (267) 74 3.8 (188,979)
Marvel’s The Avengers 8.0 (1,218,614) 92 8.1 (384) 91 4.4 (1,135,342)
Iron Man 3 7.2 (721,159) 79 7.0 (318) 78 3.9 (484,684)
Thor: The Dark World 6.9 (565,662) 66 6.2 (271) 76 3.9 (310,425)
Captain America: The Winter Soldier 7.7 (698,659) 90 7.6 (295) 92 4.3 (281,813)
Guardians of the Galaxy 8.0 (996,682) 91 7.8 (322) 92 4.4 (255,076)
Avengers: Age of Ultron 7.3 (700,440) 75 6.8 (360) 83 4.0 (288,171)
Ant-Man 7.3 (533,917) 83 6.9 (321) 86 4.0 (166,462)
Captain America: Civil War 7.8 (621,385) 91 7.7 (406) 89 4.3 (179,582)
Doctor Strange 7.5 (554,767) 89 7.3 (364) 86 4.1 (109,969)
Guardians of the Galaxy Vol. 2 7.6 (624,996) 85 7.3 (403) 87 4.2 (108,403)
Spider-Man: Homecoming 7.4 (472,178) 92 7.7 (384) 87 4.2 (107,475)
Thor: Ragnarok 7.9 (534,496) 93 7.6 (409) 87 4.2 (93,959)
Black Panther 7.3 (565,228) 97 8.3 (494) 79 4.1 (88,211)
Avengers: Infinity War 8.5 (748,778) 85 7.6 (455) 91 4.5 (57,790)
Ant-Man and the Wasp 7.1 (277,244) 88 7.0 (417) 76 3.7 (24,169)
Captain Marvel 6.9 (395,538) 78 6.8 (504) 48 2.9 (94,460)
Avengers: Endgame 8.5 (670,991) 94 8.2 (504) 90 4.5 (68,431)
Spider-Man: Far From Home 7.5 (264,988) 91 7.5 (427) 95 4.6 (69,222)

Table 3, meanwhile, summarizes all 14 measures.

Table 3: Summary MCU Film statistics

Measure Mean (SD*) Median Minimum Maximum
Year of Release 2014.7 (3.4) 2015 2008 2019
Length (mins.) 130.3 (15.5) 129 112 181
Estimated Budget $192,782,609 ($55,961,802) $175,000,000 $130,000,000 $356,000,000
Gross Earnings $982,024,151 ($576,952,326) $853,977,126 $264,770,996 $2,797,800,564
Estimated Profit $789,241,543 ($526,038,263) $663,756,051 $114,770,996 $2,441,800,564
Profit/Budget 3.8 (1.6) 3.5 0.8 6.9
IMDb Score 7.5 (0.5) 7.4 6.7 8.5
# IMDb Raters 629,584.6 (216,222.3) 621,385 277,244 1,218,614
Tomatometer 84.8 (8.9) 88 66 97
RT Critic Rating 7.3 (0.6) 7.3 6.2 8.3
# RT Critics 363.7 (80.2) 360 231 504
RT Audience Score 82.4 (10.5) 86 48 95
RT User Rating 4.1 (0.4) 4.1 2.9 4.6
# RT User Raters 289,652.4 (308,813.2) 179,582 24,169 1,135,342

*SD=standard deviation, a measure of how tightly packed values are around the mean: the smaller the value, the tighter the packing. In a normal distribution, roughly 68% of values are within 1 SD of the mean, 95% are within 2 SD and 99.7% are within 3 SD.
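For readers who want to check summary figures like these, here is a minimal Python sketch (standard library only) that recomputes the mean, median and SD of the 23 IMDb scores in Table 2 and counts how many fall within 1 SD of the mean. It assumes the sample SD (n - 1 denominator), since the post does not say which version was used; with only 23 films, the share within 1 SD need not match the 68% rule exactly.

```python
import statistics

# IMDb scores for the 23 MCU films, in release order, transcribed from Table 2
imdb_scores = [
    7.9, 6.7, 7.0, 7.0, 6.9, 8.0, 7.2, 6.9, 7.7, 8.0, 7.3, 7.3,
    7.8, 7.5, 7.6, 7.4, 7.9, 7.3, 8.5, 7.1, 6.9, 8.5, 7.5,
]

mean = statistics.mean(imdb_scores)
median = statistics.median(imdb_scores)   # middle value of the sorted list
sd = statistics.stdev(imdb_scores)        # sample SD (assumption: n - 1 denominator)

# Films whose score lies within one SD of the mean
within_one_sd = [s for s in imdb_scores if mean - sd <= s <= mean + sd]
share = len(within_one_sd) / len(imdb_scores)

print(f"mean={mean:.1f}  median={median}  SD={sd:.1f}")
print(f"{len(within_one_sd)} of {len(imdb_scores)} films ({share:.0%}) within 1 SD")
```

With these inputs the rounded results match the IMDb Score row of Table 3 (mean 7.5, SD 0.5, median 7.4), and 15 of the 23 films, about 65%, sit within 1 SD of the mean.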

Two conclusions emerge from these data:

  1. As a group, these films are relatively well-regarded.
  2. There is minimal variation in how well-regarded these films are.

The median IMDb score for the MCU films is a more-than-respectable 7.4, meaning half the films have a lower score and half have a higher score. Only four films have a score below 7.0: The Incredible Hulk at a good-but-not-great 6.7 and three films at 6.9. The median Tomatometer score is a very high 88, with a solid average RT Critic Rating of 7.3. Only Hulk and Thor: The Dark World have a Tomatometer score below 70 and an average RT Critic Rating below 6.5. Finally, the median RT Audience Score is an impressive 86 and the median RT User Rating is a very solid 4.1. Only Captain Marvel has an RT Audience Score below 70 and an average RT User Rating below 3.5: a mediocre 48 and 2.9, respectively.

For comparison, the median IMDb Score, Tomatometer, RT Critic Rating, RT Audience Score and RT User Rating values for the 557 films I analyzed in my “guilty pleasures” post are 7.2, 85, 7.1, 76 and 3.5, respectively.

At the other end of the spectrum, meanwhile, Avengers: Infinity War and Avengers: Endgame both have an IMDb score of 8.5, with two other films scoring 8.0.  Fully 10 films have Tomatometer≥90, topped by Black Panther at an eye-popping 97. Black Panther also has the highest RT Critic Rating at 8.3, followed closely by Endgame and The Avengers. Seven films have RT Audience Score≥90, topped by Far From Home at an astonishing 95. Finally, Infinity War, Endgame and Far From Home all have RT User Ratings of 4.5 or 4.6.

As for how little variance there is in these rating measures, four of the five standard deviations are lower than those for the far more disparate 557 films I analyzed in the earlier post, and the fifth (RT User Rating) is identical. Broadly speaking, these films are clustered around an appraisal of “good, just shy of great.” Even the (relatively) lower-rated films like Hulk, Dark World and Captain Marvel are far more “meh” than “awful,” while films like Black Panther, The Avengers and Endgame approach “critical darling” status.

The three “number of raters” measures also have relatively low variance. Perhaps because it is the more established online movie information resource, there are consistently many more IMDb Raters than RT User Raters. At the same time, while none of the 557 films discussed in the earlier post had more than 342 RT Critics, fully 13 MCU films do, topped by the 504 who weighed in on Captain Marvel and Endgame. Curiously, while the number of both IMDb Raters and RT User Raters appears to be lower for more recent films, as one would expect, the number of RT Critics actually seems to increase over time. Correlations (“r”)—a measure ranging from -1.00 to 1.00 of how closely two variables are linearly related to each other[4]—between date of release and each of these three measures confirm this: the former two are negatively correlated (r=-0.47 and -0.78, respectively) with date of release while RT Critics is very highly positively correlated at 0.88.
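The correlation described above is Pearson's r: the covariance of two variables divided by the product of their standard deviations. The sketch below computes it for # RT Critics (Table 2) against release year. Using release year rather than full release date is an assumption made here for simplicity, and the function name `pearson_r` is illustrative, not part of the original analysis.

```python
import math

# Release years (assumption: a stand-in for full release dates) and
# # RT Critics from Table 2, both in release order
release_years = [
    2008, 2008, 2010, 2011, 2011, 2012, 2013, 2013, 2014, 2014, 2015, 2015,
    2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019,
]
rt_critics = [
    278, 231, 297, 284, 267, 384, 318, 271, 295, 322, 360, 321,
    406, 364, 403, 384, 409, 494, 455, 417, 504, 504, 427,
]


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(release_years, rt_critics)
print(f"r = {r:.2f}")
```

With these inputs r comes out at roughly 0.88, close to the figure reported above. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.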

**********

To assess these films in a more sophisticated way, I applied a statistical technique called factor analysis, which groups variables into underlying “dimensions,” or “factors,” to the 14 variables in Table 3. Each variable has a “factor loading” for each factor, essentially its correlation with the underlying dimension. This technique[5] generated three factors accounting for 90% of the total variance in these data, which is remarkably high.

The first factor (accounting for 39% of total variance) is dominated by gross worldwide earnings (0.96), estimated profit (0.96), estimated budget (0.93), run time (0.84) and profit/budget ratio (PBR, 0.77); number of RT critics (0.61) and IMDb score (0.56) also load relatively high on this factor. As this dimension mostly combines the cost and profitability of each film with its length, I label it “Epicness.”

The second factor (30%) is dominated by RT audience score (0.91), average RT user rating (0.88), Tomatometer (0.85), RT Critic Rating (0.81) and IMDb score (0.74). This dimension is clearly “Perceived Quality” (PQ).

The third factor (21%) is dominated by year of release (0.88), number of RT audience raters (-0.84), number of RT critics (-0.74) and number of IMDb raters (0.72): precisely the same pattern outlined above. This dimension is effectively “Recency;” I do not dwell on it below, echoing the “guilty pleasures” post.

Table 4: How MCU Films Compare on Three “Ratings” Dimensions

Title Epicness Perceived Quality Recency
Iron Man -0.66 1.19 -1.97
The Incredible Hulk -0.81 -1.55 -1.25
Iron Man 2 -0.12 -1.27 -1.01
Thor -0.80 -0.61 -0.54
Captain America: The First Avenger -0.96 -0.55 -0.39
Marvel’s The Avengers 1.05 1.01 -1.64
Iron Man 3 0.68 -0.73 -0.92
Thor: The Dark World -0.35 -1.30 -0.48
Captain America: The Winter Soldier -0.64 0.97 -0.30
Guardians of the Galaxy -0.54 1.24 -0.48
Avengers: Age of Ultron 0.96 -0.76 -0.41
Ant-Man -1.05 0.15 0.35
Captain America: Civil War 0.33 0.62 0.19
Doctor Strange -0.67 0.44 0.55
Guardians of the Galaxy Vol. 2 -0.36 0.30 0.73
Spider-Man: Homecoming -0.42 M 0.64 0.80
Thor: Ragnarok -0.48 0.76 0.81
Black Panther 0.47 0.49 1.06
Avengers: Infinity War 1.87 0.32 M -0.03 M
Ant-Man and the Wasp -0.74 -0.35 1.68
Captain Marvel 0.68 -2.47 1.50
Avengers: Endgame 2.98 0.47 0.33
Spider-Man: Far From Home -0.66 1.19 -1.97

Table 4 reveals how many SD above or below the mean (set to 0) all 23 films are on these three dimensions.[6] Values≥1.0 are boldfaced, and values≤-1.0 are italicized; median value is marked with an “M.”
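Footnote 6 describes the scoring step as standardizing each variable and weighting it by its factor loading. The sketch below applies that simplified description to the five ratings measures from Table 2 and their loadings on Perceived Quality reported above. It is not Stata's regression method, which derives scoring weights from the inverse correlation matrix of all 14 variables times the loadings, so it will not reproduce the PQ column of Table 4. The final re-standardization is a choice made here so the scores read as SDs above or below the mean.

```python
import statistics

# (title, IMDb score, Tomatometer, mean RT critic rating,
#  RT audience score, mean RT user rating), transcribed from Table 2
films = [
    ("Iron Man", 7.9, 94, 7.7, 91, 4.3),
    ("The Incredible Hulk", 6.7, 67, 6.2, 70, 3.7),
    ("Iron Man 2", 7.0, 73, 6.5, 71, 3.7),
    ("Thor", 7.0, 77, 6.7, 76, 3.8),
    ("Captain America: The First Avenger", 6.9, 80, 6.9, 74, 3.8),
    ("Marvel's The Avengers", 8.0, 92, 8.1, 91, 4.4),
    ("Iron Man 3", 7.2, 79, 7.0, 78, 3.9),
    ("Thor: The Dark World", 6.9, 66, 6.2, 76, 3.9),
    ("Captain America: The Winter Soldier", 7.7, 90, 7.6, 92, 4.3),
    ("Guardians of the Galaxy", 8.0, 91, 7.8, 92, 4.4),
    ("Avengers: Age of Ultron", 7.3, 75, 6.8, 83, 4.0),
    ("Ant-Man", 7.3, 83, 6.9, 86, 4.0),
    ("Captain America: Civil War", 7.8, 91, 7.7, 89, 4.3),
    ("Doctor Strange", 7.5, 89, 7.3, 86, 4.1),
    ("Guardians of the Galaxy Vol. 2", 7.6, 85, 7.3, 87, 4.2),
    ("Spider-Man: Homecoming", 7.4, 92, 7.7, 87, 4.2),
    ("Thor: Ragnarok", 7.9, 93, 7.6, 87, 4.2),
    ("Black Panther", 7.3, 97, 8.3, 79, 4.1),
    ("Avengers: Infinity War", 8.5, 85, 7.6, 91, 4.5),
    ("Ant-Man and the Wasp", 7.1, 88, 7.0, 76, 3.7),
    ("Captain Marvel", 6.9, 78, 6.8, 48, 2.9),
    ("Avengers: Endgame", 8.5, 94, 8.2, 90, 4.5),
    ("Spider-Man: Far From Home", 7.5, 91, 7.5, 95, 4.6),
]

# Loadings on "Perceived Quality" reported in the post, in the same column order
pq_loadings = [0.74, 0.85, 0.81, 0.91, 0.88]


def z_scores(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Transpose rows into one column per measure, then standardize each column
columns = list(zip(*(film[1:] for film in films)))
z_columns = [z_scores(column) for column in columns]

# Loading-weighted sum of each film's z-scores
raw_scores = [
    sum(weight * z_columns[j][i] for j, weight in enumerate(pq_loadings))
    for i in range(len(films))
]

# Re-standardize so each score reads as SDs above or below the mean
pq_scores = z_scores(raw_scores)

for (title, *_), score in sorted(zip(films, pq_scores), key=lambda pair: pair[1]):
    print(f"{score:6.2f}  {title}")
```

As in Table 4, Captain Marvel lands at the bottom; the order elsewhere may differ, because this sketch ignores the Epicness and Recency variables entirely.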

When reading these values, keep in mind that each of these factors is as “disentangled” from the other two as possible, though Epicness and PQ still overlap to some extent. This is why, for example, Infinity War and Endgame have by far the highest “Epicness” scores—they are the longest films with the highest budgets earning the most money—but do not have as high PQ scores despite their generally high ratings: they are far more “epic” than they are “high quality” according to these data. And it is why Guardians and Iron Man top these films on PQ—they are the highest-rated films which, while very profitable, were not quite on the scale of the final two Avengers films; the well-received Captain America: The Winter Soldier falls into this category as well. Somewhere in between are epic but not well-regarded films like Ultron, and less epic but relatively well-regarded films like Ant-Man and Doctor Strange.

The only film, meanwhile, with values≥1.0 on both Epicness and PQ is The Avengers, while three films (Civil War, Black Panther and Endgame) have positive values on all three measures.

At the other end of the spectrum, not surprisingly, are films like Hulk, Dark World, and Iron Man 2 that made far less money and are relatively lower-rated, as well as the anomalous Captain Marvel, which turned a tidy profit despite the lowest PQ score by far. In fact, every film between Iron Man and Avengers has negative values for all three measures, as does Dark World.

Summary. For those new to the MCU, these data suggest starting at the beginning with Iron Man, then jumping ahead to The Avengers; you do not miss much along the way, with the mild exception of First Avenger, which introduces key characters and plot points. Watch Winter Soldier and Guardians next, then Civil War. I personally would watch Ant-Man, Doctor Strange and Black Panther after that, if only because each is interesting in its own right and, like First Avenger, relays key characters and plot points. And then you can conclude with Infinity War and Endgame, bearing in mind their combined run time is 5 hours and 30 minutes.

Or, you can choose your own MCU adventure, which these data strongly suggest you would enjoy.

Until next time…

[1] I strongly recommend Gatiss’ three-part series on the history of horror films. Part 1 may be found here.

[2] Only to learn Leonard Maltin stopped publishing his annual Movie Guide in 2015.

[3] Stata Statistical Software: Release 9. College Station, TX: StataCorp LP.

[4] Essentially, a positive correlation means that as one variable increases, the other one does as well, while a negative correlation means that as one variable increases, the other one decreases.

[5] Principal factors, with an orthogonal varimax rotation, forced to three factors.

[6] Using the “Predict” command—regression scoring method—in Stata. In essence, it converts each variable to a “z-score” (mean=0, SD=1), recalculates the factor loadings, then sums each value weighted by the factor loadings.

2020 Iowa Caucuses: How did my polling averages fare?

Given the extremely volatile polling for the 2020 Democratic presidential nomination following the conclusion of the Iowa Caucuses, I will not provide global monthly updates for the next few months. Instead, I will focus on the first handful of primaries and caucuses: Iowa on February 3, New Hampshire on February 11, Nevada on February 22, South Carolina on February 29, the 14 Super Tuesday contests on March 3, and so forth.

Also: I now weight polls conducted partially after February 3, 2020 either 1.333 or 1.667 times higher, and polls conducted entirely after February 3, 2020 two times higher, than polls conducted entirely before February 4, 2020.

On the night of February 3, 2020, I was sitting in my usual spot on our sofa, watching MSNBC and anticipating returns from that day’s Iowa Caucuses.

Iowa Visitor Center Sep 1990

Earlier that day, I had published my final WAPA (weighted-adjusted polling average) for the 11 declared Democratic presidential candidates, calculated four different ways (Table 1):

  • Using all 58 polls conducted since January 1, 2019
  • Using only the 45 polls released since the 1st Democratic debate on June 26, 2019
  • Using only the 21 polls released since the 5th Democratic debate on November 19, 2019
  • Using only the 15 polls released since the 7th Democratic debate on January 14, 2020

Table 1: Final Iowa Caucuses WAPA for declared 2020 Democratic presidential nomination candidates

Candidate All Polls Since 1st Debate Since 5th Debate Since 7th Debate
Biden 19.9 19.8 20.1 20.3
Sanders 18.4 18.8 21.0 22.7
Warren 17.1 18.1 15.6 15.6
Buttigieg 15.9 16.8 16.7 16.7
Klobuchar 6.9 7.3 9.1 9.7
Yang 3.0 3.2 3.6 3.9
Steyer 2.8 3.1 3.1 3.5
Gabbard 1.5 1.6 1.5 1.6
Bloomberg 0.4 0.4 0.6 0.5
Bennet 0.3 0.3 0.2 0.3
Patrick 0.0 0.0 0.0 0.1
DK/Other 13.8 10.6 8.5 5.2

Based solely on these numbers, one would reasonably draw the following conclusions:

  • United States Senator (“Senator”) from Vermont Bernie Sanders and Minnesota Senator Amy Klobuchar were rising in the polls heading into the Iowa Caucuses, as to a lesser extent were entrepreneur Andrew Yang and businessman Tom Steyer.
  • Massachusetts Senator Elizabeth Warren was declining in the polls.
  • No other candidate was moving in the polls one way or the other.

By 11:37 pm EST, however, I had grown tired of waiting for results other than successive waves of entrance polls, so I tweeted the following:

RIP, Iowa Caucuses (1972-2020)

I have defended their idiosyncrasies for decades, believing the retail aspects of campaigning there outweighed the low-turnout mischegoss of the process.

 No more.

 This is ridiculous.

 #IowaCaucuses #iowacaucus2020

I will not relitigate here the myriad problems the Iowa Democratic Party had with tabulating, validating and releasing three distinct measures:

  1. Initial headcount of support for each Democratic candidate (“Initial tally”)
  2. Post-realignment headcount of support for each Democratic candidate (“Final tally”)
  3. Allocation of “state delegate equivalents,” or SDEs, the only measure ever previously reported

Moreover, my annoyance has abated since Monday night, primarily because I suspect these vote-reporting snafus revealed that the byzantine process of converting persons standing in rooms, then possibly standing in different parts of the room, into SDEs has always been “riddled with errors and inconsistencies,” to quote a recent New York Times headline. And if this marks the beginning of the end of using caucuses to allocate delegates to each party’s nominating conventions, so be it; they are undemocratic, exclusionary and overly complex.

As for which states “should” come first in future presidential nominating processes, I am currently agnostic.

Three days later, we finally have near-final results from the Iowa Caucuses (Table 2):

Table 2: Near-final Iowa Democratic Caucuses results, February 3, 2020

Candidate Initial Tally Final Tally SDEs
Biden 15.0 13.7 15.8
Sanders 24.8 26.6 26.1
Warren 18.4 20.2 18.0
Buttigieg 21.3 25.0 26.2
Klobuchar 12.7 12.3 12.3
Yang 5.0 1.0 1.0
Steyer 1.7 0.2 0.3
Gabbard 0.2 0.0 0.0
Bloomberg 0.1 0.0 0.0
Bennet 0.1 0.0 0.0
Patrick 0.0 0.0 0.0
Uncommitted 0.6 0.1 0.2

The following three tables list the arithmetic differences between each candidate’s final Iowa Caucuses WAPA and each of the three reported measures; positive values indicate better performance in the Caucuses than in the polls.
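A minimal sketch of the arithmetic behind Tables 3 through 5, shown for the initial tally and the top four candidates only (the dictionary and function names are illustrative): subtract each WAPA value from the result, then average the four differences.

```python
# Final Iowa WAPA from Table 1 for the top four candidates, in column order:
# all polls, since 1st debate, since 5th debate, since 7th debate
wapa = {
    "Biden": [19.9, 19.8, 20.1, 20.3],
    "Sanders": [18.4, 18.8, 21.0, 22.7],
    "Warren": [17.1, 18.1, 15.6, 15.6],
    "Buttigieg": [15.9, 16.8, 16.7, 16.7],
}

# Initial-tally share of the vote from Table 2
initial_tally = {"Biden": 15.0, "Sanders": 24.8, "Warren": 18.4, "Buttigieg": 21.3}


def differences(result, averages):
    """Result minus each WAPA value, rounded to one decimal.

    Positive values mean the candidate did better in the caucuses than in the polls.
    """
    return [round(result - average, 1) for average in averages]


def mean_difference(result, averages):
    """Average of the differences across the four polling windows."""
    diffs = differences(result, averages)
    return sum(diffs) / len(diffs)


for name, averages in wapa.items():
    diffs = differences(initial_tally[name], averages)
    mean_diff = mean_difference(initial_tally[name], averages)
    print(f"{name:<10} {diffs}  mean {mean_diff:+.1f}")
```

Swapping in the final-tally or SDE shares from Table 2 gives the corresponding rows of Tables 4 and 5.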

Table 3: Arithmetic difference between Initial Iowa Caucuses % of vote and Iowa Caucuses WAPA

Candidate All Polls Since 1st Debate Since 5th Debate Since 7th Debate Mean Difference

Biden -4.9 -4.8 -5.1 -5.3 -5.0
Sanders 6.4 6.0 3.8 2.1 4.6
Warren 1.3 0.3 2.8 2.8 1.8
Buttigieg 5.4 4.5 4.6 4.6 4.8
Klobuchar 5.8 5.4 3.6 3.0 4.5
Yang 2.0 1.8 1.4 1.1 1.6
Steyer -1.1 -1.4 -1.4 -1.8 -1.4
Gabbard -1.3 -1.4 -1.3 -1.4 -1.4
Bloomberg -0.3 -0.3 -0.5 -0.4 -0.4
Bennet -0.2 -0.2 -0.1 -0.2 -0.2
Patrick 0.0 0.0 0.0 -0.1 0.0
DK/Other -13.2 -10.0 -7.9 -4.6 -8.9

Initial tally. If the Iowa Caucuses were instead the Iowa Primary, this would have been the only vote reported. On this measure Sanders, Klobuchar and former South Bend, IN Mayor Pete Buttigieg averaged 4.5-4.8 percentage points (“points”) higher in the initial tally than in their WAPA. And the closer in time the polls were to the Iowa Caucuses, the more “accurate” the WAPA.

Warren (+1.8 points) and Yang (+1.6) also overperformed their WAPA in the initial tally, albeit by smaller margins. And for Warren, older polls were more predictive than recent polls.

By contrast, former Vice President Joe Biden did an average of 5.0 points worse in the initial Iowa Caucuses tally than his WAPA. Steyer and United States House of Representatives Member from Hawaii Tulsi Gabbard (-1.4 each) also performed somewhat worse than their WAPA.

Table 4: Arithmetic difference between Final Iowa Caucuses % of vote and Iowa Caucuses WAPA

Candidate All Polls Since 1st Debate Since 5th Debate Since 7th Debate Mean Difference

Biden -6.2 -6.1 -6.4 -6.6 -6.3
Sanders 8.2 7.8 5.6 3.9 6.4
Warren 3.1 2.1 4.6 4.6 3.6
Buttigieg 9.1 8.2 8.3 8.3 8.5
Klobuchar 5.4 5.0 3.2 2.6 4.1
Yang -2.0 -2.2 -2.6 -2.9 -2.4
Steyer -2.6 -2.9 -2.9 -3.3 -2.9
Gabbard -1.5 -1.6 -1.5 -1.6 -1.6
Bloomberg -0.4 -0.4 -0.6 -0.5 -0.5
Bennet -0.3 -0.3 -0.2 -0.3 -0.3
Patrick 0.0 0.0 0.0 -0.1 0.0
DK/Other -13.7 -10.5 -8.4 -5.1 -9.4

Final tally. Only three candidates improved their vote totals after supporters of non-viable candidates shifted to viable ones (generally, those backed by at least 15% of attendees at a precinct caucus):

  • Buttigieg (+5,638 supporters; +3.7 points)
  • Warren (+2,238; +1.8)
  • Sanders (+2,155; +1.8)

These three candidates, as well as Klobuchar (-1,288; -0.4), performed better in the final tally than their WAPA, on average. As with the initial tally, WAPA using more recent polls was most predictive for Sanders, Buttigieg and Klobuchar, while WAPA using older polls was most predictive for Warren.

Biden, on the other hand, lost 2,693 supporters and dropped 1.3 points between the initial and final tallies; Yang and Steyer also lost considerable support between the initial and final tallies. For all three candidates, WAPA using earlier polls was most predictive.

Table 5: Arithmetic difference between Iowa Caucuses SDE % and Iowa Caucuses WAPA

Candidate All Polls Since 1st Debate Since 5th Debate Since 7th Debate Mean Difference

Biden -4.1 -4.0 -4.3 -4.5 -4.2
Sanders 7.7 7.3 5.1 3.4 5.9
Warren 0.9 -0.1 2.4 2.4 1.4
Buttigieg 10.3 9.4 9.5 9.5 9.7
Klobuchar 5.4 5.0 3.2 2.6 4.1
Yang -2.0 -2.2 -2.6 -2.9 -2.4
Steyer -2.5 -2.8 -2.8 -3.2 -2.8
Gabbard -1.5 -1.6 -1.5 -1.6 -1.6
Bloomberg -0.4 -0.4 -0.6 -0.5 -0.5
Bennet -0.3 -0.3 -0.2 -0.3 -0.3
Patrick 0.0 0.0 0.0 -0.1 0.0
DK/Other -13.6 -10.4 -8.3 -5.0 -9.3

SDEs. The same pattern holds for SDEs as for final vote tally, with one minor modification.

  • Buttigieg, Sanders and Klobuchar outperformed their WAPA, with the difference decreasing with more recent polls.
  • Warren outperformed her WAPA, with the difference increasing with more recent polls.
  • Biden, Steyer and Yang underperformed their WAPA, with the difference increasing with more recent polls.

The bottom line. To evaluate these comparisons globally, I used the sum of the squared differences (“SSE”) between each WAPA value and the results value. Excluding “DK/Other,” Table 6 lists the SSE for each comparison; higher values indicate lower predictive power.

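Table 6: Sum of squared differences (SSE) between Iowa Caucuses WAPA and results, excluding DK/Other

Polling period Initial Tally Final Tally SDEs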
All Polls 136.5 240.5 224.9
Since 1st Debate 115.8 210.8 198.2
Since 5th Debate 88.3 190.4 168.0
Since 7th Debate 77.1 177.8 156.1
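The SSE values in Table 6 can be checked the same way. This sketch uses the rounded values published in Tables 1 and 2 (DK/Other excluded) and recomputes two of the initial-tally entries; the variable names are illustrative. Because the differences are squared, a single large miss such as Biden's counts far more than several small ones.

```python
# Candidate order: Biden, Sanders, Warren, Buttigieg, Klobuchar, Yang,
# Steyer, Gabbard, Bloomberg, Bennet, Patrick (DK/Other excluded)

# Final WAPA from Table 1, two of the four polling windows
wapa_all_polls = [19.9, 18.4, 17.1, 15.9, 6.9, 3.0, 2.8, 1.5, 0.4, 0.3, 0.0]
wapa_since_7th = [20.3, 22.7, 15.6, 16.7, 9.7, 3.9, 3.5, 1.6, 0.5, 0.3, 0.1]

# Initial-tally shares from Table 2, same candidate order
initial_shares = [15.0, 24.8, 18.4, 21.3, 12.7, 5.0, 1.7, 0.2, 0.1, 0.1, 0.0]


def sse(results, averages):
    """Sum of squared differences between results and polling averages."""
    if len(results) != len(averages):
        raise ValueError("results and averages must be the same length")
    return sum((result - average) ** 2 for result, average in zip(results, averages))


for label, averages in [("All Polls", wapa_all_polls), ("Since 7th Debate", wapa_since_7th)]:
    print(f"{label:<17} {sse(initial_shares, averages):.1f}")
```

Rounded to one decimal, these reproduce the 136.5 and 77.1 initial-tally entries above.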

WAPA was most predictive of the initial tally, which is not surprising given that poll respondents are asked which candidate they plan to support upon arriving at the caucus site, and not about second or third choices. WAPA was also slightly more predictive of the distribution of SDEs than of the final raw tally of supporters, though neither was especially predictive.

For each reported measure, WAPA was more predictive the closer the polls were to the Caucuses; I will admit this rather surprised me, given the candidate-specific differences detailed above. One explanation is that including older polls, however low-weighted, masks late polling movement of the kind that occurred to Sanders, Buttigieg and Klobuchar.

For now, however, I will continue to report multiple versions of WAPA, if only to see if this pattern holds for later contests.

Now, on to New Hampshire!

Until next time…

Rituals and obsessions: a brief personal history

It started with “Taxman” by The Beatles.

Its distorted vocal opening had gotten stuck in my head despite my stated antipathy toward the band—really more pose than position, in retrospect.

Whenever I run a bath, I like to be in the tub while the faucet(s) run. Until quite recently,[1] when the tub was nearly full, I would turn off the cold water and turn on the hot water to its scalding limit, counting down “one-two-three-four, one-two-three-four, one-two-three-four, one-two-three-four” in the same slow tempo as the opening of “Taxman.” Only then would I turn off the hot water and settle in for a steamy cleansing soak.

I realize the actual track opens with “one-two-three-four, one-two” before George Harrison sings “Let me tell you how it will be/There’s one for you, nineteen for me.”

But, hey, my ritual, my rules.

At some point, I stopped employing that ritual to start a bath—only to replace it with one for exiting a bath, even as most of the water had drained around me. During my senior year at Yale, two other seniors and I lived off-campus. Our second-floor walkup had a bathtub, which I used most nights. One night, for…reasons, before the water fully drained, I squatted down and scooped up some water, quickly shaking it out of my hands as though I had just washed my hands in a sink. I repeated that sequence twice, except on the third iteration, I stood up, shaking out my hands as I did so. Only then did I step onto the bath mat.

I have performed this ritual—or some slight variant of it—every single time I have exited a bathtub since the fall of 1987. It is not as though I expect something bad will happen if I do not do so—I am not warding off anxiety; when that particular coin is flipped, it lands on depression for me nearly every time. It is simply that having started doing it, I continued to do it, making it an essential part of my bathtub “routine.”

Funnily enough, I have yet to mention this routine to my psychotherapist.

**********

In a recent post, I detailed ways the Netflix series Stranger Things had resonated with me at a deeply personal level. As of the evening of December 26, my wife Nell and I had watched the entire series—25 episodes over three seasons—twice, the second time with our two pre-teen daughters. Nell’s pithy takeaway: “I would watch it again.” Our younger daughter may already have, quietly watching in her bedroom on her new iPad. She now very much wants her friends to watch the show so she can discuss it with them…or at least have them understand why she suddenly—and with great affection—calls folks, mainly me, “mouth breather” or “dingus.”

Meanwhile, over the course of winter break, a small army of Funko Pop! figures appeared in our home, which our younger daughter arranged in rough chronological order; the short video I took of the sequence is my first ever “pinned” tweet.

Stranger Things tower

Clearly, I am not the only member of this household now utterly obsessed with the admittedly-excellent series. And one peek inside our younger daughter’s room, decorated in true Hufflepuff fashion, will reveal I am not the only member of this household who easily becomes obsessed.

But I am one of only two members of this household legally old enough to purchase and/or consume alcohol, and I am the only one who refused to drink alcohol until well into my college years—even as my high school classmates would try to get me to join them in beer drinking as we stayed in hotels for Youth in Government or Model UN—because I was very wary of my obsessive nature. I was well aware how often I could not simply enjoy something—I had to fully absorb it into my life.

Indeed, once I did finally sample that first Molson Golden in the converted basement seminar room I shared with two other Elis sophomore year, I liked it far more than I would have anticipated from sampling my father’s watered-down beer at various sporting events. Age prevented me from drinking too much, though, until I turned 21 early in my senior year. On my birthday, those same off-campus roommates took me to a local eatery called Gentree. An utter novice at drinking anything other than beer, I had no clue what to order; the gin and tonic I settled upon did nothing for me. Shortly thereafter, after a brief flirtation with Martini and Rossi (I still do not know how that bottle appeared in our apartment), I tried my first Scotch whisky.

It was love at first sip.

Over the next few years, I never drank enough for anyone to become, you know, concerned, but I did feel like I needed to have a glass of J&B or Cutty Sark with soda water—usually lemon Polar Seltzer—every day. When a close friend came to visit me in the Boston suburb of Somerville in January 1992, he presented me with a bottle of Glenfiddich—one of the better single-malt Scotches—and it was like having a revelation within a revelation, as this photograph from that night depicts.

Glenfiddich, January 1992

This photograph reminds me I spent the 1990s and a significant chunk of the following decade living in turtlenecks—of all colors—because I decided one day while getting my hair cut, I liked the way the white cloth band looked around my neck. You know, the one hair stylists use to keep freshly-cut hair from dropping inside your shirt.

Eventually, I settled on Johnnie Walker Black (light rocks, club soda on the side[2]) as my primary poison—though I also developed a taste for a port wine called Fonseca Bin 27. Between 1991 and 1993, I spent way too much time at the bar of a terrific restaurant called Christopher’s. In 2005, I used old credit card receipts, which I had stuffed into a desk drawer for years, to calculate I spent $1,939.23 there (roughly $3,500 in 2019) in just those three years—and that sum excludes cash payments. Apparently, a hallmark of being both obsessive and a math geek is the construction of Microsoft Excel spreadsheets to calculate inconsequential values.

It would be another 10 years before I worked Scotch into my emerging Friday night bath ritual—the one with the curated music and the darkness and the single large pine-scented candle from L.L. Bean and the lavender milk bath stuff and the way I would turn off every light before walking into the candle-lit bathroom with my full tumbler of Johnnie Walker Black, or 10-year-old Laphroaig on special occasions. Ahh, that delectably peaty aroma…

More recently, Nell and I moved away from beer and whisky, respectively, toward red wine, going so far as to join Wine of the Month Club. Well, I also developed a taste for rye whisky, be it neat, mixed with ginger ale or in an Old Fashioned.

The point of this borderline-dipsomaniac history is that my high school instincts about my obsessive nature were remarkably close to the mark. Prior to being diagnosed with depression, I self-medicated with alcohol far more than I ever wanted to admit to myself. Perhaps not coincidentally, I recently cut my alcohol consumption down to almost nothing, though my stated reason is the toll it was taking on my sinuses, which have had more than enough trouble already.[3]

**********

Family lore holds I learned to read at the age of 2½, which my elementary school educator wife tells me is physiologically impossible. Whenever it was, by the time I was eight or so, I had already amassed a solid library of books.

And then I learned about the Dewey Decimal System.

With that, it no longer sufficed to organize my books alphabetically by subject or author or title, or even to use the Library of Congress classification system. No, I had to Dewey-Decimalize them, which meant going to Ludington Library, where I spent a great deal of my childhood and teenage years, to photocopy page after page of classification numbers. I still have a few books from those days, penciled numbers in my childish handwriting on the first page just inside the cover. I even briefly ran an actual lending library out of my ground-floor playroom—the one rebuilt after the fire of March 1973.

Meanwhile, my mother, our Keeshond Luvey and I spent the summers of 1974 and 1975 living in the “penthouse” of the Strand Motel in Atlantic City, NJ; my father would make the 60-mile drive southeast from Havertown, PA most weekends. In those years, the roughly 2½ miles of Pacific Avenue between Albany and New Hampshire Avenues were dotted with cheap motels and past-their-time hotels. The Strand was one of the better motels, with a decent Italian restaurant just off the lobby, dimly lit with its semi-circular booths upholstered in blood-red leather; I drank many a Shirley Temple over plates of spaghetti there. In that lobby, as in every lobby of every motel and hotel along the strip, was a large wooden rack containing copies of a few dozen pamphlets advertising local attractions.

At first, I simply took a few pamphlets from the Strand lobby to peruse later. Then I wanted all of them. Then I began to prowl the lobbies—yes, at seven, eight years old I rode the jitney by myself during the day, at just 35¢ a ride—of every motel and hotel along Pacific Avenue, and a few along Atlantic Avenue one block northwest, collecting every pamphlet I could find. They were all tossed into a cardboard box; when the winter felt like it was lasting too long, I would dump the box out on my parents’ bed and reminisce.

In the year after that second summer, I became attuned to pop music, leaving Philadelphia’s premier Top 40 radio station, WIFI 92.5 FM, on in my bedroom for hours at a time, while I did homework, read or worked diligently on…projects.

Back in 1973, my parents had bought me a World Book Encyclopedia set, complete with the largest dictionaries I had ever seen. The W-Z volume had a comprehensive timeline of key events in world history. Late in 1976, I received a copy of the 1977 World Almanac and Book of Facts, which also had a comprehensive timeline of key events in world history. And I soon noticed some events were on one timeline but not the other.

Thus, in February 1977, with WIFI 92 as my personal soundtrack, I began to write out a collated timeline, drawing from both sources. Thirty-six lined notebook pages hand-written in pencil later, I had only gotten as far as June 30, 1841—so I decided to slap a red construction paper cover on it and call it Volume I.

Important Events and Dates

I assigned it Dewey Decimal value 909.

You could say I came to my senses—or I bought a copy of the astounding Encyclopedia of World History—because I never did “publish” a Volume II. In April 1978,[4] however, I wrote a similarly non-knowledge-advancing booklet—no cool cover this time—called 474 PREFIXES, ROOTS AND SUFFIXES. This volume, assigned Dewey Decimal number 423, was only 10 pages long, despite being more comprehensive.

**********

Even before I immersed myself in hours of 1970s Top 40 radio, I had heard bits and pieces of New Year’s Eve countdowns of the year’s top songs. The first one I remember hearing was at the end of 1974, because I heard Elton John’s “Bennie and the Jets,” which topped the Billboard Hot 100 in April 1974—though I could be mixing it up with John’s “Goodbye Yellow Brick Road,” released as a single the previous year.

In January 1980, Solid Gold debuted with a two-hour special counting down the top 50 songs of 1979. I was particularly curious to know the ranking of my favorite song at the time, Fleetwood Mac’s “Tusk;” if memory serves, it led off the show at #50. A few days earlier, my cousins and I had listened in the house we then shared to WIFI-92’s top 100 songs of 1979 countdown.

I was vaguely aware there were weekly magazines that tracked top songs and albums, but I did not buy a copy of Cashbox until late April 1980.[5] My Scotch whisky revelation nearly eight years later was a mere passing fancy compared to this slender combination of music and data. I pored over its charts for hours, even calling my best friend to all but read the singles and album charts to him; utterly uninterested, he was nonetheless very patient with my exuberance. That fall, I noticed that every Saturday, the Philadelphia Bulletin published that week’s Billboard top 10 singles, albums—and two other categories, possibly country and soul. Reading these charts—literally covering them with a napkin which I slid up to uncover each song/album from #10 to #1—became a staple ritual of my regular Saturday morning brunch with my father, from whom my mother had separated in March 1977. Not satisfied with reading them, I clipped each set of charts so I could create my own rankings along the lines of “top songs, September 1980 to March 1981.”

On December 31, 1980 and January 1, 1981, I heard two radio stations present their “Top 100 of 1980” countdowns. I listened to the first one with my cousins in my maternal grandmother’s apartment in Lancaster, PA; my mother and her sister were also there. The second one my mother and I heard in the car driving home, although we lost the signal halfway through the countdown; I still was able to hear one of my favorite songs then: “More Love” by Kim Carnes. The following weekend, I found a paper copy of yet another 1980 countdown while visiting the Neshaminy Mall with my mother and severely mentally-impaired sister, who lives near there. It was probably there I also found Billboard’s yearend edition, which I purchased—or my mother purchased for me.

After a delirious week perusing its contents, I obtained a copy of the first official weekly Billboard of 1981, for the week ending January 10—albeit released Tuesday, January 6. One week later, I bought the January 17 edition, then the January 24 edition, then the January 31 edition. In fact, I bought every single issue of Billboard for the next seven-plus years, ritualistically digesting its charts using the same uncovering method as the charts published in the Bulletin. I brought each issue to school with me, where my friends and I would pore over its contents during lunch period. Later, I happily scrutinized airplay charts from a selection of Top 40 radio stations across the country—I underlined particular favorites—while waiting to make deliveries for Boardwalk Pizza and Subs in the spring and summer of 1984.

On the few occasions I did not have the $4 purchase price, I sold an album or two to Plastic Fantastic, then located on Lancaster Avenue in Bryn Mawr, PA, to make up the difference; this was after cajoling my mother into driving me to the excellent newspaper and magazine store that then stood a short walk down Lancaster Avenue from Plastic Fantastic. Although new issues of Billboard were released every Tuesday, in 1981 and 1982 I would already have heard the new week’s Top 40 singles counted down the previous Sunday night on the American Top 40 radio program, then hosted by Casey Kasem.

Sometime in 1981, I began to compile weekly lists of the Top 10 groups, male artists and female artists…so it is not at all surprising that during winter break of my sophomore year of high school, I calculated my own “Top 100 of 1981” lists. In the days prior to Excel, this meant I gathered all 51 weekly issues (the final chart of the year freezes for a week) into what I would later call a “mountain of Billboards” on the floor of my bedroom (sometimes the mountain would migrate into the living room) and tallied every single and album that had appeared in the top 10 on blank sheets of paper, using acronyms to save my hands from cramping. I used a combination of highest chart position, weeks at that position, total weeks on the chart, and weeks topping such charts as Adult Contemporary, Rock, Country and Soul to generate my rankings. Fewer than 100 singles or albums would always enter the top 10 in any given year, so I would then extend the tally to the top 20 for singles and the top 30 for albums. I had ways—long since forgotten—of adding up an artist’s singles and albums “points,” allowing me to produce an overall top 100 artist countdown.

On January 1, 1982, having dug into my record collection and pestered friends for whatever tracks they had, I sat in my bedroom with my cousin and DJ’d my first Top 100 countdown, using a snippet of “Lucifer” by The Alan Parsons Project for “commercial breaks.”

That first year, I stuck to the primary charts, but ambition seized me over the next few years, and I began to contemplate creating lists for individual subgenres; I would usually run out of steam after a week or so, however. Fueling this obsessive data compilation were large navy mugs filled with a mixture of black coffee and eggnog. Even after enrolling at Yale in September 1984,[6] I would look forward to arriving back in our Penn Valley, PA apartment so I could dive into Billboard mountain and immerse myself in that year’s charts. I would come up for air to visit with family and friends, of course, but then it was right back into the pile, MTV playing on my bedroom television set.

Over the years, I never threw any issues away, which meant schlepping them with me on the Amtrak train from New Haven, CT to Philadelphia; my poor mother had to move giant piles of them twice, in 1986 (~275 issues) and 1987 (~325). They were a bit lighter by then because I had gotten into the habit of taping up some of the beautiful full-page ads depicting the covers of albums being promoted that week. It started with Icehouse by Icehouse, then Asia by Asia; when my mother moved from our Penn Valley apartment, I had taped up a line of pages running nearly halfway around the walls of my bedroom.

Then, one week in September 1988, I did not buy the new edition of Billboard. Most likely, my musical tastes had shifted after I discovered alternative-rock station WHFS. Another explanation is that election data had been slowly replacing music chart data over the previous four years. Moreover, I had landed on a new obsession: baseball, specifically the Philadelphia Phillies. Whatever the reason, I have not bought a Billboard since then, though I still have two books compiled by Joel Whitburn from the late 1980s.

Besides the Phillies and American politics, I have had a wide range of obsessions since then, most recently film noir, Doctor Who, David Lynch/Twin Peaks and, of course, Stranger Things. My obsession with Charlie Chan is old news. But none of these had quite the immersive allure those piles of Billboards had in the 1980s.

Alas, my mother finally threw out all of them in the 1990s. While I wish she had at least saved the eight yearend issues, perhaps it is all for the best. Did I mention a college girlfriend once broke up with me—on Valentine’s Day no less—because I alphabetized my collection of button-down Oxford shirts by color, solids to the left of stripes?

Until next time…

[1] Nell reminds me that at some point in the year before our October 2007 wedding, she came into the bathroom while I was counting down. She apparently interrupted me, because I told her, “Now I have to start again!”

[2] For reasons long since forgotten, I switched to Jack Daniel’s (Tennessee whiskey) for a few years around 2000. I must have talked a lot about that being my default adult beverage order, because on a first date in December 2000, my soon-to-be girlfriend (my last serious relationship before Nell, for those keeping score at home) waited expectantly for me to ask for “that thing you always order.”

[3] I have long joked that if my upper respiratory system were a building, it would have been condemned decades ago. In October 2011, I finally had surgery to repair a deviated septum and remove nasal polyps. I may still snore, but it no longer sounds like I am about to stop breathing.

[4] April 19, to be exact.

[5] I remember “Rock Lobster” by The B-52’s being listed, which narrows the editions to April 19 and April 26.

[6] I was so obsessed with Billboard that I actually proposed analyzing its charts for a data analysis course I took my sophomore year. Not surprisingly, that was a non-starter with the professor.