Positively pondering pesky probabilities, perchance

One inspiration to start this “data-driven storytelling” blog was the pioneering work of Nate Silver and his fellow data journalists at FiveThirtyEight.com; their analyses are an essential “critical thinking” reality check to my own conclusions and perceptions. Indeed, when I finally get around to designing and teaching my course on critical thinking (along with my film noir course), the required reading would include Silver’s The Signal and the Noise and a deep dive into Robert Todd Carroll’s The Skeptic’s Dictionary. I will also include Ken Rothman’s Epidemiology: An Introduction; what drew me to epidemiology (besides my long career as a public health data analyst) was its epistemological aspect. By that I mean how the fundamental methods and principles of epidemiology allow us to critically assess any narrative or story.

To that end, I have been reading with great interest Silver’s 11-part series that “reviews news coverage of the 2016 general election, explores how Donald Trump won and why his chances were underrated by most of the American media.” And while I highly recommend the entire series of articles, the September 21 conclusion is the jumping off point for my own observations about assessing the likelihood of various events.


Let me begin with a passage from that article:

In recent elections, the media has often overestimated the precision of polling, cherry-picked data and portrayed elections as sure things when that conclusion very much wasn’t supported by polls or other empirical evidence.

I personally think investigative journalists are heroic figures who will ultimately save American democracy from its current self-induced peril. But they are trained in a very specific way: deliver the facts of a story with certainty and immediacy. In so doing, they are responding to media consumers with little patience for complex narratives suffused with uncertainty.

To quote Silver again, “a story can be 1. fast, 2. interesting and/or 3. true — two out of the three — but it’s hard for it to be all three at the same time.”

One narrative that developed fairly early about the 2016 presidential election campaign was that Democratic nominee Hillary Clinton was the all-but-inevitable victor. I wrote about one version of this flawed narrative here.

Reinforcing this narrative were election forecasts issued during the last weeks of the campaign that practically said “stick a fork in Trump, he is finished.” But as Silver rightly observes, some of these models were flawed because they failed to account for the “correlation in outcomes between [demographically similar] states.” For example, were Republican nominee Donald Trump to outperform his polls in Wisconsin on Election Day, he would likely also do so in Michigan, Minnesota and Iowa. And that is essentially what happened.

Still, because aggregating polls yields a more precise picture of the state of an election at a given point in time, I aggregated these 2016 election forecasts. Going into Election Day, here were some estimated probabilities of a Clinton victory, ranked lowest to highest.

FiveThirtyEight 71.4%
Betting markets 82.9%[1]
The New York Times Upshot 84.0%
DailyKos 92.0%
HuffingtonPost Pollster 98.2%
Princeton Election Consortium (Sam Wang) 99.5%

The average and median forecast were both 88.0%. Remove the most skeptical forecast (though it still made Clinton a 5:2 favorite), and the average and median jump to 91.3% and 92.0%, respectively. By contrast, if you remove the least skeptical forecast, the average and median drop to 85.7% and 84.0%, respectively.
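For anyone who wants to rerun the aggregation, here is a minimal sketch that recomputes the average and median from the six figures listed above, for the full set and with either extreme forecast dropped. The dictionary and the `summarize` helper are illustrative names, not anything from the original analysis.

```python
from statistics import mean, median

# Clinton win probabilities (percent), exactly as listed in the post.
forecasts = {
    "FiveThirtyEight": 71.4,
    "Betting markets": 82.9,
    "The New York Times Upshot": 84.0,
    "DailyKos": 92.0,
    "HuffingtonPost Pollster": 98.2,
    "Princeton Election Consortium": 99.5,
}


def summarize(values):
    """Return (mean, median) of the given percentages, rounded to one decimal."""
    values = list(values)
    return round(mean(values), 1), round(median(values), 1)


# Full set, then the set without the lowest and without the highest forecast.
all_six = summarize(forecasts.values())
without_lowest = summarize(
    v for name, v in forecasts.items() if name != "FiveThirtyEight"
)
without_highest = summarize(
    v for name, v in forecasts.items() if name != "Princeton Election Consortium"
)

print("all six:", all_six)
print("without lowest:", without_lowest)
print("without highest:", without_highest)
```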

It is an understandable human tendency to look at a probability over 80% and “round up” from “very likely, but not guaranteed” to “event will happen.” And, under the frequentist definition of probability, we would be correct more than 80% of the time in the long run.

But we would not be correct as much as 20% of the time.

Ignoring Wang’s insanely optimistic forecast for various reasons, the “aggregate” forecast I had in mind on Election Day was that Clinton had about an 84% chance of winning.

The flip side, of course, was that Trump had about a 16% chance of winning.

A good way to interpret this probability is to think about rolling a fair, six-sided die.

Pick a number from one to six. The chance that if you roll the die, the number you picked will come up, is 1 in 6, or 16.7%.

On Election Day, Trump metaphorically needed to roll his chosen number…and he did.

But even if you take the Wang-inclusive average of 88%, that still leaves Trump roughly a 1 in 8 chance. Throw eight slips of paper with the numbers one through eight written on them in a hat (I like fedoras, myself), pick a number and draw. If your number comes up (which will happen 12.5% of the time over many draws), you win.

Trump picked a number between one and eight then pulled it out of our hypothetical fedora, and he won the election.

One way people misunderstand probability (and one of many reasons I am resolutely opposed to classical statistical significance testing) is mentally converting “event x has a very low probability” (like, say, matching DNA in a murder trial: only a 1 in 2 million chance!) into “that event cannot happen.”

So, even the Wang forecast—which gave Trump only a 1 in 200 chance of winning—did NOT mean that Clinton would definitely win. It only meant that Trump had to pull a specific number between one and 200 out of our hypothetical fedora. He did, and he won.
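The die, fedora and “1 in 200” comparisons all come from the same conversion: a probability p is roughly a “1 in 1/p” chance. Here is a small sketch of that arithmetic, along with the odds form (p against 1 − p) behind phrases like “5:2 favorite.” The function names are made up for illustration.

```python
def one_in_n(probability):
    """Express a probability as the N in '1 in N', i.e. N = 1 / p."""
    return 1.0 / probability


def odds_in_favor(probability):
    """Odds in favor of an event: p / (1 - p). A value of 2.5 reads as 5:2."""
    return probability / (1.0 - probability)


# Trump's implied chances under the three aggregates discussed in the post.
trump_chances = {
    "aggregate ignoring Wang (16%)": 0.16,
    "Wang-inclusive average (12%)": 0.12,
    "Wang forecast (0.5%)": 0.005,
}

for label, p in trump_chances.items():
    print(f"{label}: about 1 in {one_in_n(p):.1f}")

# FiveThirtyEight's 71.4% for Clinton, expressed as odds in her favor.
print(f"Clinton at 71.4%: odds of about {odds_in_favor(0.714):.2f} to 1")
```

A 16% chance works out to a bit better than 1 in 6 and a 12% chance to a bit better than 1 in 8, so the die and the eight slips of paper are convenient round approximations.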


On the other end of the spectrum is an overabundance of caution in assessing the likelihood of an event. This usually occurs when interpreting election polls.

In this post, I discussed Democratic prospects in the 2017 and 2018 races for governor.

One of the two governor’s races in November 2017 is in Virginia, where Democratic governor Terry McAuliffe is term-limited. The Democratic nominee is Lieutenant Governor Ralph Northam, and the Republican nominee is former Republican National Committee chair Ed Gillespie.

Here are the 13 public polls of this race listed on RealClearPolitics.com[2] taken after the June 13, 2017 primary elections:

| Poll | Date | Sample | MoE | Northam (D) | Gillespie (R) | Spread |
|---|---|---|---|---|---|---|
| Monmouth* | 9/21 – 9/25 | 499 LV | 4.4 | 49 | 44 | Northam +5 |
| Roanoke College* | 9/16 – 9/23 | 596 LV | 4 | 47 | 43 | Northam +4 |
| Christopher Newport Univ.* | 9/12 – 9/22 | 776 LV | 3.7 | 47 | 41 | Northam +6 |
| FOX News* | 9/16 – 9/17 | 507 RV | 4 | 42 | 38 | Northam +4 |
| Quinnipiac* | 9/14 – 9/18 | 850 LV | 4.2 | 51 | 41 | Northam +10 |
| Suffolk* | 9/13 – 9/17 | 500 LV | 4.4 | 42 | 42 | Tie |
| Mason-Dixon* | 9/10 – 9/15 | 625 LV | 4 | 44 | 43 | Northam +1 |
| Univ. of Mary Washington* | 9/5 – 9/12 | 562 LV | 5.2 | 44 | 39 | Northam +5 |
| Roanoke College* | 8/12 – 8/19 | 599 LV | 4 | 43 | 36 | Northam +7 |
| Quinnipiac* | 8/3 – 8/8 | 1082 RV | 3.8 | 44 | 38 | Northam +6 |
| VCU* | 7/17 – 7/25 | 538 LV | 5 | 42 | 37 | Northam +5 |
| Monmouth* | 7/20 – 7/23 | 502 LV | 4.3 | 44 | 44 | Tie |
| Quinnipiac | 6/15 – 6/20 | 1145 RV | 3.8 | 47 | 39 | Northam +8 |

Eight of these polls have Northam up between four and seven percentage points, including four of the last six. Two polls show a tied race. No poll gives Gillespie the lead.
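Those counts are easy to check against the table. The sketch below tallies the spreads in the order listed (most recent first); the list and variable names are illustrative.

```python
# Northam's lead in each poll, most recent first, as listed in the table above.
spreads = [5, 4, 6, 4, 10, 0, 1, 5, 7, 6, 5, 0, 8]

# Polls with Northam ahead by four to seven points, overall and in the latest six.
in_range_count = sum(1 for s in spreads if 4 <= s <= 7)
in_range_last_six = sum(1 for s in spreads[:6] if 4 <= s <= 7)

# Tied polls and polls with Gillespie ahead (a negative spread).
tie_count = sum(1 for s in spreads if s == 0)
gillespie_lead_count = sum(1 for s in spreads if s < 0)

print(len(spreads), in_range_count, in_range_last_six, tie_count, gillespie_lead_count)
```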

And yet, here was the headline on Taegan Goddard’s otherwise-reliable Political Wire on September 19, 2017, referring to the just-released University of Mary Washington (Northam +5) and Suffolk polls (Even): Race For Virginia Governor May Be Close.

Granted, the two polls gave Northam an average lead of only 2.5 percentage points, which, without context, suggests a close race on Election Day. Furthermore, all three Political Wire Virginia governor’s race poll headlines since then have been on the order of: Northam Maintains Lead In Virginia.

Here is the thing, however. Most people (as I did) will equate “close” with “toss-up.” But there is a huge difference between “we have no idea who is going to win because the polls average out to a point or two either way” and “one candidate consistently has the lead, but the margin is relatively narrow.”

The latter is clearly the case in the 2017 Virginia governor’s race, with Northam’s lead averaging 4.4 percentage points in eight September polls within a narrow range (standard deviation [SD]=3.3). We are still more than five weeks from 2017 Election Day (November 7), so this is unlikely to be “herding,” the tendency of some pollsters to adjust their demographic weights and turnout estimates to avoid an “outlier” result (undermining the rationale for aggregating polls in the first place).

The problem comes when members of the media try to interpret the results of individual polls. They have absorbed the lesson of the “margin of error” (MoE) almost too well.

For example, the Monmouth poll conducted September 21-25, 2017 gives Northam a five percentage point lead, with a 4.4 percentage point MoE. Applying that MoE to both candidates’ vote estimates, we have 95% confidence that the “actual” result (if we had accurately surveyed every likely voter, not a sample of 499) is somewhere between Gillespie 48.4, Northam 44.6 (Northam down 3.8) and Northam 53.4, Gillespie 39.6 (Northam up 13.8). It is this range of possible outcomes, from a somewhat narrow Gillespie victory to a comfortable Northam win, that leads members of the media to imply through oversimplification that this race will be close, meaning “toss-up.”
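The two endpoints come from pushing each candidate's share to the edge of the MoE in opposite directions. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def margin_range(leader_share, trailer_share, moe):
    """Return (worst, best) margin for the leader when the MoE is applied
    to both candidates' shares in opposite directions."""
    worst = (leader_share - moe) - (trailer_share + moe)
    best = (leader_share + moe) - (trailer_share - moe)
    return round(worst, 1), round(best, 1)


# Monmouth, September 21-25: Northam 49, Gillespie 44, MoE 4.4.
monmouth_range = margin_range(49, 44, 4.4)
print(monmouth_range)
```

Note that this stretches the lead by twice the MoE in each direction, which is why a five-point lead ends up looking as though it could be anything from a deficit to a blowout.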

And yet, even within this poll, the probability (using a normal distribution, mean=5.0, SD=4.4) that Northam is ahead at all, even by as little as 0.0001 percentage points, is 87.2%, making him a 7:1 favorite, about what Hillary Clinton was on Election Day 2016.

OK, maybe that was not the best example…
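For the record, the 87.2% figure can be reproduced with nothing more than the normal CDF. The sketch below takes the post's setup at face value (true lead normally distributed with mean 5.0 and SD equal to the 4.4-point MoE); the function name is an illustrative choice.

```python
from math import erf, sqrt


def prob_lead_positive(mean_lead, sd):
    """P(lead > 0) when the lead is Normal(mean_lead, sd): the standard
    normal CDF evaluated at mean_lead / sd."""
    return 0.5 * (1.0 + erf(mean_lead / (sd * sqrt(2.0))))


# Monmouth poll: Northam +5 with the 4.4-point MoE used as the SD.
p_northam_ahead = prob_lead_positive(5.0, 4.4)
odds = p_northam_ahead / (1.0 - p_northam_ahead)

print(f"P(Northam ahead) = {p_northam_ahead:.3f}, odds about {odds:.1f} to 1")
```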

But when you aggregate the eight September polls, the MoE drops to about 1.3[3], putting the probability Northam is ahead at well over 99%. Even if the MoE only dropped to 3.0, the probability of a Northam lead would still be about 93%.
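The shrinking MoE follows from the usual rule that sampling error scales with one over the square root of the sample size. Here is a rough sketch of that scaling and of the two probabilities quoted above, assuming (as the 93% figure implies) a mean lead of 4.4 points and, as elsewhere in this post, the MoE standing in for the SD. The helper names are hypothetical. One caveat: the footnote starts from 4.1; starting from the 4.4 reported for the Monmouth poll gives about 1.4 rather than 1.3, which does not change the conclusion.

```python
from math import erf, sqrt

# Sample sizes of the eight September polls in the table.
september_samples = [499, 596, 776, 507, 850, 500, 625, 562]


def scaled_moe(single_moe, single_n, pooled_n):
    """Shrink a single poll's MoE by the square root of the sample-size ratio."""
    return single_moe / sqrt(pooled_n / single_n)


def prob_ahead(mean_lead, sd):
    """P(lead > 0) for a Normal(mean_lead, sd) lead."""
    return 0.5 * (1.0 + erf(mean_lead / (sd * sqrt(2.0))))


pooled_n = sum(september_samples)
moe_from_footnote = scaled_moe(4.1, 499, pooled_n)  # the footnote's starting value
moe_from_monmouth = scaled_moe(4.4, 499, pooled_n)  # Monmouth's reported MoE

print(pooled_n, round(moe_from_footnote, 2), round(moe_from_monmouth, 2))
print(f"lead 4.4, SD 1.3: {prob_ahead(4.4, 1.3):.4f}")
print(f"lead 4.4, SD 3.0: {prob_ahead(4.4, 3.0):.4f}")
```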

My point is this. Every poll needs to be considered not just as an item in itself (polls as NEWS!) but within the larger context of other polls of the same race. And in the 2017 Virginia governor’s race, the available polling paints a picture of a narrow but durable lead for Northam.

I have no idea who will be the next governor of Virginia. But a careful reading of the data suggests that, as of September 29, 2017, Lt. Governor Ralph Northam is a heavy favorite to be the next governor of Virginia, despite being ahead “only” 4 or 5 percentage points.


Finally, here is an update on this post about the Democrats’ chances of regaining control of the United States House of Representatives (House) in 2018.

Out of curiosity, I built two simple linear regression models. One estimates the number of House seats Democrats will gain in 2018 only as a function of the change from 2016 in the Democratic share of the total vote cast in House elections. The Democrats lost the total 2016 House vote by 1.1 percentage points, so if they were to win the 2018 House vote by 7.0 percentage points, that would be an 8.1 percentage point shift.

Right now, FiveThirtyEight estimates Democrats have an 8.0 percentage point advantage on the “generic ballot” question (whether a respondent would vote for the Democratic or the Republican House candidate in their district if the election were held today).

My simple model estimates a pro-Democratic House vote shift of 9.1 percentage points would result in a net pickup of 26.7 House seats, a few more than the 24 they need to regain control. The 95% confidence interval (CI) is a gain of 17.0 to 36.4 seats.

But the probability that Democrats net AT LEAST 24 House seats is 71.1%, making the Democrats 5:2 favorites to regain control of the House in 2018.

My more complex model adds a variable that is simply 1 for a midterm election and 0 otherwise, as well as the product of this “dummy” variable and the change in Democratic House vote share. I hypothesized (correctly) that this relationship would be stronger in midterm elections.
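Written out, a model of that shape would look something like the following. The symbols are chosen here for illustration (the post names none): S is the net Democratic seat change, ΔV the change in Democratic House vote share, and M the midterm dummy.

```latex
S = \beta_0 + \beta_1\,\Delta V + \beta_2\,M + \beta_3\,(\Delta V \times M) + \varepsilon
```

In a presidential year (M = 0) the slope on ΔV is β₁; in a midterm (M = 1) it is β₁ + β₃, so a positive β₃ is what “stronger in midterm elections” would mean.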

This model estimates that a 9.1 percentage point increase from 2016 in the Democratic share of the House vote would result in a net gain of 31.8 seats. However, with two additional independent variables (and only 24 data points), the 95% CI is much wider, from a loss of 7.0 seats to a history-making gain of 68.3 seats.

Still, this translates to a 66.1% probability (2:1 favorites) the Democrats regain the House in 2018.
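The post does not spell out how these probabilities were computed. One rough way to land in the same neighborhood is to back a standard error out of each reported 95% interval and apply a normal approximation. That is an assumption made for this sketch, not necessarily the method behind the 71.1% and 66.1% figures, and it comes out slightly lower, at about 71% and 66%.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_at_least(threshold, estimate, ci_low, ci_high):
    """P(outcome >= threshold), treating the outcome as normal around
    `estimate` with a standard error implied by the 95% interval width."""
    standard_error = (ci_high - ci_low) / (2 * 1.96)
    return 1.0 - normal_cdf((threshold - estimate) / standard_error)


# Simple model: 26.7 seats, 95% CI 17.0 to 36.4; Democrats need 24.
p_simple = prob_at_least(24, 26.7, 17.0, 36.4)

# Model with midterm dummy and interaction: 31.8 seats, 95% CI -7.0 to 68.3.
p_interaction = prob_at_least(24, 31.8, -7.0, 68.3)

print(f"simple model: {p_simple:.3f}")
print(f"interaction model: {p_interaction:.3f}")
```

The wider interval is what drags the second probability down: the point estimate is higher, but far more of the distribution falls below 24 seats.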

Figure 1 shows the estimated probability the Democrats regain the House in 2018 using both models and a range of percentage point changes in House vote share from 2016.

Figure 1: Probability Democrats Control U.S. House of Representatives After 2018 Elections Based Upon the Change in Democratic Share of the House Vote, 2016-18


The simple model (blue curve) gives the Democrats no chance to recapture the House in 2018 until the pro-Democratic change in vote share reaches 6.5 percentage points, after which the probability rises sharply and dramatically to a near-certainty at the 10.0 percentage point change mark. The more complex model (red curve), meanwhile, assigns steadily increasing chances for the Democrats, flipping to “more likely than not” at the 7.0 percentage point change mark; even at a truly historic 15 percentage point change, the complex model only gives the Democrats an 85.3% chance to recapture the House in 2018.

For the record, I lean toward the more complex model.

It is worth noting that in the current FiveThirtyEight estimate, 15.8% of the electorate is undecided or chose a third-party candidate (when that was an option). If the undecided vote breaks heavily toward the party not controlling the White House in a midterm election (one way electoral “waves” form), a 66-71% probability would likely be an underestimate of the Democrats’ chances of regaining control of the House in 2018.

And…apropos of nothing…Happy 51st Birthday to me (September 30, 2017)!!

Until next time…

[1]  To be honest, I do not recall where I got this number from…possibly from fivethirtyeight.com or maybe from https://betting.betfair.com/politics/us-politics/…

[2] Accessed September 28, 2017

[3] The total number of voters sampled across these eight polls is 4,915, which is 9.85 times higher than the 499 sampled in the Monmouth poll. The square root of 9.85 is 3.14. Dividing 4.1 by 3.14 gives you 1.31.

Wait, when were you born??

I notice with some chagrin that I have only posted once (a paean to the late, great Walter Becker of Steely Dan) since August 26, 2017, which I regret, despite my assertion when I launched this blog that I would only post when I had something to say.

There are two reasons (but not, as my wife would correctly observe, excuses) for this prolonged (by blog standards) absence.

One reason is simply that I had been working closely with the Film Noir Foundation over the last few months to bring a satellite NOIR CITY festival to the Boston area. The reward for this hard work is that I am incredibly excited to announce that the Brattle Theatre will be hosting the first-ever NOIR CITY Boston over the weekend of June 8-10, 2018!


Please share this post with anyone you think would be interested in attending this festival; feel free to reach out to me for more information using the Contact information posted on the main page of this blog.

The more profound reason for this month-long dearth of new posts, however, is that I have been hip deep in researching my book (working title: Interrogating Memory: How a Love of Film Noir Led Me to Investigate My Identity). What started as the “simple” expansion of this post, about how I became such an ardent film noir fan, into a full-length book has somehow morphed into a deep-dive exploration of previously unknown family history.

Conducting this research has meant spending many joyful hours playing genealogy detective on Ancestry.com and Newspapers.com, supplemented by Google Maps and books like Allen Meyers’ 2001 labor-of-love contribution to the “Images of America” series, The Jewish Community of West Philadelphia.[1] I have also been examining old photographs and reaching out to family members, some I have known my entire life and others I am only just discovering.

If you have followed my blog over the last few months (especially here, here, here and here), you are already familiar with some of what I have learned.

When you step back to look at the larger context, meanwhile, this is how the morphing occurred.

I first hypothesized that my devotion to film noir stemmed primarily from four pre-adult roots (leaving aside the still-unfolding “noir” story of my conception and adoption, which I plan to discuss further in a later post):

  • Discovering, and instantly loving, detective fiction at the age of seven or eight,
  • Discovering, and instantly loving, the Fox Charlie Chan films of the 1930s and early 1940s,
  • Exposure to a wide variety of classic and modern films through the six film societies operating at Yale when I was a student there (1984-88), and
  • Growing up a night owl in the Philadelphia suburbs, which meant that the nocturnal city was, and is, innately alluring and fascinating to me.

The first and third “roots” are endogenous, solely a product of my actions.

I soon realized, though, that the second and, to a lesser extent, fourth “roots” are inextricably tied to my relationship with my father and the circumstances of our life in the summer of 1976, when I watched my first Charlie Chan film. That summer, my father was on the brink of losing the carpet business passed down to him by his father and uncle when he was 23, just before he married my mother in January 1960. The John Rhoads Co. (founded 1886 in West Philadelphia) had been taken over by his father and uncle in the mid-1920s and built into an even greater success. Running that business allowed my parents to move to the leafy, middle class suburb of Havertown in 1963. I can only imagine how much it pained and haunted my father that he was the one who ultimately bankrupted it, acting upon his own demons (gambling, primarily).

The point being (and thank you for continuing to “just bear with me”) that understanding my love of film noir means, in part, understanding my love of Fox Charlie Chan films and my suburban upbringing, which requires understanding the backstory of my father and mother. And understanding THAT history is what led me to become a “family archaeologist.”

One unexpectedly pleasant side benefit of this research has been opening the “black box” that is my knowledge of my father’s side of the family. My father was estranged from his own family (due to their disapproval, I had always thought, though now I suspect a kind of self-estrangement), and he rarely spoke about them, invaluable surviving school genealogy projects notwithstanding.

For example, while tracking down one particularly fascinating and deeply noir story about the death of my great-grandfather David Louis Berger in October 1919 (yes, this is a “teaser”), I came across the first photograph of him I have ever seen:

David Louis Berger (1869-1919)

All of which brings me to an example of how difficult it can be to pin down biographical information even when using seemingly unimpeachable sources.


In an early chapter of my book, after discussing the Pale of Settlement and the thriving Jewish community that dominated West Philadelphia in the middle decades of the 20th century, I begin to tell four stories (one for each grandparent) of immigration from the Pale of Settlement to West Philadelphia between 1893 and 1912. For it is a curious fact that a) both of my grandfathers were born in the Pale of Settlement (one in what is now eastern Poland and one in what is now Ukraine), as were the parents of both of my grandmothers (Latvia, Ukraine) AND b) all four families ultimately settled in one of the neighborhoods which comprised (or adjoined) West Philadelphia.

Now, in order to discuss, say, my father’s father Morris’s birth and emigration (as a roughly five-year-old) from the Pale of Settlement, it would help to start with his date of birth.

And here is where things get interesting (by which I mean “make a meticulous researcher want to pull her/his hair out, strand by strand”).

Since my father died in June 1982, I have visited his grave at least once every year. He is buried in a two lot, eight grave section, along with his parents, his father’s brother Jules and his father’s sister Anna.

Since Jules was born in the United States, and the salient details were recorded in an official Birth Certificate issued by the Commonwealth of Pennsylvania, we can set him aside for the rest of this post.

Clearly marked on Morris Berger’s grave is a date of birth: August 5, 1893. This is also the date listed on his death certificate. That said, he listed August 5, 1891 as his date of birth on this World War I Draft Registration card; he also said he was a natural-born citizen, a blatant lie most likely stemming from a desire to distance himself from his Teutonic last name (the US having just declared war on Germany in April) and his “Russian” origins (the Russian Revolution having begun in February). On the same card, he said he was the sole provider for his mother and father (perhaps due to his better English; his parents spoke heavily-accented Yiddish); making himself two years older would have made that assertion more plausible.

I do not have Anna Berger Halbert’s death certificate, but her gravestone lists her date of birth as February 15, 1900. As a side note, she died on December 19, 1999, meaning that the span of her life was nearly perfectly coextensive with the 20th century.

Morris was the eldest of five children. The two we have not yet met are Rose (born June 15, 1895 according to a 1963 Social Security application) and Mary, aka Mae (born sometime between late July/early August 1898 and late July/early August 1899, per an obituary in the August 5, 1994 edition of the Philadelphia Jewish Exponent). I am oddly proud of the fact that all three of my paternal grandfather’s sisters lived well into their 90s (although neither brother lived past 62).

To recap, I had legitimate reason to believe that Morris was born in August 1893, Rose was born in June 1895, Mae was born sometime between late July 1898 and early August 1899, and Anna was born in February 1900.

Then, to my delight, I came across my great-grandfather David Louis Berger’s 1906 Petition for Naturalization; reading it actually gave me goosebumps.

In this application, “Louis” Berger records his birth in the town of Przasnysz (in what is now eastern Poland), his April/May 1898 journey to Philadelphia (by way of Quebec) on the SS Tungorahra (if I read my great-grandfather’s elegant handwriting correctly) [ed. note: This was almost certainly the Tongariro, which Louis Berger believed departed {likely from Liverpool} on May 5, 1898], his absolute renunciation of Czar Nicholas II (listed as “Nicholas II, Emperor of Russia”), and his October 1906 residence in Philadelphia (2241 Callowhill Street, razed 100 years ago to allow for the construction of the Benjamin Franklin Parkway).

He also carefully records the dates of birth of his four eldest children as follows:

Morris: October 3, 1892

Rosa: June 14, 1894

Mary: July 5, 1896

Annie: February 15, 1898

I’m sorry…what???

These dates of birth are anywhere from 10 months (Morris) to two years (Mae, Anna) earlier than what other public records claim.

Oh, but wait, it gets better.

A particularly invaluable resource for my research has been the detailed, house-by-house United States Censuses conducted in 1910, 1920, 1930, and 1940. I have found records of all four siblings in each Census, excepting Mae in the 1940 Census.

Based on the listed ages and Census enumeration dates, here are the possible ranges of dates of birth for the four Berger siblings:

Morris: April 16, 1891 to April 26, 1894, a gap of three years, 10 days

Rose: January 4, 1893 to April 28, 1895, a gap of two years, 114 days

Mae: April 16, 1896 to April 8, 1898, a gap of one year, 357 days

Anna: April 16, 1898 to April 16, 1902, a gap of four years, 0 days

That is quite a range of possible dates of birth (average gap=2 years, 303 days) which could have resulted from simple transcription errors, poor arithmetic and/or faulty memory. It is also possible that there was genuine uncertainty on the part of David Louis Berger and his wife Ida (née Rugowitz) as to the exact dates on which their first four children were born as they attempted to translate those dates from the Hebrew calendar to the Gregorian calendar.
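For anyone checking the arithmetic, the “years, days” gaps can be computed by counting whole years from the earliest possible date and then the leftover days. A minimal sketch using the four ranges listed above; the function name is hypothetical, and it assumes the earlier date is not February 29.

```python
from datetime import date


def gap_years_days(earliest, latest):
    """Return (whole years, remaining days) between two dates.
    Assumes `earliest` is not February 29, which holds for the dates here."""
    years = latest.year - earliest.year
    # Step back a year if the anniversary has not yet been reached.
    if (latest.month, latest.day) < (earliest.month, earliest.day):
        years -= 1
    anniversary = earliest.replace(year=earliest.year + years)
    return years, (latest - anniversary).days


# Earliest and latest possible dates of birth implied by the Census records.
birth_ranges = {
    "Morris": (date(1891, 4, 16), date(1894, 4, 26)),
    "Rose": (date(1893, 1, 4), date(1895, 4, 28)),
    "Mae": (date(1896, 4, 16), date(1898, 4, 8)),
    "Anna": (date(1898, 4, 16), date(1902, 4, 16)),
}

gaps = {name: gap_years_days(lo, hi) for name, (lo, hi) in birth_ranges.items()}
for name, (years, days) in gaps.items():
    print(f"{name}: {years} years, {days} days")
```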

But that raises the question of why Morris, Rose, Mae and Anna all adopted, for the purposes of American records, later dates of birth.

One possible clue is the fact that the Census-recorded date of arrival in the United States shifts from 1898 (the official date on the Petition is May 5, 1898) on the 1910 Census to 1900 on all subsequent Censuses. A related clue is that Mae and Anna are listed on the 1910 and 1920 Censuses as having been born in “Russia,” which then becomes “Pennsylvania” as of the 1930 Census.

If I were a conspiratorial type, I would suspect that the birth dates on the Petition were reverse-engineered, for some unknown reason, to conform to the stated arrival in Philadelphia sometime after May 5, 1898 of a family of six born in Przasnysz. However, the fact that the Petition (dated October 26, 1906) clearly states that the petitioner need only have lived in the United States continuously for five years (not, say, eight) throws cold water on this notion.

Ultimately, I will never know the exact dates on which my paternal grandfather and his three younger sisters were born. The precise dates do not REALLY matter to my larger narrative, though the lack of precision nags at me, and I will almost certainly use the later “American” dates in my book, despite what my great-grandfather wrote on his Petition for Naturalization.

All that really matters is that these six brave souls, a married Yiddish-speaking couple of modest means in their late 20s and their four (or three or two) Yiddish-speaking children, the eldest only about five years old, braved an Atlantic crossing to build new lives in the welcoming city of Philadelphia (albeit, where some siblings and cousins were already residing).

Had they not done so, my life would have turned out just a bit differently.

Until next time…

[1] Charleston, SC: Arcadia Publishing.

How do I love Steely Dan? Let me count…a whole lot of stuff and such.

Sometime in the spring of 1977 (probably), my mother found herself in a suburban Philadelphia record store. Maybe it was the (now long-since-gone) Sam Goody store on Lancaster Avenue in Ardmore. We were living only a short drive away in Havertown at the time, so why not?

My then-39-year-old mother rarely paid attention to music playing in stores, and she was heavily into her Cat Stevens/Neil Diamond phase. But for some reason, on this day, the music coming out of the ceiling speakers caught her attention.

She asked a store employee who the artist was.

“Steely Dan” was the response.

Not normally impulsive, my mother bought a copy of Steely Dan’s 1975 album Katy Lied on the spot.

Forty years later, and more than 42 years after its March 1975 release, I still have that album.


A few years earlier, I had begun to listen for hours on end to Philadelphia’s WIFI-92 FM, a mix of top 40 and rock album tracks (meaning, they would play Led Zeppelin’s “Stairway to Heaven,” even though it had never been released as a single).

By the time I was ten years old, and my mother was impulse-buying Katy Lied, I had at least a passing familiarity with Steely Dan standards like “Do It Again,” “Dirty Work” and “Reelin’ in the Years” from the 1972 album Can’t Buy a Thrill, “My Old School” from the 1973 album Countdown to Ecstasy, and “Rikki, Don’t Lose That Number” from the 1974 album Pretzel Logic. I was probably not as familiar at that time with tracks I would later come to love (“Kid Charlemagne,” “Don’t Take Me Alive,” “The Royal Scam”) from the 1976 album The Royal Scam.

And the magisterial Aja, which would dominate the worlds of pop and rock for years to come, would not be released until the following September.

No, in the summer of 1977, as my mother and I were recovering from her recent separation from my father and our move from a three-bedroom house to a smaller two-bedroom apartment, it was songs like the propulsive “Black Friday” and the laidback “Bad Sneakers” that were in heavy rotation on our newly-purchased stereo system.

From the album’s liner notes, I learned that Steely Dan was actually the singer-songwriter duo of Donald Fagen and Walter Becker. Fagen sang and played keyboards and saxophone, while Becker—who died on September 3, 2017 at the age of 67—played bass and guitar. They employed a wide range of studio musicians (including many future members of Toto) to play on their albums.

Becker died at his home in Maui, Hawaii. In 2001, a college friend of mine and I often frequented the bar at a Philadelphia-area TGI Friday’s. We became friendly with a young woman who tended bar there. One day she told us about the time she had been on Maui “with Steely Dan” enjoying some particularly potent “Maui Wowie.” She may well have meant the late Walter Becker.

My mother often made the same mistake, thinking there was a person named “Steely” Dan. I would remind her that it was actually a group, but I do not recall explaining that “Steely Dan” was originally the name given to a metallic dildo in the William Burroughs novel Naked Lunch.

The Katy Lied LP (“long-playing record”) and a handful of radio staples—including “Peg,” “Deacon Blues” and “Josie” from the ever-listenable Aja, and the title track from the 1978 film FM (a serious guilty pleasure)—were the sum total of my Steely Dan knowledge for the next three-plus years.

In November 1980, Steely Dan released Gaucho, their first new studio album in more than three years. The first single released from Gaucho was the shimmering “Hey Nineteen.” It had such a pure, clean sound that I quickly bought the single, which I still have to this day.


I became equally fond of the B-side, a live recording of their 1973 song “Bodhisattva,” featuring one of the truly epic band introductions ever.


This was just as I was beginning high school, and my musical horizons were rapidly expanding. In fact, I became such a frequent visitor to the renowned used record store Plastic Fantastic in Bryn Mawr that I was one of the only customers allowed to pay with a personal check.

It was there that I bought used copies of Can’t Buy a Thrill and Aja around 1983, having already taped “Reelin’ in the Years” off the radio in February 1982.




Just bear with me while I present a brief history of 36 years of “mix” making.

In August 1981, I had just bought a dozen or so new LPs, including (relatively) recent releases by Steve Winwood, The Moody Blues, Foreigner, Phil Collins, Peter Gabriel, Fleetwood Mac and The Cars (side note: I still regret selling my original vinyl copy of Foreigner 4), and I decided to put my new (monaural) tape recorder to work.

By which I mean: I would put on an album, put the tone arm down on the start of a track, put a stereo speaker next to the tape recorder and hit “RECORD.” This required absolute silence during the playing of the track, of course.

But when I was finished, I had produced a 60-minute Maxell cassette containing 15 or so tracks (the cassette and its clear plastic case are long gone, so I am relying upon my memory) that I cleverly called My Stuff. The tracks were grouped by artist (an average of 2 tracks per artist) in no particular order.

That first labor-of-love “mix” cassette was only the beginning. Over the next 35 years, I would produce 309 such mixes, getting progressively more sophisticated both in terms of technology (a series of beloved Walkman portable cassette players, Dolby noise reduction, high-end turntables, CD burners, digital CD burning, downloads) and the ordering of tracks (tracks should have a clean musical “flow,” no back-to-back tracks by the same artist except on rare occasions, alternating “new” tracks with “reruns”).

Originally, mixes were a mish-mash of tracks from my personal collection, LPs and cassettes borrowed from friends and relatives, and songs taped off of the radio (or, for a handful of “video” mixes, off of MTV or VH1). I spent countless hours in the summer of 1985, my first summer home from Yale, flipping back and forth between radio stations, creating Summer ’85 Vol I-VIII.

That fall, I abandoned the ad hoc nomenclature, as I christened an October 1985 mix cassette Stuff and Such Vol I. With few exceptions, every mix cassette from then through the May 2003 Stuff and Such Vol LXXXIX followed that naming convention. In August 2003, with the bluntly-titled CD Stuff Vol I, I abandoned cassettes entirely.

Over time, mixes shifted from primarily “one-off” creations (recording new favorite songs to play on my Walkman, filling in the rest of the cassette with old favorites) to carefully-planned multi-cassette productions, following my own evolving set of rules, specifically intended to be played on vacations or other long car drives. By 2005, I had permanently switched to creating playlists on my computer and rapidly burning those onto a CD; this meant that rather than being limited by time to at most four cassettes, I could create and burn as many as 12 CDs at one go. And by 2014, I had abandoned CDs entirely in favor of simply creating iPod playlists (each corresponding to a CD, as I liked to alternate “rocking” with “mellow” CDs) I could play in my car through a cassette adapter.


When I graduated from high school in 1984, one of my two yearbook quotes was this line from “Reelin’ in the Years”:

The things that pass for knowledge/I can’t understand

When I arrived at Yale that fall, among the dozens of LPs and 45 RPM records I brought with me were the three Steely Dan albums listed above, as well as the “Hey Nineteen/Bodhisattva (live)” single. Thus, I could record the tracks that open Side 1 of Can’t Buy a Thrill (“Do It Again,” “Dirty Work,” “Kings”) onto Yet More Good Stuff Vol I in January 1985. Among the tracks I taped off the radio for Summer ’85 Vol I-VIII were “FM (No Static At All)” and “Rikki, Don’t Lose That Number.”

When I moved from Philadelphia to Boston in September 1989 to attend my first doctoral program (the one I left ABD after six years), I had already created 56 mix cassettes containing 919 unique tracks covering 1,121 total “slots.” The six Steely Dan tracks, each occupying only one slot, accounted for 0.7% of all tracks and 0.5% of all slots[1]. Those six tracks put Steely Dan in a 10-way tie for 32nd, and those six occupied slots put them in a six-way tie for 44th; both totals were far above the overall artist averages of 2.6 tracks and 3.2 slots. Thus, when you look only at the number of tracks finding their way onto a mix cassette, Steely Dan was already one of my 40 or 50 favorite artists. Throw in the three LPs and one single, and they may well have been ranked even higher.

Early in 1990, I bought my first CD player, which meant I needed to start acquiring CDs. So I joined Columbia House (or was it BMG Music?), and I received my 11 or 12 CDs for one cent (plus shipping and handling). I do not recall which CD I was later required to buy at a 200+% markup.

One of those CDs was the compilation A Decade of Steely Dan. Interestingly, when I next recorded a Steely Dan track onto a mix cassette (Stuff and Such Vol XVI; April 1990) it was not from that CD. Rather, it was “Josie” from my vinyl copy of Aja.

And that was it for the next nine years in terms of Steely Dan. I would play Decade from time to time, but Steely Dan somewhat faded into the background for me as I spent most of my time listening to Boston’s pioneering alternative station WFNX (101.7 FM) before abandoning listening to music on the radio altogether around 1997.


In January 1993, I began to muse on what my favorite songs, albums and artists were, and I realized that I had a seemingly straightforward quantitative way to answer the question: tally up all of the times a track had appeared on a mix cassette.

The logic was simple. The more I liked a track, the more times I would have recorded it onto a mix cassette. And the more tracks from an album I had recorded onto a mix cassette, and the more slots those tracks occupied, the more I liked the album. And the more tracks by an artist that appeared, and the more slots they occupied, coupled with the number of albums I owned by that artist, the more I liked that artist.
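For the curious, here is a minimal Python sketch of that tallying logic. It is only an illustration, not the spreadsheet I actually used: the mix names, artists and tracks are hypothetical placeholders, and the `tally` function is a name I made up for this example.

```python
from collections import Counter

# Hypothetical mixes: each is a list of (artist, track) pairs.
# These entries are placeholders, not real track lists.
mixes = {
    "Mix A": [("Artist X", "Track 1"), ("Artist Y", "Track 3")],
    "Mix B": [("Artist X", "Track 1"), ("Artist X", "Track 2")],
    "Mix C": [("Artist Y", "Track 3"), ("Artist Y", "Track 4")],
}

def tally(mixes):
    """Return (slots per track, unique tracks per artist, slots per artist)."""
    # Every appearance of a track on any mix counts as one "slot".
    track_slots = Counter()
    for entries in mixes.values():
        track_slots.update(entries)
    # Unique tracks per artist: count each distinct (artist, track) key once.
    artist_tracks = Counter(artist for artist, _ in track_slots)
    # Slots per artist: sum the slot counts of that artist's tracks.
    artist_slots = Counter()
    for (artist, _), slots in track_slots.items():
        artist_slots[artist] += slots
    return track_slots, artist_tracks, artist_slots

track_slots, artist_tracks, artist_slots = tally(mixes)
for (artist, track), slots in track_slots.most_common():
    print(artist, track, slots)
```

Sorting tracks by slots gives a "favorite tracks" list, while the two artist counters give the raw material for an artist ranking. Album ownership would have to be layered on top separately.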


The reality was not quite that simple, but I spent a very entertaining week or two compiling a list of every track appearing on each mix cassette (starting with the Boston Drive mixes, as I had not bothered to bring the previous 56 mix cassettes to Boston with me[2]).

The end product was a list of my 100 favorite tracks; the corresponding lists of my Top 50 albums and Top 75 artists no longer survive.

However, here is where Steely Dan ranked throughout the 1990s, as this became an annual project:

  • #51 in January 1994 (favorite track “Josie”, favorite album Aja)
  • #50 in Summer 1995 (favorite track “Rikki, Don’t Lose That Number”)
  • #58 in Summer 1996 (favorite track “Rikki, Don’t Lose That Number”)
  • #42 in Summer 1999 (favorite track “Kings,” favorite album Can’t Buy a Thrill)

In August 1997, I treated myself to a brand new Panasonic turntable, and I began to rediscover my vinyl. One album I pulled out was that copy of Katy Lied my mother had bought 20 years earlier.

I know this because in March 1999, “Doctor Wu,” the last track on Side 1, became the first new Steely Dan track to appear on a cassette mix (Stuff and Such Vol LXVI) in nearly nine years. While I no longer have copies of the charts, I recall that in 2000, “Doctor Wu” became the first Steely Dan track ever to crack my Top 100, somewhere around 85 or so.

I must have continued to play that album side often, because the next new Steely Dan track to appear on a mix was its third track, “Rose Darling” (Stuff and Such Vol LXXXVIII; May 2003).

That fall, inspired by a recent VH1 “Steely Dan Storytellers” performance, I purchased a CD copy of The Royal Scam, my first new Steely Dan record in more than 13 years. In November 2003, “Don’t Take Me Alive” and “Kid Charlemagne” appeared on CD Stuff Vol IV and Vol V, respectively. “The Royal Scam” then appeared on CD Stuff Vol VII in April 2004.

Meanwhile, in late December 2003, having purchased a powerful new computer complete with the latest copies of Microsoft Excel and SPSS (my preferred statistical software package), I embarked upon my most ambitious project yet to determine my favorite tracks, albums and artists. I gathered together the 54 of the original 56 mix cassettes I still had (or still had a record of) and set to work creating a brand new Excel workbook compiling every piece of information I had.

This is how I calculated that, in January 2005, Steely Dan was now my 18th favorite artist overall.

Over the next ten years, I would record an additional seven Steely Dan tracks on various CD mixes: “Black Friday” (March 2006), “Time Out of Mind” (April 2008), “Deacon Blues” and “Hey Nineteen” (December 2008), “Peg” (June 2012), and “Aja” and “Midnite Cruiser” (May 2014).

As of August 2016, there were 19 Steely Dan tracks (0.6% of ~3,380[3]) occupying 33 slots (0.5% of 6,338) across the 309 mix cassettes, videos and CDs, well above the artist averages of 3.5 tracks and 6.5 slots. In fact, Steely Dan is in a four-way tie for 16th in tracks and in a three-way tie for 19th in slots.
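The percentages quoted for 1989 and 2016 are simple shares, and a short Python sketch reproduces them from the counts given in this post. The `share` helper and the `snapshots` dictionary are names invented for this illustration, and the 2016 track total is my estimate rather than an exact count.

```python
def share(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)

# Counts as quoted in the post; the 2016 track total (~3,380) is an estimate.
snapshots = {
    "September 1989": {"sd_tracks": 6, "tracks": 919, "sd_slots": 6, "slots": 1121},
    "August 2016": {"sd_tracks": 19, "tracks": 3380, "sd_slots": 33, "slots": 6338},
}

for label, s in snapshots.items():
    track_share = share(s["sd_tracks"], s["tracks"])
    slot_share = share(s["sd_slots"], s["slots"])
    print(f"{label}: {track_share}% of tracks, {slot_share}% of slots")
```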


Since 2005, I have not performed a complete analysis of my favorite tracks, albums and artists, mostly due to a lack of time to regularly update the Excel workbook.

However, there is a quick and dirty way to assess how much I like Steely Dan, relative to other artists, based on the data I do have entered: multiply each artist’s number of mix cassette/video/CD tracks or slots by the number of albums I own by that artist.

A quick word about owning “albums.” A few years ago, I “cleaned” my iTunes data by obtaining, for each track (n=9,500, as of this post), its correct title, the studio album (or single) on which it was released (as well as an image and number of tracks), its position on that album, its year of release, and its “genre.”

For example, having just purchased from iTunes the four tracks on Gaucho I had not already owned, I now own five complete Steely Dan studio albums. I also own two tracks each from Countdown to Ecstasy and Pretzel Logic, as well as the FM title track and the live recording of “Bodhisattva.” That makes an additional six tracks. If we assume that a typical album has 10 tracks, that would be an additional 6/10, or 0.6, albums. That means that I “own” 5.6 Steely Dan albums.

Multiplying 19 by 5.6 yields 106.4, which ranks 14th overall, making Steely Dan one of only 15 artists to crack 100 (Genesis, at 2735.4, laps the field). Similarly, multiplying 33 by 5.6 yields 184.8, which ranks 19th overall.
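As a sanity check, the quick-and-dirty score can be written out in a few lines of Python. This is just a sketch of the arithmetic described here: the function names are made up for illustration, and the 10-tracks-per-album figure is the rough assumption stated above, not a measured average.

```python
TYPICAL_ALBUM_TRACKS = 10  # rough assumption from the text, not a measured value

def album_equivalents(complete_albums, stray_tracks, per_album=TYPICAL_ALBUM_TRACKS):
    """Complete albums owned plus stray tracks converted to fractional albums."""
    return complete_albums + stray_tracks / per_album

def score(count, albums):
    """Tracks (or slots) multiplied by albums 'owned', rounded to one decimal."""
    return round(count * albums, 1)

# Steely Dan: 5 complete albums plus 6 stray tracks; 19 tracks in 33 slots.
albums = album_equivalents(5, 6)
track_score = score(19, albums)
slot_score = score(33, albums)
print(albums, track_score, slot_score)
```

The rounding matters only because floating-point multiplication can leave a stray digit many decimal places out. The underlying values are the 106.4 and 184.8 quoted above.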

By a variety of measures, then, Steely Dan still ranks somewhere among my top 20 artists.

And that ranking might improve slightly when I next compile lists of my favorite tracks, albums and artists.

Of the 39,499 total track “plays” on iTunes (or my iPod) since I bought my current computer in January 2013, 492 were of Steely Dan tracks (n=49). This ranks 4th overall, behind only Genesis, Miles Davis and Stan Ridgway, my top three artists overall. And while all 9,500 tracks have been played an average of 4.2 times over this period, the average Steely Dan track has been played 10.0 times, which ranks 11th (among the 327 artists with 10 or more tracks).
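Those two averages are just plays divided by tracks, as this small Python sketch shows. The `plays_per_track` helper is a name chosen for illustration; the inputs are the counts quoted above.

```python
def plays_per_track(plays, tracks):
    """Average number of plays per track, rounded to one decimal place."""
    return round(plays / tracks, 1)

# Counts quoted in the post for January 2013 onward.
overall_avg = plays_per_track(39_499, 9_500)
steely_dan_avg = plays_per_track(492, 49)
print(overall_avg, steely_dan_avg)
```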

This surge in plays is led by five tracks among my 300 most played: “Doctor Wu” (20 plays, tied for #236), “Midnite Cruiser” (29, tied for #93), “Aja” (35, tied for #55), “Hey Nineteen” (48, tied for #26) and “Deacon Blues” (53, #20).

Basically, the more Steely Dan I play, the more I love Steely Dan.

Rest in peace, Mr. Becker.

Until next time…

[1] From these mixes I created a six-cassette “best-of” collection of 136 tracks (including nine “new” tracks) called Boston Drive Vol I-VI. The only Steely Dan song to make the cut was “Rikki, Don’t Lose That Number.”

[2] I relied on my memory to give an album credit for any listed tracks appearing on those original 56 mix cassettes that had neither appeared on Boston Drive nor been recorded on a subsequent mix cassette.

[3] I have not yet completed the data entry for the 150 tracks appearing on eight CDs I created in August 2016. However, I estimate that 75 of the 150 tracks are “new.”