Rating Doctor Who, post “reboot,” episode 2

In a previous post, I posed the following question (edited for brevity):

Which of the 131 Doctor Who episodes, nine Series and four Doctors have been the most (and least) admired since Rose first aired on March 26, 2005?

We learned that AI scores, IMDB ratings, and number of IMDB user-raters for each episode reveal:

  • It took time for audiences to warm to the Doctor Who reboot, while audiences were cool to the 12th Doctor until late in Series 9
  • Episodes later in Series 4—topped by the two-part Series ending The Stolen Earth/Journey’s End (AI scores=91)—were among the most admired when first aired, as is true of Series-ending episodes generally
  • Blink has become the most-admired episode ever, with an astonishing 9.8 IMDB rating and 12,881 user-raters (so far)
    • Two other “introduction for non-fans” episodes (The Girl in the Fireplace and Vincent and the Doctor) have also become very well-admired (9.3 each)
  • The 2nd half of Series 2 has three of the least-admired episodes

AI scores and IMDB ratings were moderately correlated (0.43), as were IMDB ratings and number of user-raters (0.47). Indeed, Doomsday, Silence in the Library/Forest of the Dead, The Stolen Earth/Journey’s End, The End of Time: Part Two, The Pandorica Opens/The Big Bang, A Good Man Goes to War and The Day of the Doctor remain among the most admired (and oft-rated), while Sleep No More and Love & Monsters are still best forgotten.

The next question is: Which episodes have become more admired over time, and which have lost their luster?

One good way to answer this question would be to compare every episode’s AI score to its current IMDB rating (as I did here: ai-score-vs-imdb-rating): if the latter is higher, the episode’s stature has increased, and if the latter is lower, the opposite has occurred.

This is tricky, however, because the two measures have different scales. One good solution is to convert each value to its “z-score.” A z-score is simply how many standard deviations (SD) above (+) or below (-) the average any value is; any measure converted to z-scores has an average of 0 and SD of 1. For example, my favorite episode, A Good Man Goes to War, has an IMDB rating of 9.1. Overall, IMDB ratings have an average of 8.1 and SD of 0.8. Subtracting 8.1 from 9.1, then dividing by 0.8, yields a z-score of roughly 1.2, meaning this IMDB rating is more than one SD higher than the average. For context, in a normal distribution (also called a “bell curve”), about 68% of all values are within 1 SD of the average, while about 99.7% are within 3 SD of the average.
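For anyone who wants to check the arithmetic, here is a minimal Python sketch of the z-score conversion described above. The function names and the short list of sample ratings are mine, made up purely for illustration; only the 9.1, 8.1 and 0.8 figures come from the post. I am also assuming the population SD, since I have not said which version was used. Note that these rounded inputs give 1.25, so the 1.2 quoted above presumably reflects the unrounded average and SD.

```python
from statistics import mean, pstdev


def z_score(value, average, sd):
    """Return how many SDs `value` lies above (+) or below (-) `average`."""
    return (value - average) / sd


def to_z_scores(values):
    """Convert a whole list of values to z-scores (population SD assumed)."""
    average = mean(values)
    sd = pstdev(values)
    return [(v - average) / sd for v in values]


# Figures quoted in the post, each rounded to one decimal place.
good_man_z = z_score(9.1, 8.1, 0.8)
print(f"A Good Man Goes to War: z = {good_man_z:.2f}")

# Hypothetical ratings, NOT the real data set, just to show the property
# that converted values have an average of 0 and an SD of 1.
sample_ratings = [7.6, 8.4, 9.1, 6.8, 8.6]
sample_z = to_z_scores(sample_ratings)
print(f"average = {mean(sample_z):.2f}, SD = {pstdev(sample_z):.2f}")
```

The same `to_z_scores` helper could be applied separately to the AI scores and to the IMDB ratings, which is what puts the two measures on a common scale.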

But let’s not get too bogged down (blogged down?) in statistical arcana…I can only ask you to bear with me so much.

Figure 1: AI Score vs. IMDB Rating (z-scores), Doctor Who episodes, 2005-16 (n=131)


According to Figure 1 above, 77 episodes have the same relative stature after multiple viewings and commentary as they did when they first aired: 40 episodes remain below-average in admiration (lower left quadrant), while 37 episodes remain above-average (upper right quadrant).
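The quadrant bookkeeping behind Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the episode names and z-score pairs below are hypothetical placeholders rather than the real data, the `quadrant` function is my own naming, and I am treating a z-score of exactly 0 as "above average," a choice the post does not specify.

```python
from collections import Counter


def quadrant(z_ai, z_imdb):
    """Name the Figure 1 quadrant for one episode's pair of z-scores.

    The horizontal axis is the AI score (stature when first aired) and the
    vertical axis is the IMDB rating (stature now).
    """
    then_above = z_ai >= 0    # assumption: exactly 0 counts as above average
    now_above = z_imdb >= 0
    if then_above and now_above:
        return "upper right"  # remained above average
    if not then_above and not now_above:
        return "lower left"   # remained below average
    if now_above:
        return "upper left"   # rose from below to above average
    return "lower right"      # fell from above to below average


# Hypothetical (AI z-score, IMDB z-score) pairs, for illustration only.
episodes = {
    "Episode A": (-1.2, -0.4),
    "Episode B": (0.8, 1.1),
    "Episode C": (-0.9, 1.5),
    "Episode D": (0.6, -1.3),
}

counts = Counter(quadrant(z_ai, z_imdb) for z_ai, z_imdb in episodes.values())

# Change in stature: IMDB z-score minus AI z-score, in SD units.
changes = {
    name: round(z_imdb - z_ai, 1) for name, (z_ai, z_imdb) in episodes.items()
}

for name, change in changes.items():
    print(f"{name}: {quadrant(*episodes[name])}, change = {change:+.1f}")
```

Run over all 131 real pairs, a tally like `counts` is what would produce the quadrant totals, and `changes` is the "unit" increase or decrease in stature.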

However, 27 episodes that had a below-average AI score now have an above-average IMDB rating (upper left quadrant), topped by Heaven Sent, the middle episode of the three-part Series 9 finale: a full 3.7-unit increase from -1.9 (AI score=80) to 1.9 (IMDB rating=9.6)! Episodes with a similarly large jump in admiration include Hell Bent, the follow-up episode to Heaven Sent (+2.2; 82 to 9.0); Listen (+2.2; 82 to 9.0); The Girl in the Fireplace (+1.9; 84 to 9.3); The Husbands of River Song (+1.7; 82 to 8.6); The Empty Child (+1.6; 84 to 9.1); The Witch’s Familiar (+1.5; 83 to 8.7); Last Christmas (+1.3; 82 to 8.4); and A Christmas Carol (+1.3; 83 to 8.6). Six of these nine listed episodes are 12th Doctor episodes, and four are from either Series 9 or the Christmas special that aired one month after the Series ended. The 12th Doctor is definitely growing on me, and there is evidence that his 2nd Series, at least, is gaining in stature with many fans.

Two other episodes merit attention for dramatically increasing in stature, while still being below average in IMDB rating. These are the very first post-reboot episodes: Rose (+2.6; 76 to 7.6) and The End of the World (+2.7; 76 to 7.7).

At the other extreme are 27 episodes that went in the opposite direction: from above-average in AI score to below-average in IMDB rating (lower right quadrant), including three with at least a 2-unit decrease in stature: The Lazarus Experiment and The Curse of the Black Spot (both -2.1; 86 to 6.8), and Daleks in Manhattan (-2.0; 87 to 7.2). Two of these three episodes (The Lazarus Experiment and Daleks in Manhattan) featured the 10th Doctor, as did four other episodes which dropped considerably in stature: Planet of the Dead (-1.8; 88 to 7.6), The Poison Sky (-1.7; 88 to 7.7), as well as Partners in Crime and The Doctor’s Daughter (both -1.6; 88 to 7.8).

Finally, two other episodes dropped from just-below-average in stature to well-below-average: Fear Her (-1.8; 83 to 6.2) and In the Forest of the Night (-1.5; 83 to 6.4). Other than the fact that seven of the nine episodes named here and in the previous paragraph feature the 10th Doctor, there is no obvious pattern I can discern as to which episodes most declined in stature.

In the final “episode” in this series, I will use these data to determine which of the nine Series and four Doctors since the 2005 reboot were, and are, the most- and least-admired.

Until next time…

