I first watched The Cotton Club (Francis Ford Coppola, 1984) as a sophomore in college, under curious circumstances.
That year, I lived with two other men in a converted basement seminar room in Ezra Stiles College. The year before, that room had been occupied by a student we generally referred to as the “Saudi prince” (or was it “sheikh?”); I forget his actual nationality and title. He apparently purchased a great deal of electronic equipment—and by “purchased,” I mean “charged without ever paying”—which he used in secretive solitude.
All that remained when my friends and I moved into the room was the mid-1980s version of a big screen television. Another classmate lent us her early-model VCR, which made the fact that one of my roommates worked in the Audio-Visual department all the more valuable.
I do not remember how a copy of Café Flesh turned up in our room…but that was quite an education for me (the previews were a hoot), back when adult films were expected to have at least some coherent plot. The film made enough of an impression on me that I purchased the terrific Mitchell Froom soundtrack on vinyl.
But back to The Cotton Club. I recall vaguely enjoying it (it is a beautiful film), even though much of the historical “back story” eluded me. I also remember hearing stories about how its production was more interesting than the movie itself.
I thought little about the film after that until I kept happening upon it on television in the mid-1990s. And when I sat and watched it from start to finish for a second time, I very much enjoyed it. So much so that I bought the excellent John Barry soundtrack, my first tentative foray into jazz (which I now love), and learned more about the historical “back story” I referenced earlier.
Yes, the plot is overly ambitious and convoluted. Yes, it garbles and condenses and rewrites the compelling underworld history of late-1920s/early-1930s New York City (e.g., the film ostensibly ends in 1931 with the slaying of Dutch Schultz, which actually occurred on October 23, 1935). Yes, it is too long…or too short, depending on how interested one is in the interweaving plot threads.
But I now rank it among my 10 or 20 favorite films, recently purchasing a DVD copy when I was unable to watch it on one of our streaming services. As it happens, I also have a copy of Café Flesh (on VHS), and I have previously discussed my continuing love for another critical non-favorite I recently purchased on DVD, Times Square.
One thing these three films have in common is a middling average score (on a 0-10 scale) on the Internet Movie Database (IMDB): 6.5 for Café Flesh and The Cotton Club and 6.7 for Times Square. For context, in his 2008 video guide, esteemed film critic Leonard Maltin gives The Cotton Club 2.5 stars (out of four) while giving Times Square a rating of “BOMB;” for obvious reasons, he does not include Café Flesh in his guide.
While not the worst-reviewed films ever (hello, Ed Wood!), neither are they among the greatest films ever made. Which begs the question (and setting aside the pornographic nature of Café Flesh) whether they could be characterized as “guilty pleasures.”
Which further begs the question: what makes a pleasure “guilty?”
In a previous post, I gathered IMDB, RottenTomatoes (RT) and Maltin ratings data to “rank” the 47 Charlie Chan films released between 1926 and 1949. I decided to take the same approach with the larger universe of movies I like (loosely defined as “movies I have seen multiple times, to the best of my recollection”) to see if I could statistically distinguish “guilty pleasure” films (ones I love but to which critics/users respond with “meh,” or worse) from critically-praised films I love (e.g., L.A. Confidential, The Maltese Falcon, numerous films directed by Alfred Hitchcock or Woody Allen, or starring The Marx Brothers), as well as from films SO bad they have become cult classics and/or been parodied on Mystery Science Theater 3000.
To that end I compiled a list of 557 films I am fairly certain I have seen in their entirety twice (or, at least, have seen all the way through once and in large segments at other times). I excluded the Charlie Chan films discussed in the previous post.
For each film I entered its:
- Year of release (according to IMDB)
- Length in minutes (ditto)
- IMDB score and number of raters
- Tomatometer score (% RT-sanctioned critics deeming film “fresh”), average critic rating (0-10) and number of critics
- Audience Score (% RT users deeming film “fresh”), average user rating (0-5) and number of user raters
- Number of stars assigned by Maltin, with BOMB = 0.
I included year of release and length as a way to distinguish older, shorter films from more recent, longer films. The six ratings measures are slightly different ways to broadly measure a film’s perceived quality. I included three “number of raters” measures to see if there was a relationship between a film’s perceived quality and the number of viewers willing to take the time to quantify their opinions online.
I also divided the films into six broad categories:
- General (64%)
- Film Noir (19%)
- Other Pre-1960 (7%)
- Woody Allen (5%)
- Alfred Hitchcock (3%)
- Marx Brothers (2%)
Arguably, there is overlap between Film Noir (restricted for this analysis to films released between 1940 and 1959) and Alfred Hitchcock…and a few Other Pre-1960 films…but I am comfortable with these general categories.
I have complete data for 515 films. Eight films have no Maltin rating, either because they were released in 2008 or later (Frozen, Night at the Museum 2: Battle of the Smithsonian, The Spirit, Star Trek), are relatively obscure films noir (The Guilty, Night Editor—and the excellent Spanish film Muerte de un Ciclista [Death of a Cyclist]) or…I don’t know why (the charming 1992 film Jersey Girl). The latter four films also have no Tomatometer rating or critic average rating (along with 34 other films, primarily Film Noir); I entered “0” for the number of critic raters. All analyses were performed using Intercooled Stata 9.2.
Some of these variables do not follow a “bell curve” (or “normal”) distribution (Table 1). For example, while the average year of release is 1974, the median year (the value at which half of all values are lower, and half are higher) is 1982. The difference results from a “skew” towards earlier films.
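To make the mean-versus-median comparison concrete, here is a minimal sketch in Python. The release years below are made up for illustration (they are not my actual data); the point is only that a handful of much older films drags the mean below the median.

```python
from statistics import mean, median

# Hypothetical release years (NOT the actual data): mostly films from
# 1979-1999, plus a few much older titles that form a tail toward earlier years.
years = [1931, 1941, 1946, 1948, 1979, 1984, 1985, 1988, 1992, 1994, 1999]

avg = mean(years)
mid = median(years)

# A mean below the median is a rough sign of a tail toward low values.
if avg < mid:
    direction = "toward earlier years"
else:
    direction = "toward later years (or no skew)"

print(f"mean={avg:.1f}, median={mid}, skew {direction}")
```

With a perfectly symmetric distribution, the two values would coincide; the gap between them is a quick, informal check, not a formal test of skewness.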
Table 1: Summary Statistics for Film Ratings Measures
- Year of Release
- # IMDB Raters
- RottenTomatoes Audience Score
- RottenTomatoes User Rating
- # RottenTomatoes User Raters
*SD=standard deviation, a measure of how tightly values cluster around the mean: the smaller the value, the tighter the clustering. In a normal distribution, about 68% of values are within 1 SD, 95% are within 2 SD and 99.7% are within 3 SD.
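The percentages in that note can be checked directly. This is a small sketch using Python's standard library (the helper name `share_within` is my own, purely for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

These shares only apply to variables that are roughly bell-shaped, which is why the skewed measures discussed here need to be read with more care.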
Indeed, as Figure 1 shows, the distribution of release year is bimodal, meaning there are two “peaks” in the data: one in 1946-50, reflecting the preponderance of film noir titles among my multiple-viewing films, and one between roughly 1978 and 1999, my prime movie-attendance years (ages 11-33).
Figure 1: The Distribution of Year of Release is Bimodal
See here for the distribution of Length, in minutes
There is also heavy skew to the right (a long “right tail”) in the three “number of raters” measures, with the median consistently lower than the mean. In the most extreme case, while 452 films (81%) had between 29 and 99,999 RT user raters, 13 films had more than 1,000,000 raters, topping out at a staggering 30,984,432 RT user raters for Donnie Darko and 34,296,962 for Spider-Man. Not surprisingly, these three measures are positively related to each other: the average correlation between them is a moderate 0.41; the extreme right-skew of these measures is likely lowering the correlations. There is also a modest relationship between year of release, length and number of raters: films have gotten slightly longer over time (correlation [r]=0.25), while more recent films have more raters (mean r=0.22).
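For readers who want to see how a correlation is computed, and why a long right tail can hold it down, here is a rough sketch. The rater counts are hypothetical, and comparing against log-transformed counts is one common check rather than something I did in the analysis itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads
    (the 1/n factors cancel, so plain sums are used)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical rater counts for five films (NOT the actual data).
imdb_raters = [100, 1_000, 10_000, 100_000, 1_000_000]
rt_raters = [10, 1_000, 100_000, 10_000_000, 1_000_000_000]

r_raw = pearson_r(imdb_raters, rt_raters)
r_log = pearson_r([math.log10(v) for v in imdb_raters],
                  [math.log10(v) for v in rt_raters])

print(f"raw r = {r_raw:.3f}, log-scale r = {r_log:.3f}")
```

In this toy example the two counts rise together in every film, yet the raw correlation falls short of a perfect 1 because the relationship is not a straight line; on a log scale it is.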
Here are the distributions of these variables:
The remaining seven variables were generally normally distributed (means≈medians). Thus, films averaged 103 minutes in length (one hour, 43 minutes), with approximately two-thirds of films (66%) between 88 and 113 minutes long; eight films were more than 2½ hours long, topped by JFK (three hours, nine minutes), It’s a Mad Mad Mad Mad World (three hours, 25 minutes) and The Ten Commandments (three hours, 40 minutes). Not surprisingly, the 33 films between 61 (Dick Tracy, Detective) and 79 minutes long had a mean year of release of 1943.5.
There was reassuring consensus between the ratings, as the means of IMDB score (7.1), critic rating (6.9), RT user rating (3.5 out of 5 = 7.0 out of 10), and Maltin stars (2.8 out of 4 = 7.1 out of 10) all converge around a “good, but not great” 7 out of 10. Moreover, values tended to cluster relatively tightly around the means (i.e., SD<<mean). Thus, 90% of IMDB scores were between 6.1 and 8.3, 80% of critic ratings were between 5.5 and 8.8, 93% of RT user ratings were between 5.8 and 8.4 (adjusted for a 0-10 scale), and 73% of films were assigned between 2½ and 3½ stars by Maltin (6.2-8.8 on a 0-10 scale). Fifty films I have seen more than once were assigned four stars by Maltin, whereas he rated only four of them “BOMB”. The average correlation between the six pairs of ratings is 0.75, meaning there is broad agreement between IMDB users, critics, RT users and Maltin (though the mean correlation jumps to 0.85 without Maltin’s scores).
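The scale conversions above are simple proportional rescaling. As a quick sketch (the function name `to_ten_point` is just an illustrative label):

```python
def to_ten_point(value, scale_max):
    """Rescale a rating from a 0-to-scale_max scale onto a 0-10 scale."""
    return value * 10 / scale_max

rt_user_mean = to_ten_point(3.5, 5)   # RT user ratings run 0-5
maltin_low = to_ten_point(2.5, 4)     # Maltin stars run 0-4 (BOMB = 0)
maltin_high = to_ten_point(3.5, 4)

print(f"RT user mean: {rt_user_mean:.2f}")
print(f"Maltin 2.5 to 3.5 stars: {maltin_low:.2f} to {maltin_high:.2f}")
```

The Maltin endpoints come out to 6.25 and 8.75, which I round to 6.2 and 8.8 in the text.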
The story is similar for the Tomatometer and Audience Scores, although the former is skewed by 50 films with a Tomatometer of 100 (Audience Scores top out at 96); both measures have higher medians than means. On average, 77.1% of critics, but just 71.9% of RT users, rate a given film as “fresh.” Fully two-thirds (67%) of Tomatometers are 75 or higher, while a similar percentage of Audience Scores (65%) are between 67 and 94. The correlation between the two measures is 0.72.
Across all six ratings measures (15 pairs of measures), finally, the average correlation is 0.76; without Maltin’s ratings, the average jumps to 0.83 (mean r w/Maltin=0.64).
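As a sanity check on those averages, the sketch below counts the pairs and recombines the two reported sub-averages. It uses only the figures quoted above; the individual pairwise correlations are not reproduced here.

```python
from itertools import combinations

measures = ["IMDB score", "Tomatometer", "critic rating",
            "Audience Score", "RT user rating", "Maltin stars"]

pairs = list(combinations(measures, 2))
with_maltin = [p for p in pairs if "Maltin stars" in p]
without_maltin = [p for p in pairs if "Maltin stars" not in p]

# Sub-averages as reported in the text.
mean_without = 0.83
mean_with = 0.64

# Weighted average of the two groups of pairs.
overall = (len(without_maltin) * mean_without
           + len(with_maltin) * mean_with) / len(pairs)

print(len(pairs), len(without_maltin), len(with_maltin), f"{overall:.3f}")
```

Six measures yield 15 pairs, five of which involve Maltin. Recombining the rounded sub-averages gives roughly 0.77, which agrees with the reported 0.76 to within rounding.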
In general, however, the vast majority of these 557 films fall in a fairly narrow range between “not bad” and “fairly good.” Bear in mind, though, that this is the universe of films I have chosen to see again; this could easily skew all of the ratings values up slightly.
To separate the films into “quality” categories, I used a technique called factor analysis.
Factor analysis groups variables into underlying “dimensions” (or “factors”). We have already seen evidence of two dimensions in these 11 measures: six (IMDB score, Tomatometer, critic rating, Audience Score, RT user rating, Maltin stars) are all fairly highly correlated with each other—and thus with a single dimension we could call “perceived quality,” while the three “numbers of raters” measures (plus year of release and length) are modestly correlated with each other—and thus with a single dimension we could call “public awareness.”
And that is precisely what the factor analysis revealed. Two factors alone accounted for 95% of the total variance in these data, which is remarkably high.
The first factor (71%) was dominated by IMDB Score, Tomatometer, critic rating, Audience Score and RT user rating as well as Maltin stars and year of release. This is clearly “perceived quality.” For each film, I determined how many SD above or below the mean (set to 0) its perceived quality (PQ) is.
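To give a feel for what a score measured in “SD above or below the mean” looks like, here is a deliberately simplified sketch: standardize each measure, combine the standardized values using weights, then standardize the result. The ratings and weights are invented for illustration, and this is not the exact calculation Stata performs (statistical packages typically derive scoring coefficients from the loadings rather than using the loadings directly).

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert values to z-scores: mean 0, SD 1."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical ratings for four films (NOT the actual data).
ratings = {
    "imdb": [8.1, 7.0, 6.5, 5.2],
    "maltin": [4.0, 3.0, 2.5, 1.5],
}

# Illustrative weights only; treat them as assumptions, not fitted values.
weights = {"imdb": 0.87, "maltin": 0.72}

standardized = {name: z_scores(vals) for name, vals in ratings.items()}
n_films = len(ratings["imdb"])

# Weighted sum of standardized measures for each film.
raw = [sum(weights[name] * standardized[name][i] for name in weights)
       for i in range(n_films)]

# Re-standardize so the composite itself has mean 0 and SD 1.
pq = z_scores(raw)

print([round(score, 2) for score in pq])
```

A film with a composite of 1.5 sits one and a half standard deviations above the average film on the list; a film at -2.0 sits two standard deviations below it.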
Here are the 17 films with PQ>1.5:
- The Maltese Falcon (1941 version)
- To Be or Not To Be (1942 version)
- North by Northwest
- It’s a Wonderful Life
- Kind Hearts and Coronets
- On the Waterfront
- The Cabinet of Dr. Caligari
- The Third Man
Just to reiterate: these are not the best films ever made, nor are these my favorite films (to be honest, I don’t love Sunset Boulevard, and I burned out on It’s a Wonderful Life). They are simply the most highly-rated films I have seen multiple times. Nonetheless, this is a very impressive list of films, of which The Maltese Falcon is easily my favorite, followed by Rear Window.
In fact, on average, these films have an IMDB score of 8.3, a Tomatometer of 98.2 (all≥93; six=100), a critic rating of 9.1, an Audience Score of 93.1 and an RT user rating of 8.4 (on a 0-10 scale); three have 3½ Maltin stars, with the rest having four. These could all be considered “Classic” films, including three silent masterpieces (Metropolis, Caligari, The General), given their average release year of 1944; only Chinatown was released after 1970 (1974). At 108 minutes on average, these films run slightly longer than the overall average.
At the other end of the spectrum—and now we are getting to the heart of the matter—are the 22 films with PQ<-2.0:
- Who’s Harry Crumb?
- The League of Extraordinary Gentlemen
- Once Upon a Crime…
- Young Doctors in Love
- The Marrying Man
- Thank God, It’s Friday
- The Meteor Man
- The Gun in Betty Lou’s Handbag
- Wild Wild West
- The Adventures of Rocky and Bullwinkle
- The Opposite Sex and How to Live With Them
Poor Arye Gross, who starred in two 1993 films (Hexed and The Opposite Sex…) that are two of the three worst-rated of the 515 films with complete data (I suspect The Spirit, from 2008, would also be in this low-rent neighborhood). On average, these films have an IMDB score of 5.3, a Tomatometer of 21.3 (Once Upon a Crime… has the only Tomatometer of 0 in the group), a critic rating of 3.9, an Audience Score of 35.3 and an RT user rating of 5.0 (on a 0-10 scale); the average number of Maltin stars is 1.6, ranging from BOMB (n=3) to three (Cookie). These are relatively recent films, with an average release year of 1991; only Thank God, It’s Friday was released before 1980 (1978). Perhaps mercifully, these films averaged 98 minutes in length.
The three films closest to the mean of 0 are Murder by Decree, Everything You Always Wanted to Know About Sex *But Were Afraid to Ask and Heaven Can Wait, with PQ of -0.004, -0.004 and 0.004, respectively. All were released in the 1970s, with average scores similar to the overall averages.
As for The Cotton Club and Times Square, they had PQ of -0.68 and -0.85, respectively—definitely in the bottom 25% of films I have seen multiple times.
The second factor (24%), meanwhile, was dominated by number of critics (factor loading=0.78), number of IMDB raters (0.73), year of release (0.54), length (0.39) and number of RT user raters (0.33). This is clearly “public awareness.” For each film, I determined how many SD above or below the mean (set to 0) its public awareness (PA) was. Topping the list, with a whopping 7.1, is The Dark Knight, followed by Batman Begins (4.9) and Spider-Man (4.4): three blockbuster superhero films from the 2000s. At the other end of the spectrum are four films released between 1935 and 1943: Mad Love (-1.30), Journey Into Fear (-1.30), Room Service (-1.29) and the film I consider the first film noir of the classic era: Stranger on the Third Floor (-1.28).
From the perspective of guilty pleasures, however, this particular dimension is far less interesting than the first one.
Before determining what films are my “guiltiest pleasures,” here are mean PQ values by category:
Given that 11 of the 17 top-rated films are in the Other Pre-1960 category, it is not surprising that these 36 (of 39 overall) films have the highest average PQ, followed by the films of my favorite director, Alfred Hitchcock.
As noted above, I do not necessarily love—or even much like—every one of these 557 films; some I saw multiple times when I was young (e.g., The Apple Dumpling Gang, Hot Lead and Cold Feet) but barely remember now. And there are films I quite like that are NOT on this list simply because I have yet to see them a second time (e.g., The Shawshank Redemption, Zodiac, Shutter Island, Watchmen). But those latter films are generally well-rated (e.g., mean IMDB score=8.2), so they are hardly “guilty pleasures.”
And…finally…to discover which of these multiple-viewed films are my “guiltiest pleasures,” here are the films with PQ<-1.00 I would give a 5 (or maybe 4.5, out of 5) on the “how much I like it” scale.
- Thank God, It’s Friday
- Doctor Detroit
- The Shadow
- Radioland Murders
- Legal Eagles
- Mystery Men
- Empire Records
- The Secret of My Success
- Johnny Dangerously
- So I Married an Axe Murderer
Each of these films is in the General category and was released during my prime movie-attendance years (1978-99), with a mean release year of 1989; I did not actually first view Thank God, It’s Friday and Empire Records until the last five or so years. They average 101 minutes in length, only slightly shorter than average. Their mean IMDB, critic and RT user ratings (on a 0-10 scale) are 6.0, 4.9 and 6.0, respectively, suggesting they are relatively more popular with the broader movie-watching public than with critics; this is echoed by their average of only 1.8 stars from Maltin (median=2). By the same token, the average Audience Score for these 11 films (51) is higher than their average Tomatometer (43). Finally, they are far less well-known (or, at least, have fewer viewers willing to rate them online, even anonymously), averaging 19,604 IMDB raters (median=12,292), 28 critics (median=17; Mystery Men had 103) and 55,464 RT users (median=9,198).
As I hypothesized, while these films are certainly of lower perceived quality than the other 546 films I have seen multiple times, objectively they tend to fall in the middle of the “quality” spectrum, or even a hair above it: neither truly excellent nor truly awful.
They are mostly just…meh, according to the larger universe of film critics and casual fans, with the latter being just a bit more accepting of these films than the former.
And all I will say in defense of these films is that there is a fascinating temporal intersection in Thank God, It’s Friday when the late Donna Summer (near the height of her career), a pre-fame Debra Winger and a pre-Berlin Terri Nunn are all looking into the same bathroom mirror.
Finally, to come full circle: The Cotton Club and Times Square rank as “only” my 13th and 16th guiltiest film pleasures, respectively, using this very subjective (and subject to change) method. Still, that puts them in…good?…company.
Until next time…
 I expect to revisit this film in more detail in a later post, but for now I will simply say the film revolves around the legendary Harlem night club—owned by powerful bootlegger and fixer “Owney” Madden—between 1928 and 1931, when “Duke” Ellington, then Cab Calloway, directed the house band. A key subplot revolves around Arthur Flegenheimer (aka Dutch Schultz) and his violent takeover of the Harlem numbers rackets.
 The film follows two sets of brothers in conflict with each other—one white, one black—with one of the white brothers being close friends with one of the black brothers, while each of those two friends has a love affair blocked by external forces. The parallels are fascinating and complex—but they are only part of the overall storyline.
 Maltin, Leonard ed. 2008. Leonard Maltin’s Movie & Video Guide: 2008 Edition. New York, NY: New American Library.
 Despite my ambivalence about Allen as a human being, I still love many of his films.
 Only Charlie Chan at the Wax Museum has a complete set of RottenTomatoes values.
 For nine older films, I used the rating in the 2003 edition, as Maltin stopped including many older films in later editions.
 As well as date of release, which I do not analyze here.
 Recognizing that these primarily measure a film’s overall “visibility.”
 I could easily have added “starring John Cusack,” “Jerry Lewis,” “David Mamet,” “Star Trek,” “The Pink Panther,” “Batman,” “Coen Brothers.”
 Including What’s New Pussycat.
 StataCorp. 2005. Stata Statistical Software: Release 9. College Station, TX: StataCorp LP.
 A measure of linear association between two variables ranging from -1.00 (every time one increases, the other decreases) to 1.00 (every time one increases, the other increases).
 That said, Fritz Lang’s 1927 silent masterpiece Metropolis is a full 153 minutes long.
 Besides Times Square, they are Mannequin, The Opposite Sex and How to Live With Them and Thank God It’s Friday.
 Pulp Fiction, Raiders of the Lost Ark, Star Wars Episode IV: A New Hope, The Usual Suspects
 I experimented with cluster analysis, which groups cases instead of variables, but found little of interest.
 Principal factors, with an orthogonal varimax rotation, forced to two factors.
 Each had a “factor loading” (essentially, correlation with the “underlying dimension”) ≥0.87. The factor loadings for Maltin stars and year of release were 0.72 and -0.52, respectively.
 Using the “Predict” command in Stata. In essence, it converts each variable to a “z-score” (mean=0, SD=1), recalculates the factor loadings, then sums each value weighted by the factor loadings.
 To Be or Not to Be, Kind Hearts and Coronets, The Cabinet of Dr. Caligari.