Movie Analysis, part I

Movies1

I enjoy playing with data from time to time, and I enjoy movies, so I decided to compare how critics rate movies with how viewers rate them. Take a look at the above chart. First, for you data hounds and methodology freaks, I’ll describe how I compiled it. All the ratings come from Rotten Tomatoes or Flixster, so neither the critics nor the viewers represent the universe of people in either category. Older (i.e., not so computer/social media savvy) people and foreigners, for example, may not be well represented.

The movies come from several sources: the ones I’ve rented or rated on Netflix since January 2011, including ones I streamed; nominees for Best Picture at the Academy Awards, 2011–2014; all movies that made the top box office listing, either in theaters or rentals, according to Rotten Tomatoes for the last three months or so; and several lists from major publications of the best all-time movies in several genres, e.g., thrillers. This last category is the only one that includes many older films. I may have rented an occasional older film, but the data is skewed toward newer releases.

I acquired data on almost 300 movies, then sorted it and kept only the 250 with the most viewer ratings, to eliminate some oddball losers, cult films, etc. I wanted movies with enough viewer input for the percentages to be reliable. The cutoff turned out to be 3,400 viewer ratings. The number of viewer ratings correlates highly with box office numbers, but some movies could get an inordinate number of ratings from very avid fans despite mediocre box office success, or limited success in the U.S. Every movie used for this chart also had at least 20 critics’ ratings on that website, except for three, which had 8, 10, and 13.
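For the data hounds, here is a minimal sketch in Python of the "keep the most-rated movies" step described above. It is only an illustration, not the spreadsheet work actually done for the chart: the function name `keep_most_rated` and the sample titles and counts are made up, and the real data would use `keep=250`.

```python
# Hypothetical records: (title, number of viewer ratings, number of critic ratings).
# Titles and counts are invented for illustration; they are not the post's data.
def keep_most_rated(movies, keep):
    """Return the `keep` movies with the most viewer ratings, plus the cutoff count."""
    ranked = sorted(movies, key=lambda m: m[1], reverse=True)
    kept = ranked[:keep]
    # The cutoff is the viewer-rating count of the last movie that made the list.
    cutoff = kept[-1][1] if kept else None
    return kept, cutoff


sample = [
    ("Movie A", 250_000, 210),
    ("Movie B", 3_400, 45),
    ("Movie C", 900, 12),
    ("Movie D", 48_000, 130),
]

kept, cutoff = keep_most_rated(sample, keep=3)
print([title for title, _, _ in kept])  # ['Movie A', 'Movie D', 'Movie B']
print(cutoff)  # 3400
```

The cutoff is not chosen in advance; it simply falls out of wherever the list is cut.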

As I understand it, the critics were given only two choices: to “recommend” a movie or not. Viewers could rate on a five-star system, and the website converts those ratings to a single number: the percentage of viewers who “liked” the movie, meaning they rated it 3.5 stars or higher. These two rating systems are not exactly equivalent, but the percentage of critics who recommend a movie and the percentage of viewers who “like” it are close enough in general to make comparison useful.
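To make the two percentages concrete, here is a small Python sketch of how each could be computed from raw votes. It reflects my understanding of the scoring as described above, not the website's actual code; the function names and sample votes are hypothetical.

```python
def percent_liked(star_ratings, threshold=3.5):
    """Percentage of viewer ratings at or above `threshold` stars."""
    if not star_ratings:
        raise ValueError("need at least one rating")
    liked = sum(1 for stars in star_ratings if stars >= threshold)
    return 100.0 * liked / len(star_ratings)


def percent_recommended(recommendations):
    """Percentage of critics who recommended the movie (True = recommend)."""
    if not recommendations:
        raise ValueError("need at least one review")
    return 100.0 * sum(recommendations) / len(recommendations)


# Invented votes for a single imaginary movie.
viewer_stars = [5, 4, 3.5, 3, 2]
critic_votes = [True, True, False, True]

print(percent_liked(viewer_stars))        # 60.0
print(percent_recommended(critic_votes))  # 75.0
```

Note that a viewer who gives 3 stars counts the same as one who gives half a star: both land in the "did not like" group.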

The diagonal line in the chart represents equality in percentages between viewers and critics. Each little blue diamond is the rating for a particular movie (or possibly more than one in the case of identical ratings). So any diamond to the left of the line is a movie that viewers rated higher than the critics did, and any diamond to the right is one the critics liked better. The closer to the upper right corner, the more a movie was a favorite in both categories.

So what does this chart show? Not much yet, since I haven’t shown you many movie names or other data, but I threw in a few teasers so you can begin to get an idea of what it could show. I labeled the ratings for a few of the outliers, that is, movies where the viewers and critics differed the most. What characteristics do you think would be most common in those movies to the far left of the line, i.e. fan favorites that critics hated? How about the other way? I’ll give you one that applies to outliers on both sides of the line: low viewership. Generally speaking, those movies were “specialty” movies that were seen and reviewed by relatively few people. This was not always true, however. Memoirs of a Geisha, for example, was a mainstream movie, although not a box-office smash by any means.
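One simple way to pick out the outliers is to rank movies by the gap between the viewer percentage and the critic percentage. The Python sketch below assumes that definition of "differed the most" and uses invented titles and percentages; a positive gap corresponds to a diamond left of the diagonal line, a negative gap to one on the right.

```python
def rank_by_disagreement(movies):
    """movies: list of (title, critic_pct, viewer_pct).

    Returns (title, gap, side) tuples sorted by the size of the gap,
    largest first, where gap = viewer_pct - critic_pct.
    """
    rows = []
    for title, critic_pct, viewer_pct in movies:
        gap = viewer_pct - critic_pct
        if gap > 0:
            side = "left of the line (viewers higher)"
        elif gap < 0:
            side = "right of the line (critics higher)"
        else:
            side = "on the line"
        rows.append((title, gap, side))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Invented percentages, for illustration only.
sample = [
    ("Movie A", 90, 88),
    ("Movie B", 30, 75),
    ("Movie C", 85, 50),
    ("Movie D", 60, 60),
]

for title, gap, side in rank_by_disagreement(sample):
    print(f"{title}: {gap:+d} points, {side}")
```

With this sample, Movie B (a fan favorite the critics disliked) tops the list, followed by Movie C (the reverse case).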

Wait! you may say. There’s Temple Grandin up near the corner, quite close to the diagonal line, highly rated by critics and viewers alike, yet it barely made the cutoff in terms of number of viewer ratings. True enough, but there are two reasons for this: first, it’s an older movie (2010) that I happened to rent a couple of years ago, and Rotten Tomatoes wasn’t as widely used back when it came out; and second, it’s a biopic (i.e., biographical in nature) of someone who was not widely known. I’ve found that documentaries and biopics generally get significantly lower viewership numbers than pure fiction movies of all types. Of course there are exceptions. And what about Whiplash, a movie that’s gotten the highest viewer rating of all, yet one you’ve probably not even heard of? It’s at the opposite end of the timeline. It’s brand new, and I’ve found that when movies first come out, they are generally watched (and rated) by those who are big fans of that particular genre, star, or, in some cases, series. The viewer ratings are usually high at first, but I can guarantee you that by the time that one comes out on DVD it will be several, maybe many, viewer rating points lower. So don’t put too much credence in these ratings, especially for brand new releases.

So there are a few hints about what the data can show. In future posts I’ll explore some of the interesting correlations I’ve found. I think they’ll surprise you. As a reward for reading all the way to here, I give you a list of the most popular movies (over 100,000 viewer ratings) that were rated the highest by viewers and by critics. You might consider putting them on your list.

  • Top Viewer ratings:
    1. Amélie
    2. Rear Window (1954 version)
    3. American Beauty
    4. Guardians of the Galaxy
    5. (tie) X-Men: Days of Future Past
    5. (tie) The King’s Speech
  • Top Critics’ ratings:
    1. Rear Window (1954 version)
    2. Toy Story 3
    3. Gravity
    4. 12 Years a Slave
    5. True Grit (2010 version)