fivethirtyeight (version 0.5.0)

ratings: An Inconvenient Sequel

Description

The raw data behind the story "Al Gore's New Movie Exposes The Big Flaw In Online Movie Ratings" https://fivethirtyeight.com/features/al-gores-new-movie-exposes-the-big-flaw-in-online-movie-ratings/.

Usage

ratings

Format

Because of R package size restrictions, only a preview of the first 10 rows of this dataset is included; to obtain the entire dataset (80,053 rows) see Examples below. The preview is a data frame with 10 rows representing movie ratings and 27 variables:

timestamp

The date at which the rating was recorded.

respondents

The number of respondents in a category associated with a given timestamp.

category

The subgroups of respondents differentiated by demographics like gender, age, and nationality.

link

The website associated with a given category's responses.

average

The average rating reported by a given category.

mean

The mean rating reported by a given category.

median

The median rating reported by a given category.

votes_1

The number of votes for a rating of one.

votes_2

The number of votes for a rating of two.

votes_3

The number of votes for a rating of three.

votes_4

The number of votes for a rating of four.

votes_5

The number of votes for a rating of five.

votes_6

The number of votes for a rating of six.

votes_7

The number of votes for a rating of seven.

votes_8

The number of votes for a rating of eight.

votes_9

The number of votes for a rating of nine.

votes_10

The number of votes for a rating of ten.

pct_1

The percentage of votes for a rating of one.

pct_2

The percentage of votes for a rating of two.

pct_3

The percentage of votes for a rating of three.

pct_4

The percentage of votes for a rating of four.

pct_5

The percentage of votes for a rating of five.

pct_6

The percentage of votes for a rating of six.

pct_7

The percentage of votes for a rating of seven.

pct_8

The percentage of votes for a rating of eight.

pct_9

The percentage of votes for a rating of nine.

pct_10

The percentage of votes for a rating of ten.

Examples

# NOT RUN {
# To obtain the entire dataset, run the following code:
library(readr)
library(dplyr)
ratings <- 
  "https://github.com/fivethirtyeight/data/raw/master/inconvenient-sequel/ratings.csv" %>%
  read_csv() %>%
  mutate(category = as.factor(category)) %>% 
  rename(
    votes_1 = `1_votes`, votes_2 = `2_votes`, votes_3 = `3_votes`, 
    votes_4 = `4_votes`, votes_5 = `5_votes`, votes_6 = `6_votes`,
    votes_7 = `7_votes`, votes_8 = `8_votes`, votes_9 = `9_votes`,
    votes_10 = `10_votes`,
    pct_1 = `1_pct`, pct_2 = `2_pct`, pct_3 = `3_pct`, pct_4 = `4_pct`,
    pct_5 = `5_pct`, pct_6 = `6_pct`, pct_7 = `7_pct`, pct_8 = `8_pct`,
    pct_9 = `9_pct`, pct_10 = `10_pct`
  )

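# To convert the data frame to long format, run the code below.
# Note: the `votes` key column will hold both the votes_* and pct_* column
# names, so the `count` value column mixes vote counts with percentages.
library(dplyr)
library(tidyr)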
ratings_tidy <- ratings %>%
  gather(votes, count, -c(timestamp, respondents, category, link, average, mean, median)) %>%
  arrange(timestamp)
# }
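
The gather() call above stacks the votes_* and pct_* columns into a single key/value pair. As a possible alternative, the sketch below uses tidyr's pivot_longer() to keep vote counts and percentages in separate columns, with one row per rating value. It assumes tidyr 1.1.0 or later (for names_transform). The `ratings_demo` data frame and its values are a hypothetical stand-in with the same column naming scheme, not real data from the dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row stand-in that mimics the column layout of `ratings`
# (only ratings 1 and 2 are included to keep the example short).
ratings_demo <- tibble(
  timestamp = as.Date(c("2017-07-17", "2017-07-18")),
  category  = factor(c("Males", "Females")),
  votes_1 = c(10L, 4L), votes_2 = c(3L, 1L),
  pct_1   = c(76.9, 80.0), pct_2 = c(23.1, 20.0)
)

ratings_long <- ratings_demo %>%
  pivot_longer(
    cols = matches("^(votes|pct)_"),
    # ".value" turns the prefix (votes / pct) into output column names;
    # the suffix after "_" becomes the `rating` column.
    names_to = c(".value", "rating"),
    names_sep = "_",
    names_transform = list(rating = as.integer)
  ) %>%
  arrange(timestamp, category, rating)

print(ratings_long)
```

Applied to the full `ratings` data frame, the same call should yield ten rows per original row, with `rating`, `votes`, and `pct` columns alongside the untouched identifier columns.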