comic_characters: Comic Books Are Still Made By Men, For Men And About Men

Description

The raw data behind the story "Comic Books Are Still Made By Men, For Men And About Men" (https://fivethirtyeight.com/features/women-in-comic-books/). An analysis using this data was contributed by Jonathan Bouchet as a package vignette at https://fivethirtyeight-r.netlify.com/articles/comics_gender.html.

Usage

comic_characters

Format

Because of R package size restrictions, only a preview of the first 10 rows of this dataset is included; to obtain the entire dataset (23,272 rows) see Examples below. The preview is a data frame with 10 rows representing characters and 16 variables:

publisher: Comic publisher: DC Comics or Marvel
page_id: The unique identifier for that character's page within the wikia
name: The name of the character
urlslug: The unique url within the wikia that takes you to the character
id: The identity status of the character (Secret Identity, Public Identity, or, for Marvel only, No Dual Identity)
align: If the character is Good, Bad or Neutral
eye: Eye color of the character
hair: Hair color of the character
sex: Sex of the character (e.g. Male, Female, etc.)
gsm: If the character is a gender or sexual minority (e.g. Homosexual characters, bisexual characters)
alive: If the character is alive or deceased
appearances: The number of appearances of the character in comic books as of Sep. 2, 2014 (this number will become increasingly out of date as time goes on)
first_appearance: The month and year of the character's first appearance in a comic book, if available
month: The month of the character's first appearance in a comic book, if available
year: The year of the character's first appearance in a comic book, if available
date: The date of the character's first appearance in a comic book, if available

Examples

# NOT RUN {
# To obtain the entire dataset, run the following code:
library(readr)
library(dplyr)
library(tidyr)
library(lubridate)
library(janitor)

# Get DC characters:
comic_characters_dc <- 
  "https://github.com/fivethirtyeight/data/raw/master/comic-characters/dc-wikia-data.csv" %>% 
  read_csv() %>% 
  clean_names() %>% 
  mutate(publisher = "DC")

# Get Marvel characters:
comic_characters_marvel <- 
  "https://github.com/fivethirtyeight/data/raw/master/comic-characters/marvel-wikia-data.csv" %>% 
  read_csv() %>% 
  clean_names() %>% 
  mutate(publisher = "Marvel")

# Merge the two datasets and perform further data wrangling:
comic_characters <-
  comic_characters_dc %>% 
  bind_rows(comic_characters_marvel) %>% 
  separate(first_appearance, c("year2", "month"), ", ", remove = FALSE) %>%
  mutate(
    # If month was missing, set as January and day as 01:
    month = ifelse(is.na(month), "01", month),
    day = "01",
    # Note: some years are missing, which yields an NA date for those rows:
    date = ymd(paste(year, month, day, sep = "-")),
    align = factor(
      align, 
      levels = c("Bad Characters", "Reformed Criminals", "Neutral Characters", "Good Characters"),
      ordered = TRUE)
  ) %>%
  select(publisher, everything(), -c(year2, day))
# }
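
Because align is stored as an ordered factor, its levels can be compared and filtered along the scale from "Bad Characters" to "Good Characters". The following is a minimal base R sketch of that idea. The three-row toy data frame is made up for illustration and only mimics two columns of the full dataset; it is not real data from the wikia.

```r
# Hypothetical toy rows that mimic two columns of comic_characters
toy <- data.frame(
  publisher = c("DC", "Marvel", "Marvel"),
  align = c("Good Characters", "Bad Characters", "Neutral Characters"),
  stringsAsFactors = FALSE
)

# Level ordering intended by the wrangling code above, from bad to good
align_levels <- c("Bad Characters", "Reformed Criminals",
                  "Neutral Characters", "Good Characters")
toy$align <- factor(toy$align, levels = align_levels, ordered = TRUE)

# Ordered factors support comparisons: keep rows at or above "Neutral Characters"
at_least_neutral <- toy[toy$align >= "Neutral Characters", ]

# Cross-tabulate publisher by alignment (unused levels show up as zero counts)
counts <- table(toy$publisher, toy$align)
print(counts)
```

Any value of align that is not listed in the levels becomes NA when the factor is built, so the level names need to match the raw data exactly.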