Learn R Programming

qdap (version 1.3.5)

name2sex: Names to Gender Prediction

Description

Predict gender from U.S. names (based on 1990 U.S. census data).

Usage

name2sex(names.list, pred.sex = TRUE, fuzzy.match = pred.sex,
  USE.NAMES = FALSE, database = NAMES_SEX, list.database = NAMES_LIST)

Arguments

names.list
Character vector containing first names.
pred.sex
logical. If TRUE overlapping M/F names will be predicted based on highest cumulative frequency. If FALSE the overlapping names will be denoted with a "B".
fuzzy.match
logical. If TRUE uses Levenshtein edit distance from agrep to predict gender from the closest name match starting with the same letter. This is computationally intensive and sho
USE.NAMES
logical. If TRUE names.list is used to name the gender vector.
database
A database of names (mostly for internal purposes).
list.database
A list version of the database of names broken down by first letter of the name (mostly for internal purposes).

Value

  • Returns a vector of predicted gender (M/F) based on first name.

References

http://www.census.gov/genealogy/www/data/1990surnames/names_files.html http://stackoverflow.com/a/818231/1000343 http://www.talkstats.com/showthread.php/31660

See Also

agrep

Examples

Run this code
name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew))

name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), FALSE)

name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), FALSE, TRUE)

name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), TRUE, FALSE)

## Get rank percent frequency ratio of being a gender
library(qdapDictionaries)

orig_nms <- qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew)

sex <- name2sex(orig_nms, FALSE, TRUE)

names(sex) <- rep("", length(sex))
names(sex)[sex == "B"] <- sapply(toupper(orig_nms[sex == "B"]), function(x) {
        y <- NAMES[NAMES[, 1] %in% x, ]
        round(log(Reduce("/", y[ order(y[, "gender"]), "per.freq"])), 2)
    })

## The log ratio of being a female name
sex
orig_nms
data.frame(name = orig_nms, sex = sex, `ratio_F:M` = names(sex),
    check.names=FALSE)

Run the code above in your browser using DataLab