qdap (version 2.1.1)

name2sex: Names to Gender Prediction

Description

Predict gender from U.S. names (based on 1990 U.S. census data).

Usage

name2sex(names.list, pred.sex = TRUE, fuzzy.match = pred.sex,
  USE.NAMES = FALSE, database = qdapDictionaries::NAMES_SEX, ...)

Arguments

names.list
Character vector containing first names.
pred.sex
logical. If TRUE, names that occur as both male and female are assigned the gender with the highest cumulative frequency. If FALSE, these overlapping names are denoted with a "B".
fuzzy.match
logical. If TRUE, uses Levenshtein edit distance from agrep to predict gender from the closest name match starting with the same letter. This is computationally intensive and should not be used on larger vectors. Defaults to pred.sex.
USE.NAMES
logical. If TRUE, names.list is used to name the gender vector.
database
A database of names (mostly for internal purposes).
...
Other arguments passed to check_spelling.

Value

  • Returns a vector of predicted gender (M/F) based on first name.
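
The interaction between pred.sex and fuzzy.match can be easier to follow on a toy example. The following is a minimal base R sketch of the behavior described above, not qdap's actual implementation: toy_db, its frequency values, and toy_name2sex are hypothetical names and made-up data introduced only for illustration, and utils::adist stands in for the agrep-based matching. The return type of the real function may also differ.

```r
# Hypothetical toy lookup table; the frequencies are made up for illustration
# and the real qdapDictionaries data is far larger.
toy_db <- data.frame(
  name   = c("MARY", "JAMES", "LESLIE", "LESLIE", "TYLER"),
  gender = c("F", "M", "F", "M", "M"),
  freq   = c(2.6, 3.3, 0.09, 0.04, 0.07),
  stringsAsFactors = FALSE
)

# Sketch only: mimics the documented arguments, not qdap's internals.
toy_name2sex <- function(x, pred.sex = TRUE, fuzzy.match = pred.sex,
                         db = toy_db) {
  vapply(toupper(x), function(nm) {
    hits <- db[db$name == nm, ]
    if (nrow(hits) == 0 && fuzzy.match) {
      # Restrict candidates to names starting with the same letter,
      # then take the one with the smallest Levenshtein distance.
      cand <- unique(db$name[substr(db$name, 1, 1) == substr(nm, 1, 1)])
      if (length(cand) > 0) {
        best <- cand[which.min(utils::adist(nm, cand))]
        hits <- db[db$name == best, ]
      }
    }
    if (nrow(hits) == 0) return(NA_character_)
    # A name listed under both genders is "B" unless a prediction is requested.
    if (length(unique(hits$gender)) > 1 && !pred.sex) return("B")
    hits$gender[which.max(hits$freq)]
  }, character(1), USE.NAMES = FALSE)
}

toy_name2sex(c("mary", "leslie", "JAME", "zed"))
# "F" "F" "M" NA

toy_name2sex(c("mary", "leslie", "JAME"), pred.sex = FALSE)
# "F" "B" NA
```

In the second call fuzzy.match follows pred.sex and is therefore FALSE, so the misspelled "JAME" is left unmatched instead of being resolved to the nearest name.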

References

http://www.census.gov/genealogy/www/data/1990surnames/names_files.html
http://stackoverflow.com/a/818231/1000343
http://www.talkstats.com/showthread.php/31660

See Also

stringdist

Examples

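library(qdap)

## Defaults: pred.sex = TRUE, and fuzzy.match follows pred.sex
name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew))

## No prediction for overlapping names (returned as "B"), no fuzzy matching
name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), pred.sex = FALSE)

## Overlapping names returned as "B", fuzzy matching on
name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), pred.sex = FALSE,
    fuzzy.match = TRUE)

## Overlapping names predicted, fuzzy matching off
name2sex(qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew), pred.sex = TRUE,
    fuzzy.match = FALSE)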

## For names found under both genders, get the log ratio of the
## female to male percent frequencies
library(qdapDictionaries)

orig_nms <- qcv(mary, jenn, linda, JAME, GABRIEL, OLIVA,
    tyler, jamie, JAMES, tyrone, cheryl, drew)

sex <- name2sex(orig_nms, FALSE, TRUE)

names(sex) <- rep("", length(sex))
names(sex)[sex == "B"] <- sapply(toupper(orig_nms[sex == "B"]), function(x) {
        y <- NAMES[NAMES[, 1] %in% x, ]
        round(log(Reduce("/", y[ order(y[, "gender"]), "per.freq"])), 2)
    })

## Log of the female-to-male frequency ratio (positive values lean female)
sex
orig_nms
data.frame(name = orig_nms, sex = sex, `ratio_F:M` = names(sex),
    check.names=FALSE)
