This function predicts the gender of a first name given a year or range of
years in which the person was born. The prediction can use one of several
data sets suitable for different time periods or geographical regions. See
the package vignette for suggestions on using this function with multiple
names and for a discussion of which data set is most suitable for your
research question. When using certain methods, the genderdata data package
is required; you will be prompted to install it if it is not already
available.
gender(
names,
years = c(1932, 2012),
method = c("ssa", "ipums", "napp", "kantrowitz", "genderize", "demo"),
countries = c("United States", "Canada", "United Kingdom", "Denmark", "Iceland",
"Norway", "Sweden")
)
names: First names as a character vector. Names are case insensitive.
years: The birth year of the person whose name's gender is to be predicted.
This argument can be either a single year or a range of years in the form
c(1880, 1900). If no value is specified, the "ssa" method uses the period
1932 to 2012; acceptable years for the SSA method range from 1880 to 2012,
but for years before 1930 the IPUMS method is probably more accurate. For
the "ipums" method, the default range is 1789 to 1930, which is also the
range of acceptable years. For the "napp" method, the default range is 1758
to 1910, which is also the range of acceptable years. If a year or range of
years is specified, the names are looked up for that period.
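The year rules above can be summarized in a small R sketch. `resolve_years()`, `year_bounds`, and `default_years` are hypothetical names, not functions or objects exported by this package, and rejecting out-of-range years with an error is an assumption; the real function may handle such input differently. A single year is treated as a range whose two inclusive bounds are equal.

```r
# Hypothetical helper, not part of the gender package. The bounds and
# defaults below are copied from the description of `years` above.
year_bounds <- list(
  ssa   = c(1880, 2012),
  ipums = c(1789, 1930),
  napp  = c(1758, 1910)
)
default_years <- list(
  ssa   = c(1932, 2012),
  ipums = c(1789, 1930),
  napp  = c(1758, 1910)
)

resolve_years <- function(method, years = NULL) {
  # Only the three methods whose year rules are documented are handled.
  method <- match.arg(method, names(year_bounds))
  if (is.null(years)) years <- default_years[[method]]
  if (!is.numeric(years) || !(length(years) %in% c(1, 2))) {
    stop("`years` must be a single year or a range such as c(1880, 1900)")
  }
  # A single year becomes a range whose two (inclusive) bounds are equal.
  if (length(years) == 1) years <- c(years, years)
  years <- unname(years)
  if (years[1] > years[2]) stop("the first year must not exceed the second")
  bounds <- year_bounds[[method]]
  if (years[1] < bounds[1] || years[2] > bounds[2]) {
    # Assumption: out-of-range years are treated as an error.
    stop("years for method '", method, "' must lie between ",
         bounds[1], " and ", bounds[2])
  }
  c(year_min = years[1], year_max = years[2])
}

resolve_years("ssa")          # year_min 1932, year_max 2012
resolve_years("ipums", 1860)  # year_min 1860, year_max 1860
```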
method: This value determines the data set used to predict the gender of
the name. The "ssa" method looks up names in the U.S. Social Security
Administration baby name data. (This method is based on an implementation by
Cameron Blevins.) The "ipums" method looks up names in U.S. Census data from
the Integrated Public Use Microdata Series. (This method was contributed by
Ben Schmidt.) The "napp" method uses census microdata from Canada, Great
Britain, Denmark, Iceland, Norway, and Sweden from 1801 to 1910, created by
the North Atlantic Population Project. The "kantrowitz" method uses the
Kantrowitz corpus of male and female names. The "genderize" method uses the
Genderize.io <https://genderize.io/> API, which is based on "user profiles
across major social networks." The "demo" method uses the top 100 names in
the SSA method; it is provided only for demonstration purposes when the
genderdata package is not installed, and it is not suitable for research
purposes.
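The usage line gives `method` a vector of all six choices as its default. A common R idiom for such arguments is `match.arg()`, which returns the first choice when the caller supplies nothing; we assume, without confirming it from the package source, that this is why "ssa" behaves as the default. `pick_method()` is a hypothetical stand-in that shows only this argument handling.

```r
# Hypothetical stand-in for gender() that shows only how `method` could be
# resolved; it does not look up any names.
pick_method <- function(method = c("ssa", "ipums", "napp", "kantrowitz",
                                   "genderize", "demo")) {
  # When `method` is left at its default, match.arg() returns the first
  # choice; otherwise it errors unless the value matches one of the choices.
  match.arg(method)
}

pick_method()        # "ssa"
pick_method("napp")  # "napp"
```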
countries: The countries whose data sets are used. For the "ssa" and
"ipums" methods, the only valid option is "United States", which is assumed
if no argument is specified. For the "napp" method, you may specify a
character vector with any of the following countries: "Canada",
"United Kingdom", "Denmark", "Iceland", "Norway", "Sweden". For the
"kantrowitz" and "genderize" methods, no country should be specified.
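The country rules above can be expressed as a validator. `check_countries()` and `napp_countries` are hypothetical and not part of the package; defaulting "napp" to all six countries when none are given is an assumption, since the text does not state a default for that method.

```r
# Hypothetical validator, not the package's code: it encodes the country
# rules stated above for each method.
napp_countries <- c("Canada", "United Kingdom", "Denmark", "Iceland",
                    "Norway", "Sweden")

check_countries <- function(method, countries = NULL) {
  if (method %in% c("ssa", "ipums")) {
    # "United States" is the only valid option and is assumed if omitted.
    if (is.null(countries)) countries <- "United States"
    if (!identical(countries, "United States")) {
      stop("the only valid country for '", method, "' is \"United States\"")
    }
  } else if (method == "napp") {
    # Assumption: with no countries given, all NAPP countries are used.
    if (is.null(countries)) countries <- napp_countries
    bad <- setdiff(countries, napp_countries)
    if (length(bad) > 0) {
      stop("unsupported NAPP countries: ", paste(bad, collapse = ", "))
    }
  } else if (method %in% c("kantrowitz", "genderize")) {
    if (!is.null(countries)) {
      stop("no country should be specified for '", method, "'")
    }
  }
  countries
}

check_countries("napp", c("Sweden", "Denmark"))  # "Sweden" "Denmark"
```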
Returns a data frame containing the results of predicting the gender. The exact columns of the returned data frame depend on the method used. They include the following:
The name for which the gender has been predicted.
The proportion of male names for the given range of years.
The proportion of female names for the given range of years.
The predicted gender based on the proportion of male and female names.
Possible values are "male" or "female" when the proportion for that gender
is above 0.5, "either" when the proportions are exactly 0.5, and NA for
combinations of names and years for which a gender cannot be predicted
using the given method.
The lower bound (inclusive) of the year range used for the prediction.
The upper bound (inclusive) of the year range used for the prediction.
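To make these columns concrete, here is a self-contained sketch of how the proportions and predicted gender could be derived from male and female counts summed over an inclusive year range. The data frame `toy_counts` holds invented numbers, not values from any of the data sets, and `predict_gender()` and its column names are this sketch's own choices rather than the package's implementation. Names are lowercased before lookup to mirror the case-insensitive matching described for `names`.

```r
# Toy counts invented for illustration only; they are not taken from the
# SSA, IPUMS, or NAPP data.
toy_counts <- data.frame(
  name   = c("madison", "madison", "jordan", "jordan"),
  year   = c(1900, 1985, 1900, 1985),
  female = c(10, 990, 50, 50),
  male   = c(90, 10, 50, 50)
)

# Hypothetical function: sums counts over [year_min, year_max] per name,
# converts them to proportions, and applies the 0.5 rule described above.
predict_gender <- function(names, year_min, year_max, counts = toy_counts) {
  rows <- lapply(names, function(n) {
    key <- tolower(n)  # lookups are case insensitive
    hits <- counts[counts$name == key &
                   counts$year >= year_min & counts$year <= year_max, ]
    male <- sum(hits$male)
    female <- sum(hits$female)
    total <- male + female
    if (total == 0) {
      # No data for this name and period: no prediction is possible.
      p_male <- NA_real_
      p_female <- NA_real_
      g <- NA_character_
    } else {
      p_male <- male / total
      p_female <- female / total
      g <- if (p_male > 0.5) "male" else if (p_female > 0.5) "female" else "either"
    }
    data.frame(name = n, proportion_male = p_male,
               proportion_female = p_female, gender = g,
               year_min = year_min, year_max = year_max)
  })
  do.call(rbind, rows)
}

predict_gender(c("Madison", "jordan", "zelda"), 1900, 1985)
```

With these toy counts, "Madison" comes out "female", "jordan" comes out "either" because its counts are evenly split, and "zelda" gets NA because it has no rows in the table.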
gender("madison", method = "demo", years = 1985)
gender("madison", method = "demo", years = c(1900, 1985))
# The examples below require the genderdata package.
# SSA method
gender("madison", method = "ssa", years = c(1900, 1985))
# IPUMS method
gender("madison", method = "ipums", years = 1860)
# NAPP method
gender("madison", method = "napp", countries = c("Sweden", "Denmark"))