This data set is adapted from the 2000 Census (5% sample, person records). It is mainly restricted to programmers and engineers in the Silicon Valley area. (Apparently due to errors, there are some from other ZIP codes.)
There are three versions:
prgeng
, the original data, with categorical variables,
e.g. Occupation, in their original codes
peDumms
, same but with categorical variables
converted to dummies; due to the large number of levels the birth
and PUMA data is not included
peFactors
, same but with categorical variables
converted to factors
The variable codes, e.g. occupational codes, are available from the Census Bureau, at http://www.census.gov/prod/cen2000/doc/pums.pdf. (Short code lists are given in the record layout, but longer ones are in the appendix Code Lists.)
The variables are:
age
, with a U(0,1) variate added for jitter
cit
, citizenship; 1-4 code various categories of
citizens; 5 means noncitizen (including permanent residents)
educ
: 01-09 code no college; 10-12 means some college;
13 is a bachelor's degree, 14 a master's, 15 a professional degree and
16 is a doctorate
occ
, occupation
birth
, place of birth
wageinc
, wage income
wkswrkd
, number of weeks worked
yrentry
, year of entry to the U.S. (0 for natives)
powpuma
, location of work
gender
, 1 for male, 2 for female
data(prgeng)
data(peDumms)
data(peFactors)