parseSchools( original_name, resolution = 10, map=FALSE )
original_name
denotes an Nx1 vector of
university names read from the Grad Cafe.
resolution
controls how close an original name must be to its best standardized
equivalent before it is replaced; lower values demand a closer match. Very low
values (between 0 and 5) are cautious selections that lead to fewer mismatches
but sparser results. Mid-range values (8-12) lead to surprisingly accurate
replacements when the other processing stages fail. One might expect a few
mismatched name replacements, but the number of errors should be fairly low.
Finally, large values (more than 20) practically guarantee that a school name
which is not in our standard dictionary will be replaced with something. Be
wary of such large selections; the potential for many mismatched replacements
is high. For the test set, the bulk of the nearest matches were within 10 units
of the original value. Almost none were larger than 30. The default value is 10.
map
is a logical flag controlling whether the original school names
are included in the data frame returned by brewdata(). If map=TRUE, then the
returned data includes the parsed names as well as the originals. The default
value is map=FALSE.
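
The effect of a threshold like resolution can be illustrated with a small base R sketch. This is not brewdata's implementation: the page does not say which distance the package uses, so the sketch assumes plain edit distance via adist(), and the helper match_with_threshold() and the two-entry dictionary are hypothetical names made up for the illustration.

```r
# Hypothetical helper, not part of brewdata: choose the nearest dictionary
# entry by edit distance and accept it only if it lies within `resolution`.
match_with_threshold <- function(original_name, dictionary, resolution = 10) {
  # adist() (base R, utils) returns a names-by-dictionary matrix of
  # generalized Levenshtein distances
  d <- adist(original_name, dictionary)
  # Column index of the nearest dictionary entry for each name
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(original_name), best)]
  # Replace only when the nearest entry is close enough; otherwise give NA
  ifelse(best_dist <= resolution, dictionary[best], NA_character_)
}

# Assumed toy dictionary of standardized names
dictionary <- c("university of california berkeley", "stanford university")
x <- c("university of california--berkly", "mit")

# Cautious threshold: the misspelling is repaired, "mit" is left unmatched
strict <- match_with_threshold(x, dictionary, resolution = 5)

# Permissive threshold: "mit" is now forced onto an unrelated school
loose <- match_with_threshold(x, dictionary, resolution = 30)
```

With the cautious setting the unknown name comes back as NA (a sparser result), while the permissive setting replaces it with a dictionary entry that is merely the least distant one, which is the kind of mismatch the resolution description warns about.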
findScorePercentile, parseResults, parseSchools, translateScore,
getGradCafeData, getMaxPages
x = c( "university of california--berkeley","university of california--berkly",
"uc berkeley", "berkeley" )
parseSchools( x )