The phonics package for R is designed to provide a variety of
phonetic indexing algorithms in common and not-so-common use today.
The algorithms generally reduce a string to a symbolic representation
approximating the sound made by pronouncing the string. They can be
used to match names and other strings, and as a proxy for assorted
string distance algorithms.
phonics(word, method, clean = TRUE)
word: string or vector of strings to encode
method: vector of method names to use
clean: if TRUE, return NA for unknown alphabetical characters
Returns a data frame containing the phonetic spellings of the input for each method applied.
The variable word is a character string or a vector of character
strings to be encoded. Each phonetic algorithm is defined only for
inputs over a limited alphabet. Non-alphabetical characters are
removed from the string in a locale-dependent fashion; this strips
spaces, hyphens, and numbers. For inputs outside an algorithm's known
range, the output is undefined: NA is returned and a warning is
thrown. If clean is FALSE, phonics attempts to process the strings
anyway. The default is TRUE.
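The cleaning step described above can be approximated in base R. This is only a sketch, assuming that cleaning amounts to dropping every character outside the POSIX [:alpha:] class; strip_non_alpha is a hypothetical helper, not a function exported by phonics, and the package's internal cleaning may differ.

```r
# Hypothetical helper (not part of phonics): drop every character that the
# current locale does not classify as alphabetic.
strip_non_alpha <- function(x) {
  gsub("[^[:alpha:]]", "", x)
}

# Spaces, hyphens, apostrophes, and digits are all removed
cleaned <- strip_non_alpha(c("O'Brien-Smith", "Route 66", "Mary Ann"))
print(cleaned)
```

Because [:alpha:] is interpreted according to the current locale, accented letters may be kept or dropped depending on the session's settings.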
The method parameter should be a character vector containing one or
more methods that should be used. The available list of methods is
"caverphone", "caverphone.modified", "cologne", "lein", "metaphone",
"nysiis", "nysiis.modified", "onca", "onca.modified", "onca.refined",
"onca.modified.refined", "phonex", "rogerroot", "soundex",
"soundex.refined", and "statcan".
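As a brief illustration of passing several methods at once, the sketch below requests three of the encodings listed above for the same inputs. It assumes the phonics package is installed; the surnames are arbitrary sample data, and no output is shown because the exact codes are best checked against the installed version.

```r
library(phonics)

# Arbitrary sample names, chosen only for illustration
surnames <- c("Smith", "Smyth", "Schmidt")

# Request three encodings in one call; the result is a data frame
codes <- phonics(surnames, c("soundex", "metaphone", "nysiis"))

# Inspect the structure of the returned data frame
str(codes)
```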
James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8 (2020), pp. 1-21, <doi:10.18637/jss.v095.i08>.
Other phonics: caverphone(), cologne(), lein(), metaphone(), mra_encode(), nysiis(), onca(), phonex(), rogerroot(), soundex(), statcan()
library(phonics)
# Encode two names with both Soundex variants; the result is a data frame
phonics(c("Peter", "Peady"), c("soundex", "soundex.refined"))
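The name matching mentioned in the description can be sketched by grouping spellings that share a code. This example assumes the phonics package is installed and uses its soundex() function with default arguments; the names are arbitrary sample data. Standard Soundex assigns Robert and Rupert the same code, so they would be expected to fall into one group, while names that differ in their first letter stay apart.

```r
library(phonics)

# Arbitrary sample names; the variant spellings are intentional
people <- c("Robert", "Rupert", "Rubin", "Catherine", "Kathryn")

# Encode each name with Soundex
codes <- soundex(people)

# Group the original spellings by their phonetic code
groups <- split(people, codes)
print(groups)
```

Grouping on a phonetic code is a coarse filter; candidates within a group can then be compared with a string distance measure if finer matching is needed.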