A wrapper function that finds the best guess for a misspelled word based on its letters, the ordering of those letters, and the potential for letters to be interchanged. The Damerau-Levenshtein distance is used to infer which dictionary word the participant was trying to spell (see SemNetDictionaries).
best.guess(word, full.dictionary, dictionary = NULL, tolerance = 1)
word: Character. A word to get best guess spelling options for from the dictionary.
full.dictionary: Character vector. The dictionary to search for best guesses in. See SemNetDictionaries.
dictionary: Character. A dictionary from SemNetDictionaries for monikers (enhances guessing).
tolerance: Numeric. The distance tolerance set for automatic spell-correction purposes.
This function uses the stringdist function to compute the Damerau-Levenshtein (DL) distance, which is used to determine potential best guesses.
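To make the role of transpositions concrete, here is a minimal base R sketch of the optimal string alignment distance, a restricted variant of the Damerau-Levenshtein distance. The helper name osa_distance is hypothetical and is not part of SemNetCleaner or stringdist. It is a simplified illustration and may not match what stringdist computes in every case.

```r
# Hypothetical helper (not part of SemNetCleaner or stringdist):
# optimal string alignment distance, a restricted Damerau-Levenshtein variant.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i letters of a
  # and the first j letters of b
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0 else 1
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1,  # deletion
        d[i + 1, j] + 1,  # insertion
        d[i, j] + cost    # substitution (or match)
      )
      # Swap of two adjacent letters counts as a single edit
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
      }
    }
  }
  d[n + 1, m + 1]
}

# A swapped pair of letters is one edit here ...
osa_distance("hrose", "horse")
# ... but two edits under plain Levenshtein distance (base R's adist)
as.vector(utils::adist("hrose", "horse"))
```

Counting a swap as one edit is what lets a single-error tolerance cover common typing mistakes such as reversed letters.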
If exactly one word (i.e., n = 1) is within the distance tolerance, it is automatically output as the best guess response. The default tolerance is based on Damerau's (1964) observation that more than 80% of all human misspellings can be expressed by a single error (i.e., an insertion, deletion, substitution, or transposition). If more than one word is within the distance tolerance, then these will be provided as potential options.
The recommended and default distance tolerance is tolerance = 1, which spell-corrects a word only if there is exactly one word with a DL distance of 1.
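The tolerance rule described above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: guess_sketch and toy_dictionary are hypothetical names, and base R's adist (plain Levenshtein distance, without transpositions) stands in for the Damerau-Levenshtein distance so that the sketch has no dependencies.

```r
# Hypothetical helper, not part of SemNetCleaner: illustrates the tolerance rule.
# Assumption: utils::adist (plain Levenshtein) is used as a stand-in for the
# Damerau-Levenshtein distance.
guess_sketch <- function(word, full_dictionary, tolerance = 1) {
  # Edit distance from `word` to every dictionary entry
  d <- as.vector(utils::adist(word, full_dictionary))
  # Dictionary entries at or below the tolerance
  candidates <- unique(full_dictionary[d <= tolerance])
  if (length(candidates) == 1) {
    # Exactly one candidate: treat it as an automatic correction
    list(auto = TRUE, guess = candidates)
  } else {
    # Zero or several candidates: return them as options only
    list(auto = FALSE, guess = candidates)
  }
}

# Made-up dictionary for illustration
toy_dictionary <- c("bombay", "bobcat", "cat", "bat", "rat")

guess_sketch("bomba", toy_dictionary)  # one word within tolerance
guess_sketch("hat", toy_dictionary)    # several words within tolerance
```

With this toy dictionary, "bomba" has a single candidate and is corrected automatically, while "hat" is within one edit of three words and is left for the user to resolve.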
The best guess(es) of the word
Damerau, F. J. (1964). A technique for computer detection and correction of spelling errors. Communications of the ACM, 7, 171-176.
# NOT RUN {
# Misspelled "bombay"
best.guess("bomba", full.dictionary = SemNetDictionaries::animals.dictionary)
# }