PNcheck: Spell checking using ternary search trees
Description
Spell checking using a ternary search tree (TST) and Peter Norvig's approach.
Usage
PNcheck(tree, string, useUpper = FALSE)
Arguments
tree
a ternary search tree containing the dictionary terms.
string
the misspelled string to correct.
useUpper
if TRUE, uppercase letters are also used to construct insertions and alterations
of the string. Default is FALSE.
Value
A vector with the corrected words.
Details
The literature on spelling correction claims that around 80% of spelling
errors are at an edit distance of 1 from the target.
For a word of length n, there will be n deletions, n-1 transpositions,
36n alterations, and 36(n+1) insertions, for a total of 74n+35 (of which a few
are typically duplicates). PNcheck computes all these variations and searches
for them in a ternary search tree.
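The candidate generation described above can be sketched in base R. This is a minimal illustration, not PNcheck's actual implementation: the helper name edits1, the 36-symbol alphabet (lowercase letters plus digits), and the use of %in% on a plain character vector in place of a ternary search tree lookup are all assumptions made for the example.

```r
# Hypothetical 36-symbol alphabet (lowercase letters plus digits); the
# alphabet PNcheck actually uses is an assumption here.
alphabet <- c(letters, 0:9)

# Illustrative helper (not part of the package): all strings at edit
# distance 1 from `word`, grouped by edit type. Requires a non-empty word.
edits1 <- function(word, alphabet) {
  n <- nchar(word)
  stopifnot(n >= 1)
  k <- length(alphabet)
  chars <- strsplit(word, "")[[1]]
  left  <- substring(word, 1, 0:n)        # left[i]: the first i - 1 characters
  right <- substring(word, 1:(n + 1), n)  # right[i]: characters i through n
  idx  <- seq_len(n)                      # positions that can be deleted or altered
  swap <- seq_len(n - 1)                  # positions that can be swapped with the next
  list(
    deletions = paste0(left[idx], right[idx + 1]),
    transpositions = if (n >= 2) {
      paste0(left[swap], chars[swap + 1], chars[swap], right[swap + 2])
    } else {
      character(0)
    },
    alterations = paste0(rep(left[idx], each = k), alphabet,
                         rep(right[idx + 1], each = k)),
    insertions = paste0(rep(left, each = k), alphabet,
                        rep(right, each = k))
  )
}

parts <- edits1("aple", alphabet)
lengths(parts)       # n, n - 1, 36n and 36(n + 1) for n = 4
sum(lengths(parts))  # 74 * 4 + 35, duplicates included

# Stand-in for the tree lookup: keep the candidates found in a made-up word list.
dictionary <- c("apple", "orange", "apricot", "cherry")
candidates <- unique(unlist(parts, use.names = FALSE))
candidates[candidates %in% dictionary]
```

The alterations include the original word itself once per position, which is one source of the duplicates mentioned above. A real tree lookup replaces the %in% step.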
For distance 2, the number of variations becomes (74n+35)^2, which makes PNcheck
3 orders of magnitude more expensive than SDcheck.
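To get a feel for how quickly the candidate set grows, the two formulas above can be evaluated for a few word lengths. The word lengths below are arbitrary, and the figures are candidate counts with duplicates included, not measured run times.

```r
# Candidate counts from the formulas in this section, for a few
# arbitrarily chosen word lengths.
n <- c(4, 8, 12)
distance1 <- 74 * n + 35   # variations at edit distance 1
distance2 <- distance1^2   # variations at edit distance 2
growth <- data.frame(n, distance1, distance2)
growth
```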