TSTr (version 1.2)

PNcheck: Spell checking using ternary search trees

Description

Spell checking using a ternary search tree (TST) and Peter Norvig's approach.

Usage

PNcheck(tree, string, useUpper = FALSE)

Arguments

tree
a ternary search tree containing the dictionary terms.
string
the misspelled string to correct.
useUpper
if TRUE, uppercase letters are also used to construct insertions and alterations of the string. Default is FALSE.

Value

A vector with the corrected words.

Details

The literature on spelling correction claims that around 80% of spelling errors are at an edit distance of 1 from the target. For a word of length n, there are n deletions, n-1 transpositions, 36n alterations, and 36(n+1) insertions, for a total of 74n+35 variations (of which a few are typically duplicates). PNcheck computes all these variations and searches for them in a ternary search tree. For distance 2, the number of variations grows to roughly (74n+35)^2, which makes PNcheck about 3 orders of magnitude more expensive than SDcheck.
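
The counts above can be checked with a minimal base R sketch. The helper edits1 is hypothetical and is not part of TSTr. It assumes a 36-character alphabet of lowercase letters plus digits, which is one reading of the 36n figure, and it checks candidates against a plain character vector rather than a ternary search tree.

```r
# Hypothetical helper (not part of TSTr): enumerate every string at
# edit distance 1 from `word`, in the spirit of Peter Norvig's approach.
# Assumption: the alphabet is the 26 lowercase letters plus the digits 0-9.
edits1 <- function(word, alphabet = c(letters, 0:9)) {
  chars <- strsplit(word, "")[[1]]
  n <- length(chars)
  stopifnot(n >= 1)
  glue <- function(x) paste(x, collapse = "")

  # n deletions: drop one character at a time
  deletions <- vapply(seq_len(n), function(i) glue(chars[-i]), character(1))

  # n - 1 transpositions: swap each pair of adjacent characters
  transpositions <- vapply(seq_len(n - 1), function(i) {
    x <- chars
    x[c(i, i + 1)] <- x[c(i + 1, i)]
    glue(x)
  }, character(1))

  # 36n alterations: replace each character with each alphabet symbol
  alterations <- unlist(lapply(seq_len(n), function(i) {
    vapply(alphabet, function(a) {
      x <- chars
      x[i] <- a
      glue(x)
    }, character(1), USE.NAMES = FALSE)
  }))

  # 36(n + 1) insertions: insert each alphabet symbol at each gap
  insertions <- unlist(lapply(0:n, function(i) {
    vapply(alphabet, function(a) glue(append(chars, a, after = i)),
           character(1), USE.NAMES = FALSE)
  }))

  c(deletions, transpositions, alterations, insertions)
}

candidates <- edits1("lamon")
length(candidates)                                   # 74 * 5 + 35 = 405
intersect(candidates, c("apple", "orange", "lemon")) # "lemon"
```

Several of the 405 candidates are duplicates (each alteration pass reproduces the original word once, for instance), so the number of distinct strings that would actually be looked up is somewhat smaller.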

See Also

newTree

Examples

fruitTree <- newTree(c("Apple", "orange", "lemon"))
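# Look up a misspelling using lowercase insertions and alterations only
PNcheck(fruitTree, "lamon")
# Also try uppercase letters, so that capitalized terms such as "Apple" can be reached
PNcheck(fruitTree, "apple", useUpper = TRUE)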
