Need a Stata function that does spellcheck? OK, so this is not as good as spellcheck. The spellcheck function in MS Word does a lot of checking for transpositions and probability of misspelling that this won’t do, but it’s definitely less crude than counting the position-specific differences between two strings.
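Levenshtein distance is a metric designed to measure the similarity of two strings. Basically, it is the minimum number of single-character insertions, deletions, or substitutions necessary to transform one string into another. For example, the distance between "kitten" and "sitting" is 3: substitute k with s, substitute e with i, and add a g at the end.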
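To make the definition concrete, here is a minimal sketch of the standard dynamic-programming calculation, written in Python purely for illustration. It is not the Stata program this post is about. The helper positional_differences is a hypothetical stand-in for the "position-specific differences" approach mentioned above, included only to show why that approach is cruder.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] = distance between the characters of a seen so far
    # (initially none) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def positional_differences(a: str, b: str) -> int:
    """Hypothetical crude measure: mismatches at the same position,
    plus the difference in length."""
    mismatches = sum(1 for ca, cb in zip(a, b) if ca != cb)
    return mismatches + abs(len(a) - len(b))


if __name__ == "__main__":
    for pair in [("kitten", "sitting"), ("anderson", "nderson")]:
        print(pair, levenshtein(*pair), positional_differences(*pair))
```

Dropping the first letter of "anderson" shifts every later character by one position, so the positional count treats almost the whole string as wrong, while the Levenshtein distance is a single deletion.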
As with many math-related topics, Wikipedia does a pretty good job of explaining the mechanics. Also, here are some implementations in other languages.
Programming a Mata function for this would be fairly easy, as would creating a matrix for each pair of words. This program uses temporary variables instead, which can be a bit computationally intensive but works well for making lots of comparisons simultaneously (after all, Stata is good at vector manipulation).
I wrote this program to do record linkage using names, which can be accomplished by using joinby to form candidate pairs and then comparing the name strings in each pair.
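The linkage idea can be sketched roughly as follows, again in Python rather than Stata and only as an illustration. The grouping key, the sample names, the function link_names, and the cutoff of 2 edits are all assumptions made for the example, not details of the actual program. Pairing every left record with every right record that shares a key mimics what joinby does within groups.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def link_names(left, right, max_distance=2):
    """left and right are lists of (key, name) tuples.

    Form every left/right pair that shares a key, then keep the pairs
    whose names are within max_distance edits of each other.
    Both the key and max_distance are illustrative assumptions.
    """
    matches = []
    for (lkey, lname), (rkey, rname) in product(left, right):
        if lkey != rkey:
            continue  # only compare records within the same group
        d = levenshtein(lname.lower(), rname.lower())
        if d <= max_distance:
            matches.append((lkey, lname, rname, d))
    return matches


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    left = [("OH", "Jon Smith"), ("OH", "Maria Garcia"), ("TX", "Lee Chang")]
    right = [("OH", "John Smith"), ("OH", "Mario Garza"), ("TX", "Li Chang")]
    for match in link_names(left, right):
        print(match)
```

Restricting comparisons to records that share a key keeps the number of pairs manageable, since an unrestricted pairing grows with the product of the two file sizes.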
Here’s the link – enjoy!