Need a Stata function that does spellcheck? OK, so it's not as good as spellcheck. The spellcheck function in MS Word does a lot of checking for transpositions and probability of misspelling that this won’t do, but it’s definitely less crude than counting the position-specific differences between two strings.
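Levenshtein distance is a metric designed to measure the similarity of two strings. Basically, it is the minimum number of single-character insertions, deletions, or replacements necessary to transform one string into another. For example, turning "kitten" into "sitting" takes three edits: replace k with s, replace e with i, and add a g at the end.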
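To make the definition concrete, here is a minimal sketch of the standard dynamic-programming calculation. It is written in Python purely for illustration and is not the Stata program from this post; the function name `levenshtein` is my own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or replacements needed to turn string a into string b."""
    # prev[j] = distance between the characters of a processed so far
    # (initially none) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("jon", "john"))
```

Note that a swap of two adjacent letters counts as two edits here ("ab" to "ba" is a distance of 2), which is one reason this is cruder than a real spellchecker.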
Programming a Mata function for this would be fairly easy, as would building a matrix for each pair of words. This program uses temporary variables instead, which can be a bit computationally intensive but works well for making lots of comparisons simultaneously (after all, Stata is good at vector manipulation).
I wrote this program to do record linkage using names, which can be accomplished by using joinby to form candidate pairs and then comparing the name strings within each pair.
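The linkage idea can be sketched roughly as follows, again in Python rather than Stata and under my own assumptions. The blocking key (a birth year here), the sample names, the `candidate_matches` helper, and the cutoff of 2 edits are all made up for illustration; the pairing loop only mimics what joinby does by pairing every record that shares a key.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or replacements needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidate_matches(left, right, max_distance=2):
    """Pair records that share a blocking key (hypothetical stand-in for
    joinby), then keep pairs whose names are within max_distance edits."""
    matches = []
    for (lkey, lname), (rkey, rname) in product(left, right):
        if lkey != rkey:
            continue  # only compare records within the same block
        d = levenshtein(lname.lower(), rname.lower())
        if d <= max_distance:
            matches.append((lname, rname, d))
    return matches


# Made-up records: (blocking key, name)
left = [("1970", "Jon Smith"), ("1982", "Ann Lee")]
right = [("1970", "John Smith"), ("1970", "Joan Smythe"), ("1982", "Anne Leigh")]

for lname, rname, d in candidate_matches(left, right):
    print(lname, "<->", rname, "distance", d)
```

The right cutoff depends on how long and how messy the names are, so treat the threshold as something to tune against pairs you have checked by hand.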
Here’s the link – enjoy!