Here is a simple Twitter bot I created for fun. Every 5 minutes it searches for an instance of “I could care less” and replies with “I think you mean you couldn’t care less.” The bot is periodically down because it’s technically spam (even if it’s funny spam). Here’s the code, which requires the python-twitter library; there’s also a stable section for it on the code & software page.
TODAY I LIKE (3/6/12)
- Become a Programmer, Motherfucker
- The Recipes of Punchfork
- This article, which makes some really good points about redistribution
- Factual.com – a startup that provides a harmonized API for a curated set of data from all over the web
Posted in misc
STATE DISTANCE MATRIX
DESCRIPTION
I created this tool to shrink estimates of population statistics in the CPS using weights that decay with distance, but it could certainly be used for other purposes. The datasets contain two distance measures for each pair of states:
1. the minimum number of borders one must cross to get from one state to the other, and
2. the distance between the centers of the two states, in miles.
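For readers who want to see how the first measure can be computed, here is a minimal Python sketch. It is not the code behind the downloadable files: it assumes a breadth-first search over a state adjacency list, and the `BORDERS` dictionary and function names are hypothetical, covering only five western states and only the borders among them.

```python
from collections import deque

# Hypothetical, truncated adjacency list (illustration only): five western
# states and only the borders they share with each other.
BORDERS = {
    "WA": ["OR", "ID"],
    "OR": ["WA", "ID", "CA", "NV"],
    "ID": ["WA", "OR", "NV"],
    "CA": ["OR", "NV"],
    "NV": ["OR", "ID", "CA"],
}


def border_distances(start, borders):
    """Minimum number of borders crossed from `start` to each reachable state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for neighbor in borders[state]:
            if neighbor not in dist:
                # First visit in a breadth-first search is the shortest path.
                dist[neighbor] = dist[state] + 1
                queue.append(neighbor)
    return dist


def distance_matrix(borders):
    """Border-crossing distances for every pair of states, as nested dicts."""
    return {state: border_distances(state, borders) for state in borders}


matrix = distance_matrix(BORDERS)
print(matrix["WA"]["CA"])  # 2 (WA -> OR -> CA)
```

States with no land border (Alaska, Hawaii) would never be reached by this search, so a full version would need some rule for them.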
LINKS FOR DOWNLOAD:
Source files used to create distance data
A browser view-able version of the code
Details below the jump. . .
Posted in distance measures, geography, stata
LEVENSHTEIN DISTANCE
Need a Stata function that does spellcheck? OK, so it’s not as good as spellcheck. The spellcheck function in MS Word does a lot of checking for transpositions and probability of misspelling that this won’t do, but it’s definitely less crude than counting the position-specific differences between two strings.
Levenshtein Distance is a metric designed to measure the similarity of two strings. Basically, it is the minimum number of single-character insertions, deletions, or substitutions necessary to transform one string into another. For example, turning “kitten” into “sitting” takes three edits: two substitutions and one insertion.
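To make the definition concrete, here is a short Python sketch of the standard dynamic-programming approach. It is an illustration only, not the Stata program linked in this post, and the function name `levenshtein` is my own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string `a` into string `b`."""
    # Keep the shorter string in `b` so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are held in memory at a time, so the cost is proportional to the product of the two string lengths in time but only to the shorter string in space.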
As with many math-related topics, Wikipedia does a pretty good job of explaining the mechanics. Also, here are some implementations in other languages.
Programming a Mata function for this would be fairly easy, as would creating a matrix for each pair of words. This program uses temporary variables instead, which can be a bit computationally intensive but is good for making lots of comparisons simultaneously (after all, Stata is good at vector manipulation).
I wrote this program to do record linkage using names, which can be accomplished by using joinby to form candidate pairs and then comparing the strings in each pair.
Here’s the link – enjoy!
Posted in distance measures, record linkage, recursion, stata
