MY BASIC TWITTER BOT

Here is a simple Twitter bot I created for fun. Every five minutes it searches for a tweet containing “I could care less” and replies with “I think you mean you couldn’t care less.” The bot itself is periodically down because it’s technically spam (even if it’s funny spam). Here’s the code, which requires the python-twitter library, and there’s a stable section on the code & software page for it.
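For a sense of the matching step, here is a minimal sketch in plain Python. It is not the bot's actual code: the names `build_replies`, `PATTERN`, and `REPLY`, and the (id, author, text) tuple layout, are made up for illustration, and the search and posting calls that python-twitter would handle are left out entirely.

```python
import re

# Hypothetical names for illustration; the real bot's code may differ.
PATTERN = re.compile(r"\bI could care less\b", re.IGNORECASE)
REPLY = "I think you mean you couldn't care less."


def build_replies(tweets):
    """Return (tweet_id, reply_text) pairs for tweets containing the phrase.

    `tweets` is an iterable of (tweet_id, author, text) tuples, an assumed
    layout standing in for whatever a search call would return.
    """
    replies = []
    for tweet_id, author, text in tweets:
        if PATTERN.search(text):
            replies.append((tweet_id, f"@{author} {REPLY}"))
    return replies


sample = [
    (1, "alice", "Honestly, I could care less about the game."),
    (2, "bob", "I couldn't care less, to be fair."),
]
print(build_replies(sample))
```

Only the first sample tweet gets a reply, since "couldn't care less" does not contain the offending phrase.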

TODAY I LIKE (3/6/12)

Red velvet cake from Punchfork

  1. Become a Programmer, Motherfucker
  2. The Recipes of Punchfork
  3. This article, which makes some really good points about redistribution
  4. Factual.com – a start up that is providing a harmonized API for a curated set of data from all over the web

STATE DISTANCE MATRIX

DESCRIPTION

This is a tool I created to shrink estimates of population statistics in the CPS using weights that decay with distance, though it could certainly be used for other purposes. The datasets measure distance in two ways:

1. the minimum number of borders one must cross to reach each other state, and

2. the distance in miles between the centers of the states.
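Both measures can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code behind the downloadable files: the adjacency table covers just a handful of northeastern states, the function names are hypothetical, and the haversine (great-circle) formula is one plausible way to get miles between centers, which may not be how the datasets were built.

```python
from collections import deque
from math import asin, cos, radians, sin, sqrt

# Illustrative subset only: borders among seven northeastern states.
ADJACENT = {
    "ME": ["NH"],
    "NH": ["ME", "VT", "MA"],
    "VT": ["NH", "MA", "NY"],
    "MA": ["NH", "VT", "NY", "CT", "RI"],
    "NY": ["VT", "MA", "CT"],
    "CT": ["MA", "NY", "RI"],
    "RI": ["MA", "CT"],
}


def border_crossings(start, adjacent):
    """Breadth-first search: minimum borders crossed from `start` to each state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for neighbor in adjacent[state]:
            if neighbor not in dist:
                dist[neighbor] = dist[state] + 1
                queue.append(neighbor)
    return dist


def miles_between(lat1, lon1, lat2, lon2, radius_miles=3958.8):
    """Great-circle distance in miles (haversine), assuming a spherical Earth."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_miles * asin(sqrt(a))


print(border_crossings("ME", ADJACENT))
print(round(miles_between(0.0, 0.0, 1.0, 0.0), 1))  # one degree of latitude
```

Breadth-first search works here because every border crossing costs the same, so the first time a state is reached is also the cheapest way to reach it.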

LINKS FOR DOWNLOAD:

Distance files for each state

Source files used to create distance data

A browser-viewable version of the code

Details below the jump. . .


LEVENSHTEIN DISTANCE

Need a Stata function that does spellcheck? OK, so it's not as good as spellcheck. The spellcheck function in MS Word checks for transpositions and weighs the probability of a misspelling, which this won’t do, but it’s definitely less crude than counting the position-specific differences between two strings.

Levenshtein distance is a metric for the similarity of two strings. Basically, it is the minimum number of single-character insertions, deletions, or substitutions necessary to transform one string into the other.
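To make the definition concrete, here is a minimal Python sketch of the standard dynamic-programming approach. It is not the Stata program described in this post; the function name `levenshtein` and the two-row layout are just one common way to write it.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string `a` into string `b`."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i chars of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Going from "kitten" to "sitting" takes two substitutions (k to s, e to i) and one insertion (the final g), hence a distance of 3.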

As with many math-related topics, Wikipedia does a pretty good job of explaining the mechanics. Also, here are some implementations in other languages.

Programming a Mata function for this would be fairly easy, as would creating a matrix for each pair of words. This program uses temporary variables instead, which can be a bit computationally intensive but is good for making lots of comparisons simultaneously (after all, Stata is good at vector manipulation).

I wrote this program to do record linkage on names, which can be accomplished by using joinby to form candidate pairs and then comparing the strings in each pair.
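The linkage idea can be sketched in Python as well. This is a rough analogue under assumptions, not the Stata workflow itself: `itertools.product` over the two full lists stands in for joinby's pairwise combinations, the compact `levenshtein` helper and the `link_records` function are hypothetical names, and the cutoff of 2 edits is an arbitrary choice for the example.

```python
from itertools import product


def levenshtein(a, b):
    """Compact edit-distance helper (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def link_records(left, right, max_distance=2):
    """Pair every left record with every right record and keep close names.

    `left` and `right` are lists of (record_id, name) tuples; `max_distance`
    is an assumed cutoff, not a value taken from the original program.
    """
    matches = []
    for (left_id, left_name), (right_id, right_name) in product(left, right):
        distance = levenshtein(left_name.lower(), right_name.lower())
        if distance <= max_distance:
            matches.append((left_id, right_id, distance))
    return matches


left = [(1, "Jon Smith"), (2, "Maria Garcia")]
right = [("a", "John Smith"), ("b", "Mario Garcia"), ("c", "Ann Lee")]
print(link_records(left, right))
```

Comparing every pair grows quickly with the size of the lists, so in practice it helps to form pairs only within groups (same state, same birth year, and so on), which is what joining on a grouping variable accomplishes.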

Here’s the link – enjoy!

WELCOME!

. . . to stuartvcraig.com!