This is a question I frequently get asked by new clients. They know there is a Twitter API available to collect tweets, but they have no idea how the results differ from simply searching for tweets on Search.Twitter.com. I recently explained that a tweet database lets you create a long-term store that cannot be reproduced or purchased any other way. That is just the starting point. The real advantage of Twitter API programming is the way it allows you to add value to a collection of tweets:
- You can apply quality control rules that let you filter out false positives for the keywords you are using in your collection query.
- I also like to apply simple “filth controls” to all tweet streams that get displayed on sites. This starts with a list of George Carlin’s 7 words you can’t say on television, and grows into a list of the more creative racist and misogynist words so popular on Twitter. Excluding tweets with these words makes Twitter seem much more civilized.
- A simple language detection algorithm will let you keep tweets in a specific language and exclude all other languages.
- By checking the tweets you receive for spammy words, like free, coupon, buy now, or sale, you can clean out a high percentage of spam tweets, and if you check new tweets for duplicates, you can identify spammers and blacklist them.
- If you screen the user account data for each tweet’s author, you can exclude accounts that have a spammy profile, such as a default avatar, no followers, or an account that has only been in existence a few days.
- Or you can come up with an influence algorithm, such as follower count or frequency of mentions, to select tweets from the most influential users.
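To make these rules concrete, here is a minimal Python sketch of how several of them could be chained into one filtering pass. Everything in it is an assumption for illustration: the `Tweet` record and its field names are a simplified stand-in rather than the actual API payload, the word lists are placeholders, the seven-day account age is an arbitrary cutoff, and "influence" is reduced to a plain follower threshold. Language detection is left out because it normally needs a separate library.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tweet:
    """Simplified, hypothetical tweet record (not the real API shape)."""
    text: str
    author: str
    followers: int
    default_avatar: bool
    account_created: datetime


BLOCKED_WORDS = {"darn", "heck"}  # placeholder for a real "filth" list
SPAM_PHRASES = ("free", "coupon", "buy now", "sale")
MIN_ACCOUNT_AGE = timedelta(days=7)  # assumed cutoff for "a few days"


def words(text):
    """Lowercase word tokens, used for the blocked-word check."""
    return set(re.findall(r"[a-z']+", text.lower()))


def normalize(text):
    """Strip URLs and mentions so near-identical spam collapses to one key."""
    stripped = re.sub(r"https?://\S+|@\w+", "", text.lower())
    return " ".join(stripped.split())


def has_spam_phrase(text):
    """True if any spam phrase appears as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(p)}\b", lowered) for p in SPAM_PHRASES)


def spammy_profile(tweet, now):
    """Default avatar, no followers, or a very young account."""
    return (
        tweet.default_avatar
        or tweet.followers == 0
        or now - tweet.account_created < MIN_ACCOUNT_AGE
    )


def filter_tweets(tweets, now, min_followers=0):
    """Return (kept tweets, blacklisted authors) after applying every rule."""
    seen = set()       # normalized texts already encountered
    blacklist = set()  # authors caught repeating an earlier tweet
    kept = []
    for tweet in tweets:
        if tweet.author in blacklist:
            continue
        key = normalize(tweet.text)
        if key in seen:
            blacklist.add(tweet.author)  # duplicate text: treat as spammer
            continue
        seen.add(key)
        if words(tweet.text) & BLOCKED_WORDS:
            continue
        if has_spam_phrase(tweet.text):
            continue
        if spammy_profile(tweet, now):
            continue
        if tweet.followers < min_followers:
            continue
        kept.append(tweet)
    return kept, blacklist
```

Calling `filter_tweets(tweets, datetime.now(), min_followers=100)` on a batch would return the surviving tweets plus the set of authors flagged for duplicates. In practice the blacklist would be stored between runs, and the duplicate rule would need an exception for legitimate retweets.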
These are just the generic ways to add value to a tweet aggregation site. Once you start working with a client who has specific application needs, there are many more ways to add value to the tweets you collect. This is an iterative process that keeps improving the quality of your tweet collection.
So the simple answer to the question is that Twitter programming produces much higher quality results than Twitter search.