
Twitter Consultant Tip: Tweet data is priceless

by Adam Green on May 31, 2012

in Data Mining Tweets, Database Cache, Twitter consultant, Twitter Database Programming

Most of the Twitter consulting I do involves some form of tweet collection and storage in a database. Even when clients approach me with this in mind, they rarely realize just how valuable tweet data can be. In fact, it is priceless in the truest sense of the word, because there is no way to buy tweets after they are sent. You either capture them in real time, or they are gone forever. Anyone who wants to work as a Twitter consultant needs to be able to explain that value-added message to potential clients. Here are the key selling points to keep in mind.

The Twitter search API only goes back 5 to 6 days, and will return at most 1,500 tweets for any query. If you want old tweets from the API, that is a hard limit. The streaming API is much more responsive, and will return up to 1% of the total stream, meaning that you can get up to 3 million tweets a day on any query. But these tweets are delivered in real time, not after the fact. So if you want all the tweets for a query, you must set up the streaming API connection before you need the results, and then store the tweets in a database for later retrieval.
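To make the "store them for later retrieval" step concrete, here is a minimal sketch in Python. It is not the code I use for clients, just an illustration under stated assumptions: the streaming connection is assumed to hand you one JSON string per tweet, the table layout and the function names (`open_tweet_db`, `store_tweets`, `find_tweets`) are made up for this example, and SQLite stands in for whatever database you prefer. Connecting to the streaming API itself is left out.

```python
import json
import sqlite3


def open_tweet_db(path=":memory:"):
    """Open (or create) a SQLite database with a simple tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            tweet_id    TEXT PRIMARY KEY,  -- tweet ID kept as text
            screen_name TEXT,
            created_at  TEXT,
            text        TEXT,
            raw_json    TEXT               -- full payload, kept for later analysis
        )
        """
    )
    return conn


def store_tweets(conn, lines):
    """Store tweets from an iterable of JSON strings; return how many were new.

    Assumption: each line is one JSON object, as a streaming connection
    would deliver it. Anything that is not a tweet is skipped.
    """
    before = conn.total_changes
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or garbled line
        if not isinstance(tweet, dict) or "id_str" not in tweet or "text" not in tweet:
            continue  # e.g. a delete notice rather than a tweet
        user = tweet.get("user") or {}
        # The primary key plus OR IGNORE means a tweet seen twice is stored once.
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)",
            (tweet["id_str"], user.get("screen_name"),
             tweet.get("created_at"), tweet["text"], line),
        )
    conn.commit()
    return conn.total_changes - before


def find_tweets(conn, keyword):
    """Return (tweet_id, screen_name, text) rows whose text contains keyword."""
    rows = conn.execute(
        "SELECT tweet_id, screen_name, text FROM tweets "
        "WHERE text LIKE ? ORDER BY rowid",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

In use, you would call `store_tweets(conn, stream_lines)` as lines arrive from the stream, and `find_tweets(conn, "election")` weeks or months later, long after the search API has stopped returning those tweets. Keeping the raw JSON alongside the extracted columns is a deliberate choice in this sketch: it lets you pull out fields you did not think you needed when collection started.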

The Twitter terms of service (TOS) allow you to store tweets for use on your own server, either for display or analysis, but there are strict limitations on reselling this data. You can sell it in discrete data sets as a file, such as a PDF or Excel file, but you cannot resell it as an API or real-time service. This means that if someone has already collected tweets that you need, you are forbidden from buying them as a continuous stream for display on your site. If you haven’t collected them yourself, you can’t have a real-time display of tweets on your site, even if you are willing to pay for them.

But what about Twitter’s data partners, Gnip and DataSift? These companies don’t publicize the limitation on their sites, but they are also forbidden by Twitter’s license from selling tweets for display on other sites. The tweets you buy from them may only be used for analysis, such as in a product like Radian6.

All of this means that once a client has built up a long-term database of tweets, they have a priceless resource. There is no price at which these tweets can be bought and sold for continuous display. That makes a tweet database an incredibly valuable resource, and it means that you have to start collecting tweets and saving them in advance. There is no going back for them.

Once clients understand this, they suddenly become very acquisitive. They can collect all the tweets about politicians, celebrities, athletes, TV shows, etc., and have an ironclad barrier to entry against any competitor coming along later. That is a valuable selling tool for any Twitter consultant who can do this type of database programming. My free, open-source library is a good starting point for this type of coding.
