Adam Green
Twitter API Consultant
140dev@gmail.com
781-879-2960
@140dev

Delivering the new RSS feed

We are finally ready to put the new Twitter RSS feed together based on the tweets we’ve collected in the database. Let’s review what Twitter normally delivers, and then we can try to improve it. The standard feed for @BarackObama is delivered with this URL: https://api.twitter.com/1/statuses/user_timeline.atom?screen_name=barackobama

And here is what it looks like in Google Reader:

Try loading this URL in your browser and viewing the source to see how the feed is structured. You can see that there is a header element that identifies the feed, and then a series of entry elements for each tweet. We need to create the same structure, and then we can populate it with our own version of the tweet data. I’m sure there are some fancy PHP libraries for generating the XML data we need for this type of output, but I have a much simpler model. I create a set of simple text files that act as templates. Then I insert macros where the data goes. The macros are just field names with brackets, as in [tweet_text]. My PHP code can then read the template and use str_replace() to substitute data for each macro. This makes sense when you see it in action.

I have a template for the header section that has macros for the data, and an [entries] macro that will be used to add a complete list of entry elements.

feed_header.txt

1
2
3
4
5
6
7
8
9
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:twitter="http://api.twitter.com" xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US" xmlns:georss="http://www.georss.org/georss">
  <title>140dev.com version of [screen_name]'s Twitter feed</title>
  <link type="text/html" rel="alternate" href="http://twitter.com/[screen_name]"/>
  <link type="application/atom+xml" rel="self" href="http://140dev.com/tutorials/twitter_rss_feed/rss_feed.php"/>
  <updated>[created_at]</updated>
  <subtitle>Twitter updates from [name] / [screen_name].</subtitle>
[entries]
</feed>

I also created a template for the entry elements.

feed_entry.txt

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
    <entry>
	<title>[title]</title>
      <content type="html"><img src="[image]"/> [tweet_text]</content>
      <id>tag:twitter.com,2007:http://twitter.com/[screen_name]/statuses/[tweet_id]</id>
      <published>[created_at]</published>
      <updated>[created_at]</updated>
      <link type="text/html" rel="alternate" href="http://twitter.com/[screen_name]/statuses/[tweet_id]"/>
      <link type="image/jpeg" href="[image]" rel="image"/>
      <author>
        <name>[name]</name>
        <uri>[url]</uri>
      </author>
      <twitter:source>web</twitter:source>
      <twitter:place/>
    </entry>

Now we can pull everything together and generate a custom feed.

rss_feed.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
<?php 
// rss_feed.php

// This user data is hardcoded for this tutorial
// You can also collect this in a database and use that version
$screen_name = 'BarackObama';
$name = 'Barack Obama';
$url = 'http://www.barackobama.com';
$profile_image_url = 'http://a0.twimg.com/profile_images/1735360254/icon_normal.jpg';

// Load the template for the feed header 
$feed_header = file_get_contents('feed_header.txt');

// Fill in the proper values
$feed_header = str_replace('[screen_name]',$screen_name,$feed_header);
$feed_header = str_replace('[created_at]',date('c'),$feed_header);
$feed_header = str_replace('[name]',$name,$feed_header);

// Start the final output string with the filled in header
$feed_output = $feed_header;

// Get the template for a feed entry
$feed_entry_template = file_get_contents('feed_entry.txt');

// Accumulate all the entries in this string
$feed_entries = '';

require_once('db_lib.php');
$oDB = new db;

// Get the most recent 20 tweets for this user
$query = "SELECT tweet_id, created_at, tweet_text
	FROM rss_feed
	WHERE screen_name = '$screen_name'
	ORDER BY created_at DESC
	LIMIT 20";
$result = $oDB->select($query);
while ($row=mysqli_fetch_assoc($result)) {
	$tweet_id = $row['tweet_id'];
	$created_at = $row['created_at'];	
	$tweet_text = $row['tweet_text'];
	
	// Get a fresh copy of the entry template
	$feed_entry = $feed_entry_template;
	
	// Fill in all the values
	$feed_entry = str_replace('[tweet_id]',$tweet_id,$feed_entry);
	$feed_entry = str_replace('[created_at]',date('c',strtotime($created_at)),$feed_entry);	
	$feed_entry = str_replace('[tweet_text]',$tweet_text,$feed_entry);	
	$feed_entry = str_replace('[screen_name]',$screen_name,$feed_entry);
	$feed_entry = str_replace('[name]',$name,$feed_entry);
	$feed_entry = str_replace('[image]',$profile_image_url,$feed_entry);	
	$feed_entry = str_replace('[title]',$name . ": " . date('F, d',strtotime($created_at)),$feed_entry);
	$feed_entry = str_replace('[url]',$url,$feed_entry);
	
	// Add this to the accumlated entry string
	$feed_entries .= $feed_entry . "\n";		
}

// Insert the string of entries into the header template
$feed_output = str_replace('[entries]',$feed_entries,$feed_output);

// Return the feed
print $feed_output;

?>

You can run this script from your browser to see the complete XML of the feed: http://140dev.com/tutorials/twitter_rss_feed/rss_feed_query.php

Your browser will probably show only the data, and not the XML. In that case, you can view the page source to see the actual XML. You can also subscribe to this new feed URL with something like Google Reader.

A major limitation of the Twitter version of the RSS feed is that it only delivers the most recent tweets. It would be much more useful if you could supply a search query, start date, and end date for the tweets you want to deliver. We can finish this tutorial with another version of the rss_feed_php script that can accept these values as URL arguments. Here is the code for this new version call rss_feed_query.php

rss_feed_query.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
<?php 
// rss_feed_query.php

$screen_name = 'BarackObama';
$name = 'Barack Obama';
$url = 'http://www.barackobama.com';
$profile_image_url = 'http://a0.twimg.com/profile_images/1735360254/icon_normal.jpg';

$feed_header = file_get_contents('feed_header.txt');
$feed_header = str_replace('[screen_name]',$screen_name,$feed_header);
$feed_header = str_replace('[created_at]',date('c'),$feed_header);
$feed_header = str_replace('[name]',$name,$feed_header);
$feed_output = $feed_header;

$feed_entry_template = file_get_contents('feed_entry.txt');
$feed_entries = '';

require_once('db_lib.php');
$oDB = new db;

// Get the starting date, ending date, and query arguments sent in the URL
// Add any arguments to the SQL query
$start_date = '';
$end_date = '';
$query = '';
$where = '';
if (isset($_GET['start_date'])) {
	// All values passed from outside must be escaped
	// to prevent against SQL injection attacks
	$start_date = $oDB->escape($_GET['start_date']);
	$where .= " AND created_at >= '$start_date' ";
}
if (isset($_GET['end_date'])) {
	$end_date = $oDB->escape($_GET['end_date']);
	$where .= " AND created_at <= '$end_date' ";
}
if (isset($_GET['query'])) {
	$query = $oDB->escape($_GET['query']);
	$where .= " AND tweet_text LIKE '%$query%' ";
}

$query = "SELECT tweet_id, created_at, tweet_text
	FROM rss_feed
	WHERE screen_name = '$screen_name' $where
	ORDER BY created_at DESC
	LIMIT 20";
$result = $oDB->select($query);
while ($row=mysqli_fetch_assoc($result)) {
	$tweet_id = $row['tweet_id'];
	$created_at = $row['created_at'];	
	$tweet_text = $row['tweet_text'];
	$feed_entry = $feed_entry_template;
	$feed_entry = str_replace('[tweet_id]',$tweet_id,$feed_entry);
	$feed_entry = str_replace('[created_at]',date('c',strtotime($created_at)),$feed_entry);	
	$feed_entry = str_replace('[tweet_text]',$tweet_text,$feed_entry);	
	$feed_entry = str_replace('[screen_name]',$screen_name,$feed_entry);
	$feed_entry = str_replace('[name]',$name,$feed_entry);
	$feed_entry = str_replace('[image]',$profile_image_url,$feed_entry);	
	$feed_entry = str_replace('[title]',$name . ": " . date('F, d',strtotime($created_at)),$feed_entry);
	$feed_entry = str_replace('[url]',$url,$feed_entry);
	$feed_entries .= $feed_entry . "\n";		
}
$feed_output = str_replace('[entries]',$feed_entries,$feed_output);

print $feed_output;

?>

To run this new script, you can add query string arguments to the URL. Each argument is optional and the order of the arguments does not matter, so any combination is acceptable. The only rule is that must be followed is that the dates should be in ‘YYYY-MM-DD’ format. Here are some examples. Again, your browser may require you to view the source of the resulting feed to actually see the XML.

http://140dev.com/tutorials/twitter_rss_feed/rss_feed_query.php?query=michele

http://140dev.com/tutorials/twitter_rss_feed/rss_feed_query.php?query=romney&start_date=2012-5-1

http://140dev.com/tutorials/twitter_rss_feed/rss_feed_query.php?query=romney&start_date=2012-3-1&end_date=2012-4-1

I know this has been a long tutorial, but hopefully you have learned a lot of useful techniques. I think we have satisfied all the original goals. We now have a Twitter RSS feed that can be called as many times as you want from your server with no rate limits. You can completely customize the appearance and data. Since this script generates XML, you can easily add the results to your web page to make it visible for Google SEO. In fact, you could always modify it to deliver HTML and embed that in your pages instead. And best of all, you collect and deliver all of the tweets in an account, not just the most recent 20.

Collecting all new and old tweets for any user

In the last installment of this tutorial you learned how to collect the most recent 100 tweets from a user’s timeline. We are almost ready to start delivering the RSS feed, but first I want to show you how to gather new tweets for the user in the most efficient manner. You could just run the collect_tweets.php script from Part 3 every hour or so with a cronjob, but that is not the best solution. You are forcing the Twitter API to deliver 100 tweets each time when there may be a lot fewer new tweets since you last asked. You also limit yourself to getting just a fixed number of tweets each time. If this account has been very busy, you may miss some new tweets.

The statuses/user_timeline API call has a since_id argument that lets you ask for just the new tweets for a user since you last collected them. If you use that in a loop, you can be guaranteed of getting all the new tweets, while only asking for a small number each time. Remember that the fewer tweets you ask for on each call, the less chance of an API error when Twitter is busy.

The since_id argument uses the tweet_id of the most recent (newest) tweet you already have, and returns any tweets that have a higher tweet_id (newer). It helps to look at actual tweet data. Here are the most recent tweets I have in MySQL after collecting tweets for @BarackObama while writing the last installment.

You can see that the tweet with the latest date (newest) has a tweet id of 202556034546532353. If I use the argument of since_id=202556034546532353 with the statuses_user_timeline call, Twitter will return tweets with higher ids. The next script, collect_new.php, uses this method in a loop to keep asking for newer tweets until there are no more returned. It basically starts with the newest tweet (highest tweet_id) in the database and moves forward in time. You can run this script from the SSH or Telnet command line on your server.

collect_new.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
<?php 
// collect_new.php
	
require_once('db_lib.php');
$oDB = new db;

// Get the tweet_id of the most recent tweet
$query = "SELECT max(tweet_id) as max
	FROM rss_feed";
$result = $oDB->select($query);
$row = mysqli_fetch_assoc($result);
$newest_tweet = $row['max'];

while (true) {
	// keep asking for newer tweets until there are no more available
	$newest_tweet = get_tweets($newest_tweet);
}
	
function get_tweets($newest_tweet) {
	global $oDB;
	
	// The since_id argument accepts a tweet_id and returns 
	// tweets with higher ids
	// We can ask for fewer tweets each time with a low count to 
	// help prevent timeout errors with the API
	$url = "https://api.twitter.com/1/statuses/user_timeline.json?" .
		"since_id=$newest_tweet&trim_user=1&include_rts=1&count=20" .
		"&screen_name=barackobama";
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	$response = curl_exec($ch);
	$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	curl_close($ch);
	
	if ($http_code != 200) {
		exit("Error code: $http_code");
	}
	
	$tweet_count = 0;
	$tweets = json_decode($response,true);
	foreach($tweets as $tweet) {
	    $tweet_id = $tweet['id_str'];
		
		// keep track of the highest tweet_id we receive
		if ($tweet_id > $newest_tweet) {
			$newest_tweet = $tweet_id;
		}
	       
	    if (! $oDB->in_table('rss_feed',"tweet_id=$tweet_id")) {
	    	$created_at = $oDB->date($tweet['created_at']);
	    	$tweet_text = $oDB->escape($tweet['text']);
            $field_values = "tweet_id=$tweet_id,created_at='$created_at',
                 tweet_text='$tweet_text',screen_name='BarackObama'";
            $oDB->insert("rss_feed",$field_values);
            ++$tweet_count;
	    }
	}
	if ($tweet_count==0) {
		// Stop when no more tweets are returned
		exit;
	} else {
		// Pass back the highest tweet_id for the next cycle
		return $newest_tweet;
	}
}

?>

This gave me another 25 tweets starting with ones that were newer than those already in the database. I would typically create a cronjob that runs this script once an hour. That should use only a single API call. If you find that a user tweets more than 20 times an hour, you may want to increase the count value accordingly.

Now let’s try going in the other direction and get the older tweets from @BarackObama. Twitter will let you go back in time and collect up to 3,200 tweets from any user. This is often a fascinating exercise, especially for politicians. The argument for older tweets is called max_id. This doesn’t really make sense, since you give it the tweet_id for the oldest tweet you have, which actually has the lowest id number. It really is the minimum id. Confused? Try reading Twitter’s explanation of this. If this makes sense to you, you may have a future working for the IRS writing tax forms.

Again, I find the best approach is to just look at the actual data. When I collected the first 100 tweets for this account, the last one (oldest) had a tweet_id of 199922116576022530. If I use the argument of max_id=199922116576022530 with statuses/user_timeline, I will get back any tweet with a lower id (older).

The collect_old.php script is pretty much the same as collect_new.php, except that it looks back in time with max_id. It will stop when you have all the tweets found in an account, or you get the most recent 3,200 tweets.

collect_old.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
<?php 
// collect_old.php
	
require_once('db_lib.php');
$oDB = new db;
$query = "SELECT min(tweet_id) as min
	FROM rss_feed";
$result = $oDB->select($query);
$row = mysqli_fetch_assoc($result);
$oldest_tweet = $row['min'];

// Loop backwards in time getting older tweets
while (true) {
	$oldest_tweet = get_tweets($oldest_tweet);
}
	
function get_tweets($oldest_tweet) {
	global $oDB;
	
	// The max_id argument will give me any tweets with a lower tweet_id
	// I use a count of 100, because I know there are at 
	// least that many older tweets
	$url = "https://api.twitter.com/1/statuses/user_timeline.json?" .
		"max_id=$oldest_tweet&trim_user=1&include_rts=1&count=100" .
		"&screen_name=barackobama";
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	$response = curl_exec($ch);
	$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	curl_close($ch);
	
	if ($http_code != 200) {
		exit("Error code: $http_code");
	}
	
	$tweet_count = 0;
	$tweets = json_decode($response,true);
	foreach($tweets as $tweet) {
	    $tweet_id = $tweet['id_str'];
		if ($tweet_id < $oldest_tweet) {
			$oldest_tweet = $tweet_id;
		}
	       
	    if (! $oDB->in_table('rss_feed',"tweet_id=$tweet_id")) {
	    	$created_at = $oDB->date($tweet['created_at']);
	    	$tweet_text = $oDB->escape($tweet['text']);
            $field_values = "tweet_id=$tweet_id,created_at='$created_at',
                tweet_text='$tweet_text',screen_name='BarackObama'";
            $oDB->insert("rss_feed",$field_values);
            ++$tweet_count;
	    }
	}
	if ($tweet_count==0) {
		exit;
	} else {
		return $oldest_tweet;
	}
}

?>

I now have 3,200 tweets from @BarackObama, going back to June, 2010. You only have to run this script once when you start the data collection for a user. Pretty cool stuff here. I especially like the one where he retweeted then Russian President Medvedev’s first tweet. The tweet eulogy of Robert Byrd is also interesting, considering Byrd’s KKK background.

Now we finally have all the tools completed for the database part of this RSS feed system. Tomorrow we’ll see how to create a customized RSS feed from it. For bonus points we will also add the ability to specify the start and ending dates for the tweets, and keywords that may be desired within the tweets.

Update: @jbertrand tweeted the great point that this is also a way to make a backup of your Twitter account. I should have mentioned that.

Inserting a user timeline into the database

Now that we have the database table ready, we have to loop through all the tweets returned by the statuses/user_timeline Twitter API call, and insert them. The database programming will be done with my db_lib.php library. This is a simple PHP class that I use for all of my Twitter API coding. You can review the blog post for it, if you want to see the source code, but you’ll find that using it is very simple.

In part 2 of this tutorial I used the file_get_contents() function to call the Twitter API. That is fine if you only need the data response. It is better practice to also check the HTTP code returned by the API, and for that I use the PHP cURL functions. They let you collect both the header fields, like HTTP code, and the response data. The API returns many possible codes, sometimes in a seemingly random fashion, but the one thing that is certain is that an HTTP code of 200 means that the API call worked.

Another change in this version of calling the statuses/user_timeline API is that I included the count argument to get more than the default 20 tweets. In theory, the count can be as high as 200, but I find that Twitter often times out with an error when I ask for that many. 100 is usually safe, but when Twitter is feeling really stressed, you may have to cut this down to 50. There is a trade-off here with the rate limit. You can either make fewer calls with a higher count, and save on your rate limit, or make more calls and be more sure of them succeeding. I find 100 at a time is a good middle ground.

I also use the include_rts=1 argument on line 6. Without it, this API call will not return any retweets by the requested user. One oddity is that if you don’t ask for retweets with this argument, the API will return an amount of tweets equal to the requested count minus the retweets. For example, if the most recent 100 tweets include 10 retweets, asking for a count of 100 without include_rts=1 will only return 90 tweets.

collect_tweets.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
<?php 
// collect_tweets.php

// Get the most recent 100 tweets, including retweets, by this account
$url = "https://api.twitter.com/1/statuses/user_timeline.json?" .
	"trim_user=1&include_rts=1&count=100&screen_name=barackobama";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Get the JSON data from the Twitter API
$response = curl_exec($ch);

// Get the HTTP code returned by the API
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Any code other than 200 means there was an error when calling the API
if ($http_code != 200) {
    exit("Error code: $http_code");
}

// Connect to the database with the rss_feed table
require_once('db_lib.php');
$oDB = new db;

$tweet_count = 0;

// Convert the JSON result into an array
$tweets = json_decode($response,true);

// Loop through the array of tweets
foreach($tweets as $tweet) {
    $tweet_id = $tweet['id_str'];
       
    // Make sure this tweet is not already in the rss_feed table
    if (! $oDB->in_table('rss_feed',"tweet_id=$tweet_id")) {
    	$created_at = $oDB->date($tweet['created_at']);
    	
        // All text values added to the database have to be escaped
        // to hide characters like single and double quotes
        $tweet_text = $oDB->escape($tweet['text']);
        $field_values = "tweet_id=$tweet_id,created_at='$created_at',
            tweet_text='$tweet_text',screen_name='BarackObama'";
			
        // Add this tweet to the rss_feed table
        $oDB->insert("rss_feed",$field_values);
        ++$tweet_count;
    }
}
print 'Tweets inserted: ' . $tweet_count;

?>

You can run this script from the SSH or Telnet command line on your server.
# php collect_tweets.php
Tweets inserted: 100

The inserted tweets can then be viewed in the MySQL database. I use PHPMyAdmin for this type of database query.

Now you can gather a starting set of tweets for use in your RSS feed. In the next part of this series we’ll learn how to collect new tweets on a regular interval with a cronjob, and also how to go back in time and collect up to 3,200 of the past tweets by any account. See you tomorrow.

There seems to be a good amount of interest in the new set of tutorials I’ve started writing, and most of the code I produce interacts with a MySQL database, so I’m going to post the code for my standard database library here. This makes it easy for me to link to this post multiple times, rather than include the source of this library in multiple posts. This is a simplified version of the library included in the 140dev Framework.

The login info for this library is kept in a separate script called db_config.php. For the sample code shown on this blog this configuration file will be kept in the same directory as the db_lib.php script. Security minded programmers will probably want to keep this in a different location on their server, preferably outside the web accessible directories.

db_config.php

1
2
3
4
5
6
7
<?php
// db_config.php
$db_host = 'localhost';
$db_user = 'ENTER USER NAME HERE';
$db_password = 'ENTER USER PASSWORD HERE'; 
$db_name = 'ENTER DATABASE NAME HERE'; 
?> 

The actual library code is written as a PHP class. This allows the code to open a MySQL connection once, and then keep it open for the entire time the script using the library is running. The library contains simple functions for preparing data for insertion, running any SQL query, checking to see if a value already exists in a table, and table insertion and update functions.

db_lib.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
<?php
// db_lib.php

class db
{
  public $dbh;

  // Create a database connection for use by all functions in this class
  function __construct() {

    require_once('db_config.php');
    
    if($this->dbh = mysqli_connect($db_host, 
      $db_user, $db_password, $db_name)) { 
	} else {
	  exit('Unable to connect to DB');
    }
	// Set every possible option to utf-8
    mysqli_query($this->dbh, 'SET NAMES "utf8"');
    mysqli_query($this->dbh, 'SET CHARACTER SET "utf8"');
    mysqli_query($this->dbh, 'SET character_set_results = "utf8",' .
        'character_set_client = "utf8", character_set_connection = "utf8",' .
        'character_set_database = "utf8", character_set_server = "utf8"');
  }
  
  // Create a standard data format for insertion of PHP dates into MySQL
  public function date($php_date) {
    return date('Y-m-d H:i:s', strtotime($php_date));	
  }
  
  // All text added to the DB should be cleaned with mysqli_real_escape_string
  // to block attempted SQL insertion exploits
  public function escape($str) {
    return mysqli_real_escape_string($this->dbh,$str);
  }
    
  // Test to see if a specific field value is already in the DB
  // Return false if no, true if yes
  public function in_table($table,$where) {
    $query = 'SELECT * FROM ' . $table . 
      ' WHERE ' . $where;
    $result = mysqli_query($this->dbh,$query);
    return mysqli_num_rows($result) > 0;
  }

  // Perform a generic select and return a pointer to the result
  public function select($query) {
    $result = mysqli_query( $this->dbh, $query );
    return $result;
  }
    
  // Add a row to any table
  public function insert($table,$field_values) {
    $query = 'INSERT INTO ' . $table . ' SET ' . $field_values;
    mysqli_query($this->dbh,$query);
  }
  
  // Update any row that matches a WHERE clause
  public function update($table,$field_values,$where) {
    $query = 'UPDATE ' . $table . ' SET ' . $field_values . 
      ' WHERE ' . $where;
    mysqli_query($this->dbh,$query);
  } 
 
}  
?>

There will be practical examples of using this library throughout the tutorials coming up in the blog. Just to show the simplest example possible, here is a script that makes a database connection and then runs a “SHOW TABLES” MySQL query.

db_lib_demo.php

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
<?php 
// db_lib_demo.php

// Create a database connection
require_once('db_lib.php');
$oDB = new db;

// Run a MySQL query
$query = "SHOW TABLES";
$result = $oDB->select($query);

// Retrieve the first row of results as an array
$row = mysqli_fetch_assoc($result);
print_r($row);

?>

This script can be run from the command line of an SSH or Telnet client, however you normally connect to your server.

# php db_lib_demo.php

1
2
3
4
 Array
(
    [Tables_in_140dev_tutorials] => rss_feed
)

Storing tweets for the RSS feed in a database cache

The basic architecture I use for all interactions with the Twitter API is to store the results in a database and then deliver any data my scripts need from that database. I’ve written about the benefits of a database cache before, but the basic idea is that this eliminates the rate limit issues, because a single call to the Twitter API can return data that is then delivered any number of times by my server. A benefit that people don’t often think about is that once the data is on your server, you can keep delivering it even when Twitter is down. I like to promise my clients that their sites will be more reliable then Twitter.com, and usually a lot faster too.

Let’s take a closer look at a single tweet returned by the statuses/user_timeline API call. The API allows you to get data in multiple formats, but JSON is the most flexible and concise. This simple script will display the first tweet from the JSON data as an array:

first_tweet.php

1
2
3
4
5
6
7
8
9
10
<?php
// first_tweet.php

$url = "https://api.twitter.com/1/statuses/user_timeline.json?" . 
	"screen_name=barackobama";
$timeline_data = file_get_contents($url);
$timeline_array = json_decode($timeline_data,true);
print_r($timeline_array[0]);

?>

The output is kind of messy, so here is a cleaned up copy. You can see that along with the tweet you get a full copy of the user profile for the tweet’s author.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
Array ( [created_at] => Mon May 14 17:19:39 +0000 2012 
	[id] => 202085779449724929 
	[id_str] => 202085779449724929 
	[text] => Watch live: President Obama delivers the commencement address at Barnard College in New York. http://t.co/mm4eGrZJ 
	[source] => web 
	[truncated] => 
	[in_reply_to_status_id] => 
	[in_reply_to_status_id_str] => 
	[in_reply_to_user_id] => 
	[in_reply_to_user_id_str] => 
	[in_reply_to_screen_name] => 
	[user] => Array ( [id] => 813286 
		[id_str] => 813286 
		[name] => Barack Obama 
		[screen_name] => BarackObama 
		[location] => Washington, DC 
		[description] => This account is run by #Obama2012 campaign staff. Tweets from the President are signed -bo. 
		[url] => http://www.barackobama.com 
		[protected] => 
		[followers_count] => 15403374 
		[friends_count] => 678280 
		[listed_count] => 169240 
		[created_at] => Mon Mar 05 22:08:25 +0000 2007 
		[favourites_count] => 0 
		[utc_offset] => -18000 
		[time_zone] => Eastern Time (US & Canada) 
		[geo_enabled] => 
		[verified] => 1 
		[statuses_count] => 3915 
		[lang] => en 
		[contributors_enabled] => 1 
		[is_translator] => 
		[profile_background_color] => 77B0DC 
		[profile_background_image_url] => http://a0.twimg.com/profile_background_images/536612853/4.26.12_New_Twitter_Background.jpg 
		[profile_background_image_url_https] => https://si0.twimg.com/profile_background_images/536612853/4.26.12_New_Twitter_Background.jpg 
		[profile_background_tile] => 
		[profile_image_url] => http://a0.twimg.com/profile_images/1735360254/icon_normal.jpg 
		[profile_image_url_https] => https://si0.twimg.com/profile_images/1735360254/icon_normal.jpg 
		[profile_banner_url] => https://si0.twimg.com/brand_banners/BarackObama/1337005387/live 
		[profile_link_color] => 2574AD 
		[profile_sidebar_border_color] => C2E0F6 
		[profile_sidebar_fill_color] => C2E0F6 
		[profile_text_color] => 333333 
		[profile_use_background_image] => 1 
		[show_all_inline_media] => 
		[default_profile] => 
		[default_profile_image] => 
		[following] => 
		[follow_request_sent] => 
		[notifications] => ) 
	[geo] => 
	[coordinates] => 
	[place] => 
	[contributors] => 
	[retweet_count] => 98 
	[favorited] => 
	[retweeted] => [
	possibly_sensitive] => )

The user info is useful when you get a set of tweets from multiple users, but in the case of a timeline, they all come from the same account. We can greatly reduce the bulk of this API response by adding a trim_user argument to the API call (line 5). Try running this to see the difference in the data returned:

first_tweet_trim.php

1
2
3
4
5
6
7
8
9
10
<?php
// first_tweet_trim.php

$url = "https://api.twitter.com/1/statuses/user_timeline.json?" .
	"trim_user=1&screen_name=barackobama";
$timeline_data = file_get_contents($url);
$timeline_array = json_decode($timeline_data,true);
print_r($timeline_array[0]);

?>

Now that we can see the data we have available, we can create a MySQL table called ‘rss_feed’ to store the results. To save space we’ll just have fields for the data we need to deliver in the feed. Here is the SQL for this. I included a screen_name field in case we want to collect tweets from multiple users.

1
2
3
4
5
6
7
8
9
CREATE TABLE IF NOT EXISTS `rss_feed` (
  `created_at` datetime NOT NULL,
  `tweet_id` bigint(20) NOT NULL,
  `tweet_text` varchar(140) NOT NULL,
  `screen_name` varchar(20) NOT NULL,
  KEY `created_at` (`created_at`),
  KEY `tweet_id` (`tweet_id`),
  KEY `screen_name` (`screen_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In tomorrow’s post we’ll see how to insert all the tweets from the statuses/user_timeline API call into this database table.

Someone recently asked the Twitter developer forum how to find the RSS feed for a Twitter account. This has gotten increasingly difficult as Twitter tightens up access to the API through OAuth and rate limits, so I thought I’d check out the latest status on this issue. Sure enough, there are no links to an RSS feed on the Twitter.com pages. Not even a link rel tag buried in the source of the HTML.

A number of blog posts point to the Twitter API as the solution, and suggest this URL format:
https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=[username]

For example, you can view an RSS feed for @BarackObama with this type of URL. This looks like a legal RSS feed, but when I tried it with Google Reader, it was rejected as unreadable. I was able to overcome this problem by changing the format to Atom:
https://api.twitter.com/1/statuses/user_timeline.atom?screen_name=barackobama

This works, as the following screen shot shows, but it a remarkably stupid feed. The text of the tweet shows up 3 times in a row:

Another problem with this method is the issue of rate limits. Everything you do with the Twitter API needs to take rate limits into account. The docs for the statuses/user_timeline API call say that it is rate limited, and that you can call this 150 times per hour , if you don’t use OAuth authentication. Even with OAuth you can only call this 350 times an hour.

If you want to just read the feed in a feed reader, that may not seem like a problem, but things get more complicated when you want to follow multiple feeds. The rate limit is applied to the IP address of the server making the request. So Google Reader can request all Twitter feeds up to 150 times an hour. You may not be following that many feeds, but this rate limit is applied to the entire server IP, not just you. Google has many, many servers, and who knows how the requests are distributed among them. In general, I think you’ll be safe with this technique, but it is still a damn ugly feed.

Where things really get tricky is when you want to use the RSS feed on your own web pages or blog. Then you have two choices. You can run your own code on your server in a language like PHP that collects the feed every time the page loads and adds it to the page as HTML, or you can use a widget to do that for you in a WordPress blog. In this case, it is your server’s IP that is incurring the rate limit. So unless your site only delivers 150 pages an hour with the contents of any Twitter feed, you are going to get rate limited, and no data will be displayed on your pages.

Another possible solution is to use a Javascript based RSS widget, or write your own JS code to call the Twitter API and display the results. This solves the rate limit issue, because Twitter applies the limit to the IP of the user viewing the page. This means that a user can view up to 150 pages on your site an hour that have Twitter feeds. That should be more than enough. But this method has a serious drawback. If you are hoping for any SEO benefit from the content of the feed being on your pages, you are out of luck. Google can’t read the results of Javascript, so the RSS feed is invisible to it.

The final limitation is that the Twitter API call for statuses/user_timeline has no way to use a search query to filter the results.

Putting all this together, there are a number of ways we can improve on just calling the Twitter API for an account’s RSS feed:

  • Improving the output data, so you don’t just have the same tweet text as the title and contents of each feed item.
  • Overcoming the rate limits, so you can deliver this feed as frequently as you want.
  • Being able to inject the contents of the feed into your pages as HTML, so Google will improve your SEO ranking.
  • Having a way to limit the tweets in the feed to just those that match a search query.

The next few blog posts will address all of these issues. By the time we are done, you will have a singing, dancing RSS feed that is totally under your control. Come back tomorrow for part 2.

Cool new Twitter widget

April 8, 2012

We’ve been working a new site called TwitChimp for a client, and while it’s still in beta, the widget we just finished for it is worth trying out. TwitChimp lets you create lists of up to 100 Twitter accounts and share them either on your own site with a widget, such as the one below, [...]

Read the full article →

Initial impressions of Rackspace’s cloud servers

February 20, 2012

I have a bad habit acquired from my years as a Dot Com CTO. When the time comes to pick a server for a new project, I always overbuy. I’d rather pay a hundred dollars more per month then have a server that can’t take the load. One of the driving forces behind this decision [...]

Read the full article →

Dealing with tweet bursts

January 27, 2012

This week we got crushed by the State of the Union speech. We normally get about 30,000 to 50,000 tweets per day in the 2012twit.com database, and our largest server can handle that without any showing any appreciable load. During the SOTU tweet volume exploded. We got 500,000 tweets in about 4 hours. I was [...]

Read the full article →

Twitter API Tutorial: Identifying the best Twitter accounts for following

January 23, 2012

One of the account building services we perform for clients is building lists of suggested followers. Unless you are a celebrity, the only way to build a large follower list is to follow others. Twitter offers suggestions for follows, but you aren’t given any control over the criteria for selection. I recently built a list [...]

Read the full article →

Secrets of a donated tweet system

January 22, 2012

One of the coolest clients we have now is the Buddy Roemer for President campaign. Buddy may not be the best known candidate, but he does have some unique campaign issues. He’s fighting money in politics, and as part of that mission he is limiting himself to a max of $100 in donations per person. [...]

Read the full article →

Screening a tweet stream for quality control

January 20, 2012

We’ve been working on a college football recruiting site called DirectSnap.com for a couple of months, and the most interesting aspect of the technology behind this site is the quality control algorithm I had to develop. Most of the tweet streams we work on, such as 2012twit.com, are based on collecting tweets for either a [...]

Read the full article →