Free Source Code – Twitter Database Server: Install
Update Notice: The latest version (0.30) of this source code was released on January 23, 2014. If you need to upgrade an existing installation of this code, you can find change logs in each of the download files.
Before you begin this installation, you might want to read this blog post. It covers the most important rules for running this code.
1. Upload the files to a Web server
The first step is downloading the Twitter database server zip file to your computer and extracting the root
140dev directory. This contains the files for this core module and the directory structure needed for the entire 140dev Twitter framework. You then need to upload the
140dev directory to a publicly accessible Web server. This can be the server where your website is hosted, or any other server. The 140dev code is written with a decoupled architecture, so it can be on a separate server from your website. For now let’s assume you are using your website’s server.
The only requirements for this server are that the MySQL database must be running and configured for localhost access, and PHP must be installed. The instructions here are for *nix servers, but the code should have no problem running on a Windows server as well.
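Before uploading anything, it can help to confirm from a shell session that the tools are actually on the server. This is only a minimal sketch: check_tool is a helper name I made up for illustration, and it assumes the mysql command-line client is installed alongside the MySQL server, which is not true on every host.

```shell
#!/bin/sh
# Report whether a command-line tool is on the PATH.
# Prints "ok: <tool>" or "missing: <tool>" and returns 0 or 1 accordingly.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Typical use on the server:
#   check_tool php
#   check_tool mysql
```

A "missing" result for mysql only means the client program is not on your PATH. The database server itself may still be running and reachable through phpMyAdmin.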
When you upload the files, the top level directory for this code can be given any name, but for simplicity, we’ll assume that you will call it
140dev. Within the
140dev directory you will find a
db directory, which is where the code for this module is found. These installation instructions refer to files within this directory.
2. Create a MySQL database
This code needs a MySQL database to store all the tweet data. While it is possible to add tables to an existing database, it is best to create a new one. The owner of this database must be granted insert and update rights. You can give this database any name, but we will assume that you will call it
With the database in place you need to add the tables used by this code. The
CREATE TABLE commands for these tables are stored in the file
mysql_database_schema.sql in the unzipped files. There are several ways to execute these commands. The method I use is to run phpMyAdmin on the server, open the new database, and use the Import page to load the .sql file. Import wants to use a file on your local computer, so you can point it to the copy of
mysql_database_schema.sql that you extracted from the zip file.
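If you prefer a shell session to phpMyAdmin, the same setup can be sketched with the mysql command-line client. Everything specific here is a placeholder of mine rather than something this code requires: the database name tweets_db, the user tweets_user, the password, and the helper name build_setup_sql. The SELECT right is added on the assumption that the scripts also read from the tables. The instructions above only call for insert and update.

```shell
#!/bin/sh
# Print SQL that creates a database and grants a localhost user rights on it.
# Arguments: database name, user name, password (all placeholders).
build_setup_sql() {
    db=$1
    user=$2
    pass=$3
    cat <<EOF
CREATE DATABASE $db;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT SELECT, INSERT, UPDATE ON $db.* TO '$user'@'localhost';
EOF
}

# Typical use on the server, run as a MySQL account allowed to create databases:
#   build_setup_sql tweets_db tweets_user 'choose-a-password' | mysql -u root -p
# Then load the tables from the schema file in the unzipped download:
#   mysql -u root -p tweets_db < mysql_database_schema.sql
```

Printing the SQL first lets you read it before anything touches the server. Nothing runs until you pipe it into mysql yourself.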
3. Create a Twitter app with a set of OAuth tokens
Starting with API version 1.1, each streaming API connection will require its own set of OAuth tokens. For simplicity, you can use a single-user set of tokens. If you haven’t worked with OAuth before, this ebook will get you started creating a single-user set of tokens. If you have any problems making an OAuth connection, this list of common problems may be helpful.
You will end up with 4 OAuth tokens. In keeping with tradition, the names of the tokens will be different in the Dev.twitter.com display page and in the Phirehose library code. This table lists the names Twitter displays in the left column and the names you have to use in the define statements used in step 4 below:
consumer_key = TWITTER_CONSUMER_KEY
consumer_secret = TWITTER_CONSUMER_SECRET
access_token = OAUTH_TOKEN
access_token_secret = OAUTH_SECRET
4. Modify the config files
Before you can run this code you must enter the proper values in the config files. You can edit them locally and then upload them, or use an online editor:
- Open db_config.php and fill in the user name, password, and database name for the MySQL database you just created.
- Open 140dev_config.php and fill in the email address for
TWEET_ERROR_ADDRESS. This address will be used to email error messages to you.
- You must also fill in a set of OAuth tokens for use with the streaming API connection. These values need to be placed in the define statements for TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, OAUTH_TOKEN, and OAUTH_SECRET in 140dev_config.php.
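A quick way to catch a forgotten value before running anything is to scan the config file for the constant names used in this guide. This is a rough sketch, not part of the 140dev code: unfilled_constants is a name I invented, and the pattern assumes each setting is written in the usual define('NAME', 'value') form with a quoted value. If your copy of the file is laid out differently, the check will report false alarms.

```shell
#!/bin/sh
# Print each constant name that has no non-empty quoted value in the file.
# Assumes settings look like: define('NAME', 'value');
unfilled_constants() {
    for name in TWEET_ERROR_ADDRESS TWITTER_CONSUMER_KEY \
                TWITTER_CONSUMER_SECRET OAUTH_TOKEN OAUTH_SECRET; do
        # Match NAME, a closing quote, a comma, then a quoted non-empty value.
        grep -Eq "$name['\"][[:space:]]*,[[:space:]]*['\"][^'\"]+['\"]" "$1" \
            || echo "$name"
    done
}

# Typical use from the db directory:
#   unfilled_constants 140dev_config.php
```

No output means all five names have a value. It says nothing about whether the tokens themselves are valid.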
5. Test the installation
Included with this code is a simple test script, db_test.php. If this runs correctly from a browser, you will know that the database was created correctly, and the database configuration options are right.
6. Enter your keywords for tweet collection
This code collects tweets from the Twitter streaming API, which allows you to specify keywords for collection. Any tweet that contains the words will be delivered by the API in real-time. The Twitter docs say that you may track up to 400 keywords by default. If you need a greater level of access, you can apply to Twitter HQ. Each keyword can actually be a multi-word phrase, although the docs say nothing about the maximum length of each phrase.
The keywords for this code are stored in the get_tweets.php script, where they are passed to the Phirehose library. You need to modify the line near the bottom of the script that says
'recipe' with your own list of keywords and phrases. Each keyword and phrase must be entered within single quotes, and separated by commas, since these are elements of an array. For example, the keyword list could be expanded to
7. Run the tweet collection and parsing code in the background
As explained on the code architecture page, tweet collection is done in two steps. First tweets are captured from the Twitter streaming API with get_tweets.php, and then they are parsed into separate database tables by parse_tweets.php. Both of these scripts need to run as continuous background processes.
On *nix servers this is done by logging into the server with a telnet or SSH session, moving to the
/140dev/db directory, and executing these scripts with several added parts:
- To have a script run in the background, you end it with &. This starts the script and then returns with the script still running.
- & gets the script started in the background, but it doesn’t keep it running when you exit the telnet or SSH session. To keep the script running permanently, you precede it with the nohup command. This stands for no hang up, which means don’t stop this when I hang up my connection.
- nohup puts all of the script’s output into a file called nohup.out, which is useful for debugging, but can get pretty large if the script runs for a long time. I kill this output by sending it to /dev/null.
So the entire commands for running get_tweets.php and parse_tweets.php from the db directory are:
nohup php get_tweets.php > /dev/null &
nohup php parse_tweets.php > /dev/null &
Once you run these commands tweets should start flowing into the database. You can look at the tables with phpMyAdmin to see the data accumulating. Depending on the keywords you use for tweet collection, you could gather up to tens of thousands of new tweets a day. Luckily, tweets are small, so the data storage needed isn’t great, but be sure to check the total disk space used from time to time.
You can see if the scripts are running at a later time with the command
ps aux, which will display the active processes. This also gives you the data you need to stop one of these scripts. The process list shows the id for each process in the PID column, the second column from the left. You can use the id for get_tweets.php or parse_tweets.php with the kill command to stop it with
kill -9 [process id]. To be clear, if a script has an id of 1000, you would enter
kill -9 1000 in your telnet or SSH client to cancel the process.
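Picking the right id out of a long process list by eye is error-prone, so here is a small sketch that does the filtering for you. The helper name pids_for is my own invention. It reads ps aux output and prints the second column, which is where ps aux puts the process id, for every line that mentions the script name.

```shell
#!/bin/sh
# Read `ps aux` output on stdin and print the PID (second column) of every
# process whose command line mentions the given script name.
# Lines containing "awk" are skipped so the filter does not report itself.
pids_for() {
    awk -v script="$1" 'index($0, script) && !index($0, "awk") { print $2 }'
}

# Typical use on the server:
#   ps aux | pids_for get_tweets.php
#   ps aux | pids_for parse_tweets.php
# Then pass each printed id to kill -9 as described above.
```

If this prints more than one id for a script, you have duplicate copies running. Note that it matches any command line containing the name, so an editor with get_tweets.php open would show up too.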
Make sure that you always kill these scripts before trying to run them again. If you leave a copy running in memory, and then try to start the scripts again, Twitter will get very cranky, and when Twitter gets cranky bad things can happen.
What can go wrong?
There are quite a few moving parts in this code, but in my experience, when server code fails it is almost always a problem with directory paths and file permissions. To help see these types of errors, you can run get_tweets.php and parse_tweets.php with the error messages flowing into
nohup.out. To do that use these commands to start them and then view the contents of
nohup.out in the db directory:
nohup php ./get_tweets.php &
nohup php ./parse_tweets.php &
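Once nohup.out has been collecting output for a while, reading the whole file gets tedious. This sketch pulls out only the recent lines that look like trouble. The helper name recent_problems and the search words are my guesses at typical path and permission messages. They are not taken from the 140dev scripts, so adjust them to what you actually see in your file.

```shell
#!/bin/sh
# Print the lines among the last N of a log file that look like problems.
# Arguments: log file, optional line count N (defaults to 200).
# Returns non-zero when nothing matches, as grep does.
recent_problems() {
    tail -n "${2:-200}" "$1" \
        | grep -iE 'error|warning|permission denied|no such file'
}

# Typical use from the db directory:
#   recent_problems nohup.out
#   recent_problems nohup.out 1000
```

An empty result only means none of these words appeared in the last lines. It is still worth opening the file itself if tweets are not arriving.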
The other place to look for errors is the log file maintained by the code when making a MySQL database request. All of these errors are placed in
error_log.txt in the
Unfortunately, the Twitter streaming API does fail sometimes, usually early Saturday morning, just after Twitter HQ has done a big code push on Friday afternoon. If you have clients who get cranky when their apps fail, you can install the monitor_tweets.php script to email you if tweets stop flowing.
Reporting any problems
If you have any problems with this code that you can’t solve, you can report them on the 140dev Google group, or contact Adam Green directly at 781-879-2960.