Free Source Code – Twitter Database Server: MySQL Database Schema
Twitter Database Server
Download
Install
Code Architecture
MySQL Database Schema
Phirehose Library
140dev_config.php
db_config.php
db_lib.php
db_test.php
get_tweets.php
monitor_tweets.php
parse_tweets.php
Twitter Display
streaming_framework
As described in the code architecture, the tweets received from the Twitter streaming API are stored in the database in two steps. First the entire payload for each tweet is stored in the json_cache
table by get_tweets.php. No parsing is done at this point, instead all the data for the tweet is inserted into the raw_tweet
field. This ensures that tweets can be processed as quickly as possible. Then parse_tweets.php parses each new tweet and distributes the data across a series of tables: tweets
, tweet_mentions
, tweet_tags
, tweet_urls
, and users
.
I only extract a subset of the user data available in each tweet payload. Some fields, like the user’s account background image, aren’t useful for my programming, so I don’t extract them into the users
table. You can add these fields to the table and the parse_tweets.php code when you install the framework.
In a perfectly normalized schema the tweets
table would only contain the user_id
as a link to the rest of the user data in the users
table. My database design philosophy is to first normalize the data into separate tables, and then break that rule for the sake of performance. In this case, the most common query will be getting everything needed to display a tweet, which includes tweet and user data. To allow that query to be extracted from a single table, the screen_name
, name
, and profile_image_url
values for the tweet’s author are included in each tweet row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | -- -- mysql_database_schema.sql -- Schema for the 140dev Twitter framework MySQL database server -- CREATE TABLE IF NOT EXISTS `json_cache` ( `tweet_id` bigint(20) unsigned NOT NULL, `cache_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `cache_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `raw_tweet` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, PRIMARY KEY (`cache_id`), KEY `tweet_id` (`tweet_id`), KEY `cache_date` (`cache_date`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ; CREATE TABLE IF NOT EXISTS `tweets` ( `tweet_id` bigint(20) unsigned NOT NULL, `tweet_text` varchar(160) NOT NULL, `created_at` datetime NOT NULL, `geo_lat` decimal(10,5) DEFAULT NULL, `geo_long` decimal(10,5) DEFAULT NULL, `user_id` bigint(20) unsigned NOT NULL, `screen_name` char(20) NOT NULL, `name` varchar(20) DEFAULT NULL, `profile_image_url` varchar(200) DEFAULT NULL, `is_rt` tinyint(1) NOT NULL, PRIMARY KEY (`tweet_id`), KEY `created_at` (`created_at`), KEY `user_id` (`user_id`), KEY `screen_name` (`screen_name`), KEY `name` (`name`), FULLTEXT KEY `tweet_text` (`tweet_text`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; CREATE TABLE IF NOT EXISTS `tweet_mentions` ( `tweet_id` bigint(20) unsigned NOT NULL, `source_user_id` bigint(20) unsigned NOT NULL, `target_user_id` bigint(20) unsigned NOT NULL, KEY `tweet_id` (`tweet_id`), KEY `source` (`source_user_id`), KEY `target` (`target_user_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; CREATE TABLE IF NOT EXISTS `tweet_tags` ( `tweet_id` bigint(20) unsigned NOT NULL, `tag` varchar(100) NOT NULL, KEY `tweet_id` (`tweet_id`), KEY `tag` (`tag`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; CREATE TABLE IF NOT EXISTS `tweet_urls` ( `tweet_id` bigint(20) unsigned NOT NULL, `url` varchar(140) NOT NULL, KEY `tweet_id` (`tweet_id`), KEY `url` (`url`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; CREATE TABLE IF NOT EXISTS `users` ( `user_id` bigint(20) unsigned NOT NULL, `screen_name` varchar(20) NOT NULL, `name` varchar(20) DEFAULT NULL, `profile_image_url` varchar(200) DEFAULT NULL, `location` varchar(30) DEFAULT NULL, `url` varchar(200) DEFAULT NULL, `description` varchar(200) DEFAULT NULL, `created_at` datetime NOT NULL, `followers_count` int(10) unsigned DEFAULT NULL, `friends_count` int(10) unsigned DEFAULT NULL, `statuses_count` int(10) unsigned DEFAULT NULL, `time_zone` varchar(40) DEFAULT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`user_id`), KEY `user_name` (`name`), KEY `last_update` (`last_update`), KEY `screen_name` (`screen_name`), FULLTEXT KEY `description` (`description`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; |