Search API: Are search results filtered for user quality?

by Adam Green on November 29, 2013

A recurring question on the Twitter developer mailing list is why certain tweets, and even entire accounts, don’t show up in search results. The standard answer is that the Search API filters out tweets that don’t meet a minimum quality threshold. That makes a lot of sense, and should definitely be done, if it produces better results.

I decided to test the Search API using the quality metrics I would typically apply in my own code to filter out spam accounts: account age, number of followers, and number of tweets. I wrote the following search_quality.php script, and ran it against 100 different query terms. Each execution of the script collected up to 100 tweets and returned the minimum account values found for these quality criteria. What I found was very surprising. For most queries I was able to get back tweets from accounts that were as little as 1 day old, had zero followers, and had sent only 1 or 2 tweets.

The test script uses the tmhOAuth.php OAuth library, as I do in all my code. If you don’t already have a copy of this library, you can download it along with the search_quality.php script. You will need to fill in a set of OAuth tokens to make the API request. You also need to fill in your own query. Try different queries and see what you get.

search_quality.php

<?php
// Connect through OAuth
require('tmhOAuth.php');

// You must fill in a set of valid OAuth keys here
$connection = new tmhOAuth(array(
'consumer_key' => '*****',
'consumer_secret' => '*****',
'user_token' => '*****',
'user_secret' => '*****'
));

// Get up to 100 tweets
// You must fill in the query term
$query = '*****';
$connection->request('GET', $connection->url('1.1/search/tweets'),
array('q' => $query,
'result_type' => 'recent',
'count' => 100));

// Extract tweets (an error response has no statuses property)
$results = json_decode($connection->response['response']);
$tweets = isset($results->statuses) ? $results->statuses : array();

if (sizeof($tweets)==0) {
  print "No tweets found for: $query";
  exit;
}

// Loop through all tweets found
$tweets_found = 0;
$min_account_age = account_age($tweets[0]->user->created_at);
$min_followers_count = $tweets[0]->user->followers_count;
$min_statuses_count = $tweets[0]->user->statuses_count;
foreach($tweets as $tweet) {
  ++$tweets_found;

  if ($min_account_age > account_age($tweet->user->created_at)){
    $min_account_age = account_age($tweet->user->created_at);
  }
  if ($min_followers_count > $tweet->user->followers_count) {
    $min_followers_count = $tweet->user->followers_count;
  }
  if ($min_statuses_count > $tweet->user->statuses_count) {
    $min_statuses_count = $tweet->user->statuses_count;
  }
}

print "Tweets found: $tweets_found Minimum account age: $min_account_age " .
"Minimum followers: $min_followers_count Minimum tweets: $min_statuses_count";

// Return number of days since start date, rounded,
// plus 1 so the result is never zero
function account_age($start) {
  date_default_timezone_set('America/New_York');
  return round(abs(strtotime($start) - time())/86400) + 1;
}

?>

After you run this, tweet your results to me @140dev, and I’ll pass them along to the rest of the 140dev community.

Of course, if you write your own Search API code and collect the results, you can filter out tweets based on any quality control rules you want. This is one of the ways developers can add value to Twitter API results.
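As a rough illustration of that kind of filtering, here is a minimal sketch that keeps only tweets whose authors meet three thresholds. The function name passes_quality_filter, its parameters, the threshold values (30 days, 10 followers, 50 tweets), and the sample data are all assumptions made for this example; they are not part of search_quality.php or the Twitter API. The sample objects only mimic the user fields the test script reads.

```php
<?php
date_default_timezone_set('UTC');

// Hypothetical helper: true if the tweet's author meets all three thresholds.
// The thresholds are illustrative assumptions, not rules Twitter applies.
function passes_quality_filter($tweet, $min_age_days, $min_followers, $min_statuses, $now = null) {
  $now = ($now === null) ? time() : $now;
  $user = $tweet->user;
  // Whole days since the account was created
  $age_days = floor(($now - strtotime($user->created_at)) / 86400);
  return $age_days >= $min_age_days
    && $user->followers_count >= $min_followers
    && $user->statuses_count >= $min_statuses;
}

// Made-up sample data shaped like the statuses array from search/tweets
$sample = json_decode('[
  {"text": "first", "user": {"created_at": "Mon Nov 25 12:00:00 +0000 2013",
    "followers_count": 0, "statuses_count": 2}},
  {"text": "second", "user": {"created_at": "Tue Mar 10 09:30:00 +0000 2009",
    "followers_count": 350, "statuses_count": 4200}}
]');

// Fixed reference time so the example gives the same answer on every run
$now = strtotime('2013-11-29 00:00:00 UTC');

// Keep only tweets from accounts that pass every rule
$kept = array();
foreach ($sample as $tweet) {
  if (passes_quality_filter($tweet, 30, 10, 50, $now)) {
    $kept[] = $tweet;
  }
}

print count($kept) . " of " . count($sample) . " tweets kept\n";
```

With this sample data only the second tweet survives, since the first account is a few days old and has no followers. In real use you would pass in the statuses array from the search response and leave out the $now argument so the current time is used.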
