<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>140dev &#187; Twitter API Tutorials</title>
	<atom:link href="http://140dev.com/twitter-api-programming-blog/category/tutorials/feed/" rel="self" type="application/rss+xml" />
	<link>http://140dev.com</link>
	<description>Twitter API Programming Tips, Tutorials, Source Code Libraries and Consulting</description>
	<lastBuildDate>Wed, 31 Jul 2019 10:03:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.6</generator>
		<item>
		<title>Twitter API Ebook: Javascript Programming for Twitter API 1.1</title>
		<link>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-javascript-programming/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-javascript-programming/#comments</comments>
		<pubDate>Wed, 06 Feb 2013 00:17:46 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Twitter API Tutorials]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=2004</guid>
		<description><![CDATA[Javascript programming for Twitter changes dramatically with Twitter API version 1.1. The requirement to use OAuth with every API request means that you can no longer call the API directly from Javascript. Instead you have to rebuild all your Javascript code to proxy your requests through your own server. I know that a lot of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p><a href="/member"><img class="alignleft" src="/blog_images/ebook_javascript_large.png" alt="" width="200" height="251" /></a> Javascript programming for Twitter changes dramatically with <strong>Twitter API version 1.1</strong>. The requirement to use OAuth with every API request means that you can no longer call the API directly from Javascript. Instead you have to rebuild all your Javascript code to proxy your requests through your own server. </p>
<p>I know that a lot of Twitter programmers don&#8217;t know how to program this way, so I&#8217;ve written a new ebook to explain this coding method. It is available as a <a href="/member">free PDF download on our members page</a>, where you can also get the source code for all the examples in the ebook.</p>
<p>I do my Javascript coding with jQuery on the client side and PHP on the server, so the ebook covers everything you need to know to create Twitter apps using these two languages. </p>
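<p>Here is a minimal sketch of one piece of that server-side proxy, in PHP: deciding which Twitter API request an Ajax call from the browser is allowed to trigger. It is not code from the ebook. The function name <code>proxy_request_spec()</code>, the <code>action</code> parameter, and the whitelist are hypothetical. The two paths are the API 1.1 user timeline and search endpoints. The OAuth signing step is only described in a comment.</p>

```php
<?php
// Hypothetical server-side helper, not code from the ebook.
// Maps the parameters a jQuery Ajax call sends to this script onto the
// Twitter API 1.1 request the server should make on the browser's behalf.
function proxy_request_spec(array $query) {
    // Whitelist: action name => array(API path, allowed parameter names)
    $allowed = array(
        'timeline' => array('1.1/statuses/user_timeline', array('screen_name', 'count')),
        'search'   => array('1.1/search/tweets', array('q', 'count')),
    );

    $action = (isset($query['action']) && is_string($query['action'])) ? $query['action'] : '';
    if (!isset($allowed[$action])) {
        return null; // Unknown action: refuse to forward it
    }

    list($path, $names) = $allowed[$action];
    $params = array();
    foreach ($names as $name) {
        // Copy only whitelisted, non-empty string parameters
        if (isset($query[$name]) && is_string($query[$name]) && $query[$name] !== '') {
            $params[$name] = $query[$name];
        }
    }
    return array('path' => $path, 'params' => $params);
}

// In a real proxy you would pass $_GET, sign the request with OAuth on the
// server (for example with tmhOAuth), and echo the JSON back to the browser.
// Here we only print the request that would be made.
print_r(proxy_request_spec(array('action' => 'timeline', 'screen_name' => '140dev', 'token' => 'x')));
```

<p>A whitelist like this keeps the proxy from signing arbitrary requests with your OAuth credentials for anyone who finds the script&#8217;s URL.</p>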
<p>Here are the topics covered:</p>
<ul>
<li>jQuery tutorial</li>
<li>Ajax tutorial for client-server programming</li>
<li>Getting a user timeline with Javascript</li>
<li>Getting the results of a Twitter search with Javascript</li>
<li>Creating a complete Twitter search app with proper tweet display formatting</li>
</ul>
<p>I&#8217;ve also created a new <a href="https://groups.google.com/forum/?fromgroups#!forum/javascript-programming-for-twitter-api">Google Group</a> for questions and discussions of the issues raised by this ebook. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-javascript-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Most common Twitter OAuth programming errors</title>
		<link>http://140dev.com/twitter-api-programming-blog/most-common-twitter-oauth-programming-errors/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/most-common-twitter-oauth-programming-errors/#comments</comments>
		<pubDate>Sat, 26 Jan 2013 14:31:19 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[API error]]></category>
		<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter OAuth]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1992</guid>
		<description><![CDATA[My Twitter OAuth ebook has been out for 2 weeks now, and I&#8217;ve had a chance to help a lot of people get over the hump of running their first OAuth code. I&#8217;ve collected a list of the most common problems they have: No callback URL When you create an app, Twitter has an input [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>My <a href="http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-single-user-twitter-oauth-programming/">Twitter OAuth ebook</a> has been out for 2 weeks now, and I&#8217;ve had a chance to help a lot of people get over the hump of running their first OAuth code. I&#8217;ve collected a list of the most common problems they have:</p>
<p><strong>No callback URL</strong><br />
When you create an app, Twitter has an input field on the application creation page for filling in a callback URL. The URL is used when you create an OAuth login interface that lets people sign in on your site with Twitter. So if you are doing single-user OAuth, you could reasonably think that you can leave this blank. Twitter encourages this thinking by not requiring you to fill in the field on this form. The notes under the field also imply that you don&#8217;t need it: &#8220;To restrict your application from using callbacks, leave this field blank.&#8221; I&#8217;m not sure what this note means, but I do know that you MUST include a callback URL. If you don&#8217;t, the tmhOAuth library will not be able to make an OAuth connection and none of your API code will work. What URL should you use? It doesn&#8217;t matter, as long as it is valid. You can even use http://twitter.com. </p>
<p><strong>Failure to set read-write access</strong><br />
The Settings tab in the app creation page has a set of radio buttons that let you set the access level to read-write. For some reason, this option is not displayed when you first create an app. You have to create the app, which is set to read-only access by default, and then go to the Settings tab and change the access to read-write. If you leave it as read-only, you will not be able to tweet, follow, or do anything else with the API that changes an account. </p>
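<p>One way to confirm which access level a token actually has is to look at the response headers of any API call. This sketch assumes the response carries an <code>x-access-level</code> header, which Twitter has included on API 1.1 responses with values such as <code>read</code> and <code>read-write</code>. The function name is hypothetical, and the header array is whatever your HTTP code collects.</p>

```php
<?php
// Hypothetical helper: decide from an API response's headers whether the
// OAuth token that made the request has write access.
function token_can_write(array $headers) {
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive, so compare in lowercase
        if (strtolower($name) === 'x-access-level') {
            // 'read-write' and 'read-write-directmessages' both allow writes
            return is_string($value)
                && strpos(strtolower(trim($value)), 'read-write') === 0;
        }
    }
    return false; // Header missing: don't assume write access
}

var_dump(token_can_write(array('X-Access-Level' => 'read')));       // bool(false)
var_dump(token_can_write(array('x-access-level' => 'read-write'))); // bool(true)
```

<p>If the token still reports read-only access after you change the setting, try regenerating the access token on the app page. An existing token may keep the access level it was created with.</p>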
<p><strong>Incorrect server clock</strong><br />
The OAuth system is very sensitive to differences between Twitter server clocks and your server. If your server&#8217;s clock is off by more than 5 or 10 minutes, all your OAuth requests will fail. If you don&#8217;t know how to check or set your server clock, ask your webhost. </p>
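<p>To check the clock yourself, you can compare your server&#8217;s time with the <code>Date</code> header that any web server returns. This is a minimal sketch with a hypothetical function name and fixed example times. The 300-second threshold is an assumption based on the 5 to 10 minute range above, not a documented Twitter limit.</p>

```php
<?php
// Hypothetical helper: seconds by which the local clock differs from a
// remote server's clock, given that server's HTTP Date header.
// A positive result means the local clock is ahead. Returns null if the
// header can't be parsed.
function clock_skew_seconds($date_header, $local_time) {
    $remote_time = strtotime($date_header);
    if ($remote_time === false) {
        return null;
    }
    return $local_time - $remote_time;
}

// Example with fixed times: the local clock is 10 minutes fast
$skew = clock_skew_seconds('Wed, 31 Jul 2019 10:03:15 GMT',
                           strtotime('Wed, 31 Jul 2019 10:13:15 GMT'));
echo "Skew: $skew seconds\n"; // Skew: 600 seconds

// Against a live server you could read the Date header with
//   $headers = get_headers('https://api.twitter.com/', 1);
// and call clock_skew_seconds($headers['Date'], time()).
if ($skew !== null && abs($skew) > 300) {
    echo "Clock is off by more than 5 minutes\n";
}
```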
<p><strong>Duplicate tweet</strong><br />
Some people have run my example post_tweet.php script and gotten a 403 error. This usually means that you have sent a duplicate tweet. Twitter allows a duplicate after some time limit, but I&#8217;ve never been able to get an answer from Twitter HQ on what that limit is. If you get a 403 error when posting a tweet with the API, check your timeline to make sure the tweet doesn&#8217;t duplicate one you have sent recently. </p>
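<p>A 403 can mean other things too, so it helps to look at the error body. This sketch assumes the API 1.1 JSON error format, which is an <code>errors</code> array of objects with <code>code</code> and <code>message</code>. It treats error code 187 as the duplicate-status code. The function name and sample body are illustrative. With tmhOAuth you would typically pass in the status code returned by <code>request()</code> and the body from <code>$tmhOAuth-&gt;response['response']</code>.</p>

```php
<?php
// Hypothetical helper: given the HTTP status and JSON body of a failed
// tweet post, report whether Twitter rejected it as a duplicate.
// Assumes the API 1.1 error format: {"errors":[{"code":187,"message":"..."}]}
function is_duplicate_tweet_error($http_code, $body) {
    if ($http_code != 403) {
        return false;
    }
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['errors']) || !is_array($data['errors'])) {
        return false; // Not the expected JSON error structure
    }
    foreach ($data['errors'] as $error) {
        // 187 is the error code Twitter uses for "Status is a duplicate"
        if (is_array($error) && isset($error['code']) && (int) $error['code'] === 187) {
            return true;
        }
    }
    return false;
}

$body = '{"errors":[{"code":187,"message":"Status is a duplicate."}]}';
var_dump(is_duplicate_tweet_error(403, $body)); // bool(true)
```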
<p><strong>tmhOAuth files not found</strong><br />
Only 2 of the tmhOAuth files are required: cacert.pem and tmhOAuth.php. They MUST both be in the same directory, and you have to use a valid path when you require or include tmhOAuth.php. </p>
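<p>If you require the library with a path built from <code>__DIR__</code>, the include no longer depends on the current working directory. Relying on the working directory is a common reason an include works in one place and fails in another. This sketch uses a hypothetical helper name. It checks that both files are present next to the calling script before loading tmhOAuth. Adjust the directory if you keep the files elsewhere.</p>

```php
<?php
// Hypothetical helper: list which of the two required tmhOAuth files
// are missing from a directory.
function missing_tmhoauth_files($dir) {
    $missing = array();
    foreach (array('tmhOAuth.php', 'cacert.pem') as $file) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Check the directory this script lives in before loading the library
$missing = missing_tmhoauth_files(__DIR__);
if ($missing) {
    echo 'Missing from ' . __DIR__ . ': ' . implode(', ', $missing) . "\n";
} else {
    require_once __DIR__ . DIRECTORY_SEPARATOR . 'tmhOAuth.php';
}
```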
<p><strong>Invalid tmhOAuth files</strong><br />
I include the latest copies of the tmhOAuth files in the zip for the ebook, but some people prefer to download them from their home site at <a href="https://github.com/themattharris/tmhOAuth">https://github.com/themattharris/tmhOAuth</a>. That is fine, but you have to make sure you download clean copies of these files. I worked with someone for quite a while before we figured out that he had saved the entire GitHub web page, including the HTML, instead of the raw files. </p>
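<p>You can catch this mistake quickly by looking at the start of the downloaded file. A saved web page begins with HTML, not PHP. The helper below is a hypothetical sketch that only inspects the first 1 KB of the file.</p>

```php
<?php
// Hypothetical helper: return true if a file that should be PHP source
// actually looks like a saved web page (for example, a GitHub HTML page).
function looks_like_html_page($path) {
    // Read only the first 1 KB; that is enough to see how the file starts
    $head = file_get_contents($path, false, null, 0, 1024);
    if ($head === false) {
        return false; // Unreadable: let the normal include error show up
    }
    $head = strtolower(ltrim($head));
    return strpos($head, '<!doctype html') === 0
        || strpos($head, '<html') !== false;
}

$lib = __DIR__ . '/tmhOAuth.php';
if (is_file($lib) && looks_like_html_page($lib)) {
    echo "tmhOAuth.php is an HTML page, not PHP. Download the raw file instead.\n";
}
```

<p>On GitHub, the Raw button for each file gives you the plain file rather than the HTML page around it.</p>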
<p>I&#8217;m still interested in hearing about any problems you have with the ebook code. I want to make sure this is as clean as possible. <a href="mailto:140dev@gmail.com">Email me</a> if you can&#8217;t get it to work. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/most-common-twitter-oauth-programming-errors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Twitter API Console tool</title>
		<link>http://140dev.com/twitter-api-programming-blog/new-twitter-api-console-tool/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/new-twitter-api-console-tool/#comments</comments>
		<pubDate>Tue, 22 Jan 2013 18:10:35 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Console]]></category>
		<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter OAuth]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1943</guid>
		<description><![CDATA[My Twitter OAuth ebook closes with the source code for an API Console application. This app got such a favorable response that I decided to enhance it and put it out as a free tool. I have found this to be an invaluable debugging aid when testing an API request. It lets you enter just [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>My <a href="http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-single-user-twitter-oauth-programming/">Twitter OAuth ebook</a> closes with the source code for an API Console application. This app got such a favorable response that I decided to enhance it and put it out as a <a href="http://140dev.com/twitter-api-console/">free tool</a>. I have found this to be an invaluable debugging aid when testing an API request. It lets you enter just the description of the request and quickly see the complete response without having to write any test code.</p>
<p style="text-align: center;"><a href="http://140dev.com/twitter-api-console"><img class="aligncenter" style="border: 1px solid black;" title="Twitter API console" src="/blog_images/console.png" alt="" width="400" height="448" /></a></p>
<p>The tool is fairly self-explanatory, but I also added a <a href="http://140dev.com/download/140dev_api_console_users_guide.pdf">user&#8217;s guide</a> to cover some features you might not notice, such as the ability to share the <a href="http://140dev.com/twitter-api-console/?method=GET&amp;url=1.1/show/users&amp;screen_name=justinbieber">URL</a> for an API request. This lets you easily demonstrate a bug in the API to Twitter HQ, or show a fellow developer how to do a specific API task.</p>
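<p>The shared links are ordinary query strings, so you can also build them in code. This is a hypothetical helper. Its parameter names (<code>method</code>, <code>url</code>, and then the API&#8217;s own parameters) are inferred from the example link above rather than from the console&#8217;s documentation. It uses <code>1.1/users/show</code>, the API 1.1 path for looking up a single user.</p>

```php
<?php
// Hypothetical helper: build a shareable API Console link. The parameter
// names are inferred from the example link, not from console documentation.
function console_share_url($method, $api_path, array $params) {
    // method and url come first, followed by the API request's own parameters
    $query = array_merge(array('method' => $method, 'url' => $api_path), $params);
    return 'http://140dev.com/twitter-api-console/?' . http_build_query($query);
}

echo console_share_url('GET', '1.1/users/show', array('screen_name' => 'justinbieber')), "\n";
// http://140dev.com/twitter-api-console/?method=GET&url=1.1%2Fusers%2Fshow&screen_name=justinbieber
```

<p><code>http_build_query()</code> percent-encodes the slashes in the API path as <code>%2F</code>. A PHP script reading <code>$_GET['url']</code> gets the same decoded value either way.</p>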
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/new-twitter-api-console-tool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter API Ebook: Single-user Twitter OAuth Programming</title>
		<link>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-single-user-twitter-oauth-programming/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-single-user-twitter-oauth-programming/#comments</comments>
		<pubDate>Fri, 11 Jan 2013 22:41:31 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter OAuth]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1825</guid>
		<description><![CDATA[This free ebook covers everything you need to use OAuth from a single Twitter account. It is available as a free PDF download on our members page. You can also get the source code for all the examples in the ebook there as well. Here are the topics covered: Create your first Twitter application Set [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p><a href="/member"><img class="alignleft" src="/blog_images/oauth_ebook_cover_small.png" alt="" width="200" height="251" /></a> This free ebook covers everything you need to use OAuth from a single Twitter account. It is available as a <a href="/member">free PDF download on our members page</a>. You can also get the source code for all the examples in the ebook there as well.  </p>
<p>Here are the topics covered:</p>
<ul>
<li>Create your first Twitter application</li>
<li>Set up OAuth tokens for single-user access</li>
<li>Connect to the Twitter API with the tmhOAuth library</li>
<li>Post tweets through the API</li>
<li>Look up complete account details for any Twitter user</li>
<li>Convert Twitter API docs into working PHP code</li>
<li>Debug and test any API request with your own API console</li>
</ul>
<p>I&#8217;ve also created a new <a href="https://groups.google.com/d/forum/140dev-oauth-discussion">Google Group</a> for questions and discussions of the issues raised by this ebook. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/twitter-api-ebook-single-user-twitter-oauth-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter docs are improving</title>
		<link>http://140dev.com/twitter-api-programming-blog/twitter-docs-are-improving/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/twitter-docs-are-improving/#comments</comments>
		<pubDate>Sat, 14 Jul 2012 23:22:51 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter documentation]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1690</guid>
		<description><![CDATA[I&#8217;ve been using the same database schema for recording Twitter user information for a couple of years. I recently agreed to do a training session on Twitter follow programming for the Boston PHP Meetup group, so I decided to check out the Twitter API docs to see if they have gotten any better. I found [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>I&#8217;ve been using the same database schema for recording Twitter user information for a couple of years. I recently agreed to do a training session on Twitter follow programming for the <a href="http://www.bostonphp.org/events/73073732/">Boston PHP Meetup group</a>, so I decided to check out the Twitter API docs to see if they have gotten any better. I found this really nice &#8220;<a href="https://dev.twitter.com/docs/platform-objects">field guide</a>&#8221; to data objects. The idea that it is a reproduction of the Audubon model of natural history field guides is a little too cutesy for me, but the information provided is still a big improvement over past docs. My only complaint is that many of the string values, such as location and description, don&#8217;t have a maximum length. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/twitter-docs-are-improving/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language detection for tweets: Part 4</title>
		<link>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-4/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-4/#comments</comments>
		<pubDate>Thu, 24 May 2012 15:50:46 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter Language Detection]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1472</guid>
		<description><![CDATA[There are two problems you typically want to solve with language detection for tweets. First you need to analyse the types of languages you end up with for a specific set of keywords, and determine the minimum confidence level needed to get a clean result. Then when you have that data, you can process a [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>There are two problems you typically want to solve with <strong>language detection for tweets</strong>. First you need to analyse the types of languages you end up with for a specific set of keywords, and determine the minimum confidence level needed to get a clean result. Then when you have that data, you can process a tweet stream and pull out just the tweets that meet your goals. This simple language library will address both of these issues. </p>
<p><strong>language_lib.php</strong><br />
<table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
</pre></td><td class="code"><pre>&lt;?php
// language_lib.php

require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

// Return an array with the language data for any text
function language_info($text) {
 	global $oLang;
	
 	// Split out the key and value of the first array element
	list($language, $confidence) = each($oLang-&gt;detect($text));
	
	// Convert the confidence level to a 2 digit integer for convenience
	$confidence = round($confidence*100,0);
	
	// Get the number of words in this string
	$string = eregi_replace(&quot; +&quot;, &quot; &quot;, $text);
	$array = explode(&quot; &quot;, $string);
	$word_count = sizeof($array);
	
	return array( 'language' =&gt; $language,
		'confidence' =&gt; $confidence, 
		'word_count' =&gt; $word_count); 	
}

// Return 1 if the text meets your requirements, and 0 if not
function is_language($text, $target_language, $min_confidence, $min_words) {
 	global $oLang;
	
	// Get the number of words in this string
	$string = eregi_replace(&quot; +&quot;, &quot; &quot;, $text);
	$array = explode(&quot; &quot;, $string);
	// Exit if there aren't enough words
	if (sizeof($array) &lt; $min_words) {return 0;}
	
 	// Test all the possible languages returned by detect()
	foreach($oLang-&gt;detect($text) as $language =&gt; $confidence) {
		$confidence = round($confidence*100,0);
		
		// We have a good tweet
		if ((strtolower($language) == strtolower($target_language)) &amp;&amp; 
			($confidence &gt;= $min_confidence)) {
				
			return 1;
		}
	}

	// No acceptable languages were found
	return 0;	
}

?&gt;</pre></td></tr></table></p>
<p>Let&#8217;s use the first library function, language_info(), to examine all the tweets in the sample database. In a real application, I would typically store the results in a database, so I could run some queries to find things like the average confidence level and number of words for tweets in different languages. Based on that data, I could build a quality control routine to pick out just the best tweets. For now you can test this idea by <a href="http://140dev.com/tutorials/language_detection/language_detect6.php">running</a> the next script in a browser. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect6.php">language_detect6.php</a></strong><br />
<table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre></td><td class="code"><pre>&lt;?php
// language_detect6.php

require_once 'language_lib.php';

// Connect to the database with the sample tweet table
require_once('db_lib.php');
$oDB = new db;

// Loop through the sample tweets
$query = &quot;SELECT tweet_text FROM language&quot;;
$result = $oDB-&gt;select($query);
while ($row=mysqli_fetch_assoc($result)) {

	// Print the detected language info
	$text = $row['tweet_text'];
	print &quot;Text: $text&lt;br/&gt;&quot;;
	print_r( language_info($text));
	print &quot;&lt;br/&gt;&lt;br/&gt;&quot;;
}
?&gt;
</pre></td></tr></table></p>
<p>I won&#8217;t bother including all the results from this script, but you can see from the first few tweets that we now have a way of extracting what we need for all the tweets in a stream.</p>
<p><code>Text: Neugründung von “Deutsche Diabetes-Hilfe – Menschen mit Diabetes” http://t.co/mRRAhvPh<br />
Array ( [language] => german [confidence] => 26 [word_count] => 9 )</code></p>
<p><code>Text: RT @minihex: BMI denkt sich wiedermal:ein bissl rassistischer gehts noch-gesetzesentwurf sieht neue schikanen f asylwerberInnen vor htt ...<br />
Array ( [language] => german [confidence] => 32 [word_count] => 18 )</code></p>
<p><code>Text: @myMONK_de naja, da wäre noch die üble Bronchitis, die ich seit über 2 Wochen habe, aber Magen-Darm ist wenigstens wieder okay endlich<br />
Array ( [language] => german [confidence] => 32 [word_count] => 23 )</code></p>
<p><code>Text: COPD - eine Gefahr für die Lunge nicht nur bei Rauchern: http://t.co/8B2UnJng<br />
Array ( [language] => german [confidence] => 41 [word_count] => 12 ) </code></p>
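<p>The aggregate numbers mentioned above, average confidence and word count per language, would normally come from SQL queries once the results are stored. As a minimal sketch that doesn&#8217;t need a database, the hypothetical <code>summarize_languages()</code> helper below computes them in PHP from the arrays that <code>language_info()</code> returns. The sample input is just the four German results shown above.</p>

```php
<?php
// Hypothetical helper: given the arrays returned by language_info() for a
// set of tweets, compute the tweet count, average confidence, and average
// word count for each detected language.
function summarize_languages(array $infos) {
    // First pass: running totals per language
    $totals = array();
    foreach ($infos as $info) {
        $lang = $info['language'];
        if (!isset($totals[$lang])) {
            $totals[$lang] = array('tweets' => 0, 'confidence' => 0, 'words' => 0);
        }
        $totals[$lang]['tweets']++;
        $totals[$lang]['confidence'] += $info['confidence'];
        $totals[$lang]['words'] += $info['word_count'];
    }

    // Second pass: turn totals into averages, rounded to one decimal place
    $summary = array();
    foreach ($totals as $lang => $t) {
        $summary[$lang] = array(
            'tweets' => $t['tweets'],
            'avg_confidence' => round($t['confidence'] / $t['tweets'], 1),
            'avg_words' => round($t['words'] / $t['tweets'], 1),
        );
    }
    return $summary;
}

// The four German results shown above
$infos = array(
    array('language' => 'german', 'confidence' => 26, 'word_count' => 9),
    array('language' => 'german', 'confidence' => 32, 'word_count' => 18),
    array('language' => 'german', 'confidence' => 32, 'word_count' => 23),
    array('language' => 'german', 'confidence' => 41, 'word_count' => 12),
);
print_r(summarize_languages($infos));
```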
<p>The next example uses the library&#8217;s is_language() function to select just the tweets for a specific language. In this case, I&#8217;ve tested for English, but the function will work with any of the languages that the Text_LanguageDetect code returns. We saw <a href="http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-3/">yesterday</a> that there is a large number of possible languages. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect7.php">language_detect7.php</a></strong><br />
<table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
</pre></td><td class="code"><pre>&lt;?php
// language_detect7.php

require_once 'language_lib.php';

// Connect to the database with the sample tweet table
require_once('db_lib.php');
$oDB = new db;

// Loop through the sample tweets
$query = &quot;SELECT tweet_text FROM language&quot;;
$result = $oDB-&gt;select($query);
while ($row=mysqli_fetch_assoc($result)) {

	// Print the detected language info
	$text = $row['tweet_text'];
	
	// Only display tweets in English 
	// with a confidence level of at least 30%,
	// and at least 5 words
	if (is_language($text,'english',30,5)) {
		print &quot;Text: $text&lt;br/&gt;&quot;;
		print_r( language_info($text));
		print &quot;&lt;br/&gt;&lt;br/&gt;&quot;;
	}
}
?&gt;</pre></td></tr></table></p>
<p>If you <a href="http://140dev.com/tutorials/language_detection/language_detect7.php">run this script</a> in your browser, you&#8217;ll see just the English tweets that have a confidence level of at least 30% and 5 words or more. I chose to reject tweets that that didn&#8217;t meet the minimum word count, but another option would have been to set the minimum word count to 0, so all English tweets that met the confidence level were displayed. </p>
<p><code>Text: RT @StephenAtHome: A study predicts nearly half of all Americans will be obese by 2030. But with a little American ingenuity I bet we ca ...<br />
Array ( [language] => english [confidence] => 32 [word_count] => 26 )</code></p>
<p><code>Text: I was looking for some weight loss computer support in my area, but there's no low-cal IT in my locality.<br />
Array ( [language] => english [confidence] => 35 [word_count] => 20 ) </code></p>
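<p>If you have already stored the <code>language_info()</code> results, as suggested earlier, you can also apply these thresholds without running detection again. This hypothetical helper is a sketch of that idea. Unlike <code>is_language()</code>, it only sees the single most likely language for each tweet. As a result, it can reject a tweet where English was a strong candidate but not the top-ranked one. The third sample entry is made up to show the word-count cutoff.</p>

```php
<?php
// Hypothetical helper: apply the same thresholds as is_language() to
// results already stored from language_info(). Unlike is_language(), this
// only sees the single most likely language for each tweet.
function filter_by_language(array $infos, $target_language, $min_confidence, $min_words) {
    $target = strtolower($target_language);
    $keep = function ($info) use ($target, $min_confidence, $min_words) {
        return strtolower($info['language']) === $target
            && $info['confidence'] >= $min_confidence
            && $info['word_count'] >= $min_words;
    };
    // array_values() renumbers the surviving entries from 0
    return array_values(array_filter($infos, $keep));
}

// The two English results above, plus one short made-up example
$infos = array(
    array('language' => 'english', 'confidence' => 32, 'word_count' => 26),
    array('language' => 'english', 'confidence' => 35, 'word_count' => 20),
    array('language' => 'english', 'confidence' => 40, 'word_count' => 3),
);
echo count(filter_by_language($infos, 'english', 30, 5)), "\n"; // 2
echo count(filter_by_language($infos, 'english', 30, 0)), "\n"; // 3
```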
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language detection for tweets: Part 3</title>
		<link>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-3/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-3/#comments</comments>
		<pubDate>Wed, 23 May 2012 15:02:21 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter Language Detection]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1465</guid>
		<description><![CDATA[In yesterday&#8217;s installment we learned how to get the most likely language for a tweet with the detectSimple() function. We also discovered that this library sometimes fails when you get down to just 2 or 3 words. The Text_LanguageDetect library has a more advanced function, called detect(), that delivers an array of possible language matches [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>In <a href="http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-2/">yesterday&#8217;s installment</a> we learned how to get the most likely language for a tweet with the detectSimple() function. We also discovered that this library sometimes fails when you get down to just 2 or 3 words. The Text_LanguageDetect library has a more advanced function, called detect(), that delivers an array of possible language matches and a numeric confidence level for each. The higher the confidence level, the more likely the language is a match. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect4.php">language_detect4.php</a></strong><br />
<table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="code"><pre>&lt;?php
// language_detect4.php

require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

$long_french = 'qui propose une &eacute;cole maternelle bilingue fran&ccedil;ais';
print &quot;Long French: $long_french&lt;br/&gt;&quot;;
print &quot;Language: &lt;br/&gt;&quot;;
print_r($oLang-&gt;detect($long_french));

$long_english = 'the latest episode of american idol sucks';
print &quot;&lt;br/&gt;&lt;br/&gt;Long English: $long_english&lt;br/&gt;&quot;;
print &quot;Language: &lt;br/&gt;&quot;;
print_r($oLang-&gt;detect($long_english));

?&gt;</pre></td></tr></table></p>
<p>If you <a href="http://140dev.com/tutorials/language_detection/language_detect4.php">run this script</a> in a browser, you will see that there are many possible languages to choose from, in order by confidence level. </p>
<p><code>Long French: qui propose une école maternelle bilingue français<br />
Language:<br />
Array ( [french] => 0.32340136054422 [romanian] => 0.25102040816327 [slovene] => 0.24061224489796 [danish] => 0.23877551020408 [latin] => 0.21857142857143 [italian] => 0.21761904761905 [english] => 0.21040816326531 [norwegian] => 0.20884353741497 [portuguese] => 0.20047619047619 [estonian] => 0.18700680272109 [spanish] => 0.18503401360544 [croatian] => 0.18428571428571 [pidgin] => 0.17292517006803 [slovak] => 0.16809523809524 [dutch] => 0.16224489795918 [czech] => 0.14707482993197 [german] => 0.14544217687075 [tagalog] => 0.14510204081633 [cebuano] => 0.11734693877551 [finnish] => 0.1147619047619 [swedish] => 0.11469387755102 [lithuanian] => 0.11333333333333 [latvian] => 0.10857142857143 [polish] => 0.1069387755102 [swahili] => 0.10551020408163 [turkish] => 0.094149659863946 [hawaiian] => 0.09204081632653 [indonesian] => 0.089727891156463 [albanian] => 0.080544217687075 [hausa] => 0.077142857142857 [azeri] => 0.067074829931973 [hungarian] => 0.052517006802721 [icelandic] => 0.052448979591837 [vietnamese] => 0.051768707482993 [welsh] => 0.051700680272109 [somali] => 0.037142857142857 [bengali] => 0 [mongolian] => 0 )</code></p>
<p><code>Long English: the latest episode of american idol sucks<br />
Language:<br />
Array ( [english] => 0.26414634146341 [pidgin] => 0.20056910569106 [spanish] => 0.17081300813008 [slovak] => 0.16130081300813 [estonian] => 0.15845528455285 [italian] => 0.15471544715447 [welsh] => 0.14829268292683 [latin] => 0.14739837398374 [danish] => 0.14585365853659 [romanian] => 0.14268292682927 [french] => 0.1409756097561 [norwegian] => 0.14048780487805 [dutch] => 0.12666666666667 [portuguese] => 0.12065040650406 [german] => 0.1130081300813 [indonesian] => 0.1079674796748 [slovene] => 0.090487804878049 [swahili] => 0.09 [latvian] => 0.086991869918699 [turkish] => 0.08 [azeri] => 0.079512195121951 [swedish] => 0.075447154471545 [albanian] => 0.07479674796748 [hungarian] => 0.074065040650407 [hawaiian] => 0.072926829268293 [finnish] => 0.07260162601626 [tagalog] => 0.072113821138211 [cebuano] => 0.060894308943089 [hausa] => 0.059105691056911 [croatian] => 0.057967479674797 [lithuanian] => 0.055528455284553 [somali] => 0.053170731707317 [polish] => 0.043170731707317 [czech] => 0.041219512195122 [vietnamese] => 0.040975609756098 [icelandic] => 0.034146341463415 [mongolian] => 0 [bengali] => 0 )</code></p>
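<p>You usually don&#8217;t need the whole list. This hypothetical helper sketches how to keep the top few candidates and convert their scores to percentages. It sorts with <code>arsort()</code> as a precaution rather than relying on the order <code>detect()</code> returns. The sample scores are the first three French entries above.</p>

```php
<?php
// Hypothetical helper: keep the $n highest-scoring languages from a
// detect()-style array (language => score between 0 and 1) and convert
// the scores to whole percentages.
function top_languages(array $scores, $n) {
    arsort($scores); // Highest score first; keys (language names) are kept
    $top = array_slice($scores, 0, $n, true);
    foreach ($top as $language => $score) {
        $top[$language] = round($score * 100);
    }
    return $top;
}

// First three entries of the French result above
print_r(top_languages(array(
    'french'   => 0.32340136054422,
    'romanian' => 0.25102040816327,
    'slovene'  => 0.24061224489796,
), 2));
```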
<p>Manipulating arrays is sometimes tricky, so here is an extension of this script that delivers the most likely language for a string, along with its confidence level and number of words. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect5.php">language_detect5.php</a></strong><br />
<table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
</pre></td><td class="code"><pre>&lt;?php
// language_detect5.php

require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

$long_french = 'qui propose   une &eacute;cole maternelle bilingue fran&ccedil;ais';
print &quot;Long French: $long_french&lt;br/&gt;&quot;;
language_info($long_french);

$long_english = 'the latest episode of american idol sucks';
print &quot;&lt;br/&gt;&lt;br/&gt;Long English: $long_english&lt;br/&gt;&quot;;
language_info($long_english);

function language_info($text) {
	global $oLang;
	
	// Split out the key and value of the first array element
	list($language, $confidence) = each($oLang-&gt;detect($text));
	
	// Convert the confidence level to a 2 digit integer for convenience
	$confidence = round($confidence*100,0);
	
	// Get the number of words in this string
	$string = eregi_replace(&quot; +&quot;, &quot; &quot;, $text);
	$array = explode(&quot; &quot;, $string);
	$word_count = sizeof($array);
	
	print &quot;Language: $language&lt;br/&gt;&quot;;
	print &quot;Confidence: $confidence%&lt;br/&gt;&quot;; 
	print &quot;Words: $word_count&lt;br/&gt;&quot;; 	
}

?&gt;</pre></td></tr></table></p>
<p><code>Long French: qui propose une école maternelle bilingue français<br />
Language: french<br />
Confidence: 33%<br />
Words: 7</code></p>
<p><code>Long English: the latest episode of american idol sucks<br />
Language: english<br />
Confidence: 26%<br />
Words: 7</code></p>
<p>We now have the basic tools to create a library of language functions that can be used when processing tweets from the Twitter API. Come back tomorrow and we&#8217;ll work out the details of such a library. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language detection for tweets: Part 2</title>
		<link>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-2/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-2/#comments</comments>
		<pubDate>Tue, 22 May 2012 13:32:11 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter Language Detection]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1448</guid>
		<description><![CDATA[The docs for the Text_LanguageDetect library say that you need to pass it 4-5 sentences to get an accurate language identification, but as we saw in part 1 of this tutorial, even a single sentence seems to work. This is great, since we will need this to work with tweets that average 5-6 words. So [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>The docs for the <a href="http://pear.php.net/package/Text_LanguageDetect/docs">Text_LanguageDetect</a> library say that you need to pass it 4-5 sentences to get an accurate language identification, but as we saw in <a href="http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-1/">part 1</a> of this tutorial, even a single sentence seems to work. This is great, since we will need this to work with tweets that average 5-6 words. So how small a string will give you accurate results? It varies with each language, but from my tests you need at least 3-4 words in most languages. </p>
<p>This <a href="http://140dev.com/tutorials/language_detection/language_detect2.php">sample script</a> demonstrates the problem. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect2.php">language_detect2.php</a></strong><br />
<table><tr><td class="code"><pre>&lt;?php
// language_detect2.php

require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

$long_french = 'qui propose une &eacute;cole maternelle bilingue fran&ccedil;ais';
print &quot;Long French: $long_french&lt;br/&gt;&quot;;
print &quot;Language: &quot; . $oLang-&gt;detectSimple($long_french) . &quot;&lt;br/&gt;&quot;;

$short_french = '&eacute;cole maternelle';
print &quot;&lt;br/&gt;Short French: $short_french&lt;br/&gt;&quot;;
print &quot;Language: &quot; . $oLang-&gt;detectSimple($short_french) . &quot;&lt;br/&gt;&quot;;

$long_english = 'the latest episode of american idol sucks';
print &quot;&lt;br/&gt;Long English: $long_english&lt;br/&gt;&quot;;
print &quot;Language: &quot; . $oLang-&gt;detectSimple($long_english) . &quot;&lt;br/&gt;&quot;;

$short_english = 'american idol';
print &quot;&lt;br/&gt;Short English: $short_english&lt;br/&gt;&quot;;
print &quot;Language: &quot; . $oLang-&gt;detectSimple($short_english) . &quot;&lt;br/&gt;&quot;;

?&gt;</pre></td></tr></table></p>
<p>Running this example in a browser shows that with just 2 words, the language returned by the library can&#8217;t be trusted.</p>
<p><code>Long French: qui propose une école maternelle bilingue français<br />
Language: french<br />
<br />
Short French: école maternelle<br />
Language: danish<br />
<br />
Long English: the latest episode of american idol sucks<br />
Language: english<br />
<br />
Short English: american idol<br />
Language: welsh</code></p>
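<p>One way to act on the 3-4 word rule of thumb is to count the real words in a tweet before trusting the detected language. Here is a minimal sketch. The function names strip_tweet_entities(), count_words() and has_enough_words() are hypothetical and not part of Text_LanguageDetect, and the choice to ignore URLs, @mentions, #hashtags and a leading RT is an assumption about which tokens carry no language information.</p>

```php
<?php
// Hypothetical helper (not part of Text_LanguageDetect): remove tokens that
// are assumed to carry little language information.
function strip_tweet_entities($text)
{
    $text = preg_replace('~https?://\S+~i', ' ', $text);        // links
    $text = preg_replace('~[@#][\p{L}\p{N}_]+~u', ' ', $text);  // @mentions and #hashtags
    $text = preg_replace('~^\s*RT\b~', ' ', $text);             // leading retweet marker
    return trim(preg_replace('~\s+~u', ' ', $text));            // collapse whitespace
}

// Hypothetical helper: count runs of letters, allowing apostrophes so that
// "c'est" counts as one word. Digits and punctuation are not counted.
function count_words($text)
{
    $matches = array();
    return (int) preg_match_all("~[\p{L}\p{M}']+~u", $text, $matches);
}

// Hypothetical guard: only trust detection when at least $min_words real
// words remain. The default of 3 follows the rule of thumb in this post.
function has_enough_words($text, $min_words = 3)
{
    return count_words(strip_tweet_entities($text)) >= $min_words;
}
```

<p>In the scripts above you could then call <code>$oLang-&gt;detectSimple()</code> only when <code>has_enough_words($text)</code> returns true, and treat shorter tweets as undetermined. The minimum of 3 words is a starting point to tune against your own sample tweets.</p>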
<p>I&#8217;ve found that the best way to test the accuracy of this language detection method is to run a sample set of tweets through it and examine the results for each language. The next script does this with 16 tweets I pulled from a database I built for a firm that consults for drug companies in Europe. They need to collect tweets about different diseases and separate the results by language. The sample table we&#8217;ll process here has 4 tweets in each of 4 languages. The code uses my standard <a href="http://140dev.com/twitter-api-programming-blog/simple-php-mysql-database-library-source-code/">db_lib.php database library</a> to read the tweets from the database. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect3.php">language_detect3.php</a></strong><br />
<table><tr><td class="code"><pre>&lt;?php
// language_detect3.php

// Get ready to use the language library
require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

// Connect to the database with the sample tweet table
require_once('db_lib.php');
$oDB = new db;

// Loop through the sample tweets
$query = &quot;SELECT tweet_text FROM language&quot;;
$result = $oDB-&gt;select($query);
while ($row=mysqli_fetch_assoc($result)) {

	// Print the detected language	
	$text = $row['tweet_text'];
	print &quot;Text: $text&lt;br/&gt;&quot;;
	print &quot;Language: &quot; . $oLang-&gt;detectSimple($text) . &quot;&lt;br/&gt;&lt;br/&gt;&quot;;
}
?&gt;</pre></td></tr></table></p>
<p><a href="http://140dev.com/tutorials/language_detection/language_detect3.php">Running this script</a> in a browser shows that with a reasonably long tweet, the language identification is really good, especially for a free library. </p>
<p><code>Text: Neugründung von “Deutsche Diabetes-Hilfe – Menschen mit Diabetes” http://t.co/mRRAhvPh<br />
Language: german<br />
<br />
Text: RT @minihex: BMI denkt sich wiedermal:ein bissl rassistischer gehts noch-gesetzesentwurf sieht neue schikanen f asylwerberInnen vor htt ...<br />
Language: german<br />
<br />
Text: @myMONK_de naja, da wäre noch die üble Bronchitis, die ich seit über 2 Wochen habe, aber Magen-Darm ist wenigstens wieder okay endlich<br />
Language: german<br />
<br />
Text: COPD - eine Gefahr für die Lunge nicht nur bei Rauchern: http://t.co/8B2UnJng<br />
Language: german<br />
<br />
Text: Mierda de profesor que no supo explicar nada de las columnas de dominancia ocular y ahora no entiendo nada<br />
Language: spanish<br />
<br />
Text: #diabetesla Hace poco puse el enlace a 1foro de diabetes. Se comenta que insulina Lantus provoca depresión.¿Algo de cierto? 10 días con ella<br />
Language: spanish<br />
<br />
Text: Queridos padres, tengo casi 16 años, creerme ya he aprendido a vivir con la diabetes, me acompaña desde los 3 años, así que por favor +<br />
Language: spanish<br />
<br />
Text: La OMS advierte sobre el aumento de casos de hipertensión y diabetes en el mundo - http://t.co/074JHwBh http://t.co/PKUyPLuG<br />
Language: spanish<br />
<br />
Text: Les problemes ou sa fait maigrir ou sa fait grossir. Personnellemnt je suis devenue obese. Fais chié!<br />
Language: french<br />
<br />
Text: @Mangeunepomme C'est ce que je compte faire<br />
Language: french<br />
<br />
Text: Genre c'est une grosse limite obese et elle fait la meuf genre c'est une salope<br />
Language: french<br />
<br />
Text: @GlodieGabrielle hehehehehe. A kelke kilos detr obese, u va mettre ta tente dans une salle de gym de la place<br />
Language: french<br />
<br />
Text: Oh shoot looks like I've got hay fever... This is bad :/<br />
Language: english<br />
<br />
Text: RT @StephenAtHome: A study predicts nearly half of all Americans will be obese by 2030. But with a little American ingenuity I bet we ca ...<br />
Language: english<br />
<br />
Text: Vital Signs: Options for weight loss: In addition to dietitians, counselors and life coaches who can walk you th... http://t.co/hBam9XiF<br />
Language: english<br />
<br />
Text: I was looking for some weight loss computer support in my area, but there's no low-cal IT in my locality.<br />
Language: english</code></p>
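<p>The client in this example needed its tweets separated by language, which takes only a small extension of the loop in language_detect3.php. This is a sketch: group_tweets_by_language() is a hypothetical helper, and treating a null or empty result from the detector as "unknown" is an assumption. The detector is passed in as a callable so the grouping logic can be tried out without the library installed.</p>

```php
<?php
// Hypothetical helper: bucket tweet texts by detected language.
// $detect is any callable that takes a tweet's text and returns a language
// name; a null or empty result is filed under 'unknown' (an assumption).
function group_tweets_by_language(array $tweets, $detect)
{
    $groups = array();
    foreach ($tweets as $text) {
        $lang = call_user_func($detect, $text);
        if ($lang === null || $lang === '') {
            $lang = 'unknown';
        }
        $groups[$lang][] = $text;  // creates the bucket on first use
    }
    return $groups;
}
```

<p>With the library loaded, you would pass the detector as <code>array($oLang, 'detectSimple')</code>, then read the English tweets from <code>$groups['english']</code>, the French ones from <code>$groups['french']</code>, and so on.</p>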
<p>You can now see how easy it is to get the language for a series of tweets. We&#8217;ll dig deeper tomorrow and learn how to use this library&#8217;s confidence level results. That will let you select only the tweets that have a high chance of being in the language you need. Then later in the week I&#8217;ll create a standard language detection function that you can call whenever you need to process tweets. </p>
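<p>As a preview of the kind of filtering confidence scores make possible, here is a sketch under stated assumptions. The detector callable is assumed to return an array with 'language' and 'confidence' keys, with confidence as a fraction between 0 and 1, or null when detection fails. The library&#8217;s detectConfidence() method is the natural candidate, but check its documentation for the exact keys and scale before plugging it in. The function name filter_by_language() and the default threshold of 0.5 are hypothetical.</p>

```php
<?php
// Hypothetical filter: keep only tweets detected as $target_lang with a
// confidence at or above $min_confidence. $detect is ASSUMED to return
// array('language' => ..., 'confidence' => 0..1) or null on failure.
function filter_by_language(array $tweets, $detect, $target_lang, $min_confidence = 0.5)
{
    $kept = array();
    foreach ($tweets as $text) {
        $result = call_user_func($detect, $text);
        if (!is_array($result) || !isset($result['language'], $result['confidence'])) {
            continue;  // detection failed: drop the tweet
        }
        if ($result['language'] === $target_lang
            && $result['confidence'] >= $min_confidence) {
            $kept[] = $text;
        }
    }
    return $kept;
}
```

<p>Use would look like <code>filter_by_language($texts, array($oLang, 'detectConfidence'), 'english', 0.5)</code>, provided the return format matches the assumption above. Raising the threshold trades recall for precision, so it is worth tuning against a hand-checked sample like the one above.</p>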
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language detection for tweets: Part 1</title>
		<link>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-1/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-1/#comments</comments>
		<pubDate>Tue, 22 May 2012 01:48:06 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter Language Detection]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=1439</guid>
		<description><![CDATA[One thing I learned early on in building tweet aggregation sites for clients is that they expect to only see tweets in English. After all, Google can do it, why can&#8217;t I? In theory there is a lang=en argument in the search API, but it doesn&#8217;t help much, because it only uses the language setting [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>One thing I learned early on in building tweet aggregation sites for clients is that they expect to only see tweets in English. After all, Google can do it, why can&#8217;t I? In theory there is a lang=en argument in the search API, but it doesn&#8217;t help much, because it only uses the language setting entered by users in their profile. Since English is the default, and hardly anyone changes it, almost all tweets are labelled as English. I seem to remember the streaming API having a lang argument also, but it isn&#8217;t in the docs now. Either way, I gave up and found my own solution a long time ago. The good thing is that it doesn&#8217;t just work for English. It also does a remarkably good job for over a dozen languages I have tested it for, and claims to do a lot more. Best of all, it is free and open source. </p>
<p>The library I use is called Text_LanguageDetect. It is available as a PEAR package, which makes installation very easy for PHP. You can download the code <a href="http://pear.php.net/package/Text_LanguageDetect/download">here</a>, and get docs <a href="http://pear.php.net/package/Text_LanguageDetect/docs">here</a>. It requires PHP 5.3 and PEAR 1.9. You don&#8217;t have to download and install it manually; you can just use the PEAR install command:<br />
<code>pear install Text_LanguageDetect-0.3.0</code></p>
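<p>If <code>require_once 'Text/LanguageDetect.php'</code> fails after installation, the usual cause is that PEAR&#8217;s directory is not on PHP&#8217;s include_path. A quick way to check is a sketch like this one; language_detect_installed() is a hypothetical helper, and stream_resolve_include_path() needs PHP 5.3.2 or later.</p>

```php
<?php
// Hypothetical check: can PHP find the PEAR package on its include_path?
// stream_resolve_include_path() returns the resolved path, or false.
function language_detect_installed()
{
    return stream_resolve_include_path('Text/LanguageDetect.php') !== false;
}

if (language_detect_installed()) {
    echo "Text_LanguageDetect is installed\n";
} else {
    // Show where PHP is looking, so the PEAR directory can be added if missing
    echo "Not found. Current include_path: " . get_include_path() . "\n";
}
```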
<p>Using the library takes only a few lines of code. It is a class, so you create an instance of it and then call its methods.<br />
<code>require_once 'Text/LanguageDetect.php';<br />
$oLang = new Text_LanguageDetect();</code></p>
<p>The simplest function you can call is detectSimple(), which returns the most likely language for the text it is passed. Here is a <a href="http://140dev.com/tutorials/language_detection/language_detect1.php">basic test script</a>. </p>
<p><strong><a href="http://140dev.com/tutorials/language_detection/language_detect1.php">language_detect1.php</a></strong><br />
<table><tr><td class="code"><pre>&lt;?php
// language_detect1.php

require_once 'Text/LanguageDetect.php';
$oLang = new Text_LanguageDetect();

$text = 'La OMS advierte sobre el aumento de casos de hipertensi&oacute;n y diabetes en el mundo';
print &quot;Text: $text&lt;br/&gt;&quot;;
print &quot;Language: &quot; . $oLang-&gt;detectSimple($text);

?&gt;</pre></td></tr></table></p>
<p>Running this script through a browser shows that the language detection library correctly identified the text as Spanish.<br />
<code>Text: La OMS advierte sobre el aumento de casos de hipertensión y diabetes en el mundo<br />
Language: spanish</code></p>
<p>Tomorrow we&#8217;ll dig deeper into this library and see how to handle tweets whose language is less clear-cut. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/language-detection-for-tweets-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The simplest Twitter OAuth tutorial possible</title>
		<link>http://140dev.com/twitter-api-programming-blog/the-simplest-twitter-oauth-tutorial-possible/</link>
		<comments>http://140dev.com/twitter-api-programming-blog/the-simplest-twitter-oauth-tutorial-possible/#comments</comments>
		<pubDate>Mon, 15 Nov 2010 01:25:25 +0000</pubDate>
		<dc:creator>Adam Green</dc:creator>
				<category><![CDATA[Automated Tweets]]></category>
		<category><![CDATA[Custom Twitter Client]]></category>
		<category><![CDATA[Twitter API Tutorials]]></category>
		<category><![CDATA[Twitter OAuth]]></category>

		<guid isPermaLink="false">http://140dev.com/?p=994</guid>
		<description><![CDATA[There is something about OAuth that brings out the worst in techies. You can see it when someone asks how to get started with OAuth on the Twitter development talk mailing list. The general response is &#8220;get a copy of library X, and you&#8217;re all set.&#8221; Well if downloading a library would solve the problem, [&#8230;]]]></description>
				<content:encoded><![CDATA[<p></p><p>There is something about OAuth that brings out the worst in techies. You can see it when someone asks how to get started with OAuth on the <a href="http://groups.google.com/group/twitter-development-talk">Twitter development talk</a> mailing list. The general response is &#8220;get a copy of library X, and you&#8217;re all set.&#8221; Well if downloading a library would solve the problem, I don&#8217;t think so many people would keep asking for help. </p>
<p>The disconnect is that all the Twitter docs on OAuth assume that you already know how it works. It&#8217;s like giving driving directions to someone who does not know how to operate a car. If someone has never driven before, saying &#8220;Just go north 5 miles, and you&#8217;re all set&#8221; doesn&#8217;t help much.</p>
<p>What OAuth beginners need is a step-by-step set of instructions that starts by showing where all the moving parts are, and how to assemble them into a first working program that controls the Twitter API. That is what I have tried to do with my latest tutorial. It is called <a href="http://140dev.com/twitter-api-programming-tutorials/hello-twitter-oauth-php/">Hello Twitter OAuth</a>, and it shows every step necessary to post tweets with OAuth. These techniques can then be applied to any API command. </p>
]]></content:encoded>
			<wfw:commentRss>http://140dev.com/twitter-api-programming-blog/the-simplest-twitter-oauth-tutorial-possible/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
