Pushing the boundary of Real Time Web with Twitter and XFactor

Makoto Inoue

This post was originally published on the New Bamboo blog, before New Bamboo joined thoughtbot in London.


I got lots of positive comments and retweets about my last article, “Real time online activity monitor example with node.js and WebSocket”.

(If you haven’t read that very long article and aren’t interested in the technical details, don’t worry. I’ll try not to get too technical this time.)

I am glad that I was able to show how exciting node.js and WebSockets are for building real time web applications.

However, I don’t think my “activity monitor” example showed the full potential of the real time web. Why? Because I was only showing resource usage stats “EVERY ONE SECOND”. One second of latency is still not real time.

This is what I did next (Make sure your sound is turned ON):

How many people watched the show?

“XFactor” is a popular UK talent show. It’s like American Idol, where contestants sing live and are voted on each week. (Please note that Susan Boyle is not from XFactor, but from Britain’s Got Talent.)

20 million people watched the final show, and over 10 million people voted for the 2 finalists, which is more than the number of votes for the government at the last general election (according to Metro).

What the video shows

I captured the Twitter feed in real time during the show (19:54 ~ 21:49 GMT) on Sunday 13th December. I captured any tweet matching the keywords “Joe & win”, “Olly & win”, or both.
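To make the keyword rule concrete, here is a minimal sketch of how a captured tweet could be assigned to a finalist. The function name `classifyTweet` and the whole-word, case-insensitive matching are my assumptions for illustration; this is not the actual capture code, and Twitter's own keyword matching may behave differently.

```javascript
// Hypothetical helper (not from the original capture script).
// A tweet counts for a finalist when it contains both the finalist's
// name and the word "win". Matching is case-insensitive and on whole
// words, which is an assumption made for this sketch.
function classifyTweet(text) {
  const lower = text.toLowerCase();
  const mentionsWin = /\bwin\b/.test(lower);
  return {
    joe: mentionsWin && /\bjoe\b/.test(lower),
    olly: mentionsWin && /\bolly\b/.test(lower),
  };
}
```

A tweet that names both finalists and contains “win” counts towards both totals, which corresponds to the “or both” case above.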

Here is an explanation of the 3 key figures.

  • Total = The number of mentions of each contestant during the show
  • Speed = Tweets Per Second (I’ll call it TPS going forward)
  • Ratio = “Total” of each contestant / sum of both totals

Of the 3 figures, TPS (Speed) was the most interesting one to watch in real time. Did you notice that it was showing 4 ~ 5 TPS overall, went down to 0 ~ 1 TPS during the pause before the announcement, then jumped to 10 TPS immediately after the announcement?
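The three figures can be sketched in a few lines of JavaScript. This is only an illustration under my own assumptions: `createStats`, `record` and `snapshot` are hypothetical names, timestamps are in milliseconds and arrive in order, and TPS is counted over a sliding one-second window. The code behind the video may have calculated these differently.

```javascript
// Hypothetical tracker for one capture session (names are illustrative).
function createStats() {
  const totals = { joe: 0, olly: 0 };
  // Tweets seen recently; entries older than one second are dropped
  // whenever a snapshot is taken.
  const recent = [];

  return {
    // Count one matching tweet for a contestant ("joe" or "olly").
    record(contestant, timestampMs) {
      totals[contestant] += 1;
      recent.push({ contestant, timestampMs });
    },

    // Compute Total, Speed (TPS) and Ratio as of `nowMs`.
    snapshot(nowMs) {
      while (recent.length > 0 && recent[0].timestampMs <= nowMs - 1000) {
        recent.shift();
      }
      const tps = { joe: 0, olly: 0 };
      for (const entry of recent) {
        tps[entry.contestant] += 1;
      }
      const sum = totals.joe + totals.olly;
      const ratio = sum === 0
        ? { joe: 0, olly: 0 }
        : { joe: totals.joe / sum, olly: totals.olly / sum };
      return { totals: { ...totals }, tps, ratio };
    },
  };
}
```

Calling `snapshot(Date.now())` once per screen refresh would give the numbers to draw; the totals only ever grow, while TPS rises and falls with the crowd.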

What happened in the Twittersphere during the show (what the video does not show)

The night before the final, there were 3 contestants (Olly Murs, Stacey Solomon, and Joe McElderry). Olly’s and Stacey’s names were on Twitter’s “Trending Topics”, even though many newspapers were writing that Joe was the bookies’ favourite (2 - 9, again according to Metro).

My initial guess was this.

  • Stacey and Olly are in their 20s while Joe is still a teenager. Twitter’s audience is mostly adults (in their 20s and 30s), so people tweet more about Stacey and Olly, while Joe’s teenage fans do not tweet about him much.

The moment I started capturing the Twitter feed, it was clear that Joe was going to win. The ratio between Olly and Joe was consistently 41% : 59%. Olly pushed up to 45% during the show (for about 5 minutes), but it did not last long. Olly hit his peak of 14 TPS (Tweets Per Second) when Robbie Williams appeared on video to support him. Joe hit his peak of 22 TPS when a half-sobbing Cheryl Cole made her supportive comments after Joe sang his last song.

So I tried to find out for myself, and the result shows that Joe got more tweets. So why did Twitter fail to list him in Trends? Here is my current guess.

  • Joe is a very common name (Average Joe, GI Joe, and Joe Jonas of the Jonas Brothers, who kept getting caught by my search in an earlier version of the trial), so Twitter filtered it out as noise.

The above is the screen capture taken after Joe won XFactor. His name is still not in the trends.

Under the hood (a bit more technical detail)

I was looking for interesting things I could do in real time. Unfortunately, most Web APIs are not real time yet. They are still stuck in the old paradigm of the request/response cycle, and many of them have usage limits, so I cannot keep hitting an external web server. There is one exception: Twitter.

When it comes to “real time”, nobody puts Twitter in a corner.

I knew about Twitter’s Streaming API. When Twitter announced it back in September, I totally dismissed it, thinking that it was only useful if you were either writing a desktop app or storing the streamed data somewhere for later analysis.

Here is a brief summary from the website.

To connect to the Streaming API, form a HTTP request and consume the resulting stream. Our servers will hold the connection open indefinitely, barring server-side error, excessive client-side lag, network hiccups or duplicate logins.
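“Consume the resulting stream” means that data arrives in arbitrary chunks over one long-lived connection, so the client has to reassemble complete messages itself. Here is a minimal sketch of that part, assuming the stream delivers one JSON object per line and may send blank keep-alive lines. `createLineParser` is a hypothetical name, and I have left out the HTTP request and authentication entirely rather than guess at them.

```javascript
// Hypothetical helper: turns arbitrary text chunks from a long-lived
// connection into parsed JSON messages, one per complete line.
// Assumes newline-delimited JSON; blank lines are treated as keep-alives.
function createLineParser(onMessage) {
  let buffer = '';

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last piece may be an incomplete line, so keep it for later.
    buffer = lines.pop();
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === '') continue;
      onMessage(JSON.parse(trimmed));
    }
  };
}
```

The returned `feed` function would be called from the `data` event of whatever HTTP response object holds the open connection.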

To capture the stream, I could have used node.js, but I was in such a hurry that I used my most trusted tool, Ruby.

I used a Ruby gem called TweetStream. RubyInside has a nice article about it, so I won’t go into detail. If you are really interested in how I captured the stream, here is the code.

How did I put the result onto my browser? It’s easy. I appended the result to a log file and let node.js tail the log file.

Redirecting output to log file

my_program.rb >> xfactor-final.json

Start up node.js with the log file as an argument.

node server.js ./xfactor-final.json

In fact, I used node.js but did not write a single line of node.js code, as I already had the “tail.js” code which reads the log file.

Tail node.js example

The source here is bundled in my previous example code.

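var sys = require('sys');

// The log file to follow is passed as the first command line argument.
var filename = process.ARGV[2];
if (!filename)
  throw new Error("Usage: node server.js filename");

// One "tail -f" process is shared by all connected clients.
var child_process = process.createChildProcess("tail", ["-f", filename]);

exports.handleData = function(connection, data) {
  // Forward each chunk of tail output to this client as a WebSocket frame.
  var output = function (output_data) {
    connection.send('\u0000' + output_data + '\uffff');
  };

  // Stop forwarding once the client disconnects.
  connection.addListener('eof', function(data) {
    child_process.removeListener("output", output);
  });

  child_process.addListener("output", output);
};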
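The snippet above uses the node.js API as it was when I wrote it. For readers on a recent version of Node.js, here is a rough equivalent using `child_process.spawn`. Treat it as a sketch under assumptions: `frame`, `startTail` and `handleConnection` are names I made up for illustration, and `connection` is assumed to be any object with a `send()` method that emits a `close` event, not a specific WebSocket library.

```javascript
const { spawn } = require('child_process');

// Wrap a chunk of text in the same start/end markers as the snippet above.
function frame(data) {
  return '\u0000' + data + '\uffff';
}

// Start one "tail -f" process for the given file and return it.
// The caller reads from tail.stdout and can stop it with tail.kill().
function startTail(filename) {
  const tail = spawn('tail', ['-f', filename]);
  tail.stdout.setEncoding('utf8');
  return tail;
}

// Hypothetical per-client handler: forward tail output to the client
// until it disconnects. `connection` is an assumed interface.
function handleConnection(connection, tail) {
  const forward = (data) => connection.send(frame(data));
  tail.stdout.on('data', forward);
  connection.on('close', () => {
    tail.stdout.removeListener('data', forward);
  });
}
```

As before, a single `tail` process is shared by every client, and each client simply adds and removes its own listener.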

I may publish the code once I replace my Ruby logic with node.js. However, it’s worth saying that you do not need to do everything in one framework. You can just pick the best tool for each job and glue them together. That’s the good old tradition of Unix programming.

Another interesting thing worth noting is that the Twitter real time feed did NOT disconnect during the entire show (over 1 hour, generating more than 5MB of text). That’s pretty impressive.