Speaking at PHPBenelux 2010

December 5th, 2009 rowan No comments

I am pleased, and not a little bit excited, to confirm that I will be giving the talk “Living with Legacy Code” at this year’s PHPBenelux conference. It’s over in Antwerp on the 30th January so will also be a chance to visit Belgium again – something I haven’t done in almost 15 years.

Legacy code is not the sexiest topic out there, but it’s definitely something that a lot of us have to deal with on a 9 ’til 5 basis. One day, I might be in a position to do nothing but play with shiny new technologies and never care about supporting them but until then the rent still needs paying – so it’s worth finding ways to make that as enjoyable as possible.

Of course, I can’t give too much away – you’ll just have to come along and see! You can have a tiny sneak preview via my profile on the site though. Turns out speaking off the cuff in an interview is really quite hard. Still, maybe someone will suggest somewhere to get good waffles.

Perhaps the only downside to the whole thing is that I’m opposite Fabien Potencier giving a talk on “Dependency Injection in PHP 5.2 and 5.3”, which I’d really like to see!

Categories: Talks Tags:

Giving Finger the Twitter

November 29th, 2009 rowan 6 comments

Commands with names that can be interpreted in a juvenile fashion seem to be a long-standing tradition in the various Unix-like operating systems out there. One that seems to have fallen out of use is finger. So, to start the learning (and perhaps a little giggling) open a terminal and let’s try fingering ourselves.

rowan@favabean:~$ finger rowan
Login: rowan Name: Rowan Merewood
Directory: /home/rowan Shell: /bin/bash
On since Sun Nov 29 15:58 (GMT) on tty7 from :0
4 hours 19 minutes idle
On since Sun Nov 29 16:42 (GMT) on pts/0 from :0.0
No mail.
No Plan.

At the end of the output you can see the rather cynical sounding “No Plan.” message. Since we hopefully do have a plan, let’s see what’s going on there. Straight-faced, type in:

man finger

Read through and you should find the excerpt:

~/.plan ~/.project ~/.pgpkey
These files are printed as part of a long-format request. The .plan file may be arbitrarily long.

Let’s experiment with this in the classic way then:

rowan@favabean:~$ echo 'Hello, World!' > ~/.plan
rowan@favabean:~$ finger rowan
Login: rowan Name: Rowan Merewood
Directory: /home/rowan Shell: /bin/bash
On since Sun Nov 29 15:58 (GMT) on tty7 from :0
4 hours 30 minutes idle
On since Sun Nov 29 16:42 (GMT) on pts/0 from :0.0
No mail.
Plan:
Hello, World!

All simple enough, so why is it there? A little digging around will uncover a post on alt.folklore.computers, “Origins of the finger command”. Its author quotes an email from Les Earnest, the author of finger, which includes an explanation of the Plan feature:

Some people asked for the Plan file feature so that they could explain their absence or how they could be reached at odd times, so I added it. I found it interesting that this feature evolved into a forum for social commentary and amusing observations

This sounds familiar… almost as if filling in your .plan is the equivalent of answering the question “What’s happening?”

In which case, let’s link our old-school ’70s protocol up to everyone’s favourite social network. Since we’re keeping the Unix theme, we’ll do it by piping a few commands together.

Fetch your status, substituting your own username for rowan_m:
curl http://twitter.com/users/show.xml?screen_name=rowan_m

Strip out everything except your update:
| grep "<text>"

Strip out the tags:
| sed 's/\s*<text>\(.*\)<\/text>\s*/\1/'

Dump that into your Plan:
> ~/.plan
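
If you want to try the grep and sed steps without hitting the network, here is a minimal sketch that runs them against a hand-written sample. The sample XML is an assumption (the real response carried many more fields) and extract_status is just a name I made up for the filter. I have also swapped \s for [[:space:]], since \s is a GNU sed extension and may not work elsewhere.

```shell
# Hypothetical, cut-down sample of the XML; only the <text> line matters here.
sample='<user>
  <status>
    <text>Piping some commands to other commands</text>
  </status>
</user>'

# The same grep + sed filter as above, wrapped in a function.
# extract_status is an illustrative name, not part of any tool.
extract_status() {
  grep "<text>" | sed 's/[[:space:]]*<text>\(.*\)<\/text>[[:space:]]*/\1/'
}

# Feed the sample through the filter instead of curl.
plan_text=$(printf '%s\n' "$sample" | extract_status)
printf '%s\n' "$plan_text"
```

Swapping the printf for the curl call and redirecting the output gives the same pipeline as in the steps above.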

Let’s string that all together and see what happens:

rowan@favabean:~$ curl http://twitter.com/users/show.xml?screen_name=rowan_m | grep "<text>" | sed 's/\s*<text>\(.*\)<\/text>\s*/\1/' > ~/.plan
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
104 1873 104 1873 0 0 4524 0 --:--:-- --:--:-- --:--:-- 8961
rowan@favabean:~$ finger rowan
Login: rowan Name: Rowan Merewood
Directory: /home/rowan Shell: /bin/bash
On since Sun Nov 29 15:58 (GMT) on tty7 from :0
4 hours 52 minutes idle
On since Sun Nov 29 16:42 (GMT) on pts/0 from :0.0
No mail.
Plan:
Piping some commands to other commands

There we go, my current status in my .plan. Obviously you don’t want to be running that manually every time, so as a last step you can add it into your crontab. For example, to set it up to run every five minutes use crontab -e and add the following:

# m h dom mon dow command
*/5 * * * * curl http://twitter.com/users/show.xml?screen_name=rowan_m | grep "<text>" | sed 's/\s*<text>\(.*\)<\/text>\s*/\1/' > ~/.plan
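
One wrinkle with redirecting straight into ~/.plan is that the shell empties the file before curl has fetched anything, so a failed request leaves you with a blank plan. The following is a rough sketch of one way around that, not a polished script: it writes to a temporary file and only replaces the plan when something was actually extracted. The function name update_plan and its argument order are my own invention.

```shell
# update_plan PLAN_FILE COMMAND [ARGS...]
# Runs COMMAND, filters its output for the <text> line, and replaces
# PLAN_FILE only if the filtered result is non-empty.
# (update_plan is a hypothetical helper name.)
update_plan() {
  plan_file=$1
  shift
  tmp_file=$(mktemp) || return 1

  # Run the fetch command passed in as the remaining arguments.
  "$@" | grep "<text>" \
       | sed 's/[[:space:]]*<text>\(.*\)<\/text>[[:space:]]*/\1/' > "$tmp_file"

  if [ -s "$tmp_file" ]; then
    # Something was extracted, so it is safe to replace the old plan.
    mv "$tmp_file" "$plan_file"
  else
    # Nothing useful came back; keep the old plan and clean up.
    rm -f "$tmp_file"
    return 1
  fi
}

# Example call, e.g. from a script that cron runs:
# update_plan "$HOME/.plan" curl -s 'http://twitter.com/users/show.xml?screen_name=rowan_m'
```

The -s flag just silences curl’s progress meter, which you would otherwise get mailed by cron every five minutes.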

This has been pretty quick and dirty, so feel free to point out optimisations. I’m sure we can get this down to 10 characters of Perl. ;)

Categories: Hacks Tags:

Introducing Tweetist

November 22nd, 2009 rowan 2 comments

I can’t quite remember where I heard this, but someone compared reading their Twitter feed to dipping into a stream. They didn’t attempt to keep up with it and it was essentially luck if they popped in at the right time to pick up something interesting. I tend to be rather free with the “Follow” button, so as a result my feed is fairly busy. Some time ago, I idly wondered if anyone had implemented a Bayesian filter on Twitter, so that you could begin to refine your feed over time – meaning you can still follow hundreds of people but are less likely to miss those 140 characters that will change your life.

After taking the necessary first step and gently rubbing some Google on the affected area, it looks like there’s only one working attempt out there. It’s a C# experiment from Ade Miller making use of the Witty client. There are a few speculative articles out there too, but as far as I could tell his was the only attempt that resulted in something concrete. So, it’s pleasing that I’m not too late to the party and there’s still time to do something different with this idea. That’s where the motivation for this article came from, as I suspect the best way for me to finish coding this is to publicly tell everyone that I will.

Currently, the proof-of-concept is a Zend Framework application on my netbook. It retrieves my feed, rates the tweets and lets me love or hate individual ones. Over the coming week or so, I’d like to get a working alpha up on tweetist.org for people to have a play with. However, to prove it’s not total vapourware let’s have a look under the hood at this first incarnation.

Classifier, FunctionExtractors, Probabilities, Scorer


The class at the top of the library is the Classifier which is able to take a SimpleXMLElement representing a tweet from Zend_Service_Twitter and return a score. Since we’re talking about Bayesian classification that score is a probability predicting whether or not the tweet will be worth reading. For the Classifier to make this decision it needs a number of resources.

The FunctionExtractors specify the features that will be extracted from the tweet. The Probabilities class is responsible for pulling the various read/don’t read probabilities for the features from the dataset we’ve trained so far. Finally, the Scorer takes the probabilities and chooses how we combine them to reach a final rating for the tweet.
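
To give a flavour of what a Scorer might do, here is a small sketch of one common way to combine per-feature probabilities: multiply the “read” probabilities, multiply the “don’t read” ones, and normalise. This is an assumption on my part for illustration, not necessarily the formula Tweetist uses, and it is written in shell with awk rather than PHP to keep with the rest of the page. The name combine_probabilities is made up.

```shell
# combine_probabilities P1 P2 ...
# Each argument is the probability (0..1) that a tweet is worth reading
# given one feature. Prints the combined probability to 3 decimal places.
# (Hypothetical helper; the combination rule is an assumed naive Bayes form.)
combine_probabilities() {
  printf '%s\n' "$@" | awk '
    BEGIN { yes = 1; no = 1 }
    {
      yes *= $1         # running product of "worth reading"
      no  *= (1 - $1)   # running product of "not worth reading"
    }
    END {
      total = yes + no
      if (total == 0) {
        # Contradictory certainties (a 0 and a 1); fall back to neutral.
        printf "%.3f\n", 0.5
      } else {
        printf "%.3f\n", yes / total
      }
    }'
}

combine_probabilities 0.8 0.6 0.3
```

With the three example values the products are 0.144 and 0.056, so the combined score comes out at 0.720.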

It’s the function extractors that are going to be the most important part of making this application work, as they determine which aspects of a tweet are being rated. Currently, it extracts the author, any other mentioned users, any topics and the words in the tweet. There are some obvious improvements to make to the word extractor: dropping common words, normalising, stemming and so on, all of which should reduce the noise. Once these are bedded in, there are more complex features that might be worth exploring, for example: average number of tweets per day from the user, trending topics, shared friends, etc.
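
As a rough illustration of the kind of features described above, here is a sketch that pulls mentions, topics and words out of a tweet’s text. It is shell rather than the PHP in the actual library, the function names are hypothetical, and the patterns are my guess at what counts as a username or hashtag. It relies on grep -o, which GNU and BSD grep both provide.

```shell
# Example tweet text (made up for this sketch).
tweet='@alice loving the new #php release, thanks @bob! #php'

# Mentioned users: an @ followed by letters, digits or underscores.
extract_mentions() {
  grep -o '@[A-Za-z0-9_][A-Za-z0-9_]*' | sort -u
}

# Topics: a # followed by letters, digits or underscores.
extract_topics() {
  grep -o '#[A-Za-z0-9_][A-Za-z0-9_]*' | sort -u
}

# Words: drop mentions and topics, lowercase, split on anything that
# is not a letter, then remove blanks and duplicates.
extract_words() {
  sed 's/[@#][A-Za-z0-9_]*//g' \
    | tr 'A-Z' 'a-z' \
    | tr -cs 'a-z' '\n' \
    | sed '/^$/d' \
    | sort -u
}

mentions=$(printf '%s\n' "$tweet" | extract_mentions)
topics=$(printf '%s\n' "$tweet" | extract_topics)
words=$(printf '%s\n' "$tweet" | extract_words)

printf 'mentions: %s\n' "$(printf '%s' "$mentions" | tr '\n' ' ')"
printf 'topics: %s\n' "$(printf '%s' "$topics" | tr '\n' ' ')"
printf 'words: %s\n' "$(printf '%s' "$words" | tr '\n' ' ')"
```

Dropping common words such as “the” would be a natural next filter to bolt onto extract_words.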

Anyway, hopefully that’s enough to whet your appetite. I’ll try and hold up my end of the bargain and get this public soon. There will also be some follow up posts with the database structure and the maths involved too.

Categories: Tweetist Tags: