Saturday, July 28, 2007

A little under a year ago, Google advertised three job positions in South Africa. Yesterday it was announced that Novell SA country manager, Stafford Masie, has taken one of those positions. He will be heading operations here from 3 September.

This is a very exciting move for SA. Google has yet to decide whether the office will run from Cape Town or Johannesburg. This is only the beginning, however. What I'm waiting for is for Google to open an engineering office in SA. That would be awesome! Amazon recently opened an engineering office in Cape Town, so there is definitely interest. We have a fair number of talented computer scientists coming out of our universities every year.

There is already a google.co.za domain supporting some of our official languages, which has been around for a good couple of years now. Google Maps data for SA is awful, and improving it is generally one of the first projects when a new engineering office opens. If you don't know what I'm talking about, try a directions search in America (you can even drag the route!).

When I first participated in the ICFP Contest last year I was expecting an AI problem from looking at previous years. I was attracted to the contest because of this. My team had won the Java Challenge (renamed the Parallel Challenge for that year) in the ACM ICPC the previous year, so I thought it would be nice to try out the longer version.

So I got a team together, telling them to expect some AI. None of us knew any functional programming, and although we could see from previous years that it was beneficial, we weren't taking it seriously. We get the problem and boom: "code a VM". OK, where's the AI? We immediately jumped in with C++, and even then we initially had efficiency problems, although those were quickly rectified. Everyone complained afterwards that functional languages lost out there.

Then the mini problems started revealing themselves one by one. I will admit we didn't get very far with them (mainly due to factors out of our control and a lack of interest from some who expected an AI problem). Some of the problems were slightly functional in nature, but I still don't think there was any major benefit in using a functional language.

After the lack of AI in 2006, I was able to gather some people into the team who previously wouldn't participate because of the AI nature of past problems. This year we were far more prepared for a non-AI problem. I also got a larger team together, since 2006 showed the task could be multiple small problems. Once again, none of us were functional programming gurus (although by now I at least knew a bit of Haskell).

We get the problem this year and once again it starts off with a VM (DNA->RNA). With most of our team being C++ fanatics, we quickly got it running and even discovered ropes pretty quickly. From reading other people's comments, the functional languages suffered with the VM. Although it was possible to get an efficient VM, the required sub-linear data structures appear to be fairly uncommon in functional languages. Yes, they're not that common in C++ either, but the functional teams sure seemed to rush over to C++.

Then there were the problems once you'd got past the VM. Our team used Perl, Python and a bit of Bash to glue the scripts together. I don't see how a functional language could have been beneficial in any way. If someone did find a way of using one effectively, I sure would like to see how it did a better job.

The idea of the ICFP Contest is to promote functional programming. Or at least, that used to be one of its goals. Has this emphasis changed? There sure still seem to be many teams whose primary tools are functional languages. Last year, none of the top three teams nominated a functional language as their primary tool. I'm confident the same will occur this year.

I certainly do not mean for this to be an attack on the contest. I still love it. This is just something that has been on my mind making me wonder what the connection to the ICFP is. Is it possible that the functional programming community can learn from the lack of results from functional teams and make the functional languages more powerful? Maybe that's the aim?

Wednesday, July 25, 2007

Remember my post about my home theater? All I was waiting for back then was the TV. Well, it arrived on Friday and was installed on Saturday. Unfortunately the news of its arrival has been swamped by my participation in the ICFP. Nevertheless, it is here!

Up until a week ago we were going for the 42". Well, that all changed when the price dropped by a huge amount. We are now stuck with an even bigger 50"! I tell you, seeing these things in the stores is one thing. Having them in your own home... that's the first time you truly see how beautiful and how enormous these beasts truly are! Here's an attempt to awe you readers out there, although it barely touches the true beauty this thing reveals:

A couple of weeks ago we got a PS3. I was amazed at the graphics on an SD TV... you can imagine my amazement when it was connected to this beast. Currently we're waiting for the real games to come out, so we only have Resistance and MotorStorm. I am anticipating the launch of GT5, which is said to be the first ever full 1080p game.

We also have one of these wonderful devices hooked up. It streams DivX content over the network. I tell you, with an HD panel you can't help but notice the low quality of DivX. That's to be expected, but I didn't realise just how noticeable it would be.

Then finally we have Multichoice's PVR, about which there isn't much to say.

You have to see and hear it for yourself to understand just how amazing the whole setup we have going is. The guy who came to install the TV said the sound was the best he's ever heard, and he even has the same amp! He also loved the Ziova - to the point where he's now trying to get one for himself. :P

The problem of this year's ICFP Contest started off with us receiving Endo's DNA, which was 7MB. The DNA was interpreted as a sequence of patterns, which searched for some DNA, and templates, which described how to manipulate the DNA at that point. Some of the DNA caused RNA to spew out. The RNA was then interpreted as drawing commands to draw an image. Endo's original DNA sequence created this image:
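For those who didn't take part, the execute loop can be pictured as a string-rewriting system: repeatedly match a pattern at the front of the DNA, apply its template, and occasionally spew out an RNA command. The sketch below is a toy with invented rules, not the real Endo semantics (the actual pattern/template encoding is far richer):

```python
def execute(dna, rules, max_steps=100):
    """Toy DNA -> RNA loop: rules are (pattern, template, rna) triples."""
    rna = []
    for _ in range(max_steps):
        for pattern, template, emit in rules:
            if dna.startswith(pattern):
                if emit is not None:
                    rna.append(emit)                 # spew out an RNA command
                dna = template + dna[len(pattern):]  # rewrite the front
                break
        else:
            break  # no pattern matched: halt
    return rna

# Two invented rules: "IC" emits a colour command, "IP" rewrites to "C".
rules = [("IC", "", "black"), ("IP", "C", None)]
print(execute("ICICIP", rules))  # → ['black', 'black']
```

The real interpreter has to do this millions of times over a multi-megabyte string, which is exactly why naive string splicing was too slow and sub-linear structures like ropes mattered.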

Our task was to reverse engineer the DNA and poke away by coming up with a prefix to prepend to Endo's DNA to morph him to stand a better chance of survival on Earth, where he has crashed. We were given a target image to aim for (below) and we were scored by the number of pixels we got correct and the length of our prefix, with a shorter prefix being better.

Being in South Africa, we fitted nicely into the same timezone as the organisers, so we started at noon on Friday. We took about an hour to read through the problem statement to decipher the meaning of the DNA. We then split up the tasks of writing the DNA->RNA and RNA->image converters.

Carl, Alex, Harry and I worked on the DNA converter while Bruce and James did the RNA->image converter, both in C++. The RNA->image converter didn't take too long, but the DNA->RNA converter had efficiency issues. We got a working version by about 17:00, but it was horrendously slow compared to what we needed. Max joined us after work and quickly gave us the idea of ropes. None of us had used them before, though, so we had to read up on them to decide if they would be useful. We went for them in the end, but they caused us major headaches. The rope substr function's length argument defaults to taking only one character from the string, instead of the usual behaviour of running to the end.

After working through all these annoying bugs we finally got it working some time around 20:00 or so. We then used the prefix given to us in the problem text, which ran some self checks. Unfortunately one of them failed and it was back to debugging. The bug appeared to be related to ropes, since our strings version passed all the self checks. It was also odd that it worked with compiler optimisations enabled, but not without them. While Carl worked on debugging, Bruce worked on some OpenGL code to display the image as it was being generated so we could see any hidden messages that were covered by later layers.

With the OpenGL code we discovered a hidden prefix which, when run, gave us a field repair guide, which in turn led us to a catalogue page and a prefix to raise the sun. Julien and Richard spent some time trying to figure out how to use the catalogue prefix to view other pages, but only ended up at invalid pages. When Bruce had a moment of free time we gave it to him and he solved it very quickly: we replaced the number in the prefix with the page number we needed. Page 1337 gave us the catalogue index page. This opened up a whole bunch of new pages which both helped us understand the DNA and gave us new problems to solve.

We did a brute force search on the page number to see if we could find any undocumented pages. Of note were pages 100 and 1024. We searched up to about 20000 pages, but didn't find much else. The steganography hint gave us the number 9546 in the ET page, although we never figured out a use for it. We noticed the Virus Alert page was in Wingdings, but we didn't find a use for the message. The Intergalactic Character Set pointed us to EBCDIC. The Undocumented RNA helped us print a stack trace of function calls.
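The brute force itself was nothing more than a loop over candidate page numbers, keeping the ones that actually rendered. A sketch of the idea, with a stub standing in for the real prefix -> DNA -> RNA -> image pipeline (the page contents below are just labels for illustration):

```python
def render_page(n):
    """Stand-in for the real render pipeline: returns a description for
    page numbers that produce an image, None for invalid pages."""
    known = {100: "undocumented", 1024: "undocumented",
             1337: "catalogue index", 496: "beautiful numbers"}
    return known.get(n)

# Scan the page-number space for anything that renders.
found = [n for n in range(20000) if render_page(n) is not None]
print(found)  # → [100, 496, 1024, 1337]
```

In the contest each render cost real compute time, so the scan was much slower than this stub suggests.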

One of the pages extracted was a Gene Table. It said "Page 1 of 14" at the top. We got the remaining 13 pages by setting the value AAA_geneTablePageNr to the one we wanted. We also later discovered from the ImpDocs that printGeneTable took a boolean parameter that, when set to true, ran integrity checks. This proved valuable later on when we got to fixing functions in various ways. We got some of our team, including the team managers, to transcribe the gene table. From this we wrote scripts to express function calls in a human-readable format. This was the start of our scripting language that generated a prefix.
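Those call scripts amounted to little more than a lookup in the transcribed gene table plus glue that emitted the corresponding call. The function names below are real ones from the contest, but the offsets and the text encoding are invented placeholders - the real output was a DNA prefix, not a string like this:

```python
# Invented (offset, length) values; the real gene table mapped each
# gene name to its position in Endo's DNA.
GENE_TABLE = {
    "printGeneTable": (0x510, 0x2a0),
    "hitMeWithAClueStick": (0x9e00, 0x400),
}

def encode_call(name):
    """Emit a human-readable stand-in for a function-call prefix."""
    offset, length = GENE_TABLE[name]
    return f"CALL {name} @{offset:#x}+{length:#x}"

print(encode_call("printGeneTable"))  # → CALL printGeneTable @0x510+0x2a0
```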

Some of the function calls yielded further help pages and more problems to solve. We noticed that the contest-xxxx pages had some yellow letters (ICFP's). When combined these gave us another prefix. The Encodings page helped us search for strings and polygons in the DNA. Some of these strings were helpful, while most of them we had already seen. The Fuun Security Features page was easily noticed to be encrypted with rot13.
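rot13 is simple enough to undo in one line; Python even ships a codec for it. The ciphertext here is my own example, not text from the contest:

```python
import codecs

# rot13 shifts each letter 13 places along the alphabet; applying it
# twice is the identity, so decoding is just encoding again.
ciphertext = "Shha Frphevgl"
print(codecs.decode(ciphertext, "rot13"))  # → Fuun Security
```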

We extracted the hitMeWithAClueStick function from the DNA. Removing the C's and trying different widths gave us the message "PORTABLE NETWORK GRAPHICS FOLLOWS". We then noticed the data was split into three sections by P's. The second chunk gave us a PNG image, which told us that an audible voice followed and we then interpreted the remaining segment as an MP3, which read out yet another prefix. This gave us the Beautiful Numbers page, which didn't get us anything extra since we had already found page 496 by brute force.
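The trick generalises: strip the filler character, then reflow the remainder at different widths until the message lines up. A small sketch with an invented data string (the real segment was megabytes of DNA bases):

```python
def reflow(data, width, filler="C"):
    """Drop the filler character and break the rest into rows of `width`."""
    s = data.replace(filler, "")
    return [s[i:i + width] for i in range(0, len(s), width)]

data = "PCNCGCFCOCLCLCOCWCS"   # invented example, not the contest data
for w in (3, 4, 5):
    print(w, reflow(data, w))  # at width 5 the rows read "PNGFO" / "LLOWS"
```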

All the above was about sub-problems in the contest. However, none of this helps morph poor Endo into a cow to save him. At least, not directly. Most of the sub-problems yielded clues which in turn helped us find the right DNA to poke into Endo's DNA, call functions and do other interesting things to it. These all helped us morph the source image into the target image. Since we've been asked by the organisers not to reveal our score I think this is as far as I should go.

Hopefully what I have said should be sufficient for those that did not get very far to poke around and at least get some satisfaction from the problem. I can imagine that many of you spent the entire weekend and got nowhere.

Monday, July 23, 2007

The ICFP Contest ended just four hours ago. I'm so excited to tell you that we, the United Coding Team, have found ourselves somewhere in the top 15 out of 869 teams. We suspect that we're reasonably high up within the top 15 as well from looking at positions 16+ on the scoreboard, although we've been asked to keep our score private until the final standings are announced at the ICFP (a conference) during 1-3 October.

I am extremely exhausted right now and it's time to catch up on some much needed sleep. Once I've regained some energy I will post some details of how we tackled the problem. In the meantime, here are a couple of pictures of our team (unfortunately with me being the photographer I'm not in any of them).

Sunday, July 22, 2007

This year's ICFP task is to save the alien Endo who has crash landed on Earth by morphing him to stand a better chance of survival in this unfamiliar environment. We are given his DNA which for the purposes of the problem draws the image on top. Our task is to prepend DNA to get the bottom image. The full problem description can be read here.

On Friday we jumped into the top 20 and have remained there since. Other than that I shall refrain from discussing our results as the contest is still underway for another 25 hours.

Go United Coding Team!

PS: Thanks Bruce, without you we would still be interpreting the meaning of the DNA. :)

Thursday, July 19, 2007

Currently we're sitting on a total team size of 19. I think that's a record. :P We have the 4th ranked TopCoder, 6 IOI participants, 2 IMO participants and 4 ACM world finalists on our team. Not to mention the circus guy whose primary form of transportation is a unicycle and some silly man who likes his hair green.

Unfortunately there's a scheduled power cut on Saturday from 11:00-13:00, although we're not sure if we'll be affected.

One of the contest organisers has a blog in which he's been telling a story about a spam message they received, from which they've decrypted and posted several pictures. These are obviously connected to the contest, since recent postings have mentioned that they're getting close to cracking its secrets, and it looks like they will "crack" it when the contest begins.

If I get a break I might be posting brief updates during the contest. So keep your eyes glued!

Tuesday, July 17, 2007

Yes, there are those rare few who know what they're doing at ICTS. And Adrian Frith is one of those people who knows those particular people and what they're good at.

With Adrian's help, the issues with the LCC were solved last week. The solution ended up being: "Oh, let's just disable all the smart stuff in the smart switch!" The problem I have with that is: what was the point of a new switch then? Apart from the nice upgrade to a Gbit network, that doesn't really help ICTS with their goal of remotely blocking ports. It's fixed though and nothing has gone backwards (thankfully), so I'm happy.

Then, another issue we had with them was with the Olympiad server. This is the server we use for the SACO. Now, the team are supposed to be training for the upcoming IOI and half of their training is done off this server. The 3rd round of the SACO is also approaching and again, they use this server to train on. The issue was that it could only be accessed on campus via the IP address. The DNS wouldn't resolve and the firewall was blocking access from off campus.

This problem started about 3-4 weeks ago. We've been in contact with ICTS for at least two weeks. They kept on closing the call, telling us the problem was resolved. Three of us (myself, Bruce and Carl) all tried desperately to get them to sort this out as it was really slowing other things down. On Friday I tell Adrian about the problem and this morning it is solved! Wow, how it helps knowing the right people at ICTS.

So that's two of three issues solved for now. The last one, which is only getting worse, is affecting everyone. Carl has blogged about this problem, so I won't say much more other than it appears to be related to the issues they're having with multicasting. They've pulled down the network about five times now to get it working and they're doing it again this evening.

Monday, July 16, 2007

For Computer Science honours at UCT we all have to do a big project. The project aims to be the first step towards doing our own research.

I ended up selecting a project on genetic algorithms applied to colour image segmentation. I've since adapted the problem a bit by wanting to study parallel genetic algorithms and more specifically attempting to develop a model that can run on a Grid. My supervisor is Audrey Mbogho and my partner Keri Woods.

Last week I was working on the literature review. It's quite a long and drawn-out task as you have to hunt down relevant papers. There's plenty of literature on image segmentation, to the point where the tricky part was trimming down the length. There's also a fair amount of literature on parallel genetic algorithms. Grid-based PGAs, however - now that's an area that has received very little attention. And the papers I desperately needed I couldn't access. :(

I've just started putting together a page where I've uploaded the documents I've completed so far. Currently it's pretty bare with just my proposal and literature review, although more will be added over time. You can access the page here.

Thursday, July 12, 2007

It's that time of the year again. Time for the ICFP Contest. Last year Carnegie Mellon set an amazing problem where a 1.8MB file ran on virtual machines we had to write and self-extracted another 15MB source file. The new file ran on the same virtual machine and yielded a mini Linux kernel with user accounts you had to hack into to retrieve the problems to solve. Everything - the problems, compilers, evaluators - came out of that tiny 1.8MB file.

This year our team returns to the scene, this time reinforced with some stronger team members. And the hope that our network will be running for more than 50% of the time. Our team is 100% from the University of Cape Town (UCT), South Africa. Two of our members will be joining in from California, but they're still from UCT nonetheless. Our team in alphabetical order to reduce favouritism:

The first thing you will probably notice is that our team is rather large. One of the nice things about this contest is the unlimited team size, and we thought we'd try to take advantage of that this year. It will be interesting to see if the advantage of more brains overcomes the disadvantage of managing a large team over such a short time. To reduce the management tasks the team has to worry about, we have two dedicated managers - Hayley McIntosh and Ian Saunder.

The only things that change this year are that it is organised by Utrecht University and, of course, the problem itself. It runs from Friday 20 to Monday 23 July (72 hours). One of the teams put together a countdown, as well as one for the more geeky.

Tuesday, July 10, 2007

I think everyone I know at UCT has at least one story about a nasty experience with ICTS. I tell you, I have many. One of the reasons is that I co-administrate the IBM Linux Competency Center (LCC). If you tried clicking on that link and it didn't work that either means you aren't on the UCT network (blame ICTS) or the problem I am about to describe has yet to be resolved.

A very brief background on the LCC: it is a lab with a rack of IBM servers ranging from dual core blades through to a server with 8 cores. I took over administering the lab almost exactly a year ago, together with Jason Brownbridge, who has since left UCT; Adrian Frith has taken over his duties. Running on our own subnet, we often have to deal with ICTS, especially since our CS admin, Matthew West, left last month.

During this year ICTS has been gradually "upgrading" the UCT network, claiming the end user will benefit, although all I've heard that will be new is that they will have more control over the network such as being able to disable machines remotely. On Friday 29 June, the PD Hahn building in which the LCC is situated underwent the upgrade. This is when all the troubles began. To give you an idea, when the CS building underwent the upgrade it indirectly caused one of our sys admins to retire.

The first problem was that the IP addresses were all changed. We appeared to resolve that issue pretty quickly, although more on that later. Then they replaced the switch with a nice new Gbit Cisco switch. This is where the real problems started: the blade center could not connect to the switch. Three people from ICTS checked it out on separate occasions - one of them checking it twice - and every time they've told us it's an issue on our end. So, we decided to put in our own switch to be sure. Guess what? It worked!! We're still following this up though, as it would be nice to get the Gbit switch back.

The other problem we're still experiencing started on Monday. All of a sudden, after working on Sunday, none of the nodes could connect to the external network. First one ICTS person told us they had been having issues deploying multicasting services on Friday and that it had been spreading through the PD Hahn building, affecting the various subnets. However, today another ICTS person tells us that he knows of no network issues in PD Hahn. Tell me about miscommunication! He tells us that we're using the incorrect IP address. HOW?

Monday, July 9, 2007

I was watching a video of a tech talk on Google's MapReduce, and there was one slide that had very little to do with MapReduce itself but which I found very interesting. It was a list of 10 observations of the development process at Google:

Devs work out of a ~single source depot; shared infrastructure!

A dev can fix bugs anywhere in the source tree

Building a product takes 3 commands ("get, config, make")

Uniform coding style guidelines across company

Code reviews mandatory for all checkins

Pervasive unit testing, written by devs; high dev/test ratio

Unit tests run nightly, email sent on failure

Powerful tools, shared company-wide.

Rapid project cycles; devs change projects often; 20% time

Peer-driven review process, flat management

I experienced or at least observed every single one of those points. It really sums up the working environment at Google well.

The source tree - mostly all visible, even to interns, and all you can see can be modified. Building couldn't be easier - even the Makefile is auto-generated. Coding style is very strict. They emphasise unit testing a lot, especially writing tests before writing the code. Powerful tools - I couldn't have described it better; MapReduce is one of many examples.

Sunday, July 8, 2007

The location of the 2008 ACM ICPC World Finals is now official. The announcement is on the ICPC site. It will be in Banff Springs, Alberta (wikipedia page). Hopefully they will make it a good year, but my hopes aren't pinned very high since I've heard that Alberta is in "the middle of nowhere". When you have a Canadian saying that he wouldn't want to go there, you know something's not right.

2009 should be a bang though. Currently it appears as though Stockholm, Sweden will be getting it and KTH have some cool things lined up. They were the ones that introduced the new scoreboard this year, so that should give you a hint as to what to expect. It's not official yet, so don't take my word for it. I've spoken to the KTH coach though and they seem very excited about the possibility of hosting the finals.

I'm waiting for the ICPC to come to Cape Town though. If that ever happens I will be very happy. Bidding appears to be becoming a lot more competitive now though, as KTH were the first to put together a bid and that will probably be expected from now on.

Friday, July 6, 2007

As thoroughly as Google products are tested, bugs still creep in. And I'm not talking about security holes or anything along those lines; I mean bugs that you could identify even without looking very hard. While some may be petty bugs which don't interfere with user experience, some can be rather annoying once noticed.

The first bug I found falling under the annoyance category is with the main Google Web Search. I was searching for the C++ >?= operator. If you search for the operator (or click here) you will notice an absence of results. This is due to Google not indexing punctuation, and it happens when searching for any punctuation-only query. However, when you search for any other random query which returns no results, such as this one, an error message is returned saying "Your search - thisisarandomquery - did not match any documents." I also use Google Search History, and the former searches are not saved while the latter are.

The Web Search is out of beta, at least last time I checked. Other products still labelled as beta are expected to have more bugs. Google Talk handles multiple logins to the same account rather poorly, often sending the message to just one of the clients and sometimes the unexpected one. While I was working on Code Search I discovered some bugs from general usage and reported them as well.

There are a couple of other strange behaviours people have pointed me to. Some people could consider them bugs and some not; it's sometimes a thin line between bug and feature. I have reported these behaviours even since leaving. Someone also reported one of the examples on the Code Search home page as confusing, and I got it changed.

So as you can see, Google is far from perfect, far from being bug free. You can help by reporting any bugs or odd behaviours you may discover.

Wednesday, July 4, 2007

This is probably the worst secret I have had to keep. It's been seven months. Seven anxious months! But the time has come. I can finally speak! The NDA holds me back no longer!

A small flashback is required. Over the past December-February I was a software engineer intern working in Google's Zurich offices. The project I spent most of my time on was Google Code Search. You can read up about my experiences here.

The problem is that what I was working on was confidential, as with most work at Google. Since Code Search is a relatively small project, I couldn't go any further than telling people I was/had worked on Code Search. This all changes, however, when my work goes live. Guess what? Have you made the connection yet? Yes!! After seven long months... it has... (drum roll)... gone live!!! :D

Another small flashback will do here. Before I even discovered I was to work on Code Search, I had this lingering question in my head about what is crawled. It was lingering enough to make me wonder, but not enough to make me find an answer. I wondered, however, if they crawled HTML pages. Well, it's out now. That was my main task during my internship - to crawl HTML pages for code embedded within web pages.

I am afraid of leaking information I shouldn't be discussing, so most of this big secret will have to remain that way, possibly forever. You might look at it and think "OMG, how did he do that?", or on the other end you might think "Gosh! I could do that in my sleep!" Now I'm afraid I can't back either end up. But I will tell you that this task taught me some interesting things about programming languages I probably never would have come across before. This happened while having to identify the language of the code and tag it. Tricky bastard that was!
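To give a feel for why language tagging is tricky (without revealing anything real - this toy is nothing like the actual Code Search classifier), even a simple keyword heuristic runs straight into ambiguity:

```python
def guess_language(code):
    """Naive keyword-based guesser; real-world snippets defeat this instantly."""
    if "<?php" in code:
        return "php"
    if "#include" in code:
        return "c++"       # could just as well be C: already ambiguous
    if "def " in code and ":" in code:
        return "python"    # or Ruby, which also uses `def`
    return "unknown"

print(guess_language("#include <vector>"))    # → c++
print(guess_language("def f(x): return x"))   # → python
```

And that's before you consider code embedded in HTML, snippets too short to carry any signal, or languages that deliberately resemble each other.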

It is still very much in its early stages in its live state, so the results are far from perfect. And there are some other things which I cannot comment on. But you can see some of the results for yourself. You can tell that a result is from an html snippet if it isn't from an archive or cvs/svn repository. There is no way to completely distinguish the types of results in a search (mine from the original), but to get an idea you can search for html pages not detected as html or php, which will yield some results.

There is a posting on the Russian Google Blog, which you can read via Google Translate here. It's a nice read from the perspective of the Russian members of Code Search. They point out a nice example: wordexp_t example, which yields a result in html snippets at the top.

UPDATE: The ranking has changed slightly, so although the example above still returns html results, it falls lower down. There is another example, however, which still yields a top html result: nph-refresh lang:perl

I'll leave you to explore the results of about 6-7 weeks of my time spent at Google. It was a really enjoyable experience and I'd like to thank my manager Miguel Garcia and co-worker Pawel Aleksander Fedorynski as well as the other members in other locations.

Monday, July 2, 2007

I refer you back to this post of mine where my luggage was lost on a flight from Frankfurt to Zurich. I took that flight 11 days ago. Finally, after countless calls (with the phone ringing for 10 minutes a common occurrence), this afternoon was the very first time I was told with confidence that they even knew where my luggage was - Johannesburg.

Ironically, in the hands of South Africans the process sped up and I was actually kept informed. Would you believe it, after 11 days the saga is over. I now have my luggage!! I dread to find out what it's been through as I have yet to open it up. I'd like to enjoy this brief moment of happiness.