Archive for the ‘Uncategorized’ Category

Perish the thought that I should say something controversial but, and there’s no getting away from this, I think that desktop applications (aka rich clients) still have a place in the brave new world of software services. Over the years, there have been many arguments and disagreements, but the most frequent (work-based) one has been over the need to run absolutely everything on a web server and perform all day-to-day computing tasks in a browser. Granted, I am presenting the extreme case, but the argument always seems to revolve around the fact that if e-mail etc. can be done over the web, then so can everything else. Now I am as big a believer in the Web 2.0 hype as anybody, but I really don’t see a world where the only application running on my local machine is a copy of Firefox, with all my programs and data hosted elsewhere.

There are many reasons why I don’t subscribe to the view that everything can be performed in a browser, but the one that bothers me the most is that the main argument in favour revolves around ease of software management. This argument is almost always put forward by people with a systems administration / architectural purity focus and never by the people who actually do the work. I don’t agree that ease of support means that users (in our case, scientists) should have to make do with a sub-standard experience. Yes, AJAX has made a difference and yes, Flash / Java / Silverlight make for a reasonable user experience on the web, but the fact remains that desktop applications currently provide the most responsive interactive experience. I’m not arguing against web applications at all, I just think that at the moment there is a place for both. Consider, for example, Google Earth:

Google Earth is a truly great technology. Granted, there are other “World Browsers” around (NASA WorldWind was the first time I saw this, and obviously there’s the Microsoft alternative), but Google brought the tool to everybody’s attention. Google are perhaps the poster child for the Web 2.0, do-everything-in-a-browser philosophy, but the fact remains that the most responsive, visually appealing way to access the geographical data provided is the desktop Google Earth client. This is a fantastic piece of software that makes use of the PC’s 3D card, the CPU and disk space to provide the richest possible experience. It is my first port of call when I want to fly over landmarks in 3D. I wouldn’t use it on my phone, though (there’s another downloadable client for that), nor would I use it to plan a quick route from A to B (the web site does a better job of that). Clearly, it’s the data that’s key to Google Earth – the means of viewing that data depends on a) where I am and b) what I want to do with it. I like the fact that I am not forced to only use the web interface when I want something else. I also like the fact that if I want to get a quick map printed out I’m not forced to install and use the client. “Horses for courses” is the phrase that springs to mind.

This discussion reared its head again recently because I was demoing a piece of (desktop) data analysis software I have been writing as a hobby over the last few years (it’s notionally open source and someday I might even release it). This demo was particularly enjoyable, though, because the objections to desktop applications began before I’d even started the software! I don’t want to come across as some sort of Luddite by clinging to obsolete technologies, but the reality at the moment is that having a rich client makes sense to a lot of people and we are not quite ready to fully make the move back to the client-server era 😉

How quickly another month slips by! It’s been a quiet period for the blog, but a busy time otherwise. I am writing this entry during a quiet hour of my Moroccan holiday, sitting on the roof terrace of my hotel with a beer. Shouldn’t I be enjoying my trip? Well, frankly, I’m sick of the bazaars, guided tours and hard-sell carpet shops. The event that prompted me to add a blog entry, however, came yesterday when I was in the Internet cafe. Everybody was on Facebook and I mean everybody (including me). A couple of years ago, you could have replaced this scene with ten or twenty elderly PCs logged onto Hotmail. There is a lot of hype in Web 2.0, but sometimes you realise that it’s not all nonsense. Why has social networking supplanted e-mail for so many people? I’m not going to pretend I know the answer to that, just that there is an interesting change underway in the way people communicate via the Internet. This rant does have a point, though. For the past couple of weeks, my colleagues Paul Watson and David Leahy and I have been batting around various e-science social networking scenarios. I should point out that we are by no means alone in thinking that some aspects of social networking could help push all of the various e-science technologies out to a far wider audience. The myExperiment people have an interesting set of workflow sharing services and, over time, these could evolve into a more general e-Science framework. I think an e-Science platform along the lines of Facebook (with data storage, identity management, accessible APIs etc) could be a great thing. It could be hosted centrally, generously provided with storage and compute power, and could become the platform of choice for provisioning e-Science applications.
It would certainly simplify life for developers if they didn’t have to worry about certificates, data storage and security and could just focus on providing the functionality their users require. As far as software goes, I have been working towards providing a skeleton framework (in .NET) for such a platform. It’s pretty rudimentary, but I have finally managed to put up a simple webpage that users can sign in to using Windows LiveID. Once signed in, there are web services which provide basic information about the User, the groups they are in etc. Like I say, it’s not much at the moment, but I’m happy enough with the way it’s been put together to start adding some more functionality. At the very least, there’s going to be a data storage API and… I won’t pretend that this hasn’t been a steep learning curve after 8 years of Java and J2EE development experience. It’s not that .NET is worse in any significant way (both platforms will happily leave you staring blankly at the screen wanting to kill somebody) – it’s just different. I am getting the hang of things now, though – for what it’s worth, I like the .NET way of doing web services more, although I’m not a huge fan of the online MSDN pages. There is also a lot more information on Google to help troubleshoot J2EE problems than there is for .NET developers, which has slowed me down a bit. Things are, however, made better by the fact that, conceptually at least, both platforms are very similar. The major difference is that J2EE will involve mystifying XML configuration files, whereas .NET will involve mystifying GUIs and Wizards. Maybe, if we succeed in producing a really decent e-Science social networking platform, e-Science developers will be spared some of these low-level hassles.
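To give a rough idea of what those user-information services boil down to, here is a sketch. It is written in Java purely because the concepts map straight across between the two platforms; every name in it is hypothetical rather than part of any real framework: sign-in associates an opaque token with a user, and the service hands back that user’s details and group memberships.

```java
import java.util.*;

// Toy "who am I" service sitting behind token-based sign-in. UserDirectory
// stands in for whatever LiveID-backed store a real framework would use;
// all names here are hypothetical.
class UserDirectory {
    public static class UserInfo {
        public final String name;
        public final Set<String> groups;
        public UserInfo(String name, Set<String> groups) {
            this.name = name;
            this.groups = groups;
        }
    }

    private final Map<String, UserInfo> byToken = new HashMap<>();

    // Called after sign-in: associate an opaque token with a user record.
    public void register(String token, String name, String... groups) {
        byToken.put(token, new UserInfo(name, new HashSet<>(Arrays.asList(groups))));
    }

    // The web service operation: given a token, return the user's details.
    public UserInfo getUserInfo(String token) {
        UserInfo info = byToken.get(token);
        if (info == null) throw new SecurityException("Unknown or expired token");
        return info;
    }
}
```

The useful property of this shape is that callers only ever hold the token; user details and group membership stay behind the service.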

It’s been a fairly busy week or so. I’m currently waiting for a new desktop PC to arrive with a bit more horsepower than my current machine. It’s done fairly sterling work, though, because I’m pretty sure it used to be Savas’ PC when he was still in Newcastle (I work in his old office in the Devonshire building). Until that arrives, coding is a bit of a pain.

I have also managed to secure a couple of days of work a week finishing up the GOLD project demonstrator. This is potentially very useful because we are trying to construct a demonstration which addresses at least some of the requirements of the REACH community. REACH (Registration, Evaluation, Authorisation and restriction of Chemical substances) is an extremely significant piece of EU legislation which will require all chemicals in use (over a certain limit) to be re-registered with the EU along with all the associated test and toxicity reports. To minimise the impact of this, companies are being encouraged to form strategic partnerships so that a set of users of a particular chemical can prepare a joint registration. The GOLD project was created to form just these kinds of partnerships. The software we created was designed to enable chemical companies to perform collaborative development. I am not sure that we will directly make use of the work I am doing at the moment with MS technologies, but it is a good opportunity to get closely involved with some actual collaborative working.

In a similar vein, Paul Watson, David Leahy and I have been having some fascinating discussions on the subject of using social networking technologies to support e-Science research. I know we are not alone in this, because the myExperiment people are also doing some interesting work on workflow sharing. Our thoughts to date, however, have focussed around the Facebook metaphor. What is particularly interesting about Facebook is the support for external applications. Facebook provide a set of REST-based services that application suppliers can make use of to integrate their software into people’s profiles. This is relevant to us because it has a direct parallel with our desire to provide a common e-Science infrastructure. The portal-based collaboration that this type of social networking application provides is potentially a very significant means of “empowering” scientific researchers. Based on my brief look through the Facebook documentation, it seems as though the services Facebook supply are fairly simple and fall into the categories of authentication, data storage, relationship building, application registries etc. Third-party developers are expected to host external applications on their own servers and use these services. Facebook then acts as a common presentation layer that lets users tie together applications of interest. For our e-Science application, one could envisage the following structure:

In this system a set of core e-Science services will be provided that support authentication, group formation etc. Any specific applications that get developed can then make use of these services and display their GUIs on an e-Science portal. We can then provide customisation tools to allow scientists to configure their individual profiles to include applications of interest to them. I think this is going to be the long term aim for the investigations I am doing at the moment – to provide a hosting environment for e-Science applications (and the associated core services) that is configurable by the end user.
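The registry/profile part of that structure could be sketched like this (purely illustrative Java with made-up names; no real API is implied): applications register with the platform, and a user’s profile is simply the subset of registered applications they have chosen to add.

```java
import java.util.*;

// Illustrative sketch of the application-registry idea: applications
// register with the platform, and each user's profile is just the
// subset of registered applications they have added. Names are made up.
class EsciencePortal {
    private final Set<String> registeredApps = new HashSet<>();
    private final Map<String, Set<String>> profiles = new HashMap<>();

    public void registerApplication(String appName) {
        registeredApps.add(appName);
    }

    // A scientist customises their profile by adding a registered application.
    public void addToProfile(String user, String appName) {
        if (!registeredApps.contains(appName))
            throw new IllegalArgumentException("Unknown application: " + appName);
        profiles.computeIfAbsent(user, u -> new HashSet<>()).add(appName);
    }

    public Set<String> profileOf(String user) {
        return profiles.getOrDefault(user, Collections.emptySet());
    }
}
```

The core services (authentication, storage and so on) would sit alongside this; the registry is just the piece that makes the portal configurable by the end user.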

Big jump in blog visitors today. Up from a comfortable relaxed, no pressure 0 to the dizzy heights of 15! Big thanks to Savas for the link.

I had an interesting e-mail exchange recently regarding my initial prototype design. I didn’t make it entirely clear how my various services accessed the data stored within the system. The very reasonable point was made that it would be best to transfer pointers (URIs) to the data and not ship massive quantities of information around the system. This was always my intention; what I didn’t get across was that I am trying to abstract as much of the physical details of the data storage as possible. I am aiming (for want of a better word) for a type of “virtualised” data store, whereby one of my services can perform an operation like:

byte[] dataChunk = store.readChunk(uri);

without caring where that data is stored. Ideally, if I follow the CARMEN CAIRN philosophy, this data will be held extremely close to the computer hosting the service. However, and this is the important bit, it need not be if that is impractical. For example, one of the services might use a licensed piece of software that cannot be moved to be near the data. In this case, the server will just have to accept the hit of shifting the data. The point is that the individual services will not have to be rewritten. If the distribution of services and data changes later on, everything will still work. I guess this is a fairly obvious point, but writing it down has helped me. Maybe I’ll draw another diagram soon.
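A minimal sketch of that abstraction (in Java, since the idea is identical on both platforms; every name in it is mine, not a real API): callers read chunks by URI and never find out whether the bytes are local or remote.

```java
import java.net.URI;
import java.util.*;

// "Virtualised" data store: services read chunks by URI without knowing
// where the bytes physically live. Hypothetical names throughout.
interface DataStore {
    byte[] readChunk(URI uri);
}

// One possible backing: local, in-memory storage. A remote implementation
// could fetch the bytes over HTTP instead; callers would not change.
class LocalDataStore implements DataStore {
    private final Map<URI, byte[]> chunks = new HashMap<>();

    public void put(URI uri, byte[] data) { chunks.put(uri, data); }

    public byte[] readChunk(URI uri) {
        byte[] data = chunks.get(uri);
        if (data == null) throw new NoSuchElementException("No data at " + uri);
        return data;
    }
}
```

Swapping LocalDataStore for a remote implementation behind the same interface is exactly the point: the calling service’s code does not change when the data moves.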

Back to my Visual Studio and .NET studies. I have written my first web service which was, once again, embarrassingly easy. I like the way the .NET and J2EE worlds are converging on using Annotations to describe web service methods in code. It made it much easier to get my head around things. The next step, and I suspect this might be more of a struggle, will be to pass a Windows LiveID token (via a Header?) from a client application to my web service. I’m sure this should be possible, but there doesn’t seem to be a lot of documentation around yet. Plenty on the client side and some good stuff about protecting web pages, but not much about protecting individual Web Services. Maybe I’m missing something, and it will be very similar to securing a web site. More reading. Again.
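For anyone unfamiliar with the annotation style, here is a toy re-creation rather than the real JAX-WS or WCF attributes (which I won’t reproduce from memory): mark the operations with an annotation, then let the framework discover them by reflection, which is in spirit what both platforms do.

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.*;

// Toy version of annotation-driven web services: a home-made @WebMethod
// marks operations, and reflection finds them, much as the real JAX-WS
// and .NET attribute machinery works under the hood. Not a real API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface WebMethod { }

class SpikeService {
    @WebMethod
    public String sortSpikes(String dataUri) { return "sorted:" + dataUri; }

    // Not annotated, so not exposed as an operation.
    public void internalHelper() { }
}

class ServiceInspector {
    // Return the names of all methods marked as web service operations.
    static List<String> exposedOperations(Class<?> serviceClass) {
        List<String> ops = new ArrayList<>();
        for (Method m : serviceClass.getDeclaredMethods())
            if (m.isAnnotationPresent(WebMethod.class)) ops.add(m.getName());
        return ops;
    }
}
```

The appeal is that the service contract lives next to the code it describes, instead of in a separate descriptor file.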

In order to create a relatively relevant prototype, one of the main tasks is to be able to duplicate at least some of what the CARMEN people have done with their CAIRN. Part of the current demonstration uses a workflow (Taverna: http://taverna.sourceforge.net) to co-ordinate a set of web services which are used to sort and classify spikes from a set of spike train data (actually, electrical signals from electrodes). To do this, I need to host a workflow and also deal with the large volumes of data that need to be analysed and stored. After some consideration, I am proposing the architecture shown below:

It’s based around Workflow at the moment, because the CARMEN developers are using Taverna to execute scientific workflows. The use case I had in mind is as follows:

The User authenticates using Windows LiveID and receives a LiveID token.

The User then initiates a workflow. This creates a workflow “Context” that is used to store the intermediate results as the workflow executes. I am not sure yet how this will be implemented – it may be that Windows Workflow Foundation provides some of this for me, or it may need to be specially built. Regardless, this context will form the basis for the data exchange between services as they are invoked. The actual data could be stored in memory, in a temporary directory or in some database hidden behind a Web Service.

The initial working data is taken from the data storage service and loaded into the workflow Context. In the neuroscience case, this will be a set of spike train data. Again, this is an undefined area, as the actual data may not need to be transferred; rather, a URI pointing to the initial data could be stored in the Context.

Services are invoked as required. These services operate on data within the workflow Context (or pointers to data somewhere else) and return the results (or, again, pointers to data held in another location) to the Context for subsequent services to access.

At all times, services are invoked using the LiveID token that the User obtained at the beginning of the process.

The final stage of the workflow will upload the calculation results to the storage Web Service and the unwanted data stored within the execution Context can be deleted.

Now, again, I’m not sure how logical all of this is and how much of this type of functionality is already provided by the Windows Workflow software. Back to reading.
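To pin down what the workflow Context in the steps above might look like, here is a minimal sketch (Java, with entirely hypothetical names; the real thing may well come from Windows Workflow Foundation): named slots holding URIs to data, with the user’s token carried along for every service invocation.

```java
import java.net.URI;
import java.util.*;

// Sketch of the workflow Context from the steps above: intermediate
// results are stored (or pointed to) under named slots, and every
// service call carries the user's sign-in token. Names are hypothetical.
class WorkflowContext {
    private final String liveIdToken;                        // obtained at sign-in (step 1)
    private final Map<String, URI> slots = new HashMap<>();  // data, or pointers to data elsewhere

    WorkflowContext(String liveIdToken) { this.liveIdToken = liveIdToken; }

    void put(String name, URI dataLocation) { slots.put(name, dataLocation); }
    URI get(String name) { return slots.get(name); }
    String token() { return liveIdToken; }
}

interface WorkflowService {
    // Each service reads its inputs from the context and writes results back.
    void invoke(WorkflowContext ctx);
}

// An example step: reads its input pointer from the context and writes
// back a pointer to its (pretend) results for later services to use.
class SpikeSortingStep implements WorkflowService {
    public void invoke(WorkflowContext ctx) {
        URI raw = ctx.get("spikeTrainData");
        ctx.put("sortedSpikes", URI.create(raw + "/sorted"));
    }
}
```

A real enactment engine would add error handling, persistence and scheduling, but the data flow between steps is essentially this.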

Well, that was fun. After a couple of days of EU proposal writing, it’s back to Visual Studio. Installation went smoothly enough and, on first examination, it looks like a pretty impressive piece of software. There’s certainly a lot in it. Today has been quite a steep learning curve, although it is a testament to the quality of Visual Studio that anything works at all after such a short time. I now have an installation of SQL Server and my first ever C# program which uses Hibernate to save objects directly to the database. This means that I can now use code like:

// Open a database session
ISession session = factory.OpenSession();

// Create a new object
User u = new User();
u.Name = "Hugo";

// Save to the database
session.Save(u);
session.Close();

This may not look that great but, take it from me, it is a lot neater than messing around with SQL queries and keeping them up to date as the object properties change. This is a pretty important piece of plumbing, because my next step is going to be to come up with a basic set of objects to represent Users, Data, Experiments, etc. After today, I am fairly confident that the database won’t throw up any show-stopping issues. Probably.

The biggest drawback of using commercial, as opposed to Open Source, software is the bureaucracy associated with getting licenses. It’s hard to get a definitive answer as to whether or not our MSDN subscription covers this sort of use. Some say it does, some say it doesn’t. So, after a day installing all of the Visual Studio express editions, I finally decided to bite the bullet and go with Visual Studio Beta2 and .NET 3. I guess we should be at the cutting edge, so I’ve set off the massive 3GB download. After some searching, it also seems like this might be the best way of getting Windows Workflow Foundation working.

A couple of words about the basic plumbing for what I want to do. I’m going to do things in as standard a way as possible, using all of the Windows Security etc. I figure that this will be the best way of staying compatible with new versions of the development software and libraries. I’m going to use Web Services (as opposed to .NET Remoting) for all of the communication between components, including those running on the same machine. This should let me move stuff around relatively easily as things progress. I am going to use the .NET version of Hibernate (NHibernate) for as much of the database storage as I can. The reason for this is that life is too short for SQL, and I’ve used the Java version and it’s great!

The next step, once all of the development tools are installed and running, will be to make sure I can make use of the libraries that I have got in mind. Then it will be time to look at the CAIRN and pick the features that I’m going to implement.

I always swore I’d never write a blog, but here I am. I guess now they’re considered passé, it’s ok for me to have one. Bit of background: I am the technical director of the North Eastern Regional e-Science centre (http://www.neresc.ac.uk), which is based at Newcastle University. I get involved in all sorts of projects, basically supplying e-Science-based technical advice (hence the job title). The reason for this blog is, primarily, to document my experiences with writing a prototype e-Science research platform using Microsoft tools instead of the more traditional approach of fighting with Open Source. This way is easier, supposedly. The task I have set myself is to recreate, at a basic level, the software being developed by the CARMEN project (http://www.carmen.org.uk). This project aims to help Neuroscientists share data, workflows, results, etc. It is being written as a set of Java Web Services, uses the Storage Request Broker (SRB) for storage, and provides an AJAX-based portal for the end users.

The plan, at the moment, is as follows: SQL Server for data storage, Windows LiveID for security and authentication, Windows Workflow Foundation for the workflow enactment, Silverlight for the GUI and ASP.NET for the Web Services hosting.

First step, therefore: obtain the software, install it and start playing around. More later…!