July 25, 2008

The finishing of a PhD is more of a whimper than a bang. It has been seven months since I handed in my thesis, and despite having had only the most minor of revisions (total time approximately 4hrs), I have only just received the certificate for my masterpiece.

Whilst there are often complaints about the inability of government to work as effectively as ‘the marketplace’, we should all be grateful that academia is not in charge of the country; nothing would happen for years on end.

As many weeks have also passed since I sent my thesis to the University’s electronic repository, and it still hasn’t appeared online, I have decided to put it online myself.

Title: Web Manifestations of Knowledge-based Innovation Systems

Abstract: Innovation is widely recognised as essential to the modern economy. The term knowledge-based innovation system has been used to refer to innovation systems which recognise the importance of an economy’s knowledge base and the efficient interactions between important actors from the different sectors of society. Such interactions are thought to enable greater innovation by the system as a whole. Whilst it may not be possible to fully understand all the complex relationships involved within knowledge-based innovation systems, within the field of informetrics bibliometric methodologies have emerged that allow us to analyse some of the relationships that contribute to the innovation process. However, due to the limitations in traditional bibliometric sources it is important to investigate new potential sources of information. The web is one such source. This thesis documents an investigation into the potential of the web to provide information about knowledge-based innovation systems in the United Kingdom.

Within this thesis the link analysis methodologies that have previously been successfully applied to investigations of the academic community (Thelwall, 2004a) are applied to organisations from different sections of society to determine whether link analysis of the web can provide a new source of information about knowledge-based innovation systems in the UK. This study makes the case that data may be collected ethically to provide information about the interconnections between web sites of various different sizes and from within different sectors of society, that there are significant differences in the linking practices of web sites within different sectors, and that reciprocal links provide a better indication of collaboration than uni-directional web links. Most importantly the study shows that the web provides new information about the relationships between organisations, rather than just a repetition of the same information from an alternative source. Whilst the study has shown that there is a lot of potential for the web as a source of information on knowledge-based innovation systems, the same richness that makes it such a potentially useful source makes applications of large scale studies very labour intensive.

Obviously the above abstract will have all but the greatest dullard champing at the bit, and I have therefore made the full thesis available in both PDF and Word Document formats.

April 25, 2008

One of the problems with the web is that it is just too damned big: just as you think you are up to date with everything in one area, you suddenly realise that there is a whole other area that has totally passed you by. For me that area is Open Data: the practice of making data freely available to everyone. Whilst I had heard a few rumblings, I didn’t really appreciate how much was going on, or some of the tools that were available, until reading an article in the last issue of Online Magazine. Webometricians create massive amounts of data, and whilst we know we should do more, we generally use the data we gather as the subject of academic papers, or blog posts, then it sits on our hard drives until we forget where it was from and what it represents (personally I have gigabytes worth of data in text files that is now totally meaningless to me).

In future I will definitely make a concerted effort to try and make data available on Open Data sites (whether people like it or not). Not only due to the movement’s worthy ethos, but for the selfish reasons of a useful repository and the benefits of some useful tools. Of the many open data sites my first experimentation has been with IBM’s Many Eyes (http://services.alphaworks.ibm.com/manyeyes/home), which, whilst suffering from a few bugs, has some great visualisation applications, including network diagrams:

This particular network comes from my ever-so-successful PhD thesis. It shows the interlinking between the web sites of 64 members of the Association of the British Pharmaceutical Industry, as seen through the Microsoft Live Search API (in the glory days of access to both the linkdomain and linkfromdomain operators). Obviously not particularly awe-inspiring here, but earth-shattering in the context of 130 other pages.
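For anyone wanting to poke at this kind of interlinking data themselves, the following is a minimal Python sketch of separating reciprocal links from uni-directional ones in a list of site-to-site links. It is an illustration only, not code taken from the thesis: the function name `split_links` is hypothetical, the domain names are made up, and it assumes the data is a simple list of (source site, target site) pairs.

```python
def split_links(edges):
    """Split directed site-to-site links into reciprocal pairs and one-way links.

    `edges` is an iterable of (source_site, target_site) tuples.
    Self-links and duplicate links are ignored.
    """
    # De-duplicate and drop links from a site to itself.
    edge_set = {(src, dst) for src, dst in edges if src != dst}

    reciprocal = set()  # unordered pairs of sites that link to each other
    one_way = set()     # (source, target) links with no link back

    for src, dst in edge_set:
        if (dst, src) in edge_set:
            # frozenset means A<->B is stored once, not as both A->B and B->A.
            reciprocal.add(frozenset((src, dst)))
        else:
            one_way.add((src, dst))
    return reciprocal, one_way


# Hypothetical example data: these are not real ABPI member sites.
edges = [
    ("alpha-pharma.example", "beta-pharma.example"),
    ("beta-pharma.example", "alpha-pharma.example"),
    ("gamma-pharma.example", "alpha-pharma.example"),
    ("gamma-pharma.example", "gamma-pharma.example"),  # self-link, ignored
]

reciprocal, one_way = split_links(edges)
print(len(reciprocal), "reciprocal pair(s)")
print(len(one_way), "one-way link(s)")
```

Storing each reciprocal pair as an unordered set is a design choice that keeps the count of mutually linking sites from being doubled.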

Additional open data sites include Data360, Swivel, Freebase, and many more. Whilst I’m sure that different people will find different sites more appropriate to their needs, the main thing is that we (especially academics) start getting the data out there…and more than the off-the-cuff 95 lines I uploaded for the above diagram.

March 13, 2008

When it comes to boring jobs I like to think I have had some of the worst: taking the shells off of hard boiled eggs, taking the green bits off of tomatoes, and, most recently, classifying web links. Yes, I can classify the links at home with a constant supply of coffee and the music of my choice, but it is still one of the most boring jobs. The reason: web pages come in every imaginable form, mostly with no discernible purpose, with links placed just because the web owner can. Classifying the web is like herding ADHD cats.

The good and interesting sites that we visit every day are surrounded by a web of crap that we only usually trip across if we are unlucky. These are not necessarily offensive sites, just sites that are absolute rubbish: spam, half-formed, badly written, orphaned. Classifying the web means that we have to wallow in this web of crap. It’s not like classifying a library of books, but rather like classifying a whole world of which 90% is the council rubbish tip.

March 10, 2008

Thanks to a single link on the BBC’s delicious roll on Saturday night, yesterday saw Webometric Thoughts get its highest number of hits ever. Whilst for many sites 121 absolute unique visitors in a day (according to Google Analytics) wouldn’t be worthy of note, the webometric blogging community have fairly low aspirations.

What is interesting, from the perspective of a Google Analytics junkie, is the difference between the amount of traffic this link drove in comparison to a similar link on the BBC’s delicious roll on the 16th January. Whilst the January link only drove 17 unique users to my site, Saturday’s link drove 102 users over a three-day period!

Was the extra traffic all due to the extra time the link was visible on the BBC? It was visible a lot longer, but weekend traffic is often slower. Or was it the topic of the posts? The first was about ISPs, whilst the second was about the iPhone. It seems equally likely that the difference in the traffic is due to the link’s anchor text. Whereas the first text referred to ‘David Stuart research fellow’, the second link merely referenced the blog ‘Webometric Thoughts’ (AC seems to have done much more digging than NR).

December 13, 2007

Whilst I hate search engine optimisation, that doesn’t mean search engine optimisers don’t occasionally come out with some useful tools. Search Engine Roundtable have just brought to my attention a Firefox PlugIn by Joost de Valk which provides a PageRank and the anchor text for each of the inlinks found through Google Webmaster Tools, Yahoo Site Explorer, or Microsoft Webmaster.