Tuesday, November 28, 2006

Glen: "In 12 years, why not an iPod that can carry all scientific literature ever produced?"

Of course, I am borrowing from the recent statement made by Nikesh Arora, Google's VP of European operations at the FT World Communications Conference, where he said "In 12 years, why not an iPod that can carry any video ever produced?"

All video ever produced is a huge amount of content, and most probably (I may be mistaken) is much greater than the body of all scientific, technical and medical literature (books, articles, etc.), or at least all of it from, say, the last 40 years. If you accept this premise, then the personal digital libraries/collections that are becoming very common (Beagrie2005, Borgman2003, Alvaraz-Cavazos2005) will have transmogrified into a world (or at least a Very Big Personal Library) unto their own.

It reminds me a little bit of the stories we heard when the Internet was just becoming part of mainstream society, and we laughed over quaint tales of (newbie) people who said they wanted to "Download that Internet-thing..." (No references to tubes at this juncture in history, however!).

Well, in a certain fashion, this will be coming true. No, not that someone will be able (in 12 years at least) to download the Internet to some iPodesque device. But they will be able to carry around with them the enormous collective content which previously was held in floor-after-floor of library stacks (perhaps across multiple libraries), then in storage-unit after storage-unit of hot, expensive, energy-guzzling online storage.

How should/will publishers react? Will this force more DRM or more Open Access? What will be the role of libraries, if most of their content will be available on these small devices? How will search/browse interfaces and technologies evolve to meet these new demands and situations? What new business models/opportunities will evolve from this environment? What will be the impact on learning and higher education?

And, what will librarians be doing and with what will libraries be filled?

Friday, October 27, 2006

Extensible Text Framework (XTF): FLOSS platform for access to digital content

XTF is the California Digital Library's amazing access platform for digital content. It is based on Lucene, a tool that is well known as a scalable and stable full-text engine. But XTF is more than Lucene: it is a full end-to-end system, offering über-configurable indexing, querying and display. It is Java-based, has a completely XSLT-driven presentation layer, is extensible to things like Shibboleth, and has some very nice additional features such as an OAI-PMH provider and SRU. From what I can tell it does not have an SOA architecture, but it offers a high degree of modularity which could easily be wrapped in Web services, etc.

Wednesday, October 25, 2006

Google not cashing-in on Amazon linking?

In my ever-vigilant interest in making sure that Google has covered all the funding streams it can ;-) , it seems to me that it is missing an important one: whenever I search Google and there is a link to a book on Amazon, the URL does not seem to have an Amazon Associates ID. Why isn't Google an Amazon Associates member, cashing-in on the click-throughs to Amazon, getting a % of the sales from people it directs to Amazon? They are likely the top forwarder to Amazon and it shouldn't be too hard to insert their Amazon Associates ID etc. into their Amazon-bound URLs...
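To show how small the change to an Amazon-bound URL would be, here is a minimal sketch. It assumes the Associates ID travels in a `tag` query parameter; the function name, the ID `example-20` and the product URL are all made up for illustration, and this is not a description of how Google or Amazon actually build their links.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse


def add_associate_tag(url, associate_id):
    """Return url with a `tag` query parameter set to associate_id."""
    parts = urlparse(url)
    # Keep the existing query parameters, dropping any earlier `tag` value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "tag"]
    query.append(("tag", associate_id))
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical product URL and Associates ID.
tagged = add_associate_tag("http://www.amazon.com/dp/0000000000?ie=UTF8", "example-20")
print(tagged)
```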

Saturday, October 21, 2006

Proprietary vs. Open Source development analogy: like training-for-a-race vs. running-a-race

In reading about the new (to me, at least) transactional database engine for MySQL (v >= 5.1), called the PrimeBase XT storage engine (PBXT), I ran across an interview with its creator, Paul McCullagh. It seems that Paul came from the proprietary software development world, and was surprised by the response of the Open Source community to this project, and by the new friends he has found. He felt it was a very different environment from what he was used to. In his words, from the article:

I like to take marathon running as an example. Think of the difference between training for a marathon and running a race. The closed source industry is like training for a marathon. You are basically on your own. The open source community is like running a race. Not because you want to win. Most people don't run a marathon to win, they run to complete. But during the race you experience a comradeship and sense of doing something together that makes running much easier then in training.

Tuesday, October 17, 2006

Tapping the power of text mining

In his closing plenary to the Access 2006 conference in Ottawa, Clifford Lynch listed text mining as one of the exciting areas of activity for the near future, soon (hopefully!) realizing its potential for discovery on large text corpora. In the September 2006 issue of Communications of the ACM, Fan et al. have a good general introduction to this area.

ACM & IEEE team-up for Wiki for Discussing and Promoting Best Practices in Research

The scope is somewhat(!) narrower than the title suggests, focusing on the challenges in running and managing conferences in the areas on which the ACM and IEEE focus. The Wiki includes categories dealing with: acceptance rates (too high & too low), creative ideas (like lightning talks), allowing author responses to reviewer concerns, (technical) competitions, tracking reviews (if a paper is rejected by conference X and, as usually happens, re-submitted to conference Y, then with some organizing & cooperation the two conferences can carry over (share) the reviews), two-phase reviewing, double-blind submissions, and scaling of programme committees using hierarchy and not agglomeration.

Saturday, October 14, 2006

Stan Rueker

I've just learnt about the nora project, which is an amazing visual-based search construction interface, from Stan Rueker, University of Alberta, in his David Binkley Award presentation at Access 2006 in Ottawa today. I can see why he was presented this award, as he is building truly beautiful and functional prototypes...

The original 40 spaces were taken up and the 28 people on the waiting list were eventually added to an additional fest, called the Ad Hoc Fest. The Hack Fest was hosted at Carleton University, and the Ad Hoc Fest was held at the Library and Archives of Canada.

Access 2006: First Day v1.0

While I am here at the Access 2006 conference, I can't help feeling that I am missing out on some great Web 2.0 discussions, knowing that I will not be going out with Michael Stephens et al. to the Thai restaurant at Internet Librarian International in London, as I did last year. Richard Wallis reports in Panlibis that he has the good fortune of doing so this year.

That said, I am sure that Access will not disappoint, as it has always been a great conference for the library techie crowd...

Tuesday, October 03, 2006

XML11: Amazing AJAX Toolkit

XML11 is a very exciting AJAX toolkit inspired by the X11 protocol. It allows Java applications to be rendered on a web browser, but also under Java Swing and Java AWT. In addition (and very wild), as there is an X11 server implemented in Java (WeirdX), you can also have an X11 application working in a web browser!

Seeing xcalc and xeyes rendered on Firefox, via AJAX, WeirdX, AWT and X11 is borderline bizarre. Check out the Google TechTalks video by Arno Puder, it is quite amazing. I would have liked to have seen Firefox running inside of this convoluted set of protocols and environments inside of Firefox.

He also coins a wonderful phrase: "JavaScript is the assembly of the Web...", basically claiming that while JavaScript is fundamental to the Web (or at least AJAX), no sane person wants to use it (like assembler today: it is a "pain" to write in). You would prefer to use a proper high-level programming language like Java, C++, etc. I have to agree...

Logic code can either run on the original platform (X11, Java) or can run on the client via a Java-bytecode-to-XML-to-XSLT-to-JavaScript (wow!!) cross-compiler. This is configurable at the class level, I believe. If something on the browser needs a component on the server, some transparent middleware looks after making this connection...

They are also looking at getting VNC (via a VNC Java client) to work inside of a browser, and looking at something that works with .NET...

Tuesday, September 26, 2006

Yes, the cryptically titled conference is on once more this year, in Ottawa, and it looks like I'll be attending yet again. While you won't have to be exposed to any presentation of mine, it appears that I have been press-ganged into moderating the Friday afternoon and Saturday morning of the conference. And I should also be blogging it, so stay tuned for some exciting stuff!

Wednesday, August 02, 2006

I recently received an email about the new Codex canadiensis from the Library and Archives of Canada et al. A very interesting collection, but when I checked the collection pages (except for the splash page), all I saw for metadata was:

<!-- META START --><!-- META END -->

No DC.anything, not anything at all. Quite surprising, and more than a little disappointing from a National Library-type organization.
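For anyone who wants to run the same check on a page, here is a minimal sketch using only Python's standard `html.parser`. It assumes Dublin Core is embedded the common way, as `<meta>` elements whose `name` starts with `DC.`; the class name, function name and sample pages are invented for illustration and are not taken from the Codex canadiensis site.

```python
from html.parser import HTMLParser


class DublinCoreFinder(HTMLParser):
    """Collect <meta name="DC.*" content="..."> elements from a page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Dublin Core element names are conventionally prefixed with "DC."
        if name.lower().startswith("dc."):
            self.found[name] = attributes.get("content")


def dublin_core_fields(html):
    """Return a dict of the DC.* meta fields found in the HTML text."""
    finder = DublinCoreFinder()
    finder.feed(html)
    return finder.found


# Invented sample pages: one like the pages described above, one with metadata.
empty_page = "<html><head><!-- META START --><!-- META END --></head></html>"
described_page = """<html><head>
<meta name="DC.title" content="Example title">
<meta name="DC.creator" content="Example creator">
</head></html>"""

print(dublin_core_fields(empty_page))
print(dublin_core_fields(described_page))
```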

Of specific interest to the research community in my neck of the woods (Canada) are the statements on the last three slides (#80-82):

Canada is losing about $640 million dollars worth of potential return on its public investment in research every year.

The Canadian Research Councils spend about $1.5 billion dollars yearly, which generate about 50,000 research journal articles. But it is not the number of articles published that reflects the return on Canada’s research investment: A piece of research, if it is worth funding and doing at all, must not only be published, but used, applied and built-upon by other researchers. This is called ‘research impact’ and a measure of it is the number of times an article is cited by other articles (‘citation impact’).

The online-age practice of self-archiving has been shown to increase citation impact by a dramatic 50-250%, but so far only 15% of researchers are doing it.

We will now apply only the most conservative ends of these estimates (50% citation increase from self-archiving at $100 per citation) to Canada’s current annual journal article output (and only for the approximately 50,000 Canadian articles a year indexed by the Institute for Scientific Information, which covers only the top 8000 of the world's 24,000 journals). If we multiply by the 85% of Canada’s annual journal article output that is not yet self-archived (42,500 articles), this translates into an annual loss of $2,125,000 in revenue to Canadian researchers for not having done (or delegated) the few extra keystrokes per article it would have taken to self-archive their final drafts.

But this impact loss translates into a far bigger one for the Canadian public, if we reckon it as the loss of potential returns on its research investment. As a proportion of Canada’s yearly $1.5bn research expenditure (yielding 50,000 articles x 5.9 = 295,000 citations), our conservative estimate would be 50% x 85% x $1.5bn = about $640 million dollars worth of loss in potential research impact (125,375 potential citations lost). And that is without even considering the wider loss in revenue from the loss of potential practical applications and usage of Canadian research findings in Canada and worldwide, nor the still more general loss to the progress of human inquiry.
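The arithmetic in the two paragraphs above can be re-run in a few lines. This is only a sketch that plugs in the figures quoted in the slides; the variable names are mine, and exact fractions are used so that no floating-point rounding creeps in.

```python
from fractions import Fraction

# Figures as quoted in the slides.
articles_per_year = 50_000
share_not_archived = Fraction(85, 100)   # 85% not yet self-archived
citation_gain = Fraction(50, 100)        # conservative 50% citation increase
dollars_per_citation = 100
citations_per_article = Fraction(59, 10) # 5.9
research_spend = 1_500_000_000           # $1.5bn per year

# 85% of 50,000 articles
unarchived_articles = articles_per_year * share_not_archived
# The quoted researcher loss matches 42,500 x 50% x $100
researcher_loss = unarchived_articles * citation_gain * dollars_per_citation
# 50,000 articles x 5.9 citations each
total_citations = articles_per_year * citations_per_article
# 50% x 85% of those citations
citations_lost = total_citations * share_not_archived * citation_gain
# 50% x 85% x $1.5bn, which the slides round to "about $640 million"
public_loss = research_spend * share_not_archived * citation_gain

for label, value in [
    ("articles not self-archived", unarchived_articles),
    ("loss to researchers ($)", researcher_loss),
    ("citations per year", total_citations),
    ("potential citations lost", citations_lost),
    ("loss to the public ($)", public_loss),
]:
    print(f"{label}: {int(value):,}")
```

The last figure comes out at $637.5 million before rounding.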

The solution is obvious, and it is the one the RCUK is proposing: to extend research’s existing universal 'publish or perish' requirement to 'publish and also self-archive your final draft on your institutional website'. Over 90% of journals already endorse author self-archiving.

A recent UK international survey has found that 95% of authors would self-archive – but only if their research funders or their institutions required them to do it (just as they already require them to ‘publish or perish’).

The actual experience of the five institutions that have already adopted such a requirement (CERN, U Southampton, U. Minho, U Zurich, Queensland U. Tech) has shown that over 90% of authors will comply.

The time for Canada to close its own 50%-250% research impact gap is already well overdue. Canada should immediately follow the UK model, adopting the web-age extension of "publish or perish" policy to "publish and self-archive on the web." This tiny and very natural evolutionary step will not only be of enormous benefit to Canada’s researchers, its institutions, its funders, and its funders' funders (i.e., the tax-payers), but it will also be to the collective advantage of worldwide research progress and productivity itself.

Tuesday, June 27, 2006

In June I spoke at the GeoTec conference in Ottawa on the past, present and future of Web-based mapping and the implications for data providers, mainly national and similar-level mapping agencies. The meeting had particular resonance for me as it was also celebrating the 100th anniversary of The Atlas of Canada, where I worked in the early-to-mid 1990s, and where we did some very exciting work in early Web-based mapping. NAISMap grew out of various explorations I made, and with the support of my director at the time, Jean Thie, and the efforts of the National Atlas team, NAIS-on-the-Net was born, with NAISMap playing a central role. Ah, those halcyon days!

Monday, June 12, 2006

I just found out about this conference: The International Conference for Science & Business Information. The 2005 proceedings are online (http://www.infonortics.com/chemical/ch05/05chempro.html) and there are some very interesting presentations dealing with the transformations we are seeing in how scientific research is done and how the scholarly publishing world is also changing.

Monday, June 05, 2006

SOA / Web Services as Service Personnel

Explaining SOA / Web Services to non-technical people is not always easy.

Web services can -- depending on your business and the granularity in which you implement your web services -- be modelled as the service people who previously -- in an earlier non-web age -- performed those services.

This entry is an attempt to make web services more understandable to non-technical people.

Let's start. Let's use a library which has a document delivery service (for which it charges), which in the present web age delivers documents in both analog (paper photocopies) and digital form. We'll start with how one might have done this in the past (I admit this example presents things in their extreme: in the real world one service person performed a number of service roles. But please bear with the example).

You are a researcher. You are looking for the following article, i.e. you want a copy of the following article:

You contact the "Find article" (FA) person by telephone at the library. You give them the article information. They call you back and indicate that they do have the article. They give you a unique identifier for the article that you write down, and give you the telephone number for the "Deliver article" (DA) person.

With this unique identifier for the article that you want, you then telephone the DA person. The DA person takes the article identifier and asks which institution you work for. You reply "The Foobar Institute". The DA person indicates that -- for clients from your institution -- this particular article is not available free-of-charge, and that they cannot give you the article until you make arrangements with the "Pay for article" (PFA) person. You telephone the PFA person, who takes the identifier of the article, tells you that it will cost you $10, and asks how you would like to pay. You give them your credit card number, and they say that they will call you back. They call the credit card company authorization (CCCA) person (external to their organization), and give them all the information about you, the purchased article and the price. They get an authorization number from the CCCA person. They call you back and indicate your transaction was successful, and give you the transaction number.

You then call back the "Deliver article" (DA) person, and give them:

the document identifier

the credit card transaction number

They call you back and indicate that they do have the article and that their records now indicate that you have paid for the article but have not yet been delivered it, and they take your delivery information in order to get you the article. You get a photocopy of the article in the mail.

Example 2: Have someone else do the work for you:

Instead of doing all of the work described above in Example 1, you contact an intermediary service (IS) person, with whom perhaps you already have a relationship (i.e. they have your institutional affiliation information, your payment information, your delivery information, etc.). The role of the IS is to take your information and to do all of the interactions that you might have done with the various primary service persons.

They are your proxy or perhaps act as a broker.

Think of asking your research assistant to track down and get a copy of this article, or -- in a different example -- a travel agent acting for you, doing all of the interactions you might have done, such as booking flights with the airline booking service person (pre-Sabre), reserving rooms with the hotel reservation service person, and renting a vehicle with the vehicle rental service person.

The IS could survive on any one of a number of business models:

They might charge you directly for this service.

They may have a brief paid advertisement that you hear when you first telephone them.

They may receive a fee from the companies whose services they have brokered for you (in which case you would have to be careful, as they may choose the services which pay them the highest fees, as opposed to those delivering you the best rates/quality).

To the Web: When people are not involved (except for the client)

Now, let's move this to the Web and SOA / Web services world.

Instead of a telephone, you are using some kind of web browser, on whatever device.

Instead of service people, your web browser is interacting with a series of web applications which invoke web services. These applications reside at the library, on their web servers.

The first application takes your article information and, using the "Find article" web service, tries to determine if the library has, or can get, this article. It can. A web page is presented to you indicating this, and a link to the "Deliver article" web page is also presented to you. Embedded in this link is the article identifier.

You click on the "Deliver article" link, which presents you with a web page form from which you select the institution to which you belong (or it uses your IP address to discover this, or you type in a userid/password, etc.). When you submit this page, the "Deliver article" web service is invoked and the application returns a page that tells you that you do not have free access to this article, and that you must pay for it, and presents a link to the "Pay for article" page. Embedded in this link are the article identifier and a key to your ID information.

The "Pay for Article" page has a form in which you type in your credit card information. When submitted, this invokes the "Pay for article" web service which:

Uses another web service to authorize the credit card transaction

Records the transaction in a database (or perhaps uses another web service to do this).

A page is now presented indicating that the transaction was successful, with a link back to "Deliver article". Embedded in this link is the ID of the transaction. The "Deliver article" application again calls the "Deliver article" web service which, using the transaction ID, gets the article information, etc., sees that the article has been paid for, and delivers it to the user's browser.
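For the more technically inclined, the flow above can be sketched in code. This is a minimal illustration only: plain Python functions stand in for the web services, the data lives in in-memory dictionaries, and every name, identifier and price here is hypothetical. Like the description itself, it is deliberately naive about security.

```python
import itertools

# Hypothetical in-memory data standing in for the library's systems.
CATALOGUE = {"example article title": "ART-1"}
FREE_ACCESS = {("ART-1", "Some Other Institute")}
PRICE = {"ART-1": 10}
PAID = {}  # transaction id -> article id
_transaction_numbers = itertools.count(1)


def find_article(title):
    """'Find article' service: return an article identifier, or None."""
    return CATALOGUE.get(title.lower())


def authorize_card(card_number, amount):
    """Stand-in for the external credit card authorization service."""
    return bool(card_number) and amount > 0


def pay_for_article(article_id, card_number):
    """'Pay for article' service: authorize, record, return a transaction id."""
    if not authorize_card(card_number, PRICE[article_id]):
        raise ValueError("card not authorized")
    transaction_id = f"TX-{next(_transaction_numbers)}"
    PAID[transaction_id] = article_id
    return transaction_id


def deliver_article(article_id, institution, transaction_id=None):
    """'Deliver article' service: deliver if free or paid, else ask for payment."""
    is_free = (article_id, institution) in FREE_ACCESS
    is_paid = PAID.get(transaction_id) == article_id
    if is_free or is_paid:
        return {"status": "delivered", "article": article_id}
    return {"status": "payment_required", "price": PRICE[article_id]}


def broker(title, institution, card_number):
    """Intermediary (as in Example 2): makes every call on the client's behalf."""
    article_id = find_article(title)
    if article_id is None:
        return {"status": "not_found"}
    result = deliver_article(article_id, institution)
    if result["status"] == "payment_required":
        transaction_id = pay_for_article(article_id, card_number)
        result = deliver_article(article_id, institution, transaction_id)
    return result


# First attempt without paying, then the whole flow through the broker.
first = deliver_article(find_article("Example article title"), "The Foobar Institute")
final = broker("Example article title", "The Foobar Institute", "0000-0000-0000-0000")
print(first)
print(final)
```

Each function plays the part of one service person from the telephone example, and `broker` plays the intermediary who makes all the calls for you.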

------

These two modes -- pre-web using people, and post-web using web services -- are not that different. It is the same business model. Instead of using service people and the telephone (and regular mail) to deliver the article, a web browser and web services (and the Internet) are used.

-----------

Notes

This is a simplified implementation and does some things in a naive fashion, security-wise, to simplify explanation. Things like the credit card transaction number would never be passed around in a web environment in the fashion described above.

Web services do not always map to services which were previously performed by service personnel. But there are many examples where this is the case, and it may be the appropriate way to model and implement your system. Use cases are often (but not always) mapped to specific, distinct real-world services, which are often among the services provided by a service person.