Thursday, August 31, 2006

Well, the site is now live, which is a relief. Alas, not all the content decided to join us! Therefore, we're currently re-indexing things. As it stands, all the titles, URLs and publication information are there. However, we're missing a small (but significant) proportion of text from the body of some publications.

Still it should all be there in time for our scheduled launch date of tomorrow!

We set a target of making TRIP free-access at the start of the year. By March/April the decision had been made. We then needed to give 6 months' notice to our distributor (Update-Software) and here we are, three months before the end of the year, with free access and a radical re-design of the site.

In previous blog entries (as part of the 'Countdown to free-access' series) I have highlighted new features. The last post, in this series, is to highlight a simple fact - we will be free-access. There will be no subscription charges to search TRIP. Clinicians (and non-clinicians) will be able to go to www.tripdatabase.com and have free use of the TRIP Database.

Wednesday, August 30, 2006

The biggest change to the new, free TRIP is the search algorithm. For the last 5+ years the TRIP search has been dominated by the distinction between a ‘title’ and ‘title and text’ search. This allowed for great searching. The rationale was that if a document was about asthma it would be mentioned in the title, and the vast majority of searches were on title only. This presents a couple of principal problems.

Firstly, if you do a search on asthma you would generate (even as a title search) a large number of results. This makes the task of identifying relevant material difficult. Why? Because users rarely want information about asthma. They may be interested in asthma and steroids, or asthma and allergies – rarely just asthma. This over-simplified search was highlighted in Professor Paul Glasziou’s evaluation which showed most people just searched for the actual disease. So if you wanted to look at asthma and steroids the best search would be:

1) Title search for asthma
2) Title and text search for steroids
3) Combine the results
4) Click on a results category to see any results

So 4 steps to see any results – in hindsight that seems ludicrous!

Secondly, Google – well it’s a nice problem. But most people who use TRIP will invariably be more familiar with Google. So they’re used to adding any number of terms and letting Google quickly return results, which it does very skilfully! Also Google tends to be searched using multiple search terms. The average number of search terms used per search is gradually increasing over time, surely a reflection that users are becoming more sophisticated/discerning. We’re hoping this increased use of terms will be reflected in the new TRIP.

So, the challenge was to try and mimic the Google search interface (i.e. no ‘title’, ‘title and text’ distinction) yet still return good results. To a large extent we’ve produced a system that works well. We’re not saying it’s perfect and our role, from now, is to continue to improve on the search algorithm. The actual algorithm is based on three main variables:

1) Publication date – more recent articles score more highly than older documents.
2) Publication – each publication (e.g. Cochrane, Bandolier etc.) is given a score based on its rigour and clinical usefulness. This is based on our experience of answering 5,000+ clinical questions – we tend to know which publications answer clinical questions more than others. Our scores reflect this experience.
3) Textual analysis – the main issue is where the search terms appear. If you search for asthma and steroids, a document with both terms in the title gets the highest score, a document with one term in the title gets a lesser score, and a document where the terms only appear in the text scores lowest. Another, lesser, component is term density: a document that mentions asthma 50 times scores more highly than one that mentions it only once.

The above variables are then combined to produce the results.
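To make the idea concrete, here is a minimal sketch of how three such variables could be combined into a single ranking score. It is not the actual TRIP algorithm: every weight, the publication scores, the 10-year date window and all the function names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str
    publication: str
    year: int


# Hypothetical publication scores; the real values are not published here.
PUBLICATION_SCORES = {"Cochrane": 1.0, "Bandolier": 0.8}
DEFAULT_PUBLICATION_SCORE = 0.5


def date_score(year, current_year=2006, window=10):
    """Newer documents score closer to 1.0; anything older than `window` years scores 0."""
    age = max(0, current_year - year)
    return max(0.0, 1.0 - age / window)


def text_score(doc, terms):
    """Score by where the terms appear, plus a small term-density component."""
    terms = [t.lower() for t in terms]
    if not terms:
        return 0.0
    title_words = doc.title.lower().split()
    body_words = doc.text.lower().split()
    in_title = sum(1 for t in terms if t in title_words)
    in_body = sum(1 for t in terms if t in body_words)
    if in_title == len(terms):
        position = 1.0   # every term in the title: highest
    elif in_title > 0:
        position = 0.6   # some terms in the title: lesser
    elif in_body > 0:
        position = 0.3   # terms only in the text: lowest
    else:
        position = 0.0
    mentions = sum(body_words.count(t) for t in terms)
    density = min(1.0, mentions / 50)  # capped so density stays the lesser component
    return 0.8 * position + 0.2 * density


def score(doc, terms):
    """Weighted sum of the three variables (weights are assumptions)."""
    pub = PUBLICATION_SCORES.get(doc.publication, DEFAULT_PUBLICATION_SCORE)
    return 0.2 * date_score(doc.year) + 0.3 * pub + 0.5 * text_score(doc, terms)


def search(docs, terms):
    """Return documents ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(d, terms), reverse=True)
```

A weighted sum is only one way to combine the variables; the point is simply that a single search box can still favour title matches without asking the user to choose between 'title' and 'title and text'.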

Given the nature of the search system, good results for one person might be bad results for another, and in testing we occasionally get results which surprise us. However, on the whole we are getting excellent results; this is both our own experience and the feedback from our external testers. We'll continue to refine and enhance the search - feedback welcome!

Tuesday, August 29, 2006

One of the most frequent question types we get at our various clinical question answering services relates to drug information, e.g. does this drug interact with that drug, what are the implications for pregnant women etc. For this reason we have created the Drug Box. To start with we have used the most frequently prescribed drugs (around 200 of them). Anyone searching for information on the drug will be presented with the usual search results. However, where the sponsored links usually are, will be the Drug Box, see below:

We're very pleased with this new service and new drugs will be added over the next few weeks and months.

Monday, August 28, 2006

Moving to free-access opens up all sorts of opportunities to 'distribute' TRIP. We've got a number of ways:

Incorporation of search box into third-party websites. We'll be supplying HTML for webmasters to incorporate a TRIP searchbox into their own web-pages. We had this feature previously (before we went subscription-based) and this proved very popular.

Web-services. This allows third-party resources to search TRIP via a SOAP interface. The results are returned in an XML format, allowing the third-party resource to seamlessly link the TRIP results into their own application.

Sunday, August 27, 2006

While TRIP was closed to all but subscribers, the subscriptions helped us develop the site. However, if you remove the subscriptions you remove the revenue. The bottom line is that for TRIP to survive and continue to improve it needs revenue, hence the adverts and sponsored links.

Adverts via Google ads. These are positioned in a relatively minor part of the site and are unlikely to generate a significant income. However, as we're testing the business model we thought we should try this method. We are currently being searched around 15-20,000 times per day (likely to increase significantly) and if 0.1% click on adverts per day that should make for an interesting income.
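As a rough back-of-the-envelope check of those figures, the sketch below works out the number of ad clicks per day implied by the quoted search volume and a 0.1% click rate. The function name is made up for illustration, and no income per click is assumed because no figure is given here.

```python
def expected_clicks(searches_per_day, click_rate):
    """Expected ad clicks per day for a given search volume and click rate."""
    return searches_per_day * click_rate


CLICK_RATE = 0.001  # 0.1%, the figure quoted above

for searches in (15_000, 20_000):
    clicks = round(expected_clicks(searches, CLICK_RATE))
    print(f"{searches} searches/day -> about {clicks} clicks/day")
```

So at current volumes the 0.1% figure works out at roughly 15 to 20 clicks a day.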

Sponsored links. This system allows users to purchase a keyword so that when someone searches on, say, hypertension the sponsor's message gets displayed. This is an interesting experiment to see if anyone likes this method. We're hoping it'll be of some interest, especially given our volume of usage.

In an ideal world we'd prefer not to have adverts at all. We're hoping it's a small price to pay for free-access.

Saturday, August 26, 2006

EBM, Medical Images and Patient Information Tabs have been introduced. Previously all the content was mixed into one search interface, and as the number of results categories increased the usability started to suffer.

An analysis showed that there were distinct search types, represented by our three search tabs. These tabs allow for easy movement between these domains.

EBM – our core material.

Medical images – an area we’re not renowned for but this is an excellent feature and is probably the largest, free, searchable collection of medical images on the internet.

Patient information leaflets – The principal aim in TRIP is to support clinicians in answering their clinical questions. However, these same clinicians frequently have a desire to locate patient information leaflets to give to their patients.

Tabbing is seen in most general search engines and the inclusion in TRIP will further enhance usability.

Friday, August 25, 2006

Each day between now and launch I will be highlighting a new feature on the site. With 7 days to go I'll highlight the inclusion of selected peer-reviewed journal articles. These are being taken from 2 routes:

1) The big five general internal medicine journals - NEJM, JAMA, Lancet, BMJ and Annals of Internal Medicine. All articles, published within the last 5 years, will be included. We're harvesting the content automatically from PubMed via the eUtilities.
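For readers curious what such a harvest might look like, here is a minimal sketch that builds an ESearch request for the five journals over a date range. It is not our harvesting code: the function name, the journal abbreviations used as search terms and the choice of parameters are assumptions for illustration, and the sketch only constructs the URL without calling the service.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Journal title abbreviations assumed for the "big five".
BIG_FIVE = ["N Engl J Med", "JAMA", "Lancet", "BMJ", "Ann Intern Med"]


def build_esearch_url(journals, min_year, max_year, retmax=100):
    """Build an ESearch URL for articles from `journals` published in the given years."""
    # Each journal is restricted to the Journal field and the clauses are OR-ed together.
    term = " OR ".join(f'"{journal}"[Journal]' for journal in journals)
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",        # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": str(retmax),     # maximum number of IDs returned per request
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Example: the last five years, counting back from 2006.
url = build_esearch_url(BIG_FIVE, 2001, 2006)
```

The response to a request like this is a list of PubMed IDs, which would then need a second call (for example to EFetch) to retrieve the article details.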

2) BMJ Updates. I never fully appreciated the scale of this project. Basically, they scan over 100 'premier' clinical journals and extract only those of high quality and of clinical relevance and interest (as judged by at least 3 clinicians from around the globe).

TRIP has historically focussed on secondary-review material and that will remain our focus. Our search algorithm will give precedence to secondary material but the system will ensure that highly relevant articles from these two sources are near the top of the results. The two sources will offer highly relevant material for our clinical users.

Wednesday, August 23, 2006

We have now solved the last major issue with the new version of TRIP. Only a few really minor issues are outstanding. Not wanting to tempt fate but getting it all sorted with more than a week to spare - I think that's a record for us!

We're starting to get excited as, with the new tweaks, the search algorithm is behaving wonderfully. We genuinely believe this will be a significant 'new' tool for clinicians seeking answers to their clinical questions.

Separately, the 'official' Google blog has reported on the 'related articles' feature on Google Scholar. I've always liked the related article feature on PubMed and the GS version appears to work as well. Whether it would be as useful on TRIP is another matter. Related article style features have traditionally relied on latent semantic indexing (see the Wikipedia article for more detail). This approach can also be used for all sorts of other useful semantic 'tricks' such as synonym analysis.

I often wondered if the related article feature could be used to keep the Q&A answers up to date. In other words, use the LSI technique to look for new articles in PubMed published since we published our answer. The idea seems plausible but implementing it is a significant issue and quite possibly not worth the potential gain...

Sunday, August 20, 2006

As mentioned in previous posts we get lots of positive comments from users of the service. Currently we are experiencing high volumes in the NLH Q&A Service and have had to, temporarily, stop taking new questions. This has resulted in a few e-mails, and one in particular caught my eye:

"Pity that you are inundated with questions! This is one fo the few worthwhile services we as GPs have, and has a TREMENDOUS IMPACT ON CLINICAL PRACTICE for busy GPs. It wouold be shame to be constrained by staff shortage. I have disseminated the usefulness in many meetings. Perhaps - we should stop informing GPs!!"

Saturday, August 19, 2006

While I was away the web people have been busy beavering away and getting closer to the finished article. A few graphics and some text need dropping in. I was confident enough to send test URLs to two colleagues whose opinion matters a great deal to me. Fortunately, they got back to me with useful constructive criticism and a general positivity about the site.

Another glimpse, this time of the results page with a new feature, the 'drug box', which highlights drug interactions, warnings etc. alongside the 'standard' results.

Friday, August 11, 2006

The new site is nearly, functionally ready (some design needs dropping in), with my web-people working for a few more hours before I head off on holiday. In other words they have a few more hours to correct one problem. Basically I had the algorithm set up pretty well on the test harness. However, now we're using the algorithm on the new site with updated content, something's gone wrong. I'm hoping it's not a big deal.

When they've fixed that I can relax and enjoy the holiday. I'll send the test URL to a number of tame users - mainly a load of trusted Q&A people from ATTRACT and the NLH Q&A Service as well as the TRIP clinical director (who's also a general practitioner and TRIP co-owner). They'll give it some heavy use over the week. Then, on my return, I can pass on any issues to my web-people, hopefully giving them 2 weeks to finalise any changes.

Tuesday, August 08, 2006

Mixed results for this test of the city-wide wi-fi roll out in Mountain View, California - supplied by Google. As the author of this piece on the BBC News website said:

"After several hours grilling Google's Mountain View wi-fi network I realised both the power of the service, but also the present-day limitations and youth of the technology.

While the service was ubiquitous throughout the city, it's not as reliable, as fast, or as easy to use, as my home internet connection or my cell phone. Not yet anyway.

Start-up companies and major manufacturers are working on all these issues. They just take time"

As wi-fi takes off and expands we really must invest in optimising TRIP for searching on laptops, PDAs, mobile phones...

Monday, August 07, 2006

As TRIP moves to free access we are removing our classic 'title' or 'title and text' search distinction. When TRIP initially started it was a title word search only. Then we started spidering content and we added a text search as well. However, our various search analyses have shown this method to be problematic. For instance if you wanted to search for 'asthma and steroids' you would need to search for 'asthma' as a 'title' word, 'steroids' as a 'title and text' word and then combine them. This equates to three steps (two searches and then a combined search). As Dean highlighted in his recent blog entry - searchers are using more search terms per query and expressing more complex information needs.

Google, the main general search tool, allows users to add multiple terms and normally returns very good result sets. Google is setting the standard for how users search.

So what's the coincidence?

The NLH have announced a new search format - adopting our current (but soon to be moved to advanced search) search method - of 'title' and 'title and text' distinction - see screen shot. I can't help feeling flattered with them adopting TRIP's search method. However, I feel it's potentially a wrong move...

Wednesday, August 02, 2006

I received the test version of the site on Friday and - no real content to search - the indexing of content had failed! We've still got problems with the indexing but these should be fixed by the end of the day. Other problems include the test server being woefully underpowered to test the site, no advanced search, no content from the big five journals (NEJM, JAMA, Lancet, BMJ, Annals of Internal Medicine). Irrespective of these issues we should still be live for the start of September!

By way of a progress report I've posted the above picture. Missing text and graphics - but no 'title' and 'title and text', separate tabs for 'Medical images' and 'Patient information' and a few other 'bits and bobs'. Roll on September...