Karen Spärck Jones

12:01AM BST 12 Apr 2007

Karen Spärck Jones, who died on April 4 aged 71, was Professor of Computers and Information at Cambridge University and, from 2000 to 2002, a Vice-President of the British Academy; she spent more than half a century working on information retrieval (IR) and natural language processing (NLP), fields in which she influenced a generation of computing scientists.

She was described by a fellow don at Wolfson as "one of the most cerebral people I ever met", someone for whom the word "bluestocking" might have been coined. Certainly she was one of the most distinguished women to work in what was - and to a large extent remains - a highly masculine environment, and received numerous awards; notably, in 2004, the Association for Computational Linguistics (ACL) Lifetime Achievement Award, and in 2007 the British Computer Society's Lovelace Medal and the Association for Computing Machinery/AAAI Allen Newell Award.

Karen Ida Boalth Spärck Jones was born on August 26 1935, the daughter of A Owen Jones and Ida Spärck. She was educated at Girton where, after her undergraduate degree, she completed a doctorate which was published as Synonymy and Semantic Classification, a work later recognised as having been well in advance of its time in its use of both statistical and symbolic techniques in NLP.

Her research began in the late 1950s at the Cambridge Language Research Unit, to which she had been recruited by Margaret Masterman, despite the fact that her only qualification for research into language and information processing (LIP) was a year reading Philosophy - "though this was a good qualification in fact," Karen Spärck Jones later pointed out. There she also met Roger Needham, whom she married in 1958. He was later to become Professor of Computer Systems at Cambridge, and set up Microsoft's first research laboratory outside America.

The three of them created an immediate impact with a paper on the analogy between mechanical translation and library retrieval. Presented at the International Conference on Scientific Information at Washington DC in 1958, it has been widely cited ever since, and regarded as highly prescient.

Karen Spärck Jones's work at the Unit focused on the use of thesauri for LIP, a job which involved translating to punched cards a series of words and their near-synonyms, and then attempting to devise more sophisticated ways of distinguishing ambiguous terms. One of her examples - "The farmer cultivates the field" - showed that though the word "field" has a range of meanings from "land" to "subject", adding a general underlying concept such as "Agriculture", which applies to "farmer", "cultivate" and "field", would select "land" as the intended meaning.
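The "farmer" example can be sketched in a few lines of Python. This is only an illustration of the idea described above, not a reconstruction of the Unit's punched-card procedure: the toy thesaurus, its sense labels and head names, and the function name are all invented for the purpose.

```python
from collections import Counter

# Hypothetical toy thesaurus: each word maps its senses to a set of
# general headings ("heads"). The entries are invented for illustration.
THESAURUS = {
    "farmer": {"farmer": {"Agriculture", "Person"}},
    "cultivates": {"grow": {"Agriculture"}, "refine": {"Education"}},
    "field": {
        "land": {"Agriculture", "Place"},
        "subject": {"Knowledge", "Education"},
    },
}


def disambiguate(target, context_words, thesaurus=THESAURUS):
    """Return the sense of `target` whose heads are best supported by the context."""
    # Count how many context words carry each head under any of their senses.
    context_heads = Counter()
    for word in context_words:
        heads = set().union(*thesaurus.get(word, {}).values())
        context_heads.update(heads)

    # Score each candidate sense by the context support for its heads.
    senses = thesaurus[target]
    return max(senses, key=lambda s: sum(context_heads[h] for h in senses[s]))


chosen = disambiguate("field", ["farmer", "cultivates"])
print(chosen)
```

In this toy data "Agriculture" is shared by both context words, so the "land" sense of "field" outscores "subject".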

Existing thesauri, such as Roget's, proved inadequate for the task, and the obvious solution - at least to Karen Spärck Jones - was to construct a better one, ideally automatically, using text distribution data for words and applying statistical classification methods to the data. The absence of any existing corpora on which to base her method (nowadays, the entire web would be available as a corpus) did not deter her one bit, and she used dictionary definitions, which emphasise synonymy, then refined them by drawing on Needham's theory of classification, which grouped together "clumps" in order to find larger classes of words with similar behaviours.

In the 1960s she began to concentrate on IR, and to develop inverse document frequency (IDF) term weighting, which gives a search term more weight the fewer documents it appears in. Her paper on it, published in the Journal of Documentation in 1972, and the responses of her colleague Stephen Robertson, provided much of the initial impetus for a technique which is now applied in web search engines, and has also filtered into areas of NLP. Her subsequent collaboration with Robertson was an attempt to establish the value of relevance weighting for terms, which turned out to be the basis for a highly successful model for IR.
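For readers who want to see the idea concretely, here is a minimal sketch of IDF weighting in Python. It assumes the common textbook form log(N/n), where N is the number of documents and n the number containing the term; that is an assumption made here for simplicity and not necessarily the exact formula of the 1972 paper. The three sample documents and the function name are likewise invented.

```python
import math


def idf_weights(documents):
    """Return {term: log(N / n_t)} for every term in the collection.

    N is the number of documents and n_t the number of documents that
    contain the term at least once (textbook form, assumed here).
    """
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # A set ensures each term is counted once per document.
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Invented sample collection.
docs = [
    "the farmer cultivates the field",
    "the field of information retrieval",
    "the library holds the thesaurus",
]
weights = idf_weights(docs)

for term in ("the", "field", "farmer"):
    print(term, round(weights[term], 3))
```

A word such as "the", which occurs in every document, receives a weight of zero, while "farmer", found in only one, receives the highest weight: rarer terms are treated as better discriminators between documents.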

In 1965 she had become a research fellow at Newnham, a position she held for three years before becoming a Royal Society Research Fellow. In 1971 she published Automatic Keyword Classification for Information Retrieval, which was followed by Linguistics and Information Science (with Martin Kay, 1973). That year she became Librarian of Darwin College, of which she was an Official Fellow from 1968 until 1980, and the next year was appointed a senior research associate, then (1983-88) GEC Fellow and (1988-94) Assistant Director of Research. She was Reader in Computers and Information at the Computer Laboratory from 1994, and became Professor in 1999.

Karen Spärck Jones was instrumental in establishing the Intelligent Knowledge Based Systems research area in the UK Alvey programme, which funded hundreds of projects and provided a huge boost to AI and language work during the 1980s.

Her more recent work had been on document retrieval, including speech applications, database query, user and agent modelling, summarising, and information and language system evaluation, as well as on projects in automatic summarising, belief revision for information retrieval, video mail retrieval, and multimedia document retrieval, the last two in collaboration with the Engineering department. As an influential figure on evaluation programmes, Karen Spärck Jones was also involved in setting the standards for a large proportion of the work in NLP.

Karen Spärck Jones was president of the ACL in 1994, and spoke at the first Grace Hopper Conference. She taught the MPhil in Computer Speech and Language Processing at Cambridge for many years and supervised numerous PhD students, over a remarkably varied range of topics within NLP and IR. Besides papers in scholarly journals, her other publications included editing or co-editing Information Retrieval Experiment (1981); Automatic Natural Language Parsing (1983); Readings in Natural Language Processing (1986); Evaluating Natural Language Processing Systems (1996); Readings in Information Retrieval (1997); and Computer Systems: theory, technology and applications (2004). Until her final illness, she never really retired from her work, though she had nominally been Emeritus Professor at the Computer Laboratory since 2002.

Karen Spärck Jones thought it very important to get more women into computing. "My slogan is: 'Computing is too important to be left to men'," she said. "I think women bring a different perspective to computing; they are more thoughtful and less inclined to go straight for technical fixes. My belief is that, intellectually, computer science is fascinating - you're trying to make things that don't exist."

She and Needham built their own house together at Coton, a village two miles outside Cambridge, working on site in the mornings and returning to their respective theses in the afternoons and evenings. They lived happily in the modest wooden building for many years, until the noise from the M11 finally drove them out.

Karen Spärck Jones had nothing in the way of light conversation, but instead ploughed straight into whatever academic subject was under discussion, as if responding to a seminar paper.

She could occasionally tear herself away from computing and shared with her husband an interest in sailing; they bought their first boat in 1961 and later sailed an 1872-vintage Itchen Ferry Cutter.

But her chief happiness in her marriage, as she declared in her speech on the occasion of receiving the ACL's Lifetime Achievement Award, was that: "I could always talk to him about my research, and he always encouraged me."