Web 2.0 and the Future of Competitive Intelligence - Text Analytics, Portfolio Analysis, and the Real-Time Value of Digitised Content

Digitised content, ubiquitous data access and information overload are hallmarks of the Web 1.0 revolution of the 1990s. As information on companies, industries and the competitive landscape became digitised, new media for relevant content also emerged – company websites, FAQs, blogs, online journals, social networks and more. The combination of overwhelming digitised information with these new content types has created both possibilities and challenges for competitive intelligence. While Wall Street has historically been guided by the numbers, less structured content – brand messaging, consumer feedback, online reviews, company reports, patent applications and the like – can be more defining of future performance. At the same time, innovations in text analytics have made robust data mining from unstructured, heterogeneous sources possible.

This presentation gives an overview of how these innovations in text analytics, combined with the explosion in unstructured text sources, create unique opportunities to monitor, categorise, trend, track, visualise – and even predict – competitive landscapes in a Web 2.0 world. This transition is a frightening change for stodgy businesses, but it creates opportunities for savvy data miners and iterative, flexible entrepreneurs.

Traditional information and lab data management systems are designed for documentation and operational needs, or for the commercial interests of external information providers. The result is a wide variety of siloed applications with different content, functionality and technologies. In the R&D process, information from different sources must be used in the right context, which requires intelligent searches across truly integrated systems.

This presentation sheds light on the limitations and disadvantages of current solutions and presents the architecture of a novel system based on search engine and portal technology.

11:15 - 11:45

Cultivating the Corpus - Professional Information Retrieval

Most current IR (Information Retrieval) research efforts point towards applications in the consumer domain, where the requirements tend to focus on breadth rather than depth. By contrast, professional IR demands maximum precision, recall AND efficiency. Experimental investigation and practical evaluation of existing methodologies have shown that a single algorithm is unlikely to satisfy all the needs of professional patent searchers. Hence, a variety of Natural Language Processing (NLP) techniques must be applied to the global patent corpus in order to significantly improve patent retrieval.

By recursively generating metadata from data, and further metadata from that metadata, the various refinement processes let the information store grow and allow the user community to actively "Cultivate the Corpus". The main limiting factor in this endeavour is the sheer size of the data: the patent collection is exceptionally large even among real-world collections. More than 60 million large documents, with a vocabulary of more than a billion distinct terms, lead to a repository size well over 100 terabytes once NLP metadata has been generated.

To keep processing time reasonable, a special discipline of HPC (High Performance Computing) has emerged: Semantic Super Computing (SSC). In SSC, the traditional parallelisation of tasks is extended into the field of reconfigurable computing through the use of algorithmically generated processor architectures, explicitly designed and tuned for NLP purposes.

The solution to professional IR looks less like a single scientific formalism than a critical path to follow. As with many complex systems, evolution seems to be the most effective route to progress. The IRF (Information Retrieval Facility) has therefore created and maintains an infrastructure of information and technology – an "ecological environment" – involving all relevant parties: patent information professionals, information scientists and IT experts. Together they have created an extensible software infrastructure, the "Leonardo" Ecosystem, through an agile development process.

Within this framework, technologists can simultaneously create and refine new tools and use the community channel to communicate with their end-users. The benefit for end-users, in turn, is a closer match between the tools and their actual information needs and existing workflows. This feedback mechanism corresponds to the "Matrixware Innovation Cycle".

The IRF and its annual convention in Vienna, the IRFS (IRF Symposium), aim to shape the understanding of, and sketch possible solutions for, the real-world professional context of patent retrieval.

11:45 - 12:15

Semantic Searching - Challenges and Solutions

Latent Semantic Analysis (LSA) is a powerful information retrieval tool that gives searchers an effective way to locate and semantically rank related documents while overcoming the search problems associated with synonymy and polysemy. Advanced searchers still rely heavily on Boolean searching because of its high precision, but the quality of Boolean searching depends on the searcher's experience level, knowledge of the content set and search engine, and ability to enter all relevant keywords as part of the search. Since many unknown keywords can be used to describe a concept, Boolean searching may suffer from reduced recall. Although LSA is limited in its ability to improve precision, it can dramatically improve recall, finding documents that Boolean searches may miss by analysing document sets and terms to reveal concepts, especially when document sets span varied or noisy texts or contain multiple languages. This presentation will outline the pros, cons and synergy of Boolean and LSA searching and discuss the value of LSA for the information professional.
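The recall advantage described above can be seen in a minimal sketch of LSA (not any vendor's implementation): a tiny term-document matrix is reduced by truncated SVD, after which cosine similarity in "concept space" links documents that share no keywords at all. The corpus and all names below are invented for illustration.

```python
# Toy Latent Semantic Analysis (LSA): truncated SVD over a tiny
# term-document matrix. Corpus and variable names are illustrative only.
import numpy as np

docs = [
    "car engine",        # d0
    "car automobile",    # d1 (co-occurrence bridge between d0 and d2)
    "automobile motor",  # d2 - shares NO keyword with d0
    "fruit apple",       # d3
    "fruit banana",      # d4
]

# Binary term-document matrix: rows are terms, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[1.0 if t in d.split() else 0.0 for d in docs] for t in vocab])

# Keep k latent "concept" dimensions via truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document, in concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A Boolean query on d0's terms ("car engine") would miss d2 entirely,
# yet LSA places d0 and d2 close together via the bridge document d1.
print(cosine(doc_vecs[0], doc_vecs[2]))  # high: same latent concept
print(cosine(doc_vecs[0], doc_vecs[3]))  # near zero: different concept
```

This also shows the precision caveat mentioned above: LSA ranks by conceptual closeness, not exact term match, so it widens the net rather than narrowing it.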

12:15 - 12:45

Chemical Non-Patent Literature Searching in E-journals and on the Internet

Searching non-patent prior art literature is crucial for checking the patentability of new inventions and the validity of granted patents, since under patent law information contained in non-patent literature carries the same weight as any patent document. Relevant subject matter is not always the focus of a publication; it is often hidden in the text and therefore not always indexed in the bibliographic databases of classical online hosts. Comprehensive information retrieval therefore requires searching the full text of journals and the internet. In this context, the retrieval of chemical structures from these sources is a major challenge.

The presentation gives an overview of the potential and drawbacks of various publishers' E-journal full-text search sites, with particular respect to their search and display capabilities for chemical searching. Moreover, recent developments in chemical structure searching in E-journals and on the internet will be discussed.

14:15 - 14:45

Finding SMEs as Partners: Good Things Do Come in Small Packages

In recent years Open Innovation has become an increasingly important way to approach business, and R&D in particular. Organisations big and small no longer assume that the answers to R&D challenges can be generated internally. It is therefore increasingly important to find the right partner to collaborate with – not just any partner – which often means a combination of technical capability and commercial viability.

There is plenty of evidence that small and medium-sized enterprises (SMEs) are increasing in importance in the innovation space. But finding the right SME presents significant challenges because there is typically a large number of them, each with a relatively small "footprint".

As one way to meet this challenge, the Technology Intelligence group at Unilever has developed techniques that draw together information from multiple sources. We then use information analysis and visualisation tools to identify partners that have both a technical footprint (e.g. patents) and a commercial footprint (e.g. trade and business news), using the combination as an indicator of promising companies. We work closely with the R&D teams to refine the generated lists into a shortlist of leads to follow up.
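The core of the approach above – requiring both footprints before a company makes the list – can be sketched in a few lines. This is a hypothetical illustration only: the company names, counts and scoring rule are invented, not Unilever's data or tooling.

```python
# Hypothetical footprint-based partner scouting. All companies and counts
# below are invented for illustration.
patent_counts = {"AlphaChem": 12, "BetaBio": 3, "GammaTech": 7, "DeltaLabs": 1}
news_counts = {"AlphaChem": 4, "GammaTech": 9, "EpsilonCo": 15}

# Keep only companies visible in BOTH sources - a technical footprint
# (patents) plus a commercial one (news) - then rank by a combined score.
candidates = set(patent_counts) & set(news_counts)
scored = sorted(
    ((c, patent_counts[c] * news_counts[c]) for c in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
shortlist = [name for name, score in scored]
print(shortlist)  # GammaTech (7*9=63) ranks above AlphaChem (12*4=48)
```

Companies strong on only one axis (BetaBio, EpsilonCo) drop out, which is exactly the filtering effect the combined-footprint indicator is meant to provide.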

15:45 - 16:15

The Changing World of Search and Information Access at the USPTO

Effective access to Intellectual Property (IP) information is a key component of the USPTO's mission. By disseminating this information through its public search systems and data products, the USPTO provides the public with the means to foster the competent preparation of patent and trademark applications, avoid infringement of patents and trademarks, and understand the current state of the art as a basis for new ideas. This interactive presentation will focus on the new and innovative approaches being explored by the USPTO to provide access to its extensive body of scientific knowledge more effectively. Current projects supporting the modernisation of internal USPTO automation systems to enhance text- and search-related capabilities will be discussed. The interactive portion of this presentation will focus on a topic of key interest to the USPTO: improving automated access to the USPTO's systems, so that patent information can be delivered efficiently to all users, including 'automated' / data mining users. The presentation will engage the audience in a discussion of key data dissemination issues and of ideas for improving the electronic access and delivery of information to the business and research communities.

16:15 - 16:45

Future Patent Tools - Evolution and Revolution

The EPO manages one of the world's most comprehensive collections of technical documentation, accessed daily by thousands of internal and external users through electronic tools developed to support the patent granting process. Guiding the further evolution of these services towards a fully electronic end-to-end granting process for the benefit of all users is a major challenge among the EPO's strategic objectives. At the same time, the growth in international data exchange and the handling of an increasing amount of patent-related documentation, particularly from Asia, need to be addressed to ensure that user expectations are met with efficient tools.

This presentation looks at the strategic issues in maintaining the value of the patent system, outlining the key principles that underlie the development of the EPO's documentation databases. Emphasis will be given to recent developments in the translation and searching of CN, KR and JP patents and utility models, as well as to the importance of acquisition and quality policies when extending the range of available patent and non-patent literature. Recent developments in the search engine and the associated examiner tools developed specifically for search and examination work will also be addressed.