The corporate taxonomy: creating a new order

Taxonomies are an important tool in balancing the contradictory forces of information overload and the need for instant access to the right information

According to the writer Jorge Luis Borges, a "certain Chinese encyclopaedia" classified animals as: belonging to the Emperor; embalmed; trained; sloppy; sirens; fabulous; stray dogs; included in this classification; trembling like crazy; innumerable; drawn with a very fine camelhair brush; et cetera; having just broken the vase; from a distance look like flies.

The wondrous inconsistencies that this brings to mind are a useful reminder to anyone trying to implement a corporate taxonomy that the world does not fit easily into neatly labelled boxes. That said, it is increasingly important for those in charge of content management, enterprise search, portal or e-commerce projects to have an understanding of why taxonomies matter and how they can be used to improve information retrieval and navigation.

In some areas, such as botany or medical research, taxonomies have long been important tools for organizing information. Today, many more organizations are looking to build taxonomies as part of their information management strategies. Taxonomies are an important tool in balancing the contradictory forces of information overload and the need for instant access to the right information.

Any organization that needs to make significant volumes of information available in an efficient and consistent way to its customers, partners or employees needs to understand the value of a serious approach to taxonomy design and management.

Why taxonomies are important today

Information overload

The latest University of California at Berkeley study of information growth estimates that 5 exabytes of recorded information were created worldwide in 2002 (equivalent to roughly 800 MB for each person on the planet). If access to those volumes of information is to be a benefit rather than a burden, then order and control become prerequisites. Information management techniques must be improved if we are to gain more control over those information flows, and taxonomies should be a key part of that. (For more on the Berkeley study, see www.sims.berkeley.edu/research/projects/how-much-info-2003.)

The rise of the Web

The very structure of the Web (in an Internet, extranet or intranet context) offers new opportunities for information organization. The ability to provide universally accessible, hyperlinked, multimedia content presents unique challenges in terms of information classification. The growing awareness of the value of taxonomies is closely associated with the rapid development of knowledge about effective Web site design, online usability and the importance of the overall information architecture.

The growing use of unstructured information management technologies

In response to, and as part of, the evolution of the Web, most large and medium-sized businesses have invested in content management, search and portal technologies. They are now looking at how they can increase the benefits from those technologies and provide a consistent information infrastructure that can be shared across different applications.

What is a corporate taxonomy?

A definition

A simple definition of a taxonomy is that it is a hierarchy of categories used to classify documents and other information. A corporate taxonomy is a way of representing the information available within an enterprise.

A classical taxonomy assumes that each element can only belong to one branch of the hierarchical tree. However, in a corporate environment, such formal ordering is neither feasible nor desirable. For example, a document on a competitor's product may be of interest to different departments in the organization for different reasons--forcing it into a single predefined category may be neater, but also reduces its usefulness. Corporate taxonomies need to be flexible and pragmatic as well as consistent.
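To make the distinction concrete, here is a minimal sketch of a category hierarchy in which a single document can be filed under more than one branch. The category names, the document name and the helper functions are all hypothetical, chosen only to mirror the competitor-product example above; this is not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical category tree: child -> parent (None marks the root).
PARENT = {
    "Company": None,
    "Marketing": "Company",
    "Engineering": "Company",
    "Competitive Intelligence": "Marketing",
    "Product Design": "Engineering",
}

# category -> set of document ids; one document may sit under several categories.
assignments = defaultdict(set)


def assign(doc_id, *categories):
    """File a document under one or more categories of the taxonomy."""
    for category in categories:
        if category not in PARENT:
            raise KeyError(f"unknown category: {category}")
        assignments[category].add(doc_id)


def path_to_root(category):
    """Return the branch from the root down to the given category."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return list(reversed(path))


def documents_under(category):
    """Collect documents filed under a category or any of its descendants."""
    found = set(assignments.get(category, set()))
    for child, parent in PARENT.items():
        if parent == category:
            found |= documents_under(child)
    return found


# The same (hypothetical) document is of interest to two departments.
assign("competitor-product-review.pdf", "Competitive Intelligence", "Product Design")
```

In a strict classical taxonomy, `assign` would accept exactly one category per document. Allowing several keeps the hierarchy consistent while letting both Marketing and Engineering find the document when they browse their own branches.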

The role of technology

There is no intrinsic need for technology in the definition and management of a taxonomy. However, there is a growing role for tools that can assist or even eliminate some of the tasks associated with taxonomy design and management.

Tools and solutions are available that can assist with any stage of the process--from the simple editing and design of a taxonomy structure, to the automatic identification of categories and the assignment of content to the relevant classes.
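As an illustration of the simplest form of automatic assignment, the sketch below files text under every category whose keywords it mentions. The rules, category names and the `classify` function are assumptions made for this example; commercial tools use a range of more sophisticated techniques, and this is not intended to represent any of them.

```python
import re

# Hypothetical rules: category -> keywords that suggest it.
RULES = {
    "Legal": {"contract", "litigation", "compliance"},
    "Finance": {"invoice", "budget", "forecast"},
    "Marketing": {"campaign", "brand", "competitor"},
}


def classify(text, min_hits=1):
    """Return every category whose keywords appear at least min_hits times."""
    words = re.findall(r"[a-z]+", text.lower())
    matches = []
    for category, keywords in RULES.items():
        hits = sum(1 for word in words if word in keywords)
        if hits >= min_hits:
            matches.append(category)
    return matches


print(classify("Budget forecast for the new brand campaign"))
```

Raising `min_hits` makes the assignment stricter. A document that matches no rule returns an empty list, which in practice would be a cue for manual review.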

Vendors disagree over the right techniques and approaches to the classification of information. The approach that suits any individual organization will depend on a mixture of the business requirement, the type, volume and volatility of the information to be managed, the skills available in-house and the budget available.

The greater the volatility of the information and the categories to be used, and the less the in-house experience available, the more attractive an automated solution will be. For organizations with extensive experience of in-house taxonomy design, greater manual control may be preferred.

However, few cases will be simply either/or. Even in an environment where information classification is largely automated, someone still has responsibility for the effectiveness and performance of the system (in terms of information access and not just technical reliability). For organizations with extensive knowledge of taxonomy design, increased automation can reduce maintenance costs, increase efficiency and extend the applicability of a taxonomy across an enterprise.

The evolving role of software: beyond categorization

Software support for the corporate taxonomy is not limited to the provision of automatic classification tools. More effort is now being invested in software that can increase the usability of taxonomies for both corporate users and consumers and in tools to support taxonomy design and management.

The experience gained in building intranet and e-commerce sites is driving the development of more flexible technologies for the definition, management and use of taxonomies. The goal is to combine:

the need for control and order in information management,

an understanding of how users navigate through large volumes of information, and

the realities of corporate and e-commerce information models.

Multifaceted taxonomies

The limitations of a purely hierarchical taxonomy model have been recognized for many years in academic and information science circles. Consequently, there has been considerable interest in taxonomy structures that offer a more flexible view of how information can be categorized for general use. The alternative approaches are often referred to as faceted, multidimensional or relational taxonomies. Those concepts are now making their way into commercial products aimed at supporting taxonomies in an e-commerce or corporate environment.

Multifaceted taxonomies enable the user to navigate through a number of facets of a taxonomy (for example, by artist, genre, instrument or composer in a music library). They also allow the different facets to be cross-referenced to narrow or widen a search as the user browses the categories (for example, you can browse and select recipes by a combination of ingredients, cuisine and occasions at epicurious.com).
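The narrowing behavior described above can be sketched in a few lines. The catalogue entries, facet names and helper functions below are illustrative assumptions based on the music library example, not the data model of any real site.

```python
from collections import Counter

# Hypothetical catalogue: each item carries one value per facet.
RECORDINGS = [
    {"title": "Goldberg Variations", "composer": "Bach", "genre": "Baroque", "instrument": "piano"},
    {"title": "Cello Suite No. 1", "composer": "Bach", "genre": "Baroque", "instrument": "cello"},
    {"title": "Moonlight Sonata", "composer": "Beethoven", "genre": "Classical", "instrument": "piano"},
    {"title": "Nocturne Op. 9 No. 2", "composer": "Chopin", "genre": "Romantic", "instrument": "piano"},
]


def narrow(items, **selected):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count remaining items per facet value, to show as the next navigation choices."""
    return Counter(item[facet] for item in items)


# Browse by instrument first, then cross-reference with composer.
piano = narrow(RECORDINGS, instrument="piano")
bach_piano = narrow(piano, composer="Bach")
```

Because each facet is independent, the user can apply them in any order: selecting the composer first and the instrument second yields the same result. The counts from `facet_counts` are what a site would typically display beside each remaining choice.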

Developments in multifaceted taxonomies are also closely linked to new analytical and visualization capabilities that offer to transform our experience of search and navigation through large volumes of information.

Workflow and collaboration

Developing and managing a taxonomy is a collaborative project involving multiple stakeholders. It also needs clear procedures for change management. Integrated workflow tools and collaborative editing tools make it easier to manage taxonomies in large organizations and in settings where taxonomies have to be monitored and adapted on a regular basis, such as shopping sites.

Search analytics and taxonomy management

Search analytics refers to the collection, analysis and exploitation of information about the way search technologies are used. The initial driver for this development came from the need for e-commerce sites to know how users are searching their sites. The next step is for those techniques to be used within the enterprise. Better information on what users are searching for, and the ability to tailor results and navigation paths, offers a relatively easy way to improve information retrieval within an organization. There is a great opportunity for using search analytics in the design and maintenance of better taxonomy structures.
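One way search analytics might feed taxonomy maintenance is to look for frequent queries that return nothing and match no existing category. The log format, category labels and function names in this sketch are hypothetical, and it is only one of many possible analyses.

```python
from collections import Counter

# Hypothetical search log: (query, number of results returned).
SEARCH_LOG = [
    ("expense policy", 12),
    ("travel claims", 0),
    ("expense policy", 9),
    ("travel claims", 0),
    ("org chart", 4),
    ("Travel Claims", 0),
]

# Hypothetical labels already present in the taxonomy.
CATEGORY_LABELS = {"expense policy", "org chart", "holiday booking"}


def normalize(query):
    """Lower-case and collapse whitespace so variants of a query count together."""
    return " ".join(query.lower().split())


def failed_queries(log):
    """Count queries that returned no results."""
    return Counter(normalize(query) for query, hits in log if hits == 0)


def candidate_categories(log, labels, min_count=2):
    """Frequent failed queries with no matching label are candidates for new categories."""
    return [
        query for query, count in failed_queries(log).most_common()
        if count >= min_count and query not in labels
    ]
```

Here `candidate_categories(SEARCH_LOG, CATEGORY_LABELS)` flags "travel claims" as a term users look for that the taxonomy does not yet cover. A taxonomy editor would still decide whether it deserves a new category or a synonym for an existing one.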

Visualization tools

Improved visualization capabilities can enhance the value of taxonomies at two levels:

usability--providing visualization capabilities to the user enhances their ability to take advantage of the investment in an underlying taxonomy. Taxonomies provide a basis for implementing existing visualization tools in a useful way and open the way for new tools that can help users visualize the multidimensional space in which they are searching.

design and management--improved means of visualizing a taxonomy structure make it easier to ensure an efficient balance among categories and better fit with user expectations. Such developments are closely linked to improved support for the rapid design, test and refinement of taxonomies.

The evolving role of taxonomies in the enterprise

In order to develop the promise of the "knowledge-based economy," businesses (and governments) have to develop a much better understanding of the nature of information and knowledge capital. Taxonomies are linked into that development at a number of levels:

The development of corporate taxonomies is part of a general move toward developing methods and techniques for the management of intellectual capital, including knowledge audits, social network analysis, balanced scorecards and new accountancy models suited to intellectual asset management.

Those developments are also linked with the evolution of a mature information management architecture based on technologies such as content management, portals, search and data warehousing.

Finally, a new generation of enterprise IT architectures based on Web services and other open standards is making possible new levels of information integration and interoperability. To exploit those capabilities, organizations must have a much clearer and more formal understanding of their information flows and structures at the semantic level.

As organizations evolve their information management processes, methodologies and technologies in coming years, taxonomy development (and related information science concepts) will be given a much more prominent role within organizations. Taxonomy methods and technologies will themselves have to evolve if they are to meet the requirements of this general transformation in information management.

We have described some of the most interesting developments at the technology level (multidimensional taxonomies, taxonomy management tools, etc.). The corporate taxonomy will also continue to develop as part of a wider evolution of information management strategies in the enterprise.

The rise of the semantic enterprise

The failure of companies to respond quickly enough to changing conditions is also prompting many of them to take a more holistic view of their information architecture. We are seeing many organizations questioning some of the established limits on information flow in the organization--between front and back office, between operational and reporting and analysis systems, and between structured and unstructured information.

There is a recognition that we need to break down existing silos of information--in the past, this has been one of the goals of collaboration tools, content management repositories, data warehouses and CRM systems. Each of those initiatives has attempted to address the problems of information islands, but in the end they have often raised as many problems and barriers as they solved.

The rising interest in taxonomies and information classification suggests, however, that we are becoming more sophisticated in how we view and manage information assets across our organizations. In addition, the emergence of common (and, more importantly, workable) standards for information exchange and application integration, such as XML, WSDL and SOAP, holds out the possibility that we can finally start to overcome the recurrent barriers to developing a unified approach to managing information and knowledge across an organization.

Web services standards will provide the technical basis of such integration. And the increasing use of XML as a standard for information description holds out the hope of developing semantically rich infrastructures in which new forms of information publishing, information discovery and information sharing will be possible.