What’s New in Endeca 6.0

Blue Fish Development Group

July 23, 2008

Introduction

Endeca has thrived by continually updating its flagship search offering, the Information Access Platform, to stay ahead in the arena of world-class search and navigation engines. The upcoming version 6.0 is no exception.

This article describes some of the most notable enhancements in version 6.0, which are likely to interest you if you currently use a previous version of Endeca’s Information Access Platform (IAP). It assumes a basic familiarity with Endeca concepts and components.

XQuery

One of the most fundamental and interesting new pieces of functionality in Endeca IAP 6.0 is the inclusion of an XQuery evaluator in the MDEX Engine.

XQuery (also sometimes called XML Query) is a W3C standard language that is designed to easily query across XML data. It utilizes the XPath standard that is also used in XSLT, but combines that with a full-featured language syntax to provide extensibility in data processing. This gives XQuery the ability to perform complex operations on XML data that would have previously required code in other languages leveraging purpose-built libraries for processing XML. The technical details about XQuery are beyond the scope of this article, but there is a wealth of information about it online. The XQuery 1.0 Specification (http://www.w3.org/TR/xquery/) and this XQuery Tutorial (http://www.w3schools.com/xquery/) are good places to start.

Endeca chose to implement an XQuery evaluator as part of its new MDEX engine to provide the ability to write extensible code that is deployed in the engine instead of in the application, and can be bound to web-service interfaces exposed by the engine. This has a number of benefits.

Firstly, much of the code that was previously written in the user interface layer or application layer that dealt with building queries to the engine, parsing results, re-querying, etc. can now be pushed into the engine layer of the application. This means that the other layers of the application can focus on business logic and presenting results to the end user, instead of needing to be concerned with the specifics of working with the engine to manage queries and results.

Binding XQuery functionality to web services also opens the platform to additional programming languages on the client side. Previously, writing code to interface with the engine in anything besides .NET and Java was potentially costly and error-prone; now, any language that can invoke a web service can be used to build the client portion of an Endeca-driven application. These web services can be built either as SOAP-style services or as simpler REST-style services, and virtually all modern languages have mature frameworks for handling both.

This ‘engine-layer’ code can also now be shared across multiple applications or clients, even those written in multiple languages. A Java-based web application and a native C++ Windows application could both query the same engine, and both take advantage of the shared code that now resides in the engine itself.

Pushing this functionality into the engine layer can have significant performance benefits as well. Consider an application where there are multiple, prioritized classes of data, and the application is to search the first class of data, and only if there are no results in the first, then search the second, and so on. If this code were written in the application layer, there could be several sets of communications with the engine before any results are ultimately presented to the user. However, if this same ‘cascading search’ functionality were implemented in the engine using XQuery, there would only be one request to the engine, and one response, regardless of how many of the underlying classes of data needed to be searched.
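The control flow of such a cascading search can be sketched as follows. This is only an illustration in Python, not XQuery and not Endeca's API: the function names, the tier names, and the in-memory record lists are all hypothetical stand-ins for queries against prioritized classes of data.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical stand-in for one query against a single class of data.
SearchFn = Callable[[str], List[str]]


def cascading_search(
    term: str, tiers: Iterable[Tuple[str, SearchFn]]
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in priority order and return the first non-empty result set."""
    for name, search in tiers:
        results = search(term)
        if results:
            return name, results
    # No tier produced any results.
    return None, []


# Toy in-memory data standing in for three prioritized classes of records.
PRODUCTS = ["red widget", "blue widget"]
ARTICLES = ["widget care guide", "gasket sizing chart"]
FORUM_POSTS = ["my gasket leaks", "sprocket recommendations"]


def make_search(records: List[str]) -> SearchFn:
    """Build a simple substring search over a list of records."""
    return lambda term: [r for r in records if term in r]


TIERS = [
    ("products", make_search(PRODUCTS)),
    ("articles", make_search(ARTICLES)),
    ("forum", make_search(FORUM_POSTS)),
]

if __name__ == "__main__":
    print(cascading_search("gasket", TIERS))
```

When this loop runs in the application layer, each pass through it is a separate round trip to the engine. When the equivalent logic lives in the engine, the client sends one request and receives one response, and the loop runs next to the data.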

Another benefit this brings is the potential for reuse, not just within a single organization, but across the community of Endeca users. Code that is written specifically to reside in the MDEX engine and is not tied to a particular application or user interface is much more likely to be reusable in a broader context. Endeca is already encouraging users of the Early Access release of IAP 6.0 to contribute reusable XQuery code to EDeN, the Endeca Developer Network.

Performance and Scale Enhancements

With version 6.0 of the IAP, Endeca continues its tradition of improving performance from version to version. This release gives special attention to some of the more complex edge cases of search and navigation.

One area of focus is navigable dimensions with extremely deep taxonomic hierarchies. This is likely to be an especially important improvement for users who have content management systems with deep and complex folder structures or other hierarchical organization schemes.

Another area of focus is dimensions with multi-valued attributes. Again, rich content management systems that attach many keyword tags, categories, or authors to a product or document are likely to gain more from these changes than simpler, flatter data sets.

Actual performance improvements for a particular application will vary, but can be very significant. In internal testing, Endeca has observed throughput gains (queries per second) of 30 – 60% over the current 5.1 release on many benchmark tests.

Large-scale implementations will benefit from major enhancements to the management of index data on disk and in memory. This can be especially impactful in cases where extremely rich records are used for findability, but many of the details of each record are not required for display to the user. For instance, in a large chemical or parts catalog, an individual component may be described by hundreds of parameters which are potentially important in narrowing a search to that particular item. However, once found, the most important details such as price and SKU may be simple and straightforward.

These scalability enhancements contribute to the throughput gains mentioned earlier. Other key metrics also show notable improvements. One example is large-scale engine process startup times, where Endeca is observing dramatic reductions of 70 – 95% relative to the 5.1 release.
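To put those percentages in concrete terms, the short Python sketch below applies them to made-up baselines. The baseline figures (100 queries per second and a 600-second startup) are purely hypothetical and are not Endeca measurements; only the percentage ranges come from the figures quoted above.

```python
def scaled(baseline: float, percent_change: int) -> float:
    """Apply a signed percentage change to a baseline value."""
    return baseline * (100 + percent_change) / 100


# Hypothetical 5.1 baselines, chosen only to make the arithmetic easy to follow.
BASELINE_QPS = 100        # queries per second
BASELINE_STARTUP_S = 600  # engine startup time in seconds

# A 30 - 60% throughput gain over the baseline.
throughput_range = (scaled(BASELINE_QPS, 30), scaled(BASELINE_QPS, 60))

# A 70 - 95% reduction in startup time relative to the baseline.
startup_range = (scaled(BASELINE_STARTUP_S, -70), scaled(BASELINE_STARTUP_S, -95))

if __name__ == "__main__":
    print("throughput (qps):", throughput_range)
    print("startup (s):", startup_range)
```

Under these assumed baselines, throughput would land between 130 and 160 queries per second, and startup would drop from ten minutes to somewhere between three minutes and thirty seconds.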

Platform Support

With the 6.0 release, Endeca is completing its transition from 32-bit to the modern 64-bit server architecture. Optimizations have been made to the full 64-bit versions of the Indexer and MDEX Engine for improved performance on 64-bit platforms. Forge remains a 32-bit application, though the gains in moving to 64-bit would be slight, as it is more often bottlenecked by disk contention or source data systems than processor and memory resources.

The 6.0 release will support the 64-bit versions of Windows, Sun Solaris, and Red Hat Enterprise Linux.

Merchandising Workbench

One new product that is now available alongside IAP 6.0 is Merchandising Workbench. Merchandising Workbench provides additional functionality on top of Endeca’s Web Studio that is designed specifically for merchandisers. This functionality removes the complexity of managing business rules from the task of the merchandiser, and provides a simplified interface for performing the most common merchandising tasks, including a workflow specifically designed for streamlining the process of creating effective product landing pages.

CAS v1.2

The latest version of CAS (Content Acquisition System) provides several enhancements to crawling content management systems, including a graphical interface for managing CMS crawls and the ability to run multiple CMS crawls simultaneously, which can be a huge boon to systems that need to pull data from multiple CMSs.

RAD Framework .NET v1.1

The new version of the Rapid Application Development Framework for ASP.NET includes a number of bug fixes and enhancements. This framework is intended to jumpstart the process of building a .NET-based Endeca application, and provides easily reusable code for many common components. A Java RAD framework is under development and is currently in beta.

Availability

The Early Access release of IAP 6.0 was made available in May 2008. The EA release is intended for production use by select customers who see immediate, compelling value. The EA release is also available as a technology preview for current customers. Endeca is currently projecting general availability of the 6.0 release later in 2008. For more detailed information, I encourage you to check out EDeN at http://eden.endeca.com/

I hope this article has provided some helpful insights into the latest version of Endeca’s flagship product, and how some of its new features may be relevant and important for you. If you have any questions or comments, I encourage you to comment on this article.