Simplifying Endeca Deployments: The Deployment Template

Blue Fish Development Group

April 8, 2008

Introduction

Endeca pipelines can be deployed in any number of ways, though some ways are better than others. How you implement and deploy a pipeline affects the maintainability and sustainability of your solution, and settling on a clean deployment process early reduces the risk of complications later. Thankfully, Endeca has recently introduced a generic deployment template based on best practices and deployment process standards.
This article will highlight the benefits of using the deployment template, address some common pitfalls of non-standardized deployments and explain the basic scripting mechanisms provided by the template. It is assumed that the reader has a basic understanding of developing Endeca pipelines and is deploying pipelines using at least Endeca Information Access Platform 5.0.

A Deployment without a Template

An Endeca pipeline deployment consists of configuring pipeline components and scripts in the Endeca Application Controller (or Job Control Daemon) and ensuring the pipeline can be kicked off via a script so that it can run on a schedule.
Prior to the introduction of Endeca’s deployment template, this deployment process was considered something of a black art, left only to those with the right kind of know-how. Without the template, all the decisions of the deployment process are left to the developer, resulting in a deployment that is hard to understand for anyone not directly involved with it. This easily leads to various problems down the road.
Because there was no mandatory way to configure an Endeca pipeline or its supporting scripts, each developer chose a preferred method, which made it difficult for another developer to understand the configuration. If the pipeline used auxiliary scripts, the developer may not have used the provided scripting utility, which made it harder to coordinate scripts with the Endeca Application Controller and to share scripts with other projects. Additionally, logging and index archival typically did not exist, often resulting in the loss of essential log files and indices.
With the advent of the deployment template, all the guesswork is taken out of deploying an Endeca pipeline.

Benefits of the Deployment Template

The release of Endeca’s generic deployment template enables developers to standardize the deployment process. The deployment template is a set of scripts and configuration files that takes care of a project’s deployment requirements. The details of these requirements will be explained in depth later in the article.
The template provides many features to ease the deployment process. Some of the features provided are:

A centralized configuration for pipeline components and the Endeca Application Controller

A scripting platform which supports Java and is powered by BeanShell

Example scripts for performing baseline updates and partial crawls

Out-of-the-box scripts for logging and archiving

Standardized directory structure and naming conventions

These features are what make the deployment process easy, repeatable and supportable. Anyone with experience using the template will easily understand any pipeline, its configuration, and its supporting scripts.

What is EAC?

Before we get too far, I want to provide a quick overview of what EAC, or the Endeca Application Controller, is and what it does. The Endeca Application Controller, introduced in Endeca IAP 5.0, is a web service which manages the pipeline configuration, environment configuration and scripts. EAC is also used to execute pipeline components and scripts it has been configured to run.
The user interface for EAC is a web application called Web Studio, which can be used to view or update configuration information and to execute scripts and components. Since EAC runs as a web service, developers are not bound to using Web Studio to interact with it. The deployment template configures EAC during initialization and updates, and developers are free to interact with EAC directly as well.

Deployment Template Basics

The deployment template may be downloaded from Endeca’s support site. The latest version at the time of this article is 2.1.

Installing the Template

In addition to its supporting install files, the deployment template unzips to provide two scripts in the bin directory, deploy.bat and deploy.sh, which support Windows and Unix/Linux environments respectively. Running the deploy script prompts the user to verify the installed version of Endeca and then asks a few questions: project name, installation location, and EAC port (the default EAC port is 8888). This configures the template for the user’s Endeca environment and installs it in the provided path.

What’s in the Box

There are a few files that are very relevant to using the template successfully. This list is a breakdown of the most important paths and scripts provided:

Note: In this article, it is assumed the template was installed under Windows, therefore all scripts have the .bat extension. Had the template been installed in a *nix environment, everything would be the same except the scripts would have a .sh extension, built for running under those environments. All behavior and functionality should remain the same.

config/pipeline/

This directory contains all of the pipeline files. The generic template, as it is installed by the installer, comes pre-configured with Endeca’s wine pipeline. This is provided so the template works ‘out of the box’. The wine pipeline will need to be removed to drop in a new pipeline. Later in this section, I will explain how to remove the wine pipeline and provide a new one.

config/script/AppConfig.xml

This is the heart of the template’s configuration. When the initialization script is run, this file drives it. The environment configuration, EAC configuration and scripts go in here.

config/script/logging.properties

Contains the log4j configuration.

config/script/set_environment.bat

This is used to set environment variables and is called when running any of the scripts provided in /control.

control/initialize_services.bat

This file configures EAC according to AppConfig.xml. EAC setup includes configuring Forge, Dgidx, MDEX engine, base scripts, daily, weekly and monthly report generation as well as any custom scripts or components provided in the AppConfig.

control/baseline_update.bat

One of the default scripts provided in the AppConfig is BaselineUpdate. This file sends the command to EAC to kick off the BaselineUpdate script.

control/runcommand.bat

This can be used to run any command on EAC, for example ‘RunCommand BaselineUpdate’ would run the BaselineUpdate script.

control/set_baseline_data_ready_flag.bat

This file, along with set_partial_data_ready_flag, is not a vital component of every pipeline. The wine pipeline provided with the deployment template uses these mechanisms, but other pipelines may not need to copy their baseline data and ‘ready’ it; the data may instead live in a database or on a website, in which case these scripts would not be used. The implications of this will be described in further detail when I explain how to remove the wine pipeline and drop in a new pipeline.

control/update_web_studio_config.bat

When initialize_services runs, it sets up the project in EAC for the first time. If there is a need to update the configuration at a later time (perhaps due to changes in the AppConfig), this script can perform that task. Alternatively, initialize_services may be used, but be warned – doing this will delete the existing project first.
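
As a rough illustration of how these control scripts might be driven from a scheduler, the sketch below builds the command line for a control script on either platform. The helper name control_command and the example install paths are hypothetical; only the control directory, the script names and the BaselineUpdate argument come from the list above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def control_command(project_root, script, *args, windows=True):
    """Build the command line for a script in the project's control directory.

    project_root is wherever the template was installed (hypothetical paths
    are used in the examples); script is the name without its extension.
    """
    path_type = PureWindowsPath if windows else PurePosixPath
    extension = ".bat" if windows else ".sh"
    script_path = path_type(project_root) / "control" / (script + extension)
    return [str(script_path), *args]


# Ask EAC to run the BaselineUpdate script through runcommand.
# Passing this list to subprocess.run(cmd, check=True) would launch it.
cmd = control_command("C:/Endeca/apps/myproject", "runcommand", "BaselineUpdate")
```

Building the command separately from running it keeps the platform difference (.bat versus .sh) in one place, which is convenient when the same job definition has to work on both Windows and Unix/Linux hosts.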

Installing a Pipeline

Verify AppConfig

Most of the default configurations and scripts provided by the deployment template in the AppConfig are good, but there are a few things that are worth double-checking.

In the global variables section, verify that the eacHost, eacPort and working directory are as specified during the template installation.

Under the server/hosts section, verify that the hostname attributes are specified as well.
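
These checks could be automated with a small script. The sketch below is only an illustration: apart from eacHost and eacPort, the element and attribute names in the sample XML (app, working-dir, host, hostName) are assumptions about the file's layout, and a real AppConfig.xml may use XML namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. Names other than eacHost and eacPort are
# assumptions, not copied from a real AppConfig.xml.
SAMPLE_APP_CONFIG = """
<beans>
  <app appName="myproject" eacHost="localhost" eacPort="8888">
    <working-dir>C:/Endeca/apps/myproject</working-dir>
  </app>
  <host id="ITLHost" hostName="localhost" port="8888" />
  <host id="MDEXHost" hostName="" port="8888" />
</beans>
"""


def check_app_config(xml_text, expected_host, expected_port, expected_dir):
    """Return a list of problems found; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    app = root.find("app")
    if app is None:
        return ["no <app> element found"]

    problems = []
    if app.get("eacHost") != expected_host:
        problems.append(f"eacHost is {app.get('eacHost')!r}")
    if app.get("eacPort") != str(expected_port):
        problems.append(f"eacPort is {app.get('eacPort')!r}")
    working_dir = (app.findtext("working-dir") or "").strip()
    if working_dir != expected_dir:
        problems.append(f"working directory is {working_dir!r}")

    # Every host entry should carry a non-empty host name.
    for host in root.findall("host"):
        if not (host.get("hostName") or "").strip():
            problems.append(f"host {host.get('id')!r} has no hostName")
    return problems


problems = check_app_config(
    SAMPLE_APP_CONFIG, "localhost", 8888, "C:/Endeca/apps/myproject"
)
```

With the sample above, the only problem reported is the deliberately empty hostName on the second host entry.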

How to Install the Pipeline

Installing the pipeline involves removing the wine pipeline and its specific configuration, and replacing it with the new pipeline. To accomplish this:

Delete all the files under /config/pipeline and copy the desired pipeline in its place.

The project name should be the same name prefixing the pipeline files. Make sure the new pipeline files have this name.

There are two options for fixing the names if they do not match:

Use Developer Studio to rename the pipeline. To accomplish this, open the pipeline in Developer Studio and use the ‘Save As…’ functionality to give it a new name.

Alternatively, the template can simply be redeployed using the name of the pipeline as the project name.
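
Before choosing either option, it can help to see which files do not carry the project name. The following is a minimal sketch, assuming the prefix is separated from the rest of the file name by a dot; the function name and the example file names are hypothetical. Some files in a pipeline may legitimately lack the prefix, so treat the result as a list to review rather than a list of errors.

```python
from pathlib import Path


def files_missing_prefix(pipeline_dir, project_name):
    """List files in pipeline_dir whose names do not start with '<project_name>.'."""
    prefix = project_name + "."
    return sorted(
        entry.name
        for entry in Path(pipeline_dir).iterdir()
        if entry.is_file() and not entry.name.startswith(prefix)
    )
```

For example, calling files_missing_prefix("config/pipeline", "myproject") on a directory that still contains a leftover wine-prefixed file would return that file's name.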

Delete the /test_data directory, as that data will no longer be needed.

The wine pipeline uses data provided in a data directory, but many pipelines do not depend on their data source being on the local disk. So to create a generic template, it is important to modify the deployment template not to rely on such data sources. The data that the wine pipeline uses ships with the deployment template under /test_data. When running the provided BaselineUpdate script, this test data is copied into a holding directory, then a flag is set to signal that the data is ready. Only then will the pipeline begin to process it.

In the AppConfig, find the BaselineUpdate script. Remove this if statement and its corresponding braces: if (Forge.isDataReady())

The Forge.isDataReady flag is set by ‘load_baseline_test_data’. This batch file copies the test data from its source directory (/test_data) into the project’s ‘incoming’ directory. None of this is pertinent to a pipeline whose data is not immediately available locally.
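
To make the copy-then-flag handshake concrete, here is a minimal sketch of the general pattern in Python. It is not the template's implementation: a marker file stands in for the flag, and the function names and the marker name are hypothetical.

```python
import shutil
from pathlib import Path

FLAG_NAME = "baseline_data_ready"  # hypothetical marker-file name


def load_baseline_data(source_dir, incoming_dir):
    """Copy every file into incoming_dir, then raise the ready flag last."""
    incoming = Path(incoming_dir)
    incoming.mkdir(parents=True, exist_ok=True)
    for item in Path(source_dir).iterdir():
        if item.is_file():
            shutil.copy2(item, incoming / item.name)
    # The flag is written only after all data is in place.
    (incoming / FLAG_NAME).touch()


def is_data_ready(incoming_dir):
    """Report whether the ready flag has been raised."""
    return (Path(incoming_dir) / FLAG_NAME).exists()


def run_baseline(incoming_dir, process):
    """Process the incoming files only if the flag is set, then clear the flag."""
    if not is_data_ready(incoming_dir):
        return False
    flag = Path(incoming_dir) / FLAG_NAME
    files = sorted(p for p in Path(incoming_dir).iterdir() if p != flag)
    process(files)
    flag.unlink()
    return True
```

Setting the flag last is what prevents the update from picking up a half-copied data set. A pipeline that reads from a database or website has nothing to copy, which is why the check can be dropped in that case.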

At this point, AppConfig is ready and the pipeline is in place. It is time to configure the project in EAC. Configure EAC by running /control/initialize_services.bat

This will set up all the components and scripts in EAC. Once this is done, the project will be accessible in Web Studio. Logging in to Web Studio and clicking on ‘EAC Administration’ will display the configuration as it is described in AppConfig.

Finally, run the baseline script, /control/baseline_update.bat

This will run Forge and Dgidx, and then start the index on the designated MDEX engine or Dgraph (the AppConfig refers to it as a Dgraph, but in EAC it is referred to as the MDEX engine component).

Once the engine is running, the index may be viewed with the provided reference application.

Basics of EAC Scripting

EAC scripts are written in Java (supported by the BeanShell framework) and live in the AppConfig.xml file. When the deployment template initializes or updates EAC with the AppConfig, the provided scripts are included as well. Once a script is in EAC, it can be kicked off from Web Studio or by sending a command from the command prompt. A few scripts are provided ‘out of the box’, and they are a great starting point for writing scripts to meet your specific needs.
EAC scripts can be used for virtually any purpose imaginable. The baseline script, for instance, will archive logs and the index, kick off a full crawl and then restart the engine. Developers can write scripts that are necessary for their project as well.
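
The archiving step is what keeps one run from overwriting the output of the previous one. As a generic illustration of the idea, and not the template's own logic, the sketch below moves a directory aside under a timestamped name and recreates it empty; the function name and naming scheme are assumptions.

```python
import shutil
import time
from pathlib import Path


def archive_directory(live_dir, archive_root, timestamp=None):
    """Move live_dir to archive_root/<name>.<timestamp> and recreate it empty.

    Returns the archive path, or None if live_dir does not exist.
    """
    live = Path(live_dir)
    if not live.exists():
        return None
    stamp = timestamp or time.strftime("%Y%m%d-%H%M%S")
    target = Path(archive_root) / f"{live.name}.{stamp}"
    if target.exists():
        raise FileExistsError(f"archive already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(live), str(target))
    live.mkdir()  # leave a fresh, empty directory for the next run
    return target
```

Calling this on a log directory before each update leaves every earlier run intact under its own timestamp.
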
Consider a project that has a pipeline component which depends on data retrieved from an LDAP server. The developer can write an EAC script that retrieves the data from LDAP and stores it locally. After the data is retrieved, the script can kick off the crawl which processes the LDAP data. Another scenario I’ve personally encountered is an indexing process that depends on the completion of several concurrent processes. EAC can solve this using its provided lock manager, one of the many features of the EAC scripting framework, to synchronize the processes. The concurrent processes each obtain a lock through the lock manager, and the indexing process waits for all of those locks to be released before executing.
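
The synchronization pattern in the second scenario can be sketched in plain Python threads. This is not the EAC lock manager's API; the LockManager class below and its method names are hypothetical stand-ins that only show the shape of the coordination.

```python
import threading


class LockManager:
    """Tracks named locks and lets a caller wait until none are held."""

    def __init__(self):
        self._held = set()
        self._cond = threading.Condition()

    def acquire(self, name):
        with self._cond:
            while name in self._held:
                self._cond.wait()
            self._held.add(name)

    def release(self, name):
        with self._cond:
            self._held.discard(name)
            self._cond.notify_all()

    def wait_until_all_released(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: not self._held, timeout)


def run_feeds_then_index(feed_names, feed_work, index):
    """Run one worker per feed concurrently, then index once all have finished."""
    manager = LockManager()
    for name in feed_names:
        manager.acquire(name)  # taken up front so indexing cannot start early

    def worker(name):
        try:
            feed_work(name)
        finally:
            manager.release(name)  # released even if the feed fails

    threads = [threading.Thread(target=worker, args=(n,)) for n in feed_names]
    for thread in threads:
        thread.start()
    manager.wait_until_all_released()
    index()
    for thread in threads:
        thread.join()
```

Taking every lock before the workers start closes the window in which the indexing step could see no locks held simply because no worker had begun yet.
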
The implementation details of EAC scripts are beyond the scope of this article, but the scripts that are provided out of the box are a great place to start. EAC scripting is the optimal solution for supporting pipelines; its flexibility allows even the most complex problems to be solved. Check back later for an article on EAC scripting.

Conclusion

The deployment template makes it easy to deploy Endeca pipelines quickly and provides excellent configuration management for the pipeline without additional development time. As long as the deployment template is being used, developers need not worry about a new index wiping out an old one or about log files overwriting each other, and configuring archaic control systems is no longer a concern either. The deployment template will quickly become an essential part of any Endeca toolkit after just one use.