This document covers the steps involved in the installation of the
Transparensee Discovery Search Engine and Discovery Data Tool on UNIX variants.
It assumes a basic understanding of UNIX system administration concepts and
tools. Syntax may vary slightly based on your installed UNIX.

Note that the Discovery Data Tool is an optional separate component. It may be
deployed alongside the Discovery Engine on the same server, or may be deployed
separately. In order to ease maintenance and configuration, both the Discovery
Engine and the Discovery Data Tool use similar directory layouts, start/stop
scripts, and configuration files. They are set up for running as UNIX daemons
in the same manner, and can be upgraded in a similar fashion.

If you are interested in upgrading an existing engine, you should be familiar
with this section and refer to Upgrading an Engine for UNIX Variants for specific instructions,
best practices and tips.

Transparensee recommends a simple directory structure, whose top-level directory
we will call the home directory. The home directory includes a read-only archives
subdirectory for the compressed release files; a read-only releases directory
that holds the uncompressed binaries; and a working engines directory with one
subdirectory for each running instance of the engine or data tool. This structure
makes it easier to manage and upgrade server instances at any stage of your
development life cycle.
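
The layout described above can be sketched as a small shell function. This is only an illustration: the function name make_discovery_layout is made up for this example, and the home directory path is whatever you choose.

```shell
# Create the three recommended directories under a home directory.
# make_discovery_layout is a hypothetical helper, not part of the product.
make_discovery_layout() {
    home_dir=$1   # e.g. /opt/discovery
    mkdir -p "$home_dir/archives" "$home_dir/releases" "$home_dir/engines"
}
```

For instance, make_discovery_layout /opt/discovery would create /opt/discovery/archives, /opt/discovery/releases and /opt/discovery/engines.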

The recommended directory structure is designed so that the archives directory
can be considered read-only. It should hold one compressed file for each release
of the Discovery Search Engine and Discovery Data Tool, stored under the name it
was delivered with.

For example, engine version 2.8.3 would be delivered as discovery-2.8.3.zip.
Copy this file to archives/discovery-2.8.3.zip.

Follow the same procedure for subsequent version upgrades you receive from
Transparensee.

The recommended directory structure is designed so that the releases directory,
like the archives directory, can be considered read-only.

There should be one directory for each release of the Discovery Search Engine
and Discovery Data Tool, named after the compressed binary file, that contains
the uncompressed and ready-to-run binaries.

For example, engine version 2.8.3 would be delivered as discovery-2.8.3.zip.
When you unzip this file under releases it will unpack into releases/discovery-2.8.3/.
Similarly, data tool version 1.10 would be delivered as discovery_datatool-1.10.zip, and
when you unzip this file under releases it will unpack into releases/discovery_datatool-1.10/.

Follow the same procedure for subsequent version upgrades you receive from
Transparensee.
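
The archive and unpack steps can be combined in one helper. The sketch below assumes the recommended layout already exists and that the unzip utility is installed; install_release is a hypothetical name chosen for this example.

```shell
# Store a delivered zip file in archives/ and unpack it under releases/.
# install_release is a hypothetical helper; it assumes `unzip` is available.
install_release() {
    home_dir=$1   # e.g. /opt/discovery
    zip_file=$2   # e.g. /tmp/discovery-2.8.3.zip
    archive="$home_dir/archives/$(basename "$zip_file")"
    cp "$zip_file" "$archive" || return 1
    # -q: quiet output; -d: extract into the given directory
    unzip -q "$archive" -d "$home_dir/releases"
}
```

Called as install_release /opt/discovery /tmp/discovery-2.8.3.zip, this would leave the zip in archives and, given that the archive unpacks into its own directory as described above, the binaries in releases/discovery-2.8.3/.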

Each engine release subdirectory under releases contains a bin directory. The
discovery shell script in that directory starts and stops the engine contained
in its releases directory.

Each data tool release subdirectory under releases contains a bin directory. The
discovery_datatool shell script in that directory starts and stops the data tool
contained in its releases directory.

There should be one engine data directory for each running instance of an engine or data tool.
We recommend that these directories be created under the engines directory.

For the Discovery Engine, the directory should contain the instance-specific discovery.properties
file and a symlink to the discovery start/stop script of the version this instance
runs. The engine creates all other folders and directories automatically when it
starts up. The directories that it creates include all of the index definitions,
changeset data and log files.

For the Discovery Data Tool, the directory should contain the instance-specific
datatool.properties file, the discovery_datatool.xml configuration file,
and a symlink to the discovery_datatool start/stop script of the version this
instance runs. The data tool creates all other directories automatically when
it starts up.
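
Setting up an instance directory along these lines might look like the following sketch. The helper name create_instance and its argument order are invented for this example; the properties file is created empty and still has to be filled in, and for a data tool instance the discovery_datatool.xml file must be added separately.

```shell
# Create an instance directory holding a symlink to the release's
# start/stop script and an (initially empty) properties file.
# create_instance is a hypothetical helper, not part of the product.
create_instance() {
    home_dir=$1   # e.g. /opt/discovery
    instance=$2   # e.g. production
    release=$3    # e.g. discovery-2.8.3
    script=$4     # discovery or discovery_datatool
    props=$5      # discovery.properties or datatool.properties
    dir="$home_dir/engines/$instance"
    mkdir -p "$dir" || return 1
    # Relative link: engines/<instance>/ sits two levels below the home directory.
    ln -s "../../releases/$release/bin/$script" "$dir/$script" || return 1
    # Create the properties file only if it does not exist yet.
    [ -e "$dir/$props" ] || : > "$dir/$props"
}
```

An engine instance could then be prepared with create_instance /opt/discovery production discovery-2.8.3 discovery discovery.properties.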

The Discovery Engine shell scripts refer to the releases directory containing the discovery
engine release files as RELEASE_DIR. The directory in engines that contains
the per-instance data and configuration files is referred to as DISCOVERY_DIR.

The Discovery Data Tool shell scripts refer to the releases directory containing the
data tool release files as RELEASE_DIR. The directory in engines that contains
the per-instance data and configuration files is referred to as DATATOOL_DIR.

The discovery.properties file, found in the instance's subdirectory under the
engines directory, holds the configuration for an engine.
The most important configuration settings to determine are the port to listen on
and the amount of memory to allocate to the JVM. For information on other
discovery.properties settings refer to discovery.properties Reference.

The discovery.properties file is a plain text file that can be edited with any
text editor or created from the command line. For example, the engines directory
for a production engine instance could hold a discovery.properties file that
makes the engine listen on port 8090 and allocates no more than 512 MB of RAM.

The datatool.properties file, found in the instance's subdirectory under the
engines directory, holds the configuration for an instance of the data tool.
For basic use of the Discovery Data Tool, default configuration values can be
used. A datatool.properties file must exist even if no properties are set and
the file is empty.

The datatool.properties file is a plain text file that can be edited with any
text editor or created from the command line.

To start an engine instance manually, first change the working directory to the engine
directory of the instance you want to start. Use the discovery symlink to start
the instance (in this example, the production instance):

$ cd ~discovery/engines/production
$ ./discovery start

To stop an engine instance manually, change the working directory to the engine
directory of the instance you want to stop and use the discovery symlink to stop
the instance:

$ cd ~discovery/engines/production
$ ./discovery stop

Starting and stopping a data tool instance is quite similar to working with an
engine. The only difference is that the script is named discovery_datatool.

To start a data tool instance manually, first change the working directory to the engine
directory of the instance you want to start. Use the discovery_datatool symlink to start
the instance (in this example, the production_feed instance):

$ cd ~discovery/engines/production_feed
$ ./discovery_datatool start

To stop a data tool instance manually, change the working directory to the engine
directory of the instance you want to stop and use the discovery_datatool symlink to stop
the instance:

$ cd ~discovery/engines/production_feed
$ ./discovery_datatool stop

The discovery and discovery_datatool scripts are useful when your team is in full-fledged development
mode because they let you easily start, stop and monitor an engine or data tool instance. As
you transition into production mode, you will most likely automate starting these
components when your server starts up. Refer to the next section for more information
about configuring init.d scripts to automatically start instances of the
Discovery Search Engine and Discovery Data Tool.

Running the engine or data tool as root poses an unnecessary security risk.
It is never recommended to run the Discovery components as root, even if you
are going to automatically start them when your server boots. Instead we
recommend that you create a non-privileged user to run the engine and data
tool.
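
One way to guard against accidental root starts is a small wrapper around the start script. This is a sketch only; both function names are invented for this example, and the wrapper assumes the instance directory contains the discovery symlink described earlier.

```shell
# Succeeds only when the given numeric user id is not 0 (root).
is_unprivileged_uid() {
    [ "$1" -ne 0 ]
}

# Start an engine instance unless the current user is root.
# start_engine_as_user is a hypothetical wrapper, not part of the product.
start_engine_as_user() {
    instance_dir=$1   # e.g. ~discovery/engines/production
    if ! is_unprivileged_uid "$(id -u)"; then
        echo "refusing to start the engine as root" >&2
        return 1
    fi
    (cd "$instance_dir" && ./discovery start)
}
```

The check is kept in its own function so it can be reused for a data tool wrapper.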

The engine and data tool can be installed at any location. Unless you configure the
applications to run on restricted ports, they can be run as a
user with basic privileges. Extract the compressed file and change to the newly
created discovery/ directory.

The user that will run the Discovery Search Engine needs to have
write permissions in the discovery/ directory.

The Discovery Search Engine uses your machine’s hostname and resolved IP to
generate unique ids. If your machine’s IP is not resolved remotely
(e.g. by DNS or NIS), make sure it can be resolved locally, either in
/etc/hosts or elsewhere. For example your /etc/hosts may look like
this:

127.0.0.1 localhost
192.168.1.100 myhostname
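
Resolution can be checked before starting the engine. The sketch below assumes the getent utility is available (it is on most Linux systems but not on every UNIX variant); host_resolves is a name made up for this example.

```shell
# Succeeds when the given host name resolves through the system's
# configured sources (for example /etc/hosts, DNS or NIS).
# Assumes `getent` is installed; host_resolves is a hypothetical helper.
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}
```

Running host_resolves "$(hostname)" and checking its exit status shows whether the machine's own name can be resolved.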

If the machine’s host name cannot be resolved, then the engine will not start.
The failure is reported as an exception in the file logs/discovery.log.

Each Discovery Engine and Data Tool release contains an init-script folder with
sample init.d and sysconfig files that can be customized and installed to start
the Discovery Search Engine and Discovery Data Tool at system startup. The
init.d scripts are structured so that basic settings, such as the directory
holding the desired instance’s properties file and the user to run as, can be
configured without modifying the provided init.d script. This makes future
upgrades easy: drop in the updated init.d script, with no local changes to merge.

The init-script folders each contain a standard init script wrapper for its discovery component.

To install the engine init.d script:

Copy init.d/discovery to /etc/init.d/discovery.

Copy sysconfig/discovery file to /etc/sysconfig/discovery.

Edit /etc/sysconfig/discovery and set the RELEASE_DIR variable.
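
The three steps above could be scripted roughly as follows. This is a sketch under assumptions: the sample files are assumed to live in an init-script folder directly under the release directory, the sysconfig sample is assumed to contain a line starting with RELEASE_DIR=, and install_engine_init_script is a hypothetical name. The second argument is normally /etc (which requires root); a scratch directory can be used for a dry run.

```shell
# Copy the sample init.d and sysconfig files from a release and set RELEASE_DIR.
# install_engine_init_script is a hypothetical helper, not part of the product.
install_engine_init_script() {
    release_dir=$1   # e.g. /opt/discovery/releases/discovery-2.8.3
    etc_root=$2      # normally /etc
    src="$release_dir/init-script"   # assumed location of the sample files
    mkdir -p "$etc_root/init.d" "$etc_root/sysconfig" || return 1
    cp "$src/init.d/discovery" "$etc_root/init.d/discovery" || return 1
    chmod 755 "$etc_root/init.d/discovery" || return 1
    # Rewrite the RELEASE_DIR= line while copying the sysconfig file.
    # The path must not contain the characters | or &.
    sed "s|^RELEASE_DIR=.*|RELEASE_DIR=$release_dir|" \
        "$src/sysconfig/discovery" > "$etc_root/sysconfig/discovery"
}
```

The sysconfig file is written through sed rather than edited in place, which avoids the differences between sed implementations on various UNIX systems.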

The default settings require a discovery user. The default run directory
is /opt/discovery/engines/production.

Archived release zip files should be stored in /opt/discovery/archives
and the unzipped releases should be in /opt/discovery/releases.

Once configured you can start/stop the engine by running
/etc/init.d/discovery.

# You may customize the startup of the discovery
# search engine by specifying variables in this
# file, RELEASE_DIR is required
#
# DISCOVERY_USER
# will set what user account the engine is started
# under
# DISCOVERY_DIR
# will set which directory is used for the data
# files and properties files for startup of the
# engine. The default is
# /opt/discovery/engines/production
# RELEASE_DIR
# will set which version of the engine is started
RELEASE_DIR=

# You may customize the startup of the discovery
# data tool by specifying variables in this
# file, RELEASE_DIR is required
#
# DATATOOL_USER
# Which user account to use when running the data tool
#
# DATATOOL_DIR
# The directory used for the configuration
# files for startup of the data tool. The default is
# /opt/discovery/engines/feed
#
# RELEASE_DIR
# Determines which version of the data tool is started
RELEASE_DIR=

If you require multiple engine installs or multiple data tool installs on a
single box, then copy the discovery scripts in /etc/init.d and
/etc/sysconfig. Give each install a different name. Note that each init.d
script references its configuration file under /etc/sysconfig with the variable
CONF_FILE. You will need to update that line in the init.d script to reference
its corresponding configuration file.

File Naming Guidelines

It is recommended that you use a suffix that helps identify which engine or data tool is
starting and synchronize the init.d script name with the configuration file
name.

For example, /etc/init.d/discovery-testing would have the CONF_FILE
variable set to /etc/sysconfig/discovery-testing which would be created from
/etc/sysconfig/discovery and set the correct DISCOVERY_DIR to use.
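
Creating such a second engine install might be sketched as below. Two things are assumed here and should be checked against the actual files: that the init.d script assigns CONF_FILE on a line starting with CONF_FILE=, and that the sysconfig file is read as shell variable assignments, so a DISCOVERY_DIR line appended at the end takes effect. The helper name clone_engine_init_script is invented for this example.

```shell
# Copy the engine init.d and sysconfig files under a suffixed name.
# clone_engine_init_script is a hypothetical helper, not part of the product.
clone_engine_init_script() {
    etc_root=$1       # normally /etc
    suffix=$2         # e.g. testing
    instance_dir=$3   # e.g. /opt/discovery/engines/testing
    name="discovery-$suffix"
    # Point the copied init.d script at its own configuration file
    # (assumes a line that starts with CONF_FILE=).
    sed "s|^CONF_FILE=.*|CONF_FILE=$etc_root/sysconfig/$name|" \
        "$etc_root/init.d/discovery" > "$etc_root/init.d/$name" || return 1
    chmod 755 "$etc_root/init.d/$name" || return 1
    # Copy the configuration and append the instance directory to use.
    cp "$etc_root/sysconfig/discovery" "$etc_root/sysconfig/$name" || return 1
    echo "DISCOVERY_DIR=$instance_dir" >> "$etc_root/sysconfig/$name"
}
```

With /etc as the first argument and testing as the suffix, this would produce /etc/init.d/discovery-testing and /etc/sysconfig/discovery-testing, matching the naming guideline.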

DEBIAN

Debian users should follow the previous instructions but replace all
references to /etc/sysconfig with /etc/default.

Starting the Discovery Engine or Data Tool At System Startup (chkconfig)

The supplied init.d scripts for starting and stopping the discovery
components contain chkconfig and LSB 3.1 compliant comments instructing the
engine or data tool to start at run levels 2, 3, 4 and 5.
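
Registering an installed init.d script is usually done with chkconfig --add on systems that provide chkconfig, or with update-rc.d on Debian-style systems. The sketch below only prints the command that applies on the current machine so it can be reviewed before being run as root; registration_command is a name made up for this example.

```shell
# Print the command that would register an init.d script for boot-time start.
# registration_command is a hypothetical helper; it changes nothing itself.
registration_command() {
    name=$1   # init.d script name, e.g. discovery
    if command -v chkconfig >/dev/null 2>&1; then
        echo "chkconfig --add $name"
    elif command -v update-rc.d >/dev/null 2>&1; then
        echo "update-rc.d $name defaults"
    else
        echo "neither chkconfig nor update-rc.d was found" >&2
        return 1
    fi
}
```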

When using the discovery start/stop script, the engine log file, discovery.log,
will be located in the logs directory of the instance’s subdirectory under engines.

As of version 3.13, the Discovery Engine logs via log4j
and is preconfigured to rotate the log files by size. You can customize logging
by setting the log4j.configuration system property. For more information
on configuring the Discovery engine’s logging, refer to Discovery Log File.

Monitor the output from the discovery.log to ensure the server started successfully.
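
A simple check of the log after startup could look like the sketch below. It only searches for the word Exception, which is an assumption about how failures appear in the log; a clean result does not prove that the engine is fully up. The name check_engine_log is invented for this example.

```shell
# Report whether an instance's logs/discovery.log mentions an exception.
# Returns 0 if none is found, 1 if one is found, 2 if the log is missing.
# check_engine_log is a hypothetical helper, not part of the product.
check_engine_log() {
    log_file="$1/logs/discovery.log"   # $1 is the instance directory
    if [ ! -f "$log_file" ]; then
        echo "no log file at $log_file" >&2
        return 2
    fi
    if grep -q "Exception" "$log_file"; then
        echo "exceptions found in $log_file" >&2
        return 1
    fi
    echo "no exceptions logged in $log_file"
}
```

To follow the log while the engine starts, tail -f logs/discovery.log from the instance directory works as well.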