Supported Platforms

Download and start SMILA

Download the SMILA package matching your operating system and unpack it to an arbitrary folder. This will result in the following folder structure:

/<SMILA>
  /configuration
  ...
  SMILA
  SMILA.ini

Preconditions

To be able to start SMILA, check the following preconditions first:

JRE

You will have to provide a JRE executable to be able to run SMILA. The JVM version should be Java 7 (or newer). You may either:

add the path of your local JRE executable to the PATH environment variable or

add the argument -vm <path/to/jre/executable> right at the top of the file SMILA.ini. Make sure that -vm is indeed the first argument in the file, that there is a line break after it and that there are no leading or trailing blanks. It should look similar to the following:

-vm
d:/java/jre7/bin/java
...
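The rules for the -vm entry (first argument in the file, line break after it, no leading or trailing blanks) can be checked with a small script. The following is only a sketch: check_vm_arg is a hypothetical helper, not something shipped with SMILA, and it checks the layout of the first two lines only, not whether the path actually points to a working JRE.

```shell
# Hypothetical helper (not part of SMILA): verify the -vm entry in an ini file.
# Succeeds only if line 1 is exactly "-vm" and line 2 is a non-empty path
# without leading or trailing whitespace.
check_vm_arg() {
  ini_file=$1
  first_line=$(sed -n '1p' "$ini_file")
  second_line=$(sed -n '2p' "$ini_file")

  # Line 1: nothing but the argument name itself.
  if [ "$first_line" != "-vm" ]; then
    echo "line 1 must be exactly -vm" >&2
    return 1
  fi

  # Line 2: reject an empty line and any leading or trailing whitespace.
  case $second_line in
    '' | [[:space:]]* | *[[:space:]])
      echo "line 2 must be a path without leading or trailing blanks" >&2
      return 1
      ;;
  esac
  return 0
}
```

Assuming you run it from the unpacked SMILA folder, a call such as check_vm_arg ./SMILA.ini returns successfully when the entry is laid out as described and prints a short hint otherwise.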

Linux

When using Linux, make sure that the file SMILA has executable permissions. If not, set the permission by running the following command in a console:

chmod +x ./SMILA

MacOS

When using macOS, switch to SMILA.app/Contents/MacOS/ and set the permission by running the following command in a console:

chmod a+x ./SMILA

Start SMILA

To start SMILA, simply start the SMILA executable.

You can see that SMILA has fully started if the following line is printed on the OSGi console:

Further information: The "indexUpdate" workflow uses the ScriptProcessorWorker that executes the JavaScript "add.js" workflow. So, the synchronous script call is embedded in the asynchronous "indexUpdate" workflow. For more details about the "indexUpdate" workflow and the "indexUpdate" job definition, see SMILA/configuration/org.eclipse.smila.jobmanager/workflows.json and jobs.json. For more information about job management in general, please check the JobManager documentation.

Start the crawl job run

Now that the indexing job is running, we need to push some data to it. There is a predefined job for importing the SMILA Wiki pages, which we are going to start right now.

This starts the job crawlSmilaWiki, which crawls the SMILA Wiki starting with http://wiki.eclipse.org/SMILA and (by applying the configured filters) following only links that have the same prefix. All pages crawled matching this prefix will be pushed to the import job.

Crawling the SMILA Wiki pages will take some time. Once all pages have been processed, the status of the crawlSmilaWiki job run changes to SUCCEEDED. In the meantime, you can continue with the SMILA search (next chapter) to find out whether some of the pages have already made their way into the Solr index.
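Waiting for the SUCCEEDED state can be scripted instead of checked by hand. The following is a minimal sketch, not part of SMILA: wait_until_succeeded is a hypothetical helper, and the command that prints the current job run status is deliberately left as a parameter, because how you query the status depends on your setup. The helper only looks for the word SUCCEEDED in that command's output.

```shell
# Hypothetical helper: wait_until_succeeded INTERVAL MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND repeatedly and succeeds as soon as its output contains
# SUCCEEDED. Gives up with a non-zero status after MAX_TRIES attempts.
wait_until_succeeded() {
  interval=$1
  max_tries=$2
  shift 2

  try=1
  while [ "$try" -le "$max_tries" ]; do
    # COMMAND is an assumption: anything that prints the job run status.
    state=$("$@")
    case $state in
      *SUCCEEDED*) return 0 ;;
    esac
    try=$((try + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, with a status command of your own called print_crawl_status (a placeholder name), wait_until_succeeded 10 60 print_crawl_status would check every 10 seconds for up to 60 attempts.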

Further information: For more information about importing and crawl jobs, please see SMILA Importing. For more information on jobs and tasks in general, visit the JobManager manual.

Search the index

To have a look at the index state, e.g. how many documents are already indexed, call:

There are currently two stylesheets, which you can select by clicking the respective links in the upper left corner of the header bar. The Default stylesheet shows a reduced search form with text fields such as Query, Result Size, and Index, which is adequate for querying the full-text content of the indexed documents. The Advanced stylesheet provides a more detailed search form with additional text fields for metadata search, for example Path, MimeType, Filename, and other document attributes.

You crawled the SMILA Wiki, indexed the pages and searched through them. For more, just continue with the chapter below or visit the SMILA Documentation.

Further steps

Crawl the filesystem

SMILA also has a predefined job to crawl the file system ("crawlFilesystem"), but you will have to either adapt the predefined job to point it to a valid folder in your filesystem or create your own job.

We will settle for the second option, because it does not require you to stop and restart SMILA.

Create your Job

POST the following job description to SMILA's Job API. Adapt the rootFolder parameter to point to an existing folder on your machine where you have placed some files (e.g. plain text, office docs or HTML files). If your path includes backslashes, escape them with an additional backslash, e.g. c:\\data\\files.
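The backslash escaping for rootFolder can be done mechanically. The sketch below is illustrative only: escape_backslashes is a hypothetical helper, and the printed fragment shows just the rootFolder value, not a complete job description. It doubles backslashes and nothing else, so a path containing double quotes would need further escaping.

```shell
# Hypothetical helper: double every backslash so that a Windows path
# can be embedded in a JSON string.
escape_backslashes() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

# Example path taken from the text above; replace it with your own folder.
root_folder='c:\data\files'

# Prints only the rootFolder entry, not a full job description.
printf '"rootFolder": "%s"\n' "$(escape_backslashes "$root_folder")"
```

Paths that use forward slashes pass through unchanged, so the same helper can be applied regardless of platform.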

Search for your new data

After the job run has finished, wait a bit, then check whether the data has been indexed (see Search the index).

It is also a good idea to check the log file for errors.

5 more minutes to change the workflow

The guide "5 more minutes to change the workflow" shows how you can configure the system so that data from different data sources goes through different workflows and scripts and is indexed into different indices.