
Benchmarking By Tasks

Benchmark Lucene using task primitives.

A benchmark is composed of some predefined tasks, allowing for creating an
index, adding documents,
optimizing, searching, generating reports, and more. A benchmark run takes an
"algorithm" file
that contains a description of the sequence of tasks making up the run, and some
properties defining a few
additional characteristics of the benchmark run.

How to use

The easiest way to run a benchmark is to use the predefined ant task:

ant run-task
- would run the micro-standard.alg "algorithm".

ant run-task -Dtask.alg=conf/compound-penalty.alg
- would run the compound-penalty.alg "algorithm".

java org.apache.lucene.benchmark.byTask.programmatic.Sample
- would run a performance test programmatically - without using an alg
file. This is less readable, and less convenient, but possible.

You may find the existing tasks sufficient for defining the benchmark you
need; otherwise, you can extend the framework to meet your needs, as explained
herein.

Each benchmark run has a DocMaker and a QueryMaker. These two should usually
match, so that "meaningful" queries are used for a certain collection.
Properties set at the header of the alg file define which "makers" should be
used. You can also specify your own makers, extending DocMaker and implementing
QueryMaker.

Note: since 2.9, DocMaker is a concrete class which accepts a
ContentSource. In most cases, you can use the DocMaker class to create
Documents, while providing your own ContentSource implementation. For
example, the current Benchmark package includes ContentSource
implementations for TREC, Enwiki and Reuters collections, as well as
others like LineDocSource which reads a 'line' file produced by
WriteLineDocTask.

The benchmark .alg file contains the benchmark "algorithm". The syntax is described
below. Within the algorithm, you can specify groups of commands, assign them
names, specify commands that should be repeated,
do commands in serial or in parallel,
and also control the speed of "firing" the commands.

This allows, for instance, specifying
that an index should be opened for update,
that documents should be added to it one by one but no faster than 20 docs a minute,
and that, in parallel with this,
some N queries should be searched against that index,
again at no more than 2 queries a second.
You can have the searches all share an index reader,
or have each of them open its own reader and close it afterwards.
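For example, the scenario just described might be sketched like this (a hypothetical fragment following the syntax described below, not a tested .alg file):

```
# two serial sequences running in parallel:
# one adds 1000 docs at up to 20 adds/min,
# the other fires 100 queries at up to 2 queries/sec
[
  { AddDoc } : 1000 : 20/min
  { Search } : 100 : 2/sec
]
```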

If the commands available for use in the algorithm do not meet your needs,
you can add commands by adding a new task under
org.apache.lucene.benchmark.byTask.tasks -
you should extend the PerfTask abstract class.
Make sure that your new task class name is suffixed by Task.
For example, adding the class "WonderfulTask" also enables the
command "Wonderful" to be used in the algorithm.
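A minimal sketch of such a task might look like the following (a sketch only, assuming the Lucene benchmark classes are on the classpath; doLogic() is the method PerfTask subclasses override to do the actual work):

```java
package org.apache.lucene.benchmark.byTask.tasks;

import org.apache.lucene.benchmark.byTask.PerfRunData;

// Sketch only: this class enables the command "Wonderful" in .alg files.
public class WonderfulTask extends PerfTask {

  public WonderfulTask(PerfRunData runData) {
    super(runData);
  }

  @Override
  public int doLogic() throws Exception {
    // perform the task's work here; the returned value contributes
    // to the records count (see "Disable Counting" below)
    return 1;
  }
}
```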

External classes: It is sometimes useful to invoke the benchmark
package with an external alg file that configures the use of your own
doc/query maker and/or HTML parser. You can do this without
modifying the benchmark package code, by passing your class path
with the benchmark.ext.classpath property:
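For example (the alg file name and class path shown are hypothetical):

```
ant run-task -Dtask.alg=conf/my-external.alg -Dbenchmark.ext.classpath=/mydir/classes
```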

External tasks: When writing your own tasks under a package other than
org.apache.lucene.benchmark.byTask.tasks, specify that package through the
alt.tasks.packages property.

Benchmark "algorithm"

The following is an informal description of the supported syntax.

Measuring: When a command is executed, statistics for the elapsed
execution time and memory consumption are collected.
At any time, those statistics can be printed, using one of the
available ReportTasks.

Comments start with '#'.

Serial sequences are enclosed within '{ }'.

Parallel sequences are enclosed within
'[ ]'

Sequence naming: To name a sequence, put
'"name"' just after
'{' or '['.
Example - { "ManyAdds" AddDoc } : 1000000 -
would
name the sequence of 1M add docs "ManyAdds", and this name would later appear
in statistic reports.
If you don't specify a name for a sequence, it is given one: you can see it as
the algorithm is printed just before benchmark execution starts.

Repeating:
To repeat sequence tasks N times, add ': N' just
after the
sequence closing tag - '}' or
']' or '>'.
Example - [ AddDoc ] : 4 - would do 4 addDoc
in parallel, spawning 4 threads at once.
Example - [ AddDoc AddDoc ] : 4 - would do
8 addDoc in parallel, spawning 8 threads at once.
Example - { AddDoc } : 30 - would do addDoc
30 times in a row.
Example - { AddDoc AddDoc } : 30 - would do
addDoc 60 times in a row.
Exhaustive repeating: use * instead of
a number to repeat exhaustively.
This is sometimes useful for adding as many documents as a doc maker can create,
without iterating over the same file again, especially when the exact
number of documents is not known in advance - for instance, TREC files extracted
from a zip file. Note: when using this, you must also set
doc.maker.forever to false.
Example - { AddDoc } : * - would add docs
until the doc maker is "exhausted".

Command parameter: a command can optionally take a single parameter.
If a command does not support a parameter, or if the parameter is of
the wrong type,
reading the algorithm fails with an exception and the test does not start.
Currently the following tasks take optional parameters:

AddDoc takes a numeric parameter, indicating the required size of the
added document. Note: if the DocMaker implementation used in the test
does not support makeDoc(size), an exception would be thrown and the test
would fail.

DeleteDoc takes a numeric parameter, indicating the docid to be
deleted. The latter is not very useful for loops, since the docid is
fixed, so for deletion in loops it is better to use the
doc.delete.step property.

SetProp takes a mandatory name,value parameter,
with ',' used as the separator.

SearchTravRetTask and SearchTravTask take a numeric
parameter, indicating the required traversal size.

SearchTravRetLoadFieldSelectorTask takes a string
parameter: a comma separated list of Fields to load.

SearchTravRetHighlighterTask takes a string
parameter: a comma separated list of parameters to define highlighting. See that
task's javadocs for more information.

Example - AddDoc(2000) - would add a document
of size 2000 (~bytes).
See conf/task-sample.alg for how this can be used, for instance, to check
which is faster, adding
many smaller documents, or few larger documents.
Next candidates for supporting a parameter may be the Search tasks,
for controlling the query size.

Statistic recording elimination: a sequence can also end with
'>',
in which case child tasks would not store their statistics.
This can be useful to avoid exploding stats data, for adding say 1M docs.
Example - { "ManyAdds" AddDoc > : 1000000 -
would add a million docs and measure that total, but not save stats for each addDoc.
Notice that the granularity of System.currentTimeMillis() (which is used
here) is system dependent,
and on some systems an operation that takes 5 ms to complete may show 0 ms
latency time in performance measurements.
Therefore it is sometimes more accurate to look at the elapsed time of a larger
sequence, as demonstrated here.

Rate:
To set a rate (ops/sec or ops/min) for a sequence, add
': N : R' just after sequence closing tag.
This would specify repetition of N with rate of R operations/sec.
Use 'R/sec' or
'R/min'
to explicitly specify that the rate is per second or per minute.
The default is per second.
Example - [ AddDoc ] : 400 : 3 - would do 400
addDoc in parallel, starting up to 3 threads per second.
Example - { AddDoc } : 100 : 200/min - would
do 100 addDoc serially,
waiting before starting next add, if otherwise rate would exceed 200 adds/min.

Disable Counting: Each task executed contributes to the records count.
This count is reflected in reports under recs/s and under recsPerRun.
Most tasks count 1, some count 0, and some count more.
(See Results record counting clarified for more details.)
It is possible to disable counting for a task by preceding it with -.
Example - -CreateIndex - would count 0 while
the default behavior for CreateIndex is to count 1.

Command names: Each class "AnyNameTask" in the
package org.apache.lucene.benchmark.byTask.tasks,
that extends PerfTask, is supported as command "AnyName" that can be
used in the benchmark "algorithm" description.
This makes it possible to add new commands by just adding such classes.

Supported tasks/commands

Existing tasks can be divided into a few groups:
regular index/search work tasks, report tasks, and control tasks.

Report tasks: There are a few Report commands for generating reports.
Only task runs that were completed are reported.
(The 'Report tasks' themselves are not measured and not reported.)

RepAll - all (completed) task runs.

RepSumByName - all statistics,
aggregated by name. So, if AddDoc was executed 2000 times,
only 1 report line would be created for it, aggregating all those
2000 statistic records.

RepSelectByPref prefixWord - all
records for tasks whose name starts with
prefixWord.

RepSumByPref prefixWord - all
records for tasks whose name starts with
prefixWord,
aggregated by their full task name.

RepSumByNameRound - all statistics,
aggregated by name and by Round.
So, if AddDoc was executed 2000 times in each of 3
rounds, 3 report lines would be
created for it,
aggregating all those 2000 statistic records in each round.
See more about rounds in the NewRound
command description below.

RepSumByPrefRound prefixWord -
similar to RepSumByNameRound,
just that only tasks whose name starts with
prefixWord are included.

If needed, additional reports can be added by extending the abstract class
ReportTask, and by
manipulating the statistics data in Points and TaskStats.

Control tasks: A few of the tasks control the overall benchmark
algorithm:

ClearStats - clears all statistics collected so far.
Further reports will only include task runs that start after this
call.

NewRound - virtually starts a new round of the
performance test.
Although this command can be placed anywhere, it mostly makes sense at
the end of an outermost sequence.
It increments a global "round counter". All task runs that
start from this point
record the new, updated round counter as their round number.
This appears in reports.
In particular, see RepSumByNameRound above.
An additional effect of NewRound is that numeric and boolean
properties defined (at the head
of the .alg file) as a sequence of values, e.g.
merge.factor=mrg:10:100:10:100, would
increment (cyclically) to the next value.
Note: this is also reflected in the reports, in this case under a
column named "mrg".

ResetInputs - DocMaker and the
various QueryMakers
reset their counters to the start.
The way these Maker interfaces work, each call to makeDocument()
or makeQuery() creates the next document or query
that it "knows" to create.
If that pool is "exhausted", the "maker" starts over again.
The ResetInputs command
thus makes the rounds comparable.
It is therefore useful to invoke ResetInputs together with NewRound.
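A common pattern (a hedged sketch following the syntax described above) invokes both at the end of each round:

```
# run the same work for 4 comparable rounds
{ "Rounds"
    { "AddDocs" AddDoc > : 200
    ResetInputs
    NewRound
} : 4
```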

ResetSystemErase - resets all index
and input data and calls gc.
Does NOT reset statistics. This includes ResetInputs.
All writers/readers are nullified, deleted, closed.
The index is erased.
The directory is erased.
You would have to call CreateIndex once this was called...

ResetSystemSoft - resets all
index and input data and calls gc.
Does NOT reset statistics. This includes ResetInputs.
All writers/readers are nullified, closed.
The index is NOT erased.
The directory is NOT erased.
This is useful for testing performance on an existing index,
for instance if the construction of a large index
took a very long time and now you would like to test
its search or update performance.

Other existing tasks are quite straightforward and are
only briefly described here.

CreateIndex and
OpenIndex both leave the
index open for later update operations.
CloseIndex would close it.

OpenReader, similarly, would
leave an index reader open for later search operations.
But this has further semantics.
If a Read operation is performed, and an open reader exists,
it would be used.
Otherwise, the read operation would open its own reader
and close it when the read operation is done.
This allows testing various scenarios - sharing a reader,
searching with "cold" reader, with "warmed" reader, etc.
The read operations affected by this are:
Warm,
Search,
SearchTrav (search and traverse),
and SearchTravRet (search
and traverse and retrieve).
Notice that each of the 3 search task types maintains
its own queryMaker instance.

CommitIndex and
Optimize can be used to commit
changes to the index and/or optimize the index created thus
far.

WriteLineDoc prepares a 'line'
file where each line holds a document with title,
date and body elements, separated by [TAB].
A line file is useful if one wants to measure pure indexing
performance, without the overhead of parsing the data.
You can use LineDocSource as a ContentSource over a 'line'
file.
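For instance, a sketch of an .alg fragment that writes a line file exhaustively (the output-file property name is an assumption; check WriteLineDocTask's javadocs):

```
line.file.out=work/docs.line.txt
doc.maker.forever=false

# write docs until the content source is exhausted
{ WriteLineDoc } : *
```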

ConsumeContentSource consumes
a ContentSource. Useful, for example, for testing a ContentSource's
performance without the overhead of preparing a Document
from it.

Benchmark properties

Properties are read from the header of the .alg file, and
define several parameters of the performance test.
As mentioned above for the NewRound task,
numeric and boolean properties that are defined as a sequence
of values, e.g. merge.factor=mrg:10:100:10:100
would increment (cyclic) to the next value,
when NewRound is called, and would also
appear as a named column in the reports (column
name would be "mrg" in this example).
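A hedged sketch of such a header (the property values shown are illustrative, not a recommended configuration):

```
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
merge.factor=mrg:10:100:10:100
compound=true

# ... the algorithm itself follows the header ...
```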

Some of the currently defined properties are:

analyzer - full
class name of the analyzer to use.
The same analyzer is used for the entire test.

directory - tells which Directory implementation to use
for the performance test, e.g. FSDirectory or RAMDirectory.

Index work parameters:
Multi int/boolean values are iterated with calls to NewRound.
They are also added as columns in the reports; the first string in the
sequence is the column name.
(Make sure it is no shorter than any value in the sequence.)

max.buffered - Example: max.buffered=buf:10:10:100:100 -
this would define using maxBufferedDocs of 10 in iterations 0 and 1,
and 100 in iterations 2 and 3.

merge.factor - which
merge factor to use.

compound - whether the index is
using the compound format or not. Valid values are "true" and "false".

alt.tasks.packages
- comma separated list of additional packages where tasks classes will be looked for
when not found in the default package (that of PerfTask). If the same task class
appears in more than one package, the package indicated first in this list will be used.

Results record counting clarified

Two columns in the results table indicate records counts: records-per-run and
records-per-second. What does it mean?

Almost every task gets 1 in this count just for being executed.
Task sequences aggregate the counts of their child tasks,
plus their own count of 1.
So, a task sequence containing 5 other task sequences, each running a single
other task 10 times, would have a count of 1 + 5 * (1 + 10) = 56.

The traverse and retrieve tasks "count" more: a traverse task
would add 1 for each traversed result (hit), and a retrieve task would
additionally add 1 for each retrieved doc. So, regular Search would
count 1, SearchTrav that traverses 10 hits would count 11, and a
SearchTravRet task that retrieves (and traverses) 10, would count 21.

Confusing? This might help: always examine the elapsedSec column,
and always compare "apples to apples", i.e. it is interesting to check how the
rec/s changed for the same task (or sequence) between two
different runs, but it is not very useful to know how the rec/s
differs between Search and SearchTrav tasks. For
the latter, elapsedSec would bring more insight.