Class Builder<T>

Builds a minimal FST (maps an IntsRef term to an arbitrary
output) from pre-sorted terms with outputs. The FST
becomes an FSA if you use NoOutputs. The FST is written
on-the-fly into a compact, serialized byte array, which can
be saved to / loaded from a Directory or used directly
for traversal. The FST is always finite (no cycles).

NOTE: The algorithm is described at
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

The parameterized type T is the output type. See the
subclasses of Outputs.

WARNING: This API is experimental and might change in incompatible ways in the next release.

minSuffixCount1 - If pruning the input graph during construction, this threshold determines
whether a node is kept or pruned. If transition_count(node) >= minSuffixCount1, the node
is kept.
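
The threshold rule above can be illustrated with a small standalone sketch. It is not Lucene's Builder code: the class PruneSketch and its methods are hypothetical, and it assumes that transition_count(node) means the number of input terms that pass through the node, with each node identified by the prefix that leads to it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Lucene's Builder: prunes prefixes by how many terms reach them. */
public class PruneSketch {

    /** Counts, for every non-empty prefix, how many input terms pass through it. */
    static Map<String, Integer> prefixCounts(List<String> sortedTerms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : sortedTerms) {
            for (int end = 1; end <= term.length(); end++) {
                counts.merge(term.substring(0, end), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only prefixes whose count is >= minSuffixCount1 (assumed meaning of the threshold). */
    static Map<String, Integer> keptPrefixes(List<String> sortedTerms, int minSuffixCount1) {
        Map<String, Integer> kept = new TreeMap<>(prefixCounts(sortedTerms));
        kept.values().removeIf(count -> count < minSuffixCount1);
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("station", "stop", "stove", "top");
        // With a threshold of 2, only prefixes shared by at least two terms survive.
        System.out.println(keptPrefixes(terms, 2).keySet());
    }
}
```

With a threshold of 2, only the prefixes shared by at least two of the four terms remain. The real builder prunes incrementally while it consumes sorted input, so this shows only the keep-or-prune comparison.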

minSuffixCount2 - (Note: only Mike McCandless knows what this one is really doing...)

doShareSuffix - If true, shared suffixes are compacted into unique paths.
This requires an additional in-memory hash map for lookups. Setting this parameter to
false gives each input sequence its own suffix path (no suffix sharing). This results in a larger
graph, but may require less memory and will speed up construction.
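
To make the size trade-off concrete, here is a minimal standalone sketch of suffix sharing through a hash map. It is not Lucene's implementation: the class SuffixSharingSketch and its methods are hypothetical. It builds an ordinary trie first and merges equivalent subtrees afterwards, whereas the real builder does this on-the-fly over sorted input.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Lucene's Builder: counts states with and without suffix sharing. */
public class SuffixSharingSketch {

    /** A trie node; children are kept sorted by label so signatures are deterministic. */
    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean isFinal;
    }

    /** Builds a plain trie: prefixes are shared, suffixes are not. */
    static Node buildTrie(List<String> sortedTerms) {
        Node root = new Node();
        for (String term : sortedTerms) {
            Node node = root;
            for (char c : term.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isFinal = true;
        }
        return root;
    }

    /** Number of states when every suffix keeps its own path. */
    static int countStates(Node node) {
        int count = 1;
        for (Node child : node.children.values()) {
            count += countStates(child);
        }
        return count;
    }

    /** Returns a canonical id for the subtree at node; equivalent subtrees get the same id. */
    static int canonicalId(Node node, Map<String, Integer> registry) {
        StringBuilder signature = new StringBuilder(node.isFinal ? "F" : "N");
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            signature.append(e.getKey()).append(':')
                     .append(canonicalId(e.getValue(), registry)).append(';');
        }
        String key = signature.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    /** Number of states after merging equivalent subtrees via the hash map. */
    static int countSharedStates(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        Node trie = buildTrie(List.of("mop", "pop", "top"));
        System.out.println("without sharing: " + countStates(trie));    // 10
        System.out.println("with sharing: " + countSharedStates(trie)); // 4
    }
}
```

For the terms mop, pop and top, the unshared trie has 10 states, while merging the common "op" suffix leaves 4. The registry map plays the role of the extra in-memory hash map that suffix sharing requires.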

doShareNonSingletonNodes - Only used if doShareSuffix is true. Set this to
true to ensure FST is fully minimal, at cost of more
CPU and more RAM during building.

shareMaxTailLength - Only used if doShareSuffix is true. Set this to
Integer.MAX_VALUE to ensure FST is fully minimal, at cost of more
CPU and more RAM during building.

willPackFST - Pass true if you will pack the FST before saving. This
causes the FST to create additional data structures internally to facilitate packing, but
it means the resulting FST cannot be saved: it must
first be packed using FST.pack(int, int).

Method Detail

getTotStateCount

public int getTotStateCount()

getTermCount

public long getTermCount()

getMappedStateCount

public int getMappedStateCount()

setAllowArrayArcs

public void setAllowArrayArcs(boolean b)

Pass false to disable the array arc optimization
while building the FST; this will make the resulting
FST smaller but slower to traverse.