lucene-dev mailing list archives

The thing you're describing is a regular composition of automata (as is done,
for example, when composing clauses of a regular expression). If I recall
correctly, the Levenshtein automaton in Lucene is built on modified brics
code... if so, then this should not be a problem. The problem may be that
automata are currently used in enums in a way that skips from one accepted
sequence to the next accepted sequence (where possible). If the automaton has *
operators, then there is no way to establish these skips and everything falls
back to a full matching strategy.
Dawid
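A minimal sketch of the concatenation described above, under toy assumptions: plain Java with no Lucene classes, a hand-built DFA for (AB)|(BA), and a textbook nondeterministic Levenshtein acceptor standing in for Lucene's LevenshteinAutomata. All names here (ConcatSketch, Dfa, Lev, abOrBa, acceptsConcat) are hypothetical. The concatenation itself is just an epsilon edge from every accepting prefix state into the Levenshtein start state, simulated as a set of live states.

```java
import java.util.*;

/** Toy concatenation of a prefix DFA with a Levenshtein NFA (no Lucene classes; names hypothetical). */
public class ConcatSketch {
    /** Tiny DFA: state 0 is the start state; missing edges lead to an implicit dead state. */
    static final class Dfa {
        final List<Map<Character, Integer>> delta = new ArrayList<>();
        final Set<Integer> accept = new HashSet<>();
        int addState(boolean accepting) {
            delta.add(new HashMap<>());
            if (accepting) accept.add(delta.size() - 1);
            return delta.size() - 1;
        }
        void addEdge(int from, char c, int to) { delta.get(from).put(c, to); }
    }

    /** Hand-built DFA for the regex (AB)|(BA). */
    static Dfa abOrBa() {
        Dfa d = new Dfa();
        int s0 = d.addState(false), s1 = d.addState(false), s2 = d.addState(false), s3 = d.addState(true);
        d.addEdge(s0, 'A', s1); d.addEdge(s1, 'B', s3);
        d.addEdge(s0, 'B', s2); d.addEdge(s2, 'A', s3);
        return d;
    }

    /** Textbook Levenshtein NFA: state (i, e) = pattern chars consumed, edits used. */
    static final class Lev {
        final String p; final int k;
        Lev(String p, int k) { this.p = p; this.k = k; }
        int id(int i, int e) { return i * (k + 1) + e; }
        int pos(int s) { return s / (k + 1); }
        int edits(int s) { return s % (k + 1); }

        /** Epsilon closure: skipping (deleting) a pattern char costs one edit, consumes no input. */
        Set<Integer> closure(Set<Integer> states) {
            Set<Integer> out = new HashSet<>(states);
            Deque<Integer> work = new ArrayDeque<>(states);
            while (!work.isEmpty()) {
                int s = work.pop();
                if (pos(s) < p.length() && edits(s) < k) {
                    int t = id(pos(s) + 1, edits(s) + 1);
                    if (out.add(t)) work.push(t);
                }
            }
            return out;
        }
        Set<Integer> start() { return closure(Set.of(id(0, 0))); }

        Set<Integer> step(Set<Integer> states, char c) {
            Set<Integer> next = new HashSet<>();
            for (int s : states) {
                int i = pos(s), e = edits(s);
                if (i < p.length() && p.charAt(i) == c) next.add(id(i + 1, e)); // match
                if (e < k) {
                    next.add(id(i, e + 1));                                      // insert c
                    if (i < p.length()) next.add(id(i + 1, e + 1));              // substitute
                }
            }
            return closure(next);
        }
        boolean accepts(Set<Integer> states) {
            for (int s : states) if (pos(s) == p.length()) return true;
            return false;
        }
    }

    /** Accepts w iff w = u + v, u accepted by prefix, v within distance k of lev.p. */
    static boolean acceptsConcat(Dfa prefix, Lev lev, String w) {
        Set<Integer> pfx = new HashSet<>(Set.of(0));
        Set<Integer> suf = new HashSet<>();
        // Epsilon edge from every accepting prefix state into the Levenshtein start state.
        if (prefix.accept.contains(0)) suf.addAll(lev.start());
        for (char c : w.toCharArray()) {
            Set<Integer> nextPfx = new HashSet<>();
            for (int s : pfx) {
                Integer t = prefix.delta.get(s).get(c);
                if (t != null) nextPfx.add(t);
            }
            Set<Integer> nextSuf = lev.step(suf, c);
            for (int s : nextPfx) if (prefix.accept.contains(s)) nextSuf.addAll(lev.start());
            pfx = nextPfx;
            suf = nextSuf;
        }
        return lev.accepts(suf);
    }

    public static void main(String[] args) {
        Dfa pfx = abOrBa();
        Lev lev = new Lev("XYZ", 1);
        for (String term : new String[] {"ABXY", "BAXQZ", "AAXYZ", "ABXQQ"}) {
            System.out.println(term + " -> " + acceptsConcat(pfx, lev, term));
        }
    }
}
```

Running main prints true for ABXY (prefix AB, then XY is one deletion away from XYZ) and BAXQZ (one substitution), and false for AAXYZ (prefix rejected) and ABXQQ (two edits). The sketch only tests membership; it does not model the seek-based term enumeration mentioned above.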
On Wed, Aug 10, 2011 at 10:54 AM, eks dev <eksdev@yahoo.co.uk> wrote:
>
> Hi Robert, Mike & other FS(A|T) gurus,
>
> a challenge for you ;)
>
> Would it be possible to combine these brilliant pieces of functionality
> with a normal Automaton somehow...
>
> Example to illustrate.
> DirectSpellChecker:
> - where, instead of minPrefix, we would specify a Regex (or another Automaton)
> pfxAutomaton = Regex("(AB)|(BA)") // for example
> levAutomaton = LevenshteinAutomata("XYZ")
>
> spell(pfxAutomaton, levAutomaton);
>
> would match terms that start with "AB" or "BA" and whose suffix is a normal
> edit-distance match, e.g. ABXY (one deletion).
> This would support wild things, like "enable only transpositions in the first
> three characters"... To get these matches today, you need to build a
> Lev. automaton with maxDistance = 2 (which is then a HUGE space to search
> without a prefix)... or generate several Lev. automata and take the union of
> the results (expensive to iterate).
>
> Other good use cases are simple to construct...
>
> The most general question: can we support at least concatenation between
> LevenshteinAutomata and a normal Automaton? Would intersection/union be a
> crazy thing as well? Where we would have:
> FilteringAutomata.intersect(LevenshteinAutomata)... but I guess I am
> dreaming with this one. Concatenation sounds doable, though (at least on the
> prefix side).
>
> Cheers,
> Eks
>
>
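For the intersection question, a similarly hedged sketch of the standard product construction, run on the fly: a filter DFA for (AB|BA).* and a Levenshtein DFA (each state is one dynamic-programming row, capped at k + 1 so the state space stays finite) consume every character in lockstep, and a term is accepted only if both accept. This is plain Java, not Lucene's API; IntersectSketch, filterStep, levStep and acceptsIntersection are hypothetical names.

```java
import java.util.Arrays;

/** Toy intersection (product construction) of a filter DFA and a Levenshtein DFA. */
public class IntersectSketch {
    /** Hypothetical filter DFA for (AB|BA).* ; returns -1 for the dead state. */
    static int filterStep(int s, char c) {
        switch (s) {
            case 0: return c == 'A' ? 1 : c == 'B' ? 2 : -1;
            case 1: return c == 'B' ? 3 : -1;
            case 2: return c == 'A' ? 3 : -1;
            case 3: return 3;               // after the prefix, anything goes
            default: return -1;
        }
    }
    static boolean filterAccepts(int s) { return s == 3; }

    /** Levenshtein DFA start state: distances from the empty input, capped at k + 1. */
    static int[] levStart(String p, int k) {
        int[] row = new int[p.length() + 1];
        for (int j = 0; j < row.length; j++) row[j] = Math.min(j, k + 1);
        return row;
    }
    /** One Levenshtein DFA transition: the next dynamic-programming row. */
    static int[] levStep(int[] row, String p, int k, char c) {
        int[] next = new int[row.length];
        next[0] = Math.min(row[0] + 1, k + 1);
        for (int j = 1; j < row.length; j++) {
            int cost = p.charAt(j - 1) == c ? 0 : 1;
            int best = Math.min(row[j - 1] + cost, Math.min(row[j] + 1, next[j - 1] + 1));
            next[j] = Math.min(best, k + 1);   // capping keeps the state space finite
        }
        return next;
    }
    static boolean levAccepts(int[] row, int k) { return row[row.length - 1] <= k; }
    /** Row minima never decrease, so once every entry exceeds k no extension can match. */
    static boolean levDead(int[] row, int k) { return Arrays.stream(row).min().getAsInt() > k; }

    /** Product state (filter state, row); dead as soon as either component is dead. */
    static boolean acceptsIntersection(String p, int k, String w) {
        int f = 0;
        int[] row = levStart(p, k);
        for (char c : w.toCharArray()) {
            f = filterStep(f, c);
            row = levStep(row, p, k, c);
            if (f < 0 || levDead(row, k)) return false;
        }
        return filterAccepts(f) && levAccepts(row, k);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"ABXY", "ABXQZ", "AXYZ", "BAXYZ"}) {
            System.out.println(term + " -> " + acceptsIntersection("ABXYZ", 1, term));
        }
    }
}
```

With pattern ABXYZ and k = 1, this accepts ABXY and ABXQZ, rejects AXYZ (within one edit, but the filter fails) and rejects BAXYZ, because plain Levenshtein distance counts the swapped AB as two edits.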