-- |-- Module : Data.ByteString.Search.BoyerMoore-- Copyright : Daniel Fischer-- Chris Kuklewicz-- Licence : BSD3-- Maintainer : Daniel Fischer <daniel.is.fischer@googlemail.com>-- Stability : Provisional-- Portability : non-portable (BangPatterns)---- Fast overlapping Boyer-Moore search of both strict and lazy-- 'S.ByteString' values.---- Descriptions of the algorithm can be found at-- <http://www-igm.univ-mlv.fr/~lecroq/string/node14.html#SECTION00140>-- and-- <http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm>---- Original authors: Daniel Fischer (daniel.is.fischer at googlemail.com) and-- Chris Kuklewicz (haskell at list.mightyreason.com).moduleData.ByteString.Search.BoyerMoore{-# DEPRECATED "Use the new interface instead" #-}(-- * Overview-- $overview-- ** Changes-- $changes-- ** Deprecation-- $deprecation-- ** Parameter and return types-- $types-- ** Lazy ByteStrings-- $lazy-- ** Performance-- $performance-- ** Complexity-- $complexity-- ** Partial application-- $currying-- ** Integer overflow-- $overflow-- * FunctionsmatchLL,matchLS,matchSL,matchSS)whereimportData.ByteString.Search.Internal.BoyerMoore(matchLS,matchSS)importData.ByteString.Lazy.Search.Internal.BoyerMoore(matchLL,matchSL)-- $overview---- This module exists only for backwards compatibility. Nevertheless-- there have been small changes in the behaviour of the functions.-- The module exports four search functions: 'matchLL', 'matchLS',-- 'matchSL', and 'matchSS'. All of them return the list of all-- starting positions of possibly overlapping occurrences of a pattern-- in a string.-- $changes---- Formerly, all four functions returned an empty list when passed-- an empty pattern. Now, in accordance with the functions from the other-- modules, @matchXY \"\" target = [0 .. 'length' target]@.-- $deprecation---- This module is /deprecated/. You should use the new interface provided-- in "Data.ByteString.Search" resp. "Data.ByteString.Lazy.Search".-- $types---- The first parameter is always the pattern string. The second-- parameter is always the target string to be searched. The returned-- list contains the offsets of all /overlapping/ patterns.---- A returned @Int@ or @Int64@ is an index into the target string-- which is aligned to the head of the pattern string. Strict targets-- return @Int@ indices and lazy targets return @Int64@ indices. All-- returned lists are computed and returned in a lazy fashion.-- $lazy---- 'matchLL' and 'matchLS' take lazy bytestrings as patterns. For-- performance, if the pattern is not a single strict chunk then all-- the the pattern chunks will copied into a concatenated strict-- bytestring. This limits the patterns to a length of (maxBound ::-- Int).---- 'matchLL' and 'matchSL' take lazy bytestrings as targets.-- These are written so that while they work they will not retain a-- reference to all the earlier parts of the the lazy bytestring.-- This means the garbage collector would be able to keep only a small-- amount of the target string and free the rest.-- $currying---- These functions can all be usefully partially applied.-- Given only a pattern the partially applied version will compute-- the supporting lookup tables only once, allowing for efficient re-use.-- Similarly, the partially applied 'matchLL' and 'matchLS' will compute-- the concatenated pattern only once.-- $performance---- In general, the Boyer-Moore algorithm is the most efficient method to-- search for a pattern inside a string. The advantage over other algorithms-- (e.g. Na&#239;ve, Knuth-Morris-Pratt, Horspool, Sunday) can be made-- arbitrarily large for specially selected patterns and targets, but-- usually, it's a factor of 2&#8211;3 versus Knuth-Morris-Pratt and of-- 6&#8211;10 versus the na&#239;ve algorithm. The Horspool and Sunday-- algorithms, which are simplified variants of the Boyer-Moore algorithm,-- typically have performance between Boyer-Moore and Knuth-Morris-Pratt,-- mostly closer to Boyer-Moore. The advantage of the Boyer-moore variants-- over other algorithms generally becomes larger for longer patterns. For-- very short patterns (or patterns with a very short period), other-- algorithms, e.g. "Data.ByteString.Search.DFA" can be faster (my-- tests suggest that \"very short\" means two, maybe three bytes).---- In general, searching in a strict 'S.ByteString' is slightly faster-- than searching in a lazy 'L.ByteString', but for long targets the-- smaller memory footprint of lazy 'L.ByteStrings' can make searching-- those (sometimes much) faster. On the other hand, there are cases-- where searching in a strict target is much faster, even for long targets.---- On 32-bit systems, 'Int'-arithmetic is much faster than 'Int64'-arithmetic,-- so when there are many matches, that can make a significant difference.---- Also, the modification to ameliorate the case of periodic patterns-- is defeated by chunk-boundaries, so long patterns with a short period-- and many matches exhibit poor behaviour (consider using @indices@ from-- "Data.ByteString.Lazy.Search.DFA" or "Data.ByteString.Lazy.Search.KMP"-- in those cases, the former for medium-length patterns, the latter for-- long patterns; only 'matchLL' and 'matchSL' suffer from-- this problem, though).-- $complexity---- Preprocessing the pattern string is O(@patternLength@). The search-- performance is O(@targetLength@\/@patternLength@) in the best case,-- allowing it to go faster than a Knuth-Morris-Pratt algorithm. With-- a non-periodic pattern the worst case uses O(3\*@targetLength@)-- comparisons. The periodic pattern worst case is quadratic-- O(@targetLength@\*@patternLength@) complexity for the original-- Boyer-Moore algorithm.---- The searching functions in this module contain a modification which-- drastically improves the performance for periodic patterns.-- I believe that for strict target strings, the worst case is now-- /O/(@targetLength@) also for periodic patterns and for lazy target-- strings, my semi-educated guess is-- /O/(@targetLength@ * (1 + @patternLength@ \/ @chunkSize@)).-- $overflow---- The current code uses @Int@ to keep track of the locations in the-- target string. If the length of the pattern plus the length of any-- strict chunk of the target string is greater or equal to-- @'maxBound'::Int@ then this will overflow causing an error. We try-- to detect this and call 'error' before a segfault occurs.