SWISH::Prog tries to fill a niche similar to Data::SearchEngine or DBI: providing a uniform and flexible interface to several different search engine tools and libraries.

SWISH::Prog does not try to replace the underlying search engine tools. Instead it tries to fill in some usability gaps and, like DBI, make it relatively easy to switch between backend tools without rewriting an entire codebase.

SWISH::Prog implements all five basic components of a search application:

Gather a document collection. A collection might be a group of HTML pages, or XML documents, or rows in a database. A collection might originate from the web, a filesystem, a database, an email inbox, or anywhere bytes are stored. An Aggregator gathers those documents in a uniform way.

SWISH::Prog provides a variety of Aggregators, including ones for walking filesystems, reading email trees, spidering the web, and pulling rows from databases. See SWISH::Prog::Aggregator and its subclasses.

With the exception of the Native classes, SWISH::Prog uses SWISH::3 to parse HTML and XML documents (the most common normalized format for SWISH::Filter), and then delegates further analysis (tokenization, etc.) to backend tools or libraries.

Each SWISH::Prog::Indexer subclass fronts an information retrieval (IR) tool or library that implements its own proprietary, highly optimized inverted index storage system that preserves the intelligence of the Parser/Analyzer.

The name "SWISH::Prog" comes from the Swish-e -S prog feature. "prog" is short for "program". SWISH::Prog makes it easy to write indexing and search programs.

SWISH::Prog started as a way of making the swish-e binary tool easier to integrate into Perl applications, and has since been expanded as a full implementation of Swish3, with alternate backend libraries (KinoSearch, Xapian, Apache Lucy, etc) filling the Indexer and Searcher roles.

Returns the indexer's count. NOTE: This is the number of documents actually indexed; it does not include documents that the aggregator considered and discarded. If you want the number of documents the aggregator looked at, regardless of whether they were indexed, use the aggregator's count() method.
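
The difference between the two counts can be illustrated with a minimal, self-contained sketch. It does not use SWISH::Prog itself: ToyIndexer, ToyAggregator, the crawl() and process() methods, and the "ok" filter callback are hypothetical stand-ins invented for this example, and only the idea that each object keeps its own count() is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an Indexer: it counts only the documents
# that are actually handed to it.
package ToyIndexer {
    sub new { my ($class) = @_; return bless { count => 0, docs => [] }, $class }

    sub process {
        my ( $self, $doc ) = @_;
        push @{ $self->{docs} }, $doc;
        return ++$self->{count};
    }

    sub count { return $_[0]->{count} }
}

# Hypothetical stand-in for an Aggregator: it counts every document it
# looks at, then passes along only those accepted by the "ok" callback.
package ToyAggregator {
    sub new {
        my ( $class, %args ) = @_;
        return bless { count => 0, %args }, $class;
    }

    sub crawl {
        my ( $self, @docs ) = @_;
        for my $doc (@docs) {
            $self->{count}++;                      # considered
            next unless $self->{ok}->($doc);       # discarded
            $self->{indexer}->process($doc);       # indexed
        }
        return $self->{count};
    }

    sub count { return $_[0]->{count} }
}

package main;

my $indexer    = ToyIndexer->new;
my $aggregator = ToyAggregator->new(
    indexer => $indexer,
    ok      => sub { $_[0] =~ /\.(?:html|xml)\z/ },   # keep HTML and XML only
);

$aggregator->crawl(qw(a.html b.xml c.png d.html));

printf "considered: %d, indexed: %d\n", $aggregator->count, $indexer->count;
# prints "considered: 4, indexed: 3"
```

Here c.png is looked at but discarded, so it appears in the aggregator's count and not in the indexer's.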