Content Extraction and Intelligence Suite

The Content Extraction & Intelligence Suite (CEIS) provides a platform for crawling, extracting, or importing content, parsing and normalizing it, and creating a searchable index before loading it onto the customer’s enterprise system.

Data scraping can cover images or text from a variety of sources, such as the web, XML, PDF forms, client databases, or other formats. The Transform module is a business rules layer that an operator can define through a user interface. Content enrichment and uploading depend on the client's specification.

The intelligence is powered by the Content Analytics layer, which extracts business value from your content and drives automation.

SPiZone

SPiZone is the SPi Global platform for content extraction, normalization, and transformation. It works with both data PDF files and scanned images. SPiZone's data scraping feature can be used for digitization and content extraction from a wide range of PDF and image formats, such as book and journal pages, customer invoices, and purchase orders.

Some of the key data scraping features include:
• Coordinate extraction along with content extraction for effective searching
• Automated entity or zone extraction based on rules/content analysis
• Content analysis and QA tools layered on OCR engines to improve the accuracy of extracted content
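
The first two features above can be illustrated with a minimal sketch. It is not SPiZone's implementation or API. It assumes OCR output arrives as words with bounding boxes, and every name in it (Word, ZoneRule, extract_zone, find_word) and all sample data are hypothetical. It shows how keeping coordinates alongside text lets a search return page locations, and how a rule can pull a named zone from a page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR token with its bounding box in page coordinates (hypothetical layout)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class ZoneRule:
    """Hypothetical rule: a named page region plus a pattern the zone text must match."""
    name: str
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    pattern: str


def words_in_box(words, box):
    """Return the words whose bounding boxes lie entirely inside box."""
    bx0, by0, bx1, by1 = box
    return [w for w in words
            if w.x0 >= bx0 and w.y0 >= by0 and w.x1 <= bx1 and w.y1 <= by1]


def extract_zone(words, rule):
    """Join the words inside the rule's box in reading order, then apply the pattern."""
    inside = sorted(words_in_box(words, rule.box), key=lambda w: (w.y0, w.x0))
    text = " ".join(w.text for w in inside)
    match = re.search(rule.pattern, text)
    return match.group(0) if match else None


def find_word(words, query):
    """Coordinate-aware search: return the box of every word equal to query, ignoring case."""
    return [(w.x0, w.y0, w.x1, w.y1) for w in words if w.text.lower() == query.lower()]


# Made-up OCR output for an invoice page (assumption, for illustration only).
page = [
    Word("Invoice", 400, 20, 450, 32),
    Word("No:", 455, 20, 475, 32),
    Word("INV-1042", 480, 20, 540, 32),
    Word("Total", 400, 700, 435, 712),
    Word("129.50", 480, 700, 520, 712),
]

invoice_rule = ZoneRule("invoice_number", (390, 10, 560, 40), r"INV-\d+")
invoice_number = extract_zone(page, invoice_rule)
total_boxes = find_word(page, "total")
```

In this sketch a zone is a fixed rectangle. A rule driven by content analysis could instead anchor the zone to a nearby label such as "Invoice No:", which would tolerate layout shifts between documents.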