How can artificial intelligence help with data capture?

A heating company from a small town in Poland had been storing its network data as paper documentation in archives for many years. The data was relatively structured, but can you imagine searching for one particular piece of information in a room filled with hundreds of papers stored on endless shelves? It’s a hard, time-consuming and frustrating task. The data was often incomplete, because updating paper records is difficult. Another issue was that new employees didn’t always have full knowledge of the network, while older, more experienced colleagues were retiring or changing jobs. All these factors negatively affected many processes in the company.

The company decided to get all documents scanned to enable easy access to network information. The result was a digital version of the documentation with the data pre-divided into areas (by shelves, mirroring how the paper was stored). It turned out that searching still took a long time, and the data remained incomplete or out of date. The documentation still lacked the detailed information needed to find a document for a specific network part, area or date range. The speed-up was smaller than predicted.

Furthermore, the company wanted to use network data in various business applications, but because the information was still unstructured, the cost of such a project was out of the company’s reach. The documents had to be categorized in detail; the previous specification wasn’t sufficient, which also raised the cost. For employees, categorizing and describing each document means hundreds of hours of tedious work that often collides with other important tasks. It takes a lot of focus and deep knowledge of the network to describe it properly and reliably.

Sounds familiar?

This is an everyday situation for numerous companies, and not only in the district heating sector. Our latest experiments and work in the GlobIQ project show that these issues can be solved by using artificial intelligence.

The GlobIQ platform is designed to collect data on dispersed technical assets in a natural way, through voice and images. It converts the gathered information into a structured form that specific business applications can process. We plan to create dedicated solutions for energy, telecommunications, and municipal services.

By improving data quality, recognizing the text (including handwriting) and analyzing the content, AI can automatically categorize and index the documents. It can replace the monotonous, repetitive work of manual analysis and categorization, which significantly speeds up the work and lowers costs.

How did we do it?

We worked on a set of 6,200 documents containing over 24,000 A4 pages (mostly technical and subcontracting projects and sketches). The scanned documents had to be prepared correctly: we cut them into single pages and improved their quality, testing various options along the way. Several solutions were used for better structure recognition and content analysis. The training set included 470 documents.
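As a rough illustration of setting aside 470 of the 6,200 documents for learning, here is a minimal sketch in Python. We do not describe above how those documents were chosen, so the random selection, the numeric document IDs and the function name split_documents are all assumptions made for the example.

```python
import random


def split_documents(doc_ids, training_size, seed=0):
    """Randomly pick `training_size` documents for training; keep the rest aside.

    Assumption: a simple random selection. The real selection method
    is not described in this article.
    """
    if not 0 < training_size < len(doc_ids):
        raise ValueError("training_size must be between 1 and len(doc_ids) - 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training = set(rng.sample(doc_ids, training_size))
    remaining = [doc_id for doc_id in doc_ids if doc_id not in training]
    return sorted(training), remaining


# Hypothetical IDs 1..6200 standing in for the scanned documents.
all_documents = list(range(1, 6201))
training_docs, remaining_docs = split_documents(all_documents, training_size=470)
print(len(training_docs), len(remaining_docs))
```

Fixing the seed keeps the same documents in the training set on every run, which makes later comparisons between experiments easier.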

What did we discover?

Over 90% of the documents were correctly recognized and categorized, so fewer than 10% were left for manual work. To calculate the share of correctly categorized documents, we divided the number of correct predictions by the total number of documents. Most of the work was done for the employees, with much less time and effort.
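The calculation described above, correct predictions compared with the total number of documents, can be sketched in a few lines of Python. The function name categorization_accuracy and the category labels are hypothetical, and the four sample documents are made up for illustration; they are not data from the project.

```python
from typing import Sequence


def categorization_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Return the share of documents whose predicted category matches the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one document is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only, not results from the GlobIQ experiments.
predicted = ["sketch", "technical project", "sketch", "subcontracting project"]
actual = ["sketch", "technical project", "technical project", "subcontracting project"]

accuracy = categorization_accuracy(predicted, actual)
print(f"{accuracy:.0%} categorized correctly")  # 3 of the 4 sample labels match
```

Documents where the prediction and the true category differ are the ones that would be passed on for manual work.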

Benefits

Using GlobIQ, our heating company could quickly organize and collect all information and create a comprehensive, up-to-date database of the network infrastructure. The detailed documentation shortens the search for necessary information. GlobIQ allows multiple users to access the data and update it if needed.

What next?

We are currently working on other aspects that GlobIQ can improve. Our research is focused on the following issues:

reading technical parameters from documentation and its spatial indexing: companies need very thorough data, ordered chronologically and categorized by the validity of documents

entering data from coordinate tables: the process is very time-consuming due to the diverse quality of the copies

building a network model based on data from paper sources: a single sketch can contain a lot of information, and it often doesn’t reflect the actual appearance on the map

We will keep you updated on our progress and development of the GlobIQ platform. Stay tuned!