Sunday, January 8, 2017

C++11 : String Replacer for Directory Files using PPL

Sequential algorithms can be easily replaced with the corresponding Parallel Algorithms. However, these algorithms can only be applied, if the loop operations are completely independent and they are not using any shared variables. The second condition can be relaxed by using concurrent data structures. This scheme is perfect for cases, in which we need to process a chain of independent items, such as modifying files in a directory.

The devil is in the details

Let us think a real-life case, in which we have a lot of CSV files in one specific directory. For example, we may have transaction files (containing cash flow tables, date schedule tables, parameters) produced by some front office system batch on a daily basis. These files are then used as feed for a third-party analytical software for some intensive calculations. The only problem is, that some of the information in feed files is not in recognizable form for third-party software. In a nutshell, feed files may contain some specific strings, which should be replaced with another string.

War stories

The next question is how to do this operation quickly and on a safe manner? Needless to say, there are pretty much as many approaches as there are people performing this heartbreaking operation. Personally, I have used

find-and-replace strategy with notepad++, until the amount of files to be processed and parameters to be replaced grew a bit too large.

custom batch script, until I realized that the script was sometimes not working as expected, by either leaving some cases out from replacement operations or resulting incorrect replacements. The scariest part was, that the script was actually working well most of the time.

custom Powershell script (created by someone else), which was using hard-coded configurations for all replacement strings and all strings which should be replaced. All configurations (hosted in source code file) needed to be set in a specific order. This was actually working well up to a point, where those hard-coded configurations should have been changed to correspond changes made in other systems. Moreover, execution times were a bit too high.

Finally, I decided to create my own program for handling this task.

United we fall, divided we stand

After learning a bit of parallel algorithms, I soon realized that this
kind of a scheme (due to task independency) would be suitable candidate for implementing parallelism,
in order to improve program execution speed. Each source file (containing transaction information) has to be processed separately and all key-value pairs (string-to-be-replaced and corresponding replacement string) can be stored in concurrent-safe concurrent unordered map as string pairs.

FileHandler.h

This header file is consisting free functions for all required file handling operations.

Tester.cpp

First, there has to be a directory containing all source files, which are going to be processed by file
processing program. Also, there has to be a specific file, which
contains key-value pairs (key = string to be found and replaced, value =
replacement string) for all desired string replacement cases. Main
program is creating file processors into vector container (there will be
as many processors as there are source files to be processed). Then,
main program is executing all file processors. A single file processor
is first reading source file information into string, looping through
all key-value pairs and checking whether any occurences are to be found,
performing string replacements and finally, writing modified
information back to original source file.

The programs, which are presented in this blog, can be freely used, but without warranty or support of any kind. By using the programs presented in this blog, you accept to bear the entire risk, concerning quality or performance of any programs used. In no event, will I be liable to you for the damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the programs presented in this blog. By using the programs presented in this blog, you are accepting the content of this disclaimer.