How to quickly write an application to load thousands of records into a DB using Spring Batch?


Paweł Weselak

20 Dec 2017

Introduction

Let’s assume we want to measure air pollution in our city. We’ve got around 100 detectors located in different parts of the town. Measurement results are collected every 15 minutes, which gives us 9600 records a day. We want to aggregate the data hourly or daily and do some analytics, but first we need to load the measurement results from the detectors into a relational database. Each detector saves its data in a flat file, let’s say in CSV format. How can we load such an amount of data into the DB quickly? Spring Batch is one of the available solutions. In this article I will show a simple application that uses this framework to solve the presented problem.

Concepts of Spring Batch

The basic concept of Spring Batch is a Job. JobParameters form the context in which a Job is executed. A single execution of a Job together with its JobParameters creates a JobExecution. Information about each JobExecution is stored in a JobRepository. In addition, we can distinguish a JobLauncher, a PlatformTransactionManager and, of course, a DataSource.
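To make the relationship between these classes concrete, here is a small fragment sketching how a Job could be launched with JobParameters. It is only an illustration; the job and parameter names are my assumptions, not the article's code:

```java
// Assumed to be injected by Spring: jobLauncher and importMeasurementsJob.
JobParameters params = new JobParametersBuilder()
        .addString("inputDir", "input/")   // hypothetical parameter
        .addDate("runDate", new Date())    // makes each run's JobExecution distinct
        .toJobParameters();
JobExecution execution = jobLauncher.run(importMeasurementsJob, params);
System.out.println(execution.getStatus()); // e.g. COMPLETED
```

Because a JobExecution is identified by the Job plus its JobParameters, varying a parameter such as the run date lets the same Job be executed repeatedly.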

A Job consists of Steps. Within Steps, data is processed in Chunks (pieces of data). Typically during batch processing we first want to read the data, process it, and write it back to some DB storage. A Reader, a Processor and a Writer are responsible for these three phases, respectively.
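The chunk-oriented model can be illustrated in plain Java, without any Spring classes. This is only a conceptual sketch of what the framework does for us: read items one by one, process each, and write them out a chunk at a time:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of chunk-oriented processing (no Spring involved).
public class ChunkLoopDemo {
    static final int CHUNK_SIZE = 3;

    public static List<List<String>> process(List<Integer> input) {
        List<List<String>> written = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        for (Integer item : input) {                 // Reader: one item at a time
            String processed = "item-" + item;       // Processor: transform the item
            chunk.add(processed);
            if (chunk.size() == CHUNK_SIZE) {        // Writer: flush a full chunk
                written.add(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                      // flush the last, partial chunk
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of(1, 2, 3, 4, 5, 6, 7)));
        // [[item-1, item-2, item-3], [item-4, item-5, item-6], [item-7]]
    }
}
```

Writing in chunks rather than item by item is what makes batch inserts efficient: one transaction and one round trip per chunk instead of per record.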

Input data format

The CSV files we want to read contain a header in the first line, followed by the results of the measurements:
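The article's original sample file is not reproduced here; a file in this spirit could look like the following (the column names and values are made up for illustration, except that the first field is a date/time, as discussed below):

```
timestamp,detectorId,pm10,pm25
2017-12-01 00:00:00,det-001,48.2,30.1
2017-12-01 00:15:00,det-001,51.7,33.4
2017-12-01 00:30:00,det-001,47.9,29.8
```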

What is interesting here is the @EnableBatchProcessing annotation. It creates all the necessary beans used by Spring Batch, so we can focus on defining Jobs and Steps. It is worth mentioning that Spring Batch requires special database tables to be created for keeping JobExecution info and the like. Passing schema-hsqldb.sql to the DataSource builder will prepare the DB schema (HSQLDB in our case). Spring Batch includes corresponding scripts for the most popular RDBMSes.
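A minimal configuration class along these lines might look as follows. This is a sketch, not the article's exact code; the class name and the extra schema script for our own table are assumptions:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.HSQL)
                // Spring Batch metadata tables (JobExecution info etc.)
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                // our own measurements table (script name assumed)
                .addScript("classpath:schema-measurements.sql")
                .build();
    }
}
```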

In our case we use a MultiResourceItemReader to read items from multiple files (measurements from one location are kept in a separate file, as each detector produces its own file). To read one particular CSV file, a FlatFileItemReader is used. In its definition we can specify how many lines from the beginning of the file should be skipped. Apart from the skipped lines, every line of the input file must be mapped to an instance of the class representing a given item. We can achieve that with the following beans:
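A sketch of what these reader beans might look like; the resource pattern, file locations and bean names are assumptions, and lineMapper() is a bean assumed to be defined elsewhere in the configuration:

```java
@Bean
public MultiResourceItemReader<Measurement> multiResourceReader() throws IOException {
    // one CSV file per detector, matched by a wildcard (location assumed)
    Resource[] inputFiles = new PathMatchingResourcePatternResolver()
            .getResources("file:input/measurements-*.csv");
    MultiResourceItemReader<Measurement> reader = new MultiResourceItemReader<>();
    reader.setResources(inputFiles);
    reader.setDelegate(flatFileReader()); // delegates actual reading per file
    return reader;
}

@Bean
public FlatFileItemReader<Measurement> flatFileReader() {
    FlatFileItemReader<Measurement> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(1);             // skip the CSV header line
    reader.setLineMapper(lineMapper());   // maps a CSV line to a Measurement
    return reader;
}
```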

First of all we need to split the line into tokens with a DelimitedLineTokenizer (the default separator is a comma) and then map the tokens to the properties of our Measurement bean. Furthermore, Spring Batch needs to know how to parse the date/time in the first field of the input file, so we added a CustomDateEditor with the desired date format.
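Put together, the line-mapping bean could look roughly like this. The field names and the date pattern are assumptions for illustration:

```java
@Bean
public DefaultLineMapper<Measurement> lineMapper() {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(); // comma by default
    tokenizer.setNames("timestamp", "detectorId", "pm10", "pm25");   // assumed CSV columns

    BeanWrapperFieldSetMapper<Measurement> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Measurement.class);
    // teach Spring how to parse the date/time in the first column
    fieldSetMapper.setCustomEditors(Collections.singletonMap(
            Date.class,
            new CustomDateEditor(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"), false)));

    DefaultLineMapper<Measurement> mapper = new DefaultLineMapper<>();
    mapper.setLineTokenizer(tokenizer);
    mapper.setFieldSetMapper(fieldSetMapper);
    return mapper;
}
```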

Storing into database

We’ve created the reading flow of our app. Now it’s time for storing the items (finally!). It’s far easier than the reading part:
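The writer bean might be sketched as follows; the table and column names are assumptions matching the hypothetical Measurement fields used above:

```java
@Bean
public JdbcBatchItemWriter<Measurement> writer(DataSource dataSource) {
    JdbcBatchItemWriter<Measurement> writer = new JdbcBatchItemWriter<>();
    writer.setDataSource(dataSource);
    writer.setSql("INSERT INTO measurement (timestamp, detector_id, pm10, pm25) "
            + "VALUES (:timestamp, :detectorId, :pm10, :pm25)");
    // resolves :timestamp, :detectorId, ... from the Measurement bean's properties
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
    return writer;
}
```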

Thanks to BeanPropertyItemSqlParameterSourceProvider we can map the Measurement class fields directly to the named parameters in the SQL statement.

Testing the flow

We can try running our application now. For that purpose I’ve written a unit test. You can also go to the command line and try out CommandLineJobRunner, which has a main method. As the first parameter you can pass either a Configuration class name or an XML application context file. A more detailed description can be found in the Spring Batch documentation.
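For reference, a CommandLineJobRunner invocation could look roughly like this; the jar, configuration class and job names here are hypothetical:

```
java -cp app.jar org.springframework.batch.core.launch.support.CommandLineJobRunner \
    com.example.BatchConfiguration importMeasurementsJob
```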

Summary

I hope this short article has encouraged you to take a closer look at the Spring Batch framework. The complete sources are available on my GitHub. Have fun exploring further!

Paweł Weselak

Senior Java Developer, in the IT industry since 2010. Paweł finds sharing his knowledge and experience the most rewarding part of everyday work. He doesn't underestimate the importance of developing soft skills in the job of a software engineer. He is keen on Machine Learning and data analysis. Paweł spends his free time on traveling, dancing, snowboarding and striking up new acquaintances.