1. Introduction

1.1. Terms of Use

To use this document, you must agree to abide by the following terms. If you do not agree with the terms, you must immediately delete or destroy this document and all its duplicate copies.

The copyrights and all other rights of this document belong to NTT DATA or the third parties possessing such rights.

This document may be reproduced, translated, or adapted, in whole or in part, for personal use. However, deletion of the terms given on this page and of the copyright notice of NTT DATA is prohibited.

This document may be changed, in whole or in part, for personal use. Creation of secondary works using this document is allowed. However, "Reference document: TERASOLUNA Batch Framework for Java (5.x) Development Guideline" or an equivalent notice must be mentioned in the created document and its duplicate copies.

A document and its duplicate copies created according to Clause 2 may be provided to a third party only if they are free of cost.

Use of this document and its duplicate copies, and transfer of the rights of this contract to a third party, in whole or in part, beyond the conditions specified in this contract, are prohibited without the written consent of NTT DATA.

NTT DATA shall not bear any responsibility regarding the correctness of the contents of this document, warranty of fitness for a particular usage purpose, assurance of the accuracy and reliability of usage results, liability for defect warranty, or any damage incurred directly or indirectly.

NTT DATA does not guarantee that this document does not infringe copyrights or any other rights of third parties. In addition, NTT DATA shall not bear any responsibility regarding any claims (including claims arising from disputes with third parties) that arise directly or indirectly from infringement of copyrights or other rights.

The registered trademarks or trademarks of company names, service names, and product names used in this document are as follows.

TERASOLUNA is a registered trademark of NTT DATA Corporation.

All other company names and product names are the registered trademarks or trademarks of their respective companies.

1.2. Introduction

1.2.1. Goal of guideline

This guideline provides best practices for developing highly maintainable batch applications
using a full-stack framework centered on Spring Framework, Spring Batch, and MyBatis.

This guideline helps software development (mainly coding) proceed smoothly.

1.2.2. Target readers

This guideline is written for architects and programmers with software development experience
and knowledge of the following.

Basic knowledge of DI and AOP of Spring Framework

Application development experience using Java

Knowledge of SQL

Experience in using Maven

This guideline is not for beginners.

In order to check whether one has enough basic knowledge to understand this document,
refer to the
Spring Framework Comprehension Check.
If you cannot answer 40% of the comprehension test, it is recommended to study the relevant books separately.

1.2.3. Structure of guideline

The most important point is that this guideline is a subset of
TERASOLUNA Server Framework for Java (5.x) Development Guideline
(hereafter referred to as TERASOLUNA Server 5.x Development Guideline).
Relying on the TERASOLUNA Server 5.x Development Guideline eliminates duplicated explanation and reduces the learning cost as much as possible.
Since references to the TERASOLUNA Server 5.x Development Guideline appear throughout, we recommend proceeding with development using both guides together.

Developers who want to experience actual application development using TERASOLUNA Batch Framework for Java (5.x) are recommended to read the following contents.
When working with TERASOLUNA Batch Framework for Java (5.x) for the first time, you should read these contents first and then move on to the other contents.

This guideline does not force the user to use the namespace.
It is used here to simplify the explanations.

1.2.5. Tested environments of guideline

For the tested environments of the contents described in this guideline,
refer to "Tested Environment".

1.3. Change Log

Modified on: 2017-09-27

Modified locations: -
Released 5.0.1 RELEASE version

Modified locations: General
Description details modified
・Errors in the guideline (typing errors, simple description errors, etc.) corrected
・Design of the link on the index for header and footer modified (Management ID #196)
・JDK 8 dependent code changed to JDK 7 compatible code, considering readers who do not know JDK 8 (Management ID #231)

Description details modified
・Suffix of class name modified to Repository (Management ID #241)
・Explanation of the job request sequence modified so that it does not depend on specific RDBMS products (Management ID #233)

2. TERASOLUNA Batch Framework for Java (5.x) concept

2.1. Batch Processing in General

2.1.1. Introduction to Batch Processing

The term "batch processing" refers to the execution or processing of a series of jobs in a computer program without manual intervention (non-interactive).
It is often a process of reading, processing, and writing a large number of records from a database or a file.
Batch processing has the following features and is a processing method that prioritizes throughput over responsiveness, as compared to online processing.

Common features of batch processing

A large amount of data is collected and processed together.

Processing that must not be interrupted runs in a fixed sequence so that completion within a certain time is assured.

Processing runs in accordance with a schedule.

The objectives of batch processing are given below.

Enhanced throughput

Process throughput can be enhanced by processing data sets collectively in a batch.
Instead of inputting or outputting data one record at a time, a file or database sums up a fixed quantity of data, dramatically reducing the overhead of waiting for I/O and increasing efficiency.
Even though the wait for I/O of a single record is insignificant, the accumulated delay while processing a large amount of data can be fatal.
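The effect of summing up I/O can be sketched in plain Java. The example below is illustrative only: the `flush` method is a hypothetical stand-in for an expensive I/O round trip (a real batch would use buffered streams or JDBC batch updates), and it simply counts how many flushes occur when writing one record at a time versus a fixed quantity at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: batching reduces the number of costly I/O operations.
// All names here are hypothetical stand-ins, not a real batch API.
public class BatchingSketch {
    static int flushCount = 0;

    // Pretend each flush is an expensive I/O round trip.
    static void flush(List<String> buffer) {
        flushCount++;
        buffer.clear();
    }

    // One flush per record: 1,000 records -> 1,000 flushes.
    static int writeOneByOne(List<String> records) {
        flushCount = 0;
        List<String> buffer = new ArrayList<>();
        for (String r : records) {
            buffer.add(r);
            flush(buffer);
        }
        return flushCount;
    }

    // One flush per batch of batchSize: 1,000 records, size 100 -> 10 flushes.
    static int writeInBatches(List<String> records, int batchSize) {
        flushCount = 0;
        List<String> buffer = new ArrayList<>();
        for (String r : records) {
            buffer.add(r);
            if (buffer.size() == batchSize) {
                flush(buffer);
            }
        }
        if (!buffer.isEmpty()) {
            flush(buffer); // flush the remainder
        }
        return flushCount;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        System.out.println(writeOneByOne(records));       // 1000
        System.out.println(writeInBatches(records, 100)); // 10
    }
}
```

The per-record wait disappears from all but one out of every `batchSize` records, which is the "summing up data of a fixed quantity" described above.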

Ensuring responsiveness

Processes that are not required to respond immediately are separated out into batch processing in order to ensure the responsiveness of online processing.
For example, when the process results are not required immediately, online processing handles only the acceptance of the request, and batch processing performs the rest in the background.
This processing method is generally called "delayed processing".

Response to time and events

Processes corresponding to a specific period or event are naturally implemented as batch processing.
For example, aggregating a month's business data on the first weekend of the following month,
taking a backup every Sunday at 2 a.m. in accordance with the system operation rules,
and so on.

Restriction for coordination with external system

Batch processing is also used due to restrictions of interfaces, such as files, used for interaction with external systems.
A file sent from an external system is a summary of data collected over a certain period.
Batch processing is better suited than online processing for incorporating such files.

It is very common to combine various techniques to achieve batch processing. Major techniques are introduced here.

Job Scheduler

A single execution unit of batch processing is called a job. A job scheduler is middleware that manages jobs.
A batch system rarely has only a few jobs; the number of jobs usually reaches hundreds, and can even reach thousands at times.
Hence, an exclusive system to define the relations among jobs and manage their execution schedules becomes indispensable.

Shell script

One of the methods to implement a job. A process is achieved by combining commands provided by the OS and middleware.
Although this method can be implemented easily, it is not suitable for writing complex business logic. Hence, it is primarily used for simple processes such as copying a file, taking a backup, or clearing a table.
Further, a shell script often performs only the pre-start settings and post-execution processing when executing a process implemented in another programming language.

Programming language

One of the methods to implement a job. Compared with shell scripts, structured code can be written, which is advantageous for securing development productivity, maintainability, and quality.
Hence, it is commonly used to implement business logic that processes data of files or databases, which tends to be relatively complex.

2.1.2. Requirements for batch processing

The requirements that batch processing must satisfy in order to implement business processes are given below.

Performance improvement

A certain quantity of data can be processed in a batch.

Jobs can be executed in parallel and in multiple instances.

Recovery in case of an abnormality

Jobs can be re-executed (manually/by schedule).

At the time of reprocessing, it is possible to process only unprocessed records by skipping processed records.

Various activation methods for running jobs

Synchronous execution possible.

Asynchronous execution possible.

DB polling and HTTP requests can be used as triggers for execution.

Various input and output interfaces

Database

File

Variable-length, such as CSV or TSV

Fixed-length

XML

Specific details for the above requirements are given below.

A large amount of data can be efficiently processed using fixed resources (performance improvement)

Processing time is reduced by processing the data collectively. The important phrase here is "fixed resources":
whether 100 records or 1 million records are processed, the same CPU and memory should suffice, with processing time ideally extending slowly and linearly according to the number of records.
A transaction is started and terminated for a certain number of records so that they are processed collectively, and the resources used must be leveled in order to perform I/O collectively.
Still, when an ever-growing amount of data is to be handled, a system can go a step further and use hardware resources up to their limit:
the data to be processed is divided into records or groups, and multiple processes and multiple threads process them concurrently.
Going further still, distributed processing using multiple machines can be implemented.
When resources are used up to the limit, reducing the processing time as much as possible becomes extremely important.

Continue processing as much as possible (recovery when an abnormality occurs)

When a large amount of data is processed, countermeasures for abnormalities in the input data or in the system itself must be considered.
A large amount of data inevitably takes a long time to process, and if the time to recover after an error is prolonged, the system is likely to be affected a great deal.
For example, consider processing 1 billion records: the operation schedule would obviously be affected a great deal if an error were detected at the 999 millionth record and everything processed so far had to be done all over again.
To control this impact, process continuity unique to batch processing becomes very important. Hence, a mechanism that skips error data and processes the next record, a mechanism to restart the process, and a mechanism that attempts
auto-recovery become necessary. Further, it is important to simplify each job as much as possible and enable its easy re-execution later.

Can be executed flexibly according to execution triggers (various activation methods)

A mechanism that responds to various execution triggers is necessary, whether triggered by time, by online requests, or by connection with external systems.
Various mechanisms are widely known, such as synchronous processing, wherein processing starts when the job scheduler reaches the
scheduled time, and asynchronous processing, wherein a resident process performs batch processing
as events occur.

For linking with online and external systems, it is important to handle various files such as CSV/XML as well as databases.
Further, if a method exists that handles the respective input and output methods transparently, implementation becomes easier and dealing with various formats becomes quicker.

2.1.3. Rules and precautions to be considered in batch processing

Important rules for building a batch processing system, and a few points to consider, are shown below.

Simplify unit batch processing as much as possible and avoid complex logical structures.

Keep process and data in physical proximity (Save data at the location where process is executed).

Minimise the use of system resources (especially I/O) and execute operations in memory as much as possible.

2.2.1. Overview

2.2.2. TERASOLUNA Batch Framework for Java (5.x) stack

The software framework used in TERASOLUNA Batch Framework for Java (5.x) is a combination of OSS centered on Spring Framework (Spring Batch).
A stack schematic diagram of TERASOLUNA Batch Framework for Java (5.x) is shown below.

TERASOLUNA Batch Framework for Java (5.x) stack - schematic diagram

Descriptions for products like job scheduler and database are excluded from this guideline.

2.2.2.1. OSS version to be used

The list of OSS versions used in 5.0.1.RELEASE of TERASOLUNA Batch Framework for Java (5.x) is given below.

The OSS versions used in TERASOLUNA Batch Framework for Java (5.x), as a rule, conform to the definition of the Spring IO platform.
Note that the version of the Spring IO platform in 5.0.1.RELEASE is
Athens-SR2.
For details of the Spring IO platform, refer to
OSS versions to be used of TERASOLUNA Server Framework for Java (5.x).

Break down a fixed-length record into individual fields by byte count.

Control the output of enclosing characters for variable-length records.

2.3. Spring Batch Architecture

2.3.1. Overview

The Spring Batch architecture acting as a base for TERASOLUNA Batch Framework for Java (5.x) is explained.

2.3.1.1. What is Spring Batch

Spring Batch, as the name implies, is a batch application framework.
The following functions are offered on top of the DI container, AOP, and transaction control functions of Spring.

Functions to standardize process flow

Tasklet model

Simple process

It is a method to freely describe a process. It is used in simple cases such as issuing SQL once or issuing a command, and also in complex cases that are difficult to standardize, such as processing while accessing multiple databases or files.

Chunk model

Efficient processing of large amount of data

A method to collectively input/process/output a fixed amount of data. The process flow of data input, processing, and
output is standardized, and a job can be implemented by implementing only parts of it.

Various activation methods

Execution is achieved by various triggers, such as command line execution and execution on a Servlet.

I/O of various data formats

Input and output for various data resources, such as files, databases, and message queues, can be performed easily.

Efficient processing

Multiple executions, parallel execution, and conditional branching can be performed based on the settings.

Job execution control

Persistence of the execution state, and restart based on the processed data records, can be performed.

2.3.1.2. Hello, Spring Batch!

If you have not yet worked with Spring Batch, the official documentation given below should be read
to understand the Spring Batch architecture covered so far.
We would like you to get used to Spring Batch by creating a simple application.

2.3.1.3. Basic structure of Spring Batch

Spring Batch defines the structure of a batch process. It is recommended to perform development after understanding this structure.

Primary components appearing in Spring Batch

Components

Roles

Job

A single execution unit that summarizes a series of processes of a batch application in Spring Batch.

Step

A unit of processing which constitutes a Job. One job can contain 1 to N steps.
Reuse of a process, parallelization, and conditional branching can be achieved by dividing one job process into multiple steps.
A step is implemented as either the chunk model or the tasklet model (described later).

JobLauncher

An interface for running a Job.
JobLauncher can be used directly by the user; however, a batch process can be started simply
by launching CommandLineJobRunner from the java command.
CommandLineJobRunner undertakes the various processes required to start JobLauncher.

ItemReader
ItemProcessor
ItemWriter

An interface that divides a chunk-model implementation into three processes: input, processing, and output of data.
A batch application consists of processing in these three patterns, and in Spring Batch, implementations of these interfaces
are used primarily in the chunk model.
The user describes business logic by dividing it according to the respective roles.
Since ItemReader and ItemWriter, which are responsible for data input and output, are often processes that convert between database or file records and Java objects, standard implementations are provided by Spring Batch.
In general batch applications that input and output data from files and databases, requirements
can often be satisfied just by using the standard implementations of Spring Batch as they are.
ItemProcessor, which is responsible for processing data, implements input checks and business logic.

In the tasklet model, a single Tasklet interface implementation substitutes for ItemReader/ItemProcessor/ItemWriter. Input/output, input checks, and business logic must all be implemented in the Tasklet.

JobRepository

A mechanism to manage the states of Job and Step. The management information is persisted in a database based on the table schema specified by Spring Batch.

Main processing flow (black line) and the flow which persists job information (red line) are explained.

Main processing flow

JobLauncher is initiated from the job scheduler.

Job is executed from JobLauncher.

Step is executed from Job.

Step fetches input data by using ItemReader.

Step processes input data by using ItemProcessor.

Step outputs processed data by using ItemWriter.

A flow for persisting job information

JobLauncher registers JobInstance in Database through JobRepository.

JobLauncher registers that Job execution has started in Database through JobRepository.

Job and Step update miscellaneous information, such as counts of I/O records and status, in the database through JobRepository.

JobLauncher registers that Job execution has completed in Database through JobRepository.

The components related to persistence, centered on JobRepository, are explained next.

Components related to persistence

Components

Roles

JobInstance

Indicates a "logical" execution of a Job in Spring Batch. A JobInstance is identified by the Job name and arguments.
In other words, an execution with an identical Job name and arguments is identified as an execution of the identical JobInstance, and the Job is executed as a continuation of the previous activation.
When the target Job supports re-execution and the previous execution was suspended midway due to an error, the Job is executed from the middle of the process.
On the other hand, when the target Job does not support re-execution, or when the target JobInstance has already been processed successfully, an exception is thrown and the Java process is terminated abnormally.
For example, JobInstanceAlreadyCompleteException is thrown when the process has already been completed successfully.

JobExecution
ExecutionContext

JobExecution indicates a "physical" execution of a Job. Unlike JobInstance, re-executing an identical Job produces another JobExecution. As a result, JobInstance and JobExecution have a one-to-many relationship.
ExecutionContext is an area for sharing metadata, such as the progress of a process, within an identical JobExecution.
ExecutionContext is primarily used by Spring Batch for recording framework state; however, means for the application to access the ExecutionContext are also provided.
The objects stored in the JobExecutionContext must be classes that implement java.io.Serializable.

StepExecution
ExecutionContext

StepExecution indicates a "physical" execution of a Step. JobExecution and StepExecution have a one-to-many relationship.
Similar to JobExecution, ExecutionContext is an area for sharing data within a Step. From the viewpoint of data localization, information that does not need to be shared by multiple steps should use the ExecutionContext of the target Step
instead of the ExecutionContext of the Job.
The objects stored in the StepExecutionContext must be classes that implement java.io.Serializable.

JobRepository

A function is provided to manage and persist data, such as JobExecution and StepExecution, for managing the execution results and state of a batch application.
In general batch applications, the process is started by starting a Java process, and the Java process is terminated along with the termination of the process.
Since the data is likely to be referred to across Java processes, it must be stored not in volatile memory but in a permanent layer such as a database.
When data is to be stored in a database, database objects such as tables and sequences are required for storing JobExecution and StepExecution.
They must be generated based on the schema information provided by Spring Batch.

Spring Batch manages a large amount of metadata in order to perform re-executions.
A snapshot of the earlier execution must be retained, and the metadata and JobRepository serve as the basis for re-executing a batch process.
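The requirement noted above, that objects stored in an ExecutionContext implement java.io.Serializable, follows from JobRepository persisting the context. The self-contained sketch below illustrates this; the `persist`/`restore` methods are hypothetical stand-ins for what JobRepository does, not Spring Batch code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of why ExecutionContext values must be Serializable:
// the context is persisted by serializing it. persist/restore are hypothetical
// stand-ins for JobRepository behavior, not the real API.
public class ContextSketch {

    // Serializes the context to bytes, standing in for saving it to the
    // BATCH_*_EXECUTION_CONTEXT tables. Fails unless every value is Serializable.
    static byte[] persist(Map<String, Serializable> context) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new HashMap<>(context));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the context from bytes, as would happen on a restart.
    @SuppressWarnings("unchecked")
    static Map<String, Serializable> restore(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Map<String, Serializable>) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Serializable> context = new HashMap<>();
        context.put("processedCount", 42); // Integer is Serializable
        Map<String, Serializable> restored = restore(persist(context));
        System.out.println(restored.get("processedCount")); // 42
    }
}
```

Because the round trip goes through Java serialization, a non-Serializable value would fail at persist time, which is exactly why the restriction exists.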

2.3.2.2. Running a Job

How to run a Job is explained.

A scenario is considered wherein a batch process is started immediately after the Java process starts, and the Java process is terminated after the batch process completes.
The figure below shows the process flow from starting the Java process to starting the batch process.

To start a Job defined on Spring Batch, a shell script that launches Java is generally written, along with starting a Java process.
When CommandLineJobRunner offered by Spring Batch is used, the user-defined Job on Spring Batch can be started easily.

The start command of a Job using CommandLineJobRunner is shown below.

CommandLineJobRunner can receive arguments (job parameters) along with the name of the Job to be started.
Arguments are specified in <Job argument name>=<Value> format, as in the example described earlier.
All arguments are interpreted and checked by CommandLineJobRunner or JobLauncher, converted to JobParameters, and stored in JobExecution.
For details, refer to the running parameters of a Job.
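As an illustration of how `<Job argument name>=<Value>` pairs become job parameters, the sketch below parses such arguments into a map in plain Java. It mirrors the interpretation step only in spirit; it is not the actual CommandLineJobRunner/JobParameters implementation, and the parameter names used are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of interpreting <name>=<value> job arguments into a
// parameter map, in the spirit of what CommandLineJobRunner and JobLauncher
// do before the values are stored in JobExecution. Not the real implementation.
public class JobArgsSketch {

    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) { // reject arguments without a "name=" prefix
                throw new IllegalArgumentException(
                        "Expected <name>=<value> but got: " + arg);
            }
            params.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse(
                new String[] {"targetDate=2017-09-27", "mode=rerun"});
        System.out.println(p); // {targetDate=2017-09-27, mode=rerun}
    }
}
```

The checking step in the real runner is stricter (typed parameters, identifying vs. non-identifying flags), but the name/value split shown here is the essential first step.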

Register and restore JobInstance

JobLauncher fetches from JobRepository the JobInstance matching the Job name and arguments in the database.

When corresponding JobInstance does not exist, JobInstance is registered as new.

When corresponding JobInstance exists, the associated JobExecution is restored.

In Spring Batch, for jobs that are executed repeatedly, such as daily execution, a method of adding arguments solely for making the JobInstance unique is known.
For example, adding a system date or a random number to the arguments.
For the method recommended in this guideline, refer to the parameter conversion class.

2.3.2.3. Execution of business logic

In Spring Batch, a Job is divided into smaller units called steps.
When a Job is started, the Job activates its already registered steps and generates a StepExecution for each.
A Step is a framework for dividing the process up to the end, and execution of business logic is delegated to a Tasklet called from the Step.

Flow from Step to Tasklet is shown below.

Process flow from Step to Tasklet

Two implementation methods of Tasklet can be listed: the "chunk model" and the "tasklet model".
Since the overview has already been explained, their structure is explained here.

2.3.2.3.1. Chunk model

As described above, the chunk model is a method wherein processing is performed in units of a certain number of records (chunks) rather than processing the data one record at a time.
ChunkOrientedTasklet is the concrete Tasklet class that supports chunk processing.
The maximum number of records included in a chunk (hereafter referred to as the "chunk size") can be adjusted through the setting of this class called commit-interval.
ItemReader, ItemProcessor, and ItemWriter are all interfaces based on chunk processing.

Next, an explanation is given of how ChunkOrientedTasklet calls ItemReader, ItemProcessor, and ItemWriter.

ChunkOrientedTasklet repeatedly executes ItemReader and ItemProcessor, that is, the reading and processing of data, for one chunk's worth of records.
After all the data of the chunk has been read, the data writing process of ItemWriter is called just once, and all the processed data of the chunk is passed to it.
The data update processing is designed to be called once per chunk so that batching mechanisms such as addBatch and executeBatch of JDBC can easily be organized.
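The call sequence described above can be sketched as follows. The interfaces here are deliberately simplified stand-ins that only mirror ItemReader/ItemProcessor/ItemWriter in spirit (the real Spring Batch interfaces and transaction handling differ), but the loop shows the essential flow: read and process items one by one up to the commit-interval, then write the whole chunk once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified sketch of the chunk-oriented flow. These interfaces are not the
// Spring Batch API; transactions and restart handling are omitted.
public class ChunkSketch {
    interface Reader<I> { I read(); }                 // returns null when input is exhausted
    interface Processor<I, O> { O process(I item); }
    interface Writer<O> { void write(List<O> chunk); }

    static <I, O> void run(Reader<I> reader, Processor<I, O> processor,
                           Writer<O> writer, int commitInterval) {
        while (true) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < commitInterval; i++) {
                I item = reader.read();
                if (item == null) {
                    break; // no more input
                }
                chunk.add(processor.process(item)); // read and process one by one
            }
            if (chunk.isEmpty()) {
                return; // nothing left to write
            }
            writer.write(chunk); // one write (and, in Spring Batch, one commit) per chunk
        }
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c", "d", "e").iterator();
        List<List<String>> written = new ArrayList<>();
        run(() -> input.hasNext() ? input.next() : null, // reader
            String::toUpperCase,                         // processor
            written::add,                                // writer records each chunk
            2);                                          // commit-interval
        System.out.println(written); // [[A, B], [C, D], [E]]
    }
}
```

With a commit-interval of 2 and five input records, the writer is called three times (two full chunks and one remainder), which is exactly the "write once per chunk" behavior described above.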

Next, ItemReader, ItemProcessor, and ItemWriter, which are responsible for the actual processing in chunk processing, are introduced.
Although it is assumed that users provide their own implementation for each interface, needs can also be covered by the generic concrete classes provided by Spring Batch.

In particular, since ItemProcessor describes the business logic itself, concrete classes for it are hardly provided by Spring Batch;
the ItemProcessor interface is implemented to describe the business logic.
ItemProcessor is designed to allow the types of objects used for input and output to be specified in its generics, so typesafe programming is enabled.

Various concrete classes are offered by Spring Batch for ItemReader and ItemWriter, and these are used quite frequently.
However, when a file of a specific format is to be input or output, a concrete class implementing an individual ItemReader or ItemWriter can be created and used.

ItemReader

FlatFileItemReader

Read flat files (non-structured files) such as CSV files. Taking a Resource object as input, mapping rules between delimiters and objects can be customized.

StaxEventItemReader

Read XML files. As the name implies, it is an implementation that reads XML files based on StAX.

JdbcPagingItemReader
JdbcCursorItemReader

Execute SQL by using JDBC and read records from the database. When a large amount of data is to be processed on the database, reading all records into memory must be avoided; instead, only the data necessary for one processing is read and discarded.
JdbcPagingItemReader is implemented with JdbcTemplate by dividing the SELECT SQL per page and issuing each page's SQL. On the other hand, JdbcCursorItemReader is implemented by issuing one SELECT SQL using a JDBC cursor. In TERASOLUNA Batch 5.x, using MyBatis is considered as the basis.
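The difference paging makes can be sketched in plain Java. The in-memory list below is a hypothetical stand-in for a database table, and `fetchPage` for a per-page SELECT (LIMIT/OFFSET style); only one page is ever held in memory at a time, as with JdbcPagingItemReader.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of paging reads: fetch pageSize records at a time
// instead of loading the whole data set. The "table" and fetchPage below are
// stand-ins for a database table and a per-page SELECT, not a real reader.
public class PagingSketch {

    // Simulates "SELECT ... LIMIT pageSize OFFSET offset".
    static List<String> fetchPage(List<String> table, int offset, int pageSize) {
        int to = Math.min(offset + pageSize, table.size());
        if (offset >= to) {
            return new ArrayList<>(); // past the end: empty page
        }
        return new ArrayList<>(table.subList(offset, to));
    }

    // Reads the whole table page by page; returns how many pages were fetched.
    static int readAllByPage(List<String> table, int pageSize) {
        int pages = 0;
        int offset = 0;
        while (true) {
            List<String> page = fetchPage(table, offset, pageSize);
            if (page.isEmpty()) {
                return pages; // no more data
            }
            pages++;              // process this page, then discard it
            offset += pageSize;   // only one page is held in memory at a time
        }
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            table.add("row-" + i);
        }
        System.out.println(readAllByPage(table, 3)); // 4 pages: 3+3+3+1
    }
}
```

A cursor-based reader achieves the same bounded memory use with a single SELECT, letting the database stream rows instead of re-issuing a query per page.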

MyBatisCursorItemReader
MyBatisPagingItemReader

Read records from the database in coordination with MyBatis. These are provided by MyBatis-Spring, the Spring coordination library offered by MyBatis. Regarding the difference between Paging and Cursor, it is the same as for JdbcXXXItemReader, except that MyBatis is used for the implementation.
In addition, JpaPagingItemReader, HibernatePagingItemReader, and HibernateCursorItemReader are provided, which read records from the database in coordination with a JPA implementation or Hibernate.

Using MyBatisCursorItemReader is considered as the basis in TERASOLUNA Batch 5.x.

JmsItemReader
AmqpItemReader

Receive messages from JMS or AMQP and read the data contained in them.

ItemProcessor

PassThroughItemProcessor

No operation is performed. It is used when processing or modification of the input data is not required.

ValidatingItemProcessor

Performs input checks. Implementing input check rules requires implementing Spring Batch's own org.springframework.batch.item.validator.Validator; however,
SpringValidator, an adaptor for the general org.springframework.validation.Validator offered by Spring, is provided,
so rules written for org.springframework.validation.Validator can be used. Use of ValidatingItemProcessor is prohibited in TERASOLUNA Batch 5.x.
For details, refer to input check.

CompositeItemProcessor

Sequentially executes multiple ItemProcessors on identical input data. It is useful when business logic is to be executed after performing input checks with ValidatingItemProcessor.

ItemWriter

FlatFileItemWriter

Write processed Java objects as a flat file such as a CSV file. Mapping rules from objects to file lines, including delimiters, can be customized.

StaxEventItemWriter

Write processed Java object as a XML file.

JdbcBatchItemWriter

Execute SQL by using JDBC and output processed Java object to database. Internally JdbcTemplate is used.

MyBatisBatchItemWriter

Coordinate with MyBatis to output processed Java objects to the database. It is provided by MyBatis-Spring, the Spring coordination library offered by MyBatis. JpaItemWriter for JPA implementations and HibernateItemWriter for Hibernate are not used in TERASOLUNA Batch 5.x.

JmsItemWriter
AmqpItemWriter

Send a message of a processed Java object with JMS or AMQP.

Omission of PassThroughItemProcessor

When a job is defined in XML, the ItemProcessor setting can be omitted.
When it is omitted, the input data is passed to ItemWriter without any operation, just like PassThroughItemProcessor.

2.3.2.3.2. Tasklet model

The chunk model is a framework suitable for batch applications that read multiple records of input data one by one and perform a series of processing.
However, processes that do not fit the chunk processing style must also be implemented,
for example, executing a system command, or updating only one record of a control table.

In such cases, the efficiency gained from chunk processing is small,
and the demerits of its more difficult design and implementation are significant; hence, it is rational to use the tasklet model.

The user must implement the Tasklet interface provided by Spring Batch when using the tasklet model.
Further, although the following concrete classes are provided by Spring Batch, they are not described further in TERASOLUNA Batch 5.x.

Concrete classes of Tasklet offered by Spring Batch

Class name

Overview

SystemCommandTasklet

A Tasklet to execute system commands asynchronously. The command to be executed is specified in the command property.
Since the system command is executed on a thread different from the calling thread, it is possible to set a timeout and cancel the system command's execution thread during the process.

MethodInvokingTaskletAdapter

A Tasklet for executing specific methods of a POJO class. Specify the bean of the target class in the targetObject property and the name of the method to be executed in the targetMethod property.
The POJO class can return the batch process termination status as the return value of the method; in that case, however, the ExitStatus described later must be the return value.
When a value of another type is returned, the status is regarded as normal termination (ExitStatus: COMPLETED) regardless of the return value.
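The delegation mechanism can be sketched with plain reflection. This is illustrative only: the real MethodInvokingTaskletAdapter uses Spring's infrastructure and the ExitStatus type, whereas the sketch below uses a String as a stand-in status, and the `SampleJob` class and its methods are hypothetical.

```java
import java.lang.reflect.Method;

// Illustrative sketch of invoking a named method on a POJO, in the spirit of
// the targetObject/targetMethod properties described above. A String stands in
// for ExitStatus; any non-String return value counts as normal completion.
public class MethodInvokerSketch {

    static String invoke(Object targetObject, String targetMethod) {
        try {
            Method m = targetObject.getClass().getMethod(targetMethod);
            Object result = m.invoke(targetObject);
            // A String return is used as the status; any other return type is
            // regarded as normal termination, mirroring the rule quoted above.
            return (result instanceof String) ? (String) result : "COMPLETED";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A sample POJO playing the role of the target bean (hypothetical).
    public static class SampleJob {
        public String runWithStatus() { return "FAILED"; }
        public int runPlain() { return 123; }
    }

    public static void main(String[] args) {
        SampleJob job = new SampleJob();
        System.out.println(invoke(job, "runWithStatus")); // FAILED
        System.out.println(invoke(job, "runPlain"));      // COMPLETED
    }
}
```

The point of the adapter is exactly this decoupling: the POJO knows nothing about Spring Batch, yet its return value can still drive the step's exit status.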

2.3.2.4. Metadata schema of JobRepository

Spring Batch metadata tables correspond to domain objects (Entity objects) represented in Java.

Correspondence list

Table

Entity object

Overview

BATCH_JOB_INSTANCE

JobInstance

Retains the job name and the string that serializes the job parameters.

BATCH_JOB_EXECUTION

JobExecution

Retains job status and execution results.

BATCH_JOB_EXECUTION_PARAMS

JobExecutionParams

Retains job parameters assigned at the startup.

BATCH_JOB_EXECUTION_CONTEXT

JobExecutionContext

Retains the context inside the job.

BATCH_STEP_EXECUTION

StepExecution

Retains status and execution results of step, number of commits and rollbacks.

BATCH_STEP_EXECUTION_CONTEXT

StepExecutionContext

Retains context inside the step.

JobRepository is responsible for accurately storing the contents of each Java object in these tables.

Regarding the character strings stored in the metadata tables

The character strings stored in the metadata tables are limited in length, and when the limit is exceeded the string is truncated.
Note that Spring Batch does not take multibyte characters into consideration, so an error may occur with the metadata table DDL provided by Spring Batch even if the string to be stored is within the character limit.
To store multibyte characters, it is necessary to extend the column size according to the character encoding, or to define the character data type with character-count (rather than byte-count) semantics.
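On Oracle, for example, the difference between byte-count and character-count semantics can be illustrated as follows (the column shown is taken from the metadata schema; the sketch only illustrates the two length semantics, not the full DDL):

```sql
-- Byte-count semantics (Oracle default): 2500 bytes;
-- multibyte strings may exceed the limit even within 2500 characters
EXIT_MESSAGE VARCHAR2(2500)

-- Character-count semantics: 2500 characters regardless of encoding
EXIT_MESSAGE VARCHAR2(2500 CHAR)
```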

Since the character data type of the DDL for Oracle provided by Spring Batch is defined with the default byte-count semantics,
TERASOLUNA Batch 5.x provides a DDL for Oracle that explicitly defines the character data type with character-count semantics.
The provided DDL is included in the org.terasoluna.batch package inside the jar of TERASOLUNA Batch 5.x.

An ERD model of all 6 tables and their interrelations is shown below.

ER diagram

2.3.2.4.1. Version

Most of the database tables contain a version column.
This column is important because Spring Batch adopts an optimistic locking strategy to handle updates to the database.
The version value is incremented each time the record is updated.
When JobRepository attempts an update and finds that the version number has been changed, an OptimisticLockingFailureException is thrown, indicating a concurrent-access error.
Other batch jobs may be running on different machines; however, since all the jobs use the same database, this check is required.
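The version-based check can be sketched in plain Java. This is a simplified stand-in, not Spring Batch's actual JobRepository implementation, and OptimisticLockingFailureException is represented here by a plain RuntimeException:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of version-based optimistic locking,
// mirroring the VERSION column handling of JobRepository.
class VersionedRecord {
    private final AtomicInteger version = new AtomicInteger(0);
    private String status;

    int currentVersion() { return version.get(); }

    // Update succeeds only if the caller still holds the latest version;
    // on success the version is incremented.
    void update(int expectedVersion, String newStatus) {
        if (!version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            throw new RuntimeException("OptimisticLockingFailure: version changed concurrently");
        }
        this.status = newStatus;
    }

    String status() { return status; }
}

public class OptimisticLockSketch {
    public static void main(String[] args) {
        VersionedRecord rec = new VersionedRecord();
        int v = rec.currentVersion();     // two "processes" both read version 0
        rec.update(v, "STARTED");         // first update succeeds, version -> 1
        try {
            rec.update(v, "COMPLETED");   // second update with the stale version fails
        } catch (RuntimeException e) {
            System.out.println("detected concurrent update");
        }
    }
}
```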

2.3.2.4.2. ID (Sequence) definition

BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION each contain a column ending with _ID.
These fields act as the primary key of their respective tables.
However, these keys are not generated in the table itself but by separate sequences.
After inserting a domain object into the database, the assigned key should be set on the actual object so that it can be uniquely identified in Java.
Sequences may not be supported depending on the database; in that case, a table is used instead of each sequence.

2.3.2.4.3. Table definition

Explanation is given for each table item.

BATCH_JOB_INSTANCE

BATCH_JOB_INSTANCE table retains all the information related to JobInstance and is at the top level of the overall hierarchy.

JOB_NAME

Job name. A non-null value, since it is necessary for identifying an instance.

JOB_KEY

JobParameters serialised for uniquely identifying the same job as a different instance.
JobInstances with the same job name must contain different JobParameters (in other words, different JOB_KEY values).

BATCH_JOB_EXECUTION

BATCH_JOB_EXECUTION table retains all the information related to the JobExecution object.
When a job is executed, a new JobExecution is always created and a new row is registered in this table.

JOB_INSTANCE_ID

Foreign key to the BATCH_JOB_INSTANCE table, indicating the instance to which the job execution belongs.
Multiple executions may exist for each instance.

CREATE_TIME

Time when the job execution was created.

START_TIME

Time when the job execution was started.

END_TIME

Indicates the time when the job execution was terminated, regardless of whether it was successful or failed.
An empty column value even though the job is not currently running indicates that some type of error occurred and the framework was unable to perform a last save before failing.

STATUS

A character string which indicates the job execution status. It is the character string representation of the BatchStatus enumeration object.

EXIT_CODE

A character string which indicates an exit code of job execution. When it is activated by CommandLineJobRunner, it can be converted to a numeric value.

EXIT_MESSAGE

A character string which explains the job termination status in detail.
When a failure occurs, this string is likely to include as much of the stack trace as possible.

LAST_UPDATED

Time when job execution of the record was last updated.

BATCH_JOB_EXECUTION_PARAMS

BATCH_JOB_EXECUTION_PARAMS table retains all the information related to the JobParameters object.
It contains zero or more key/value pairs passed to the job and records the parameters with which the job was executed.

TYPE_CD

A character string which indicates whether the data type is string, date, long or double.

KEY_NAME

Parameter key.

STRING_VAL

Parameter value when data type is string.

DATE_VAL

Parameter value when data type is date.

LONG_VAL

Parameter value when data type is an integer.

DOUBLE_VAL

Parameter value when data type is a real number.

IDENTIFYING

A flag which indicates whether the parameter is used to identify the job instance uniquely.

BATCH_JOB_EXECUTION_CONTEXT

BATCH_JOB_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Job.
It contains all the job level data required for execution of specific jobs.
This data represents the state that must be restored when the process is executed again after a job failure, and it enables the failed job to restart from the point where processing stopped.

BATCH_STEP_EXECUTION

BATCH_STEP_EXECUTION table retains all the information related to the StepExecution object.
This table is very similar to the BATCH_JOB_EXECUTION table in many ways. When each JobExecution is created, at least one entry exists for each step.

END_TIME

Indicates the time when the step execution ended, regardless of whether it was successful or failed.
An empty column value even though the job is not currently running indicates that some type of error occurred and the framework was unable to perform a last save before failing.

STATUS

A character string that represents the status of the step execution. It is the character string representation of the BatchStatus enumeration object.

COMMIT_COUNT

Number of times a transaction is committed.

READ_COUNT

Number of data records read by ItemReader.

FILTER_COUNT

Number of data records filtered by ItemProcessor.

WRITE_COUNT

Number of data records written by ItemWriter.

READ_SKIP_COUNT

Number of data records skipped by ItemReader.

WRITE_SKIP_COUNT

Number of data records skipped by ItemWriter.

PROCESS_SKIP_COUNT

Number of data records skipped by ItemProcessor.

ROLLBACK_COUNT

Number of times a transaction is rolled back.

EXIT_CODE

A character string which indicates the exit code of the step execution. When it is activated by CommandLineJobRunner, it can be converted to a numeric value.

EXIT_MESSAGE

A character string which explains the step termination status in detail.
When a failure occurs, this string is likely to include as much of the stack trace as possible.

LAST_UPDATED

Time when the step execution of the record was last updated.

BATCH_STEP_EXECUTION_CONTEXT

BATCH_STEP_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Step.
It contains all the step level data required for execution of specific steps.
This data represents the state that must be restored when the process is executed again after a job failure, and it enables the failed job to restart from the point where processing stopped.

2.3.2.4.4. DDL script

The JAR file of Spring Batch Core contains sample scripts which create the relational tables for several database platforms.
These scripts can be used as they are, or additional indexes and constraints can be added as required.
The scripts are included in the org.springframework.batch.core package, and the file names follow the pattern schema-*.sql.
"*" is the short name of the target database platform.

2.3.2.5. Typical performance tuning points

Typical performance tuning points in Spring Batch are explained.

Adjustment of chunk size

Chunk size is increased to reduce overhead occurring due to resource output.
However, if chunk size is too large, it increases load on the resources resulting in deterioration in the performance. Hence, chunk size must be adjusted to a moderate value.

Adjustment of fetch size

Fetch size (buffer size) for the resource is increased to reduce overhead occurring due to input from resources.

Reading of a file efficiently

When BeanWrapperFieldSetMapper is used, a record can be mapped to a Bean simply by specifying the Bean class and the property names in order.
However, it performs complex operations internally and takes time. Processing time can be reduced by using a dedicated FieldSetMapper implementation that performs the mapping.
For file I/O details, refer to "File access".
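A dedicated mapper along these lines can be sketched standalone. The real interface is org.springframework.batch.item.file.mapping.FieldSetMapper with a FieldSet argument; simplified stand-in types and the CustomerDto fields are assumptions used so the sketch compiles without Spring Batch:

```java
// Simplified stand-in for org.springframework.batch.item.file.transform.FieldSet.
class StubFieldSet {
    private final String[] tokens;
    StubFieldSet(String... tokens) { this.tokens = tokens; }
    String readString(int index) { return tokens[index]; }
    int readInt(int index) { return Integer.parseInt(tokens[index]); }
}

class CustomerDto {
    final String name;
    final int age;
    CustomerDto(String name, int age) { this.name = name; this.age = age; }
}

// Dedicated mapper: direct field assignment per record, avoiding the
// reflection overhead BeanWrapperFieldSetMapper incurs for each record.
class CustomerFieldSetMapper {
    CustomerDto mapFieldSet(StubFieldSet fs) {
        return new CustomerDto(fs.readString(0), fs.readInt(1));
    }
}

public class FieldSetMapperSketch {
    public static void main(String[] args) {
        CustomerDto c = new CustomerFieldSetMapper()
                .mapFieldSet(new StubFieldSet("Alice", "30"));
        System.out.println(c.name + " " + c.age);
    }
}
```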

Parallel processing, Multiple processing

Spring Batch supports parallel processing of Step execution and multiple processing by using data distribution. Parallel processing or multiple processing can be performed and the performance can be improved by running the processes in parallel.
However, if number of parallel processes and multiple processes is too large, load on the resources increases resulting in deterioration of performance. Hence, size must be adjusted to a moderate value.
For details of parallel and multiple processing, refer to "Parallel processing and multiple processing".

Reviewing distributed processing

Spring Batch also supports distributed processing across multiple machines. Guidelines are same as parallel and multiple processing.
Distributed processing will not be explained in this guideline since the basic design and operational design are complex.

If a single business logic is complex and large-scale, the business logic is divided into units.
As clear from the schematic diagram, since only one ItemProcessor can be set in 1 step, it looks like the division of business logic is not possible.
However, since CompositeItemProcessor, an ItemProcessor consisting of multiple ItemProcessors, exists,
the business logic can be divided and executed by using this implementation.
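As a sketch, a Bean definition chaining two hypothetical processors through CompositeItemProcessor might look like this (validationProcessor and businessLogicProcessor are placeholder Bean IDs):

```xml
<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <!-- hypothetical first stage of the divided business logic -->
            <ref bean="validationProcessor"/>
            <!-- hypothetical second stage -->
            <ref bean="businessLogicProcessor"/>
        </list>
    </property>
</bean>
```

Each delegate receives the output of the previous one, so one record flows through the divided logic in order.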

2.4.3. How to implement Step

2.4.3.1. Chunk model

Definition of chunk model and purpose of use are explained.

Definition

Implementations of ItemReader, ItemProcessor and ItemWriter, together with the chunk size, are set in ChunkOrientedTasklet. The respective roles are explained.

ChunkOrientedTasklet…​Calls ItemReader/ItemProcessor to create a chunk, and passes the created chunk to ItemWriter.

2.4.4. How to run a job

2.4.4.1. Synchronous execution method

Synchronous execution is an execution method wherein control is not returned to the boot source from job start until job completion.

A schematic diagram which starts a job from job scheduler is shown.

Schematic diagram for synchronous execution

Start a shell script to run a job from job scheduler.
Job scheduler waits until the exit code (numeric value) is returned.

Start CommandLineJobRunner to run a job from shell script.
Shell script waits until CommandLineJobRunner returns an exit code (numeric value).

CommandLineJobRunner runs the job. After processing is completed, the job returns an exit code (string) to CommandLineJobRunner.
CommandLineJobRunner converts the exit code (string) returned from the job to an exit code (numeric value) and returns it to the shell script.

2.4.4.2. Asynchronous execution method

Asynchronous execution method is an execution method wherein the control is given back to boot source immediately after running a job, by executing a job on a different execution base than boot source (a separate thread etc).
In this method, it is necessary to fetch job execution results by a means different from that of running a job.

Following 2 methods are explained in TERASOLUNA Batch Framework for Java (5.x).

DB polling function periodically monitors the registration of the job request and executes the corresponding job when the registration is detected.

Run the job from SimpleJobOperator and receive JobExecutionId after completion of the job.

JobExecutionId is an ID which uniquely identifies job execution and execution results are browsed from JobRepository by using this ID.

Job execution results are registered in JobRepository by using Spring Batch system.

DB polling is itself executed asynchronously.

The DB polling function updates the job request that was started with the JobExecutionId returned from SimpleJobOperator and with its status.

Job process progress and results are referred separately by using JobExecutionId.

2.4.4.2.2. Asynchronous execution method (Web container)

"Asynchronous execution (Web container)" is a method
wherein a job is executed asynchronously using a request sent to a web application on the web container as a trigger.
The web application can return a response immediately after starting the job, without waiting for the job to end.

Web container schematic diagram

Send a request from a client to Web application.

Web application asynchronously executes the job requested from a request.

Receive JobExecutionId from SimpleJobOperator immediately after starting the job.

Job execution results are registered in JobRepository by using Spring Batch system.

Web application returns a response to the client without waiting for the job to end.

Job process progress and results are browsed separately by using JobExecutionId.

3.1. Development of batch application

3.1.1. What is blank project

A blank project is a template development project in which various settings such as Spring Batch and MyBatis3 are made in advance, and it is the starting point of application development.
In this guideline, a blank project with a single-project structure is provided.
Refer to Project structure for the explanation of structure.

Difference from TERASOLUNA Server 5.x

A multi-project structure is recommended for TERASOLUNA Server 5.x.
The reason is mainly to enjoy the following merits.

Makes the environmental differences easier to absorb

Makes separation of business logic and presentation easier

However, in this guideline, a single project structure is provided unlike TERASOLUNA Server 5.x.

These points should be considered for batch applications as well; however,
by providing a single project structure, easy access to the resources related to one job is given priority.
One of the reasons is that, in the case of batch applications,
environmental differences can often be switched by property files or environment variables.

3.1.2. Creation of project

How to create a project using archetype:generate of Maven Archetype Plugin is explained.

Regarding prerequisites of creating environment

Prerequisites are explained below.

Java SE Development Kit 8

Apache Maven 3.x

Internet should be connected

When connecting to the Internet via proxy, Maven proxy setting should be done

IDE

Spring Tool Suite / Eclipse etc.

Considerations after creating a project

Version of TERASOLUNA Batch 5.x defined in the generated pom.xml must be changed from 5.0.1-SNAPSHOT to 5.0.1.RELEASE.

3.1.3. Project structure

Project structure that was created above, is explained.
Project structure should be made by considering the following points.

Implement the job independent of startup method

Save the efforts of performing various settings such as Spring Batch, MyBatis

Make the environment dependent switching easy

The structure is shown and each element is explained below.
(It is explained based on the output at the time of executing the above mvn archetype:generate to easily understand.)

Directory configuration of project

Explanation of each element of blank project

Sr. No.

Explanation

(1)

root package that stores various classes of the entire batch application.

(2)

Package that stores the various classes of one job.
It stores DTOs, implementations of Tasklet and ItemProcessor, and the Mapper interfaces of MyBatis3.
Since this guideline places no restrictions on how resources are stored, treat this as one example.

You can customize it with reference to the default state; however,
consider making it easy to identify the resources specific to each job.

(3)

Configuration file of the entire batch application.
In the default state, settings related to database connection and asynchronous execution are made.
Settings can be added by referring to the defaults.

(4)

Configuration file of Logback(log output).

(5)

Configuration file that defines the messages to be displayed when an error occurs during input checks using Bean Validation.
In the default state, the default messages of Bean Validation and of Hibernate Validator (its implementation) are defined, but they are all commented out.
In this state the default messages are used; comment them in and modify them only when you want to customize the messages.

(6)

Mapper XML file that pairs with Mapper interface of MyBatis3.

(7)

Property file that defines messages used mainly for log output.

(8)

Directory that stores job-specific Bean definition file.
The hierarchical structure can be configured according to the number of jobs.

(9)

Directory that stores the Bean definition files related to the entire batch application.
Settings such as the defaults of Spring Batch and MyBatis are made here so that a job can be started regardless of the start trigger, such as synchronous or asynchronous.

(1)

In pom.xml, define the dependency on the JDBC driver for connecting to the database to be used.
In the default state, H2 Database (an in-memory database) and PostgreSQL are set; add or delete entries as required.

(2)

Set the JDBC driver connection settings.
- admin.jdbc.xxx is used by Spring Batch and TERASOLUNA Batch 5.x
- jdbc.xxx is used by individual jobs

(3)

Define whether to execute the initialization of the database used by Spring Batch and TERASOLUNA Batch 5.x, and the script to be used.
Since Spring Batch accesses JobRepository and
TERASOLUNA Batch 5.x accesses the job request table in asynchronous execution (DB polling),
a database is mandatory.
Whether to enable it is decided as follows.
- Enable it when H2 Database is to be used. If disabled, the JobRepository and the job request table cannot be accessed and an error occurs.
- When not using H2 Database, disable it to prevent accidents.

Here, whether to include the environment dependent configuration files is switched.
By utilizing this setting, the environmental differences can be absorbed by placing the configuration files separately at the time of environment deployment.
Moreover, by applying this, the configuration files included in the jar can be changed between the test environment and the production environment.
An example is shown below.
An example is shown below.

Description example of pom.xml for switching configuration file for each environment

In an actual system,
rather than issuing the java command directly when starting a job from the job scheduler,
it is common to start it via a shell script that launches java.

This is for setting environment variables before starting the java command and for handling the exit code of the java command.
It is recommended to always handle the exit code of the java command, for the following reasons.

The normal exit code of the java command is 0 and the abnormal exit code is 1. The job scheduler judges the success or failure of the job within the range of the exit code.
Depending on the settings of the job scheduler, it may judge the job as having ended normally even though the java command ended abnormally.

The exit codes that can be handled by the OS and the job scheduler have a finite range.

It is important to define the range of the exit code to be used by the user according to the specifications of the OS and job scheduler.

Generally, it is in the range of 0 to 255 which is defined by the POSIX standards.

In TERASOLUNA Batch 5.x, it is set to return 0 as the normal exit code and 255 otherwise.
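This convention can be sketched as a simple mapping (a hypothetical illustration of the convention, not TERASOLUNA Batch's actual exit-code resolver):

```java
// Sketch of the exit-code convention: 0 for normal completion, 255 otherwise.
public class ExitCodeSketch {
    static int toNumericExitCode(String exitStatus) {
        return "COMPLETED".equals(exitStatus) ? 0 : 255;
    }

    public static void main(String[] args) {
        System.out.println(toNumericExitCode("COMPLETED")); // 0
        System.out.println(toNumericExitCode("FAILED"));    // 255
    }
}
```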

1

ItemReader

Interface for fetching data from various resources.
Since implementations for flat files and databases are provided by Spring Batch,
there is no need for the user to create them.

2

ItemProcessor

Interface for processing data from input to output.
The user implements this interface whenever required and implements business logic.

3

ItemWriter

Interface for the output of data to various resources.
An interface paired with ItemReader.
Since implementation for flat files and database is provided by Spring Batch,
there is no need for the user to create it.

The points in this table are as follows.

If the data is to be only transferred from input resource to output resource in a simple way, it can be implemented only by setting.

ItemProcessor should be implemented whenever required.

Hereafter, how to implement the job using these components, is explained.

3.2.2. How to use

How to implement chunk model job is explained in the following order here.

Job configuration. The id attribute must be unique among all the jobs included in one batch application.

(9)

JobRepository configuration.
The value set in the job-repository attribute should be fixed to jobRepository unless there is a special reason.
This will allow all the jobs to be managed by 1 JobRepository.
Resolve Bean definition of jobRepository by (1).

(10)

Step configuration.
Although the id attribute need not be unique across all the jobs in one batch application, a unique id is used to enable easy tracking when a failure occurs.
A format of [step + serial number] is used for the id attribute specified in (8) unless there is a specific reason to use a different format.

(11)

Tasklet configuration.
The value set in the transaction-manager attribute should be fixed to jobTransactionManager unless there is a special reason.
This will allow the transaction to be managed for each commit-interval of (12).
For details, refer to Transaction control.
Resolve Bean definition of jobTransactionManager by (1).

In the above example, 10 records are used; however, the exact count differs with the characteristics of the available machine resources and the job.
In the case of a job that processes data while accessing multiple resources, the process throughput may peak at 10 to 100 records. If input and output resources correspond 1:1 and the job simply transfers data,
the process throughput may increase to 5000 or even 10000 records.

Temporarily set commit-interval to 100 records at the time of implementing the job,
and then tune each job based on the results of performance measurement.
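A job definition along these lines might look like the following sketch (Bean IDs such as jobCustomerLoad, customerReader, customerProcessor and customerWriter are placeholders):

```xml
<batch:job id="jobCustomerLoad" job-repository="jobRepository">
    <batch:step id="jobCustomerLoad.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <!-- commit-interval="100" is a starting point; tune per job -->
            <batch:chunk reader="customerReader"
                         processor="customerProcessor"
                         writer="customerWriter"
                         commit-interval="100"/>
        </batch:tasklet>
    </batch:step>
</batch:job>
```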

3.2.2.2. Implementation of components

3.2.2.2.1. Implementation of ItemProcessor

How to implement ItemProcessor is explained.

ItemProcessor is responsible for creating 1 record data for the output resource
based on the 1 record data fetched from the input resource as shown in the interface below.
In other words, ItemProcessor is where business logic for 1 record data is implemented.

As the interface shows, the type parameters I and O can be the same type or different types.
The same type means partially modifying the input data.
Different types mean generating output data based on the input data.

Example of implementation of ItemProcessor(Input/Output is of same type)

Return of null from ItemProcessor means the data is not passed to the subsequent process (Writer).
In other words, the data is filtered.
This can be effectively used to validate the input data.
For detail, refer to Input check.
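A minimal self-contained sketch of this filtering pattern is shown below. The ItemProcessor interface is redeclared here (without the throws clause of the real org.springframework.batch.item.ItemProcessor) only so the example compiles standalone, and the amount threshold and SalesDto fields are hypothetical:

```java
// Stand-in for org.springframework.batch.item.ItemProcessor,
// redeclared so this sketch compiles without Spring Batch on the classpath.
interface ItemProcessor<I, O> {
    O process(I item);
}

class SalesDto {
    final String id;
    final int amount;
    SalesDto(String id, int amount) { this.id = id; this.amount = amount; }
}

// Input and output are the same type: the input data is partially modified,
// and records with a non-positive amount are filtered by returning null.
class FilterNegativeAmountProcessor implements ItemProcessor<SalesDto, SalesDto> {
    @Override
    public SalesDto process(SalesDto item) {
        if (item.amount <= 0) {
            return null; // filtered: not passed to the subsequent ItemWriter
        }
        return new SalesDto(item.id.trim(), item.amount);
    }
}

public class ItemProcessorSketch {
    public static void main(String[] args) {
        ItemProcessor<SalesDto, SalesDto> p = new FilterNegativeAmountProcessor();
        System.out.println(p.process(new SalesDto(" A01 ", 100)).id); // "A01"
        System.out.println(p.process(new SalesDto("B02", 0)));        // null
    }
}
```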

To increase process throughput of ItemProcessor

As shown in the previous implementation example, the implementation class of ItemProcessor may access resources such as databases and files.
Since ItemProcessor is executed for each record of the input data, even small I/O accumulates into large I/O across the entire job,
so it is important to suppress I/O as much as possible to increase process throughput.

One method is to store the required data in memory in advance by utilizing the Listener described later,
and to implement most of the processing in ItemProcessor so that it completes using only CPU and memory.
However, since this consumes a large amount of memory per job, not everything can be stored in memory.
What data to store in memory should be studied based on I/O frequency and data size.

3.3. Creation of tasklet model job

3.3.1. Overview

3.3.1.1. Components

Tasklet model job does not register multiple components.
It only implements org.springframework.batch.core.step.tasklet.Tasklet and sets it in Bean definition.
ItemReader and ItemWriter, which are components of the chunk model, can also be used as components; this is an advanced implementation technique.

3.3.2. How to use

How to implement tasklet model job is explained in the following order here.

Import the settings to always read the required Bean definition when using TERASOLUNA Batch 5.x.

(2)

Enable Bean definition using annotation. Use it with (3).

(3)

Set the base package of the component-scan target. Use it with (2).
In the tasklet model, the tasklet implementation class is Bean-defined by annotation, so its Bean definition is not required in XML.

(4)

Job configuration. The id attribute must be unique among all the jobs included in one batch application.

(5)

JobRepository configuration.
The value set in the job-repository attribute should be fixed to jobRepository unless there is a special reason.
This will allow all the jobs to be managed in one JobRepository.
Resolve Bean definition of jobRepository by (1).

(6)

Step configuration.
Although the id attribute need not be unique across all the jobs in one batch application, a unique id is used to enable easy tracking when a failure occurs.
A format of [step + serial number] is used for the id attribute specified in (4) unless there is a specific reason to use a different format.

(7)

Tasklet configuration.
The value set in the transaction-manager attribute should be fixed to jobTransactionManager unless there is a special reason.
This will manage the processes of the entire tasklet in one transaction.
For details, refer to Transaction control.
Resolve Bean definition of jobTransactionManager by (1).

Also, the ref attribute specifies the Bean ID of the Tasklet implementation class resolved by (3). Since the tasklet implementation class is SimpleJobTasklet, the Bean ID is simpleJobTasklet, with the first letter in lower case.

Bean name when using annotation

Bean name when using @Component annotation is generated through
org.springframework.context.annotation.AnnotationBeanNameGenerator.
Refer to Javadoc of this class when you want to confirm the naming rules.

3.3.2.2. Implementation of tasklet

First, understand the overview with simple implementation, then proceed to implementation using the components of the chunk model.

Implement the execute method defined by the Tasklet interface.
The arguments StepContribution and ChunkContext can be used; however, they are not explained here.

(3)

Implement any process. INFO log is output here.

(4)

Return whether or not the tasklet process is completed.
Always return RepeatStatus.FINISHED;.
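A minimal standalone sketch of points (3) and (4): the Tasklet contract is redeclared in simplified form so the example compiles without Spring Batch. The real interface is org.springframework.batch.core.step.tasklet.Tasklet, whose execute method also receives StepContribution and ChunkContext and declares throws Exception:

```java
// Simplified stand-ins for the Spring Batch types, so the sketch runs standalone.
enum RepeatStatus { FINISHED, CONTINUABLE }

interface Tasklet {
    // Real signature: RepeatStatus execute(StepContribution, ChunkContext) throws Exception
    RepeatStatus execute();
}

class SimpleJobTasklet implements Tasklet {
    @Override
    public RepeatStatus execute() {
        // (3) Implement any process; here, just output a log-like message.
        System.out.println("called tasklet");
        // (4) Return that the tasklet process is completed.
        return RepeatStatus.FINISHED;
    }
}

public class TaskletSketch {
    public static void main(String[] args) {
        System.out.println(new SimpleJobTasklet().execute()); // FINISHED
    }
}
```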

3.3.2.4. Implementation of tasklet using the components of chunk model

Spring Batch does not mention using various components of chunk model during tasklet implementation.
In TERASOLUNA Batch 5.x, you may select this depending on the following situations.

When multiple resources are combined and processed, it is difficult to fit the chunk model format

When processes are implemented in various places in the chunk model, the tasklet model makes the overall picture easier to understand

When you want to simplify recovery by using the batch commit of the tasklet model instead of the intermediate commit of the chunk model

Note that, processing units should also be considered to implement Tasklet by using components of chunk model.
Following 3 patterns can be considered as units of output records.

Units and features of output records

Output records

Features

1 record

Since data is input, processed and output one record at a time, the processing is easy to visualize.
Note that performance deterioration is likely due to frequent I/O when the amount of data is large.

All records

Data is input and processed one record at a time and stored in memory; all records are output together at the end.
Data consistency can be ensured and performance can be improved when the amount of data is small.
However, note that a high load is likely to be placed on resources (CPU, memory) when the amount of data is large.

Fixed records

Data is input and processed one record at a time and stored in memory; the data is output when a certain number of records is reached.
Performance improvement can be expected by efficiently processing a large amount of data with fixed resources (CPU, memory).
Also, since the data is processed a fixed number of records at a time, an intermediate commit can be employed by implementing transaction control.
However, note that with the intermediate commit method, processed and unprocessed data are likely to coexist during recovery if the job terminates abnormally.

The tasklet implementation that uses ItemReader and ItemWriter which are the components of the chunk model is explained below.

The implementation example shows processing data one by one for each record.

Tasklet implementation example that uses the components of chunk model
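The listing itself is not reproduced here; a simplified standalone sketch of the pattern the numbered points below describe (open → read loop → write → close, with stand-in types replacing the Spring Batch ItemStreamReader and the MyBatis Mapper) might look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for ItemStreamReader: supports open/read/close like the real interface.
class StubStreamReader {
    private final List<String> data;
    private Iterator<String> it;
    StubStreamReader(List<String> data) { this.data = data; }
    void open() { it = data.iterator(); }                     // (5) open the input resource
    String read() { return it.hasNext() ? it.next() : null; } // returns null at the end
    void close() { it = null; }                               // (8) close without fail
}

// Stand-in for a MyBatis Mapper used as the output side.
class StubMapper {
    final List<String> written = new ArrayList<>();
    void create(String item) { written.add(item); }           // (7) output to the database
}

public class ChunkComponentTaskletSketch {
    public static void main(String[] args) {
        StubStreamReader reader = new StubStreamReader(Arrays.asList("a", "b", "c"));
        StubMapper mapper = new StubMapper();
        try {
            reader.open();
            String item;
            while ((item = reader.read()) != null) {          // (6) loop until read returns null
                mapper.create(item);
            }
        } finally {
            reader.close();
        }
        System.out.println(mapper.written); // [a, b, c]
    }
}
```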

(1)

Set the step scope, matching the Bean scope of the ItemReader used in this class.

(2)

Access the input resource (a flat file in this example) through ItemReader.
The Bean name detailCSVReader is specified for clarity, but it is optional.

(3)

Define the type as ItemStreamReader, a sub-interface of ItemReader.
This is because the resource must be opened and closed as in (5) and (8).
This is supplemented later.

(4)

Access output resources (database in this example) through Mapper of MyBatis.
Mapper is directly used for the sake of simplicity. There is no need to always use ItemWriter.
Of course, MyBatisBatchItemWriter can be used.

(5)

Open input resource.

(6)

Loop through the input resource sequentially. ItemReader#read returns null when all the input data has been read and the end is reached.

(7)

Output to database.

(8)

The resource must be closed without fail.
Implement exception handling as needed.
When an exception occurs, the transaction of the entire tasklet is rolled back,
the stack trace of the exception is output, and the job terminates abnormally.

The scope of tasklet implementation class and Bean to be Injected should have the same scope.

For example, if FlatFileItemReader receives an input file path from an argument, the Bean scope should be step.
In this case, the scope of tasklet implementation class should also be step.

Suppose the scope of the tasklet implementation class were set to singleton.
When the tasklet implementation class is instantiated while generating the ApplicationContext at application startup,
it tries to resolve and inject the instance of FlatFileItemReader;
however, the step-scoped FlatFileItemReader does not exist yet, because it is generated at step execution time.
As a result, the tasklet implementation class cannot be instantiated and the generation of the ApplicationContext fails.

Regarding the type of field assigned with @Inject

Use one of the following types depending on the implementation class to be used.

ItemReader/ItemWriter

Used when there is no need to open/close the target resource.

ItemStreamReader/ItemStreamWriter

Used when there is a need to open/close the target resource.

It should be decided which type to use after confirming javadoc. Typical examples are shown below.

In case of FlatFileItemReader/Writer

handle by ItemStreamReader/ItemStreamWriter

In case of MyBatisCursorItemReader

handle by ItemStreamReader

In case of MyBatisBatchItemWriter

handle by ItemWriter

The implementation example imitates the chunk model to process a fixed number of records at a time.

ItemWriter outputs a fixed number of records collectively.
It processes and outputs 10 records at a time.

(3)

As per the behavior of chunk model,
it should be read→process→read→process→…​→write.

(4)

Output through ItemWriter collectively.
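The fixed-record-count pattern described by these points can be sketched standalone as follows (stand-in logic rather than the actual Spring Batch types; the chunk size of 10 mirrors the example above):

```java
import java.util.ArrayList;
import java.util.List;

// Buffers items and flushes them in fixed-size chunks, imitating the
// read -> process -> ... -> write behaviour of the chunk model.
public class FixedChunkSketch {
    static final int CHUNK_SIZE = 10;

    public static List<List<Integer>> writeInChunks(List<Integer> input) {
        List<List<Integer>> writes = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (Integer item : input) {           // read and process one record at a time
            buffer.add(item);
            if (buffer.size() >= CHUNK_SIZE) { // output collectively per 10 records
                writes.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {               // flush the remainder at the end
            writes.add(buffer);
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 25; i++) input.add(i);
        System.out.println(writeInChunks(input).size()); // 3 chunks: 10, 10, 5
    }
}
```

If transaction control were added per flush, this would correspond to the intermediate commit method discussed above.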

Decide in each case whether to use the implementation classes of ItemReader and ItemWriter.
For file access, the implementation classes of ItemReader and ItemWriter should be used.
For anything else, such as database access, there is no need to use them; use them when they improve performance.

3.4. How to choose chunk model or tasklet model

Here, how to choose between the chunk model and the tasklet model is explained by organizing the features of each.
Refer to the chapters that explain each point in detail as appropriate.

Understand the following contents as examples of concepts without any constraints or recommendations.
Refer to it while creating a job depending on the characteristics of the users and systems.

The main differences between the chunk model and the tasklet model are given below.

Comparison of chunk model and tasklet model.

Item

Chunk

Tasklet

Components

It mainly consists of 3 components: ItemReader, ItemProcessor and ItemWriter.

It is consolidated in one Tasklet.

Transaction

A certain number of records are processed with intermediate commits. Batch commit cannot be done.
It can be processed with fixed machine resources regardless of the data count.
If an error occurs midway, unprocessed data and processed data get mixed.

The data is entirely processed with a batch commit. The user needs to implement intermediate commits if required.
If the data to be processed is large, machine resources may get exhausted.
If an error occurs midway, all processing is rolled back and the data returns to the state before the job execution.

Restart

It can be restarted based on the record count.

It cannot be restarted based on the record count.

Based on this, we will introduce some examples of using each one as follows.

To make recovery as simple as possible

When a job that ended in error is to be recovered simply by re-running the target job,
the tasklet model can be chosen to make recovery simple.
In the chunk model, it must be dealt with by returning the processed data
to the state before the job execution, or
by creating a job that processes only the unprocessed data.

To consolidate the process contents

When you want to prioritize an overall view of the job, such as 1 job in 1 class, the tasklet model can be chosen.

To process large data stably

When performing batch processing of, for example, 10 million records, where the record count affects machine resources, consider using the chunk model.
This stabilizes the processing through intermediate commits.
Intermediate commits can also be implemented in the tasklet model, but it is simpler in the chunk model.

To restart based on the record count for the recovery after error

When the batch window is tight and you want to resume from the error record onwards,
the chunk model should be chosen so that the record-count-based restart provided by Spring Batch can be used.
This eliminates the need to create such a mechanism for each job.

Chunk model and tasklet model are basically used in combination.
It is not necessary to implement only one model in all jobs in the batch system.
Use one model based on the characteristics of jobs of the entire system and use the other model in accordance with the situation.

For example, in most cases the tasklet model is chosen when there is a margin in the number of records to be processed and in the processing time,
and in a small number of cases the chunk model is chosen for jobs that process large numbers of records.

4. Running a job

4.1. Synchronous job

4.1.1. Overview

Synchronous job is explained.
A synchronous job is the execution method in which a new process is launched through the shell by a job scheduler and the execution result of the job is returned to the caller.

Overview of synchronous job

Sequence of synchronous job

The usage method of this function is the same in the chunk model as well as the tasklet model.

4.1.2. How to use

How to run a job with CommandLineJobRunner is explained.

Refer to Create project for building and executing the application.
Refer to Job parameters for how to specify and use job parameters.
Some explanations in the above references overlap with this section; here, the elements specific to synchronous jobs are mainly explained.

4.1.2.1. How to run

In TERASOLUNA Batch 5.x, run the synchronous job using CommandLineJobRunner provided by Spring Batch.
Start CommandLineJobRunner by issuing java command as shown below.
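As a sketch, the launch command might look like the following (the class path and the Bean definition file path are examples; adjust them to your environment and build layout):

```shell
# Launch a job synchronously via CommandLineJobRunner:
# <Bean definition file of the job> <job ID> <job parameters...>
java -cp 'target/myapp.jar:dependency/*' \
    org.springframework.batch.core.launch.support.CommandLineJobRunner \
    META-INF/jobs/job01.xml job01 param1=value1
```

The exit code of the java command conveys the execution result of the job back to the caller (the job scheduler).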

Abandons a stopped job. The abandoned job cannot be restarted.
In TERASOLUNA Batch 5.x, there is no case of using this option, hence it is not explained.

-next

Runs a job that was executed once in the past, again. However, this option is not used in TERASOLUNA Batch 5.x.
It exists to avoid the restriction, given by default in Spring Batch, that "running a job with the same parameters is recognized as the same job and the same job can be executed only once".
The details are explained in Regarding parameter conversion class.
Using this option requires an implementation class of the JobParametersIncrementer interface,
which is not set in TERASOLUNA Batch 5.x.
Therefore, when this option is specified at launch, an error occurs because the required Bean definition does not exist.

4.2. Job parameters

4.2.1. Overview

This section explains about using the job parameter (hereafter referred to as 'parameter').

The usage method of this function is the same in the chunk model as well as the tasklet model.

A parameter is used to flexibly switch the operation of the job according to the execution environment and execution timing as shown below.

The specified parameters can be referred in Bean definition or in Java under Spring management.

4.2.2. How to use

4.2.2.1. Regarding parameter conversion class

In Spring Batch, the received parameters are processed in the following sequence.

The implementation class of JobParametersConverter converts them to JobParameters.

Refer to the parameters from JobParameters in Bean definition and Java under Spring management.

Regarding implementation class of parameter conversion class

Multiple implementation classes of the above mentioned JobParametersConverter are provided.
The features of each class are shown below.

DefaultJobParametersConverter

It can specify the data type of parameters (4 types: String, Long, Date, Double).

JsrJobParametersConverter

It cannot specify the data type of parameters (Only String).

It automatically assigns an ID (RUN_ID) that identifies the job execution to the parameter named jsr_batch_run_id.

It increments the RUN_ID each time the job is executed. Since it uses a SEQUENCE of the database (named JOB_SEQ) for incrementing, the values do not overlap.

In Spring Batch, running a job with the same parameters is identified as the same job, and the same job can be executed only once.
Whereas, adding a unique value to the parameter named jsr_batch_run_id causes it to be recognized as a separate job.
Refer to Spring Batch architecture for details.

In Spring Batch, when the implementation class of JobParametersConverter to be used is not specified in the Bean definition, DefaultJobParametersConverter is used.
However, TERASOLUNA Batch 5.x does not use DefaultJobParametersConverter due to the following reasons.

It is common to run one job with the same parameters at different times.

It is possible to specify the time stamp of the start time and manage them as different jobs, but it is complicated to specify job parameters only for that purpose.

In TERASOLUNA Batch 5.x, by using JsrJobParametersConverter, the RUN_ID is automatically assigned without the user being aware of it.
By this, the same job as seen by the user is handled as a different job in Spring Batch.

About setting of parameter conversion class

In TERASOLUNA Batch 5.x, it is set in advance so as to use JsrJobParametersConverter in launch-context.xml.
Therefore, when TERASOLUNA Batch 5.x is used with the recommended setting, there is no need to set JobParametersConverter.
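For reference, the setting in launch-context.xml is a Bean definition along the following lines (shown as a sketch; the bean ID and the data source reference are illustrative — refer to the actual launch-context.xml shipped with TERASOLUNA Batch 5.x for the definitive form):

```xml
<!-- Sketch: JsrJobParametersConverter registered in launch-context.xml.
     It needs a DataSource in order to number RUN_ID from the database sequence. -->
<bean id="jobParametersConverter"
      class="org.springframework.batch.core.jsr.JsrJobParametersConverter"
      c:dataSource-ref="adminDataSource" />
```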

Parameters can be referred in Bean definition or in Java as shown below.

Refer in Bean definition

It can be referred by #{jobParameters['xxx']}

Refer in Java

It can be referred by @Value("#{jobParameters['xxx']}")

The scope of the Bean that refers to JobParameters should be Step scope

When referring to JobParameters, the scope of the referring Bean should be set to the Step scope.
This is to use the late binding mechanism of Spring Batch when JobParameters is referred to.

As its name implies, late binding sets a value with a delay.
By default, ApplicationContext of Spring Framework resolves the properties of various Beans and then generates an instance of ApplicationContext.
Spring Batch does not resolve such properties at the time of generating the ApplicationContext instance; it has a function
to resolve them at the point where the various Beans are required. This is what the word "late" means.
With this function, after generating and starting the ApplicationContext required for executing Spring Batch itself,
it is possible to alter the behavior of various Beans according to the parameters.

Step scope is a unique scope of Spring Batch and a new instance is generated for each Step execution.
The value with late binding can be resolved by using SpEL expression in Bean definition.

@StepScope annotation cannot be used for specifying Step scope

In Spring Batch, @StepScope is provided as an annotation that specifies the Step scope. However,
this annotation can only be used in JavaConfig.

Therefore, specify the Step scope in TERASOLUNA Batch 5.x by any one of the following methods.

In Bean definition, assign scope="step" to Bean.

In Java, assign @Scope("step") to class.

Example of referring to the parameter assigned by the command-line arguments in Bean definition

Specify the parameter to be referred to by using the @Value annotation. xyz is set as the default value.
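A sketch of such a Bean definition is shown below (the Bean ID, class and parameter name are illustrative; the SpEL Elvis operator ?: supplies the default value xyz when the parameter is omitted):

```xml
<!-- Sketch: referring to a job parameter in Bean definition.
     scope="step" enables late binding of jobParameters. -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile'] ?: 'xyz'}">
    <!-- other properties (lineMapper etc.) omitted -->
</bean>
```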

4.2.2.5. Validation of parameters

Validation of the parameters is required at job launch in order to prevent operation errors or unintended behavior.
Validation of parameters can be implemented by using the JobParametersValidator provided by Spring Batch.

Since parameters are referred at various places such as ItemReader/ItemProcessor/ItemWriter,
validation is performed immediately after the job is launched.

There are two ways to verify the validity of parameters, differing in the degree of complexity of the verification.

Set the required parameters to property requiredKeys.
Multiple parameter names of the required parameters can be specified using list tag.

(3)

Set jsr_batch_run_id to the required parameters.
In TERASOLUNA Batch 5.x, this setting is mandatory when using DefaultJobParametersValidator.
The reason for making the setting mandatory is explained later.

(4)

Set optional parameters to property optionalKeys.
Multiple parameter names of the optional parameters can be specified using list tag.

(5)

Apply the validator to the job using validator tag in the job tag.
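Putting the notes above together, the Bean definition might look like the following sketch (the parameter names requiredParam and optionalParam are examples; namespace prefixes are as in the job Bean definition files):

```xml
<!-- Sketch: DefaultJobParametersValidator applied to a job -->
<bean id="jobParametersValidator"
      class="org.springframework.batch.core.job.DefaultJobParametersValidator">
    <property name="requiredKeys">
        <list>
            <!-- always set by JsrJobParametersConverter, hence mandatory here -->
            <value>jsr_batch_run_id</value>
            <value>requiredParam</value> <!-- example name -->
        </list>
    </property>
    <property name="optionalKeys">
        <list>
            <value>optionalParam</value> <!-- example name -->
        </list>
    </property>
</bean>

<batch:job id="exampleJob">
    <!-- steps omitted -->
    <batch:validator ref="jobParametersValidator"/>
</batch:job>
```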

Required parameters that cannot be omitted in TERASOLUNA Batch 5.x

JsrJobParametersConverter is used for parameter conversion in TERASOLUNA Batch 5.x, so the following parameter is always set.

With the asynchronous start methods (DB polling and Web container), it is possible to verify the parameters at job launch in the same way;
however, it is desirable to verify them before launching the job, at the following timings.

DB polling

Before INSERTing to job request table

Web container

At the time of calling Controller (assign @Validated)

In case of asynchronous start, since the result must be confirmed separately afterwards, errors such as invalid parameter settings
should be detected quickly and the job request rejected.

For validation at this point, there is no need to use JobParametersValidator.
The function that INSERTs into the job request table and the controller in the Web container
generally should not depend on Spring Batch, and
it is better to avoid introducing a dependency on Spring Batch merely to use JobParametersValidator.

4.2.3. How to extend

4.2.3.1. Using parameters and properties together

Spring Framework, on which Spring Batch is based, is equipped with a property management function that can
handle values set in environment variables and property files.
For details, refer to
Property management of TERASOLUNA Server 5.x Development Guideline.

By combining properties and parameters, it is possible to overwrite some parameters after making common settings for most jobs in the property file.

About when parameters and properties are resolved

As mentioned above, parameters and properties are functions provided by different components:
Spring Batch has the parameter management function and Spring Framework has the property management function.
This difference appears in the description method.

In case of function possessed by Spring Batch

#{jobParameters['xxx']}

In case of function possessed by Spring Framework

@Value("${xxx}")

The timing of resolving each value is different.

In case of function possessed by Spring Batch

It is set when the job is executed after generating Application Context.

In case of function possessed by Spring Framework

It is set at the time of generating Application Context.

Therefore, the parameter value set by Spring Batch takes precedence.
Note that applying them in combination is effective, so both should be handled with this distinction in mind.

How to set by combining properties and parameters, is explained.

In addition to the setting by environment variables, when additional settings are made by command-line arguments

In addition to the setting by environment variables, how to set the parameters using command-line arguments, is explained.
It is possible to refer to it in the same manner as Bean definition.

Example of setting parameters by command-line arguments in addition to environment variables
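A sketch of such a combination is shown below (the class name and the names env.name and name are illustrative; the property placeholder inside the SpEL string falls back to the environment-derived value when the parameter is omitted):

```xml
<!-- Sketch: job parameter overriding an environment-derived property.
     ${env.name} is resolved when the ApplicationContext is generated;
     jobParameters['name'] is resolved later, at job execution,
     so a command-line parameter overrides the environment value. -->
<bean id="tasklet" class="com.example.SampleTasklet" scope="step"
      p:name="#{jobParameters['name'] ?: '${env.name}'}"/>
```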

4.3. Asynchronous execution (DB polling)

4.3.1. Overview

Running a job using DB polling is explained.

The usage method of this function is the same in the chunk model as well as the tasklet model.

4.3.1.1. What is asynchronous execution by using DB polling?

A dedicated table in which jobs to be executed asynchronously are registered (hereafter referred to as the Job-request-table) is monitored periodically, and jobs are executed asynchronously based on the registered information.
In TERASOLUNA Batch 5.x, a module which monitors the table and starts the job is defined with the name asynchronous batch daemon.
Asynchronous batch daemon runs as a single Java process and executes by assigning threads in the process for each job.

A function which asynchronously executes jobs based on the information registered in the Job-request-table is offered.
A table definition of the Job-request-table is also offered.

Usage premise

Only job requests are managed in the Job-request-table. The execution status and results of requested jobs are entrusted to JobRepository.
It is assumed that the job status is managed by using these two.

Further, if an in-memory database is used for JobRepository, the JobRepository is cleared when the asynchronous batch daemon terminates, and the job execution status and results cannot be referred to afterwards.
Hence, it is assumed that a database whose persistence is ensured is used for JobRepository.

Using in-memory database

An in-memory database can be used when the success or failure of job execution can be obtained without referring to JobRepository.
When long-term continuous operation is performed with an in-memory database, a large quantity of memory resources is likely to be consumed, adversely affecting job execution.
In other words, an in-memory database is not suitable for long-term continuous operation and should be restarted periodically.
If it is nevertheless used for long-term continuous operation, maintenance work such as periodically deleting data from JobRepository is necessary.
In case of a restart with initialization enabled, the database is recreated at restart time, so maintenance is not required.
For initialization, refer to Database related settings.

4.3.1.1.2. Usage scene

A few scenes in which asynchronous execution (DB polling) is used are shown.

List of application scenes

Usage scene

Description

Delayed processing

When it is not necessary to complete the operation immediately in coordination with online processing and the operation which takes time to process is to be extracted as a job.

Continuous execution of jobs with short processing time

When jobs, each taking a few seconds or a few tens of seconds, are processed continuously.
By using asynchronous execution (DB polling), the resource pressure caused by starting and stopping a Java process for each job can be avoided.
Further, since the start and end processing per job is omitted, the execution time of each job can be reduced.

On the other hand, since access concentrates on the database, it does not scale as easily as asynchronous execution (Web container).

Reasons not to use Spring Batch Integration

The same function can be implemented by using Spring Batch Integration.
However, when Spring Batch Integration is used, it is necessary to understand and absorb technical elements beyond those of asynchronous execution.
Accordingly, application of Spring Batch Integration is deferred in order to avoid difficulty in understanding, using and customizing this function.

Precautions in asynchronous execution (DB polling)

When a large number of very short batches, each taking less than a few seconds, are executed, the database including JobRepository is accessed for every job.
Since performance degradation can occur, mass processing of very short batches is not suitable for asynchronous execution (DB polling).
While using this function, this point must be adequately reviewed to check whether the target performance is met.

4.3.2. Architecture

4.3.2.1. Processing sequence of DB polling

Processing sequence of DB polling is explained.

Processing sequence diagram of DB polling

Launch AsyncBatchDaemon from sh etc.

AsyncBatchDaemon reads all Bean definition files which define the jobs at startup.

JobRequestPollTask fetches a record for which the polling status is "not executed" (INIT), from Job-request-table.

A fixed number of records are fetched collectively. The default is 3 records.

When no target record exists, polling is repeated at regular intervals. The default interval is 5 seconds.

JobRequestPollTask allocates jobs to threads and executes them based on the information of the records.

JobRequestPollTask updates polling status of the Job-request-table to "polled" (POLLED).

When the maximum number of concurrent executions has been reached, records that could not be started among the fetched records are discarded, and the records are fetched again at the next polling.

The job assigned to the thread is run with JobOperator.

The job execution ID (job execution id) of the executed job is fetched.

JobRequestPollTask updates the polling status of the Job-request-table to "Executed" (EXECUTED) based on job execution ID fetched at the time of job execution.

Supplement of processing sequence

Spring Batch reference shows that asynchronous execution can be implemented by setting AsyncTaskExecutor in JobLauncher.
However, when this method is adopted, the state wherein a job cannot be executed because no thread is assigned to it cannot be detected,
which is likely to lead to the following events.

Even though the job cannot be executed, the attempt to run it continues, performing unnecessary processing.

Jobs do not run in the polling order; they appear to start in random order from the Job-request-table, depending on when a thread becomes free.

The processing sequence described earlier is used in order to avoid this phenomenon.

4.3.2.2. About the table to be polled

Explanation is given about table which performs polling in asynchronous execution (DB polling).

Following database objects are necessary.

Job-request-table (Required)

Job sequence (Required for some database products)

It is necessary when database does not support auto-numbering of columns.

4.3.2.2.1. Job-request-table structure

The structure for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown.
For other databases, refer to the DDL included in the jar of TERASOLUNA Batch 5.x.

Regarding character string stored in the job request table

Similar to meta data table, job request table column provides a DDL which explicitly sets character data type in character count definition.

batch_job_request (In case of PostgreSQL)

Column Name

Data type

Constraint

Description

job_seq_id

bigserial

(Use bigint to define a separate sequence)

NOT NULL
PRIMARY KEY

A number to determine the sequence of jobs to be executed at the time of polling.
Use auto-numbering function of database.

job_name

varchar(100)

NOT NULL

Job name to be executed.
A required parameter for job execution.

job_parameter

varchar(200)

-

Parameters to be passed to jobs to be executed.

The format of a single parameter is the same as in synchronous execution; however, when multiple parameters are specified,
each parameter must be separated by a comma, unlike the blank delimiter of synchronous execution.
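To make the format concrete, the following self-contained Java sketch (illustrative, not TERASOLUNA code) parses the comma-delimited job_parameter string described above into name/value pairs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JobParameterFormat {
    // Splits an asynchronous job_parameter string such as
    // "param1=value1,param2=value2" (comma-delimited, unlike the
    // blank-delimited synchronous form) into name/value pairs.
    public static Map<String, String> parse(String jobParameter) {
        Map<String, String> params = new LinkedHashMap<>();
        if (jobParameter == null || jobParameter.isEmpty()) {
            return params;
        }
        for (String pair : jobParameter.split(",")) {
            String[] nv = pair.split("=", 2);
            params.put(nv[0].trim(), nv.length > 1 ? nv[1].trim() : "");
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("param1=value1,param2=value2");
        System.out.println(p.get("param1")); // value1
        System.out.println(p.get("param2")); // value2
    }
}
```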

4.3.2.2.2. Job request sequence structure

When the database does not support auto-numbering of columns, numbering by a sequence is required.
The sequence for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown.
For other databases, refer to the DDL included in the jar of TERASOLUNA Batch 5.x.

For databases supporting auto-numbering of columns, a job request sequence is not defined in the DDL included in the jar of TERASOLUNA Batch 5.x.
When you want to change the maximum value of the sequence, it is preferable to define the job request sequence in addition to changing the data type
of job_seq_id from the auto-numbering definition to a numeric data type
(in case of PostgreSQL, from bigserial to bigint).

4.3.2.2.3. Transition pattern of polling status (polling_status)

Transition pattern of polling status is shown in the table below.

Transition pattern list of polling status

Transition source

Transition destination

Description

INIT

INIT

When the maximum number of concurrent executions has been reached and starting of the job is denied, the status remains unchanged.
The record becomes a polling target again at the next polling.

INIT

POLLED

The transition occurs when the job is started successfully.
This is the status while the job is running.

POLLED

EXECUTED

Transition occurs when job execution is completed.
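The transitions above can be modeled as a small state machine; the following is an illustrative Java sketch (not TERASOLUNA code) encoding exactly the allowed transitions from the table:

```java
// Illustrative model of the polling_status transitions of the Job-request-table.
public enum PollingStatus {
    INIT, POLLED, EXECUTED;

    // Returns true when this status may transition to 'next'.
    public boolean canTransitionTo(PollingStatus next) {
        switch (this) {
            case INIT:   // stays INIT when start is denied, or moves to POLLED on start
                return next == INIT || next == POLLED;
            case POLLED: // moves to EXECUTED when job execution completes
                return next == EXECUTED;
            default:     // EXECUTED is terminal
                return false;
        }
    }
}
```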

4.3.2.2.4. Job request fetch SQL

The number of records fetched by the job request fetch SQL is limited, in order to fetch only as many job requests as can be executed concurrently.
The job request fetch SQL varies depending on the database product and version to be used.
Hence, the SQL provided by TERASOLUNA Batch 5.x may not be able to handle every case.
In that case, the SQLMap of BatchJobRequestMapper.xml should be redefined
using Customising Job-request-table as a reference.
For SQL offered, refer BatchJobRequestMapper.xml included in jar of TERASOLUNA Batch 5.x.

4.3.2.3. About job running

Running method of job is explained.

In the Job-request-table polling function of TERASOLUNA Batch 5.x, a job is run by the start method of
JobOperator offered by Spring Batch.

In TERASOLUNA Batch 5.x, the guideline explains restarting jobs started by asynchronous execution (DB polling) from the command line.
Hence, although JobOperator also has startup methods such as restart besides start,
only the start method is used.

Arguments of start method

jobName

Set the value registered in job_name of Job-request-table.

jobParameters

Set the value registered in job_parameters of Job-request-table.

4.3.2.4. When abnormality is detected in DB polling process.

Explanation is given for when an abnormality is detected in DB polling process.

4.3.2.4.1. Database connection failure

The behavior of the processing performed at the time of failure occurrence is described below.

When records of Job-request-table are fetched

JobRequestPollTask results in an error, but JobRequestPollTask is executed again at the next polling.

In the polling performed after recovery from the connection failure, the record remains a target for execution since the Job-request-table is unchanged, and the job is executed at that polling.

While changing polling status from POLLED to EXECUTED

JobRequestPollTask terminates with an error since the job execution ID cannot be updated in the Job-request-table. The polling status remains POLLED.

The record is out of scope for the polling performed after recovery from the connection failure, and the job at the time of failure is not executed again.

Since the job execution ID cannot be identified from the Job-request-table, determine the final status of the job from the log or JobRepository, and re-execute the job as recovery processing when required.

Even if an exception occurs in JobRequestPollTask, an immediate retry is not performed. The reasons are given below.

Since JobRequestPollTask is started at regular intervals, automatic recovery (though not immediate) is achieved by delegating to the next run of JobRequestPollTask.

It is very rare for an immediate retry at the time of failure to succeed; moreover, retry attempts are likely to generate additional load.

4.3.2.4.2. Abnormal termination of asynchronous batch daemon process

When the process of the asynchronous batch daemon terminates abnormally, the transaction of the job being executed is rolled back implicitly.
The state of the polling status is the same as at the time of a database connection failure.

4.3.2.5. Stopping DB polling process

The asynchronous batch daemon (AsyncBatchDaemon) stops when a file is generated.
After confirming that the file has been generated, it makes the polling process idle, waits as long as possible for the jobs already started, and then stops the process.

The Beans necessary for asynchronous execution, such as JobRequestPollTask, are defined.

Job registration settings

Jobs executed asynchronously are registered by org.springframework.batch.core.configuration.support.AutomaticJobRegistrar.
The context for each job is modularized by using AutomaticJobRegistrar.
With modularization, no issue arises even if a Bean ID is duplicated between jobs.

What is modularization

Modularization creates a hierarchical structure of "common definition - definition of each job", and the Beans defined in each job belong to a context independent between jobs.
If a reference to a Bean not defined in the job definition exists, the Bean defined in the common definition is referred to.

4.3.2.6.2. Bean definition structure

Bean definition of a job can be same as Bean definition of synchronous execution. However, following precautions must be taken.

When jobs are registered by AutomaticJobRegistrar, the Bean ID of the job is the identifier, and hence must not be duplicated.

It is also desirable not to duplicate the Bean IDs of steps.

By using a naming rule for step Bean IDs such as {Job ID}.{Step ID}, uniqueness need only be designed for the job ID.

Import of job-base-context.xml in the Bean definition of job varies for synchronous and asynchronous execution.

In synchronous execution, launch-context.xml is imported from job-base-context.xml.

In asynchronous execution, launch-context.xml is not imported from job-base-context.xml.
Instead, launch-context.xml is imported from async-batch-daemon.xml, which AsyncBatchDaemon loads.

This is because the various Beans required for starting Spring Batch need not be instantiated for each job.
Of the various Beans required for starting Spring Batch, only one instance each should be created, in the common definition (async-batch-daemon.xml) which acts as the parent of each job.

4.3.3. How to use

4.3.3.1. Various settings

4.3.3.1.1. Settings for polling process

Use batch-application.properties for settings required for asynchronous execution.

Connection settings for the database where the Job-request-table is stored. The JobRepository settings are used by default.

(2)

A path for the DDL which defines the Job-request-table.
The table is auto-generated when it does not exist at the start of the asynchronous batch daemon.
This is primarily a test function, and its execution can be switched by data-source.initialize.enabled of
batch-application.properties.
For the detailed definition, refer to <jdbc:initialize-database> in async-batch-daemon.xml.

(3)

Setting for the number of records fetched collectively at polling. This value is also used as the number of concurrent executions.

(4)

Polling cycle settings. Unit is milliseconds.

(5)

Polling initial start delay time settings. Unit is milliseconds.

(6)

Exit file path settings.

Changing setup value using environment variable

A setup value in batch-application.properties can be changed by defining an environment variable with the same name.
When an environment variable is set, it takes priority over the property value.
This happens due to the Bean definition below.
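The Bean definition in question is along the following lines (a sketch; refer to launch-context.xml of TERASOLUNA Batch 5.x for the actual definition). Because system-properties-mode is set to OVERRIDE, values from the runtime environment take precedence over those in the property file:

```xml
<!-- Sketch: property placeholder configuration letting environment values
     override batch-application.properties entries of the same name -->
<context:property-placeholder location="classpath:batch-application.properties"
                              system-properties-mode="OVERRIDE"
                              ignore-resource-not-found="false"
                              ignore-unresolvable="true"
                              order="1"/>
```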

For registering jobs, specify jobs which are designed and implemented on the premise that they are executed asynchronously.
If jobs which are not supposed to be executed asynchronously are included, exceptions may occur at job registration due to unintended references.

Example of Narrowing down

<bean id="automaticJobRegistrar"
      class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
    <property name="applicationContextFactories">
        <bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
            <property name="resources">
                <list>
                    <!-- For the async directory and below -->
                    <value>classpath:/META-INF/jobs/async/**/*.xml</value>
                    <!-- For a specific job -->
                    <value>classpath:/META-INF/jobs/CASE100/SpecialJob.xml</value>
                </list>
            </property>
        </bean>
    </property>
    <property name="jobLoader">
        <bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
              p:jobRegistry-ref="jobRegistry"/>
    </property>
</bean>

Input value verification for job parameters

JobRequestPollTask does not validate the records obtained from the Job-request-table.
Hence, the job name and job parameters must be verified before registration in the table.
If the job name is incorrect, the job is not found at startup and an exception occurs.
If the job parameters are incorrect, the job starts but operates erroneously.
Job parameters can be verified only after the job has started. For the verification of job parameters, refer to
"Validation of parameters".

Job design considerations

As a characteristic of asynchronous execution (DB polling), the same job can be executed in parallel. It is necessary to design jobs so that no problem occurs even when the same job runs in parallel.

4.3.3.2. From start to end of asynchronous execution

Start and end of asynchronous batch daemon and how to register in Job-request-table are explained.

4.3.3.2.1. Start of asynchronous batch daemon

In this case, META-INF/spring/async-batch-daemon.xml is read and various Beans are generated.

Further, when a separately customised async-batch-daemon.xml is to be used, start AsyncBatchDaemon specifying it as the first argument.
The Bean definition file specified in the argument must be given as a relative path from the class path.
Note that the second and subsequent arguments are ignored.

When customised META-INF/spring/customized-async-batch-daemon.xml is used,
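A sketch of the start command in that case (the class path is an example and the daemon class name is as provided in the TERASOLUNA Batch 5.x jar; adjust paths to your environment):

```shell
# Start the asynchronous batch daemon with a customised Bean definition file
# (specified as a relative path from the class path; later arguments are ignored)
java -cp 'dependency/*' \
    org.terasoluna.batch.async.db.AsyncBatchDaemon \
    META-INF/spring/customized-async-batch-daemon.xml
```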

For small customisations, async-batch-daemon.xml can be modified directly by changing some of the settings.
However, when significant changes are made, or when multiple settings are managed as in Multiple runnings described later,
creating separate files is easier to manage.
Choose according to the situation of the user.

It is assumed that the jar files necessary for execution are stored under dependency.

4.3.3.2.3. Stopping asynchronous batch daemon

When the exit file already exists before the asynchronous batch daemon is started, the daemon terminates immediately.
The asynchronous batch daemon must be started in a state where the exit file does not exist.

4.3.3.3. Confirm job status

Job status management is performed with JobRepository offered by Spring Batch; the job status is not managed in the Job-request-table.
The Job-request-table has a column job_execution_id, and the job status corresponding to an individual request can be confirmed by the value stored in this column.
Here, a simple example wherein SQL is issued directly and job status is confirmed is shown.
For details of job status confirmation, refer "Status confirmation".
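As a sketch, the confirmation could be done with SQL such as the following (assuming the job_execution_id column of the job request record and the Spring Batch metadata table batch_job_execution; the key values are examples):

```sql
-- Find the job execution ID recorded for a given job request
SELECT job_execution_id FROM batch_job_request WHERE job_seq_id = 1;

-- Confirm the status of that job execution in JobRepository
SELECT job_execution_id, start_time, end_time, exit_code
FROM batch_job_execution WHERE job_execution_id = 2;
```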

4.3.3.4. Recovery after a job is terminated abnormally

For basic points related to recovery of a job which is terminated abnormally, refer "Re-execution of process". Here, points specific to asynchronous execution are explained.

4.3.3.4.1. Re-run

A job which terminated abnormally is re-run by inserting a separate record into the Job-request-table.
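A re-run request can be registered with an INSERT such as the following sketch (assuming the polling_status and create_date columns of the full Job-request-table DDL; the job name and parameters are examples):

```sql
-- Register the failed job again as a new request; it is picked up
-- at the next polling since polling_status is 'INIT'
INSERT INTO batch_job_request (job_name, job_parameter, polling_status, create_date)
VALUES ('job01', 'param1=value1,param2=value2', 'INIT', CURRENT_TIMESTAMP);
```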

4.3.3.4.2. Restart

When a job which has terminated abnormally is to be restarted, it is executed as a synchronous job from the command line.
The reason for executing it from the command line is that it is difficult to determine whether a restart is intended or whether it is an unintended duplicate execution, which would result in chaotic operation.
For restart methods, refer "Job restart".

4.3.3.4.3. Termination

When the process has not terminated even after the expected processing time has elapsed, attempt to terminate it from the command line.
For methods of termination, refer "Job stop".

If the asynchronous batch daemon itself cannot be terminated, the asynchronous batch daemon process should be forcibly terminated.

Adequate care should be taken not to impact other jobs when an asynchronous batch daemon is being terminated.

4.3.3.5. About environment deployment

Building and deploying a job is the same as for synchronous execution. However, it is important to narrow down the jobs which are executed asynchronously, as shown in Job settings.

4.3.3.6. Evacuation of cumulative data

If an asynchronous batch daemon runs for a long time, a huge amount of data accumulates in the JobRepository and the Job-request-table.
This cumulative data must be evacuated for the following reasons.

Performance degradation when data is retrieved or updated for a large quantity of data

Duplication of IDs due to wrap-around of the ID numbering sequence

For evacuation of table data and resetting of sequences, refer to the manual of the database to be used.

List of tables and sequences for evacuation is shown below.

List for evacuation

Framework offered: TERASOLUNA Batch 5.x

batch_job_request
batch_job_request_seq

Framework offered: Spring Batch

batch_job_instance
batch_job_execution
batch_job_execution_params
batch_job_execution_context
batch_step_execution
batch_step_execution_context
batch_job_seq
batch_job_execution_seq
batch_step_execution_seq

Auto-numbering column sequence

Since a sequence is created automatically for an auto-numbering column, remember to include this sequence when evacuating data.

About database specific specifications

Note that Oracle uses database-specific data types in some cases, such as using CLOB for data types.

4.3.4. How to extend

4.3.4.1. Customising Job-request-table

The Job-request-table can be customised by adding columns in order to change the extraction conditions of fetched records.
However, only BatchJobRequest can be passed as an item when issuing SQL from JobRequestPollTask.

Extension procedure by customising the Job-request-table is shown below.

Since JobRequestPollTask performs exclusive control using optimistic locking, only the asynchronous batch daemon which succeeds in updating the polling status from INIT to POLLED executes the job of the fetched record.
The daemons excluded by this control fetch the next job request record.
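The optimistic-lock update takes roughly the following form; the column names assume the default Job-request-table layout. Only the daemon whose UPDATE actually changes the row (update count 1) runs the job.

```sql
-- Claim a request record: succeeds for exactly one daemon because the
-- WHERE clause requires the status to still be INIT.
UPDATE batch_job_request
SET polling_status = 'POLLED', update_date = CURRENT_TIMESTAMP
WHERE job_seq_id = ? AND polling_status = 'INIT';
```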

In Job1, ItemReader which reads from the database is defined by a Bean ID - reader.

(2)

In Job1, ItemWriter which writes in a file is defined by a Bean ID - writer.

(3)

In Job2, ItemReader which reads from the file is defined by a Bean ID - reader.

(4)

In Job2, ItemWriter which writes to a database is defined by a Bean ID - writer.

(5)

AutomaticJobRegistrar is set so as to read job definitions other than target jobs.

(6)

Use import of Spring and enable reading of target job definition.

In this case, if Job1.xml and Job2.xml are read in this order, the reader and writer defined by Job1.xml will be overwritten by the Job2.xml definitions.
As a result, when Job1 is executed, the reader and writer of Job2 are used and the intended processing cannot be performed.

4.4. Asynchronous execution (Web container)

4.4.1. Overview

A method to execute the job asynchronously in Web container is explained.

The usage method of this function is same in the chunk model as well as tasklet model.

What is Asynchronous execution of jobs by Web container

Web application that contains a job is deployed in a Web container
and the job is executed based on information of sent request.
Since one thread is allocated for each job execution and operation is run in parallel,
it can be executed independent of processes for other jobs and requests.

Function offered

TERASOLUNA Batch 5.x does not offer an implementation for asynchronous execution (Web container);
only the method of implementation is shown in this guideline.
This is because the triggers for starting a Web application vary (HTTP / SOAP / MQ, and so on),
and hence it was determined that the implementation should be done appropriately by the user.

Usage premise

A Web container is required besides the application.

Besides implementation of job, required Web application and client are separately implemented according to the operation requirements.

Execution status and results of the job is entrusted to JobRepository.
Further, a permanently residing database is used instead of in-memory database to enable execution status and results
of job to be referred from JobRepository even after stopping Web container.

On the architecture front, the differences are the immediacy of asynchronous execution and the presence or absence of a request management table. "Asynchronous execution (DB polling)" performs asynchronous execution of multiple jobs registered in the request management table.
On the other hand, this function does not require request management table and accepts asynchronous execution on the Web container instead.
It is suitable for short batches which require immediacy up to the start of processing, since processing is executed immediately upon sending a Web request.

4.4.2. Architecture

Asynchronous jobs by using this method are operated as applications (war) deployed on the Web container, however, the job itself runs asynchronously (another thread) from the request processing of Web container.

Process sequence diagram of asynchronous execution (Web container)

Running a job

Web client requests Web container to execute the job.

JobController asks JobOperator of Spring Batch to start the execution of the job.

JobController returns a response including job execution ID for the Web client.

Execute target job.

Job results are reflected in JobRepository.

Job returns execution results. It cannot be notified directly to the client.

Confirm job execution results

The Web client sends the job execution ID to the JobController on the Web container.

JobController asks JobExplorer for execution results of job by using a job execution ID.

JobExplorer returns job execution results.

JobController returns a response for Web client.

Set Job execution ID in the response.

After the Web container receives a request, processing is synchronous with the request until the job execution ID is returned; the subsequent job execution is performed asynchronously in a thread pool
different from that of the Web container.
This means that the Web client cannot detect the execution status of the asynchronous job unless it sends another request to query it.

Hence, on the Web client side, a request is sent once at the time of "running a job" for one job execution.
When "confirmation of results" is necessary, a request must be sent once again to the Web container.
How abnormality detection differs from the first "running a job" is explained later in
About detection of abnormality occurrence at the time of running a job.

The job execution status can be checked by referring to the RDBMS directly, or by using JobRepository and JobExplorer.
For details of the function which refer to job execution status and results, refer Job management.

About handling job execution ID (job execution id)

A different job execution ID (a new sequence value) is generated for each execution, even when the job and job parameters are identical.
Job execution ID accepted by sending a request is persisted in external RDBMS by JobRepository.
However, if this ID is lost due to a failure on the Web client side, specifying or tracking the job execution status becomes difficult.
Hence, adequate preparations must be made on the Web client side to cope with loss of the job execution ID, such as logging the job execution ID returned in the response.

4.4.2.1. About detection of abnormality occurrence at the time of running a job

After sending a job run request from the Web client, the way an abnormality is detected differs depending on whether it occurs before or after the job execution ID is returned.

Abnormality can be detected immediately by the response at the time of running a job

Job to be activated does not exist.

Invalid job parameter format.

After running a job, queries regarding job execution status and results for Web container are necessary

Job execution status

Job start failure due to depletion of thread pool used in asynchronous job execution

"Job running error" can be detected as an exception occurring in Spring MVC controller.
Since the explanation is omitted here, refer
Implementation of exception handling of TERASOLUNA Server 5.x Development Guideline described separately.

Further, input check of the request used as a job parameter is performed in the Spring MVC controller as required.
For basic implementation methods, refer
Input check of TERASOLUNA Server 5.x Development Guideline.

Job start failure occurring due to depletion of thread pool cannot be captured at the time of running a job.

A job start failure due to thread pool exhaustion is not raised from JobOperator, hence it must be checked separately.
One method of confirmation is to use JobExplorer when checking the execution status of the job and check whether the following conditions are satisfied.

Status is FAILED

Exception stack trace of org.springframework.core.task.TaskRejectedException is recorded in
jobExecution.getExitStatus().getExitDescription().
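The two conditions above can be sketched as a small self-contained helper. The class and method names are illustrative (not part of TERASOLUNA Batch); the arguments correspond to jobExecution.getStatus().toString() and jobExecution.getExitStatus().getExitDescription() fetched via JobExplorer.

```java
// Illustrative check for a job start failure caused by thread pool
// exhaustion: status is FAILED and the exit description records a
// TaskRejectedException stack trace.
public class ThreadPoolRejectionCheck {

    public static boolean isThreadPoolRejection(String status, String exitDescription) {
        return "FAILED".equals(status)
                && exitDescription != null
                && exitDescription.contains(
                        "org.springframework.core.task.TaskRejectedException");
    }
}
```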

4.4.2.2.1. ApplicationContext configuration

As described above, multiple application modules are included as application configuration of asynchronous execution (Web container).
It is necessary to understand respective application contexts, types of Bean definitions and their relationships.

ApplicationContext configuration

Bean definition file configuration

The ApplicationContext of the batch application is incorporated into the ApplicationContext used during asynchronous execution (Web container).
Individual job contexts are modularised from the Web context by AutomaticJobRegistrar and
act as sub-contexts of the Web context.

Common Bean definition file.
It acts as a parent context in the application and is uniquely shared among jobs acting as sub-contexts.

(2)

Bean definition file which is always imported from job Bean definitions.
When the Spring profile async is specified at the time of asynchronous execution, launch-context.xml of (1) is not read.

(3)

Bean definition file created for each job.
It is modularized by AutomaticJobRegistrar and used as respective independent sub-contexts in the application.

(4)

It is read from DispatcherServlet.
Define the Beans unique to asynchronous execution such as AutomaticJobRegistrar which performs modularization of job Bean definition and taskExecutor which is a thread pool used in asynchronous and parallel execution of jobs.
Further, in asynchronous execution, launch-context.xml of (1) is imported directly
and uniquely shared as parent contexts.

(5)

It acts as a parent context shared within the Web application by using ContextLoaderListener.

4.4.3. How to use

Here, explanation is given using TERASOLUNA Server Framework for Java (5.x) as an implementation example of the Web application.
Kindly remember that this is only an example; TERASOLUNA Server 5.x is not a mandatory requirement for asynchronous execution (Web container).

Implementation of asynchronous execution is performed in accordance with Architecture wherein Spring
MVC controller in the Web application starts the job by using JobOperator.

About isolation of Web/batch application project

The final deliverable of the application build is the war file of the Web application; however,
the development project should be separated into Web and batch applications.
Since the batch application is then a library which can also operate alone, this helps to clarify the work boundary
and library dependencies, and makes the development projects easier to test.

Web/batch development is explained now assuming the use of 2 components below.

Here, we will focus on starting a batch application from a Web application.

Here, explanation is given by creating a batch application project,
by using Maven archetype:generate.

How to create a job project

groupId: org.terasoluna.batch.sample
artifactId: asyncbatch
version: 1.0-SNAPSHOT
package: org.terasoluna.batch.sample

A job registered from the beginning for a blank project is used for convenience of explanation.

Job used for explanation

Job name: job01
Job parameter: param1=value1

Precautions for asynchronous execution (Web container) job design

As a characteristic of asynchronous execution (Web container), individual jobs should complete in a short period of time
and operate in a stateless manner on the Web container.
Further, to avoid complexity, a job definition should consist of only a single step; it is desirable not
to define flow branching using step exit codes, or parallel/multiple processing.

Create the Web application assuming a state in which a jar file including the job implementation can be created.

Here, similar to the asynchronous execution application project, the explanation below assumes creation with the following names.

How to create a Web container project

groupId: org.terasoluna.batch.sample
artifactId: asyncapp
version: 1.0-SNAPSHOT
package: org.terasoluna.batch.sample

About naming of groupId

Although project naming is optional, when the batch application is treated as a sub-module of a Maven multi-project,
it is easier to manage if the groupId is unified.
Here, the groupId of both is org.terasoluna.batch.sample.

4.4.3.2. Various settings

Include batch application as a part of Web application

Edit pom.xml and include batch application as a part of Web application.

The batch application is registered in Nexus or the Maven local repository as a jar.
This process is not required when the batch application is not a separate project from the Web application.
However, since the Maven build target is a separate project, modifications to the batch application will not be reflected when building the Web application.
The jar must be registered in the repository again in order to reflect modifications of the batch application in the Web application.

4.4.3.3.1. Web application settings

At first, add, delete and edit various configuration files from the blank project of Web application.

For the explanation, an implementation which uses RESTful Web Services to invoke the batch application is shown.
The procedure is the same even when a conventional Web application (Servlet/JSP) or SOAP is used; read accordingly.

Import launch-context.xml which is in the batch application and incorporate required Bean definition.

(2)

Describe package for dynamically scanning the controller.

(3)

Describe a Bean definition of AutomaticJobRegistrar which dynamically loads as a child or sub context by modularizing each Bean definition file.

(4)

Define the TaskExecutor which executes jobs asynchronously.
Asynchronous execution is achieved by setting an AsyncTaskExecutor implementation class in the taskExecutor of the JobLauncher.
Use ThreadPoolTaskExecutor, which is one of the AsyncTaskExecutor implementation classes.

Further, the multiplicity of threads which can operate in parallel can be specified.
In this example, 3 threads are assigned to job execution, and requests exceeding this number are queued up to 10.
A queued job is in a "not started" state, but the REST request is considered successful.
In addition, job requests exceeding the queue capacity raise org.springframework.core.task.TaskRejectedException
and the job run request is rejected.
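An illustrative Bean definition matching these settings is shown below; the bean id is an assumption, and the values correspond to the 3 threads / queue capacity 10 described above.

```xml
<!-- Thread pool for asynchronous job execution: 3 worker threads,
     up to 10 queued job requests before TaskRejectedException occurs. -->
<bean id="taskExecutor"
      class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="3"/>
    <property name="maxPoolSize" value="3"/>
    <property name="queueCapacity" value="10"/>
</bean>
```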

(5)

Override jobLauncher defined in launch-context.xml to enable taskExecutor of (4).

(6)

Specify spring-mvc-rest.xml described above as a Bean definition
read by DispatcherServlet.

(7)

Specify async, which indicates an asynchronous batch, as the Spring Framework profile.

When async profile is not specified

In this case, Beans defined in launch-context.xml which should be shared across the Web application are duplicated for each job.
Even with this duplication the application works at the functional level, so the error is difficult to notice, and it may result in unexpected resource exhaustion and performance degradation.
The profile must always be specified.

Thread pool sizing

When the upper limit of the thread pool is too large, an enormous number of jobs run in parallel,
degrading the performance of the whole application.
Sizing should be done and an appropriate upper limit must be determined.
Besides the thread pool for asynchronous execution, the request threads of the Web container and
other applications working on the same machine must also be considered.

Further, a separate request must be sent from the Web client to check for occurrence of TaskRejectedException due to
thread pool exhaustion and to re-execute the job.
Hence, set a queue-capacity so that job start requests wait when the thread pool is exhausted.

Implementation of RESTful Web Service API

Here, "Running a job" and "Job status check" are defined as 2 examples of requests used in REST API.

4.4.3.3.3. Implementation of controller

A controller for RESTful Web Services is implemented by using @RestController.
In order to simplify the example, JobOperator is injected into the controller, and running a job and fetching the execution status are performed there.
Of course, JobOperator can also be invoked from a Service called by the controller, in accordance with TERASOLUNA Server 5.x.

About job parameters that are passed at the time of running a job

The job parameter passed in the second argument of JobOperator#start() at the time of running a job is a String.
When there are multiple job parameters, they should be separated by commas, unlike CommandLineJobRunner used in synchronous execution.
The basic format is as below.

{Job parameter 1}={Value 1},{Job parameter 2}={Value 2},…​
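The format above can be sketched as a small utility that builds the comma-separated string from a parameter map; the class name is illustrative and not part of TERASOLUNA Batch.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative builder for the job parameter string passed as the second
// argument of JobOperator#start(): {name1}={value1},{name2}={value2},...
public class JobParameterBuilder {

    public static String build(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }
}
```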

Specify @RestController.
Further, combined with the servlet mapping in web.xml, @RequestMapping("job") makes
the base path of the REST API contextName/api/v1/job/.

(2)

Describe field injections of JobOperator and JobExplorer.

(3)

Use JobOperator and start a new asynchronous job.
Receive job execution ID as a return value and return to REST client.

(4)

Use JobExplorer and fetch job execution status (JobExecution) based on job execution ID.
Return it to REST client after converting it to a pre-designed message.

4.4.3.3.4. Integration of Web/batch application module setting

The batch application module (asyncbatch) operates as a stand-alone application.
Hence, it contains settings which conflict or overlap with the settings of the Web application module (asyncapp-web).
These settings must be integrated as required.

Integration of the log configuration file logback.xml
When Logback definition files are defined in both the Web and batch modules, they do not work appropriately.
The contents of asyncbatch/src/main/resources/logback.xml are integrated into the same file in asyncapp-env/src/main/resources/, and the former file is then deleted.

The data source and MyBatis configuration files are not integrated
between the Web and batch modules, since their application context definitions are independent, as follows.

asyncbatch module of the batch is defined in the servlet as a closed context.

asyncapp-domain and asyncapp-env modules of Web are defined as contexts used by entire application.

Cross-reference of data source and MyBatis settings by Web and batch modules

Since the scope of the contexts for the Web and batch modules is different,
the data source, MyBatis settings and Mapper interfaces cannot be referred to across modules, particularly from the Web module.
Since initialization of the RDBMS schema is also carried out independently based on the different settings of the respective
modules, adequate care must be taken not to perform unintended initialization due to mutual interference.

CSRF countermeasures specific to REST controller

When a request is sent to the REST controller with the initial settings of a Web blank project, it results in a CSRF
error and execution of the job is rejected.
Hence, the explanation here assumes that CSRF countermeasures are disabled by the following method.

Web application created here is not published on the internet and CSRF countermeasures are disabled on the premise that
REST request is not sent from a third party who can exploit CSRF as a means of attack.
Please note that necessity may differ in the actual Web application depending on the operating environment.

Since exitCode=COMPLETED, it can be confirmed that the job is completed successfully.

When execution results of curl are to be determined by a shell script etc

In the example above, the response message of the REST API is displayed.
When only the HTTP status is to be confirmed with the curl command, the HTTP status can be displayed on standard output with curl -s URL -o /dev/null -w "%{http_code}\n".
However, since obtaining the job execution ID requires parsing the JSON response body, a REST
client application must be created as required.

4.4.4. How to extend

4.4.4.1. Stopping and restarting jobs

It may be necessary to stop or restart an asynchronous job among the multiple jobs being executed.
Further, when jobs with identical names are running in parallel, only the problematic job executions must be targeted.
Hence, the target job execution must be identified and its status confirmed.
On this premise, an implementation for stopping and restarting asynchronous executions is explained here.

Re-execute from the step where the job has terminated abnormally or stopped by calling JobOperator#restart().

4.4.4.2. Multiple running

Multiple running means that a Web container is started multiple times and each waits for job requests.

Execution of asynchronous jobs is controlled through the external RDBMS to which each application connects.
By sharing the external RDBMS, jobs can wait to be started across Web containers on the same machine or on different machines.

Applications include load balancing and redundancy for specific jobs.
However, as described in Implementation of Web application,
these effects cannot be obtained easily just by starting multiple Web containers or enhancing parallel operations.
Sometimes measures similar to a general Web application need to be taken in order to obtain the effect.
An example is given below.

According to the characteristics of a Web application, a single request is processed in a stateless manner; however,
asynchronous batch execution is likely to have reduced failure tolerance unless the job start and result confirmation are designed in combination.
For example, even when the Web container which starts the job is made redundant, it is difficult to confirm the progress
and results of a job if the job execution ID is lost after starting the job due to a failure on the client side.

To distribute load over multiple Web containers, a function to distribute request destinations must be implemented
on the client side, or a load balancer must be introduced.

In this way, the adequacy of multiple running cannot always be determined simply.
Hence, using a load balancer, or reviewing the way the Web client sends requests, should be considered based on the purpose and usage.
A design which does not degrade the performance and fault tolerance of the asynchronous execution application is required.

4.5. Listener

4.5.1. Overview

A listener is an interface for inserting processing before and after executing a job or a step.

Since this function works differently in the chunk model and the tasklet model, respective explanations are given.

A listener consists of multiple interfaces, respective roles are explained here.
Subsequently, how to set and implement a listener is explained.

4.5.1.1. Types of listener

A lot of listener interfaces are defined in Spring Batch.
Not all of them are explained here; we focus on the interfaces with the highest usage frequency.

A listener is roughly divided into 2 types.

JobListener

An interface to insert the processing for execution of the job

StepListener

An interface to insert the processing for execution of the step

About JobListener

An interface called JobListener does not exist in Spring Batch.
The name is used in this guideline for convenience, for comparison with StepListener.
Java Batch (jBatch) has an interface called javax.batch.api.listener.JobListener, so care should be taken at implementation time to avoid confusing the two.
Similarly, an interface named StepListener with the same name but a different signature (javax.batch.api.listener.StepListener) also exists, so adequate precautions are necessary.

4.5.1.1.1. JobListener

The JobListener category consists of only one interface, JobExecutionListener.

JobExecutionListener

Process is inserted prior to starting a job and after terminating a job.

It is used when the exception which occurred is to be fetched in the afterChunkError method.
If an error occurs during chunk processing, Spring Batch stores the exception in the ChunkContext under the sb_rollback_exception key
before calling ChunkListener, so it can be accessed as below.
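A minimal sketch of such a listener is shown below. It requires Spring Batch on the classpath; the class name and the logging are illustrative. ChunkListener.ROLLBACK_EXCEPTION_KEY is the constant whose value is "sb_rollback_exception".

```java
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class ChunkErrorLoggingListener implements ChunkListener {

    @Override
    public void beforeChunk(ChunkContext context) {
        // no operation
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // no operation
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // Fetch the exception stored by Spring Batch under the
        // sb_rollback_exception key before this callback is invoked.
        Exception e = (Exception) context.getAttribute(ChunkListener.ROLLBACK_EXCEPTION_KEY);
        if (e != null) {
            System.err.println("exception occurred in chunk: " + e.getMessage());
        }
    }
}
```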

These listeners are intended to be used for exception handling, however, the policy of these
guidelines is not to perform exception handling using these listeners.
For details, refer Exception handling.

4.5.2. How to use

Explanation is given about how to implement and set a listener.

4.5.2.1. Implementation of a listener

Explanation is given about how to implement and set a listener.

Implement the listener interface with implements.

Implement components with method-based annotation.

The type of implementation to use should be chosen based on the role of the listener. The criteria are described later.

4.5.2.1.1. When an interface is to be implemented

Various listener interfaces are implemented by using implements. Multiple interfaces can be implemented at the same time based on requirement.
Implementation example is shown below.

Implement afterJob method defined by JobExecutionListener.
In this example, job end log is output.

(4)

Set the listener implemented in (1), in <listeners> tag of Bean definition.
Details of setup method are explained in Listener settings.

Listener support class

When multiple listener interfaces are implemented, blank implementations are required for the methods which are not necessary for the process.
To simplify this, support classes providing blank implementations are offered by Spring Batch.
The support classes may be used instead of the interfaces; in that case, extends is used instead of implements.

Support class

org.springframework.batch.core.listener.ItemListenerSupport

org.springframework.batch.core.listener.StepListenerSupport

4.5.2.1.2. When annotations are assigned

Annotations corresponding to various listener interfaces are assigned. Multiple annotations can also be implemented as required.

When annotations are used for implementation, only the annotations for the timings required by the processing should be assigned.
In this example, since no operation is required before the processing of the ItemProcessor, an implementation annotated with @BeforeProcess is unnecessary.

(2)

Implement the process to be performed after the processing of the ItemProcessor.
In this example, the process result is output to a log.

(3)

Implement the processing for when an error occurs in the ItemProcessor.
In this example, the exception generated is output to a log.

(4)

Set the ItemProcessor in which the listener is implemented by annotations in the <chunk> tag.
Unlike the listener interfaces, the listener is automatically registered even when it is not set in the <listeners> tag.

Constraints for the method which assigns the annotations

Not just any method can be used as a method to which the annotation is assigned:
the signature must match that of the method of the corresponding listener interface.
This point is clearly mentioned in the javadoc of each annotation.

Precautions while implementing JobExecutionListener by an annotation

Since JobExecutionListener has a different scope from the other listeners, the listener is not automatically registered with the configuration above.
Hence, it is necessary to set it explicitly in the <listeners> tag. For details, refer Listener settings.

Implementation of a listener to Tasklet implementation by using annotation

When a listener is implemented in a Tasklet implementation by using annotations, note that the listener does not start with the following settings.

4.5.2.2. Listener settings

Listeners are set in the <listeners>.<listener> tags of the Bean definition.
Although the XML schema allows them to be described at various locations, some combinations do not work as intended depending on the type of interface.
Set them at the following position.

Since this function is different in usage between chunk model and tasklet model, each will be explained.

5.1.1.1. About the pattern of transaction control in general batch processing

Generally, batch processing handles a large number of records, so if an error at the end of processing required all processing to be redone,
the batch system schedule would be adversely affected.
To avoid this, the impact at the time of an error is often localized by committing the transaction
for each fixed number of records while processing a single job.
(Hereafter, the method of committing a transaction for each fixed number of records is called the "intermediate commit method",
and a group of records forming the commit unit is called a "chunk".)

The points of the intermediate commit method are summarized below.

Localize the effects at the time of error occurrence.

Even if an error occurs, the processing up to the chunk just before the erroneous part is committed.

Only use a certain amount of resources.

Regardless of whether the data to be processed is large or small, only the resources for one chunk are used, so resource usage is stable.
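The behaviour described above can be illustrated with a small self-contained simulation (not TERASOLUNA Batch code): input is processed in chunks, each chunk is committed independently, and an error only discards the chunk being processed, keeping the earlier commits.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of the intermediate commit method: "commit" a chunk by
// appending it to the committed list; a failing item aborts only the
// current chunk, leaving previously committed chunks intact.
public class IntermediateCommitSimulation {

    public static List<Integer> run(List<Integer> items, int chunkSize, int failOn) {
        List<Integer> committed = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>();
        for (Integer item : items) {
            if (item == failOn) {
                return committed; // current chunk rolled back, earlier chunks kept
            }
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                committed.addAll(chunk); // commit per chunk
                chunk.clear();
            }
        }
        committed.addAll(chunk); // commit the final partial chunk
        return committed;
    }
}
```

With chunk size 2 and a failure on item 5, the first two chunks (items 1 to 4) remain committed, which is exactly the "localized impact" property discussed above.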

However, the intermediate commit method is not valid in every situation.
Processed and unprocessed data coexist in the system, even if only temporarily.
As a result, unprocessed data must be identified at the time of recovery, so recovery may become complicated.
To avoid this, all records can be committed in one transaction instead of using the intermediate commit method.
(Hereinafter, the method of committing everything in a single transaction is called the "single commit method".)

Nevertheless, if a large number of records, such as tens of thousands, are processed with the single commit method,
a heavy load is incurred when reflecting them all in the database at commit time.
Therefore, although the single commit method is suitable for small-scale batch processing, care must be taken when adopting it in large-scale batches.
It is not a universally applicable method either.

In other words, there is a trade-off between "localization of impact" and "ease of recovery".
Which of the "intermediate commit method" and "single commit method" to use depends on the nature of the job and which property should be prioritized.
Of course, it is not necessary to implement all the jobs in the batch system with only one of them.
It is natural to basically use the "intermediate commit method" and use the "single commit method" for special jobs (or vice versa).

Below is the summary of advantages, disadvantages and adoption points of "intermediate commit method" and "single commit method".

Features list by method

Intermediate commit method
  Advantage: Localizes the effect when an error occurs
  Disadvantage: Recovery processing may be complicated
  Adoption point: When you want to process large amounts of data with fixed machine resources

Single commit method
  Advantage: Ensures data integrity
  Disadvantage: High load is possible when processing a large number of items
  Adoption point: When you want the processing result for the persistent resource to be all or nothing; suitable for small-scale batch processing

Notes on inputting and outputting to the same table in the database

Due to the structure of the database,
care is required when handling large amounts of data in a process that inputs from and outputs to the same table, regardless of the commit method.

As the information that guarantees read consistency is invalidated by the output (issuing UPDATE),
errors may occur on the input side (SELECT).

In order to avoid this, the following measures are taken.

Expand the area that holds the read-consistency information.

When expanding it, consider the size carefully in resource design.

Since the expansion method depends on the database used, refer to its manual.

5.1.2.1.1. Transaction control mechanism in chunk model

Transaction control in the chunk model is the intermediate commit method only.
The single commit method cannot be used.

Support for the single commit method in the chunk model has been requested in JIRA: https://jira.spring.io/browse/BATCH-647.
The workaround proposed there is to customize the chunk completion policy and change the chunk size dynamically.
However, with this workaround all data is held in one chunk, which puts pressure on memory, so it cannot be adopted.

A feature of this method is that transactions are repeatedly performed for each chunk.

Transaction control in normal process

Transaction control in normal process will be explained.

Sequence diagram of normal process

Description of the Sequence Diagram

Steps are executed from the job.

The subsequent processing is repeated until there is no input data.

Start a framework transaction on a per chunk basis.

Repeat steps 2 to 5 until the chunk size is reached.

The step obtains input data from ItemReader.

ItemReader returns the input data to the step.

In the step, ItemProcessor processes input data.

ItemProcessor returns the processing result to the step.

The step outputs data for chunk size with ItemWriter.

ItemWriter will output to the target resource.

The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Sequence diagram of abnormal process

Description of the Sequence Diagram

Steps are executed from the job.

The subsequent processing is repeated until there is no input data.

Start a framework transaction on a per chunk basis.

Repeat steps 2 to 5 until the chunk size is reached.

The step obtains input data from ItemReader.

ItemReader returns the input data to the step.

In the step, ItemProcessor processes input data.

ItemProcessor returns the processing result to the step.

The step outputs data for chunk size with ItemWriter.

ItemWriter will output to the target resource.

If any exception occurs during steps 2 to 7,

The step rolls back the framework transaction.

5.1.2.1.2. Mechanism of transaction control in tasklet model

For transaction control in the tasklet model,
either the single commit method or the intermediate commit method can be used.

single commit method

Use the transaction control mechanism of Spring Batch

Intermediate commit method

Manipulate the transaction directly with the user

single commit method in tasklet model

The mechanism of transaction control by Spring Batch is explained below.

A feature of this method is to process data repeatedly within one transaction.

Transaction control in normal process

Transaction control in normal process will be explained.

Sequence diagram of normal process

Description of the Sequence Diagram

Steps are executed from the job.

The step starts a framework transaction.

The step executes the tasklet.

Repeat steps 3 to 7 until there is no more input data.

The tasklet gets input data from the Repository.

The Repository returns the input data to the tasklet.

The tasklet processes the input data.

The tasklet passes the output data to the Repository.

The Repository outputs to the target resource.

The tasklet returns the process end to the step.

The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Sequence diagram of abnormal process

Description of the Sequence Diagram

Steps are executed from the job.

The step starts a framework transaction.

The step executes the tasklet.

Repeat steps 3 to 7 until there is no more input data.

The tasklet gets input data from the Repository.

The Repository returns the input data to the tasklet.

The tasklet processes the input data.

The tasklet passes the output data to the Repository.

The Repository outputs to the target resource.

If any exception occurs during steps 2 to 7,

The tasklet throws an exception to the step.

The step rolls back the framework transaction.

Intermediate commit method in tasklet model

A mechanism in which the user operates the transaction directly will be described.

The feature of this method is that the framework transaction is made a pseudo-transaction that does not operate on any resource, and the transactions on the resources are handled solely by user transactions.
To achieve this, specify org.springframework.batch.support.transaction.ResourcelessTransactionManager, which is not bound to any resource, in the transaction-manager attribute.

Transaction control in normal process

Transaction control in normal process will be explained.

Sequence diagram of normal process

Description of the Sequence Diagram

Steps are executed from the job.

The step starts framework transaction.

The step executes the tasklet.

Repeat steps 3 to 10 until there is no more input data.

The tasklet starts the user transaction via TransactionManager.

Repeat steps 4 to 6 until the chunk size is reached.

The tasklet gets input data from the Repository.

The Repository returns the input data to the tasklet.

The tasklet processes the input data.

The tasklet passes the output data to the Repository.

The Repository outputs to the target resource.

The tasklet commits the user transaction via TransactionManager.

TransactionManager issues a commit to the target resource.

The tasklet returns the process end to the step.

The step commits the framework transaction.

In this case, each item is output to the resource one at a time, but as in the chunk model,
it is also possible to update collectively in chunk units and improve processing throughput.
At that time, batch update (BatchUpdate) can be used by setting the executorType of SqlSessionTemplate to BATCH.
This is the same behavior as using MyBatis' ItemWriter, so updating with MyBatis' ItemWriter can be considered as well.
For details of MyBatis' ItemWriter, refer to
Database access with ItemWriter.
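A batch-mode SqlSessionTemplate of the kind mentioned above could be defined as follows. The bean names and the jobSqlSessionFactory reference are assumptions following this guideline's naming conventions, not mandatory names.

```xml
<!-- SqlSessionTemplate whose executorType is BATCH: repositories using
     this template issue their updates via JDBC batch update. -->
<bean id="batchModeSqlSessionTemplate"
      class="org.mybatis.spring.SqlSessionTemplate">
    <constructor-arg index="0" ref="jobSqlSessionFactory"/>
    <constructor-arg index="1" value="BATCH"/>
</bean>
```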

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Sequence diagram of abnormal process

Description of the Sequence Diagram

Steps are executed from the job.

The step starts framework transaction.

The step executes the tasklet.

Repeat steps 3 to 11 until there is no more input data.

The tasklet starts the user transaction via TransactionManager.

Repeat steps 4 to 6 until the chunk size is reached.

The tasklet gets input data from the Repository.

The Repository returns the input data to the tasklet.

The tasklet processes the input data.

The tasklet passes the output data to the Repository.

The Repository outputs to the target resource.

If any exception occurs during steps 3 to 8,

The tasklet handles the exception that occurred.

The tasklet rolls back the user transaction via TransactionManager.

TransactionManager issues a rollback to the target resource.

The tasklet throws an exception to the step.

The step rolls back framework transaction.

About processing continuation

Here, the processing is abnormally terminated after handling the exception and rolling back, but
it is also possible to continue processing the next chunk.
In either case, it is necessary to notify subsequent processing that an error occurred, by changing the status / exit code of the step.

About framework transactions

In this example, the job is abnormally terminated by throwing an exception after rolling back the user transaction, but
it is also possible to return a normal end to the step and terminate the job normally.
In that case, the framework transaction is committed.

5.1.2.1.3. Selection policy for model-specific transaction control

In Spring Batch that is the basis of TERASOLUNA Batch 5.x, only the intermediate commit method can be implemented in the chunk model.
However, in the tasklet model, either the intermediate commit method or the single commit method can be implemented.

Therefore, in TERASOLUNA Batch 5.x, when the single commit method is necessary, it is to be implemented in the tasklet model.

5.1.2.2. Difference in transaction control for each execution method

Depending on the execution method, a transaction that is not managed by Spring Batch occurs before and after the job is executed.
This section explains transactions in two asynchronous execution processing schemes.

5.1.2.2.1. About transaction of DB polling

The processing of the Job-request-table performed by DB polling uses transactions outside of Spring Batch management.
Also, exceptions that occur in a job are handled entirely within the job, so they do not affect the transactions issued by JobRequestPollTask.

A simple sequence diagram focusing on transactions is shown in the figure below.

JobRequestPollTask will start a transaction other than Spring Batch managed.

JobRequestPollTask will retrieve an asynchronous batch to execute from the Job-request-table.

JobRequestPollTask will commit the transaction other than Spring Batch managed.

JobRequestPollTask will start a transaction other than Spring Batch managed.

JobRequestPollTask will update the polling status of the Job-request-table from INIT to POLLED.

JobRequestPollTask will commit the transaction other than Spring Batch managed.

JobRequestPollTask will execute the job.

Inside the job, transaction control for DB for Management(JobRepository) will be managed by Spring Batch.

Inside the job, transaction control for DB for Job will be managed by Spring Batch.

job_execution_id is returned to JobRequestPollTask

JobRequestPollTask will start a transaction other than Spring Batch managed.

JobRequestPollTask will update the polling status of the Job-request-table from POLLED to EXECUTED.

JobRequestPollTask will commit the transaction other than Spring Batch managed.

About Commit at SELECT Issuance

Some databases implicitly start a transaction when a SELECT is issued.
Therefore, a commit is issued explicitly to close that transaction, so that it is clearly separated from other transactions and not affected by them.

5.1.2.2.2. About the transaction of WebAP server process

Processing of the resources targeted by the WebAP uses transactions outside of Spring Batch management.
Also, exceptions that occur in a job are handled entirely within the job, so they do not affect the transactions issued by the WebAP.

A simple sequence diagram focusing on transactions is shown in the figure below.

Transaction of WebAP server process

Description of the Sequence Diagram

WebAP processing is executed by the request from the client

WebAP will start the transaction managed outside of Spring Batch.

WebAP reads from and writes to resources in WebAP before job execution.

WebAP executes the job.

Within a job, Spring Batch carries out transaction management to the Management DB (JobRepository).

Within a job, Spring Batch carries out transaction management to the Job DB.

job_execution_id is returned to WebAP

WebAP reads from and writes to resources in WebAP after job execution.

WebAP will commit the transaction managed outside of Spring Batch.

WebAP returns a response to the client.

5.1.3. How to use

Here, transaction control in one job will be explained separately in the following cases.

Set the predefined jobTransactionManager in the transaction-manager attribute of the <batch:tasklet> tag.
The transactions of the intermediate commit method are controlled by the transaction manager set here.

(2)

Set the chunk size in the commit-interval attribute. In this sample, a commit is performed once every 10 items.
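The two callouts above correspond to a job definition of roughly the following shape; the job, step, and reader/processor/writer bean names are placeholders.

```xml
<batch:job id="job01" job-repository="jobRepository">
    <batch:step id="job01.step01">
        <!-- (1) intermediate commit method: transactions are controlled
             by the transaction manager set here -->
        <batch:tasklet transaction-manager="jobTransactionManager">
            <!-- (2) chunk size: commit once for every 10 items -->
            <batch:chunk reader="reader" processor="processor"
                         writer="writer" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</batch:job>
```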

For the tasklet model

In the case of the tasklet model, the way transactions are controlled differs between the single commit method and the intermediate commit method.

Set the predefined jobResourcelessTransactionManager in the transaction-manager attribute of the <batch:tasklet> tag.

(2)

Inject the transaction manager.
In the @Named annotation, specify jobTransactionManager to identify the bean to use.

(3)

Start transaction at the beginning of chunk.

(4)

Commit the transaction at the end of the chunk.

(5)

When an exception occurs, roll back the transaction.

(6)

For the last chunk, commit the transaction.
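The callouts above can be sketched without Spring as follows. TxManager here is a stand-in for Spring's PlatformTransactionManager (begin/commit/rollback), not the real API; an actual tasklet would inject jobTransactionManager and drive it through TransactionStatus/TransactionDefinition.

```java
import java.util.List;

// Sketch of the intermediate commit method in a tasklet: a user
// transaction is begun per chunk, committed at the end of the chunk,
// and rolled back if an exception occurs.
public class IntermediateCommitTasklet {

    /** Stand-in for Spring's PlatformTransactionManager. */
    interface TxManager {
        void begin();
        void commit();
        void rollback();
    }

    static void execute(List<String> items, int chunkSize, TxManager tx) {
        int count = 0;
        tx.begin();                    // (3) start transaction at chunk start
        try {
            for (String item : items) {
                // business processing and output via Repository go here
                count++;
                if (count % chunkSize == 0) {
                    tx.commit();       // (4) commit at the end of the chunk
                    tx.begin();
                }
            }
            tx.commit();               // (6) commit the last (partial) chunk
        } catch (RuntimeException e) {
            tx.rollback();             // (5) roll back on exception
            throw e;
        }
    }
}
```

With 5 items and a chunk size of 2, this issues three commits: two full chunks and the final partial chunk.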

Updating by ItemWriter

In the above example, a Repository is used, but it is also possible to update data using an ItemWriter.
Using an ItemWriter simplifies the implementation; in particular, FlatFileItemWriter should be used when updating files.

5.1.3.1.2. Note for non-transactional data sources

In the case of files, no transaction setting or operation is necessary.

When using FlatFileItemWriter, pseudo transaction control can be performed.
This is achieved by delaying the writing to the resource and actually writing out at commit timing.
Normally, when the chunk size is reached, the chunk data is output to the actual file; if an exception occurs, the data of that chunk is not output.

FlatFileItemWriter can switch transaction control on and off with transactional property. The default is true and transaction control is enabled.
If the transactional property is false, FlatFileItemWriter will output the data regardless of the transaction.

When adopting the single commit method, it is recommended to set the transactional property to false.
As described above, data is written to the resource at commit timing; until then, all the output data is held in memory.
Therefore, when the amount of data is large, memory is likely to become insufficient and an error is likely to occur.
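For a single-commit job writing a file, the setting recommended above could look like the following sketch; the bean id, resource path, and line aggregator are illustrative.

```xml
<bean id="writer" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemWriter">
    <!-- write directly, without buffering output until commit time -->
    <property name="transactional" value="false"/>
    <property name="resource" value="file:#{jobParameters['outputFile']}"/>
    <property name="lineAggregator">
        <bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
    </property>
</bean>
```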

On TransactionManager settings in jobs that only handle files

As in the following job definition, the transaction-manager attribute of batch:tasklet is mandatory in the xsd schema and cannot be omitted.

Therefore, always specify jobTransactionManager. At this time, the following behaviors are obtained.

If transactional is true

Output to the resource is synchronized with the specified TransactionManager.

If transactional is false

The transaction processing of the specified TransactionManager runs idle, and output to the resource is performed regardless of the transaction.

At this time, transactions are issued to the resource (e.g., database) referred to by jobTransactionManager,
but since there is no table access, there is no actual harm.

If you do not want to issue transactions to the referenced resource even when they run idle, or if doing so causes actual harm, you can use ResourcelessTransactionManager, which requires no resources.
ResourcelessTransactionManager is defined as jobResourcelessTransactionManager in launch-context.xml.

5.1.3.2. For multiple data sources

Transaction control of jobs input / output to multiple data sources will be described.
Since consideration points are different between input and output, they will be explained separately.

5.1.3.2.1. Input from multiple data source

When retrieving data from multiple data sources, the data that is the axis of the process and its accompanying data should be retrieved separately.
Hereinafter, the data as the axis of processing is referred to as the process target record, and the additional data accompanying it is referred to as accompanying data.

Because of the structure of Spring Batch, ItemReader is based on the premise that it retrieves process target records from one resource.
This is the same way of thinking regardless of the type of resource.

Retrieving the process target records

Get it by ItemReader.

Retrieving accompanying data

For accompanying data, one of the following retrieval methods must be selected according to whether the data changes and the number of records. These are not exclusive options and may be used in combination.

Batch retrieval before step execution

Retrieve each time according to the record to be processed

When retrieving all at once before step execution

Implement a Listener that does the following, and refer to the data from the subsequent step.

Retrieve data collectively

Store the information in the bean whose scope is Job or Step

The ExecutionContext of Spring Batch can be used,
but a separate class can also be created to store the data, considering readability and maintainability.
For simplicity, the sample is explained using ExecutionContext.

This method is adopted when reading data that does not depend on the data to be processed, such as master data.
However, even for master data, if the number of items is large enough to impact memory, retrieving the data each time should be considered.

Accompanying data is retrieved from the Repository for the input data (process target record).

(3)

Return the data with the processing target record and the accompanying data combined.
Note that this data will be the input data to the next ItemProcessor.

(4)

Set the ItemProcessor that retrieves data each time.

(5)

Set ItemProcessor for business logic.

5.1.3.2.2. Output to multiple data sources(multiple steps)

Process multiple data sources throughout the job by dividing the steps for each data source and processing a single data source at each step.

Data processed at the first step is stored in a table, and at the second step it is output to a file.

Although each step becomes simple and recovery becomes easy, handling the same data twice can be a drawback.

Consequently, if this causes the following adverse effects, consider processing multiple data sources in one step.

Processing time increases

Business logic becomes redundant

5.1.3.2.3. Output to multiple data sources(single step)

Generally, when transactions for multiple data sources are combined into one, a distributed transaction based on two-phase commit is used.
However, it also has the following known disadvantages.

Middleware must support a distributed transaction API such as XAResource, and special settings based on it are required

In standalone Java, as in a batch program, a JTA implementation library must be added for distributed transactions

Recovery in case of failure is difficult

Although distributed transactions using JTA global transactions can also be used in Spring Batch, the method incurs performance overhead due to the characteristics of the protocol.
As a simpler way to process multiple data sources collectively, the Best Efforts 1PC pattern is recommended.

What is Best Efforts 1PC pattern

Briefly, it is a technique that handles multiple data sources as local transactions and issues their commits sequentially at the same timing.
The conceptual diagram is shown in the figure below.

Conceptual diagram of Best Efforts 1PC pattern

Description of figure

The user instructs ChainedTransactionManager to start the transaction.

Since this method is not a distributed transaction, data consistency may not be maintained
if a failure (exception) occurs at commit / rollback in the second or subsequent transaction managers.
Therefore, a recovery method for failures at the transaction boundary must be designed; on the other hand, the recovery frequency can be reduced and the recovery procedure simplified.

When processing multiple transactional resources at the same time

Use it in cases such as processing multiple databases simultaneously, or processing a database and MQ, and so on.

Process as one-phase commit by combining multiple transaction managers into one using ChainedTransactionManager, as follows.
Note that ChainedTransactionManager is a class provided by Spring Data.
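A definition of the kind described might look like this, assuming two already-defined transaction managers; the bean names are illustrative.

```xml
<!-- Best Efforts 1PC: the managers are started in list order and
     committed / rolled back in reverse order of the list. -->
<bean id="chainedTransactionManager"
      class="org.springframework.data.transaction.ChainedTransactionManager">
    <constructor-arg>
        <list>
            <ref bean="transactionManager1"/>
            <ref bean="transactionManager2"/>
        </list>
    </constructor-arg>
</bean>
```

The resulting bean is then set in the transaction-manager attribute of the tasklet in place of a single transaction manager.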

For files, setting FlatFileItemWriter’s transactional property to true provides the same effect as the "Best Efforts 1PC pattern" described above.
For details, refer to Note for non-transactional data sources.

This setting delays writing to the file until just before the database transaction is committed, which makes it easy to synchronize the two data sources.
However, even in this case, if an error occurs during the file output processing after the commit to the database, data consistency may not be maintained,
so a recovery method must be designed.

5.1.3.3. Notes on the intermediate commit method

When data being processed is skipped in ItemWriter (which is deprecated), the chunk size setting is forcibly changed.
Note that this has a very large impact on transactions. Refer to
Skip for details.

5.2. Database Access

5.2.1. Overview

MyBatis3 (hereafter called [MyBatis]) is used for database access in TERASOLUNA Batch 5.x.
Please refer to the TERASOLUNA Server 5.x Development Guideline below for the basic usage of database access with MyBatis.

This chapter mainly explains how to use database access specifically in TERASOLUNA Batch 5.x.

Notes for how to use Oracle JDBC in Linux environment

When Oracle JDBC is used in a Linux environment, it locks the random number generator of the OS.
Hence, even if jobs are executed in parallel, they may actually run sequentially, or a connection timeout may occur.
Two patterns for avoiding these events are shown below.

Set following in system properties while executing Java command.

-Djava.security.egd=file:///dev/urandom

Change securerandom.source=/dev/random in ${JAVA_HOME}/jre/lib/security/java.security to securerandom.source=/dev/urandom.

5.2.2. How to use

Explain how to use database access as TERASOLUNA Batch 5.x.

Keep in mind that how to access the database differs between the chunk model and the tasklet model.

There are the following 2 ways to use database access in TERASOLUNA Batch 5.x.
Choose between them based on the component that accesses the database.

Connection information to database used by adminDataSource
H2 is used in this example.

(4)

Connection information to database used by jobDataSource
PostgreSQL is used in this example.

5.2.2.1.2. MyBatis Setting

The important points for setting MyBatis on TERASOLUNA Batch 5.x are explained here.

One of the key points in implementing batch processing is "to process large amounts of data efficiently with fixed resources".
The settings for this are explained below.

fetchSize

In general batch processing, it is essential to specify an appropriate fetchSize for the JDBC driver
to reduce the communication cost of processing large amounts of data.
fetchSize is a parameter that sets the number of rows acquired per communication between the JDBC driver and the database.
It is desirable to set this value as large as possible; however, if it is too large, it puts pressure on memory, so be careful.
The user has to tune this parameter.

In MyBatis, user can set defaultFetchSize as a common setting for all queries, and can override it with fetchSize setting for each query.

executorType

In general batch processing, the same SQL is executed within the same transaction as many times as (total data count / fetchSize).
At this time, it is more efficient to reuse a statement instead of creating it each time.

In the MyBatis settings, statements can be reused by setting defaultExecutorType to REUSE,
which contributes to improved processing throughput.

When updating a large amount of data at once, performance improvement can be expected by using JDBC batch update.
Therefore, the SqlSessionTemplate used in MyBatisBatchItemWriter
has its executorType set to BATCH (not REUSE).
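In mybatis-config terms, the two settings discussed above might look like this; the fetch size value is an example to be tuned per environment.

```xml
<settings>
    <!-- number of rows fetched per round trip; tune against memory -->
    <setting name="defaultFetchSize" value="1000"/>
    <!-- reuse prepared statements across repeated executions -->
    <setting name="defaultExecutorType" value="REUSE"/>
</settings>
```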

In TERASOLUNA Batch 5.x, two different ExecutorTypes therefore exist at the same time.
Jobs are often implemented with only one ExecutorType, but special attention is required when the two are used together.
The details are explained in Database Access other than ItemReader・ItemWriter.

For synchronous execution, a SqlSessionFactory using adminDataSource is unnecessary and is not defined.
For asynchronous execution (DB polling),
it is defined in META-INF/spring/async-batch-daemon.xml to access the Job-request-table.

5.2.2.1.4. MyBatis-Spring setting

When using the ItemReader and ItemWriter provided by MyBatis-Spring, it is necessary to register the Mapper XML to be used in the Mapper's Config.

As the setting method, there are following 2 methods.

Register the Mapper XML used by all jobs as a common setting.

All Mapper XML files are described in META-INF/spring/launch-context.xml.

Register the Mapper XML used by each job as an individual setting.

The Mapper XML required by each job is described in the bean definition under META-INF/jobs/.

With the common setting, not only the Mapper XML of the executed job but also the Mapper XML used by other jobs is read at synchronous execution, causing the following adverse effects.

It takes time to start the job

Consumption of memory resources increases

To avoid this, TERASOLUNA Batch 5.x adopts the individual setting method, specifying in each job definition only the Mapper XML that the job requires.

For the basic setting method,
please refer to MyBatis-Spring settings in TERASOLUNA Server 5.x Development Guideline.

In TERASOLUNA Batch 5.x, since multiple SqlSessionFactory and SqlSessionTemplate beans are defined,
it is necessary to specify explicitly which one to use.
Basically, specify jobSqlSessionFactory.

5.2.2.2. Database access with ItemReader

5.2.2.2.1. ItemReader of MyBatis

MyBatis-Spring provides the following two ItemReaders.

org.mybatis.spring.batch.MyBatisCursorItemReader

org.mybatis.spring.batch.MyBatisPagingItemReader

MyBatisPagingItemReader is an ItemReader
that uses the mechanism described
in Pagination search for Entity (SQL refinement method) of the TERASOLUNA Server 5.x Development Guideline.
Since SQL is issued again after acquiring a certain number of rows, data consistency may not be maintained between issues.
Therefore, it is dangerous to use it in batch processing, and TERASOLUNA Batch 5.x does not use it in principle.
TERASOLUNA Batch 5.x uses only MyBatisCursorItemReader.
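A typical MyBatisCursorItemReader definition is sketched below; the query id (Mapper XML namespace + SQL id) and the factory bean name follow this guideline's conventions and are illustrative.

```xml
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader">
    <!-- fully qualified Mapper XML namespace + SQL id -->
    <property name="queryId"
              value="com.example.batch.repository.ExampleRepository.findAll"/>
    <property name="sqlSessionFactory" ref="jobSqlSessionFactory"/>
</bean>
```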

As explained in MyBatis-Spring setting, TERASOLUNA Batch 5.x
adopts a method of dynamically registering Mapper XML with mybatis:scan.
Therefore, it is necessary to prepare an interface corresponding to each Mapper XML.
For details, please refer to
Implementation of database access process in TERASOLUNA Server 5.x Development Guideline.

Specify the SQL ID defined in (6), as the namespace of (5) + <method name>, in the statementId property.

(4)

Specify the SqlSessionTemplate of the database to be accessed in the sqlSessionTemplate-ref property.
The specified SqlSessionTemplate must have its executorType set to BATCH.

(5)

Define Mapper XML. Match the value of namespace with the FQCN of the interface.

(6)

Define SQL.

(7)

Define the method corresponding to the SQL ID defined in (6) for the interface.

5.2.2.4. Database Access other than ItemReader・ItemWriter

Explain database access except for ItemReader・ItemWriter.

To access the database from components other than ItemReader・ItemWriter, use the Mapper interface.
The following restrictions apply to using the Mapper interface in TERASOLUNA Batch 5.x.

Where the Mapper interface can be used

Process        | Reference | Update
ItemProcessor  | Available | Conditionally available
Tasklet        | Available | Available
Listener       | Available | Unavailable

Restrictions in ItemProcessor

MyBatis has a restriction that two or more ExecutorTypes must not be used within the same transaction.
If "MyBatisBatchItemWriter is used as the ItemWriter" and "the Mapper interface is used in ItemProcessor for updates and references"
hold at the same time, this restriction is violated.
To avoid violating it,
the database is accessed in ItemProcessor through a Mapper interface whose ExecutorType is BATCH.
In addition, MyBatisBatchItemWriter checks, via a status check after executing SQL, whether the SQL was issued by itself,
but it naturally cannot manage SQL executed by the ItemProcessor, so an error occurs.
Therefore, when MyBatisBatchItemWriter is used, updating through the Mapper interface is not possible; only reference is.

The error check of MyBatisBatchItemWriter can be disabled, but that setting is prohibited because unexpected behavior may occur.

Restrictions in Tasklet

In a Tasklet, using the Mapper interface is the basic approach, so the ItemProcessor restriction does not apply.
It is possible to inject and use MyBatisBatchItemWriter, but in that case the same processing can be done with a Mapper interface set to BATCH mode.
In other words, there is basically no need to inject MyBatisBatchItemWriter.

Restrictions in Listener

The same restriction as for ItemProcessor also applies to listeners.
In addition, it is hard to think of use cases in which a listener requires updates, so update processing is prohibited in listeners.

Alternatives to the update processing assumed in listeners

Job state management

This is handled by Spring Batch's JobRepository.

Log output to database

It should be done with a log Appender, and it must be managed in a transaction separate from the job's transaction.

Register the Mapper XML.
By specifying batchModeSqlSessionTemplate, which is set to BATCH, in the template-ref attribute,
database access from ItemProcessor becomes BATCH mode.
If factory-ref="jobSqlSessionFactory" is set instead, it conflicts with the above restriction
and an exception is thrown when MyBatisBatchItemWriter is executed.

(3)

Define MyBatisBatchItemWriter.
Specify batchModeSqlSessionTemplate, which is set to BATCH, in the sqlSessionTemplate-ref property.

(4)

Set an ItemProcessor into which the Mapper interface is injected.

Supplement of MyBatisCursorItemReader setting

Different ExecutorTypes can be used for MyBatisCursorItemReader and MyBatisBatchItemWriter, as in the definition example below.
This is because MyBatisCursorItemReader opens the resource before the transaction starts.

If there are many update processes in the tasklet model, set batchModeSqlSessionTemplate in the template-ref attribute.
As a result, batch update processing is performed, so performance improvement can be expected.
However, be aware that executing batch updates requires an explicit flush.
For details,
please refer to
Precautions when using batch mode Repository.

5.2.2.4.3. Database Access with Listener

Database access from a listener is often linked with other components.
Depending on the listener used and the implementation method,
an additional mechanism is needed to hand data over to other components.

An example is shown in which StepExecutionListener
acquires data before step execution and ItemProcessor uses the acquired data.

Get data from the Mapper interface and cache it in the listener.
In this way, I/O is reduced and processing efficiency is improved by creating the cache
before step execution with StepExecutionListener#beforeStep and referring to it in subsequent processing.

(4)

Inject the same bean as the cache set in (2).

(5)

Get corresponding data from the cache.

(6)

Reflect the data from the cache in the update data.

(7)

Implement the cache class as a component.
The bean scope here is singleton; set it appropriately for the job.

In the above example, batchModeSqlSessionTemplate is set, but jobSqlSessionFactory can also be set.

Listeners that run outside the scope of a chunk
are processed outside the transaction, so setting jobSqlSessionFactory causes no problem.

5.2.3. How To Extend

5.2.3.1. Updating multiple tables with CompositeItemWriter

In the chunk model, updating multiple tables for one input item can be achieved by using CompositeItemWriter provided by Spring Batch
and linking a MyBatisBatchItemWriter corresponding to each table.

An implementation example wherein two tables of sales plan and actual sales are updated is shown here.

Implement an ItemProcessor whose output is a DTO that retains an entity for each of the tables to be updated.
Since different objects cannot be passed to one ItemWriter to update the two tables, define a DTO that consolidates the objects necessary for the update.

(2)

Create an entity for a new actual sales record (SalesPerformanceDetail) and store it in the DTO.

(3)

Update the sales plan (SalesPlanDetail), which is also the input data, and store it in the DTO.

(4)

Define DTO(SalesPlanDetail) so as to retain a sales plan.

(5)

Define DTO(SalesPerformanceDetail) so as to retain actual sales record.

Define MyBatisBatchItemWriter which creates a new actual sales table (sales_performance_detail).

(10)

Define CompositeItemWriter in order to execute (8) and (9) sequentially.

(11)

Set (8) and (9) in the <list> tag. The ItemWriters are executed in the specified order.

(12)

Specify the Bean defined in (10) in the writer attribute of chunk. Specify the ItemProcessor of (1) in the processor attribute.

Multiple data sources can also be updated by using it together with org.springframework.data.transaction.ChainedTransactionManager,
which is explained in Output to multiple data sources (1 step).

Further, since CompositeItemWriter can link any ItemWriter implementations,
database output and file output can be performed together by setting MyBatisBatchItemWriter and FlatFileItemWriter.
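Although the bean definition itself is omitted here, the chaining described above can be sketched as follows (bean ids and statement ids are illustrative assumptions):

```xml
<!-- Sketch only: ids are illustrative. -->
<bean id="salesPlanWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter" scope="step"
      p:statementId="com.example.batch.repository.SalesRepository.updateSalesPlanDetail"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<bean id="salesPerformanceWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter" scope="step"
      p:statementId="com.example.batch.repository.SalesRepository.createSalesPerformanceDetail"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<!-- CompositeItemWriter executes the delegates in list order -->
<bean id="compositeWriter"
      class="org.springframework.batch.item.support.CompositeItemWriter" scope="step">
    <property name="delegates">
        <list>
            <ref bean="salesPlanWriter"/>
            <ref bean="salesPerformanceWriter"/>
        </list>
    </property>
</bean>
```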

5.3. File Access

5.3.1. Overview

This chapter describes how to input and output files.

The usage of this function is the same in the chunk model and the tasklet model.

5.3.1.1. Type of File which can be handled


The types of files that can be handled by TERASOLUNA Batch 5.x are described below.
These are the same as the types that Spring Batch can handle.

Flat File

XML

This section first explains how to handle flat files,
and then explains XML in How To Extend.

First, the types of flat file that can be used with TERASOLUNA Batch 5.x are shown.
Each row inside a flat file is called a record,
and the type of file is determined by the record format.

Record Format

Format

Overview

Variable-length Record

A record format in which items are separated by a delimiter, such as CSV and TSV. The length of each item can vary.

Fixed-length Record

A record format in which items are separated by item length (bytes). The length of each item is fixed.

Single String Record

One record is handled as one string item.

File Structure which can be handled

The basic structure of a flat file consists of the following two elements.

Record Division

Record Format

Elements to construct format of Flat File

Element

Overview

Record Division

A division indicates the type of record, such as Header Record, Data Record, and Trailer Record.
Details are described later.

Record Format

The format holds information about the records, such as how many rows there are for header, data, and trailer, how many times each record repeats, and so on.
There are Single Format and Multi Format. Details are described later.

With TERASOLUNA Batch 5.x, flat files in Single Format or Multi Format that include each record division can be handled.

The record divisions and record formats are explained here.

The overview of each record division is as follows.

Characteristic of each Record Division

Record Division

Overview

Header Record

A record placed at the beginning of the file (data part).
It holds items such as field names, matters common to the file, and a summary of the data part.

Data Record

A record holding the data to be processed, which is the main object of the file.

Trailer/Footer Record

A record placed at the end of the file (data part).
It holds items such as matters common to the file and a summary of the data part.
In a Single Format file, it is also called a Footer Record.

Footer/End Record

A record placed at the end of the file when the file is in Multi Format.
It holds items such as matters common to the file and a summary of the data part.

About the field that indicates the record division

A flat file having a header record or a trailer record may have a field indicating a record division.
In TERASOLUNA Batch 5.x, especially in the processing of multi-format files, the record division field is utilized, for example when different processing is performed for each record division.
Refer to Multi format for the implementation when selecting the processing to be executed by record division.

About the name of file format

Depending on the definition of the file format in each system,
the names may differ from this guideline, such as calling the Footer Record an End Record.
Read them as appropriate.

A summary of Single Format and Multi Format is shown below.

Overview of Single Format and Multi Format

Format

Overview

Single Format

A format with Header N Rows + Data N Rows + Trailer N Rows.

Multi Format

A format with (Header N Rows + Data N Rows + Trailer N Rows) * N + Footer N Rows.
A format in which a Footer Record is added after a Single Format is repeated multiple times.

The Multi Format record structure is shown in the figure as follows.

Multi Format Record Structure Diagram

An example of Single Format and Multi Format flat files is shown below. // is used as a comment character in the file descriptions.

Example of Single Format, flat file(CSV format) without record division

For flat files having Multi Format or a structure including a footer part in the above structure, refer to How To Extend

5.3.1.2. A component that inputs and outputs a flat file

The classes for handling flat files are described below.

Input

The relationships of classes used for inputting flat files are as follows.

Relationship of classes used for inputting flat files

The calling relationship of each component is as follows.

Calling relationship of each component

Details of each component are shown below.

org.springframework.batch.item.file.FlatFileItemReader

Implementation class of ItemReader used for reading flat files. It uses the following components.
The simplified processing flow is as follows.

1. Use BufferedReaderFactory to get a BufferedReader.
2. Read one record from the flat file using the acquired BufferedReader.
3. Use LineMapper to map the record to the target bean.

org.springframework.batch.item.file.BufferedReaderFactory

Generate BufferedReader to read the file.

org.springframework.batch.item.file.LineMapper

Maps one record to the target bean. It uses the following components.
The simplified processing flow is as follows.

1. Use LineTokenizer to split one record into items.
2. Use FieldSetMapper to map the split items to bean properties.

org.springframework.batch.item.file.transform.LineTokenizer

Divide one record acquired from the file into each item.
Each partitioned item is stored in FieldSet class.

org.springframework.batch.item.file.mapping.FieldSetMapper

Map each item in one divided record to the property of the target bean.
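The division of roles above can be illustrated with a framework-free sketch: a tokenizer splits one record into named items (the role LineTokenizer plays when it fills a FieldSet), and a mapper builds the target bean from those items (the role of FieldSetMapper). Class and field names here are illustrative, not Spring Batch APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified imitation of the FlatFileItemReader pipeline described above.
public class ReaderPipelineSketch {

    // Corresponds to LineTokenizer: split one record and pair values with names.
    static Map<String, String> tokenize(String record, String[] names) {
        String[] values = record.split(",", -1);
        Map<String, String> fieldSet = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fieldSet.put(names[i], values[i]);
        }
        return fieldSet;
    }

    // Corresponds to FieldSetMapper: build the target bean from named items.
    static SalesPlanDetail map(Map<String, String> fieldSet) {
        SalesPlanDetail detail = new SalesPlanDetail();
        detail.branchId = fieldSet.get("branchId");
        detail.amount = Long.parseLong(fieldSet.get("amount"));
        return detail;
    }

    static class SalesPlanDetail {
        String branchId;
        long amount;
    }
}
```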

Output

Relationships of classes used for outputting flat files are as follows.

Relationship of classes used for outputting flat files

The calling relationship of each component is as follows.

Calling relationship of each component

org.springframework.batch.item.file.FlatFileItemWriter

Implementation class of ItemWriter for writing to a flat file. It uses the following component.
LineAggregator: maps the target bean to one record.

org.springframework.batch.item.file.transform.LineAggregator

It is used to map the target bean to one record.
The mapping between the properties of the bean and each item in the record is done in FieldExtractor.

Sets the character code of the input file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set the character code even when using the default value.

JavaVM’s default character set

(3)

strict

If true is set, an exception occurs if the input file does not exist (cannot be opened).

true

(4)

lineMapper

Set org.springframework.batch.item.file.mapping.DefaultLineMapper. DefaultLineMapper is a LineMapper that provides the basic operation of converting a record to the target class using the defined LineTokenizer and FieldSetMapper.

Nothing

(5)

lineTokenizer

Set org.springframework.batch.item.file.transform.DelimitedLineTokenizer. DelimitedLineTokenizer is an implementation class of LineTokenizer that separates a record by the specified delimiter.
It supports reading escaped line feeds, delimiters, and enclosing characters as defined in RFC-4180, the general specification of the CSV format.

Nothing

(6)

names

Give a name to each item of one record.
Each item can be retrieved using the name set in the FieldSet used in FieldSetMapper.
Set each name from the beginning of the record, separated by commas.
This setting is mandatory when using BeanWrapperFieldSetMapper.

Nothing

(7)

delimiter

Sets the delimiter.

comma

(8)

quoteCharacter

Sets the enclosing character.

Nothing

(9)

fieldSetMapper

If no special conversion of strings or numbers is necessary, use org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper,
and specify the conversion target class in the targetType property.
By doing this, an instance is created that automatically sets each value in the field matching the name of each item set in (6).
If conversion processing is necessary, set an implementation class of org.springframework.batch.item.file.mapping.FieldSetMapper.

Nothing

See How To Extend for the case of implementing FieldSetMapper yourself.

How to enter TSV format file

Reading a TSV file can be realized by setting a tab as the delimiter.
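For example, the delimiter property of DelimitedLineTokenizer can reference the tab constant that Spring Batch provides (a sketch assuming the util namespace is declared; the names value is illustrative):

```xml
<bean id="lineTokenizer"
      class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
      p:names="branchId,year,month,customerId,amount">
    <property name="delimiter">
        <util:constant
            static-field="org.springframework.batch.item.file.transform.DelimitedLineTokenizer.DELIMITER_TAB"/>
    </property>
</bean>
```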

Sets the character code of the output file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemReader is the default character set of the JavaVM).
Hence, it is recommended to explicitly set the character code even when using the default value.

UTF-8

(3)

lineSeparator

Set record break (line feed code).

line.separator of system’s property

(4)

appendAllowed

If true, append to the existing file. Note that if true, the setting of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify this property since it is invalidated.
If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete the file for output when the output count is 0. Since unintended behaviour is likely to occur in combination with other properties, it is recommended not to set it to true. For details, refer to the notes described later.

Set org.springframework.batch.item.file.transform.DelimitedLineAggregator.
To enclose fields, set org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator.
Usage of EnclosableDelimitedLineAggregator is described later.

Nothing

(9)

delimiter

Sets the delimiter.

comma

(10)

fieldExtractor

If no special conversion of strings or numbers is necessary, you can use org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor.
If conversion processing is necessary, set an implementation class of org.springframework.batch.item.file.transform.FieldExtractor.
For an implementation example of FieldExtractor, refer to Output of Fixed-length record, where a sample using full-width characters is described.

Nothing

(11)

names

Give a name to each item of one record. Set each name from the beginning of the record with a comma separator.

Nothing

It is recommended not to set true for shouldDeleteIfEmpty property of FlatFileItemWriter.

For FlatFileItemWriter, unintended files are deleted when the properties are configured by the combinations as shown below.

p:shouldDeleteIfEmpty="true"

p:shouldDeleteIfExists="false"

Reasons are as given below.
When shouldDeleteIfEmpty is set to true, the file for output is deleted when the output count is 0.
The "output count is 0" case also includes the case where the file for output already exists and shouldDeleteIfExists is set to false.

Hence, when the properties are specified in the combination above, the file for output is deleted if it already exists.
This is unintended behaviour when an exception should preferably be thrown and the process terminated if a file for output already exists.

It is recommended not to set shouldDeleteIfEmpty property to true since it results in unintended operation.

Further, when subsequent processing like deletion of file is to be done if output count is 0, implementation should be done by using OS command or Listener instead of shouldDeleteIfEmpty property.

How to use EnclosableDelimitedLineAggregator

To enclose a field around it, use org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator provided by TERASOLUNA Batch 5.x.
The specification of EnclosableDelimitedLineAggregator is as follows.

Optional specification of enclosure character and delimiter character

Default is the following value commonly used in CSV format

Enclosing character: " (double quote)

Separator: , (comma)

If the field contains a carriage return, line feed, enclosure character, or delimiter, enclose the field with an enclosing character

When the enclosing character is included in a field, it is escaped by adding another enclosing character immediately before it.
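The enclosing rule described above can be sketched in plain Java as follows; this is an illustration of the stated behavior, not the actual EnclosableDelimitedLineAggregator source:

```java
// Sketch of the RFC-4180-style enclosing rule: a field is enclosed only when it
// contains CR, LF, the enclosing character, or the delimiter, and an embedded
// enclosing character is escaped by doubling it.
public class EnclosingRuleSketch {

    static String encloseIfNeeded(String field, char quote, char delimiter) {
        boolean needsEnclosing = field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0
                || field.indexOf(quote) >= 0 || field.indexOf(delimiter) >= 0;
        if (!needsEnclosing) {
            return field;
        }
        // Escape the enclosing character by doubling it, then wrap the field.
        String escaped = field.replace(String.valueOf(quote), "" + quote + quote);
        return quote + escaped + quote;
    }
}
```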

The org.springframework.batch.item.file.transform.DelimitedLineAggregator provided by Spring Batch does not support enclosing fields, and therefore cannot satisfy the RFC-4180 specification.
Refer to Spring Batch/BATCH-2463.

The CSV format is defined as follows in RFC-4180.

If the field does not contain line breaks, enclosing characters, or delimiters, each field can be enclosed in double quotes (enclosing characters) or not enclosed

Sets the character code of the input file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set the character code even when using the default value.

JavaVM default character set

(3)

strict

If true is set, an exception occurs if the input file does not exist (cannot be opened).

true

(4)

bufferedReaderFactory

To decide record breaks by line breaks, use the default value org.springframework.batch.item.file.DefaultBufferedReaderFactory.
BufferedReader generated by DefaultBufferedReaderFactory acquires up to a newline as one record.

To judge the delimiter of a record by the number of bytes, set org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory provided by TERASOLUNA Batch 5.x.
BufferedReader generated by FixedByteLengthBufferedReaderFactory acquires up to the specified number of bytes as one record.
Detailed specifications and usage of FixedByteLengthBufferedReaderFactory will be described later.

Give a name to each item of one record.
Each item can be retrieved using the name set in the FieldSet used in FieldSetMapper.
Set each name from the beginning of the record, separated by commas.
This setting is mandatory when using BeanWrapperFieldSetMapper.

Nothing

(8)

ranges
(Constructor argument)

Sets the break position. Set the delimiter position from the beginning of the record, separated by commas.
The unit of each delimiter position is byte, and it is specified in start position - end position format.
The range specified from the record is acquired in the order in which the delimiter positions are set, and stored in FieldSet.
When names of (6) are specified, the delimiter positions are stored in FieldSet in correspondence with names in the order in which they are set.

Nothing

(9)

charset
(Constructor argument)

Set the same character code as (2).

Nothing

(10)

fieldSetMapper

If no special conversion of character strings or numbers is necessary, use org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper,
and specify the conversion target class in the targetType property.
By doing this, an instance is created that automatically sets the value in the field matching the name of each item set in (6).
If conversion processing is necessary, set an implementation class of org.springframework.batch.item.file.mapping.FieldSetMapper.

Nothing

See How To Extend for the case of implementing FieldSetMapper yourself.

How to use FixedByteLengthBufferedReaderFactory

To read a file that judges record delimiter by byte count, use org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory provided by TERASOLUNA Batch 5.x.

By using FixedByteLengthBufferedReaderFactory, it is possible to acquire up to the number of bytes specified as one record.
The specification of FixedByteLengthBufferedReaderFactory is as follows.

Specify byte count of record as constructor argument

Generate FixedByteLengthBufferedReader which reads the file with the specified number of bytes as one record

Use of FixedByteLengthBufferedReader is as follows.

Reads the file using the record byte length specified at instance creation

If there is a line feed code, it is not discarded but read as part of the byte length of one record

The file encoding used for reading is the value set in FlatFileItemReader, and it is used when the BufferedReader is generated.
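The behavior described above can be illustrated with a plain-Java sketch that treats every fixed number of bytes as one record (an illustration of the stated behavior, not the TERASOLUNA implementation; names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Sketch: decode every recordBytes bytes (line feed bytes included, not
// discarded) with the configured charset and return each chunk as one record.
public class FixedByteRecordReaderSketch {

    static List<String> readRecords(InputStream in, int recordBytes, Charset charset) {
        List<String> records = new ArrayList<>();
        byte[] buffer = new byte[recordBytes];
        try {
            // Read exactly recordBytes bytes per record.
            while (in.readNBytes(buffer, 0, recordBytes) == recordBytes) {
                records.add(new String(buffer, charset));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A trailing fragment shorter than recordBytes is ignored in this sketch.
        return records;
    }
}
```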

The method of defining FixedByteLengthBufferedReaderFactory is shown below.

When dealing with fixed-length files, it is assumed that the components provided by TERASOLUNA Batch 5.x are used.

FixedByteLengthBufferedReaderFactory

A BufferedReader generation class that reads one record from a fixed-length file without line breaks, by the number of bytes in the specified character code.

FixedByteLengthLineTokenizer

An extension class of FixedLengthTokenizer that separates a record by the number of bytes, supporting multibyte character strings.

Processing records containing multibyte character strings

When processing records containing multibyte character strings, be sure to use FixedByteLengthLineTokenizer.
The FixedLengthTokenizer provided by Spring Batch separates the record by the number of characters instead of the number of bytes, so there is a possibility that the item will not be extracted as expected.

5.3.2.2.2. Output

An example of setting for writing the following output file is shown.

In order to write a fixed-length file, it is necessary to format the value obtained from the bean according to the number of bytes of the field.
The format execution method differs as follows depending on whether double-byte characters are included or not.

If double-byte characters are not included (single-byte characters only, so the number of bytes per character is constant)

Format using FormatterLineAggregator.

The format is set by the format used in the String.format method.

If double-byte characters are included (the number of bytes per character varies depending on the character code)

Format with implementation class of FieldExtractor.

First, a setting example in the case where double-byte characters are not included in the output file is shown, followed by a setting example in the case where double-byte characters are included.
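As a minimal illustration of the String.format approach mentioned above, the following shows the style of format specifier FormatterLineAggregator passes to String.format; the class name and field widths are illustrative assumptions:

```java
// Sketch: "%6s" pads a 6-character code field with spaces and "%010d" pads a
// 10-digit amount field with zeros, producing a 16-character fixed-length record.
public class FixedLengthFormatSketch {

    static String formatRecord(String branchId, long amount) {
        return String.format("%6s%010d", branchId, amount);
    }
}
```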

The setting when double-byte characters are not included in the output file is shown below.

Sets the character code of the output file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemReader is the default character set of the JavaVM).
Hence, it is recommended to explicitly set the character code even when using the default value.

UTF-8

(3)

lineSeparator

Set the record break (line feed code).
To output without line breaks, set an empty string.

line.separator of system’s property

(4)

appendAllowed

If true, add to the existing file. If true, it must be noted that setting value of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify a property since this property is invalidated.
If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete the file for output if the output count is 0. Since unintended behaviour is likely to occur in combination with other properties, it is recommended not to set it to true. For details, refer to Notes for how to output variable length records.

Set org.springframework.batch.item.file.transform.FormatterLineAggregator.

Nothing

(9)

format

Set the output format with the format used in the String.format method.

Nothing

(10)

fieldExtractor

If special conversion processing for strings and numbers is unnecessary, you can use org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor.

If conversion processing is necessary, set implementation class of org.springframework.batch.item.file.transform.FieldExtractor.
An implementation example of FieldExtractor that formats double-byte characters is described later.

PassThroughFieldExtractor

(11)

names

Give a name to each item of one record. Set the names of each field from the beginning of the record with a comma.

Nothing

About PassThroughFieldExtractor

The default value of the fieldExtractor property of FormatterLineAggregator is org.springframework.batch.item.file.transform.PassThroughFieldExtractor.

PassThroughFieldExtractor is a class that returns the original item without any processing, and is used when FieldExtractor has nothing to do.

If the item is an array or a collection, it is returned as is, otherwise it is wrapped in an array of single elements.

Example of how to format a field with double-byte character

When formatting for double-byte characters, since the number of bytes per character differs depending on the character code, use the implementation class of FieldExtractor instead of FormatterLineAggregator.

The implementation class of FieldExtractor is created as follows.

Implement FieldExtractor and override the extract method.

The extract method is implemented as follows:

get the value from the item (target bean), and perform conversion as needed

set the values in an Object array and return it.

In the implementation class of FieldExtractor, a field that includes double-byte characters is formatted in the following way.

Get the number of bytes for the character code

Format the value by trimming or padding it according to the number of bytes

Below is a setting example for formatting a field including double-byte characters.

Implement FieldExtractor class and override extract method.
Set the conversion target class as the type argument of FieldExtractor.

(2)

Define a Object type array to store data after the conversion.

(3)

Get the value from the item(target bean), and perform the conversion as needed, set the value to an array of object.

(4)

Format the field that includes double-byte character.
Refer to (5) and (6) for the details of format process.

(5)

Get the number of bytes for the character code.

(6)

Format the value by trimming or padding it according to the number of bytes.
In this implementation example, white space characters are added before the character string up to the specified number of bytes.

(7)

Returns an array of Object type holding the processing result.
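The padding logic described in (5) and (6) can be sketched in plain Java as follows (class and method names are illustrative; trimming of over-long values is omitted for brevity):

```java
import java.nio.charset.Charset;

// Sketch: measure the byte length in the output charset and prepend single-byte
// spaces until the field reaches the specified number of bytes.
public class ByteLengthPaddingSketch {

    static String padByBytes(String value, int targetBytes, Charset charset) {
        StringBuilder builder = new StringBuilder(value);
        // (5) Get the number of bytes for the character code.
        while (builder.toString().getBytes(charset).length < targetBytes) {
            // (6) Add a white space before the string, up to the specified byte count.
            builder.insert(0, ' ');
        }
        return builder.toString();
    }
}
```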

5.3.2.3. Single String record

This section describes the definition method for a file handled as single string records.

5.3.2.3.1. Input

An example of setting for reading the following input file is shown below.

Sets the character code of the input file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set the character code even when using the default value.

JavaVM default character set

(3)

strict

If true is set, an exception occurs if the input file does not exist (cannot be opened).

true

(4)

lineMapper

Set org.springframework.batch.item.file.mapping.PassThroughLineMapper. PassThroughLineMapper is an implementation class of LineMapper that returns the String value of the passed record as it is.

Sets the character code of the output file. The default character code of the components provided by Spring Batch differs between ItemReader and ItemWriter (the default for ItemReader is the default character set of the JavaVM).
Hence, it is recommended to explicitly set the character code even when using the default value.

UTF-8

(3)

lineSeparator

Set the record break (line feed code).

line.separator of system’s property

(4)

appendAllowed

If true, append to the existing file. Note that if true, the setting of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify this property since it is invalidated.
If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete the file for output when the output count is 0. Since unintended behaviour is likely to occur in combination with other properties, it is recommended not to set it to true. For details, refer to Notes for how to output variable length records.

Set org.springframework.batch.item.file.transform.PassThroughLineAggregator. PassThroughLineAggregator is an implementation class of LineAggregator that converts the item (target bean) to one record by calling item.toString().

Nothing

5.3.2.4. Header and Footer

This section explains the input/output method when a header/footer exists.

Here, a method of skipping the header/footer by specifying the number of lines is explained.
When the number of header/footer records is variable and the number of lines cannot be specified, use PatternMatchingCompositeLineMapper with reference to Multi format input.

5.3.2.4.1. Input

Skipping Header

There are 2 ways to skip the header record.

Set the number of lines to skip to property linesToSkip of FlatFileItemReader

# Remove number of lines in header from the top of input file
tail -n +`expr 2 + 1` input.txt > output.txt

Use the tail command to get the lines from the 3rd line onward of input.txt, and write them out to output.txt.
Please note that the value specified for the -n +K option of the tail command is the number of header records + 1.

OS command to skip header record and footer record

By using the head and tail commands, it is possible to skip the header record and footer record by specifying the number of lines.

How to skip the header record

Execute the tail command with the option -n +K to get the lines from the K-th line onward of the target file.

How to skip the footer record

Execute the head command with the option -n -K to get all lines except the last K lines of the target file.

A sample of shell script to skip header record and footer record can be written as follows.

An example of a shell script that removes a specified number of lines from a header / footer

#!/bin/bash
if [ $# -ne 4 ]; then
echo "The number of arguments must be 4, given is $#." 1>&2
exit 1
fi
# Input file.
input=$1
# Output file.
output=$2
# Number of lines in header.
header=$3
# Number of lines in footer.
footer=$4
# Remove number of lines in header from the top of input file
# and number of lines in footer from the end,
# and save to output file.
tail -n +`expr ${header} + 1` ${input} | head -n -${footer} > ${output}

Arguments

No

Description

(1)

Input file

(2)

Output file

(3)

Number of lines to skip for header

(4)

Number of lines to skip for footer

Retrieving header information

This section shows how to recognize and retrieve the header record.

The extraction of header information is implemented as follows.

Settings

Write the process for header record in implementation class of org.springframework.batch.item.file.LineCallbackHandler

Set the information retrieved in LineCallbackHandler#handleLine() to stepExecutionContext

Set the implementation class of LineCallbackHandler to the skippedLinesCallback property of FlatFileItemReader

Set the number of lines to skip to property linesToSkip of FlatFileItemReader

Reading files and retrieving header information

For each line which is skipped by the setting of linesToSkip, LineCallbackHandler#handleLine() is executed

Header information is set to stepExecutionContext

Use retrieved header information

Get header information from stepExecutionContext and use it in the processing of the data part
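Although the definition example itself is omitted here, the reader-side settings in the steps above can be sketched as follows (bean ids are illustrative; the lineMapper definition is omitted):

```xml
<!-- Sketch only: skip 2 header lines and hand each skipped line to the callback. -->
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:linesToSkip="2"
      p:skippedLinesCallback-ref="headerLineCallbackHandler"
      p:resource="file:#{jobParameters['inputFile']}">
    <!-- lineMapper definition omitted -->
</bean>
```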

An example of implementation for retrieving header record information is shown below.

Set implementation class of LineCallbackHandler.
An implementation sample will be described later.

Nothing

(2)

listener

Set implementation class of StepExecutionListener.
This setting is needed since the LineCallbackHandler set in the skippedLinesCallback property of FlatFileItemReader is not automatically registered as a listener.
The detailed reason is described later.

Nothing

About the listener

Since the following cases are not automatically registered as listeners, it is necessary to add them to the listener definitions at the time of job definition.
(If the listener definitions are not added, StepExecutionListener#beforeStep() will not be executed.)

StepExecutionListener of LineCallbackHandler which is set to skippedLinesCallback of FlatFileItemReader

Implement beforeStep method and annotate it with @BeforeStep.
The signature will be void beforeStep(StepExecution stepExecution).
It is also possible to implement the StepExecutionListener class and override beforeStep method.

(4)

Get the StepExecution and save it to the class field.

(5)

Implement LineCallbackHandler class and override handleLine method.

(6)

Get stepExecutionContext from StepExecution, and set the header information to stepExecutionContext using the key header.
Here, for simplicity, only the last of the two skipped lines is stored.

Here is a sample of getting the header information from stepExecutionContext and using it for processing of data part.
A sample of using header information in ItemProcessor will be described as an example.
The same can be done when using header information in other components.

The implementation of using header information is done as follows.

As in the sample implementation of LineCallbackHandler, implement StepExecutionListener#beforeStep()

Get StepExecution in beforeStep method and save it to the class field

Get stepExecutionContext and the header information from StepExecution and use it

Implement beforeStep method and annotate it with @BeforeStep.
The signature will be void beforeStep(StepExecution stepExecution).
It is also possible to implement the StepExecutionListener class and override beforeStep method.

(3)

Get the StepExecution and save it to the class field.

(4)

Get stepExecutionContext from StepExecution, and get the header information using the key header.

About the use of ExecutionContext of Job/Step

In retrieving header (footer) information, the method is to store the read header information in the ExecutionContext of StepExecution, and retrieve it from the ExecutionContext when using it.

In the example below, header information is stored in the ExecutionContext of StepExecution, in order to obtain and use the header information within one step.
If the step that retrieves the header information is separate from the step that uses it, use the ExecutionContext of JobExecution instead.

Since neither Spring Batch nor TERASOLUNA Batch 5.x supports skipping footer records, it needs to be done by an OS command.

Input File Sample

000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000

The last 2 lines are the footer record.

The setting for reading the above file is as follows.

Skipping by OS command

# Remove number of lines in footer from the end of input file
head -n -2 input.txt > output.txt

The head command takes all the lines of input.txt except the last two and writes them to output.txt.

It is reported in JIRA Spring Batch/BATCH-2539 that Spring Batch does not have a function to skip footer records.
Hence, Spring Batch may provide a way to skip footer records in the future, making the OS command unnecessary.

Retrieving footer information

In Spring Batch and TERASOLUNA Batch 5.x, functions for skipping the footer record and retrieving footer information are not provided.

Therefore, the processing needs to be divided into a preprocessing OS command and 2 steps, as described below.

Divide footer record by OS command

In the 1st step, read the footer record and set the footer information to ExecutionContext

In the 2nd step, retrieve the footer information from ExecutionContext and use it

Retrieving footer information is implemented as follows.

Divide footer record by OS command

Use OS command to divide the input file to footer part and others

1st step, read the footer record and get footer information

Read the footer record and set it to jobExecutionContext

Since the steps are different in storing and using footer information, store it in jobExecutionContext.

The use of jobExecutionContext is the same as that of stepExecutionContext explained in Retrieving header information, except for the scope of Job and Step.

2nd step, use the retrieved footer information

Get the footer information from jobExecutionContext and use it for processing of data part.

An example will be described in which footer information of the following file is taken out and used.

Input File Sample

000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000

The last 2 lines are the footer record.

Divide footer record by OS command

The setting to divide the above file into footer part and others by OS command is as follows.
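The split might be done as follows; the file names input.txt, input_data.txt, and input_footer.txt are illustrative, and head -n -2 requires GNU coreutils (on BSD/macOS an equivalent sed or awk invocation is needed):

```shell
# Create the sample input file from the example above (illustrative)
printf '%s\n' \
  '000001,2016,1,0000000001,1000000000' \
  '000002,2017,2,0000000002,2000000000' \
  '000003,2018,3,0000000003,3000000000' \
  'number of items,3' \
  'total of amounts,6000000000' > input.txt

# Data part: everything except the last 2 lines (GNU head)
head -n -2 input.txt > input_data.txt

# Footer part: the last 2 lines
tail -n 2 input.txt > input_footer.txt
```
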

Explain how to get and use footer information from a footer record divided by OS command.

Reading the footer record is divided into a preprocessing step and a main processing step.
Refer to Flow Control for details of step division.

In the example below, a sample is shown in which footer information is retrieved and stored in jobExecutionContext.
Footer information can then be used by retrieving it from jobExecutionContext, in the same way as described in Retrieving header information.

Define an ItemReader to read the file with the footer record.
It is used by injecting it into readFooterTasklet, which is executed when retrieving the footer information.

(2)

dataReader

Define ItemReader to read a file with data record.

(3)

preprocess step

Define a step to get the footer information.
It is implemented in readFooterTasklet; an implementation sample is shown later.

(4)

main process step

Define a step that reads the data records and uses the footer information.
Use dataReader as the reader.
In this sample, the component that gets the footer information from jobExecutionContext, such as an ItemProcessor, is not implemented.
Footer information can be retrieved and used in the same way as described in Retrieving header information.

(5)

listeners

Set readFooterTasklet.
Without this setting, JobExecutionListener#beforeJob() implemented in readFooterTasklet will not be executed.
For details, refer to Retrieving header information.


An example of reading a file with a footer record and storing it in jobExecutionContext is shown below.

It is implemented as a Tasklet implementation class, as follows.

Inject the bean defined as footerReader by name, using @Inject and @Named

Write the header information using the Writer passed as the argument.
FlatFileItemWriter prints a line break right after executing FlatFileHeaderCallback#writeHeader().
Therefore, printing a line break at the end of the header information is not needed;
the line break printed is the one set in the FlatFileItemWriter bean definition.

When implementing FlatFileHeaderCallback, printing line feed at the end of header information is not necessary

Right after executing FlatFileHeaderCallback#writeHeader() in FlatFileItemWriter, line feed is printed according to the bean definition, so the line feed at the end of header information does not need to be printed.
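A minimal sketch of this rule, using a local stand-in interface with the same writeHeader(Writer) signature as org.springframework.batch.item.file.FlatFileHeaderCallback (the interface name and header content are illustrative, so the sketch compiles without Spring Batch on the classpath):

```java
import java.io.IOException;
import java.io.Writer;

// Local stand-in for FlatFileHeaderCallback.
interface HeaderCallbackSketch {
    void writeHeader(Writer writer) throws IOException;
}

class SampleHeaderCallback implements HeaderCallbackSketch {
    @Override
    public void writeHeader(Writer writer) throws IOException {
        // No trailing line feed: FlatFileItemWriter appends its configured
        // line separator right after writeHeader() returns.
        writer.write("field1,field2,field3");
    }
}
```
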

Output footer information

To output footer information to a flat file, implement as follows.

Implement org.springframework.batch.item.file.FlatFileFooterCallback

Set the implemented FlatFileFooterCallback to property footerCallback of FlatFileItemWriter

By setting footerCallback, FlatFileFooterCallback#writeFooter() will be executed at the end of FlatFileItemWriter processing

A method of outputting footer information to a flat file is described.

Implement FlatFileFooterCallback as follows.

Output footer information using Writer from the argument.

Implement FlatFileFooterCallback class and override writeFooter.

Below is an implementation sample of FlatFileFooterCallback class for a Job to get footer information from ExecutionContext and write it out to a file.
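A self-contained sketch of such a callback follows. A local stand-in interface with the same writeFooter(Writer) signature as FlatFileFooterCallback is used, and a plain Map stands in for the job's ExecutionContext; the keys itemCount and total and the footer layout are assumptions based on the input sample above:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Local stand-in for org.springframework.batch.item.file.FlatFileFooterCallback.
interface FooterCallbackSketch {
    void writeFooter(Writer writer) throws IOException;
}

class SummaryFooterCallback implements FooterCallbackSketch {
    // Stand-in for the job's ExecutionContext holding aggregated values.
    private final Map<String, Object> jobExecutionContext;

    SummaryFooterCallback(Map<String, Object> jobExecutionContext) {
        this.jobExecutionContext = jobExecutionContext;
    }

    @Override
    public void writeFooter(Writer writer) throws IOException {
        // Write the footer record in the same layout as the input sample.
        writer.write("number of items," + jobExecutionContext.get("itemCount"));
        writer.write("\n");
        writer.write("total of amounts," + jobExecutionContext.get("total"));
    }
}
```
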

5.3.2.5. Multiple Files

Describe how to handle multiple files.

5.3.2.5.1. Input

To read multiple files of the same record format, use org.springframework.batch.item.file.MultiResourceItemReader. MultiResourceItemReader can use the specified ItemReader to read multiple files specified by regular expressions.

Implement MultiResourceItemReader as follows.

Define bean of MultiResourceItemReader

Set file to read to property resources

Use a regular expression to read multiple files

Set ItemReader to read files to property delegate

Below is a definition example of MultiResourceItemReader to read multiple files with the following file names.
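A sketch of such a bean definition is shown here; the bean ids, the file name pattern, and the use of the p namespace are assumptions, and the delegate reader's own settings (such as lineMapper) are elided:

```xml
<!-- Bean ids and the file name pattern are illustrative. -->
<bean id="multiResourceItemReader"
      class="org.springframework.batch.item.file.MultiResourceItemReader"
      scope="step"
      p:resources="file:input/customer_list_*.csv"
      p:delegate-ref="delegateReader"/>

<!-- The delegate reader. Note that no resource property is set here,
     because MultiResourceItemReader sets it on the delegate automatically. -->
<bean id="delegateReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- lineMapper and other settings omitted -->
</bean>
```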

Since property resource is set automatically from MultiResourceItemReader, it is not necessary to set it in Bean definition.

It is unnecessary to specify resource for ItemReader used by MultiResourceItemReader

Since resource of ItemReader delegated from MultiResourceItemReader is automatically set from MultiResourceItemReader, it is not necessary to set it in Bean definition.

5.3.2.5.2. Output

Describe how to output multiple files.

To output to a different file for a certain number of cases, use org.springframework.batch.item.file.MultiResourceItemWriter.

MultiResourceItemWriter can output to multiple files, switching files every specified number of items, using the specified ItemWriter.
The output file names must be made unique so that they do not overlap, and ResourceSuffixCreator is provided as a mechanism for doing this. ResourceSuffixCreator is a class that generates a suffix that makes the file name unique.

For example, if you want the output target files to have names like outputDir/customer_list_01.csv (the 01 part is a serial number), set it up as follows.

Set outputDir/customer_list_ to MultiResourceItemWriter

Implement code in ResourceSuffixCreator to generate the suffix 01.csv (the 01 part is a serial number)

For the serial number, the value automatically incremented and passed from MultiResourceItemWriter can be used

outputDir/customer_list_01.csv is set to the ItemWriter that is actually used

MultiResourceItemWriter is defined as follows. How to implement ResourceSuffixCreator is described later.

Define implementation class of ResourceSuffixCreator

Define bean for MultiResourceItemWriter

Set the output file to property resource

Set the file name up to, but not including, the suffix generated by the implementation class of ResourceSuffixCreator

Set implementation class of ResourceSuffixCreator that generates suffix to property resourceSuffixCreator

Set the ItemWriter to be used to write the file to property delegate

Set the number of output per file to property itemCountLimitPerResource
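The definition steps above might be sketched as the following bean definition; the bean ids, the output path, the item count, and the use of the p namespace are assumptions:

```xml
<!-- Bean ids, the output path, and the item count are illustrative. -->
<bean id="multiResourceItemWriter"
      class="org.springframework.batch.item.file.MultiResourceItemWriter"
      scope="step"
      p:resource="file:outputDir/customer_list_"
      p:resourceSuffixCreator-ref="customerListResourceSuffixCreator"
      p:delegate-ref="delegateWriter"
      p:itemCountLimitPerResource="100"/>
```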

Use the argument index to generate the suffix to return.
index is an int value with initial value 1, and it is incremented for each output file.
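The suffix generation can be sketched against a local stand-in interface with the same getSuffix(int) signature as org.springframework.batch.item.file.ResourceSuffixCreator (the stand-in keeps the sketch self-contained; the class name is illustrative):

```java
// Local stand-in for ResourceSuffixCreator.
interface SuffixCreatorSketch {
    String getSuffix(int index);
}

class CustomerListResourceSuffixCreator implements SuffixCreatorSketch {
    @Override
    public String getSuffix(int index) {
        // index starts at 1 and is incremented by MultiResourceItemWriter
        // for each new output file: 01.csv, 02.csv, ...
        return String.format("%02d.csv", index);
    }
}
```
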

5.3.2.6. Control Break

How to perform control break processing is described here.

What is Control Break

Control break processing (or key break processing) is a method of reading sorted records one by one
and handling records whose key item has the same value as one group.
It is an algorithm used mainly for aggregating data:
it continues counting while the key item has the same value,
and outputs the aggregate value when the key item changes.

In order to perform control break processing, records must be read ahead in order to detect the change of group.
Pre-reading records can be done by using org.springframework.batch.item.support.SingleItemPeekableItemReader.
Also, control break can be processed only in the tasklet model.
This is because the premises that the chunk model is based on, "processing a fixed number of records at a time" and "transaction boundaries at every fixed number of records",
do not fit the control break's basic algorithm, "proceed at the change of group".

The execution timing of control break processing and comparison conditions are shown below.

Execute control break before processing the target record

Keep the previously read record, compare previous record with current record

Execute control break after processing the target record

Pre-read the next record by SingleItemPeekableItemReader and compare the current record with the next record
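The second pattern (pre-reading the next record, the role SingleItemPeekableItemReader plays in a tasklet) can be sketched self-contained as follows; the record layout {key, amount} and the class name are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Control break sketch: input records are sorted by key. The next record is
// peeked at each step, and the group total is emitted when the key changes.
class ControlBreakSketch {
    static Map<String, Long> totalsByKey(List<String[]> sortedRecords) {
        Map<String, Long> totals = new LinkedHashMap<>();
        long subtotal = 0;
        for (int i = 0; i < sortedRecords.size(); i++) {
            String key = sortedRecords.get(i)[0];
            subtotal += Long.parseLong(sortedRecords.get(i)[1]);
            // Peek the next record (SingleItemPeekableItemReader#peek() in
            // Spring Batch); null means end of input.
            String nextKey =
                    (i + 1 < sortedRecords.size()) ? sortedRecords.get(i + 1)[0] : null;
            if (!key.equals(nextKey)) {
                totals.put(key, subtotal); // control break: output group total
                subtotal = 0;
            }
        }
        return totals;
    }
}
```
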

A sample for outputting process result from input data using control break is shown below.

Get branchId from argument’s FieldSet, and store it to conversion target class variable.
Conversion for branchId is not done in the sample since it is not necessary.

(4)

Get date from argument’s FieldSet, and store it to conversion target class variable.
Use SimpleDateFormat to convert Japanese calendar format date to Date type value.

(5)

Get customerId from argument’s FieldSet, and store it to conversion target class variable.
Conversion for customerId is not done in the sample since it is not necessary.

(6)

Get amount from argument’s FieldSet, and store it to conversion target class variable.
Use DecimalFormat to convert value with comma to BigDecimal type value.

(7)

Return the conversion target class holding the processing result.
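The amount conversion described above (a comma-separated string parsed into BigDecimal with DecimalFormat) can be sketched as follows; the class and method names are illustrative:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.ParseException;

// Convert a comma-separated amount string to a BigDecimal value.
class AmountParser {
    static BigDecimal parse(String amount) throws ParseException {
        DecimalFormat format = new DecimalFormat("#,###");
        format.setParseBigDecimal(true); // make parse() return BigDecimal
        return (BigDecimal) format.parse(amount);
    }
}
```
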

Getting value from FieldSet class

The FieldSet class has methods corresponding to various data types for obtaining stored values, such as those listed below.
If data was stored in association with field names when the FieldSet was generated, data can be obtained by specifying the name as well as by specifying the index.

readString()

readInt()

readBigDecimal()

etc

5.3.3.2. XML File

Describe the definition method when dealing with XML files.

For the conversion process between a bean and XML (O/X (Object/XML) mapping), use the library provided by Spring Framework.
Implementation classes are provided as Marshaller and Unmarshaller using XStream, JAXB, etc. as libraries for converting between XML files and objects.
Use one suitable for your situation.

Below are features and points for adopting JAXB and XStream.

JAXB

Specify the bean to be converted in the bean definition file

Validation using a schema file can be performed

It is useful when the schema is defined externally and the specification of the input file is strictly determined

XStream

You can map XML elements and bean fields flexibly in the bean definition file

It is useful when you need to flexibly map beans

Here is a sample using JAXB.

5.3.3.2.1. Input

For inputting XML files, use org.springframework.batch.item.xml.StaxEventItemReader provided by Spring Batch. StaxEventItemReader can read an XML file by mapping it to a bean using the specified Unmarshaller.

Get event information from the argument event (ValidationEvent), and do any processing needed.
In the sample, logging is performed.

(3)

Return true to continue the unmarshalling process.
Return false to end it; the operation then terminates by throwing an appropriate UnmarshalException, ValidationException or MarshalException.

Adding dependency library

Library dependency needs to be added as below when using
Spring Object/Xml Marshalling provided by Spring Framework
such as org.springframework.oxm.jaxb.Jaxb2Marshaller.

5.3.3.2.2. Output

Use org.springframework.batch.item.xml.StaxEventItemWriter provided by Spring Batch for outputting XML files. StaxEventItemWriter can output an XML file by mapping a bean to XML using the specified Marshaller.

Implement StaxEventItemWriter as follows.

Make the following settings on the conversion target class

Add @XmlRootElement to the class as it is to be the root element of the XML

Use @XmlType annotation to set orders for outputting fields

If there is a field to be excluded from conversion to XML, add @XmlTransient to the getter method of that field

Set below properties to StaxEventItemWriter

Set output target file to property resource

Set org.springframework.oxm.jaxb.Jaxb2Marshaller to property marshaller

Set below property to Jaxb2Marshaller

Set conversion target classes in list format to property classesToBeBound

Set the character encoding for the output file.
The default character code of the components offered by Spring Batch varies between ItemReader and ItemWriter (the default for ItemReader is the "default character set of the JavaVM").
Hence, it is recommended to explicitly set the character code even when using the default value.

UTF-8

(3)

rootTagName

Set XML root tag name.

(4)

overwriteOutput

If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(5)

shouldDeleteIfEmpty

If true, delete the file for output if output count is 0. Since unintended behaviour is likely to happen by combining with other properties, it is recommended not to set it to true. For details