This forum is now a read-only archive. All commenting, posting, registration services have been turned off. Those needing community support and/or wanting to ask questions should refer to the Tag/Forum map, and to http://spring.io/questions for a curated list of stackoverflow tags that Pivotal engineers, and the community, monitor.

Using 2 different datasources with Spring Batch

Jan 3rd, 2008, 10:31 AM

Hello,

I am completely new to Spring Batch and am just getting started with it, so I apologize if some of my questions are too obvious!

In my department we use Microsoft SQL Server databases. Also, we are restricted to the usage of stored procedures when interacting with the database. That's why I am trying to adapt the job and step incrementer to that policy and haven't succeeded so far.

Therefore, I was wondering if it is possible to use 2 databases simultaneously, the standard one for data processing and keep the embedded HSQLDB (or any other database) only for the framework?

You can choose any database that you like to run Spring Batch with. The framework does store metadata in 4 tables (will probably increase with m4 though). You could easily use a database to store metadata in, and another one for application concerns. However, I wouldn't recommend using an in-memory database for the metadata. The value of storing information about a batch run is being to able to look at it in real-time, and report off of it. For instance, you could look back and see what time a job was run two weeks ago, and how many records it processed, etc.

I'm curious about this 'everything has to be a stored procedure' policy at your company. In a batch scenario, this can cause some serious issues. For example, since the datasets are so large, and take long periods to process, we break them up into 'chunks' and commit after each chunk has been processed. This allows us to restart from the last chunk successfully processed, and keeps transaction sizes at a reasonable size, but not too small for performance reasons. It would not be possible to do this in a stored procedure, since the framework would have no control, and there would be no advantage in a stored procedure that processes one record. Furthermore, most of our 'stock' database item readers do not use stored procedures. The same is true for our metadata storage dao's.

Comment

I currently use two datasources, one for all the batch tables and sequences, and one for my data store. There is nothing in the framework that prevents you from doing this.

The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.

I do agree that there is a problem with the policy of only using stored procedures. Unfortunately, as Lucas points out, this can potentially cause you to lose a great deal of the benefit of Spring Batch, but in a corporate environment I understand that you can't do much about this. I can't tell you what you should do about this, as I do not know your position. I would personally push back and say "this is a framework that is outside of our control - create a new username for it in the database that it will use solely and leave it as it is." In any case, my recommendation to you is to try using granularly-defined stored procedure definitions with Spring Batch but keep a close eye on performance and document any problems to review with your management.

Comment

In fact, the 'stored procedure only' policy is mainly enforced for historical reasons, and was introduced way before ORM frameworks emerged. Unfortunately, at my level I have no power to make an exception.

I would personally push back and say "this is a framework that is outside of our control"

I can't do this either, since this is the first project in my division to use Spring (I mean not only Spring Batch). In fact, introducing Spring has been a personal initiative, therefore I have to prove that it can be tailored to our architecture and adapt to our policies.

This is why dedicating a database to Spring Batch and thus having the freedom to choose the way of interacting with it, is a compelling alternative.

The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.

Indeed, that's my biggest worry! Since I'm new to Spring, I'm not familiar with its transaction management model. I presume that using the DataSourceTransactionManager is no longer sufficient. Do I have to use JTA? With 2 data sources what would be the equivalent declaration of:
"<bean id="sqlTransactionManager" class="org.springframework.jdbc.datasource.DataSou rceTransactionManager"
lazy-init="true">
<property name="dataSource" ref="dataSource" />
</bean>" ?

Comment

If you wish to use JTA, spring has a specific JtaTransactionManager implemention that you can use. Please refer to the Spring reference documentation for more information.

Doug is partly right, the StepExecutor does take a transaction manager, which is for the 'business' transaction. However, the storage of metadata needs to have transactions applied around it using AOP, which would allow for the usage of a separate transaction manager. You can see examples of this in our samples. It should also be noted that the only place this intersects is at certain times when the StepExecution or Step is stored as part of the business transaction, which is the case for restart data, however, this model will change a bit in milestone 4.

Comment

1. You have a data source which represents your metadata tables - ie. the Spring Batch schema. This data source is managed using Spring AOP.

2. You have one or more other data sources which represent your business data. If you only have one, you do not need to use JTA, since you can create a Spring transaction manager and inject that into your StepExecutor. This is the case with my personal situation. However, if you have several "business" data sources, JTA may be necessary to properly control transactions between them, which is the warning I was trying to convey.

3. Ditto the exception Lucas mentioned.

Comment

As you said, I created 2 Spring transaction managers, the business one injected in StepExecutor and the second one in a transaction advice applied to batch repository methods, and it works like heaven!

Thanks a lot for all your help!

Comment

As you said, I created 2 Spring transaction managers, the business one injected in StepExecutor and the second one in a transaction advice applied to batch repository methods, and it works like heaven!

Thanks a lot for all your help!

Can someone please post an example of Spring Batch with two transaction managers? We will persiste our batch metric data to database_1 (Oracle) and our batch jobs will read and write to database_2 (SQL Server). Our batch environment will not run within an application container and we will be using Spring JDBC for persistence.

There's nothing stopping you from using a different transaction manager in your Dao from the one you wired in your step. However, without JTA, when the framework rolls back only the transaction manager wired into the step will rollback, not the one in your dao. Furthermore, you would have to find someway to commit on the dao transaction manager as well. The only way to safely get around this is to use JTA. Since you're running out of container, the only option I can think of is Jotm.

There's nothing stopping you from using a different transaction manager in your Dao from the one you wired in your step. However, without JTA, when the framework rolls back only the transaction manager wired into the step will rollback, not the one in your dao. Furthermore, you would have to find someway to commit on the dao transaction manager as well. The only way to safely get around this is to use JTA. Since you're running out of container, the only option I can think of is Jotm.

Hi,
I have similar kind of situation. In my Spring Batch application, I have defined two transaction manager and in transaction manager bean declarations, I have used <qualifier> tag to differentiate these transaction manger in my transactional DAO class.

During step execution, I need to update data into two databases.First database is same as what I use to store batch meta data. This one uses the transaction manager used by step execution which is "transactionManager". Second db one uses another transaction manager - cddprofileTxnManager. It is committing db updates in second DB, but roll back is not happening in case of exception. Is there anything I can do without using JTA?My application doesn't use any kind of Java EE container.