
Running Jobs and JobParameters Uniqueness

Jan 14th, 2010, 11:34 AM

I'm new to Spring Batch, so forgive any incorrect assumptions I may have.

Just wondering, what was the thinking behind preventing users from running the same job more than once with the same set of parameters (which would throw a JobInstanceAlreadyCompleteException)? Was the goal to prevent, say, a nightly job from accidentally executing twice? Or if jobs were kicked off in async threads, to prevent overlapping jobs?

I'm designing a system that syncs one database to a different target database. The sync could be incremental, a full sync, or limited to specific records, all based on a trigger file. It is possible this system might run on 2+ servers in case we need to run tasks in parallel.

I envisioned a single sync job which would be instantiated based on the sync-type parameters. At first I was thinking of using startNextInstance() with an incrementer to ensure job-parameter uniqueness. For example, if you started the job on one machine and then started the same job on another, you would be guaranteed a different run.id. But then I realized you can't pass in other parameters when you use startNextInstance(), which I believe is the only operation that uses the incrementer.

But start(), which does allow parameters, doesn't use the incrementer, so I have to pass in something like a timestamp to ensure job-parameter uniqueness, which seems like a bit of a hack.
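The behaviour being asked for, an incrementer that also accepts fresh parameters, can be sketched in plain Java. This is a stand-in for illustration only, not the real Spring Batch API (there the pieces would be JobParametersIncrementer/RunIdIncrementer and JobParametersBuilder); the parameter names and the merge logic here are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch: apply a run.id-style incrementer to the previous
// run's parameters AND merge in new user-supplied parameters, so every
// launch is unique but still carries real arguments.
class NextParamsSketch {
    static Map<String, Object> next(Map<String, Object> lastRun,
                                    Map<String, Object> newParams) {
        Map<String, Object> out = new LinkedHashMap<>(lastRun);
        long lastId = lastRun.containsKey("run.id")
                ? ((Number) lastRun.get("run.id")).longValue() : 0L;
        out.put("run.id", lastId + 1);  // guarantees parameter-set uniqueness
        out.putAll(newParams);          // still lets the caller pass real parameters
        return out;
    }
}
```

A launcher could call this before start(), passing the previous execution's parameters (looked up via JobExplorer in real Spring Batch) plus whatever the user supplied.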

I don't really like the idea of having to define a separate job for each possible parameter combination. Also, the idea of restarting those executions over and over again and never getting an exit status of COMPLETED is kind of bothersome.

The solution I'm considering is keeping my own parameters table, with each record containing a unique identifier, and passing that identifier to my job as its parameter. That way the job always has a unique parameter that is being consistently incremented, and I can still pass in parameters.
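The workaround described above can be sketched as follows. The AtomicLong stands in for a database sequence, and names like "sync type" and the method names are made up for illustration; in a real system the table would live in the application's own database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "own parameters table" idea: persist the sync request
// details keyed by a monotonically increasing id, and hand the job only
// that id as its (always unique) identifying job parameter.
class SyncRequestTable {
    private final AtomicLong sequence = new AtomicLong();
    private final Map<Long, String> requests = new ConcurrentHashMap<>();

    /** Store the request details; the returned id is passed as the job parameter. */
    long register(String syncType) {
        long id = sequence.incrementAndGet();
        requests.put(id, syncType);
        return id;
    }

    /** What a step inside the job would call to recover the details. */
    String lookup(long id) {
        return requests.get(id);
    }
}
```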

Still, I would have figured this was possible with just the framework: something like a version of startNextInstance() that applies the incrementer yet still accepts new parameters.

There are actually a rather large number of possible scenarios for job launching and re-launching, combining existing parameters, incremented parameters, and new parameters from the user. Job, JobLauncher, and JobExplorer are the stable, low-level interfaces that you can use to compose the behaviour you need.

Spring Batch has only really scratched the surface of the higher-order composed behaviours up to now. JobOperator is one example, and the JobService in Spring Batch Admin is another. Use those if you like them, or compose your own pattern if you need to.

You shouldn't have to store your own parameters. If you can explain why that is necessary, maybe that will lead to something new and interesting in Batch.

Comment

Could you still explain the thinking behind limiting a job with a given set of parameters from running more than once? I realize this is the design, but I'm trying to understand what use case it is handling.

Comment

Dave,
I think your reasoning would be fine if we were only talking about production, where the data being processed should be unique. In development, however, you want to run your jobs more than once and inspect the outcomes each time.
The restriction you added has made our life difficult: before I start my job I now have to go and clean up the tables. You might suggest using the MapJobRepositoryFactoryBean and the ResourcelessTransactionManager for that, but I can't; I am writing jobs that use partitioning and cannot use those classes.
I would like to see that condition relaxed, so that a job is allowed to run again and, if it ran before, simply overwrites the status in the database. After all, in these kinds of situations we don't really care about the previous execution.
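For readers who can use the in-memory repository during development, the configuration mentioned above looks roughly like this. This is a sketch: the class names are real Spring Batch classes, but the exact bean wiring may differ by framework version.

```xml
<!-- In-memory job repository for development: job metadata is not
     persisted, so re-running a job with the same parameters is harmless. -->
<bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
    <property name="transactionManager" ref="transactionManager"/>
</bean>

<bean id="transactionManager"
      class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
```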

Comment

Isn't there a whole class of problems where you don't necessarily know what data is going to be processed (or when), and therefore need to delegate the responsibility of ensuring data doesn't get processed twice? In our case, it is up to the SQL query passed to our JdbcCursorItemReader to ensure we don't process duplicate data. So forcing developers to parameterize *and* ensure identity just leads people to work around the constraint. I do a sync similar to the one described above, and I will have to pass in a timestamp just to guarantee identity, for no other purpose.

That really isn't a big deal (unless two people somehow kick off the job at the same time in an async task environment). And I see the benefit of uniqueness if, say, you needed to ensure that a set of XXXX-[date].xml files came in and that only today's files got processed, exactly once. But in the real world weird things happen, and there is a remote chance they *might* need to be processed again, and not just via the restart functionality.
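The "timestamp hack" both posters mention amounts to this small sketch. The parameter name "run.timestamp" is an assumption; the only point of the extra entry is to make each launch's parameter set unique.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: copy the caller's parameters and add a millisecond timestamp
// whose sole purpose is to make the parameter set unique per launch.
class TimestampedParams {
    static Map<String, Object> withIdentity(Map<String, Object> params) {
        Map<String, Object> out = new HashMap<>(params);
        out.put("run.timestamp", System.currentTimeMillis());
        return out;
    }
}
```

Note the original map is left untouched, and two launches in the same millisecond could still collide, which is part of why this feels like a hack.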

What if it was optional to ensure parameter identity?

Code:

<job unique="false" ... > ... </job>

P.S.

Other than this concern I am very impressed with Spring Batch. Keep up the good work.