
Significant API Change in 1.1 and 2.0

May 1st, 2008, 09:22 AM

We would like to ask the Spring Batch developers about any significant API changes anticipated in 1.1 and 2.0.

The background is that we are building an in-house framework by customizing the Spring Batch 1.0.0.FINAL release. We need to rewrite some core classes (e.g. ItemOrientedStep / TaskletStep) to make Spring Batch easier to integrate with our in-house framework. Therefore, we would like to know about any big changes coming in the future.

Besides, we are deciding whether to implement a namespace on our own. We know that your team is working on something similar, and we don't want to duplicate your work or diverge significantly from it in the future. Will it appear in 1.1 or 2.0? If not, is there a draft of the namespace design you can share with us?

Before answering this question fully, can you give some more details about the customizations you plan to make? I'm most interested in the changes to the two Step implementations, but I'm also curious about the overall changes and where you feel like Spring Batch falls short and requires significant customization.


1. Currently an ItemOrientedStep mainly flows like this: Reader -> Writer. We want to make it Reader -> Processor Chain (our own framework) -> Writer. Therefore, we need to perform some initialization for the processor-chain part.

2. We want to add a TransientStepExecutionContext and pass it to the ItemReader / ItemWriter and listeners. The main driver is to allow simpler communication across the components within a step.


1. Currently an ItemOrientedStep mainly flows like this: Reader -> Writer. We want to make it Reader -> Processor Chain (our own framework) -> Writer. Therefore, we need to perform some initialization for the processor-chain part.

I'm really not sure that this requires a whole lot of modification, to be honest. You could create your own FactoryBean if you wanted to wire them in separately, but I don't see how this is different from an ItemTransformer chain:
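A chain of transformers along those lines might look like the following. This is a self-contained sketch, not the framework's own classes: the 1.x-style ItemTransformer interface (with the public Object transform(Object item) signature mentioned later in this thread) is reproduced inline, and the composite class here is illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for org.springframework.batch.item.transform.ItemTransformer (1.x style).
interface ItemTransformer {
    Object transform(Object item) throws Exception;
}

// Applies a list of transformers in order -- the "chain" discussed above.
class CompositeItemTransformer implements ItemTransformer {
    private final List<ItemTransformer> delegates;

    CompositeItemTransformer(List<ItemTransformer> delegates) {
        this.delegates = delegates;
    }

    public Object transform(Object item) throws Exception {
        Object result = item;
        for (ItemTransformer t : delegates) {
            result = t.transform(result); // output of one stage feeds the next
        }
        return result;
    }
}

class ChainExample {
    static Object run(Object input) throws Exception {
        ItemTransformer trim = item -> ((String) item).trim();
        ItemTransformer upper = item -> ((String) item).toUpperCase();
        ItemTransformer chain =
                new CompositeItemTransformer(Arrays.asList(trim, upper));
        return chain.transform(input);
    }
}
```

A custom processor chain could sit behind the same interface: each link in your chain becomes one delegate, and the step only ever sees a single ItemTransformer.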

The general idea of adding transient contexts is that we believe our developers could very possibly need some place for non-persistent communication across components.

These last two issues are a little more interesting. The last one is something we're actually dealing with in 1.1. We had originally thought about creating something similar to a JobContext, which would be transient and not persisted. However, the problem is what happens when a job fails. For example, one of the primary use cases for this functionality is passing a file name from one step to another: step 1 would create the file name, and step 2 would use it to grab the correct file. If that second step failed, it would need to be able to get that file name back so that it could restart from where it left off on the same file. Thus, the name would have to be persisted.

Your second enhancement I don't completely understand; I can't come up with a single solid use case for having something like that. Usually, if something is needed at all by any of the components of a Step, it will need to be persisted so that it can be repopulated in case of a restart. If you don't think you will need to restart, you can still store it in the ExecutionContext; it won't really harm anything unless it's a huge object. But again, I'm still interested in hearing some concrete use cases for the functionality.


For point 1, you are right, and we are doing very similar things with ItemTransformer, but we want the item processing to align with our in-house framework for web request processing. So we want to use our own processor-chain pattern. Also, Spring Batch's chain does not support branching. Besides, I want to ask: if we use ItemTransformer, how can we access the ExecutionContext inside the public Object transform(Object item) method?

For point 2, the use case is:
- We want to stop a job (or stop reading items) when the job has been running past a time limit (e.g. exceeds 2 hours). The strategy we thought of is to save the start time in the context in the beforeStep event. Then the item reader will check whether (now - start time) > overrun limit. If so, it will write some alert logging and throw a JobOverrunException (or return null).
- In fact, the variable can be other environment conditions besides the start time, e.g. current queue length, current number of files in a directory, etc.

Finally, I would like to revisit my query on the namespace in the first post. Any updates on that?


For point 1, you are right, and we are doing very similar things with ItemTransformer, but we want the item processing to align with our in-house framework for web request processing. So we want to use our own processor-chain pattern. Also, Spring Batch's chain does not support branching. Besides, I want to ask: if we use ItemTransformer, how can we access the ExecutionContext inside the public Object transform(Object item) method?

Reusing something designed for web request processing seems like it could have some potential pitfalls, but as I'm not familiar with your framework I'll leave it at that. Even in that scenario I don't think you really need to modify the step implementations themselves, but rather wrap them, similar to what we already do with the step factory beans. As far as the framework knows, it is giving the object to what it thinks is an ItemWriter, even though you may be handing it off to your processing chain instead. That said, I still think you could make an ItemTransformer implementation delegate to your processing chain. Getting the ExecutionContext is no problem: just register the component as a stream separately. Either way, not modifying the step classes is going to be a much better approach. Even without API changes, you would have to apply bug fixes manually, etc., and I really would avoid it if you can (and I think there are multiple ways, as explained above, to do so).
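"Register it as a stream" could be sketched as a transformer that also implements ItemStream, so the step hands it the ExecutionContext at open() time. The types below are minimal stand-ins so the sketch compiles on its own; in a real project they are org.springframework.batch.item.ExecutionContext, ItemStream, and ItemTransformer, and the exact ItemStream method signatures vary between Spring Batch versions.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the Spring Batch types (illustrative only).
class ExecutionContext {
    private final Map<String, Object> map = new HashMap<>();
    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
}

interface ItemStream {
    void open(ExecutionContext executionContext);
    void update(ExecutionContext executionContext);
    void close();
}

interface ItemTransformer {
    Object transform(Object item) throws Exception;
}

// A transformer that is also registered with the step as a stream, so the
// framework passes it the ExecutionContext when the step opens.
class ContextAwareTransformer implements ItemTransformer, ItemStream {
    private ExecutionContext executionContext;

    public void open(ExecutionContext executionContext) {
        this.executionContext = executionContext; // keep it for transform()
    }

    public void update(ExecutionContext executionContext) { /* nothing to checkpoint */ }

    public void close() { this.executionContext = null; }

    public Object transform(Object item) {
        // Hypothetical use: tag each item with a value stored in the context.
        return item + ":" + executionContext.get("run.id");
    }
}

class StreamExample {
    static Object run() throws Exception {
        ContextAwareTransformer transformer = new ContextAwareTransformer();
        ExecutionContext ctx = new ExecutionContext();
        ctx.put("run.id", "42");
        transformer.open(ctx); // the step does this when the transformer is registered as a stream
        return transformer.transform("item1");
    }
}
```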

For point 2, the use case is:
- We want to stop a job (or some reading items) when the job is running over a time (e.g. exceed 2 hours). The strategy we thought is to save the start time at context in beforeStep event. Then, the item reader will check if now-start time > overrun limit. If so, it will write some alert logging and throw JobOverrunException (or return null).
- in fact, the variable can be other environment conditions besides start-time, e.g. current queue length, current no. of files in a directory, etc.

I'm somewhat confused by this, as the framework already provides a way to do this. The StepExecution object already stores the start time. You can simply call getStartTime on a StepExecution and it will return a timestamp. You can get access to this from a StepExecutionListener and easily throw your exception. There's no need for a TransientExecutionContext at all.
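The timeout check described above might be sketched like this. StepExecution and its getStartTime() are real Spring Batch concepts, but the classes below are simplified stand-ins so the example is self-contained; OverrunGuard is a hypothetical helper, and JobOverrunException is the exception name the poster proposed.

```java
import java.util.Date;

// Simplified stand-in for org.springframework.batch.core.StepExecution,
// which records the step's start time for you.
class StepExecution {
    private final Date startTime;
    StepExecution(Date startTime) { this.startTime = startTime; }
    public Date getStartTime() { return startTime; }
}

// The exception name from the poster's own description.
class JobOverrunException extends RuntimeException {
    JobOverrunException(String msg) { super(msg); }
}

// Hypothetical helper: compares elapsed time against a limit. A reader or
// StepExecutionListener could call check() and let the exception stop the step.
class OverrunGuard {
    private final long limitMillis;

    OverrunGuard(long limitMillis) { this.limitMillis = limitMillis; }

    // Returns true while within the limit; throws once the limit is exceeded.
    public boolean check(StepExecution stepExecution, long nowMillis) {
        long elapsed = nowMillis - stepExecution.getStartTime().getTime();
        if (elapsed > limitMillis) {
            throw new JobOverrunException("step has been running for " + elapsed + " ms");
        }
        return true;
    }
}
```

Because the start time already lives on the StepExecution, nothing needs to be stashed in a separate transient context for this use case.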

Finally, I would like to revisit my query on the namespace in the first post. Any updates on that?

Sorry for missing this. We did indeed plan on having a namespace for 1.0. Ben actually created one in February, and you can still find it in the repository (I'd have to dig for the exact location). The problem we ran into was that the step and job configurations contained so many options that we weren't saving anything with the namespace. It ended up being just as verbose as the non-namespace version (especially if you used the bean namespace to cut down on the normal property declarations) and really only saved you from typing the class name. After looking at that result, we decided that we needed to take a look at overall step and job configuration to find better ways to deal with this complexity, in a way that will make a namespace more natural. Of course, we won't be able to do anything drastic until 2.0. We really didn't want to provide a namespace that added little value in 1.0, just to completely change it in 2.0 so that it did add value.