Question

My assumption

In my understanding, "chunk-oriented processing" in Spring Batch helps me to efficiently process multiple items in a single transaction. This includes efficient use of interfaces to external systems. Since external communication involves overhead, it should be limited and chunk-oriented too, which is why we have the commit interval for the ItemWriter. What I don't get is why the ItemReader still has to read item by item. Why can't I read in chunks as well?

Problem description

In my step, the reader has to call a web service, and the writer sends the result to another web service. That's why I want to make as few calls as possible.

As you surely know, the interface of the ItemWriter is chunk-oriented:

public abstract void write(List<? extends T> paramList) throws Exception;

But the ItemReader is not:

public abstract T read() throws Exception;

As a workaround I implemented a ChunkBufferingItemReader, which reads a list of items, stores them, and returns them one by one whenever its read() method is called.

But when it comes to exception handling and restarting the job, this approach gets messy. I get the feeling that I'm doing work here that the framework should do for me.

Question

So am I missing something? Is there any existing functionality in Spring Batch I just overlooked?

In another post it was suggested to change the return type of the ItemReader to a List. But then my ItemProcessor would have to emit multiple outputs from a single input. Is this the right approach?

I'm grateful for any best practices. Thanks in advance :-)


Solution

Here is a draft implementation of the read() interface method:

public T read() throws Exception {
    // Refill the buffer whenever it runs dry; loop again in case a chunk is empty.
    while (this.items.isEmpty()) {
        final List<T> newItems = readChunk();
        if (newItems == null) {
            return null; // no more chunks: signal end of input to the framework
        }
        this.items.addAll(newItems);
    }
    return this.items.pop(); // hand out buffered items one by one
}

Note that items is a Deque&lt;T&gt; used as a buffer for items that have been read in a chunk but not yet requested by the framework.
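Fleshed out as a self-contained class, it could look like the sketch below. This is an illustration only: the class and its chunk source are made up here, and to keep the example runnable without Spring Batch on the classpath it is a plain class rather than an implementation of org.springframework.batch.item.ItemReader&lt;T&gt;, which a real job would use.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Buffers chunk reads and hands items out one at a time. */
public class ChunkBufferingReader<T> {

    private final Iterator<List<T>> chunkSource; // stands in for the web-service call
    private final Deque<T> items = new ArrayDeque<>();

    public ChunkBufferingReader(Iterator<List<T>> chunkSource) {
        this.chunkSource = chunkSource;
    }

    /** Returns the next item, fetching a new chunk whenever the buffer runs dry. */
    public T read() {
        while (items.isEmpty()) {
            List<T> newItems = readChunk();
            if (newItems == null) {
                return null; // end of input, as in the Spring Batch ItemReader contract
            }
            items.addAll(newItems);
        }
        return items.pop();
    }

    /** One remote call per chunk; returns null when the source is exhausted. */
    private List<T> readChunk() {
        return chunkSource.hasNext() ? chunkSource.next() : null;
    }
}
```

The Deque keeps the per-item hand-out cheap, and the while loop (rather than a simple if) guards against a remote call that returns an empty but non-null chunk.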

OTHER TIPS

Spring Batch uses a "chunk-oriented" processing style: not just reading or writing in chunks, but the full cycle of read, process and write.

Chunk-oriented processing refers to:

  1. Read an item using the ItemReader (a single item).
  2. Process it using the ItemProcessor and aggregate the result (the result list is updated one item at a time).
  3. Once the commit interval is reached, the entire aggregated result list is written out using the ItemWriter, and then the transaction is committed.

Here is the code representation from the Spring Batch documentation:

List items = new ArrayList();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
itemWriter.write(items);

As you said, if you need your reader to return multiple items, make its item type a List. If your processor also takes and returns a List, your writer will then receive a List of Lists.

Here is the code representation of this case:

List<List<Object>> resultList = new ArrayList<List<Object>>();
for (int i = 0; i < commitInterval; i++) {
    List<Object> items = itemReader.read();
    List<Object> processedItems = itemProcessor.process(items);
    resultList.add(processedItems);
}
itemWriter.write(resultList);
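On the writing side, the writer then receives that List of Lists and will typically flatten it back into a single batch before making the outbound call. A minimal sketch, assuming a flattening step is what you want (the class and method names here are illustrative, not Spring Batch API):

```java
import java.util.ArrayList;
import java.util.List;

/** Flattens the List-of-Lists produced by a chunk into one outbound batch. */
public class FlatteningWriter {

    public static List<Object> flatten(List<List<Object>> chunk) {
        List<Object> batch = new ArrayList<>();
        for (List<Object> items : chunk) {
            batch.addAll(items); // merge each processor output into one batch
        }
        return batch; // a single web-service call can now send the whole batch
    }
}
```

This keeps the number of outbound calls at one per commit interval, which is the goal stated in the question.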
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow