Spring Batch Partitioned Step stopped after hours from when a non-skippable exception occured

StackOverflow https://stackoverflow.com/questions/23560233

  •  18-07-2023
  •  | 
  •  

質問


I want to verify a behaviour of Spring Batch...
When running a partitioned step of a Job I got this exception:

org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:111)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:137)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:152)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:131)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:301)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:134)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:127)

only this - no previous exceptions that might have triggered this, and then got a FAILED result for my job.

When searching the logs from previous hours-days I noticed these exceptions(3 of them in different partitioned steps):

06/05/2014 21:50:51.996 [Step3TaskExecutor-12] [] ERROR                   AbstractStep - Line (222) Encountered an error executing the step
org.springframework.retry.RetryException: Non-skippable exception in recoverer while processing; nested exception is java.io.FileNotFoundException: Source 'blabla....pdf' does not exist
    at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$2.recover(FaultTolerantChunkProcessor.java:281)
    at org.springframework.retry.support.RetryTemplate.handleRetryExhausted(RetryTemplate.java:435)
    at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:304)
    at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:188)
    at org.springframework.batch.core.step.item.BatchRetryTemplate.execute(BatchRetryTemplate.java:217)
    at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor.transform(FaultTolerantChunkProcessor.java:290)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:192)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:75)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:267)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
    at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:253)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
    at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:139)
    at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:136)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: Source 'blabla....pdf' does not exist

It seemed weird to me that after those exceptions, job continued to run, so I'm thinking that only the slave-steps that this exception occured have failed and master step waited for the rest of slave steps to finish in order to return the first error mentioned.
Can someone verify that this is the problem? it's been driving me crazy for days

役に立ちましたか?

解決

That is correct behavior for Spring Batch's partitioning. The PartitionHandler in the master step evaluates the results of all steps at once when they have all returned (or timed out). With regards to what happened in the slaves, those logged errors would be a leading cause to me. However, the definitive answer should be in the job repository (assuming you're using a database backed implementation). When a step fails (even a partitioned slave), the exception is stored there.

他のヒント

I got this error when the utilization of my CPU is very high. When i added this bean in configuration, it worked for me:

@Bean
    public TaskExecutor asyncTaskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(numberOfCores);
        return taskExecutor;
    }

I use it here:

@Bean
    public Step masterStep() throws Exception {
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep().getName(), partitioner())
                .step(slaveStep())
                .gridSize(gridSize)
                .taskExecutor(asyncTaskExecutor())
                .build();
    }
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top