Question

I have a batch job in which I read a file, process it, and save the records to the DB. Since it is a full load, I have to clear the existing table in the DB before inserting the new records.

My question is: what is the best place in Spring Batch (reader, processor, or writer) to put the code that clears the existing table? Below are the scenarios I have considered:

  1. Do it in the open() method of the ItemReader in the reader class. Problem: if the file I'm reading happens to be corrupt or blank, I end up with an empty table.

  2. Create a flag, set it once, and delete the table contents in the processor class based on that flag. This can be done, but is there a better way or a better place to do it?

  3. Create another temp table, copy all the records from the file into it, and then in an @AfterStep method delete all the records from the actual table and move all the records from the temp table into it.

Is there any method that gets called just once before the ItemProcessor, other than using a flag? Please suggest.


Solution

It sounds to me like you have 2 steps here:

  1. Truncate existing table.
  2. Insert data from file.

You need to decide which step should execute first.
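
For illustration, here is a minimal sketch of that two-step shape (Spring Batch 4 style builders, with @EnableBatchProcessing and a JdbcTemplate bean assumed to be available; the table name target_table, the item type InputRow, and the fileReader/jdbcWriter beans are placeholders to replace with your own):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.JdbcTemplate;

    @Configuration
    @EnableBatchProcessing
    public class FullLoadJobConfig {

        // Placeholder item type; replace with your own domain object.
        public static class InputRow { }

        // Step 1: clear the target table in a simple Tasklet.
        @Bean
        public Step truncateStep(StepBuilderFactory steps, JdbcTemplate jdbcTemplate) {
            return steps.get("truncateStep")
                    .tasklet((contribution, chunkContext) -> {
                        jdbcTemplate.execute("TRUNCATE TABLE target_table");
                        return RepeatStatus.FINISHED;
                    })
                    .build();
        }

        // Step 2: chunk-oriented read of the file and write to the table.
        @Bean
        public Step loadStep(StepBuilderFactory steps,
                             ItemReader<InputRow> fileReader,
                             ItemWriter<InputRow> jdbcWriter) {
            return steps.get("loadStep")
                    .<InputRow, InputRow>chunk(100)
                    .reader(fileReader)
                    .writer(jdbcWriter)
                    .build();
        }

        @Bean
        public Job fullLoadJob(JobBuilderFactory jobs, Step truncateStep, Step loadStep) {
            return jobs.get("fullLoadJob")
                    .start(truncateStep)
                    .next(loadStep)
                    .build();
        }
    }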

If you want to execute the 'truncate' step first, you can add checks to validate the file before performing the truncate. For example, you could use a @BeforeStep to check that the file exists, is readable, and is not size 0.
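
As a rough sketch of such a check (assuming the file path is passed as a job parameter named inputFile, which is a made-up name), a listener like the one below could be registered on the truncate step via .listener(...), so the truncate never runs if the check throws:

    import java.io.File;

    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;

    public class InputFileValidator {

        // Runs once before the step starts; fails the step if the file looks unusable.
        @BeforeStep
        public void checkInputFile(StepExecution stepExecution) {
            String path = stepExecution.getJobParameters().getString("inputFile");
            File file = (path == null) ? null : new File(path);
            if (file == null || !file.exists() || !file.canRead() || file.length() == 0) {
                throw new IllegalStateException("Input file missing, unreadable or empty: " + path);
            }
        }
    }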

If you need to guarantee that the entire file is parsed without error before loading the database table, then you will need to parse the data into some temporary location as you mention and then in a second step move the data from the temporary location to the final table. I see a few options there:

  1. Create a temporary table to hold the data, as you suggest. After reading the file, in a different step, truncate the target table and move the data from the temp table to the target table.
  2. Add a 'createdDateTime' column or something similar to the existing table and an 'executionDate' job parameter. Then insert your new rows as they are parsed. In a second step, delete any rows that have a created time that is less than the executionDate (This assumes you are using a generated ID for a PK on the table).
  3. Add a 'status' column to the existing table. Insert the new rows as 'pending'. Then in a second step, delete all rows that are 'active' and update rows that are 'pending' to 'active' (see the sketch after this list).
  4. Store the parsed data in-memory. This is dangerous for a few reasons; especially if the file is large. This also removes the ability to restart a failed job as the data in memory would be lost on failure.
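
To make option 3 concrete, the second step could be a short Tasklet along these lines (a sketch only; the table name target_table and the status values are assumptions, and both statements run inside the tasklet's single transaction):

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class PromotePendingRowsTasklet implements Tasklet {

        private final JdbcTemplate jdbcTemplate;

        public PromotePendingRowsTasklet(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            // Drop the previous load, then promote the rows inserted by this run.
            jdbcTemplate.update("DELETE FROM target_table WHERE status = 'active'");
            jdbcTemplate.update("UPDATE target_table SET status = 'active' WHERE status = 'pending'");
            return RepeatStatus.FINISHED;
        }
    }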

OTHER TIPS

You can:

  • Create 2 tables Data and DataTemp with the same schema
  • Keep the existing data in table Data
  • Copy the new data into DataTemp

If the new incoming data stored in DataTemp is valid, you can transfer it to table Data. If you don't have much data, you can do this transfer in a single transaction in another Tasklet. If you have a lot of data, you can use a chunk-oriented step. Since the data has already been inserted, you know it is not corrupted and respects the DB constraints, so if the transfer fails (probably because the DB is down), you can restart it later without losing any data.
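
A chunk-oriented transfer step could look roughly like the sketch below (it assumes a single payload column and Spring Batch 4 builders; the column name, chunk size, and item type are placeholders to adapt to your schema):

    import javax.sql.DataSource;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
    import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;

    @Configuration
    public class TransferStepConfig {

        // Streams rows out of DataTemp.
        @Bean
        public JdbcCursorItemReader<String> dataTempReader(DataSource dataSource) {
            return new JdbcCursorItemReaderBuilder<String>()
                    .name("dataTempReader")
                    .dataSource(dataSource)
                    .sql("SELECT payload FROM DataTemp")
                    .rowMapper((rs, rowNum) -> rs.getString("payload"))
                    .build();
        }

        // Batch-inserts rows into Data.
        @Bean
        public JdbcBatchItemWriter<String> dataWriter(DataSource dataSource) {
            return new JdbcBatchItemWriterBuilder<String>()
                    .dataSource(dataSource)
                    .sql("INSERT INTO Data (payload) VALUES (:payload)")
                    .itemSqlParameterSourceProvider(item -> new MapSqlParameterSource("payload", item))
                    .build();
        }

        // Restartable chunk step: commits every 500 rows.
        @Bean
        public Step transferStep(StepBuilderFactory steps,
                                 JdbcCursorItemReader<String> dataTempReader,
                                 JdbcBatchItemWriter<String> dataWriter) {
            return steps.get("transferStep")
                    .<String, String>chunk(500)
                    .reader(dataTempReader)
                    .writer(dataWriter)
                    .build();
        }
    }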

You can also avoid the transfer entirely by keeping 2 tables and tracking in your configuration which of the 2 is currently the active table. Or you could use a table alias/view that you update to point to the active table.
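
As a rough sketch of the alias/view idea (the view name ActiveData is made up, and CREATE OR REPLACE VIEW syntax varies by database), a final step could simply re-point the view once the new data is validated:

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class SwitchActiveTableTasklet implements Tasklet {

        private final JdbcTemplate jdbcTemplate;
        private final String newActiveTable; // e.g. "DataTemp" after a successful load; comes from config, not user input

        public SwitchActiveTableTasklet(JdbcTemplate jdbcTemplate, String newActiveTable) {
            this.jdbcTemplate = jdbcTemplate;
            this.newActiveTable = newActiveTable;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            // Readers always query the ActiveData view; only its definition changes here.
            jdbcTemplate.execute("CREATE OR REPLACE VIEW ActiveData AS SELECT * FROM " + newActiveTable);
            return RepeatStatus.FINISHED;
        }
    }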

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow