Question

I have a large database on an MSSQL server that contains data indexed by a web crawler. Every day I want to update the Solr search engine index using the DataImportHandler, which runs on another server in another network.

The Solr DataImportHandler uses queries to get the data from SQL Server, for example:

SELECT * FROM DB.Table WHERE DateModified > Config.LastUpdateDate

The import handler runs 8 selects of this type, and each select returns around 1,000 rows from the database.

To connect to SQL Server I am using com.microsoft.sqlserver.jdbc.SQLServerDriver.

The parameters I can add for connection are:

  • responseBuffering="adaptive/all"
  • batchSize="integer"
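For reference, these options go into the DIH data-config.xml as attributes of the JDBC data source. The host, database name, table, and credentials below are illustrative placeholders, not values from the question; ${dataimporter.last_index_time} is the variable DIH itself maintains for delta imports, which plays the role of Config.LastUpdateDate:

```xml
<dataConfig>
  <!-- responseBuffering goes in the connection URL; batchSize is a
       JdbcDataSource attribute. Host/credentials are placeholders. -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost:1433;databaseName=DB;responseBuffering=adaptive"
              batchSize="1000"
              user="solr_import"
              password="secret"/>
  <document>
    <entity name="crawl"
            query="SELECT * FROM DB.Table
                   WHERE DateModified &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
```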

So my question is:

What can go wrong while running these queries every day (apart from network errors)? I want to know how SQL Server behaves in this context.

Furthermore, I have to decide how to implement this import and how to handle errors, but first I need to know what errors can arise.

Thanks!

Later edit

My problem is that I don't know how these SQL queries can fail. When I call this importer every day it runs 10 queries against the database. If the 5th query fails I have two options: roll back the entire transaction and run it again, or commit the data I got from the first 4 queries and somehow redo queries 5 to 10. But if these queries always fail because of some other problem, I need to find another way to import the data.

Can these SQL queries over the internet fail because of timeouts or something like that?
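The "redo queries 5 to 10" option above can be sketched as a per-query retry loop, so a failure in the 5th query does not discard the results of the first 4. This is only an illustration: fetchWithRetry is a hypothetical helper, not part of Solr or the JDBC driver, and the simulated query stands in for a real JDBC call that might throw a timeout.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Run one import query, retrying on failure (e.g. a timeout over a
    // slow link). If every attempt fails, rethrow so the caller can skip
    // the commit for this query and log the error.
    public static <T> T fetchWithRetry(Callable<T> query, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.call();
            } catch (Exception e) {
                last = e;                     // remember the failure
                Thread.sleep(100L * attempt); // simple backoff before retrying
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Simulate a query that fails twice (as a timeout might), then succeeds.
        final int[] calls = {0};
        int rows = fetchWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("simulated timeout");
            return 1000; // pretend the delta query returned ~1000 rows
        }, 5);
        System.out.println("rows=" + rows + " after " + calls[0] + " attempts");
    }
}
```

Because each of the 10 queries is wrapped independently, committing the first 4 and retrying only the failed one becomes the natural structure.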

Solution

The only problem I identified after working with this type of import is:

  • Network problems: if the network connection fails, Solr rolls back any changes and the commit does not take place. In my program I treat this as an error and do not record the changes in the database.

Thanks @GuidEmpty for his comment clarifying this for me.

OTHER TIPS

There could be issues with permissions (not sure if you control these).

It might be a good idea to catch the exceptions you can think of and include a catch-all (Exception exp).

Then treat the catch-all as the worst case: roll back (where you can) and log the exception so you can investigate it later.
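That layered catch structure could look like the following sketch. The method and exception types here are illustrative placeholders; in the real importer the specific catch would likely be SQLException from the JDBC call, with the generic Exception as the logged worst case:

```java
public class CatchAllSketch {
    // Run one import step, distinguishing an anticipated failure from
    // the catch-all worst case. Returns a status string for illustration;
    // a real importer would log and decide whether to commit.
    static String runImportStep(Runnable step) {
        try {
            step.run();
            return "ok";
        } catch (IllegalStateException exp) { // a specific failure you anticipated
            return "known error: " + exp.getMessage();
        } catch (Exception exp) {             // catch-all: record it for later analysis
            return "unexpected error: " + exp.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runImportStep(() -> {}));
        System.out.println(runImportStep(() -> { throw new IllegalStateException("timeout"); }));
        System.out.println(runImportStep(() -> { throw new RuntimeException("oops"); }));
    }
}
```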

You don't say what column types you are selecting, either; keep in mind that text/blob columns take much more space and could cause issues internally if you buffer the data.

Though, on a quick re-read: you don't need to roll back if you are only selecting.

I think you would be better off thinking about what you are hoping to achieve, and whether knowing every possible problem in advance will actually help.

HTH

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow