How to avoid duplicate entries and get the ID of the corresponding original entry using NHibernate

StackOverflow https://stackoverflow.com/questions/22125545

  •  18-10-2022

Question

I am a beginner with NHibernate. I am using NHibernate 3.3 and C#. I am inserting data from a file into a MySQL database, and the file contains duplicate entries. I have a PERSON table whose primary key, id, is auto-incremented:

    id | firstname | lastname | fullname | dob | city

I want to avoid saving duplicates to the database, and I want to get the id of the original entry from the database.

For example, there are 4 records in the file:

firstname|lastname|fullname|dob|city
-------------------------------------------
a|b|ab|1|c
c|d|cd|2|e
a|b|ab|1|c
f|g|fg|4|h

Now suppose I have already saved the first two records using NHibernate 3.3. The PERSON table will look like this:

id|firstname|lastname|fullname|dob|city
------------------------------------------
1|a|b|ab|1|c
2|c|d|cd|2|e

In my current implementation, the 3rd row of the file, which is the same as the 1st row, is also saved to the table. I don't want that: I want to avoid duplicate entries in the table. I also don't want to execute a query each time to check whether the record already exists in the database. Finally, if such a duplicate is already in the table, I want to obtain its ID. In this case it should be 1.

It would be great if someone could suggest a way to do this.

Solution

You should create a unique index on the fields:

CREATE UNIQUE INDEX PERSON_idx ON PERSON
  (firstname, lastname, fullname, dob, city) USING BTREE;

Then every time you attempt to insert a row whose values in these fields equal those of a row already present, the database server will raise an error, which NHibernate surfaces as an exception.

To get the ID of the conflicting row, in some cases it would be feasible to parse the error text returned from the database, but I would recommend just retrieving the record from the DB when this happens.

There are two general ways to find the conflicting rows. Keep in mind that NHibernate's session.Save will not necessarily throw an exception when the unique constraint is violated, since the INSERTs may not be issued until the transaction is committed. In that case, it's difficult to tell which of the rows caused the error (without looking at the database log).

This issue with delayed INSERTs can be remedied by using IStatelessSession instead of ISession which will make the INSERTs be issued immediately (AFAIK). Then it would be possible to have something like this:

using (var tx = statelessSession.BeginTransaction())
{
    foreach (var person in persons)
    {
        try
        {
            statelessSession.Insert(person);
        }
        catch (GenericADOException e) // from the NHibernate.Exceptions namespace
        {
            // Further check that it's really caused by violating the unique constraint (database-specific) and handle the situation
        }
    }
    tx.Commit();
}
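To make the "further check" and the ID lookup more concrete, here is a rough sketch of how the catch block could be filled in. It rests on several assumptions that are not from the question: that the MySQL Connector/NET driver (MySql.Data) is used, that its MySqlException arrives as the InnerException of NHibernate's GenericADOException, that the stateless session remains usable after the failed INSERT, and that the Person class and the PersonImporter helper look roughly like the hypothetical ones below. MySQL reports a duplicate key with error code 1062 (ER_DUP_ENTRY).

```csharp
using System.Collections.Generic;
using System.Linq;
using MySql.Data.MySqlClient;   // assumption: MySQL Connector/NET is the ADO.NET driver
using NHibernate;
using NHibernate.Exceptions;
using NHibernate.Linq;

// Hypothetical entity; property names and types are assumptions.
public class Person
{
    public virtual int Id { get; set; }
    public virtual string FirstName { get; set; }
    public virtual string LastName { get; set; }
    public virtual string FullName { get; set; }
    public virtual string Dob { get; set; }
    public virtual string City { get; set; }
}

// Hypothetical helper, not part of NHibernate.
public static class PersonImporter
{
    private const int MySqlDuplicateEntry = 1062; // MySQL error code ER_DUP_ENTRY

    public static void InsertOrResolveIds(IStatelessSession statelessSession,
                                          IEnumerable<Person> persons)
    {
        using (var tx = statelessSession.BeginTransaction())
        {
            foreach (var person in persons)
            {
                try
                {
                    statelessSession.Insert(person);
                }
                catch (GenericADOException e)
                {
                    // Database-specific check: only swallow duplicate-key errors.
                    var mysqlError = e.InnerException as MySqlException;
                    if (mysqlError == null || mysqlError.Number != MySqlDuplicateEntry)
                        throw;

                    // Fetch the ID of the row that is already stored.
                    var candidate = person;
                    person.Id = statelessSession.Query<Person>()
                        .Where(p => p.FirstName == candidate.FirstName
                                 && p.LastName == candidate.LastName
                                 && p.FullName == candidate.FullName
                                 && p.Dob == candidate.Dob
                                 && p.City == candidate.City)
                        .Select(p => p.Id)
                        .Single();
                }
            }
            tx.Commit();
        }
    }
}
```

This issues one extra SELECT per duplicate only, not per row, so a file with few duplicates costs few additional round trips.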

If you need to use ISession or for some other reason do not like this solution, you can fetch all the duplicates before inserting the rows by issuing a single SELECT like:

var conflictingRows = session.Query<Person>().Where(p =>
      (p.FirstName == persons[0].FirstName && p.LastName == persons[0].LastName && ...)
   || (p.FirstName == persons[1].FirstName && p.LastName == persons[1].LastName && ...)
   ...
   || (p.FirstName == persons[persons.Count - 1].FirstName && p.LastName == persons[persons.Count - 1].LastName && ...));

Then getting the IDs corresponding to the records you want to insert can be easily done in memory. You would have to build this expression dynamically, but that's not a big deal either.
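As a sketch of what "building the expression dynamically" and the in-memory matching might look like: the Person class, the DuplicateLookup helper, and the choice of string for every column are assumptions made for illustration. The predicate is built with System.Linq.Expressions, and the matching uses a dictionary keyed on the five indexed columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity; property names and types are assumptions.
public class Person
{
    public virtual int Id { get; set; }
    public virtual string FirstName { get; set; }
    public virtual string LastName { get; set; }
    public virtual string FullName { get; set; }
    public virtual string Dob { get; set; }
    public virtual string City { get; set; }
}

// Hypothetical helper, not part of NHibernate.
public static class DuplicateLookup
{
    // The properties covered by the unique index.
    private static readonly string[] KeyProperties =
        { "FirstName", "LastName", "FullName", "Dob", "City" };

    // Builds p => (p.FirstName == ... && ... && p.City == ...) || (...) || ...
    public static Expression<Func<Person, bool>> BuildPredicate(IEnumerable<Person> persons)
    {
        var p = Expression.Parameter(typeof(Person), "p");
        Expression body = null;
        foreach (var person in persons)
        {
            Expression match = null;
            foreach (var name in KeyProperties)
            {
                var property = typeof(Person).GetProperty(name);
                var equal = Expression.Equal(
                    Expression.Property(p, property),
                    Expression.Constant(property.GetValue(person, null), property.PropertyType));
                match = match == null ? equal : Expression.AndAlso(match, equal);
            }
            body = body == null ? match : Expression.OrElse(body, match);
        }
        // An empty input yields a predicate that matches nothing.
        return Expression.Lambda<Func<Person, bool>>(body ?? Expression.Constant(false), p);
    }

    private static Tuple<string, string, string, string, string> KeyOf(Person p)
    {
        return Tuple.Create(p.FirstName, p.LastName, p.FullName, p.Dob, p.City);
    }

    // Copies the stored Id onto every incoming person that already exists
    // and returns the persons that still have to be inserted.
    public static List<Person> AssignExistingIds(IEnumerable<Person> conflictingRows,
                                                 IEnumerable<Person> persons)
    {
        var idByKey = new Dictionary<Tuple<string, string, string, string, string>, int>();
        foreach (var row in conflictingRows)
            idByKey[KeyOf(row)] = row.Id;

        var toInsert = new List<Person>();
        foreach (var person in persons)
        {
            int id;
            if (idByKey.TryGetValue(KeyOf(person), out id))
                person.Id = id;
            else
                toInsert.Add(person);
        }
        return toInsert;
    }
}
```

With NHibernate this would presumably be used as session.Query<Person>().Where(DuplicateLookup.BuildPredicate(persons)).ToList(), followed by AssignExistingIds. I have not verified how well the LINQ provider copes with a very long OR chain, so for large files it is probably safer to process the rows in batches. Two caveats: the dictionary compares strings case-sensitively, whereas MySQL's default collations are case-insensitive, and duplicates that occur only within the file (not yet in the table) all end up in the returned list, so they still need separate handling.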


P.S.: Since you require the combination of all the fields to be unique, you could drop the id column altogether and use all the fields as a composite primary key.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow