Question

I have a large (3 million rows) table of transactional data, which can be simplified thus:

ID  File        DOB
--------------------------
1   File1       01/01/1900
2   File1       03/10/1978
3   File1       03/10/1978
4   File2       15/07/1997
5   File2       01/01/1900
6   File2       15/07/1997

In some cases there is no date. I would like to update the date field so it is the same as the other records for a file which has a date. So record 1's DOB would become 03/10/1978, because records 2 and 3 for that file have that date. Likewise record 5 would become 15/07/1997.

What is the most efficient way to achieve this?

Thanks.


Solution

Supposing your table is called "Files", then this will work:

UPDATE f1 SET f1.DOB = f2.MaxDOB
  FROM files f1
  JOIN (SELECT [File], MAX(DOB) AS MaxDOB FROM files GROUP BY [File]) f2 ON
    f2.[File] = f1.[File];

As far as performance is concerned, it probably won't get much more efficient than this, but you do need to ensure there is an index on the (File, DOB) columns. Note that MAX(DOB) only picks the right value because the placeholder 01/01/1900 is earlier than any real date, and it assumes each file has a single real date. 3 million records is a lot, and this query will also rewrite records that already hold the correct date unless you add a filter such as WHERE f1.DOB <> f2.MaxDOB. Either way, check the query plan.
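As a rough sketch of the index and of an update restricted to the rows that actually need it: this assumes DOB is a date or datetime column, that 01/01/1900 is the only "no date" placeholder, and that the table is named Files as above. The index name IX_Files_File_DOB is made up for illustration.

```sql
-- Hypothetical index name; it covers the GROUP BY [File] / MAX(DOB) subquery.
CREATE INDEX IX_Files_File_DOB ON Files ([File], DOB);

UPDATE f1
SET    f1.DOB = f2.MaxDOB
FROM   Files f1
JOIN   (SELECT [File], MAX(DOB) AS MaxDOB
        FROM   Files
        GROUP  BY [File]) f2
  ON   f2.[File] = f1.[File]
WHERE  f1.DOB = '19000101'        -- assumed placeholder for "no date"
  AND  f2.MaxDOB <> '19000101';   -- skip files that have no real date at all
```

With the WHERE clause in place, only the placeholder rows are written, which should keep logging and locking down on a 3 million row table. The query plan is still worth checking.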

OTHER TIPS

I don't know about the most efficient way, but I can think of one solution: create a temporary table with the following query. In SQL Server 2008 this is written with SELECT ... INTO (CREATE TABLE ... AS and to_date are Oracle syntax), and the date literal may need adjusting if DOB is not a date column.

SELECT [File], MIN(DOB) AS default_date, MAX(DOB) AS fixed_date INTO new_table FROM three_million_table GROUP BY [File] HAVING MIN(DOB) = '19000101'

so your new table will have
column headers: file, default_date,fixed_date
values: File1, 01/01/1900, 03/10/1978

Now it may not be wise to run an update on three_million_table, but if you think it is OK, then:

UPDATE T1 SET T1.DOB = T2.fixed_date FROM three_million_table T1 INNER JOIN new_table T2 ON T1.[File] = T2.[File]

Hope this helps. With 3 million records, the update will surely take its toll, since every record has to be scanned.
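If a single large update is the worry, one option is to apply it in batches. The following is only a sketch: it reuses the three_million_table and new_table names from this answer, assumes 01/01/1900 is the placeholder date, and the batch size of 50000 is an arbitrary starting point rather than a tuned value.

```sql
DECLARE @rows INT = 1;

-- Repeat until a pass updates nothing.
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) T1
    SET    T1.DOB = T2.fixed_date
    FROM   three_million_table T1
    INNER  JOIN new_table T2 ON T1.[File] = T2.[File]
    WHERE  T1.DOB = '19000101'             -- only rows still holding the placeholder
      AND  T2.fixed_date <> '19000101';    -- skip files with no real date

    SET @rows = @@ROWCOUNT;
END
```

Each pass touches at most one batch of placeholder rows, and updated rows no longer match the WHERE clause, so the loop ends once nothing is left to fix.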

;WITH testCTE ([name], dobir, number)
     AS (SELECT [File], DOB,
                ROW_NUMBER() OVER (PARTITION BY [File], DOB
                                   ORDER BY (SELECT 0)) AS RowNumber
         FROM   test)
UPDATE t
SET    t.DOB = tcte.dobir
FROM   testCTE AS tcte
JOIN   test t ON tcte.[name] = t.[File]
WHERE  tcte.number > 1             -- a date that occurs more than once within the file
  AND  tcte.dobir <> '19000101'    -- never copy the placeholder date itself


Licensed under: CC-BY-SA with attribution