Question

I'm using SSIS in Visual Studio 2008 to perform fuzzy grouping on a customer table.

The columns are ID, Name, Email, etc.

I have some duplicate customers in the table with the same email address. I'm currently able to use Fuzzy Grouping to identify the duplicates for manual checking.

I also have some records which are almost duplicates but have some extra punctuation.

For example:

    ID   Name  Email
    1    bob   bob.bob@bob.com 
    2    bob   bob.bob@bob.com 
    3    bob   bob..bob@bob.com
    7    tom   tom@tom.com 
    9    frog  tom@tom..com 

Currently I can get IDs 1 and 2 to match, but I want 1, 2 and 3 to match and be grouped on the same key, and 7 and 9 to match as well, because I want to ignore the double full stops and treat them as a single full stop. The name does not matter; only the email address column is currently important.

Any suggestions or help would be appreciated.

Solution

Use a Derived Column transformation before your Fuzzy Grouping transformation to remove the unwanted characters:

    REPLACE([Email], "..", ".")
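
A single REPLACE pass swaps each non-overlapping pair of dots for one dot, so a run of three dots would still leave two behind. As a rough sketch, assuming runs of at most four dots and assuming that letter case and surrounding whitespace should also be ignored (neither is stated in the question), the expression could be nested like this:

    LOWER(TRIM(REPLACE(REPLACE([Email], "..", "."), "..", ".")))

Adding the result as a new column (hypothetically named EmailClean here) and pointing the Fuzzy Grouping transformation at that column would leave the original Email value untouched for the manual check.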

Licensed under: CC-BY-SA with attribution