Question

We have a database with some redundant, bad data. For example, some article names differ only in upper/lower case, others have accent problems, others are missing a letter, and so on. The idea is to merge the database records that are actually the same.

Are there any good tools that make it easy to clean up a database? Ideally this would not be done automatically but would require user confirmation.

Solution

There are quite a few tools out there for data cleansing, and many more companies that offer data cleansing as a service.

I have performed data cleansing for several large corporations, and it is neither an easy task nor as straightforward as it seems. De-duplicating data is also fraught with all sorts of issues that do not become apparent until you have begun the exercise.

IMHO, if your legacy data is in a relatively poor state and you have no in-house expertise in this (quite specialised) area, I'd look into employing a third party to do this for you as they are likely to perform it faster and at a lower total cost than starting from scratch.

If you want to build the in-house skills to do this, a couple of quick Google searches turn up many software packages on offer. You might want to compare their relative strengths for the specific types of data you are looking to cleanse, as some will be better in certain areas than others.
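To give a feel for what an in-house approach involves, here is a minimal sketch in Python that covers the three problems mentioned in the question (case, accents, a missing letter) and asks for confirmation before anything is merged. It uses only the standard library. The function names, the 0.85 threshold, and the sample article names are my own assumptions for illustration, not part of any particular tool, and the actual merge of database rows is deliberately left out.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Strip accents, ignore case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def candidate_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) for pairs that look alike.

    The threshold is an assumed starting value; tune it on your own data.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, so the obvious cases are reviewed first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def confirm_merges(pairs, ask=input):
    """Ask the user about each pair and return only the confirmed ones."""
    confirmed = []
    for a, b, score in pairs:
        answer = ask(f"Merge {a!r} and {b!r} (similarity {score})? [y/N] ")
        if answer.strip().lower() == "y":
            confirmed.append((a, b))
    return confirmed


if __name__ == "__main__":
    # Hypothetical article names; in practice these come from the database.
    articles = ["Keyboard", "keyboard", "Keybord", "Café Crème", "Cafe Creme"]
    for a, b in confirm_merges(candidate_duplicates(articles)):
        print(f"confirmed: {a!r} == {b!r}")
```

Note that this compares every name with every other name, so it only suits small tables. On larger data you would first group records by some cheap key (first letter, category, and so on) and only compare within each group, which is one of the things the dedicated packages handle for you.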

Hope this helps, Ollie.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow