Question

I need to import tabular data into my database. The data is supplied via spreadsheets (mostly Excel files) from multiple parties. The format of each of these files is similar but not the same and various transformations will be necessary to massage the data into the final format suitable for import. Furthermore the input formats are likely to change in the future. I am looking for a tool that can be run and administered by regular users to transform the input files.

Now let me list some of the transformations I am looking to do:

  • swap columns:

Input is:

|Name|Category|Price|
|data|data    |data |

Output is

|Name|Price|Category|
|data|data |data    |
  • rename columns

Input is:

|PRODUCTNAME|CAT |PRICE|
|data       |data|data |

Output is

|Name|Category|Price|
|data|data    |data |
  • map columns according to a lookup table, like in the above examples: replace every occurrence of the string "Car" by "automobile" in the column Category
  • basic maths: multiply the price column by some factor
  • basic string manipulations Lets say that the format of the Price column is "3 x $45", I would want to split that into two columns of amount and price
  • filtering of rows by value: exclude all rows containing the word "expensive"
  • etc.

I have the following requirements:

  • it can run on any of these platform: Windows, Mac, Linux
  • Open Source, Freeware, Shareware or commercial
  • the transformations need to be editable via a GUI
  • if the tool requires end user training to use that is not an issue
  • it can handle on the order of 1000-50000 rows

Basically I am looking for a graphical tool that will help the users normalize the data so it can be imported, without me having to write a bunch of adapters.

What tools do you use to solve this?

Was it helpful?

Solution

The Mule Data Integrator will do all of this from a csv file. So you can export your spreadsheet to a CSV file, and load the CSV file ito the MDI. It can even load the data directly to the database. And the user can specify all of the transformations you requested. The MDI will work fine in non-Mule environments. You can find it here mulesoft.com (disclaimer, my company developed the transformation technology that this product is based on).

OTHER TIPS

The simplest solution IMHO would be to use Excel itself - you'll get all the Excel built-in functions and macros for free.

Have your transformation code in a macro that gets called via Excel controls (for the GUI aspect) on a spreadsheet. Find a way to insert that spreadsheet and macro in your client's Excel files. That way you don't need to worry about platform compatibility (it's their file, so they must be able to open it) and all the rest. The other requirements are met as well. The only training would be to show them how to enable macros.

You didn't say which database you're importing into, or what tool you use. If you were using SQL Server, then I'd recommend using SQL Server Integration Services (SSIS) to manipulate the spreadsheets during the import process.

I tend to use MS Access as a pipeline between multiple data sources and destinations - but you're looking for something a little more automated. You can use macros and VB script with Access to help through a lot of the basics.

However, you're always going to have data consistency problems with users mis-interpreting how to normalize their information. Good luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top