Question

I am building an SSIS package that merges several Excel files into a single Excel file.

The problem is that the input files may have different structures (some columns might not be present).

My SSIS structure looks like this:

  1. A Foreach container that takes each file one by one
  2. A Script Task that shows a Windows form where I enter the default values for my columns, which are stored in variables
  3. My Data Flow task

In my Data Flow task, I'd like to take the values from the Excel file and write them to the destination file. If a column does not exist, I want the data flow to create it using the default value.

I managed to do it, but I have a problem with the "if the column does not exist" part. I added the column with a Derived Column, but how can it first check whether the column exists in the source file? I don't want to apply the default value every time: if the column is present, I want to use it instead of the default.

Thanks for your answers,

Solution

I solved this by using a Script Component as the source of my Data Flow.

Here is the structure I used:

  1. A Windows form to enter the default values and choose the file to read
  2. A Data Flow task
  3. A Script Component used as the source
  4. Read the file manually (via an OLE DB connection) and load it into a DataTable
  5. Check with if conditions whether each column exists in the file
  6. If a column does not exist, add it to the DataTable
  7. Get the column index so the column can be accessed later
  8. Do this for each column that might not be present
  9. A for loop that creates an output buffer row for each row of the DataTable
  10. In each row, write the corresponding value: the default if the column was created manually, otherwise the original one, read with dt.Rows[row][column]

At the end, I have the structure fully filled with either the original columns or the default values.

I first show a Windows form in a Script Task that runs before the Data Flow task. In this form, the user selects the file through an OpenFileDialog and then enters the default values that are needed later.

Here is an extract of the code I wrote in the Script Component. The default values are stored in variables. UniqueRef, EAN and the article name are mandatory and must always be filled, but Value1, Value2 and/or Color may or may not be present in the file. If they are, I want their actual values; if not, I want to use a default value for the whole column.

It may not be optimized, but it does the job (after 8 years without developing, and with this being my first time in C#, it was quite difficult for me, but I had no choice because SSIS cannot do this through the UI).

    using System;
    using System.Data;
    using System.Windows.Forms;
    using System.Data.OleDb;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {
        // Holds the rows read from the Excel file.
        DataTable dt;

        public override void PreExecute()
        {
            base.PreExecute();

            // Read the .xls file located at InputPath and load its data into a DataTable.
            string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + Variables.InputPath + ";Extended Properties=\"Excel 8.0;IMEX=1\"";

            // Fill opens and closes the connection itself; using disposes both objects afterwards.
            // SSIS calls CreateNewOutputRows on its own, so it is not invoked from here.
            using (var connection = new OleDbConnection(connString))
            using (var adapter = new OleDbDataAdapter("SELECT * FROM [ToImport$]", connection))
            {
                dt = new DataTable();
                adapter.Fill(dt);
            }
        }

        public override void PostExecute()
        {
            base.PostExecute();
        }

        public override void CreateNewOutputRows()
        {
            // Flags telling whether a column was missing from the original file and had to be created.
            // Columns without a flag are mandatory; the program will fail if one is missing.
            bool value1bool = false;
            bool value2bool = false;
            bool colorbool = false;

            // Check whether each optional column is in the original file. If not, set its flag
            // to true and add the missing column to the DataTable.
            if (!dt.Columns.Contains("Value1"))
            {
                dt.Columns.Add(new DataColumn("Value1", typeof(string)));
                value1bool = true;
            }
            if (!dt.Columns.Contains("Value2"))
            {
                dt.Columns.Add(new DataColumn("Value2", typeof(string)));
                value2bool = true;
            }
            if (!dt.Columns.Contains("Color"))
            {
                dt.Columns.Add(new DataColumn("Color", typeof(string)));
                colorbool = true;
            }

            // Get the index of each column by name so the values can be read later.
            int colRef = dt.Columns.IndexOf("UniqueRef");
            int colEan = dt.Columns.IndexOf("EAN");
            int colName = dt.Columns.IndexOf("Article Name");
            int colValue1 = dt.Columns.IndexOf("Value1");
            int colValue2 = dt.Columns.IndexOf("Value2");
            int colColor = dt.Columns.IndexOf("Color");

            // For each row of the DataTable
            for (int i = 0; i < dt.Rows.Count; i++)
            {
                // Add a new row to the Lots output buffer and fill it from the matching columns.
                LotsOutputBuffer.AddRow();
                LotsOutputBuffer.RefUnique = dt.Rows[i][colRef].ToString();
                LotsOutputBuffer.EAN = dt.Rows[i][colEan].ToString();

                // If the Value1 column is not in the input file, use the default value from the Windows form.
                if (value1bool)
                    LotsOutputBuffer.Value1 = Variables.Value1Var;
                else
                    LotsOutputBuffer.Value1 = dt.Rows[i][colValue1].ToString();

                // If the Value2 column is not in the input file, use the default value from the Windows form.
                if (value2bool)
                    LotsOutputBuffer.Value2 = Variables.Value2Var;
                else
                    LotsOutputBuffer.Value2 = dt.Rows[i][colValue2].ToString();

                // A buffer needs a row before its columns can be set, so add one to the Articles output too.
                ArticlesBuffer.AddRow();
                ArticlesBuffer.Name = dt.Rows[i][colName].ToString();

                // If the Color column is not in the input file, use the default value from the Windows form.
                if (colorbool)
                    ArticlesBuffer.Color = Variables.colorVar;
                else
                    ArticlesBuffer.Color = dt.Rows[i][colColor].ToString();
            }

        }

    }
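
The per-column flags above could also be folded into one small helper. The following is only a sketch of that idea, not part of the original component: `ColumnDefaults` and `ValueOrDefault` are hypothetical names, and the `Main` method with its sample table exists only to make the snippet runnable outside SSIS. Unlike the code above, it does not add the missing column to the DataTable; it simply falls back to the default when the column is absent.

```csharp
using System;
using System.Data;

public static class ColumnDefaults
{
    // Hypothetical helper (not an SSIS API): returns the cell as text when the
    // column exists in the row's table, otherwise the supplied default value.
    public static string ValueOrDefault(DataRow row, string columnName, string defaultValue)
    {
        return row.Table.Columns.Contains(columnName)
            ? row[columnName].ToString()
            : defaultValue;
    }

    public static void Main()
    {
        // Sample data standing in for the DataTable filled from the Excel file.
        var table = new DataTable();
        table.Columns.Add("EAN", typeof(string));
        table.Rows.Add("1234567890123");

        DataRow row = table.Rows[0];
        Console.WriteLine(ValueOrDefault(row, "EAN", "n/a"));     // column exists: prints the cell value
        Console.WriteLine(ValueOrDefault(row, "Color", "black")); // column missing: prints the default
    }
}
```

Inside the loop, an assignment such as `LotsOutputBuffer.Value1 = ColumnDefaults.ValueOrDefault(dt.Rows[i], "Value1", Variables.Value1Var);` would then replace each if/else pair. As in the original code, an empty cell in an existing column yields an empty string, not the default.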

OTHER TIPS

For your question: the Foreach container on its own will not handle this. Instead, here is another idea that should help you reach your goal.

1) In your first Data Flow task, use an Excel Source, add a Derived Column after it, and load the result into a physical staging table.

Note: This staging table will be dropped after your process is completed.

2) A second Data Flow reads the data from your staging table. There you can write a CASE statement or list the column names explicitly (SELECT COL1, COL12 FROM TABLE) instead of using (SELECT * FROM TABLE).

3) Add a destination, the same one you were using in your original task.

Licensed under: CC-BY-SA with attribution