Domanda

I need to develop a SSIS Package where I will need to import/use a flat file(has only 1 column) to compare each row against existing table's 2 columns(start and end column).

Flat File data -

110000
111112
111113
112222
112525
113222
113434
113453
114343
114545

And compare each row of the flat file against structure/data -

id  start   end
8   110000  119099
8   119200  119999
3   200000  209999
3   200000  209999
2   300000  300049
2   770000  779999
2   870000  879999

Now, If need to implement this in a simple stored procedure that would farely simple, however I am not able to get my head around this if I have do it in SSIS package.

Any ideas? Any help much appreciated.

È stato utile?

Soluzione

At the core, you will need to use a Lookup Component. Write a query, SELECT T.id, T.start, T.end FROM dbo.MyTable AS T and use that as your source. Map the input column to the start column and select the id so that it will be added to the data flow.

If you hit run, it will perform an exact lookup and only find values of 110000 and 119200. To convert it to a range query, you will need to go into the Advanced tab. There should be 3 things you can check: amount of memory, rows and customize the query. When you check the last one, you should get a query like

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref 
WHERE ref.start = ?

You will need to modify that to become

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref 
WHERE ? BETWEEN ref.start AND ref.end

It's been my experience that the range queries can become rather inefficient as it seems to cache what's been seen already so if the source file had 110001, 110002, 110003 you would see 3 unique queries sent to the database. For small data sets, that may not be so bad but it led to some ugly load times for my DW.

An alternative to this is to explode the ranges. For me, I had a source system that only kept date ranges and I needed to know by day what certain counts were. The range lookups were not performing well so I crafted a query to convert the single row with a range of 2010-01-01 to 2013-07-07 to many rows, each with a single date 2013-01-01, 2013-01-02... While this approach lead to a longer pre-execute phase (it took a few minutes as the query had to generate ~30k rows per day for the past 5 years), once cached locally it was a simple seek to find a given transaction by day.

Preferably, I'd create a numbers table, fill it to the max of int and be done with it but you might get by with just using an inline table valued function to generate numbers. Your query would then look something like

SELECT
    T.id
,   GN.number 
FROM 
    dbo.MyTable AS T
    INNER JOIN
        -- Make this big enough to satisfy your theoretical ranges
        dbo.GenerateNumbers(1000000) AS GN
        ON GN.number BETWEEN T.start and T.end;

That would get used in a "straight" lookup without the need for any of the advanced features. The lookup is going to get very memory hungry though so make the query as tight as possible. For example, cast the GN.number from a bigint to an int in the source query if you know your values will fit in an int.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top