RhinoETL - Join two tables as input, write to two tables on output

https://stackoverflow.com/questions/4442210

10-10-2019

Question

I am writing an ETL job in C#, using Rhino ETL.

I have a database on ServerA. This has 2 tables:

(example)

tblOrder

OrderID
CustomerName
CustomerEmailAddress
Transferred

tblOrderLine

OrderID
ProductID
ProductName
Price

ServerB has identical tables (orders are transferred from the web database to our backend system).

Using RhinoETL, my InputCommandOperation currently looks like:

class ReadOrdersFromServerA : InputCommandOperation
{
    public ReadOrdersFromServerA(ConnectionStringSettings connectionStringSettings)
        : base(connectionStringSettings) { }

    protected override Row CreateRowFromReader(IDataReader reader)
    {
        return Row.FromReader(reader);
    }

    protected override void PrepareCommand(IDbCommand cmd)
    {
        cmd.CommandText = "SELECT TOP 10 * FROM tblOrder WHERE Transferred = 0";
    }
}

Since there are no transforms to do at this stage, my OutputCommandOperation will look like this:

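class WriteOrdersToServerB : OutputCommandOperation
{
    // Constructor mirrors the input operation above; the connection settings are passed to the base class.
    public WriteOrdersToServerB(ConnectionStringSettings connectionStringSettings)
        : base(connectionStringSettings) { }

    protected override void PrepareCommand(IDbCommand cmd, Row row)
    {
        cmd.CommandText =
@"INSERT INTO etc...........";
    }
}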

What I want to do is modify this process to also get the tblOrderLine detail from ServerA, if possible without running a second query against the database (that is, by using a join). I'm keen to avoid having a "Transferred" column on the tblOrderLine table, and would prefer to modify the InputCommand to include a join.

How does the insert operation work after having a Join in the InputCommand? Is this even possible?

Solution

My understanding is that you have two tables that you wish to copy from ServerA to ServerB and that, if possible, you want to merge the two tables from ServerA into a single result set and then split it back into two tables on ServerB.

If the relation between tblOrder and tblOrderLine is one-to-many, simply forget about JOINing them. A join repeats the tblOrder columns on every order line, which creates redundant data. In addition, SELECT TOP n then limits the number of joined rows rather than the number of orders, so some rows from tblOrderLine will be left out and an order can arrive with only part of its lines. If the relation between the two tables is one-to-one, a join is possible, but I'm not sure it would be more efficient than querying the two tables individually.
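To make the TOP n problem concrete, here is a minimal in-memory sketch in C# (LINQ, not Rhino ETL and not a real database). The class `JoinTopDemo`, the method `TopOfJoin`, and the tuple shapes are hypothetical stand-ins for tblOrder and tblOrderLine; `Take(n)` plays the role of `SELECT TOP n` applied to the joined result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinTopDemo
{
    // Hypothetical helper: joins orders to their lines, then keeps the first n JOINED rows.
    // This mimics SELECT TOP n over a JOIN: the limit counts joined rows, not orders.
    public static List<(int OrderID, string CustomerName, int ProductID)> TopOfJoin(
        IEnumerable<(int OrderID, string CustomerName)> orders,
        IEnumerable<(int OrderID, int ProductID)> lines,
        int n)
    {
        return orders
            .Join(lines,
                  o => o.OrderID,
                  l => l.OrderID,
                  // CustomerName is repeated once per order line (the redundant data).
                  (o, l) => (o.OrderID, o.CustomerName, l.ProductID))
            .Take(n)
            .ToList();
    }
}
```

With, say, four orders of three lines each and n = 10, the first three orders come through complete and the fourth is cut off partway, which is the situation the answer warns about. In a real database the rows that survive also depend on the ORDER BY clause, or are unspecified without one.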

You can avoid using a Transferred flag on tblOrderLine by saving the OrderIDs you extract from tblOrder into a list and then querying tblOrderLine for those specific OrderIDs.

SELECT TOP 10 * 
FROM tblOrder
WHERE Transferred = 0

Save the list of OrderID values found in this result and query tblOrderLine with it.

SELECT *
FROM tblOrderLine
WHERE OrderID IN (/* list of saved OrderIDs */)
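One way to build that second query from the saved OrderIDs is sketched below, assuming the IDs are integers. The class `OrderLineQuery`, the method `Build`, and the `@id0`, `@id1`, ... parameter names are hypothetical and not part of Rhino ETL. The helper only produces the SQL text and the name/value pairs, using parameters rather than concatenating the values into the string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderLineQuery
{
    // Hypothetical helper: returns the SELECT for tblOrderLine with one
    // named parameter per distinct OrderID, plus the parameter name/value pairs.
    public static (string Sql, List<KeyValuePair<string, int>> Parameters) Build(
        IEnumerable<int> orderIds)
    {
        var ids = orderIds.Distinct().ToList();
        if (ids.Count == 0)
            throw new ArgumentException("At least one OrderID is required.", nameof(orderIds));

        // Pair each ID with a generated parameter name: @id0, @id1, ...
        var parameters = ids
            .Select((id, i) => new KeyValuePair<string, int>("@id" + i, id))
            .ToList();

        var sql = "SELECT * FROM tblOrderLine WHERE OrderID IN ("
                  + string.Join(", ", parameters.Select(p => p.Key))
                  + ")";

        return (sql, parameters);
    }
}
```

In a second input operation, the returned text could be assigned to cmd.CommandText inside PrepareCommand, with one parameter created per pair via cmd.CreateParameter() and added to cmd.Parameters. How the list of OrderIDs is handed from the first operation to the second is left open here.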

Licensed under: CC-BY-SA with attribution