RhinoETL - Join two tables as input, write to two tables on output
Question
I am writing an ETL job in C#, using Rhino ETL.
I have a database on ServerA. This has 2 tables:
(example)
tblOrder
- OrderID
- CustomerName
- CustomerEmailAddress
- Transferred
tblOrderLine
- OrderID
- ProductID
- ProductName
- Price
ServerB has an identical pair of tables (orders are transferred from the web database to our backend system).
Using RhinoETL, my InputCommandOperation currently looks like:
class ReadOrdersFromServerA : InputCommandOperation
{
    public ReadOrdersFromServerA(ConnectionStringSettings connectionStringSettings)
        : base(connectionStringSettings) { }

    // Convert each record returned by the query into a Rhino ETL Row.
    protected override Row CreateRowFromReader(IDataReader reader)
    {
        return Row.FromReader(reader);
    }

    // Select a batch of orders that have not been transferred yet.
    protected override void PrepareCommand(IDbCommand cmd)
    {
        cmd.CommandText = "SELECT TOP 10 * FROM tblOrder WHERE Transferred = 0";
    }
}
Since there are no transforms to do at this stage, my OutputCommandOperation will look like this:
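class WriteOrdersToServerB : OutputCommandOperation
{
    public WriteOrdersToServerB(ConnectionStringSettings connectionStringSettings)
        : base(connectionStringSettings) { }

    protected override void PrepareCommand(IDbCommand cmd, Row row)
    {
        cmd.CommandText =
            @"INSERT INTO etc...........";
    }
}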
What I want to do is modify this process to also get the tblOrderLine detail from ServerA, if possible without running a second query against the database (i.e. by using a join). I'm keen to avoid having a "Transferred" column on the tblOrderLine table, and would prefer to modify the InputCommand to include a join.
How does the insert operation work after having a Join in the InputCommand? Is this even possible?
Solution
My understanding is that you have two tables that you want to copy from ServerA to ServerB and that, if possible, you want to merge them into one result set when reading from ServerA and split them back into two tables when writing to ServerB.
If the relation between tblOrder and tblOrderLine is one-to-many, simply forget about JOINing them. The join repeats the order columns on every order line, and SELECT TOP n then limits the number of joined rows rather than the number of orders, so some rows from tblOrderLine will be left out. If the relation between the two tables is one-to-one, a join is possible, but I'm not sure it would be more efficient than querying the two tables individually.
You can avoid a Transferred flag on tblOrderLine by saving the OrderIDs you extract from tblOrder into a list and then querying tblOrderLine for those specific OrderIDs.
SELECT TOP 10 *
FROM tblOrder
WHERE Transferred = 0
Save the list of OrderID values found in this result and query tblOrderLine with it.
SELECT *
FROM tblOrderLine
WHERE OrderID IN (/* list of saved OrderIDs */)
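As a rough sketch of that second query, the IN list can be built with one parameter per saved OrderID instead of concatenating the values into the SQL text. This uses only the plain ADO.NET interfaces; OrderLineQuery, BuildSql and Prepare are hypothetical names of my own (not part of Rhino ETL), and I'm assuming OrderID is an int.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of Rhino ETL.
static class OrderLineQuery
{
    // Builds the query text with one named parameter per saved OrderID.
    public static string BuildSql(int idCount)
    {
        if (idCount < 1)
            throw new ArgumentOutOfRangeException(nameof(idCount), "At least one OrderID is required.");

        IEnumerable<string> names = Enumerable.Range(0, idCount).Select(i => "@id" + i);
        return "SELECT * FROM tblOrderLine WHERE OrderID IN (" + string.Join(", ", names) + ")";
    }

    // Sets the query text on cmd and binds each OrderID (assumed to be an int) to its parameter.
    public static void Prepare(IDbCommand cmd, IReadOnlyList<int> orderIds)
    {
        cmd.CommandText = BuildSql(orderIds.Count);
        for (int i = 0; i < orderIds.Count; i++)
        {
            IDbDataParameter parameter = cmd.CreateParameter();
            parameter.ParameterName = "@id" + i;
            parameter.Value = orderIds[i];
            cmd.Parameters.Add(parameter);
        }
    }
}
```

Calling OrderLineQuery.Prepare(cmd, savedIds) from the PrepareCommand override of a second InputCommandOperation should then read only the lines that belong to the orders picked up by the first query. How you collect savedIds between the two reads is up to you.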