You can always transform the data any way you want with a .NET client application. I just loaded data at 430 GB/hour with a trivial setup:
I spun up 8 .NET threads and had each one stream data via SqlBulkCopy in 10M-row batches, with each thread inserting into its own heap table. This is about the simplest possible setup. It ran on my quad-core (8-thread) Core i7 desktop with SQL Server on the same machine; SQL Server used about 50% of the CPU and my app used the other 50%, so throughput could easily be doubled by running the client on a second machine over a fast network.
This approach also lets you insert the final data directly into a target table (ideally a partitioned one, so each thread can load into its own partition).
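The setup above can be sketched roughly as follows. This is a hedged outline, not the exact code I ran: the connection string, table names, and the data source are placeholders you'd swap for your own.

```csharp
// Sketch: N threads, each streaming batches into its own heap table
// via SqlBulkCopy. TableLock on a heap enables minimally logged inserts.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class BulkLoader
{
    // Placeholder connection string; point this at your own server/database.
    const string ConnStr = "Server=.;Database=Staging;Integrated Security=true";
    const int ThreadCount = 8;
    const int BatchSize = 10_000_000; // 10M rows per batch, as described above

    static void Main()
    {
        Parallel.For(0, ThreadCount, t =>
        {
            using var conn = new SqlConnection(ConnStr);
            conn.Open();

            using var bulk = new SqlBulkCopy(conn,
                SqlBulkCopyOptions.TableLock, null)   // TABLOCK on each heap
            {
                DestinationTableName = $"dbo.Load_{t}", // one heap table per thread (hypothetical names)
                BatchSize = BatchSize,
                BulkCopyTimeout = 0                     // don't time out on large batches
            };

            // Stream rows via an IDataReader so nothing is materialized in memory.
            bulk.WriteToServer(GetSourceReader(t));
        });
    }

    // Placeholder: in practice this reads and transforms your source data.
    static IDataReader GetSourceReader(int thread) =>
        throw new NotImplementedException();
}
```

The key points are one `SqlBulkCopy` per thread, one destination heap per thread (avoiding contention on a shared B-tree), and streaming from an `IDataReader` rather than buffering a `DataTable`.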