Yep, Thrift is used.
"it is impossible to do bulk loading with good performance via CQL"
Not really true. It's just that the functionality has already been implemented on top of Thrift, and there's no reason to re-implement it in CQL because Thrift won't be getting dropped (which allows for good backwards compatibility).
To sum it up: the sstables are read in, a Thrift client is created for streaming that data, and then a LoaderFuture task is created to coordinate the streaming.
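If it helps to picture that flow, here is a rough Python sketch of the three steps (read the tables, create a client, hand back a future that coordinates the streaming). Every name in it (`SSTable`, `FakeClient`, `bulk_load`) is made up for illustration; this is not Cassandra's actual API, just a toy model of the shape of the process.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# All names here are hypothetical; this models the flow, not Cassandra's code.
@dataclass
class SSTable:
    name: str
    rows: list


class FakeClient:
    """Stand-in for the client that ships data to the cluster."""

    def __init__(self):
        self.received = []

    def stream(self, sstable):
        # Pretend to send every row of one sstable; return how many were sent.
        self.received.extend(sstable.rows)
        return len(sstable.rows)


def bulk_load(sstables, client):
    """Start streaming in the background and return a future for the whole load."""
    executor = ThreadPoolExecutor(max_workers=1)

    def run():
        # Stream the tables one after another and total up the rows sent.
        return sum(client.stream(table) for table in sstables)

    future = executor.submit(run)
    # Already-submitted work still runs; this only stops new submissions.
    executor.shutdown(wait=False)
    return future


if __name__ == "__main__":
    tables = [SSTable("ks-cf-1", ["a", "b"]), SSTable("ks-cf-2", ["c"])]
    client = FakeClient()
    future = bulk_load(tables, client)
    # Block until streaming is done, then report the number of rows sent.
    print(future.result())
```

The caller gets the future back right away and only blocks when it asks for the result, which is roughly the role the LoaderFuture plays in the description above.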