Thanks to "indeterminately sequenced" and "oakad", the above makes sense. To confirm my understanding, I ran the tests below, which support your explanations:
In the task thread, I removed the cout code and instead wrote the calculated factorial to a new file with an arbitrary name. This takes away the shared resource (output to the screen). As a result, T1 was around 10 times less than T2. I also did a quick test where every thread writes to the same file after acquiring a lock on it; this resulted in T1 > T2, because the file is again a shared resource.
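For anyone who wants to reproduce this, here is a rough sketch of the two file-writing variants. The function names, the file names, and the factorial helper are placeholders I made up for illustration, not the exact code from the tests:

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Placeholder factorial; overflows unsigned long long for n > 20.
unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) result *= i;
    return result;
}

// Variant A: every thread writes to its own file, so nothing is shared.
void run_separate_files(int thread_count, unsigned n) {
    assert(thread_count > 0);
    std::vector<std::thread> threads;
    for (int id = 0; id < thread_count; ++id) {
        threads.emplace_back([id, n] {
            // Hypothetical file name; any unique name per thread works.
            std::ofstream out("factorial_" + std::to_string(id) + ".txt");
            out << factorial(n) << '\n';
        });
    }
    for (auto& t : threads) t.join();
}

// Variant B: all threads write to one file, serialized by a mutex.
void run_shared_file(int thread_count, unsigned n, const std::string& path) {
    assert(thread_count > 0);
    std::ofstream out(path);
    std::mutex file_mutex;
    std::vector<std::thread> threads;
    for (int id = 0; id < thread_count; ++id) {
        threads.emplace_back([&out, &file_mutex, n] {
            unsigned long long value = factorial(n);  // computed outside the lock
            std::lock_guard<std::mutex> lock(file_mutex);  // only one writer at a time
            out << value << '\n';
        });
    }
    for (auto& t : threads) t.join();
}
```

Calling `run_separate_files(8, 20)` and `run_shared_file(8, 20, "shared.txt")` under a timer should show the difference: in variant B the threads queue up on the mutex, so the writes happen one after another.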
I removed all the cout and file I/O and, instead of calculating the factorial, ran an empty loop a million times in each task thread. This resulted in T1 far less than T2, because none of the 8 threads share a resource.
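The no-shared-resource case can be sketched roughly as follows. This assumes T1 is the wall-clock time with the tasks in separate threads and T2 is the time for the same tasks run one after another; the function names are hypothetical and the loop is only a stand-in for the one in the test:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for the "blind loop": no I/O and no shared state.
// The volatile counter keeps the compiler from removing the loop.
void busy_loop(std::uint64_t iterations) {
    volatile std::uint64_t counter = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        counter = counter + 1;
    }
}

// Runs task_count loops in separate threads; returns elapsed seconds.
double time_threaded(int task_count, std::uint64_t iterations) {
    assert(task_count > 0);
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < task_count; ++i) {
        threads.emplace_back(busy_loop, iterations);
    }
    for (auto& t : threads) t.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Runs the same loops one after another; returns elapsed seconds.
double time_sequential(int task_count, std::uint64_t iterations) {
    assert(task_count > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < task_count; ++i) {
        busy_loop(iterations);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Comparing `time_threaded(8, 1000000)` with `time_sequential(8, 1000000)` gives the two timings. With so few iterations the thread start-up cost can dominate, so a larger count may be needed to see a clear gap on a multi-core machine.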
If you find that I am mistaken, or that something needs more clarification, please feel free to add to this.