Yes: the compiler would not be able to inline the virtual call, because it cannot know at compile time whether d1 or d2 will be used, which leaves two possible functions to inline. In addition, because the call is virtual, each call may carry extra overhead for the vtable lookup.
My recommendation, if you want to try to optimize this yourself, would be to write something similar to this instead:
if ((__rdtsc() & 1) != 0) {  // parenthesized, because != binds tighter than &
    for (unsigned long long i = 0; i < 9000000000ULL; ++i) {
        sum += d1[0].process2();
    }
}
else {
    for (unsigned long long i = 0; i < 9000000000ULL; ++i) {
        sum += d2[0].process2();
    }
}
Even so, the compiler may still be unable to optimize this if process2 is a virtual call, and inlining is never guaranteed.
All in all, virtual calls generally add overhead, and if clock cycles matter, it may be worth avoiding them. You could look into static polymorphism (commonly implemented with the curiously recurring template pattern, CRTP), which loses some flexibility but can move costs from run time to compile time.
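To make the idea of static polymorphism concrete, here is a minimal CRTP sketch. The `process2_impl` name, the `call_both` helper, and the return values are assumptions made up for illustration; only `Parent`, `Child1`, `Child2`, and `process2` echo names used in this answer.

```cpp
#include <cstdint>

// CRTP base: the derived type is a template parameter, so process2()
// forwards to the child with an ordinary (non-virtual) call.
template <typename Child>
struct Parent {
    std::uint64_t process2() {
        return static_cast<Child*>(this)->process2_impl();
    }
};

// Hypothetical children; the returned values are placeholders.
struct Child1 : Parent<Child1> {
    std::uint64_t process2_impl() { return 1; }
};

struct Child2 : Parent<Child2> {
    std::uint64_t process2_impl() { return 2; }
};

// Hypothetical helper: each call target is known at compile time,
// so the compiler is free to inline both calls (it is not obliged to).
std::uint64_t call_both() {
    Child1 c1;
    Child2 c2;
    return c1.process2() + c2.process2();
}
```

The flexibility you give up is that `Parent<Child1>` and `Parent<Child2>` are unrelated types, so you cannot store both behind a single base pointer the way you can with virtual functions.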
Edit in response to user997112 below:
Static polymorphism doesn't work for exactly the situation described above, but it could be used to simplify my example a bit by putting the for loop in a function:
template <typename Child>
void iterate_a_bunch(Parent<Child> &f)
{
    for (unsigned long long i = 0; i < 9000000000ULL; ++i) {
        f.process2();
    }
}
This function would be compiled twice, once for Child1 and once for Child2, leading to larger code size but potentially faster run times.
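Putting the pieces together, the sketch below shows one way the loop function might be used with the branch hoisted outside the loop. It is only an illustration under assumptions: the `iterations` parameter, the returned sum, the `run` helper, and `process2_impl` are hypothetical additions, and the children return placeholder values.

```cpp
#include <cstdint>

template <typename Child>
struct Parent {
    std::uint64_t process2() {
        return static_cast<Child*>(this)->process2_impl();
    }
};

// Hypothetical children with placeholder results.
struct Child1 : Parent<Child1> {
    std::uint64_t process2_impl() { return 1; }
};
struct Child2 : Parent<Child2> {
    std::uint64_t process2_impl() { return 2; }
};

// One instantiation is generated per Child type that is actually used.
template <typename Child>
std::uint64_t iterate_a_bunch(Parent<Child>& f, std::uint64_t iterations) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sum += f.process2();
    }
    return sum;
}

// The runtime decision is made once, outside the loop; each branch then
// calls a loop whose call target is fixed at compile time.
std::uint64_t run(bool use_first, std::uint64_t iterations) {
    Child1 d1;
    Child2 d2;
    return use_first ? iterate_a_bunch(d1, iterations)
                     : iterate_a_bunch(d2, iterations);
}
```

For example, `run(true, 10)` sums ten calls on the `Child1` object and `run(false, 10)` sums ten calls on the `Child2` object. Whether this is actually faster than the virtual version depends on the compiler and optimization settings, so it is worth measuring.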