I would break this up into three loops:
float t1[9];
float t2[9];

// (1) - gather input terms
for (int i = 0; i < 9; ++i)
    t1[i] = x1(a1 + a0 * i);

// (2) - do expensive log/fabs operations, with minimum redundancy
for (int i = 0; i < 9; ++i)
    t2[i] = std::log(std::fabs(t1[i]));  // std::fabs keeps the float overload

// (3) - wrap it all up
for (int i = 1; i < 9; ++i)
    b2 += a0 * 0.5f * (t2[i - 1] + t2[i]);
I suspect that (1) may not be vectorizable (unless you have AVX2 with its gather loads), but (2) and (3) have a reasonable chance.
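For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same three-loop split. The function name `log_abs_trapezoid` and the identity body of `x1` are assumptions made purely for illustration; substitute your real `x1` and accumulate into your own `b2`.

```cpp
#include <cmath>

// Hypothetical stand-in for the question's x1(); replace with the real one.
inline float x1(float x) { return x; }

// Trapezoidal sum of log|x1(a1 + a0*i)| for i = 0..8, using the three-loop
// split. The function name and signature are assumptions for this sketch.
float log_abs_trapezoid(float a0, float a1)
{
    constexpr int N = 9;
    float t1[N];
    float t2[N];

    // (1) gather input terms
    for (int i = 0; i < N; ++i)
        t1[i] = x1(a1 + a0 * static_cast<float>(i));

    // (2) expensive log/fabs, computed once per sample point
    for (int i = 0; i < N; ++i)
        t2[i] = std::log(std::fabs(t1[i]));

    // (3) combine neighbouring terms into the trapezoidal sum
    float b2 = 0.0f;
    for (int i = 1; i < N; ++i)
        b2 += a0 * 0.5f * (t2[i - 1] + t2[i]);

    return b2;
}
```

With the identity `x1`, `log_abs_trapezoid(0.125f, 1.0f)` approximates the integral of log(x) over [1, 2], i.e. roughly 2·ln 2 − 1. To see which loops were actually vectorized, check the compiler's vectorization report (e.g. `-fopt-info-vec` on GCC or `-Rpass=loop-vectorize` on Clang). Note that loop (3) is a floating-point reduction, which compilers will usually only vectorize if you allow reassociation (e.g. `-ffast-math`).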