Frage

I'm writing a compiler that's generating LLVM IR instructions. I'm working extensively with vectors.

I would like to be able to sum all the elements in a vector. Right now I'm just extracting each element individually and adding them up manually, but it strikes me that this is precisely the sort of thing that the hardware should be able to help with (as it sounds like a pretty common operation). But there doesn't seem to be an intrinsic to do it.

What's the best way to do this? I'm using LLVM 3.2.

War es hilfreich?

Lösung

First of all, even without using intrinsics, you can generate log(n) vector additions (with n being vector length) instead of n scalar additions, here's an example with vector size 8:

define i32 @sum(<8 x i32> %a) {
  %v1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %v2 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %sum1 = add <4 x i32> %v1, %v2
  %v3 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
  %v4 = shufflevector <4 x i32> %sum1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
  %sum2 = add <2 x i32> %v3, %v4
  %v5 = extractelement <2 x i32> %sum2, i32 0
  %v6 = extractelement <2 x i32> %sum2, i32 1
  %sum3 = add i32 %v5, %v6
  ret i32 %sum3
}

If your target has support for these vector additions then it seems highly likely the above will be lowered to use those instructions, giving you performance.

Regarding intrinsics, there are no target-independent intrinsics to handle this. If you're compiling to x86, though, you do have access to the hadd instrinsics (e.g. llvm.x86.int_x86_ssse3_phadd_sw_128 to add two <4 x i32> vectors together). You'll still have to do something similar to the above, only the add instructions could be replaced.

For more information about this you can search for "horizontal sum" or "horizontal vector sum"; for instance, here are some relevant stackoverflow questions for a horizontal sum on x86:

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top