Note that the 16-byte data type is not a "trivial 16-byte data type". The benchmark description states that:
The non-trivial data type is made of two longs and has very stupid assignment operator and copy constructor that just does some maths (totally meaningless but costly). One may argue that is not a common copy constructor neither a common assignment operator and one will be right, however, the important point here is that it is costly operators which is enough for this benchmark.
So this has nothing to do with 16 bytes being a magic number that makes list go slow. It's the deliberately costly copy constructor and assignment operator of the non-trivial type that make it go slow.
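To make that concrete, here is a minimal sketch of what such a type might look like. It is not the benchmark's actual code: `NonTrivial`, `busy`, `g_copies` and `copies_for_front_inserts` are hypothetical names, and the "costly maths" is just a placeholder loop. It counts how many times the expensive copy operations run when inserting at the front of a container.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical counter: how many costly copies have been performed.
std::size_t g_copies = 0;

// Hypothetical stand-in for the benchmark's non-trivial 16-byte type:
// two 64-bit integers plus deliberately expensive copy operations.
struct NonTrivial {
    std::int64_t a;
    std::int64_t b;

    NonTrivial(std::int64_t a_, std::int64_t b_) : a(a_), b(b_) {}

    // Copy constructor: does meaningless but costly work on every copy.
    NonTrivial(const NonTrivial& other) : a(busy(other.a)), b(busy(other.b)) {
        ++g_copies;
    }

    // Copy assignment: same idea.
    NonTrivial& operator=(const NonTrivial& other) {
        a = busy(other.a);
        b = busy(other.b);
        ++g_copies;
        return *this;
    }

    // Placeholder for "some maths": burns cycles and returns x unchanged.
    static std::int64_t busy(std::int64_t x) {
        volatile double sink = 0.0;
        for (int i = 1; i <= 50; ++i) {
            sink = sink + std::sqrt(static_cast<double>(i));
        }
        return x;
    }
};

// The size is the same 16 bytes whether or not the copy is expensive.
static_assert(sizeof(NonTrivial) == 16, "two 64-bit members, no padding");

// Inserts n elements at the front of a container and returns how many
// costly copies (construction or assignment) that required.
template <typename Container>
std::size_t copies_for_front_inserts(int n) {
    Container c;
    g_copies = 0;
    for (int i = 0; i < n; ++i) {
        c.insert(c.begin(), NonTrivial(i, i));
    }
    return g_copies;
}
```

Calling `copies_for_front_inserts<std::list<NonTrivial>>(n)` and `copies_for_front_inserts<std::vector<NonTrivial>>(n)` shows how often each container invokes the expensive operators for the same workload. Because the type declares its own copy operations, it has no implicit move operations, so every element a container relocates goes through the costly copy. The element size stays at 16 bytes throughout; only the cost per copy changes.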