The compiler is just trying to follow the calling convention as specified by the System V Application Binary Interface AMD64 Architecture Processor Supplement, section 3.2.3 Parameter Passing.
The relevant points are:
We first define a number of classes to classify arguments. The
classes are corresponding to AMD64 register classes and defined as:
SSE The class consists of types that fit into a vector register.
SSEUP The class consists of types that fit into a vector register and can
be passed and returned in the upper bytes of it.
The size of each argument gets rounded up to eightbytes.
The basic types are assigned their natural classes:
Arguments of types float, double, _Decimal32, _Decimal64 and __m64 are
in class SSE.
The classification of aggregate (structures and arrays) and union types
works as follows:
If the size of the aggregate exceeds a single eightbyte, each is
classified separately.
Applying the above rules means that the x, y and z, w pairs of the embedded struct each occupy their own eightbyte and are classified separately as SSE class, which in turn means they must be passed in two separate registers. The presence of the m member has no effect in this case; you can even delete it.
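The eightbyte split can be checked with a small sketch. The struct below is a hypothetical stand-in for the embedded struct (four float members named x, y, z, w), and the helpers eightbyte_count and eightbyte_index are made up for illustration; they only do the size and offset arithmetic behind the rule, not the full ABI classification.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical stand-in for the embedded struct: four floats, 16 bytes. */
struct vec4 {
    float x, y, z, w;
};

/* Number of eightbytes an argument of `size` bytes occupies after rounding up. */
static size_t eightbyte_count(size_t size) {
    return (size + 7) / 8;
}

/* Index of the eightbyte that contains the byte at `offset`. */
static size_t eightbyte_index(size_t offset) {
    return offset / 8;
}

/* These hold on x86-64 System V, where float is 4 bytes with 4-byte alignment. */
static_assert(sizeof(struct vec4) == 16, "four floats span two eightbytes");
static_assert(offsetof(struct vec4, z) == 8, "z starts the second eightbyte");
```

Under these assumptions, eightbyte_count(sizeof(struct vec4)) is 2, x and y fall in eightbyte 0, and z and w fall in eightbyte 1. Both eightbytes contain only floats, so both are class SSE and each goes in its own vector register.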