Answered by Evgueni Petrov in the Intel forums:
// A, B, C, D are __m512d inputs (8 doubles each).
// Stage 1: within each 256-bit half, combine 128-bit pairs of A with B, and of C with D.
__m512i a1 = (__m512i)_mm512_mask_blend_pd(0x33, B, _mm512_swizzle_pd(A, _MM_SWIZ_REG_BADC));
__m512i a0 = (__m512i)_mm512_mask_blend_pd(0xcc, A, _mm512_swizzle_pd(B, _MM_SWIZ_REG_BADC));
__m512i a3 = (__m512i)_mm512_mask_blend_pd(0x33, D, _mm512_swizzle_pd(C, _MM_SWIZ_REG_BADC));
__m512i a2 = (__m512i)_mm512_mask_blend_pd(0xcc, C, _mm512_swizzle_pd(D, _MM_SWIZ_REG_BADC));
// Stage 2: exchange 256-bit halves between a0/a2 and between a1/a3.
// alignr by 8 dwords with both sources equal rotates the register by half its width.
__m512d C_new = (__m512d)_mm512_mask_alignr_epi32(a2, 0x00ff, a0, a0, 8);
__m512d A_new = (__m512d)_mm512_mask_alignr_epi32(a0, 0xff00, a2, a2, 8);
__m512d D_new = (__m512d)_mm512_mask_alignr_epi32(a3, 0x00ff, a1, a1, 8);
__m512d B_new = (__m512d)_mm512_mask_alignr_epi32(a1, 0xff00, a3, a3, 8);
As of this writing, the _mm512_mask_blend_pd() intrinsic is not documented in the Intel C++ User Guide, though that omission should be corrected soon. The intrinsic is declared in the "zmmintrin.h" header file.
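To check the data movement without hardware that supports these intrinsics, the sequence above can be modelled in plain C. The following is only a sketch based on my reading of the intrinsics' semantics, not Intel's code: the type v8d and the helpers swizzle_badc, mask_blend, mask_rotate_halves and transpose_pairs are hypothetical names introduced for illustration, and mask_rotate_halves only covers the special case used above (both alignr sources equal, count of 8).

```c
#include <assert.h>

/* Hypothetical scalar stand-in for a 512-bit register of 8 doubles. */
typedef struct { double e[8]; } v8d;

/* Models _mm512_swizzle_pd(v, _MM_SWIZ_REG_BADC): within each group of
   four doubles, the low pair and the high pair trade places. */
static v8d swizzle_badc(v8d v) {
    v8d r;
    for (int i = 0; i < 8; i++) r.e[i] = v.e[i ^ 2];
    return r;
}

/* Models _mm512_mask_blend_pd(k, a, b): element i is taken from b when
   bit i of k is set, otherwise from a. */
static v8d mask_blend(unsigned k, v8d a, v8d b) {
    v8d r;
    for (int i = 0; i < 8; i++) r.e[i] = ((k >> i) & 1) ? b.e[i] : a.e[i];
    return r;
}

/* Models _mm512_mask_alignr_epi32(src, k, x, x, 8) only: with both
   sources equal and a count of 8 dwords, the shift is a rotation that
   swaps the 256-bit halves of x. The 16-bit mask k is per 32-bit
   element, so each double is covered by two mask bits. */
static v8d mask_rotate_halves(v8d src, unsigned k, v8d x) {
    v8d r;
    for (int i = 0; i < 8; i++) {
        unsigned lo = (k >> (2 * i)) & 1;
        unsigned hi = (k >> (2 * i + 1)) & 1;
        assert(lo == hi); /* this model assumes whole doubles are selected */
        r.e[i] = lo ? x.e[(i + 4) % 8] : src.e[i];
    }
    return r;
}

/* Same sequence of steps as the intrinsics version, updating in place. */
static void transpose_pairs(v8d *A, v8d *B, v8d *C, v8d *D) {
    v8d a1 = mask_blend(0x33, *B, swizzle_badc(*A));
    v8d a0 = mask_blend(0xcc, *A, swizzle_badc(*B));
    v8d a3 = mask_blend(0x33, *D, swizzle_badc(*C));
    v8d a2 = mask_blend(0xcc, *C, swizzle_badc(*D));
    v8d C_new = mask_rotate_halves(a2, 0x00ff, a0);
    v8d A_new = mask_rotate_halves(a0, 0xff00, a2);
    v8d D_new = mask_rotate_halves(a3, 0x00ff, a1);
    v8d B_new = mask_rotate_halves(a1, 0xff00, a3);
    *A = A_new; *B = B_new; *C = C_new; *D = D_new;
}
```

Under this model, if A holds 100..107, B holds 200..207, C holds 300..307 and D holds 400..407, then after transpose_pairs the register A holds {100, 101, 200, 201, 300, 301, 400, 401}. In other words, the four registers are treated as a 4x4 matrix of 128-bit pairs of doubles, and that matrix is transposed.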