@chux makes a good point about initializing the value of Yt
before using it. You might do something like this:
    for (k = 0; k < 4; ++k) {
        for (i = 0; i < 8; ++i) {
            for (j = 0; j < 8; ++j) {
                double temp = 0.0;
                for (l = 0; l < 8; ++l) {
                    temp += A[i][l] * Y8[l][j][k];
                }
                Yt[i][j][k] = temp;
            }
        }
    }
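This would allow the compiler to keep the running sum in a register and access Yt
(with its three indexing operations) just once per element, instead of on every pass
through the inner loop. It also means Yt never needs to be zeroed beforehand, since
each element is assigned rather than accumulated into.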
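
If you want to convince yourself that the two approaches give the same answer, here is a minimal sketch that puts them side by side. The function names (product_direct, product_local) and the constants N and K are made up for illustration. The sizes are only inferred from the loop bounds above, and I am assuming all three arrays hold doubles.

```c
enum { N = 8, K = 4 };  /* assumed sizes, inferred from the loop bounds */

/* Accumulates straight into Yt, so each element has to be zeroed first. */
void product_direct(double A[N][N], double Y8[N][N][K], double Yt[N][N][K])
{
    int i, j, k, l;
    for (k = 0; k < K; ++k) {
        for (i = 0; i < N; ++i) {
            for (j = 0; j < N; ++j) {
                Yt[i][j][k] = 0.0;  /* the initialization that is easy to forget */
                for (l = 0; l < N; ++l) {
                    Yt[i][j][k] += A[i][l] * Y8[l][j][k];
                }
            }
        }
    }
}

/* Accumulates into a local variable and stores to Yt once per element. */
void product_local(double A[N][N], double Y8[N][N][K], double Yt[N][N][K])
{
    int i, j, k, l;
    for (k = 0; k < K; ++k) {
        for (i = 0; i < N; ++i) {
            for (j = 0; j < N; ++j) {
                double temp = 0.0;  /* running sum, a natural register candidate */
                for (l = 0; l < N; ++l) {
                    temp += A[i][l] * Y8[l][j][k];
                }
                Yt[i][j][k] = temp;  /* single store, no prior zeroing needed */
            }
        }
    }
}
```

Calling both on the same A and Y8 should fill the two output arrays with matching values. Whether the second version is actually faster depends on your compiler and flags, so it is worth timing both before committing to one.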