Just multiply pix_vals by the mask and sum. So just add:
p[y * stride + x] = pix_val[0] * Gx[0] + ... + pix_val[8] * Gx[8];
EDIT: Watch out for the corner cases though, you should really change your offsets to [-1, 0, 1] instead of [0, 1, 2] and handle the boundary conditions.