The problem was actually not with x264. Assuming x264 was fine, I checked every other piece of the pipeline. The real cause: I was sending each NAL unit over the network to the avcodec decoder individually, which is exactly what the decoder can't handle (explanation). It took me a while to figure out.
Once I regrouped the NAL units into their original groups, so that all NALs derived from the same frame reach the decoder together, the problem disappeared. This also explains why the problem was easier to reproduce with a combination of small NALs and a lot of movement: that combination produced many separate NALs per frame, which avcodec could not decode properly when they arrived one at a time.
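
For illustration, the regrouping can be sketched roughly as follows. This is a simplified heuristic, not the full H.264 access unit boundary rules and not the code from the original pipeline. It assumes an Annex B byte stream (NALs separated by start codes), no data partitioning, and slices sent in order, so that a slice whose `first_mb_in_slice` is 0 marks the start of a new frame. The function names `split_annexb`, `group_by_frame`, and `to_packet` are hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nals = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code leaves a zero byte at the end of the previous
        # NAL; a NAL unit never ends in 0x00, so stripping zeros is safe.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def is_first_slice(nal: bytes) -> bool:
    """True for a slice NAL (type 1 or 5) that starts at macroblock 0."""
    nal_type = nal[0] & 0x1F
    # first_mb_in_slice is the first ue(v) field of the slice header;
    # the value 0 is coded as a single '1' bit.
    return nal_type in (1, 5) and len(nal) > 1 and bool(nal[1] & 0x80)


def group_by_frame(nals: List[bytes]) -> List[List[bytes]]:
    """Group NAL units so that each group holds the NALs of one frame."""
    groups, current, seen_slice = [], [], False
    for nal in nals:
        is_slice = (nal[0] & 0x1F) in (1, 5)
        # Assumption: once a frame's slices have been seen, any non-slice
        # NAL (SPS, PPS, SEI, AUD) or a slice starting at macroblock 0
        # belongs to the next frame.
        if seen_slice and (not is_slice or is_first_slice(nal)):
            groups.append(current)
            current, seen_slice = [], False
        current.append(nal)
        seen_slice = seen_slice or is_slice
    if current:
        groups.append(current)
    return groups


def to_packet(group: List[bytes]) -> bytes:
    """Re-join one frame's NALs into a single Annex B buffer."""
    return b"".join(b"\x00\x00\x00\x01" + nal for nal in group)
```

Each buffer returned by `to_packet` then holds one complete frame and can be handed to the decoder as a single packet, instead of one packet per NAL.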