It is to detect half open connections: http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
The other side could be sending you data, but unable to get your data. So being able of interleave pings and pongs, it is possible to check that at least the other end can understand your messages and reply to them.
It does not make it much more complex. You have to read delimited frames anyway, when you find a control frame, take action and continue reading more frames.
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#ping-and-pong-frames
.3.4 Ping and Pong frames
The WebSocket protocol specification defines Ping and Pong frames that can be used for keep-alive, heart-beats, network status probing, latency instrumentation, and so forth. These are not currently exposed in the API.
User agents may send ping and unsolicited pong frames as desired, for example in an attempt to maintain local network NAT mappings, to detect failed connections, or to display latency metrics to the user. User agents must not use pings or unsolicited pongs to aid the server; it is assumed that servers will solicit pongs whenever appropriate for the server's needs.