The most common simple handshake for a single clock domain is the valid/ready handshake.
If 'X' is sending to 'Y', then X has outputs data and valid, and Y has an output ready.
When X has data to send, it asserts valid and watches ready. If valid and ready are both high on a rising clock edge, X considers the data sent and Y considers it received.
The advantage of this scheme is that it can move one data word per clock cycle with no idle cycles in between. If valid is still high on the cycle after a transfer (valid and ready both high), that is a second, new data word, not a repeat of the first.
There is also no requirement that Y wait to see valid before asserting ready; Y can assert ready whenever it is able to accept data.
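To make the timing concrete, here is a minimal cycle-level sketch in Python. It is a model, not RTL: the function name simulate_valid_ready and the ready_at stimulus function are hypothetical, and it assumes (as in AXI-style streams) that X holds data and valid stable until the word is accepted. Each loop iteration stands for one rising clock edge.

```python
def simulate_valid_ready(words, ready_at, max_cycles=1000):
    """Cycle-level model of a valid/ready link from X (sender) to Y (receiver).

    words:    data words X wants to send, in order.
    ready_at: hypothetical stimulus, cycle -> bool, giving Y's ready output.
              Y may raise ready whether or not valid is high.
    Returns (received, cycles): the words Y accepted and the clock edges used.
    """
    received = []
    idx = 0  # index of the word X is currently presenting on `data`
    for cycle in range(max_cycles):
        valid = idx < len(words)      # X asserts valid whenever it has a word
        if not valid:
            return received, cycle    # everything sent after `cycle` edges
        data = words[idx]             # X holds data (and valid) until accepted
        ready = ready_at(cycle)       # Y's ready for this cycle
        # Rising clock edge: the word moves only if valid and ready are both high.
        if valid and ready:
            received.append(data)     # Y considers the word received
            idx += 1                  # X considers it sent; next cycle is a new word
    raise RuntimeError("ready never let all words through")
```

With ready held high, four words take four cycles (one per cycle, no gaps). If Y is only ready every other cycle, X simply keeps presenting the same word until it is accepted, and the same four words take seven cycles.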
The scheme you describe is what I would call a 'req/ack 4-phase handshake', because it takes four clock cycles to send one data word:
1. req=1 ack=0
2. req=1 ack=1
3. req=0 ack=1
4. req=0 ack=0
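The four phases above can be modelled the same way. In this sketch (the name simulate_four_phase is hypothetical), both X and Y are assumed to register their outputs, so each side reacts to the (req, ack) pair from the previous cycle; that assumption is what produces exactly one phase per clock.

```python
def simulate_four_phase(words, max_cycles=1000):
    """Cycle-level model of a 4-phase req/ack handshake from X to Y.

    Both sides register their outputs: each cycle they look at the (req, ack)
    pair from the previous cycle and choose their next output.
    Returns (received, trace), where trace lists (req, ack) after each edge.
    """
    req, ack = 0, 0   # both idle before the first clock edge
    idx = 0           # next word X will send
    received = []
    trace = []
    for _ in range(max_cycles):
        if idx == len(words) and req == 0 and ack == 0:
            return received, trace        # all words sent, handshake idle again
        # X (sender): raise req to start a word, drop it once ack is seen.
        if req == 0 and ack == 0:
            next_req = 1                  # phase 1: new request
        elif req == 1 and ack == 1:
            next_req = 0                  # phase 3: X saw ack, withdraws req
            idx += 1                      # and counts the word as sent
        else:
            next_req = req                # otherwise hold
        # Y (receiver): capture on a new req, drop ack once req has gone.
        if req == 1 and ack == 0:
            received.append(words[idx])   # phase 2: Y takes the data, raises ack
            next_ack = 1
        elif req == 0 and ack == 1:
            next_ack = 0                  # phase 4: req gone, Y drops ack
        else:
            next_ack = ack
        req, ack = next_req, next_ack     # rising clock edge
        trace.append((req, ack))
    raise RuntimeError("handshake did not finish")
```

For a single word the trace is (1, 0), (1, 1), (0, 1), (0, 0), matching the four steps listed, and three words take twelve cycles, compared with three cycles for valid/ready with ready held high.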
This kind of interface is better suited to an asynchronous request across a clock boundary: a new request can only start after both req and ack have returned to 0, which eliminates any possibility of interpreting a packet twice. But you don't need it for a fully synchronous interface.