Yes, of course you can use TCP. UDP is often used in telephony applications as the lower overhead makes everything faster, and for this application it doesn't matter if packets are dropped or arrive in the wrong order. But since UDP isn't an option, you can use TCP.
As you have suspected, it seems to me that your problem is buffer underruns. Either your connection to the server is not fast enough (or at least consistently fast enough), or you are not providing data from the encoder at a fast enough rate. This can happen if you are recording data in real time, and trying to play it back in real time.
A solution is to buffer data server-side before sending it to the client. Have as large of buffer as your latency requirements allow. For internet radio purposes, I usually pick a 30-second buffer, as latency doesn't matter. For your purposes, you will probably want a buffer of at least 64KB. This is the maximum size allowed in a TCP packet. This packet will get fragmented along the way, but that is okay.
You might also look into how your server is sending data. Experiment with disabling the Nagle algorithm so that your server isn't waiting for ACKs before sending more data.