Imagine a video conferencing web-app, which users A and B originally access from some webserver. Suppose that web app supports presence, so the web server knows who's currently on-line. Imahine the UI allows A to try and place a video call to B. Via say XMLHttpRequest(), A's browser informs the server this is wanted, and B's javascript pops up something saying that A wants to call B. No WebRTC has happened at all yet. But at this stage, A can indirecttly communicated with B by sending messages using e.g. XMLHttpeRequest. In WebRTC parlance, this is the "signalling channel". So, A and B can both interact with their ICE agents to discover candidate addresses, and SDP descriptions, and send these to each ot6her, via the server, over this signallinh channel. E.g. the web app on A calls a WebRTC API to get its ICE candidates, and packages these up as it sees fit, to send to B. B's reader receives this message from the server (e.g over a WebSocket or long poll) and hyence it can unpack this, and format as needed to send to the ICE agent on B, using the RTCPeerConnection object. Similalrly, SDP offer/answer can be sent betweent he two apps, and passe through into the ICE agnet in the browsers, to get agreed media formats etc. At that stage, media connections can get set uo by the browser (meida streams are added to the RTCPeerConnection initially (which aren't communicating, but whihc have attributes that can be queried to describe the codec etc, and when the API is asked to create an SDP description, it does that using these attributes, but adjust the IP address and port based on how the ICE agent on each local browser has figured out what addresses can reach that local browser / port (NAT traversal).