
Does anyone know how to implement voice/video over IP in a web application using HTML5 WebSockets?

It would be nice if I could implement this with PHP or Python since I (unfortunately) don't know any other programming language at the moment.

A good tutorial would do, as would a ready-made solution, even one I have to pay for.

Added video because it's not only audio/voip related.

The first HTML5 video conference app has already been created. See my own answer.



If you want to go with HTML5 only, you will need a browser implementing the HTML Media Capture draft (available here) in order to access the raw data from the microphone.

Once you have this data in hand, you need to send it over the network. WebSockets are the HTML5 option for fast enough round trips with the server (sending local audio data and receiving remote audio data at the same time).

Since you mention Python, I would recommend looking at the Twisted implementation of WebSockets.

You can have all your clients "register" on the websocket server with a callerID, so the server knows where to find a given callerID.

Then your server will need an "invite" API where caller1 "invites" caller2.

Once the call is set up and each client starts sending its audio data, the server will be able to forward this audio data to the other party.
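The register / invite / forward flow described above could be sketched roughly as follows. This is only a minimal, transport-agnostic sketch: the class name `CallRouter`, its methods, and the idea of representing each client by a `send` callback are all assumptions made for illustration, and in a real server the callback would wrap whatever WebSocket connection object your library gives you.

```python
from typing import Callable, Dict

# Hypothetical stand-in for "write these bytes to that client's websocket".
Send = Callable[[bytes], None]


class CallRouter:
    """In-memory signalling: register callers, set up calls, relay audio."""

    def __init__(self) -> None:
        self.clients: Dict[str, Send] = {}  # callerID -> send callback
        self.peers: Dict[str, str] = {}     # callerID -> callerID of the other party

    def register(self, caller_id: str, send: Send) -> None:
        """Remember where to find a given callerID."""
        self.clients[caller_id] = send

    def invite(self, caller: str, callee: str) -> bool:
        """Pair caller and callee; return False if either is unknown or busy."""
        if caller == callee:
            return False
        if caller not in self.clients or callee not in self.clients:
            return False
        if caller in self.peers or callee in self.peers:
            return False
        self.peers[caller] = callee
        self.peers[callee] = caller
        return True

    def relay(self, sender: str, audio: bytes) -> bool:
        """Forward one chunk of audio from sender to the other party."""
        peer = self.peers.get(sender)
        if peer is None:
            return False  # sender is not in a call
        self.clients[peer](audio)
        return True

    def hangup(self, caller_id: str) -> None:
        """End the call for both parties, if there is one."""
        peer = self.peers.pop(caller_id, None)
        if peer is not None:
            self.peers.pop(peer, None)
```

For a quick experiment you can register two Python lists as fake clients (`router.register("caller1", inbox1.append)`), call `invite("caller1", "caller2")`, and check that `relay("caller1", data)` lands in the second list.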

Upon receiving audio data, the browser will need to play it on the speakers, probably using the HTML5 audio tag.

To do this, you may be forced to use a "trick": instead of having the websocket server forward the raw audio data to the client, you may need to simulate two "infinite" files:

  1. caller1.wav : sound captured on caller1 mic
  2. caller2.wav : sound captured on caller2 mic

Once the call is set up (caller1 would be informed of this event via websocket), the caller1 browser would set caller2.wav as the audio.src attribute. If the Python server appends the raw audio data to caller2.wav as it receives it, playback would hopefully start.
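One way to approximate such an "infinite" WAV file is to send a normal 44-byte PCM WAV header whose two size fields are set to the maximum value, and then keep appending raw samples. The sketch below assumes 16-bit PCM and made-up helper names (`streaming_wav_header`, `stream_wav`); whether a given browser will actually start playing a file like this is not guaranteed, so treat it as something to experiment with.

```python
import struct
from typing import Iterable, Iterator


def streaming_wav_header(sample_rate: int = 8000,
                         channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """Build a PCM WAV header that claims the largest possible length."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second
    unknown_size = 0xFFFFFFFF                      # "we do not know the length yet"
    fmt_chunk = struct.pack(
        "<IHHIIHH",
        16,               # size of the fmt chunk body
        1,                # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
    )
    return (
        b"RIFF" + struct.pack("<I", unknown_size) + b"WAVE"
        + b"fmt " + fmt_chunk
        + b"data" + struct.pack("<I", unknown_size)
    )


def stream_wav(chunks: Iterable[bytes], **fmt: int) -> Iterator[bytes]:
    """Yield the header once, then every raw PCM chunk as it arrives."""
    yield streaming_wav_header(**fmt)
    for chunk in chunks:
        yield chunk
```

The server would write each yielded piece to the HTTP response for caller2.wav without setting a content length, so the browser keeps reading for as long as the call lasts.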

This sounds like a cool prototype you're going to hack up!

Good luck on your journey,

Jerome Wagner


Seems like Ericsson created the first HTML5 Video Conference App.

The technique they used:

  • Implemented the device element and the Stream API (device element GUI is currently written in JavaScript/CSS)
  • Added MediaStreamManager to map Stream URLs to the corresponding pipeline in the media backend
  • Added MediaStreamTransceiver to control the related media processing and transport
  • Added support for binary data in the WebSocket protocol
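To give an idea of what binary data over WebSocket looks like on the wire, here is a small sketch of a binary frame as later standardized in RFC 6455 (opcode 0x2). It is not taken from Ericsson's prototype, which predates the final standard, and the function name `encode_binary_frame` is made up for illustration. It builds an unmasked frame, which is what a server sends to a client; frames sent by a browser to the server are masked and need extra handling.

```python
import struct


def encode_binary_frame(payload: bytes) -> bytes:
    """Wrap payload in a single, unmasked RFC 6455 binary frame."""
    first_byte = 0x80 | 0x2  # FIN bit set, opcode 0x2 = binary
    length = len(payload)
    if length < 126:
        # Short payloads: the length fits in the second byte.
        header = struct.pack("!BB", first_byte, length)
    elif length < 65536:
        # 126 signals that a 16-bit length follows.
        header = struct.pack("!BBH", first_byte, 126, length)
    else:
        # 127 signals that a 64-bit length follows.
        header = struct.pack("!BBQ", first_byte, 127, length)
    return header + payload
```

In practice a WebSocket library does this framing for you; the point is only that raw audio or video bytes can be sent as they are, without being encoded as text first.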


Video on YouTube: Beyond HTML5: Conversational Voice and Video demo | Ericsson Labs

Unfortunately Ericsson doesn't want to share device_dialog.js (yet).

WebRTC might be an answer (currently only in Chrome Canary with the MediaStream flag enabled).

See demo: (make sure you watch in a proper browser) and code

The reason I'm writing is that I got a really cheap Android tablet and cannot install Skype or Vtok, and Google Voice is not available outside the US. I need to find an HTML5-based solution, since I am able to run Opera Mobile 12 and have it working properly.

@work/gotta be quick

Check out the JavaScript getUserMedia API (CanIUse, W3).

WebRTC is the answer now.

For a Node.js stack, you can look at . Note that IE has not yet built support for the APIs that make WebRTC work.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow