Question

What I want to do is the following:

  1. Get a frame from the Webcam.
  2. Encode it with an H264 encoder.
  3. Wrap that frame in a packet using my own "protocol" and send it via UDP.
  4. Receive it and decode it...

It would be a live stream.

I just need help with the second step. I'm retrieving camera images with the AForge framework.

I don't want to write frames to files and then decode them; I suspect that would be very slow.

I would like to handle the encoded frames in memory and then create the packets to be sent.

I need to use an open-source encoder. I already tried x264, following this example:

How does one encode a series of images into H264 using the x264 C API?

but it seems to work only on Linux, or at least that's what I concluded after seeing around 50 errors when trying to compile the example with Visual C++ 2010.

I should make clear that I already did a lot of research (a week of reading) before writing this, but couldn't find a (simple) way to do it.

I know there is the RTMP protocol, but the video stream will only ever be watched by one person at a time, and RTMP is oriented more toward streaming to many viewers. I also already streamed with an Adobe Flash application I made, but it was too laggy ¬¬.

I would also like some advice on whether it's okay to send frames one by one, or whether it would be better to send several of them in each packet.

I hope that at least someone can point me in the right direction.

Apologies in advance if my English is not good. :P

PS: It doesn't have to be in .NET; it can be in any language, as long as it works on Windows.

Many, many thanks in advance.


Solution

You could try building your pipeline on Microsoft's DirectShow technology. There is an open-source x264 wrapper (a DirectShow encoder filter) available for download from Monogram.

Once you download the filter, you need to register it with the OS using regsvr32. I would suggest doing some quick testing first to find out whether this approach is feasible: use the GraphEdit tool to connect your webcam to the encoder and have a look at the configuration options.
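For example, registering the filter from an elevated command prompt would look something like this (the actual .ax filename is whatever ships in the Monogram download; mm_x264.ax here is just an assumption):

    regsvr32 mm_x264.ax

Once GraphEdit confirms that the webcam-to-encoder connection works, you can build the same graph programmatically. The C++ sketch below shows the general shape, with error handling omitted; the Monogram filter's CLSID is a placeholder that you would look up with GraphEdit or OleView after registering the filter:

    // Minimal sketch: build a DirectShow graph connecting the first video
    // capture device to an H.264 encoder filter. Error handling omitted.
    #include <dshow.h>
    #pragma comment(lib, "strmiids.lib")
    #pragma comment(lib, "ole32.lib")

    // Placeholder: substitute the real CLSID of the Monogram x264 filter.
    static const GUID CLSID_MonogramX264 =
        { 0x00000000, 0x0000, 0x0000, { 0, 0, 0, 0, 0, 0, 0, 0 } };

    int main()
    {
        CoInitialize(NULL);

        IGraphBuilder* graph = NULL;
        ICaptureGraphBuilder2* builder = NULL;
        CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                         IID_IGraphBuilder, (void**)&graph);
        CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC_SERVER,
                         IID_ICaptureGraphBuilder2, (void**)&builder);
        builder->SetFiltergraph(graph);

        // Grab the first video input device (the webcam).
        ICreateDevEnum* devEnum = NULL;
        IEnumMoniker* monikers = NULL;
        CoCreateInstance(CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC_SERVER,
                         IID_ICreateDevEnum, (void**)&devEnum);
        devEnum->CreateClassEnumerator(CLSID_VideoInputDeviceCategory,
                                       &monikers, 0);

        IMoniker* moniker = NULL;
        IBaseFilter* webcam = NULL;
        if (monikers && monikers->Next(1, &moniker, NULL) == S_OK) {
            moniker->BindToObject(NULL, NULL, IID_IBaseFilter, (void**)&webcam);
            graph->AddFilter(webcam, L"Webcam");
        }

        // Add the encoder filter (it must be registered via regsvr32 first).
        IBaseFilter* encoder = NULL;
        CoCreateInstance(CLSID_MonogramX264, NULL, CLSCTX_INPROC_SERVER,
                         IID_IBaseFilter, (void**)&encoder);
        graph->AddFilter(encoder, L"x264 encoder");

        // Connect webcam -> encoder. Downstream you would attach a Sample
        // Grabber (or your own filter) to pull the encoded frames out of
        // memory instead of writing them to a file.
        builder->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                              webcam, NULL, encoder);

        // ... run the graph with IMediaControl::Run(), then release the
        // interfaces and call CoUninitialize() when done ...
        CoUninitialize();
        return 0;
    }

Pulling the encoded samples out through a Sample Grabber callback keeps everything in memory, which matches your requirement of not writing frames to files.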

I would also like some advice on whether it's okay to send frames one by one, or whether it would be better to send several of them in each packet.

This really depends on the required latency: the more frames you pack into one packet, the lower the header overhead, but the higher the latency, since you have to wait for multiple frames to be encoded before you can send anything. For live streaming, latency should be kept to a minimum, and the typical protocols used are RTP over UDP. This also means your maximum packet size is limited by the MTU of the network, which often requires large IDR frames to be fragmented and sent across multiple packets.
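To make that concrete, here is a hypothetical sketch of the sending side, in the spirit of the custom-protocol packets from step 3 of the question. The header layout (frame number, fragment index, last-fragment flag) and the 1400-byte payload cap are illustrative assumptions, not a standard; if you later switch to standard RTP, RFC 6184 defines the equivalent FU-A fragmentation for H.264:

    // Hypothetical fragmentation of one encoded frame into MTU-sized UDP
    // packets, each prefixed with a small custom header.
    #include <winsock2.h>
    #include <cstdint>
    #include <cstring>
    #include <vector>
    #pragma comment(lib, "ws2_32.lib")

    static const size_t kMaxPayload = 1400; // stay safely below a 1500-byte MTU

    #pragma pack(push, 1)
    struct FragmentHeader {
        uint32_t frameNumber;   // which encoded frame this fragment belongs to
        uint16_t fragmentIndex; // 0, 1, 2, ... within the frame
        uint8_t  lastFragment;  // 1 on the final fragment of the frame
    };
    #pragma pack(pop)

    void SendEncodedFrame(SOCKET sock, const sockaddr_in& dest,
                          const uint8_t* frame, size_t frameSize,
                          uint32_t frameNumber)
    {
        uint16_t index = 0;
        size_t offset = 0;
        while (offset < frameSize) {
            size_t chunk = frameSize - offset;
            if (chunk > kMaxPayload) chunk = kMaxPayload;

            FragmentHeader hdr;
            hdr.frameNumber   = htonl(frameNumber);
            hdr.fragmentIndex = htons(index++);
            hdr.lastFragment  = (offset + chunk == frameSize) ? 1 : 0;

            std::vector<char> packet(sizeof(hdr) + chunk);
            std::memcpy(packet.data(), &hdr, sizeof(hdr));
            std::memcpy(packet.data() + sizeof(hdr), frame + offset, chunk);

            sendto(sock, packet.data(), (int)packet.size(), 0,
                   (const sockaddr*)&dest, (int)sizeof(dest));
            offset += chunk;
        }
    }

On the receiving side you would collect fragments by frame number, hand the reassembled frame to the decoder once the last-fragment flag arrives, and discard incomplete frames whose fragments were lost.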

My advice would be to not worry about sending more frames in one packet until/unless you have a reason to. This is more often necessary with audio streaming, where the header overhead (IP + UDP + RTP, roughly 20 + 8 + 12 = 40 bytes) is big in relation to the audio payload.
