Question

A while ago I came across an interesting array of video presentations on a German company's website. They deal with modifying a video stream while it is playing, and I was pleasantly impressed by the accuracy and smoothness of the technique. One presentation in particular struck me as fascinating in the way it blends text into dynamic, playing video: you can type a string into a text box while the video is playing, and transformed variants of the text you typed are embedded within the video with realistic accuracy. My question is: what kind of algorithm is required for such a feature, and how could I programmatically embed real-time text and images into a video stream? Are there any research papers or libraries I should look into for details?

PS. Don't flame me for the contents of the video; it's the programming technique that I'm interested in, and the video is the best example I could find.


Solution

It's called augmented reality, and there are numerous libraries and toolkits available for doing it, such as ARToolKit: http://www.hitl.washington.edu/artoolkit/

OTHER TIPS

To do this you would just need to intercept each frame before it is rendered.

Basically:

  1. Read Frame
  2. Modify frame
  3. Render

There really isn't an algorithm to do this.
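As a rough illustration of that loop, here is a minimal sketch in Python using OpenCV; the file name, overlay string, and drawing parameters are all placeholders, not part of the original answer.

```python
import cv2

# A minimal read/modify/render loop. "input.mp4" and the overlay
# text are assumptions for the sake of the example.
cap = cv2.VideoCapture("input.mp4")   # open the video stream
while True:
    ok, frame = cap.read()            # 1. read the next frame
    if not ok:
        break                         # end of stream
    # 2. modify the frame: draw an overlay string onto it
    cv2.putText(frame, "Hello", (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
    # 3. render the modified frame
    cv2.imshow("output", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```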

Okay, so I actually looked at your example. Since this is prerecorded video, they could simply have hand-traced the four corners of a box onto the target surface. To render, you then apply a perspective transform to your text, mapping it onto that target rectangle. Making it blend is probably just a matter of choosing good colors, layering, color transforms, and transparency. Nothing particularly magic here; just standard Photoshop-style graphics operations, most of which are probably built into Flash.
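For illustration only, here is a hedged sketch of that warp-and-blend step in Python with OpenCV. The function name, canvas size, corner coordinates, and blending factor are all invented for the example; the original demo was presumably built in Flash rather than OpenCV.

```python
import cv2
import numpy as np

def overlay_text(frame, text, dst_corners, alpha=0.8):
    """Warp rendered text onto hand-traced corners and blend it in.
    dst_corners: four (x, y) points in the frame, clockwise from top-left."""
    # Render the text onto a small flat canvas first.
    canvas = np.zeros((100, 400, 3), dtype=np.uint8)
    cv2.putText(canvas, text, (10, 70), cv2.FONT_HERSHEY_SIMPLEX,
                2.0, (255, 255, 255), 3)

    # Perspective transform: map the canvas rectangle onto the traced quad.
    h, w = canvas.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(dst_corners)
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(canvas, M,
                                 (frame.shape[1], frame.shape[0]))

    # Blend only where the warped text has pixels, keeping some transparency.
    mask = warped.sum(axis=2) > 0
    blended = cv2.addWeighted(frame, 1 - alpha, warped, alpha, 0)
    frame[mask] = blended[mask]
    return frame

# Hypothetical usage with corners traced by hand for one frame:
# frame = overlay_text(frame, "Hello",
#                      [(320, 180), (560, 200), (540, 320), (300, 300)])
```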

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow