How to get 3D Video from 2D Video

Over at Ubergizmo, they ran into JVC’s 2D to 3D video conversion demo at CEATEC 2008 and posted a few comments about it here:…

JVC's 2D Video to 3D Video Demo


I was out at NAB2009 a couple of weeks ago and had a chance to eyeball this for myself; it's pretty impressive.  Now, 2D to pseudo-3DTV conversion has been around for a while, most notably from outfits like Razor3D and Dynamic Digital Depth.  The most egregious of these just use a video field- or frame-delay to offset the video itself an inch or two in front of or behind the monitor, so that the flat, 2D video seems to "float" in front of it.  However, this is not true 3D: the objects in the video have no depth in and of themselves.

Other technologies do nothing more than delay the right-eye view a frame or two behind the left-eye view, and hope that things in the video move horizontally.  In effect, the motion creates stereo disparity.
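The frame-delay trick is simple enough to sketch in a few lines. This is a minimal illustration of the general idea, not any particular vendor's implementation: the "right eye" is just the same video lagged by a frame, so horizontal motion between frames masquerades as stereo disparity.

```python
import numpy as np

def frame_delay_stereo(frames, delay=1):
    """Pair each left-eye frame with an earlier frame for the right eye.

    Horizontal motion between the paired frames is misread by the
    viewer's visual system as stereo disparity; vertical motion yields
    vertical disparity, which the eyes cannot fuse.
    """
    pairs = []
    for t in range(delay, len(frames)):
        left = frames[t]
        right = frames[t - delay]  # the "right eye" simply lags in time
        pairs.append((left, right))
    return pairs

# Toy example: five single-row "frames" of width 8 with a bright pixel
# moving one column per frame (pure horizontal motion).
frames = [np.eye(1, 8, k=t, dtype=np.uint8) for t in range(5)]
pairs = frame_delay_stereo(frames, delay=1)
# In each pair the bright pixel differs by exactly one column: a
# constant one-pixel horizontal disparity, i.e. apparent depth.
```

Note what this sketch makes obvious: the "depth" depends entirely on how fast things happen to be moving, not on where they actually are in the scene.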

However, this approach is really lousy: if things move vertically, the viewer gets pretty cross-eyed, and the visual system has a hard time fusing the images.  Any time you confuse the visual system this way, it can lead to a headache or, in severe cases, motion sickness.  (Sea sickness is caused by your visual system sensing a constant horizon on the ocean while your inner ear reports anything BUT constancy.)

More sophisticated approaches try to ferret out only the horizontal motion and warp the objects in the right-eye video frame in that direction.  This alleviates some of the "motion sickness" effect, but we're still left with a big problem: if the camera stops panning left-to-right, the depth stops being there as well, and the video "flattens out".  Also, if both the background and the foreground objects are moving or changing direction, the depth can actually flip (what was coming out of the monitor 5 inches is suddenly 10 inches behind it), and then motion sickness is a problem again.
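The warping step itself can be sketched generically. Assuming a per-pixel horizontal disparity derived from motion (how that disparity is estimated is the hard, vendor-specific part), synthesizing the right-eye view is just a sideways pixel shift:

```python
import numpy as np

def warp_horizontal(frame, disparity):
    """Synthesize a pseudo right-eye view by shifting each pixel
    sideways by its motion-derived disparity (an illustrative sketch,
    not any shipping product's algorithm)."""
    h, w = frame.shape
    out = np.zeros_like(frame)
    for y in range(h):
        for x in range(w):
            xs = x + int(disparity[y, x])
            if 0 <= xs < w:
                out[y, xs] = frame[y, x]  # holes remain where nothing lands
    return out

# When the camera stops panning, the motion-derived disparity goes to
# zero everywhere and the warp returns the flat input unchanged: the
# "3D" collapses, exactly as described above.
frame = np.arange(12, dtype=np.uint8).reshape(3, 4)
flat = warp_horizontal(frame, np.zeros((3, 4)))
```

The holes left behind by the shift are the disocclusion problem that real systems have to in-paint somehow.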

It’s certainly very difficult to build a solid, profitable business on products that have the primary effect of inducing motion sickness and headaches (roller coasters notwithstanding).

With the addition of some clever processing, some of these problems can be largely alleviated; in fact, Dynamic Digital Depth has licensed its 2D to pseudo-3D technology to some of the major consumer electronics concerns that are now producing "3D-Ready" TVs.

In the same vein as my previous rants about super-resolution, my problem with some of these technologies is that, while they're clever, they're still not the "real article".  The only way to show true 3D is by shooting the scene with two cameras, or by using some very heavy math to calculate what's called a "depth map" by photogrammetric or other means.

I spoke with the JVC folks, and while they were understandably cagey, a few themes emerged from my conversation.  What seems to be interesting here is that they are using the motion vectors of the video (when it's moving) plus image segmentation of some sort to calculate a depth map, then using that to warp the video views for the left and right eyes.  This actually acquits itself well: the images look very nice in 3D, although in many segments of the video I felt as if I were watching cardboard cutouts moving around in front of a flat background.  This is closer, but it's still pretty far from calculating an accurate depth map (and then synthesizing a right-eye view) from 2D video.  I would imagine this approach could have problems with animation, where many objects' fine-grained motions are hard to detect accurately.
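My guess at where the "cardboard cutout" look comes from can be made concrete. If the segmentation stage assigns a single depth value to each segment, every pixel of an object sits at the same distance. A toy sketch (my own illustration of the effect, not JVC's actual pipeline):

```python
import numpy as np

def cardboard_depth(labels, segment_depths):
    """Give every pixel in a segment one shared depth value.  Objects
    then have a position in depth but no depth of their own, which is
    exactly the "cardboard cutout" look."""
    depth = np.zeros(labels.shape, dtype=float)
    for label, d in segment_depths.items():
        depth[labels == label] = d
    return depth

# Segment 0 = background, segment 1 = a foreground object.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
depth = cardboard_depth(labels, {0: 0.0, 1: 5.0})
# Every pixel of the object lands at the single depth 5.0: a flat
# cutout floating in front of a flat background.
```

A true depth map would vary smoothly *within* each object, which is precisely what this shortcut throws away.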

For a glimpse of the future, the folks at Stanford University give us a view:

This is a pretty neat site: you can freely use their tool (for personal use), and they even provide source code.

This uses some pretty intensive math to get the scene geometry out of an image, taking a statistical approach with what's called a Markov Random Field model.  If you know one thing about Markov Random Field algorithms, it should be that "real time" and "Markov Random Field" rarely appear together, and when they do it's in academic papers, as a wish, or for toy problems only.
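To see why this is slow, here is a minimal pairwise MRF energy for a depth labeling. This is the generic textbook formulation, assumed for illustration; the Stanford model is considerably more elaborate. Evaluating the energy of one labeling is cheap; what kills real-time is *minimizing* it over the astronomically many possible labelings.

```python
import numpy as np

def mrf_energy(depth, unary, lam=1.0):
    """Pairwise MRF energy for an integer depth labeling.

    unary[y, x, d] is the data cost of assigning depth label d at
    pixel (y, x); lam weights a smoothness penalty that charges for
    depth jumps between horizontal and vertical neighbors.
    """
    h, w = depth.shape
    data = unary[np.arange(h)[:, None], np.arange(w)[None, :], depth].sum()
    smooth = (np.abs(np.diff(depth, axis=0)).sum() +
              np.abs(np.diff(depth, axis=1)).sum())
    return data + lam * smooth
```

The inference step (belief propagation, graph cuts, etc.) must search over every pixel's label jointly, because the smoothness term couples neighbors together; that coupling is the whole point of the model, and also the whole cost.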

While this is a neat tool for images, it's slow, and therefore of no help (at least for now) for converting 2D video to 3D video: we're talking about converting 30, or even 60, frames per second at resolutions of at least 720×480, or 1920×1080 for HD.
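The arithmetic is worth spelling out (my own back-of-the-envelope numbers):

```python
# Pixels per second a real-time converter must touch.
sd_pixels_per_sec = 720 * 480 * 30     # standard definition at 30 fps
hd_pixels_per_sec = 1920 * 1080 * 60   # full HD at 60 fps

print(sd_pixels_per_sec)   # about 10 million pixels/s
print(hd_pixels_per_sec)   # about 124 million pixels/s

# Even a modest few hundred operations per pixel for MRF-style
# inference puts HD conversion into the tens-of-gigaflops range,
# sustained, frame after frame.
```

And that's before you account for inference algorithms whose cost per pixel grows with the number of depth labels and the number of iterations.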

However, this sort of mathematical approach will become feasible once you have a teraflop or two of computing power in your TV or cellphone.  Then again, the GPUs on PC video cards are beginning to go beyond this benchmark!

To bring this all full circle, I have more than a casual interest in this sort of thing, since my company is involved with similar bits of technology.  In fact, super-resolution as described in my previous articles and the extraction of depth from 2D video are related mathematical problems.

Much of the problem with the 2D to 3D technologies being shown (or even shipping currently) is that they are all ad-hoc ways of using incomplete information to fill in pixel information, much like scaling.  Many of the same mathematical formalisms and techniques used by the military to upscale satellite photos and surveillance video can be applied to this problem as well…

…if only they could run in real-time.

More on this subject, later.


~ by opticalflow on May 1, 2009.
