Why get a Lytro if you have a Kinect?

(Or, fun with Depthmaps and Rangemaps)

It’s been a while since my last post, so I’ll share some fun stuff. I’ve seen a lot about the Lytro camera in the last few months, and while it’s clever, it occurred to me that everything the Lytro can do can be accomplished with a depth camera. In fact, beyond refocusing shots after the fact, virtually ANY lens can be simulated given a proper and accurate range map. However, one does not even need a Lytro, a Kinect, or a $200K military-grade LIDAR video camera to perform these feats.

If you have a depth or range map, such as what one can obtain with a Kinect camera, or a LIDAR video camera (if you can find one), there are some very interesting things you can do.

But let’s say you don’t have a fancy LIDAR camera, or the subject is more than about 15 feet away (roughly the limit of the Kinect’s effective range). What can you do? Well, clever math can estimate depth from focus in a single image, and depth from motion and parallax across the frames of a video sequence can help you get there. However, a lot of 2D-to-3D conversion post-production houses solve this problem with manual rotoscoping.
Take, for example, the following 1920×1080 still image:

Some patient soul has to trace all of the edges of each player, their limbs, the ball, the field; in fact, ALL of the elements of the shot. Most of the time, for each frame, many dozen patient souls from India, Hungary, or China will be recruited for the task. Then some other patient soul (usually in LA) has to assign some sort of distance, range, or depth value to each of these elements. This is why 2D-to-3D conversion, if done WELL, is fabulously expensive.

However, if you have some clever math and image processing at your disposal (and a big GPU), then instead of needing a meticulous rotoscoping of the shot, a supervisor in LA can just write “rotopaint” scribbles on the shot, like so (original frame on the left, the rotopaint scribbles for the depth on the right):

And the aforementioned clever math fills in the rest in a single step:
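The post doesn’t say which “clever math” is used, but a common way to do this kind of edge-aware scribble propagation is a colorization-style optimization (à la Levin et al.): treat the unknown depth as harmonic between the scribbles, with neighbor weights that collapse across strong luminance edges so depth doesn’t leak between players and background. A minimal sketch (all function and parameter names here are my own illustration, not the actual pipeline):

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def propagate_scribbles(gray, scribble_depth, scribble_mask, beta=10.0):
    """Spread sparse depth scribbles across a whole frame.

    gray           -- 2-D float array, luminance of the frame
    scribble_depth -- 2-D float array, depth values at scribbled pixels
    scribble_mask  -- 2-D bool array, True where a scribble exists
    beta           -- edge sensitivity: larger = depth stops harder at edges
    """
    h, w = gray.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = lil_matrix((n, n))
    b = np.zeros(n)
    for y in range(h):
        for x in range(w):
            i = idx[y, x]
            if scribble_mask[y, x]:
                # Dirichlet constraint: pin scribbled pixels to their value
                A[i, i] = 1.0
                b[i] = scribble_depth[y, x]
                continue
            total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    # weight shrinks across strong luminance edges
                    wgt = np.exp(-beta * (gray[y, x] - gray[ny, nx]) ** 2)
                    A[i, idx[ny, nx]] = -wgt
                    total += wgt
            A[i, i] = total
    return spsolve(A.tocsr(), b).reshape(h, w)
```

On a real 1920×1080 frame you would assemble the system vectorized and use a multigrid or GPU solver rather than a Python loop; the point is only that one sparse linear solve turns a handful of scribbles into a dense depth field.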

This does a pretty good job on the foreground objects, but nothing for the field, ground plane, or far background. A simple maximum merge with a ground plane fixes this:
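The maximum merge itself is essentially one line once you have a ground-plane ramp. The sketch below assumes a “nearness” encoding (larger values are closer to the camera), so the max keeps whichever of the object depth or the ground plane is nearer; the function name and linear ramp are my own illustration:

```python
import numpy as np

def merge_ground_plane(depth, near=1.0, far=0.0):
    """Max-merge a depth field with a simple ground-plane ramp.

    Assumes 'nearness' encoding: larger values = closer to camera.
    The ramp runs from `far` at the top row to `near` at the bottom,
    mimicking a flat field receding toward the horizon.
    """
    h, w = depth.shape
    ramp = np.linspace(far, near, h)[:, None] * np.ones((1, w))
    return np.maximum(depth, ramp)
```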

Now that we have something at least approximating a decent range field, there are many clever things one can do with it. There is not a lot of dynamic range in the range field within each player, so they render flat, which is called “cardboarding” in 2D-to-3D parlance. There are fixes for this (the most patient reader is directed to read up on Geodesic Transforms). However, while this range map may not be suitable for 2D-to-3D conversion, it’s plenty useful for other purposes.

Now that we have an approximate range map, all sorts of things are possible. As a simple example, one can wash out the saturation proportionally in areas that are more distant, which simulates “haze”:
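One way to sketch this (my assumption, not necessarily the exact method used here) is the standard atmospheric scattering model: fade each pixel toward an “airlight” color by a transmission factor that decays with range, which washes out saturation and contrast in the distance:

```python
import numpy as np

def add_haze(img, depth, beta=1.5, airlight=(0.8, 0.85, 0.9)):
    """Simulate atmospheric haze from a range map.

    Uses the scattering model I = J*t + A*(1 - t), where
    t = exp(-beta * depth) is the per-pixel transmission.
    Distant pixels fade toward the bluish-grey 'airlight' color.

    img   -- H x W x 3 float array in [0, 1]
    depth -- H x W float array; here, larger = farther from camera
    """
    t = np.exp(-beta * depth)[..., None]      # transmission per pixel
    A = np.asarray(airlight).reshape(1, 1, 3)
    return img * t + A * (1.0 - t)
```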

You can also “re-focus” the shot, if you’re careful not to let the foreground objects blur and bleed into the background. This is harder than it sounds. A naive implementation of “just blur the image according to the range value” gives you this:

A more intelligent refocus gives you this:
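A common way to get the “more intelligent” behavior (again, an assumption about the method, not a quote of it) is layered refocus: slice the scene into depth layers, blur each layer and its alpha mask by that layer’s own circle of confusion, and composite back to front so a defocused foreground spreads over its own edges instead of smearing into the sharp regions behind it:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def refocus(img, depth, focus_depth, blur_scale=3.0, n_layers=8):
    """Depth-layered refocus that avoids foreground-into-background bleed.

    img         -- H x W x 3 float image in [0, 1]
    depth       -- H x W float range map; larger = farther
    focus_depth -- the depth value that stays sharp
    """
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    layer_id = np.clip(np.digitize(depth, edges) - 1, 0, n_layers - 1)
    out = np.zeros_like(img)
    acc = np.zeros(depth.shape)
    for i in range(n_layers - 1, -1, -1):           # iterate far -> near
        mask = (layer_id == i).astype(float)
        if not mask.any():
            continue
        mid = 0.5 * (edges[i] + edges[i + 1])
        sigma = blur_scale * abs(mid - focus_depth)  # circle of confusion
        # blur the premultiplied layer AND its alpha by the same kernel
        layer = gaussian_filter(img * mask[..., None], sigma=(sigma, sigma, 0))
        alpha = gaussian_filter(mask, sigma=sigma)
        # "over" composite: the nearer layer covers what is behind it
        out = layer + out * (1.0 - alpha[..., None])
        acc = alpha + acc * (1.0 - alpha)
    return out / np.maximum(acc, 1e-6)[..., None]
```

The key trick is blurring the alpha mask along with the color: a defocused near layer becomes semi-transparent at its edges, so it fades over the background rather than dragging background pixels into its blur.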

…and this is basically what the Lytro camera allows you to do: take a photo (or, sometime in the future, a video) and refocus it.

Since we’re on the subject of refocus, this brings up the subject of simulating different lenses. Can we simulate not only focus effects, but aperture and actual focal length in a picture AFTER it was taken? The answer is yes, but there is a major problem to solve. Without going into a furious hand-wavy whiteboard diagram with a lens and diagonal lines depicting focal point, aperture, and other things, suffice it to say that a longer focal length lens will magnify objects in the foreground (not a shorter one, as originally written; see the comments). One can simulate this if there is a range map. Below is a still image simulation of a lens’ focal-length-coupled magnification:

As can be seen above, as objects are selectively magnified, holes are created (shown in black) and must be filled in. As with 2D-to-3D conversion, a good occlusion inpainting method is very handy here. Below is an animation showing the focal length being moved up and down while just such an inpainting method is applied:
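To make the magnify-then-inpaint step concrete, here is a toy version (the names and the crude nearest-pixel hole fill are mine; a production pipeline would use a far better occlusion inpainter): forward-warp each pixel away from the image center in proportion to its nearness, splatting near pixels last so the foreground wins collisions, then fill the disocclusion holes:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def magnify_by_depth(img, depth, strength=0.2):
    """Simulate focal-length-coupled magnification from a range map.

    Nearer pixels are pushed outward from the image center more than
    distant ones. Forward-warping opens holes (disocclusions), which
    are filled here by copying the nearest valid pixel.

    depth -- H x W float map; here, larger = NEARER (nearness/disparity)
    """
    h, w, _ = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    scale = 1.0 + strength * depth                 # per-pixel magnification
    ty = np.clip(np.round(cy + (ys - cy) * scale), 0, h - 1).astype(int)
    tx = np.clip(np.round(cx + (xs - cx) * scale), 0, w - 1).astype(int)
    out = np.zeros_like(img)
    filled = np.zeros((h, w), dtype=bool)
    # splat far-to-near: with repeated target indices the last write
    # wins, so nearer pixels cover farther ones where they collide
    order = np.argsort(depth, axis=None)
    fy, fx = np.unravel_index(order, depth.shape)
    out[ty[fy, fx], tx[fy, fx]] = img[fy, fx]
    filled[ty[fy, fx], tx[fy, fx]] = True
    # crude inpaint: copy each hole from its nearest filled pixel
    dist, inds = distance_transform_edt(~filled, return_indices=True)
    return out[inds[0], inds[1]]
```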

In principle, this means that if one has a good depth or range map, it is possible to simulate any type of lens — AFTER the shot is taken.  This applies just as much to video as still images.

The moral of the story: a good range or depth map has many more uses beyond refocus, or even the typical 2D-to-3D conversion applications. So I’d rather have a high-resolution LIDAR camera than a plenoptic camera…


~ by opticalflow on May 14, 2012.

5 Responses to “Why get a Lytro if you have a Kinect?”

  1. hi opticalflow, thanks for an interesting read! does a shorter focal length really magnify objects in the foreground as you say? i would’ve thought it shrinks the foreground… cheers!

  2. also, you say you’d rather have a high-res LIDAR camera than a plenoptic camera – sure, i agree with that. but is it fair to say that (a) plenoptic cameras are a lot cheaper than LIDAR and (b) plenoptic cameras return a depth map just as LIDAR does, it just happens to be the case that the only application Lytro are currently marketing is after-shot refocusing (which i agree, is probably one of the less interesting things that you can do with a depth map). In that sense, plenoptic has the potential to be pretty good as a cheaper alternative to laser for getting depth maps? Cheers!

  3. Samiam — good observation — you are correct. As Willy Wonka would say — “scratch that, reverse it”.

    As to your points about LIDAR vs. plenoptic: one can obtain a real-time flash LIDAR camera for about €350, which, while perhaps modestly more expensive than a Lytro, is not obscenely so. High-power scanning LIDAR systems ARE obscenely expensive, but in the last couple of years a great deal of progress has been made with flash LIDAR chips. Your second point, that plenoptic cameras return depth maps, CAN be true, but obtaining a depth or range field from a plenoptic light-field is an ill-posed problem (like motion estimation in video). There will always be unknowns that have to be “filled in” in the depth field due to ambiguities in the plenoptic field.

    That being said, LIDAR has its blind spots as well. LIDAR won’t do a bit of good if the camera is staring at a retroreflector (think of a stop sign, which is basically an array of retroreflectors); these tend to “blind” LIDAR cameras. Also, LIDAR video typically operates in the IR range. If the objects absorb IR, no return will be present and the LIDAR system will be “blind” in those areas of the image.

    Finally, I think plenoptic systems have better resolution capability than current flash LIDAR systems, but the complexity of solving plenoptic light-field equations for depth precludes this from happening in-camera. This is why Lytro does “cloud-based” processing.

  4. i watched a lecture that pointed out that the depth maps gotten out of a plenoptic camera lose their resolution as a result of the main lens blur and the distance of the objects from the lens… i.e. you can’t rely on it to give you a good z-depth map for compositing a new BG behind your talent.

  5. After one day of playing around with Lytro, this was exactly my first thought: this whole light field hype is basically a non-stereoscopic z-cam, which could be emulated with any z-cam, such as a Kinect or even a Leap. And here we go! However, the Lytro still has one advantage over the Kinect: it’s mobile and can fit into your pocket.
