Kinect — Insanely Interesting

When I first worked at the IBM T.J. Watson research lab in Hawthorne, NY in the mid-’90s, a team within our group was playing with what at the time were called “shape cameras”.  These were essentially, in retrospect, hyper-expensive early versions of the Kinect.  Then, tantalizingly, around 1999 Minolta came up with a similar shape camera using structured infrared light, with an MSRP of $5,000, which was utterly cheap by the standards of the day.  Unfortunately, it never went into anything resembling production; it was effectively killed by a company that licensed the camera’s US distribution and then lacked the wherewithal (or the freedom from paranoia) to sell it widely.  Today’s professional descendant is the Minolta Range5.

Among those who follow this blog, or at least find it occasionally interesting, I’m certain that 100.00 percent of you have seen the videos of Microsoft’s XBOX Kinect camera, and others like it, being hacked for purposes other than those intended. For those who have not been introduced to what the Kinect is doing, you can see it illustrated graphically by using a camcorder in “night-shot” mode in a dark room while the Kinect is on:

So, what’s really going on here is a very clever “structured-light” infrared shape camera. Related approaches include depth cameras like “time-of-flight” and “flash-LIDAR” cameras, which are much more expensive. These sorts of cameras are used for things like driving automation and critical measurement of military strike candidates (French for “measuring the height of the window the laser-guided bomb will fly into”). The Kinect takes its measurements by projecting a very cleverly arranged dot field, whose pattern is known and predictable. An IR camera then records the image, and the relative spacing of the dots gives an indication of the distance from the IR emitter. In the Kinect’s case, it doesn’t have to be particularly accurate: NASA isn’t using this to pick up rocks on the surface of Mars from 80 million miles away, nor is the Air Force using it to select precision bombing strikes. The Kinect is used to track body movements, indoors, at the limited range typical of a living room, something that doesn’t require especially precise range-to-target measurements.

Nevertheless, there are many interesting things one can use a shape camera for, even if range-to-target isn’t particularly well calibrated. One of the first early public efforts was this:

One problem with using any of these is that the Kinect does not provide what is technically called a “dense depth map” — there are lots of holes where the Kinect IR sensor can’t “see”, whether due to lighting, objects being out of range, reflection, transparency, occlusion, or objects absorbing rather than reflecting infrared (which is how the Kinect sees depth).  It also won’t work well in broad daylight, but it was never intended to.  Here’s my own example of a sparse depth map from the Kinect. Note the abundance of holes and occlusions (shown in white):
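To make the sparsity concrete, here is a minimal sketch (with a made-up random frame standing in for real sensor data) of flagging the holes in a raw Kinect frame, assuming the common convention that the sensor reports 2047 for pixels it cannot resolve:

```python
import numpy as np

# Hypothetical 640x480 frame of 11-bit depth codes; a real frame would
# come from the sensor driver. 2047 marks "no reading" -- a hole.
raw = np.random.randint(0, 2048, size=(480, 640), dtype=np.uint16)

holes = (raw == 2047)              # True where the IR sensor couldn't see
coverage = 1.0 - holes.mean()      # fraction of the frame with valid depth
```

Everything marked `True` in `holes` is exactly the white area in the depth map above: pixels for which no range estimate exists and which must be filled before the map is usable.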

My company has been playing with this sort of thing since well before the Kinect, including with ToF (time-of-flight) cameras. We have a lot of technology for GPU-based statistical temporal prediction, inpainting, and hole-filling in real time — and it happens to work for the Kinect, too. Here’s the previous depth map with very rudimentary hole-filling and statistical prediction applied; it is now a dense depth map:
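The spatial half of the idea can be sketched in a few lines. The following is a toy stand-in for the GPU-based fill described above, assuming holes are encoded as zeros; it repeatedly replaces each hole with the mean of its valid 4-neighbours, growing inward from the hole borders (a real pipeline would also add temporal prediction across frames):

```python
import numpy as np

def fill_holes(depth, hole_value=0.0, max_iters=50):
    """Iteratively fill hole pixels with the mean of their valid
    4-neighbours. Toy illustration only; np.roll wraps at image
    borders, which is acceptable for a demo but not production."""
    d = depth.astype(np.float64)
    valid = depth != hole_value
    for _ in range(max_iters):
        if valid.all():
            break
        acc = np.zeros_like(d)   # sum of valid neighbour depths
        cnt = np.zeros_like(d)   # count of valid neighbours
        for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
            v = np.roll(valid, shift, axis=axis)
            nd = np.roll(d, shift, axis=axis)
            acc += np.where(v, nd, 0.0)
            cnt += v
        fillable = ~valid & (cnt > 0)       # holes touching valid data
        d[fillable] = acc[fillable] / cnt[fillable]
        valid |= fillable
    return d
```

Each pass fills the one-pixel ring of holes adjacent to known depth, so small occlusion holes close in a handful of iterations.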

Once you have a dense, high-quality depthmap, you can render high-quality 3D video from it. You also need to be able to fill in disocclusions (a process known in the 3D film industry as “inpainting”), which we also do well, in real time. So here is the same video in 3D, using our multiview rendering engine. You will need anaglyph glasses to view it:
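The core of this kind of rendering — depth-image-based rendering, or DIBR — can be sketched simply: shift each pixel horizontally in proportion to inverse depth to synthesize left and right views, then combine them into a red/cyan anaglyph. This toy version leaves disocclusion gaps unfilled; inpainting those gaps well is exactly the hard part mentioned above:

```python
import numpy as np

def anaglyph_from_depth(rgb, depth, max_shift=12):
    """Toy DIBR: synthesize left/right eye views by shifting pixels by
    inverse depth, then merge into a red/cyan anaglyph. max_shift is a
    made-up maximum parallax in pixels."""
    h, w, _ = rgb.shape
    inv = 1.0 / np.maximum(depth, 1e-3)                   # near = big shift
    inv = (inv - inv.min()) / max(np.ptp(inv), 1e-9)      # normalize [0, 1]
    shift = (inv * max_shift).astype(int)

    left = np.zeros_like(rgb)
    right = np.zeros_like(rgb)
    cols = np.arange(w)
    for y in range(h):
        # Unfilled targets stay black: these are the disocclusions.
        left[y, np.clip(cols - shift[y], 0, w - 1)] = rgb[y, cols]
        right[y, np.clip(cols + shift[y], 0, w - 1)] = rgb[y, cols]

    out = np.empty_like(rgb)
    out[..., 0] = left[..., 0]      # red channel from the left eye
    out[..., 1:] = right[..., 1:]   # green/blue from the right eye
    return out
```

Through red/cyan glasses each eye sees its own shifted view, and the brain fuses the parallax into depth.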

With a little of our magic fairy dust, the Kinect could become a truly workable real-time high-quality 3D camera, if only its RGB camera were higher quality. One interesting thing about the Kinect is that the RGB camera does not match the IR camera, so the depthmap has to be rectified to the RGB image. Even more interesting, our automatic rectification doesn’t care about the resolution or how far the RGB camera is from the IR camera; in fact, the capture could come from a different camera entirely. Say, hypothetically, a 2K or even 4K camera.
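The standard geometry behind this kind of registration can be sketched as follows: back-project each IR pixel to a 3-D point using the IR camera’s intrinsics, transform it into the RGB camera’s frame, and project it with the RGB intrinsics. The matrices here are placeholders for a real calibration, not Kinect factory values, and this is a generic textbook method rather than our particular rectification:

```python
import numpy as np

def register_depth_to_rgb(depth, K_ir, K_rgb, R, t):
    """Map each IR depth pixel to coordinates in the RGB image.
    K_ir, K_rgb: 3x3 intrinsic matrices; (R, t): IR-to-RGB extrinsics.
    Returns an (h, w, 2) array of (u, v) positions in the RGB image."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K_ir) @ pix            # back-project to unit rays
    pts = rays * depth.reshape(1, -1)           # scale rays by depth
    pts_rgb = R @ pts + t.reshape(3, 1)         # move into the RGB frame
    proj = K_rgb @ pts_rgb                      # project with RGB intrinsics
    uv = proj[:2] / np.maximum(proj[2], 1e-9)   # perspective divide
    return uv.T.reshape(h, w, 2)
```

Nothing in the math ties the two cameras’ resolutions or positions together, which is why the RGB side could just as well be a separate 2K or 4K camera, given a calibration between the pair.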

One thing is certain: people will find many, many interesting uses for the Kinect and other depth cameras across a variety of industries. The Kinect is a great example of a disruptive permutation of an old technology, because it is fostering an explosion of experimentation that was previously out of reach for many people.

~ by opticalflow on February 27, 2011.

One Response to “Kinect — Insanely Interesting”

  1. Sir: our engineers are very interested in your 2D-to-3D conversion software; however, it is very difficult to get hold of anyone from your company to discuss it. Is there a phone number or email address where you can be contacted? Please advise.
