Correcting 360 Degree Stereo Video Capture (Part 1 of 3)

image1

ScorpionCam: Why Dust Off Something From 2010?

There is an incredible back-shelf of technologies at my company. ScorpionCam was the codename of an internal R&D project we conducted from 2010 into mid-2011, when we were heavily involved in stereoscopic 3D. I believe it has particular relevance today, given the keen interest in multi-camera systems for spherical, stereoscopic live video capture to feed the new generation of VR devices such as the Oculus Rift and Samsung Gear VR. We’ve had this in our back pocket for a long time; I hope it inspires people to understand that many of the problems associated with real-time stereoscopic spherical capture are, in fact, tractable and solvable – they were solved in 2011!

Facebook’s Open Source Camera and the Nokia OZO

With all of the current interest in Virtual Reality and Augmented Reality, 360-degree video has enjoyed something of a renaissance after a false start in the mid-to-late 1990s. The availability of cheap(er) VR headsets, along with the beginnings of standardization for viewing, has also re-sparked interest in stereoscopic 360-degree video, something previously confined to the realm of ultra-high-end CGI, games, and graphics because no viable, capable, live-action capture system existed.

Facebook has had a keen interest in spherical stereo video for some time, most notably signaled by their acquisition of Oculus and, most recently, by their announcement of an open-source camera hardware design purpose-built for shooting 360-degree, spherical, stereoscopic video at a resolution high enough to be actually viable for media and entertainment purposes.

There are, however, several technical challenges to be overcome, which is probably why Nokia’s OZO camera and instances of Facebook’s Surround 360 camera are yet to be seen in the wild. The primary one is that while capturing simple two-eye stereoscopic video is not a challenge, doing so with spherical video runs into problems of physics: placing several fisheye camera pairs pointing in several different directions isn’t very practical (or is outright impossible, given the physical constraints of the cameras and/or lenses). Perhaps we can help fix that!

The ScorpionCam Project

Why the name ScorpionCam? Scorpions have between 4 and 7 eyes, depending on the species. The long-range goal of this project was to create camera systems that fuse and combine the data from many different sensors and sensor types to achieve a robust, multi-environment “depth” and “range” camera, including 360-degree stereoscopic video. We anticipated that the minimum number of “eyes” for such a system would be 4, and possibly more. For spherical, 360-degree stereoscopic capture, naïve approaches such as simply arranging two (or more) ultra-wide-angle lenses next to each other for stereo 3D are not workable, because the left lens obscures a significant portion of the right lens’s field of view and vice versa:

image2

Camera rig with two Fisheye-Nikkor 6mm f/2.8 lenses (220-degree FOV)

Imagine two of these next to each other. Two basketball-sized lenses mounted side by side doesn’t solve anything for stereoscopic spherical video. Yes, the lens can actually see “behind itself”, but it (or rather they, all 6 of them in existence) is not practical for the purposes at hand.

Other Fundamental Problems

Multiple sensors, rolling shutter, stitching… and what about 3D?

The Facebook Surround 360 system appears to have solved most of these.  How?  By arranging several monocular cameras in a radial array, and then generating “virtual camera” views for both eyes in a continuous fashion.  Doing so is not easy, however.  There’s a lot of overlap between this and what we did with ScorpionCam.

image3

The Facebook Surround 360 prototype

 

Rolling Shutter and Genlock: Without rehashing their entire announcement and specs, they appear to be using very solid Point Grey industrial cameras, which have very precisely known and calibrated rolling shutter characteristics.  They’re not the cheapest option, but if you need to account for rolling shutter and genlock synchronization of multiple cameras, it’s hard to find a better choice.
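
To make the rolling-shutter point concrete, here is a minimal sketch (in Python, with purely illustrative timing numbers of my own) of how per-row capture times can be modeled from a known sensor readout time, so that rows from un-genlocked cameras can be compared or time-aligned before stitching:

```python
import numpy as np

def row_capture_times(frame_start_s, readout_s, height):
    """Approximate the capture time of each sensor row for a rolling-shutter
    camera, assuming a constant top-to-bottom readout rate."""
    # Row r is read out readout_s * r / height seconds after the first row.
    return frame_start_s + readout_s * np.arange(height) / height

# Illustrative numbers only: a 2048-row sensor with a 20 ms readout, and two
# cameras whose frame clocks are offset by 3 ms because they are not genlocked.
cam_a = row_capture_times(frame_start_s=0.000, readout_s=0.020, height=2048)
cam_b = row_capture_times(frame_start_s=0.003, readout_s=0.020, height=2048)

# Per-row timing skew between the cameras; a stitcher can use this to decide
# how far to interpolate camera B's rows in time before blending with camera A.
skew_ms = 1000.0 * (cam_b - cam_a)
print(skew_ms[0], skew_ms[-1])
```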

Stitching for multiple cameras is a well-worn problem (I cut my teeth on it at IBM Research back in the mid-1990s), but having multiple cameras with industrially repeatable optical and sensor characteristics helps a great deal.
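
For readers who have never touched stitching, a generic monocular stitch can be sketched with OpenCV’s high-level stitcher. This is not the Surround 360 or ScorpionCam pipeline, and the file names are hypothetical:

```python
import cv2

# Frames captured at (nominally) the same instant from a radial camera array;
# the file names are hypothetical.
frames = [cv2.imread(f"cam_{i:02d}.png") for i in range(8)]

# OpenCV's high-level stitcher handles feature matching, camera estimation,
# warping, and blending. PANORAMA mode assumes a mostly rotational camera
# arrangement, a reasonable first approximation for a small-radius radial rig.
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(frames)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.png", panorama)
else:
    print("Stitching failed with status", status)
```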

3D: But what about 3D? You need two cameras to capture 3D, ostensibly a left eye and a right eye. All of the Surround 360’s cameras are pointing in different directions… how does one correct for that? There’s a lot of missing pixel information! How do you fill it in?

Next I’ll share snippets from our 2011 research report, where we explore the successful use of un-genlocked, non-focus-matched, non-focal-length-matched, rolling-shutter CMOS cameras and lenses to generate stereoscopic virtual-camera views.

 

ScorpionCam Project – July 2011

Technology Gap

There does not presently exist a camera system that can capture both depth information and color information in real-time, at HD or cinema resolutions (1080 lines up to 3840 lines).  Various solutions have been tried by others by combining HD cameras and separate low resolution depth camera systems or methods, but none produce robust enough results to have any utility in the television or theatrical production markets.

A key problem with a multisensor approach is that, to date, all existing attempts at this process suffer from two or more of the following problems: 1) slow speed, 2) very low quality, and/or 3) objectionable artifacts in the final product of the camera system, namely the images. This is typically due to the mismatch between the primary imaging camera (say, an Arri Alexa at 2K @ 30fps) and the depth camera systems (typically no more than 176×144 @ 10fps).

Additionally, all present depth or range camera systems suffer from significant “blind-spots” or deal-breaking omissions, such as being inoperable in broad daylight, very limited range, or incomplete data return (vast holes in the data).

Advantages

We can take low-resolution depthmaps and match them to a higher-resolution center camera, without loss of dynamic range or edge detail, with zero artifacts.  This is our Depthmap Superresolution Process.

Depthmap Superresolution

Our depthmap superresolution process is unique to us, and we believe our software processes can be turned into a fully realized, robust, and market-ready camera system months or perhaps even years ahead of the fastest-following, well-funded competition. Essentially, the very-low-resolution depthmaps are upscaled (via our superresolution processes) to match the high resolution of the color imagery taken by a central or primary camera. Additionally, thanks to our motion tracking and optical flow, there is no need for genlock, or even to match disparate frame rates between the low-resolution depth camera systems and the primary camera.
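
As a rough illustration of the idea (not our actual, proprietary superresolution process), a guided upsampling step can be sketched with a joint bilateral filter, which uses the high-resolution color frame to snap the upscaled depth edges to image edges. The function below assumes OpenCV with the ximgproc contrib module:

```python
import cv2
import numpy as np

def upsample_depth(depth_lo, color_hi, sigma_spatial=8.0, sigma_color=0.1):
    """Upscale a low-resolution depthmap to the resolution of a high-resolution
    color frame, using the color frame as an edge guide (joint bilateral
    upsampling). A generic stand-in, not the proprietary process itself.

    depth_lo : low-resolution float32 depthmap
    color_hi : high-resolution BGR frame from the primary camera
    """
    h, w = color_hi.shape[:2]
    # Naive upscale first; the joint bilateral filter then snaps the soft,
    # interpolated depth edges to the crisp edges of the color guide image.
    depth_up = cv2.resize(depth_lo, (w, h), interpolation=cv2.INTER_LINEAR)
    guide = color_hi.astype(np.float32) / 255.0
    # jointBilateralFilter lives in the opencv-contrib "ximgproc" module.
    return cv2.ximgproc.jointBilateralFilter(
        guide, depth_up.astype(np.float32), d=-1,
        sigmaColor=sigma_color, sigmaSpace=sigma_spatial)
```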

Data Fusion from Multiple Sensor Systems

Our approach further envisions using multiple depth-capture methods simultaneously, whereby one depth camera system’s “blind spot” may be overlapped by another’s good capture data. This closely follows the “data fusion” approach common to superresolution systems, and gives us the inherent flexibility to exploit any new and upcoming depth-capture methods, cameras, and sensors.
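
A minimal sketch of the fusion idea, assuming the individual depthmaps have already been registered into a common camera frame and carry per-pixel confidence maps (the names and weighting scheme are illustrative, not our production method):

```python
import numpy as np

def fuse_depthmaps(depths, confidences, eps=1e-6):
    """Confidence-weighted fusion of depthmaps from several sensors
    (stereo, time-of-flight, structured light, ...).

    depths      : list of HxW float arrays, already registered to one camera
                  and expressed in the same metric units
    confidences : list of HxW float arrays in [0, 1]; 0 marks a blind spot
    """
    d = np.stack(depths)        # (N, H, W)
    c = np.stack(confidences)   # (N, H, W)
    total = c.sum(axis=0)
    fused = (d * c).sum(axis=0) / np.maximum(total, eps)
    # Pixels that no sensor saw remain holes (NaN) for the inpainting stage.
    fused[total < eps] = np.nan
    return fused
```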

Depthmap Occlusion Inpainting

Even in total combination, it is possible that such a combined camera system will still leave “holes” or blind spots in the range or depth map. Our extensive research into, and product deployment of, “hole-filling” and “inpainting” technologies fills in the gaps that even the multi-sensor approach cannot address alone.
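
As an illustrative stand-in for the proprietary hole-filling described above, the remaining holes can be filled with a generic inpainting call such as OpenCV’s Telea method:

```python
import cv2
import numpy as np

def inpaint_depth_holes(depth, radius=5):
    """Fill remaining holes (NaNs) in a fused depthmap using OpenCV's generic
    Telea inpainting as an illustrative stand-in for proprietary hole-filling."""
    holes = np.isnan(depth).astype(np.uint8)
    valid = depth[~np.isnan(depth)]
    lo, hi = float(valid.min()), float(valid.max())
    # cv2.inpaint expects an 8-bit image, so quantize, inpaint, then map back.
    depth8 = np.zeros(depth.shape, dtype=np.uint8)
    depth8[~np.isnan(depth)] = np.round(
        255.0 * (valid - lo) / max(hi - lo, 1e-6)).astype(np.uint8)
    filled8 = cv2.inpaint(depth8, holes, inpaintRadius=radius,
                          flags=cv2.INPAINT_TELEA)
    return lo + filled8.astype(np.float32) / 255.0 * (hi - lo)
```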

Depth Image Based Rendering System

Once a sound depthmap is computed, one more problem remains: rendering a “virtual camera” view in which the objects in the image are shifted to create parallax. Our DIBR system uses a “scatter-based” approach, which ensures that we can identify and use occlusion information to reliably “in-paint” or predict the contents of revealed background pixels without creating artifacts.
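
A bare-bones sketch of scatter-based (forward-warp) DIBR is shown below. It only illustrates the general idea of scattering source pixels toward a virtual viewpoint with a z-buffer so that nearer pixels win occlusions, leaving a hole mask for inpainting; the depth-to-disparity mapping here is a simplification, not our production formula:

```python
import numpy as np

def dibr_scatter(color, depth, baseline_px):
    """Scatter (forward-warp) each source pixel horizontally by a disparity
    derived from depth, keeping the nearest contributor when several pixels
    land on the same target. Returns the warped view plus a hole mask of
    revealed background to be in-painted. The depth-to-disparity mapping
    (baseline_px / depth) is a simplification for illustration."""
    h, w = depth.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), -np.inf)                 # nearest contributor wins
    disparity = baseline_px / np.maximum(depth, 1e-6)
    xs = np.arange(w)
    for y in range(h):
        xt = np.round(xs + disparity[y]).astype(int)
        ok = (xt >= 0) & (xt < w)
        for x_src, x_dst in zip(xs[ok], xt[ok]):
            if disparity[y, x_src] > zbuf[y, x_dst]:
                zbuf[y, x_dst] = disparity[y, x_src]
                out[y, x_dst] = color[y, x_src]
    hole_mask = np.isinf(zbuf)                      # revealed background
    return out, hole_mask
```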

Fine Tuned, Real-Time GPU Pipeline

Furthermore, by leveraging our extensive investment in our GPU processing pipeline and our already-shipping, successful processes and products, real-time, full-frame-rate live performance of all of the above is within reach.

 

Reason for the R&D Project

The project was inspired by our success in obtaining high-quality depthmaps from stereo camera systems. Our depth-from-stereo process was put through several bakeoffs against Ocula (from The Foundry), Mistika (from SGO), and Pablo (from Quantel), which proved our ability to correct badly shot 3D footage that no other technology on the market could correct. Notably, our correction technology is largely automated, unlike those other high-touch tools. The shots we processed and corrected were aired in commercial 3D broadcast programs as well as Hollywood features.

First, we start with a stereoscopic input (in this case 1080P SbS):

image4

Then, our optical flow system is used, twice, to calculate bi-directional disparity, shown here in this diagnostic:

image5

Green shows right-to-left displacement, and red the opposite. The non-green/red colors are indicators: they mark regions with “aperture problems” or noise problems, areas where good disparity cannot be estimated. A blue component would indicate vertical disparity; there is very little here. We apply a little of our magic processing to make the disparity map statistically sound (again, a diagnostic visualization):

image6

Whatever problems remain are taken care of in the next phase, where we model the depth. This is the final computed depthmap. Of note here is that it is a 100% dense, complete, and accurate depthmap, with only minor issues that will not materially affect our final render of a virtual-camera stereoscopic image:

image7

So, in the ideal case, this is how our depth-from-disparity system works for well-aligned stereo images. Given a good depthmap such as this, a nearly flawless “virtual camera” synthesis is assured via our DIBR (depth image based rendering) system, which automatically fills in occlusion holes. Not all stereo images are well-aligned, however.
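
For readers who want to experiment, here is a minimal sketch of the bi-directional disparity idea described above, using OpenCV’s Farneback optical flow as a generic stand-in for our proprietary flow engine, with a round-trip (left-right) consistency check to flag the unreliable regions that show up as the non-green/red colors in the diagnostic:

```python
import cv2
import numpy as np

def bidirectional_disparity(left_bgr, right_bgr, consistency_px=1.0):
    """Estimate left->right and right->left flow and flag inconsistent pixels.
    Farneback flow is a generic stand-in for the proprietary optical flow."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    args = dict(pyr_scale=0.5, levels=4, winsize=21, iterations=3,
                poly_n=5, poly_sigma=1.1, flags=0)
    flow_lr = cv2.calcOpticalFlowFarneback(left, right, None, **args)
    flow_rl = cv2.calcOpticalFlowFarneback(right, left, None, **args)

    h, w = left.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Follow the L->R vector, look up the R->L vector at the landing point;
    # a reliable match should round-trip back to (near) where it started.
    xt = np.clip(np.round(xs + flow_lr[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow_lr[..., 1]).astype(int), 0, h - 1)
    round_trip = flow_lr + flow_rl[yt, xt]
    reliable = np.linalg.norm(round_trip, axis=2) < consistency_px

    disparity = -flow_lr[..., 0]   # horizontal component; rectified pairs only
    vertical = flow_lr[..., 1]     # should be near zero for well-aligned stereo
    return disparity, vertical, reliable
```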

Inspired by this success, we decided to explore if a completely unaligned set of witness cameras could provide enough information to give us a high-resolution depthmap when used in conjunction with a high-resolution central camera.

History of the R&D Project:  “ScorpionCam” Camera Tests

In the first phase of our camera efforts we experimented primarily with three-camera rigs (“Tricam”). This included data taken at the October 2010 Alice in Chains concert, the Halloween 2010 Dallas vs. Jacksonville game at Cowboys Stadium, and the January 2011 Nets vs. Mavericks and Nets vs. Pistons games at the Prudential Center in Newark. We began with tests at Red Digital Cinema’s Orange County facility in September 2010, where we verified that the approach would work by using three identical Red One MX cameras with 40mm prime lenses mounted on a custom-machined slidebar.

We will continue the series on our work with stereoscopic cameras and its contribution to stereoscopic 360-degree video capture in Part 2, where we cover Steps 1 and 2 of 6:

  • Depth From Disparity Using Matched Witness and Center Cameras and Lenses
  • Depth From Disparity Using MISMATCHED Witness and Center Cameras and Lenses

~ by opticalflow on May 11, 2016.

3 Responses to “Correcting 360 Degree Stereo Video Capture (Part 1 of 3)”

  1. Hello,

    My name is Rafael Lino I am RND on a VR company in London. We’d love to chat about how we are making stereo 360 video right now and perhaps we can cook up a few ideas together. My email is rafael@visualise.com

    You have very little information about yourself and your company in this site (like contacts!) so hopefully you will get this

  2. Hi Rafael. I’ve actually just changed my “About” page to have recent contact information.

    Cheers

  3. […] a previous post, I described in a 3 part series the research project that we did a few years ago (2011) to capture stereoscopic 3D video at […]
