Correcting 360 Degree Stereo Video Capture (Part 3 of 3)

We continue our series on stereoscopic camera work and its contribution to stereoscopic 360 degree video capture, picking up where Part 2 left off.  In this final part of the series we cover Steps 3 through 6 and conclude:

  • Depth From Disparity Using Handheld Witness and Center Cameras and Lenses, Multiple Camera Positions
  • Baseline Camera Interop Tests
  • Witness Cameras at Slower Framerates Than the Primary Camera
  • LIDAR or Time-Of-Flight Cameras to Augment the Process
  • Final Step: Mixed Models Including Motion Depth Plus Stereo Depth
  • Interesting Observations and Highlights


Step 3: Depth From Disparity Using Handheld Witness and Center Cameras and Lenses, Multiple Camera Positions

During the Nets vs. Pistons and Nets vs. Mavericks games in January 2011, we ran three different camera positions and two camera systems.  The Red cameras were provided courtesy of OffHollywood (Mark Peterson).  The first system was a three-Red One MX rig as used in Dallas, deployed at a variety of camera positions.  The second was a handheld position, where two Panasonic POV cameras were mounted on a live, tethered shoulder-mount Sony broadcast HD camera.  It’s important to note that this was an actual camera position used in the broadcast.  The witness camera footage was recorded on SD cards for later processing.  No cameramen or broadcasts were harmed in the process of taking our test footage.

Here are some samples from the shoot, showing the center take on the left and the depthmap on the right (RGB+Z).  These are not color corrected and represent the raw color data:


We were happy enough with this footage to include a color corrected version of it in our 2011 technical reel.
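For readers new to the technique, depth-from-disparity rests on the basic rectified pinhole-stereo relationship Z = f·B/d. The sketch below is only illustrative; the focal length, baseline, and disparity values are assumed, not taken from our actual rigs:

```python
# Depth from disparity for a rectified stereo pair (pinhole model).
# Z = f * B / d, where f is the focal length in pixels, B the baseline
# (interaxial distance) in meters, and d the measured disparity in pixels.
# All numbers below are illustrative, not from the actual rig.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in meters for one matched pixel pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Example: 1500 px focal length, 6 cm baseline, 15 px measured disparity.
z = depth_from_disparity(1500.0, 0.06, 15.0)
print(f"{z:.2f} m")  # 6.00 m
```

Note how depth falls off as the inverse of disparity: halving the measured disparity doubles the reported depth, which is why disparity measurement robustness matters so much in everything that follows.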

Step 4: Baseline Camera Interop Tests

As a result of these outings, we recognized the need to take calibrated data at a well-equipped third-party location where various camera systems would be readily on hand.  The Camera House (Rufus Burnham) was selected, and we performed controlled tests with a variety of camera systems, both for the witness (side-car) cameras and for the center primary camera, namely:

  • Red One
  • Red Epic
  • Arri Alexa
  • Weisscam
  • Sony F23
  • GoPro
  • Panasonic AG-HCK10G

Additionally, a variety of prime and zoom lens combinations with the above were tested.  Most of this was necessary but mundane work involving shooting charts and plain but carefully constructed sets.  The idea was to construct a more precise model and understanding of the limitations of our existing “ScorpionCam” idea, and to begin to incorporate and test some new features and research work, including:

  • The use of witness cameras at slower framerates and lower resolutions than the primary camera
  • The use of LIDAR or Time-Of-Flight camera to augment the process
  • The use of mixed models including motion depth plus stereo depth

We have successfully accomplished all of the above goals to-date.

Step 5: Witness Cameras at Slower Framerates Than The Primary Camera

For these tests, we primarily ran either the Weisscam at 2000 fps with Panasonic witness cameras at 60 fps, or the Red One at 120 fps with the other cameras at 30 fps.  For example, in our Pillowfight shoot, the primary camera ran at 120 fps and the witnesses at 30 fps.  This work also incorporated newly developed depth hole-filling methods (note that the letterbox bars at the bottom actually have depth values assigned to them), and several improvements to the robustness of our stereo disparity measurement system:
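The rate ratio determines how many depth frames must be synthesized per witness interval (four, for 120 fps against 30 fps). Our production retiming is optical-flow based; as a much simpler stand-in, the sketch below linearly interpolates per-pixel depth between consecutive witness frames, with all names and numbers assumed:

```python
# Retiming witness-camera depth to the primary camera's framerate.
# This is a simplified stand-in for flow-based retiming: it linearly
# interpolates per-pixel depth between two consecutive witness frames
# (30 fps) to land on each intermediate primary frame (120 fps).

def retime_depth(d0, d1, primary_per_witness=4):
    """Return interpolated depth maps (flat lists) covering one witness interval."""
    frames = []
    for k in range(primary_per_witness):
        t = k / primary_per_witness            # 0, 0.25, 0.5, 0.75
        frames.append([a + t * (b - a) for a, b in zip(d0, d1)])
    return frames

# Two 2x2 witness depth maps, flattened; depths in meters.
frames = retime_depth([2.0, 2.0, 4.0, 4.0], [2.0, 3.0, 4.0, 6.0])
print(frames[2])  # halfway frame: [2.0, 2.5, 4.0, 5.0]
```

Linear interpolation blurs depth edges on fast movers, which is exactly why a flow-guided version is preferable in practice.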


Step 6: LIDAR or Time-Of-Flight camera to augment the process

LIDAR cameras are quite expensive, but thankfully Microsoft released the Xbox Kinect camera just as we were sourcing LIDAR cameras.  Therefore, to validate our data fusion approach, and to show that we could inpaint the depth holes inherent to such range camera systems, we opted to use a hacked Kinect camera to prove out our theory before committing a great deal more.

To take advantage of this situation, we undertook a staged process of development.  First, we proved out our ability to handle the large holes inherent to typical depth and range mapping:

Before (raw Kinect data, with our zoom tracker).  Holes and missing data are in white:


After our depth inpainting and occlusion handling, all depth values are filled in reliably, in spite of massive holes due to the reflectivity of the computer screen, the skylight, and the IR absorption of the black filing cabinets in the background:
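Our production inpainting method is not described here, but the general flavor of diffusion-style hole filling can be sketched in a few lines: unknown pixels are repeatedly replaced by the mean of their known neighbors until the map is dense. Everything below is illustrative only:

```python
# Minimal diffusion-style hole filling for a depth map, a sketch of the
# general idea only, not the production inpainting method. Holes (None)
# are repeatedly replaced by the mean of their known 4-neighbors until
# every pixel has a value.

def fill_holes(depth):
    h, w = len(depth), len(depth[0])
    grid = [row[:] for row in depth]
    while any(v is None for row in grid for v in row):
        nxt = [row[:] for row in grid]
        for y in range(h):
            for x in range(w):
                if grid[y][x] is None:
                    nbrs = [grid[j][i]
                            for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                            if 0 <= j < h and 0 <= i < w and grid[j][i] is not None]
                    if nbrs:
                        nxt[y][x] = sum(nbrs) / len(nbrs)
        grid = nxt
    return grid

# A 3x3 depth map with a hole in the middle (e.g. an IR-absorbing surface).
filled = fill_holes([[1.0, 1.0, 1.0],
                     [2.0, None, 2.0],
                     [3.0, 3.0, 3.0]])
print(filled[1][1])  # 2.0 (mean of 1.0, 3.0, 2.0, 2.0)
```

A real system additionally needs occlusion reasoning so that background depth does not bleed across foreground silhouettes; plain diffusion like this cannot tell the two apart.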


Next, with help from the Engineering Team, we proved we could capture Kinect via USB (640×480 at a variable frame rate of 8-30 fps) simultaneously with an Arri Alexa running 1080p at 24 fps via HD-SDI, with our image processing system providing “virtual genlock”.  The raw footage consisted of YUV data from the Alexa and RGB+Depth data from the Kinect.  This is a significant addition in that we are the only vendor we know of able to show conjoined, multi-bus video capture at multiple frame rates.
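With no hardware sync between the two buses, one straightforward way to realize a “virtual genlock” is nearest-timestamp pairing against a shared capture clock. The sketch below is an assumed simplification of that idea, not our actual implementation, and the Kinect timestamps are invented:

```python
# "Virtual genlock" sketch: with no hardware sync between the 24 fps Alexa
# and the variable-rate (8-30 fps) Kinect, pair each Alexa frame with the
# Kinect frame whose capture timestamp is nearest. Timestamps are assumed
# to come from a shared clock at capture time.

def pair_streams(primary_ts, witness_ts):
    """For each primary timestamp, return the index of the nearest witness frame."""
    pairs = []
    for t in primary_ts:
        idx = min(range(len(witness_ts)), key=lambda i: abs(witness_ts[i] - t))
        pairs.append(idx)
    return pairs

alexa = [i / 24 for i in range(5)]        # 24 fps: 0.0, 0.0417, 0.0833, ...
kinect = [0.0, 0.07, 0.11, 0.19, 0.24]    # irregular Kinect frame arrivals
print(pair_streams(alexa, kinect))        # [0, 1, 1, 2, 3]
```

Note that Kinect frames can be reused (index 1 appears twice) when its instantaneous rate drops below 24 fps, which is why the depth retiming step described earlier still matters downstream.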

Final step: Mixed Models Including Motion Depth Plus Stereo Depth

The final examples incorporate everything in a unified framework.  This is footage taken with a Weisscam at 1080p, 2000 fps.  The witness cameras, used to obtain depth-from-disparity, were running at 720p, 60 fps.  Our optical flow was used to match framerates, and our depth-mapping process was augmented by optical flow to derive depth-from-motion-and-structure for the individual tiny droplets of water, which were moving too fast for the witness cameras to register:


(Note: You can see full motion renders of RGB+depth, and full stereoscopic renders of the above Here)
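The text does not spell out how the two depth sources are combined, so as one plausible sketch, consider a per-pixel confidence-weighted blend where stereo depth dominates on slow, large features and motion depth fills in the fast movers. All weights and values below are invented for illustration:

```python
# Sketch of fusing two per-pixel depth estimates: stereo disparity depth
# (reliable on slower, larger features) and motion/structure depth from
# optical flow (recovers fast movers, like droplets, that the witness
# cameras miss). The weighting scheme is illustrative, not the actual model.

def fuse_depth(stereo, motion, stereo_conf, motion_conf):
    """Confidence-weighted blend; None where neither source has an estimate."""
    fused = []
    for s, m, ws, wm in zip(stereo, motion, stereo_conf, motion_conf):
        if ws + wm == 0:
            fused.append(None)          # no estimate at all for this pixel
        else:
            fused.append((s * ws + m * wm) / (ws + wm))
    return fused

stereo = [3.0, 3.0, 0.0]     # stereo misses the fast droplet (pixel 2)
motion = [3.2, 0.0, 1.5]     # flow-based depth picks it up
print(fuse_depth(stereo, motion, [0.9, 1.0, 0.0], [0.1, 0.0, 1.0]))
```

The key property is graceful fallback: wherever one estimator has zero confidence, the other's value passes through untouched.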

In another example, we used a Weisscam running 1080p at 1000 fps with a similar strategy, again using both depth-from-disparity and optical flow for depth-from-motion.  Note that handling transparency (such as the water stream here) is extremely problematic for all existing systems, but we managed it just fine:


Highlights and Observations of R&D ScorpionCam Results To-Date

  • Between two choices: a) a sloppily aligned set of witness cameras, and b) a well-aligned, fixed, but lower resolution set of witness cameras, b) works better with our system
  • Between two choices: a) a sloppily aligned set of witness cameras at high or matched framerate, and b) a well-aligned, fixed, but lower framerate set of witness cameras, b) works better with our system
  • Unenhanced by additional sensor data, a ScorpionCam system works in most, but not all shooting situations and setups. Examples of ScorpionCam gaps include:
    • Camera operator error – if the witness cameras are set to a much longer focal length (>5%) than the center camera, performance is hit-and-miss (i.e., not robust)
    • Camera operator error – if the witness cameras are set to focal lengths very different from each other (>5%), in most cases the disparity is not resolvable
    • Camera operator error – if the witness cameras are set to very different focal points (e.g. one camera is out of focus relative to the other, >5%), performance is hit-and-miss (i.e., not robust), especially if the cameras are at differing exposures
    • Intercamera exposure differences can be accommodated in real-time
    • Intercamera misalignments can be accommodated in near-real time, up to a point (less than 10% horizontal or vertical) – it requires severe toe-in or wall-eyed alignment, or zonk-eyed (up-down) alignment to break our system
    • Near objects in concert with wide camera interaxial are not resolvable (e.g. 2 meter near-limit when interaxial is 20 cm)
    • There seems generally to be a limit of 10X focal length difference between center camera and witnesses with our current zoom tracking system (e.g. center camera at 250mm and witnesses at 18mm)
    • As we suspected, stereo disparity estimation completely fails for the center of objects with completely homogenous color and no texture (think of a clear blue sky)
  • Expecting on-set DITs to capture three cameras’ worth of data when they are used to capturing one is not realistic. We must provide a depth capture capability on-set.  We fought with data overload ourselves, especially with Red cameras; so did GreyMatter when shooting KOTR
  • Best choice for ad-hoc witness cameras: the Panasonic POV cameras. They are compact, light, and high-quality.  Easy workflow.
  • GoPro 3D is not a good choice for a contained witness camera system, primarily due to battery life, the lack of live HD-SDI or video preview options, and reliability issues.
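The near-limit observation above follows directly from the disparity relation: for a maximum searchable disparity d_max, the nearest resolvable depth is Z_near = f·B/d_max. The focal length and search range below are assumed purely to reproduce the 2 m / 20 cm example:

```python
# Near-limit of a stereo pair: Z_near = f * B / d_max, where d_max is the
# largest disparity the matcher can search. Focal length and search range
# here are assumed for illustration, chosen to match the 2 m example above.

def near_limit_m(focal_px: float, interaxial_m: float, max_disparity_px: float) -> float:
    """Nearest depth the stereo matcher can resolve, in meters."""
    return focal_px * interaxial_m / max_disparity_px

# With a 20 cm interaxial, a 2000 px focal length, and a 200 px disparity
# search range, anything closer than 2 m falls outside the matcher's reach:
print(near_limit_m(2000.0, 0.20, 200.0))  # 2.0
```

Widening the interaxial pushes the near limit proportionally farther out, which is the trade-off behind the 2-meter figure quoted above.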

Conclusions From R&D Project To-Date

Our approach has been vindicated — we have successfully identified essential elements of the process of utilizing low resolution depth cameras and stereo cameras in concert with high resolution color cameras to produce high resolution, high framerate, matched color plus depth/range capture data.  Furthermore, our approach allows us to do so at realtime framerates.

Back to 2016: How Does This Help Stereoscopic 360 VR?

In essence, cameras like the Facebook Surround 360 and Nokia OZO have to solve several significant problems:

Each camera pair in their radial array(s) of sensors is “wall-eyed”, which means large inter-axial disparities.  This is challenging.  We showed a workable solution for this with ScorpionCam.

Not only does this “wall-eyed” configuration between any two given sensors cause large inter-axial disparities, but the occlusion areas will be large as well — not only in the depth data, but in the actual image data.  We showed a solution for this.
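To make the geometry concrete: adjacent cameras on a ring both face outward, so their optical axes diverge by 360/N degrees and their baseline is a chord of the ring. The radius and camera count below are assumptions, chosen to be roughly in the range of rigs like the Surround 360:

```python
import math

# Why radial arrays are "wall-eyed": adjacent cameras on a ring face
# outward, so their axes diverge by 360/N degrees and their baseline is
# a chord of the ring: B = 2 * r * sin(pi / N). Radius and camera count
# below are assumed for illustration.

def ring_geometry(radius_m: float, n_cameras: int):
    """Return (baseline between adjacent cameras in m, axis divergence in degrees)."""
    baseline = 2 * radius_m * math.sin(math.pi / n_cameras)
    divergence_deg = 360.0 / n_cameras
    return baseline, divergence_deg

b, a = ring_geometry(0.12, 14)   # 12 cm ring radius, 14 side cameras
print(f"baseline {b * 100:.1f} cm, axes diverge {a:.1f} degrees")
```

A 25-degree divergence between neighboring views is far beyond what a human-like parallel or toed-in stereo pair exhibits, which is why the large-disparity and large-occlusion handling described above carries over so directly.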

Finally, we also showed a way to use less expensive cameras (e.g. GoPro) rather than industrial-grade units for these purposes, in spite of the cheaper cameras’ variances in optics, sensor-to-sensor rolling shutter behavior, and large mounting tolerances.  This would go a long way toward an “open source camera” design that is easier for the masses to obtain.

In closing, I’d like to thank the contributors who helped this research project with their time, energy, devotion (and equipment):

Vidhya Seran — Chief Scientist

Steve Nowalk — Chief Architect

Brian Conway — SR VP of Hollywood

Ernest Forsyth — Jack of all trades

Simon “Bring Cash” Tidnam — The guy who made me pay all the Trades with wads of cash

Jeff Cronenweth — Priceless advice and counsel

Dan McDonough — Tireless 1st AC “Wait, what? How many cameras am I pulling focus for?”

Liam Mulvey — Camera and workflow post

Sean Ruggeri — RED Digital Cinema, Orange County, CA

Leo Vezzali — IdentityFX, Hollywood, CA

John “Pliny” Eremic — OffHollywood, NYC

Mark Peterson — OffHollywood, NYC

Rufus Burnham — The Camera House, Hollywood, CA

Max Penner — ParadiseFX, Hollywood, CA  (Yes Max, I still have the pennyloafers)




~ by opticalflow on May 11, 2016.
