Depth from Dual Differential Defocus and Stereo (D³S) Consensus

Junjie Luo^a,*, Wei Xu^a,*, Dylan Chu^a, Emma Alexander^b, Qi Guo^a

Under Review
^*Indicates Equal Contribution
^aPurdue University, West Lafayette, IN, USA
^bNorthwestern University, Evanston, IL, USA

Abstract

We introduce D³S Consensus, a physics-based, closed-form algorithm that unifies depth from defocus (DfD) and stereo to achieve highly accurate depth estimation over an extended working range, beyond the depth of field (DoF) of the cameras. Given a pair of dual-defocus stereo images, the method estimates an overdetermined set of depth values using a novel DfD theory, Dual Differential Defocus (D³), and (S)tereo in a coupled fashion. It then enforces consensus between these physically independent cues to reject unreliable estimates and picks the most confident depth prediction from the set.
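The consensus idea can be illustrated with a minimal NumPy sketch. It is not the closed-form D³S algorithm: the function name `consensus_select`, the `(K, H, W)` candidate layout, the relative-disagreement score, and the exponential confidence are all assumptions made for illustration. The sketch only shows the general pattern of keeping, per pixel, the candidate on which two independent cues agree best, and turning the remaining disagreement into a confidence value.

```python
import numpy as np


def consensus_select(z_a, z_b, rel_tol=0.05):
    """Per pixel, keep the candidate depth on which two cues agree best.

    z_a, z_b : arrays of shape (K, H, W) with K candidate depths in metres
               from two independent cues (hypothetical interface).
    rel_tol  : hypothetical scale that maps disagreement to confidence.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    mean = 0.5 * (z_a + z_b)

    # Relative disagreement between the two cues for every candidate.
    disagreement = np.abs(z_a - z_b) / np.maximum(mean, 1e-9)

    # Index of the best-agreeing candidate at each pixel, shape (H, W).
    best = np.argmin(disagreement, axis=0)

    depth = np.take_along_axis(mean, best[None], axis=0)[0]
    best_dis = np.take_along_axis(disagreement, best[None], axis=0)[0]

    # Illustrative confidence in (0, 1]; equals 1 when the cues agree exactly.
    confidence = np.exp(-best_dis / rel_tol)
    return depth, confidence
```

Pixels whose best candidate still shows large disagreement receive a low confidence and can be discarded by a threshold, which is what makes the resulting depth map sparse.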

Analysis shows that, under the same error tolerance, D³S achieves a comparable working range with a 10× smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D³S prototype, with only a 4 mm baseline and a 12 mm effective focal length (EFL). It generates depth maps of up to 900×1800 pixels with 1-cm mean absolute error over 0.3–1.64 m from a snapshot acquisition. This surpasses the reported accuracy of certain commercially available stereo cameras with much larger form factors.
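To see why a small baseline is demanding for stereo alone, the standard pinhole triangulation relations can be evaluated for the prototype's 4 mm baseline and 12 mm EFL. The sketch below is textbook stereo geometry, not the paper's analysis. It assumes the EFL can stand in for the pinhole focal length, and the disparity error passed in is a hypothetical value, since the sensor's pixel pitch is not stated here.

```python
def disparity_mm(depth_mm, focal_mm=12.0, baseline_mm=4.0):
    """Pinhole stereo disparity on the sensor: d = f * b / Z (all in mm)."""
    return focal_mm * baseline_mm / depth_mm


def stereo_depth_error_mm(depth_mm, disparity_error_mm,
                          focal_mm=12.0, baseline_mm=4.0):
    """First-order depth error from a disparity error: dZ = Z^2 * dd / (f * b)."""
    return depth_mm ** 2 * disparity_error_mm / (focal_mm * baseline_mm)


if __name__ == "__main__":
    for z in (300.0, 1640.0):  # near and far ends of the reported range, in mm
        print(f"Z = {z:6.0f} mm  disparity = {disparity_mm(z):.4f} mm")

    # Hypothetical 1 micrometre disparity error at the far end of the range.
    err_small = stereo_depth_error_mm(1640.0, 0.001, baseline_mm=4.0)
    err_large = stereo_depth_error_mm(1640.0, 0.001, baseline_mm=40.0)
    print(f"error ratio (4 mm vs 40 mm baseline): {err_small / err_large:.1f}")
```

For a fixed disparity error, the stereo-only depth error grows with the square of depth and inversely with baseline, so shrinking the baseline tenfold raises it tenfold. This is the gap that the additional defocus cue has to close.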

Overview of D³S Consensus. A compact binocular camera captures two images with different defocus and perspective, from which a purely physics-based algorithm produces a highly accurate sparse depth map (optionally densified with an off-the-shelf completion model). The prototype's accuracy and working range are competitive with commercial stereo cameras at a fraction of the size.

Real-world scenes captured by the D³S prototype. Sparse depth maps (filtered at a constant confidence threshold) show reliable estimation across diverse objects and textures; densified maps are obtained with an off-the-shelf depth completion model.

With a confidence threshold of 0.8, the prototype achieves a working range of 0.3–1.64 m while keeping mean absolute error below 1 cm, and outperforms representative DfD and dual-defocus stereo baselines on the same captured images.
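The evaluation described here, keeping pixels above a confidence threshold and reporting mean absolute error over the valid ones, can be sketched as follows. The function name `masked_mae`, the use of `>=` at the threshold, and the extra density output are assumptions for illustration; the source states only the threshold value of 0.8 and the metric.

```python
import numpy as np


def masked_mae(pred, gt, confidence, threshold=0.8):
    """Mean absolute error over pixels whose confidence passes the threshold.

    Returns (mae, density), where density is the fraction of pixels kept.
    Depths are expected in the same unit (e.g. metres).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    confidence = np.asarray(confidence, dtype=float)

    # Valid pixels: confident enough, with finite prediction and ground truth.
    valid = (confidence >= threshold) & np.isfinite(pred) & np.isfinite(gt)
    if not valid.any():
        return float("nan"), 0.0

    mae = float(np.mean(np.abs(pred[valid] - gt[valid])))
    density = float(valid.mean())
    return mae, density
```

Raising the threshold trades density for accuracy: fewer pixels survive, but those that do tend to have smaller errors, so reporting the density alongside the error keeps the comparison honest.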

Sample synthetic scenes. Accurate depth estimation is maintained across depth boundaries and gradient-depth surfaces; inset numbers report mean absolute error in cm over valid pixels.

BibTeX

@misc{luo2026d3s,
  title  = {Depth from Dual Differential Defocus and Stereo ({D$^3$S}) Consensus},
  author = {Luo, Junjie and Xu, Wei and Chu, Dylan and Alexander, Emma and Guo, Qi},
  year   = {2026},
  note   = {Under review}
}

More Works from Our Lab

Compact Single-Shot Ranging and Near-Far Imaging Using Metasurfaces

Focal Split: Untethered Snapshot Depth from Differential Defocus

Depth from Coupled Optical Differentiation
