Abstract
We introduce D³S Consensus, a physics-based, closed-form algorithm that unifies depth-from-defocus (DfD) and stereo to achieve highly accurate depth estimation over an extended working range, beyond the depth of field (DoF) of the cameras. Given a pair of dual-defocus stereo images, the method computes an overdetermined set of depth estimates using a novel DfD theory, Dual Differential Defocus (D³), and (S)tereo in a coupled fashion. It then selects the most confident depth prediction from the set, enforcing consensus between these physically independent cues to reject unreliable estimates.
Analysis shows that, under the same error tolerance, D³S achieves a comparable working range with a 10× smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D³S prototype, with only a 4 mm baseline and a 12 mm effective focal length (EFL). It generates up to 900×1800-pixel depth maps with 1-cm mean absolute error over 0.3–1.64 m from a snapshot acquisition. This surpasses the reported accuracy of certain commercially available stereo cameras with much larger form factors.
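To give a sense of why a small baseline is demanding for triangulation alone, the sketch below evaluates the textbook first-order error model for pinhole stereo, δZ ≈ Z² · δd / (f · B). This is the standard stereo relation, not the analysis carried out in this work. The function name and the 1 µm disparity error are hypothetical values chosen only for illustration; the 4 mm baseline, 12 mm focal length, and 1.64 m range are taken from the text above.

```python
def stereo_depth_error(z_m, focal_m, baseline_m, disparity_err_m):
    """First-order depth error of pinhole stereo: dZ ~ Z^2 / (f * B) * dd.

    All quantities are in metres; disparity_err_m is the matching error
    measured on the sensor plane (an assumed value, not from the source).
    """
    return z_m ** 2 / (focal_m * baseline_m) * disparity_err_m


if __name__ == "__main__":
    # Prototype geometry from the text; the disparity error is an assumption.
    err_small = stereo_depth_error(1.64, 0.012, 0.004, 1e-6)
    # Same optics with a hypothetical 10x larger baseline, for comparison.
    err_large = stereo_depth_error(1.64, 0.012, 0.040, 1e-6)
    print(f"4 mm baseline:  {err_small * 100:.1f} cm")
    print(f"40 mm baseline: {err_large * 100:.1f} cm")
```

Under these assumed numbers, stereo alone would be off by roughly 5.6 cm at 1.64 m with the 4 mm baseline, and ten times less with a 40 mm baseline, since the error scales inversely with baseline.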
Overview of D³S Consensus. A compact binocular camera captures two images with different defocus and perspective, from which a purely physics-based algorithm produces a highly accurate sparse depth map (optionally densified with an off-the-shelf completion model). The prototype's accuracy and working range are competitive with commercial stereo cameras at a fraction of the size.
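The idea of picking a depth by agreement between two independent cues can be illustrated with a minimal per-pixel sketch. This is not the D³S algorithm: the function name, the relative-disagreement measure, the tolerance `tol`, and the linear confidence score are all hypothetical choices made only to show the general pattern of consensus-based selection and rejection.

```python
def consensus_depth(dfd_candidates, stereo_candidates, tol=0.05):
    """Pick the depth on which a DfD candidate and a stereo candidate agree best.

    Hypothetical illustration only. Depths are positive numbers in metres.
    Returns (depth, confidence) or None when no pair agrees within `tol`
    (relative disagreement), i.e. the pixel is rejected as unreliable.
    """
    best = None  # (relative disagreement, fused depth)
    for zd in dfd_candidates:
        for zs in stereo_candidates:
            rel = abs(zd - zs) / max(zd, zs)
            if best is None or rel < best[0]:
                best = (rel, 0.5 * (zd + zs))
    if best is None or best[0] > tol:
        return None
    rel, depth = best
    # Assumed confidence: 1 for perfect agreement, 0 at the tolerance limit.
    confidence = 1.0 - rel / tol
    return depth, confidence


if __name__ == "__main__":
    print(consensus_depth([0.5, 1.2], [1.21, 0.9]))  # agreeing pair near 1.2 m
    print(consensus_depth([0.5], [1.0]))             # no agreement: rejected
```

In this toy version a pixel yields a depth only when the two cues corroborate each other, which is why the resulting map is sparse before any densification step.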
Real-world scenes captured by the D³S prototype. Sparse depth maps (filtered at a constant confidence threshold) show reliable estimation across diverse objects and textures; densified maps are obtained with an off-the-shelf depth completion model.
With a confidence threshold of 0.8, the prototype achieves a working range of 0.3–1.64 m while keeping mean absolute error below 1 cm, and outperforms representative DfD and dual-defocus stereo baselines on the same captured images.
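The reported metric, mean absolute error over pixels that pass a confidence threshold, can be computed along the following lines. This is a minimal sketch over flat lists of per-pixel values; the function name, the `>=` comparison, and the returned coverage fraction are assumptions, and the sample values are made up for illustration.

```python
def masked_mae(pred, truth, conf, threshold=0.8):
    """Mean absolute error over pixels whose confidence passes the threshold.

    pred, truth, conf are equal-length sequences of per-pixel values.
    Returns (mae, coverage); mae is None when no pixel passes.
    Using `>=` for the threshold is an assumption, not taken from the source.
    """
    errors = [abs(p - t) for p, t, c in zip(pred, truth, conf) if c >= threshold]
    coverage = len(errors) / len(pred) if len(pred) else 0.0
    if not errors:
        return None, coverage
    return sum(errors) / len(errors), coverage


if __name__ == "__main__":
    # Made-up depths in metres; the third pixel is a low-confidence outlier.
    pred = [1.00, 0.52, 1.40]
    truth = [1.01, 0.50, 1.00]
    conf = [0.90, 0.85, 0.30]
    mae, coverage = masked_mae(pred, truth, conf)
    print(f"MAE = {mae * 100:.1f} cm over {coverage:.0%} of pixels")
```

Raising the threshold typically lowers the error while shrinking the fraction of valid pixels, so the two numbers are best read together.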
Sample synthetic scenes. Accurate depth estimation is maintained across depth boundaries and gradient-depth surfaces; inset numbers report mean absolute error in cm over valid pixels.
BibTeX
@misc{luo2026d3s,
title = {Depth from Dual Differential Defocus and Stereo ({D$^3$S}) Consensus},
author = {Luo, Junjie and Xu, Wei and Chu, Dylan and Alexander, Emma and Guo, Qi},
year = {2026},
note = {Under review}
}