NeXtStereo: directionally driven channel expansion gives adaptive real-time stereo

Date

2026-01

Advisor

Öğüz, Salih Özgür

Abstract

We present NeXtStereo, a lightweight stereo disparity estimation network designed for real-time depth perception. NeXtStereo builds on Widened ConvNeXtV2 blocks that strengthen cost aggregation while leveraging the scalability and generalization behavior of the ConvNeXt family. In addition, we introduce Directionally Modulated Attention (DMA), a novel attention mechanism that incorporates geometric priors to modulate features using directional cues. Together, these components improve structural detail recovery in challenging regions such as object boundaries, thin structures, and texture-weak areas, without relying on heavy 3D aggregation stacks. We evaluate NeXtStereo on SceneFlow, KITTI 2012/2015, and Middlebury, where it achieves a favorable accuracy/efficiency trade-off among real-time models and improves cross-domain robustness, with NeXtStereo-L achieving the lowest >2px error among the compared methods. We also study adaptation to the MS2 outdoor driving dataset and observe reliable transfer under fine-tuning. Furthermore, NeXtStereo demonstrates strong compatibility with convolutional Low-Rank Adaptation (LoRA), enabling parameter-efficient domain adaptation with improved stability compared to relevant real-time stereo matching baselines. Finally, we analyze selective 3D cost aggregation via a targeted ablation that replaces the first 1/4-scale aggregation block with a 3D ConvNeXt-style cost aggregation operator, characterizing the resulting accuracy/efficiency trade-offs.
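
The ">2px error" quoted in the abstract is not defined there. A minimal sketch follows, assuming it uses the common bad-pixel definition from stereo benchmarks: the percentage of valid ground-truth pixels whose absolute disparity error exceeds 2 pixels. The function name `bad_pixel_rate`, the convention that non-positive ground truth marks missing data, and the toy disparity maps are illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def bad_pixel_rate(
    pred: Sequence[Sequence[float]],
    gt: Sequence[Sequence[float]],
    threshold: float = 2.0,
) -> float:
    """Percentage of valid pixels with |pred - gt| strictly above `threshold`.

    Assumption: a ground-truth disparity <= 0 means "no measurement" and the
    pixel is skipped, as is typical for sparse ground truth.
    """
    bad = 0
    valid = 0
    for pred_row, gt_row in zip(pred, gt):
        for d_pred, d_gt in zip(pred_row, gt_row):
            if d_gt <= 0.0:
                continue  # skip pixels without ground truth
            valid += 1
            if abs(d_pred - d_gt) > threshold:
                bad += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * bad / valid


# Toy 2x3 disparity maps (hypothetical values, in pixels).
gt = [[10.0, 20.0, 0.0], [30.0, 40.0, 50.0]]
pred = [[10.5, 23.0, 7.0], [30.0, 37.9, 52.0]]

# 5 valid pixels; errors 3.0 and 2.1 exceed the threshold, 2.0 does not.
print(bad_pixel_rate(pred, gt))
```

In this sketch the comparison is strict, so an error of exactly 2 pixels is not counted as bad. Benchmarks differ in such details (masking, occluded versus all pixels), so the thesis's own protocol should be consulted before comparing numbers.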

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Language

English

Type