[BMVC 2021] "Self-Supervised Monocular Depth Estimation with Internal Feature Fusion"



This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv), BMVC 2021.

A new backbone for self-supervised depth estimation.


If you find this work useful, please consider citing it:

```
@inproceedings{zhou_diffnet,
  title={Self-Supervised Monocular Depth Estimation with Internal Feature Fusion},
  author={Zhou, Hang and Greenwood, David and Taylor, Sarah},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2021}
}
```



  • [16-05-2022] Added Cityscapes training and testing based on Manydepth.

  • [22-01-2022] Uploaded a diffnet_640x192 model (slightly improved over the results in the original paper).

  • [07-12-2021] A multi-GPU training version is available on the multi-gpu branch.

Comparison with other methods

Evaluation on selected hard cases:

Trained weights on KITTI

  • Please note: the results of diffnet_1024x320_ms are not reported in the paper.

| Methods | abs rel | sq rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1024x320 | 0.097 | 0.722 | 4.345 | 0.174 | 0.907 | 0.967 | 0.984 |
| 1024x320_ms | 0.094 | 0.678 | 4.250 | 0.172 | 0.911 | 0.968 | 0.984 |
| 1024x320_ms_ttr | 0.079 | 0.640 | 3.934 | 0.159 | 0.932 | 0.971 | 0.984 |
| 640x192 | 0.102 | 0.753 | 4.459 | 0.179 | 0.897 | 0.965 | 0.983 |
| 640x192_ms | 0.101 | 0.749 | 4.445 | 0.179 | 0.898 | 0.965 | 0.983 |

Setting up before training and testing


Training:

sh start2train.sh


Evaluation:

sh disp_evaluation.sh

Infer a single depth map from an RGB image:

sh test_sample.sh
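
For reference, single-image inference in such a test script typically resizes the RGB input to the training resolution, runs the network, and upsamples the predicted disparity back to the original size. The sketch below assumes a generic `model` callable returning a [1, 1, H, W] disparity map and the 640x192 resolution of the released weights; it is not the repo's exact interface.

```
import torch
from PIL import Image
from torchvision import transforms

def predict_depth(model, image_path, width=640, height=192, device="cuda"):
    # Load the RGB image and resize it to the network's training resolution.
    img = Image.open(image_path).convert("RGB")
    orig_w, orig_h = img.size
    x = transforms.ToTensor()(img.resize((width, height), Image.LANCZOS))
    x = x.unsqueeze(0).to(device)

    with torch.no_grad():
        disp = model(x)  # assumed shape: [1, 1, H, W] disparity

    # Upsample back to the original resolution for visualization.
    disp = torch.nn.functional.interpolate(
        disp, (orig_h, orig_w), mode="bilinear", align_corners=False)
    return disp.squeeze().cpu().numpy()
```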


Thanks to the authors of these works:
- monodepth2
- HRNet


run-time FPS

opened on 2023-02-16 15:48:22 by echo0916

This is great work. I have a question about the run-time FPS. In Table 3 of your paper, you report a run-time of 87 FPS. Under what conditions did you obtain this value? It takes at least 53 ms for me to process one 640x192 image on an NVIDIA RTX 3090 GPU.
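
As a point of reference, measured FPS depends heavily on warm-up iterations, precision, and whether CUDA is synchronized before reading the clock. A minimal timing sketch (assuming a generic `model` already on the GPU and a 640x192 input; not the authors' benchmarking script) might look like:

```
import time
import torch

def measure_fps(model, iters=200, device="cuda"):
    model.eval()
    x = torch.randn(1, 3, 192, 640, device=device)
    with torch.no_grad():
        for _ in range(20):          # warm-up to exclude one-off kernel/launch costs
            model(x)
        torch.cuda.synchronize()     # make sure warm-up work is finished
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()     # wait for all queued kernels before stopping the clock
    return iters / (time.time() - start)
```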


opened on 2022-08-28 07:07:01 by ljy199712

Thanks for your work. There are some details I want to ask about. My environment: torch 1.7.1+cu110, torchaudio 0.7.2, torchsummary 1.5.1, torchvision 0.8.2+cu110. When I set the initial learning rate to 1e-4 for the first 14 epochs and then 1e-5 for the last 5 epochs, my experimental results are very different from yours. Is this caused by the different PyTorch version, or is something wrong with my training process?
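
For context, the schedule described above (1e-4 for the first 14 epochs, then 1e-5) can be written with a standard StepLR. The sketch below uses a placeholder model and training step rather than the repo's actual training loop:

```
import torch

# Placeholder model; the real DIFFNet model and dataloader would go here.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# StepLR with step_size=14 and gamma=0.1 reproduces the schedule in the question:
# lr = 1e-4 for epochs 0-13, then 1e-5 from epoch 14 onward.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=14, gamma=0.1)

num_epochs = 20
for epoch in range(num_epochs):
    # ... one training epoch over the dataloader would run here ...
    optimizer.step()    # stands in for the per-batch parameter updates
    scheduler.step()    # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())
```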

Environment file

opened on 2022-08-04 20:20:21 by jaroslawjanas


Can we still get that environment file, though?

test file missing

opened on 2022-07-18 10:07:03 by Renatusphere

Thanks for your work on DIFFNet! I want to evaluate the training results on my PC, but the file "splits/eigen/gt_depths.npz" is required. I can't find it in the repository. Could you please provide this file? Thanks!

Cityscapes model

opened on 2022-07-15 12:11:20 by seb-le

Hi. First, thank you for releasing your nice paper and source code.

Could you share checkpoints that were pretrained on Cityscapes and fine-tuned on KITTI (i.e., CS → K)?

I would like to check whether the DIFFNet model I pretrained on Cityscapes myself is correct.


About torch::jit::trace

opened on 2022-04-28 10:04:52 by Hugo699

Hello, thank you for sharing your work. I want to use libtorch to deploy this network in C++, but when calling torch::jit::trace() I get an error (running test_sample.py itself succeeds). Because torch::jit::trace() cannot handle dictionary outputs, I changed the output of depth_decoder to a list. There is also a line "import hr_networks" in test_sample.py, but I could not find hr_networks; I don't know whether this affects torch::jit::trace().

Thank you very much!
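
One common workaround, sketched below, is to wrap the encoder and decoder in a small module that returns a tensor instead of a dict, so tracing can handle the output. This assumes a monodepth2-style decoder whose outputs are keyed by ("disp", scale); the module names are placeholders, not the repo's exact classes:

```
import torch

class TraceableDepthNet(torch.nn.Module):
    """Wraps an encoder/decoder pair so the traced graph returns a tensor, not a dict."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        outputs = self.decoder(self.encoder(x))
        # Assumption: a monodepth2-style dict keyed by ("disp", scale).
        # Returning only the finest-scale disparity keeps the output traceable.
        return outputs[("disp", 0)]

# Usage sketch with the real encoder/decoder loaded from the released checkpoints:
# wrapped = TraceableDepthNet(encoder, depth_decoder).eval()
# example = torch.rand(1, 3, 192, 640)
# traced = torch.jit.trace(wrapped, example)
# traced.save("diffnet_traced.pt")
```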
