:boom:Update:
- Thanks to issue #4, we found that the HSIC implementation was incorrect due to a misunderstanding of torch.diag. Note that this issue also exists in the work we referred to, ReBias (ICML'20). We have corrected it in mmaction/models/heads/debias_head.py#L137, but we do not guarantee good performance.
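For readers unfamiliar with this kind of bug, below is a minimal NumPy sketch. It is not the code in debias_head.py. It assumes the problem was the common pattern of subtracting torch.diag(K) from a kernel matrix K to zero its diagonal. For a 2-D input, torch.diag (like np.diag) returns the 1-D diagonal, so the subtraction broadcasts across rows instead of removing only the diagonal. The helper names and the RBF bandwidth are hypothetical. The estimator is the unbiased HSIC of Song et al. (2012).

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x, shape (m, m)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))


def zero_diag_buggy(k):
    # np.diag(k) is a 1-D vector here, so this subtracts k[j, j] from every k[i, j]
    return k - np.diag(k)


def zero_diag(k):
    # np.diag(np.diag(k)) rebuilds a diagonal matrix, so only the diagonal is removed
    return k - np.diag(np.diag(k))


def unbiased_hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Unbiased HSIC estimator (Song et al., 2012); needs m > 3 samples."""
    m = x.shape[0]
    assert m > 3, "the unbiased estimator needs at least 4 samples"
    kt = zero_diag(rbf_kernel(x, sigma_x))
    lt = zero_diag(rbf_kernel(y, sigma_y))
    ones = np.ones(m)
    term1 = np.trace(kt @ lt)
    term2 = (ones @ kt @ ones) * (ones @ lt @ ones) / ((m - 1) * (m - 2))
    term3 = 2.0 / (m - 2) * (ones @ kt @ lt @ ones)
    return (term1 + term2 - term3) / (m * (m - 3))


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y_indep = rng.normal(size=(100, 2))
print(unbiased_hsic(x, x), unbiased_hsic(x, y_indep))  # the dependent pair scores higher
```

Both helpers zero the diagonal, but only zero_diag leaves the off-diagonal entries intact. The buggy version silently shifts every kernel value, and that corrupts the HSIC estimate.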
- For a reasonable AUC evaluation, a threshold is not necessary in practice. We recommend using the updated evaluation script experiments/compare_openness_new.py, but we do not guarantee good performance.
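AUC-ROC does not need a threshold because it is a ranking metric. It equals the probability that a randomly chosen unknown sample receives a higher uncertainty than a randomly chosen known sample, with ties counting one half. The pure-Python sketch below computes AUC in exactly that way. It is an illustration, not the logic of experiments/compare_openness_new.py, and the function name and input lists are hypothetical.

```python
def ood_auc(known_unc, unknown_unc):
    """AUC-ROC for separating unknown (positive) from known (negative) samples.

    Counts, over all (known, unknown) pairs, how often the unknown sample is
    more uncertain; ties contribute 0.5. No threshold is involved.
    """
    if not known_unc or not unknown_unc:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for u in unknown_unc:
        for k in known_unc:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(known_unc) * len(unknown_unc))


known = [0.05, 0.10, 0.20, 0.40]  # uncertainties on known-class videos
unknown = [0.30, 0.60, 0.90]      # uncertainties on unknown-class videos
print(ood_auc(known, unknown))
```

This pairwise loop is O(n·m). For full test sets, a rank-based formula or sklearn.metrics.roc_auc_score gives the same value much faster.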
International Conference on Computer Vision (ICCV Oral), 2021.
We propose the Deep Evidential Action Recognition (DEAR) method to recognize actions in an open world. Specifically, we formulate the action recognition problem from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize the EDL training. In addition, to mitigate the static bias of video representation, we propose a plug-and-play module that debiases the learned representation through contrastive learning. Our DEAR model trained on the UCF-101 dataset achieves significant and consistent performance gains on multiple action recognition models, i.e., I3D, TSM, SlowFast, and TPN, with the HMDB-51 or MiT-v2 dataset as the unknown.
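Some background on the evidential formulation, following EDL (Sensoy et al., NeurIPS 2018):

- Non-negative class evidence e parameterizes a Dirichlet distribution with α = e + 1.
- The expected class probabilities are α / S, where S = Σ α.
- The predictive uncertainty is K / S for K classes.

The minimal sketch below assumes an exponential evidence function with clamped logits, mirroring the exp_evidence(x) + 1 pattern in the DebiasHead code quoted further down. It also assumes that edl_loss(torch.log, alpha, y) computes Σ_j y_j (log S − log α_j). All function names here are hypothetical.

```python
import math


def exp_evidence(logits, clamp=10.0):
    # Hypothetical evidence function: exp of clamped logits keeps evidence positive and finite
    return [math.exp(max(-clamp, min(clamp, z))) for z in logits]


def edl_outputs(logits):
    """Return (expected class probabilities, uncertainty) for one sample."""
    alpha = [e + 1.0 for e in exp_evidence(logits)]  # Dirichlet parameters
    s = sum(alpha)                                    # Dirichlet strength S
    probs = [a / s for a in alpha]
    uncertainty = len(alpha) / s                      # u = K / S, in (0, 1]
    return probs, uncertainty


def edl_log_loss(logits, target):
    """EDL log loss sum_j y_j (log S - log alpha_j) for a one-hot target."""
    alpha = [e + 1.0 for e in exp_evidence(logits)]
    s = sum(alpha)
    return math.log(s) - math.log(alpha[target])  # only the target term survives


confident = [8.0, -2.0, -2.0]  # strong evidence for class 0
vague = [0.0, 0.0, 0.0]        # equal, weak evidence for all classes
print(edl_outputs(confident)[1], edl_outputs(vague)[1])
```

Low total evidence yields high uncertainty. This is the signal used later to reject unknown actions.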
The following figures show the inference results of the SlowFast + DEAR model trained on the UCF-101 dataset.
[Figure: inference examples on UCF-101 (Known) and HMDB-51 (Unknown) videos]
This repo is developed from the MMAction2 codebase. Since MMAction2 is updated at a fast pace, most of the requirements and installation steps are similar to those of MMAction2 v0.9.0.
Here we only list the requirements and dependencies we used. You are welcome to try the latest versions of the listed software with the latest MMAction2 codebase.
- Linux: Ubuntu 18.04 LTS
- GPU: GeForce RTX 3090, A100-SXM4
- CUDA: 11.0
- GCC: 7.5
- Python: 3.7.9
- Anaconda: 4.9.2
- PyTorch: 1.7.1+cu110
- TorchVision: 0.8.2+cu110
- OpenCV: 4.4.0
- MMCV: 1.2.1
- MMAction2: 0.9.0
The following steps are modified from the MMAction2 (v0.9.0) installation document. If you encounter problems, please refer to the official document for more details, or raise an issue in this repo.
a. Create a conda virtual environment of this repo, and activate it:
```shell
conda create -n mmaction python=3.7 -y
conda activate mmaction
```
b. Install PyTorch and TorchVision following the official instructions, e.g.,
```shell
conda install pytorch=1.7.1 cudatoolkit=11.0 torchvision=0.8.2 -c pytorch
```
c. Install mmcv. We recommend installing the pre-built mmcv-full as below:
```shell
pip install mmcv-full==1.2.1 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.1/index.html
```
Important: If you have already installed mmcv and are trying to install mmcv-full, you have to uninstall mmcv first by running pip uninstall mmcv. Otherwise, you will get a ModuleNotFoundError.
d. Clone the source code of this repo:
```shell
git clone https://github.com/Cogito2012/DEAR.git mmaction2
cd mmaction2
```
e. Install build requirements and then install DEAR.
```shell
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"
```
If no error appears in your installation steps, then you are all set!
This repo uses standard video action datasets: UCF-101 for closed-set training, and the HMDB-51 and MiT-v2 test sets as two different unknowns. Please refer to the default MMAction2 dataset setup steps to set up these three datasets correctly.
Note: You can skip Step 3 (Extract RGB and Flow) in the referenced setup steps, since none of the code for our paper relies on extracted frames or optical flow. This will save you a large amount of disk space!
To test our pre-trained models (see the Model Zoo), you need to download a model file and unzip it under work_dirs. Let's take the I3D-based DEAR model as an example. First, download the pre-trained I3D-based models. The full DEAR model is saved in the folder finetune_ucf101_i3d_edlnokl_avuc_debias. Use the following directory tree as a reference for placing the downloaded files.
```shell
work_dirs
├── i3d
│   ├── finetune_ucf101_i3d_bnn
│   │   └── latest.pth
│   ├── finetune_ucf101_i3d_dnn
│   │   └── latest.pth
│   ├── finetune_ucf101_i3d_edlnokl
│   │   └── latest.pth
│   ├── finetune_ucf101_i3d_edlnokl_avuc_ced
│   │   └── latest.pth
│   ├── finetune_ucf101_i3d_edlnokl_avuc_debias
│   │   └── latest.pth
│   └── finetune_ucf101_i3d_rpl
│       └── latest.pth
├── slowfast
├── tpn_slowonly
└── tsm
```
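Checking that the checkpoints sit where the directory tree above expects them can save a failed evaluation run. The sketch below is a hypothetical helper, not part of the repo. It lists the missing latest.pth files for the I3D folders shown above, and it assumes it is run from the repository root where work_dirs lives.

```python
from pathlib import Path

# Folder names copied from the I3D part of the directory tree above
I3D_RUNS = [
    "finetune_ucf101_i3d_bnn",
    "finetune_ucf101_i3d_dnn",
    "finetune_ucf101_i3d_edlnokl",
    "finetune_ucf101_i3d_edlnokl_avuc_ced",
    "finetune_ucf101_i3d_edlnokl_avuc_debias",
    "finetune_ucf101_i3d_rpl",
]


def missing_checkpoints(root="work_dirs", backbone="i3d", runs=I3D_RUNS):
    """Return the expected checkpoint paths that do not exist under root."""
    base = Path(root) / backbone
    return [str(base / run / "latest.pth") for run in runs
            if not (base / run / "latest.pth").is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints found" if not missing else "\n".join(missing))
```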
a. Closed Set Evaluation.
Top-K accuracy and mean class accuracy will be reported.
```shell
cd experiments/i3d
bash evaluate_i3d_edlnokl_avuc_debias_ucf101.sh 0
```
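The two closed-set metrics work as follows:

- Top-k accuracy counts a sample as correct if its label is among the k highest scores.
- Mean class accuracy averages the per-class top-1 accuracy, so frequent classes do not dominate.

The pure-Python sketch below is illustrative and assumes the scores for each sample are given as a list. It is not MMAction2's implementation.

```python
from collections import defaultdict


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Average over classes of the top-1 accuracy within each class."""
    correct, total = defaultdict(int), defaultdict(int)
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        total[label] += 1
        correct[label] += pred == label
    return sum(correct[c] / total[c] for c in total) / len(total)


scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]]
labels = [0, 2, 1, 1]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2),
      mean_class_accuracy(scores, labels))
```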
b. Get Uncertainty Threshold.
The threshold value of the specified model will be reported.
```shell
cd experiments/i3d
bash run_get_threshold.sh 0 edlnokl_avuc_debias 2
```
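One common way to obtain such a threshold is sketched below. It rests on our own assumptions and may differ from what run_get_threshold.sh does. The threshold is the uncertainty value at or below which a target fraction of known-class validation samples fall. At test time, a sample whose uncertainty exceeds the threshold is rejected as unknown; otherwise it keeps its closed-set prediction. The keep ratio, the function names, and the UNKNOWN label are hypothetical.

```python
import math

UNKNOWN = -1  # hypothetical label for rejected (open-set) samples


def threshold_from_known(known_unc, keep_ratio=0.95):
    """Smallest observed uncertainty u with at least keep_ratio of known samples <= u."""
    ordered = sorted(known_unc)
    idx = max(0, math.ceil(keep_ratio * len(ordered)) - 1)
    return ordered[idx]


def open_set_predict(closed_pred, uncertainty, threshold):
    """Keep the closed-set class if uncertainty <= threshold, else reject as unknown."""
    return [p if u <= threshold else UNKNOWN
            for p, u in zip(closed_pred, uncertainty)]


known_val_unc = [0.02, 0.05, 0.07, 0.10, 0.12, 0.15, 0.30, 0.40]
thr = threshold_from_known(known_val_unc, keep_ratio=0.75)
print(thr, open_set_predict([3, 7, 1], [0.05, 0.55, 0.14], thr))
```

A higher keep_ratio accepts more known videos but also lets more unknown videos through. This trade-off is why threshold-free metrics such as AUC are useful for comparing models.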
c. Out-of-Distribution Detection.
The uncertainty distribution figure of a specified model will be reported.
```shell
cd experiments/i3d
bash run_ood_detection.sh 0 HMDB edlnokl_avuc_debias
```
d. Open Set Evaluation and Comparison.
The open set evaluation metrics and openness curves will be reported.
Note: Make sure the threshold values of different models are from the reported results in step b.
```shell
cd experiments/i3d
bash run_openness.sh HMDB  # use HMDB-51 test set as the Unknown
bash run_openness.sh MiT  # use MiT-v2 test set as the Unknown
```
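Two quantities behind these outputs can be sketched briefly.

- Openness, as defined by Scheirer et al. (TPAMI 2013), is 1 − sqrt(2·N_train / (N_test + N_target)). With the 101 UCF-101 classes as both training and target classes, adding more unknown classes to the test set increases openness.
- Open-set macro-F1 can be computed by treating "unknown" as one extra class and averaging the per-class F1.

This is a hedged illustration. Whether the repo averages over exactly these classes is an assumption, and the function names are hypothetical.

```python
import math


def openness(n_train, n_test, n_target):
    """Openness of Scheirer et al. (2013): 0 for a closed set, larger is more open."""
    return 1.0 - math.sqrt(2.0 * n_train / (n_test + n_target))


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every label that occurs in y_true or y_pred."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Closed set (UCF-101 only), then with all 51 HMDB-51 classes added as unknowns
print(openness(101, 101, 101), openness(101, 101 + 51, 101))

UNKNOWN = -1  # hypothetical label for the merged unknown class
print(macro_f1([0, 0, 1, UNKNOWN, UNKNOWN], [0, 1, 1, UNKNOWN, 0]))
```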
e. Draw Open Set Confusion Matrix
The confusion matrix, including the unknown dataset, will be reported.
```shell
cd experiments/i3d
bash run_draw_confmat.sh HMDB  # or MiT
```
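In an open-set confusion matrix with K known classes, unknown test samples and rejected predictions are mapped to an extra index K. This gives a (K+1)×(K+1) matrix whose last row shows how unknown videos are classified. The sketch below illustrates that mapping. It is not the code behind run_draw_confmat.sh, and the UNKNOWN = -1 convention is hypothetical.

```python
UNKNOWN = -1  # hypothetical label for unknown ground truth and rejected predictions


def open_set_confusion(y_true, y_pred, num_known):
    """(K+1) x (K+1) counts; rows are true classes, columns are predictions, index K = unknown."""
    def idx(label):
        return num_known if label == UNKNOWN else label

    cm = [[0] * (num_known + 1) for _ in range(num_known + 1)]
    for t, p in zip(y_true, y_pred):
        cm[idx(t)][idx(p)] += 1
    return cm


cm = open_set_confusion([0, 1, 1, UNKNOWN, UNKNOWN], [0, 1, UNKNOWN, UNKNOWN, 0], num_known=2)
for row in cm:
    print(row)
```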
Again, let's take the I3D-based DEAR model as an example.
```shell
cd experiments/i3d
bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0
```
Since model training is time-consuming, we strongly recommend running the above training script in the background if you are using a remote SSH connection:
```shell
nohup bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0 >train.log 2>&1 &
tail -f train.log
```
Visualizing the training curves (losses, accuracies, etc.) on TensorBoard:
```shell
cd work_dirs/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias/tf_logs
tensorboard --logdir=./ --port 6008
```
Then, open the generated URL http://localhost:6008 in your web browser (such as Chrome) to monitor the training status.
If you are connected via SSH to a remote server without a display, you can view TensorBoard on your local machine by forwarding the SSH port:
```shell
ssh -L 16008:localhost:6008 {your_remote_name}@{your_remote_ip}
```
Then, you can monitor TensorBoard through port 16008 by opening http://localhost:16008 in your browser.
The pre-trained weights (checkpoints) are available below.

| Model | Checkpoint | Train Config | Test Config | Open maF1 (%) | Open Set AUC (%) | Closed Set ACC (%) |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|
| I3D + DEAR | ckpt | train | test | 77.24 / 69.98 | 77.08 / 81.54 | 93.89 |
| TSM + DEAR | ckpt | train | test | 84.69 / 70.15 | 78.65 / 83.92 | 94.48 |
| TPN + DEAR | ckpt | train | test | 81.79 / 71.18 | 79.23 / 81.80 | 96.30 |
| SlowFast + DEAR | ckpt | train | test | 85.48 / 77.28 | 82.94 / 86.99 | 96.48 |
Checkpoints of the compared baseline models can be downloaded from Google Drive.
If you find the code useful in your research, please cite:
@inproceedings{BaoICCV2021DEAR,
author = "Bao, Wentao and Yu, Qi and Kong, Yu",
title = "Evidential Deep Learning for Open Set Action Recognition",
booktitle = "International Conference on Computer Vision (ICCV)",
year = "2021"
}
In addition to the MMAction2 codebase, this repo contains modified code from:
- pytorch-classification-uncertainty: for the implementation of EDL (NeurIPS-2018).
- ARPL: for the implementation of the baseline method RPL (ECCV-2020).
- OSDN: for the implementation of the baseline method OpenMax (CVPR-2016).
- bayes-by-backprop: for the implementation of the baseline method Bayesian Neural Networks (BNNs).
- rebias: for the implementation of the HSIC regularizer used in ReBias (ICML-2020).
We sincerely thank the owners of all these great repos!
It seems that DebiasHead is used to implement CED. However, it does not seem to perform any 3D (temporal) operation, even though some modules (self.f1_conv3d, self.f2_conv3d) are named with "3d", because the temporal size of the convolution kernels is 1.
In this case, shuffling the feature along the temporal axis makes no difference. In fact, there seems to be no difference between these three branches: 1. (f1_conv3d --> avg_pool --> fc1), 2. (temporal shuffling --> f2_conv3d --> avg_pool --> fc2), 3. (reshape --> f3_conv2d --> avg_pool --> fc3).
Here is the code in https://github.com/Cogito2012/DEAR/tree/master/mmaction/models/heads/debias_head.py:
```python
@HEADS.register_module()
class DebiasHead(BaseHead):
    """Debias head.
    Args:
        num_classes (int): Number of classes to be classified.
        in_channels (int): Number of channels in input feature.
        loss_cls (dict): Config for building loss.
            Default: dict(type='EvidenceLoss')
        spatial_type (str): Pooling type in spatial dimension. Default: 'avg'.
        dropout_ratio (float): Probability of dropout layer. Default: 0.5.
        init_std (float): Std value for Initiation. Default: 0.01.
        kwargs (dict, optional): Any keyword argument to be used to initialize
            the head.
    """

    def __init__(self,
                 num_classes,
                 in_channels,
                 loss_cls=dict(type='EvidenceLoss'),
                 loss_factor=0.1,
                 hsic_factor=0.5,  # useful when alternative=True
                 alternative=False,
                 bias_input=True,
                 bias_network=True,
                 dropout_ratio=0.5,
                 init_std=0.01,
                 **kwargs):
        super().__init__(num_classes, in_channels, loss_cls, **kwargs)
        self.bias_input = bias_input
        self.bias_network = bias_network
        assert bias_input or bias_network, "At least one of the choices (bias_input, bias_network) should be True!"
        self.loss_factor = loss_factor
        self.hsic_factor = hsic_factor
        self.alternative = alternative
        self.f1_conv3d = ConvModule(
            in_channels,
            in_channels * 2, (1, 3, 3),
            stride=(1, 2, 2),
            padding=(0, 1, 1),
            bias=False,
            conv_cfg=dict(type='Conv3d'),
            norm_cfg=dict(type='BN3d', requires_grad=True))
        if bias_input:
            self.f2_conv3d = ConvModule(
                in_channels,
                in_channels * 2, (1, 3, 3),
                stride=(1, 2, 2),
                padding=(0, 1, 1),
                bias=False,
                conv_cfg=dict(type='Conv3d'),
                norm_cfg=dict(type='BN3d', requires_grad=True))
        if bias_network:
            self.f3_conv2d = ConvModule(
                in_channels,
                in_channels * 2, (3, 3),
                stride=(2, 2),
                padding=(1, 1),
                bias=False,
                conv_cfg=dict(type='Conv2d'),
                norm_cfg=dict(type='BN', requires_grad=True))
        self.dropout_ratio = dropout_ratio
        self.init_std = init_std
        if self.dropout_ratio != 0:
            self.dropout = nn.Dropout(p=self.dropout_ratio)
        else:
            self.dropout = None
        self.f1_fc = nn.Linear(self.in_channels * 2, self.num_classes)
        self.f2_fc = nn.Linear(self.in_channels * 2, self.num_classes)
        self.f3_fc = nn.Linear(self.in_channels * 2, self.num_classes)
        self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1))

    # .............

    def forward(self, x, num_segs=None, target=None, **kwargs):
        """Defines the computation performed at every call.
        Args:
            x (torch.Tensor): The input data. (B, 1024, 8, 14, 14)
        Returns:
            torch.Tensor: The classification scores for input samples.
        """
        feat = x.clone() if isinstance(x, torch.Tensor) else x[-2].clone()
        if len(feat.size()) == 4:  # for 2D recognizer
            assert num_segs is not None
            feat = feat.view((-1, num_segs) + feat.size()[1:]).transpose(1, 2).contiguous()
        # one-hot embedding for the target
        y = torch.eye(self.num_classes).to(feat.device)
        y = y[target]
        losses = dict()

        # f1_Conv3D(x)
        x = self.f1_conv3d(feat)  # (B, 2048, 8, 7, 7)
        feat_unbias = self.avg_pool(x).squeeze(-1).squeeze(-1).squeeze(-1)
        x = self.dropout(feat_unbias)
        x = self.f1_fc(x)
        alpha_unbias = self.exp_evidence(x) + 1
        # minimize the edl losses
        loss_cls1 = self.edl_loss(torch.log, alpha_unbias, y)
        losses.update({'loss_unbias_cls': loss_cls1})

        loss_hsic_f, loss_hsic_g = torch.zeros_like(loss_cls1), torch.zeros_like(loss_cls1)
        if self.bias_input:
            # f2_Conv3D(x)
            feat_shuffle = feat[:, :, torch.randperm(feat.size()[2])]
            x = self.f2_conv3d(feat_shuffle)  # (B, 2048, 8, 7, 7)
            feat_bias1 = self.avg_pool(x).squeeze(-1).squeeze(-1).squeeze(-1)
            x = self.dropout(feat_bias1)
            x = self.f2_fc(x)
            alpha_bias1 = self.exp_evidence(x) + 1
            # minimize the edl losses
            loss_cls2 = self.edl_loss(torch.log, alpha_bias1, y)
            losses.update({'loss_bias1_cls': loss_cls2})
            if self.alternative:
                # minimize HSIC w.r.t. feat_unbias, and maximize HSIC w.r.t. feat_bias1
                loss_hsic_f += self.hsic_factor * self.hsic_loss(feat_unbias, feat_bias1.detach(), unbiased=True)
                loss_hsic_g += - self.hsic_factor * self.hsic_loss(feat_unbias.detach(), feat_bias1, unbiased=True)
            else:
                # maximize HSIC
                loss_hsic1 = -1.0 * self.hsic_loss(alpha_unbias, alpha_bias1)
                losses.update({"loss_bias1_hsic": loss_hsic1})

        if self.bias_network:
            # f3_Conv2D(x)
            B, C, T, H, W = feat.size()
            feat_reshape = feat.permute(0, 2, 1, 3, 4).contiguous().view(-1, C, H, W)  # (B*T, C, H, W)
            x = self.f3_conv2d(feat_reshape)  # (64, 2048, 7, 7)
            x = x.view(B, T, x.size(-3), x.size(-2), x.size(-1)).permute(0, 2, 1, 3, 4)  # (B, 2048, 8, 7, 7)
            feat_bias2 = self.avg_pool(x).squeeze(-1).squeeze(-1).squeeze(-1)
            x = self.dropout(feat_bias2)
            x = self.f3_fc(x)
            alpha_bias2 = self.exp_evidence(x) + 1
            # minimize the edl losses
            loss_cls3 = self.edl_loss(torch.log, alpha_bias2, y)
            losses.update({'loss_bias2_cls': loss_cls3})
            if self.alternative:
                # minimize HSIC w.r.t. feat_unbias, and maximize HSIC w.r.t. feat_bias2
                loss_hsic_f += self.hsic_factor * self.hsic_loss(feat_unbias, feat_bias2.detach(), unbiased=True)
                loss_hsic_g += - self.hsic_factor * self.hsic_loss(feat_unbias.detach(), feat_bias2, unbiased=True)
            else:
                # maximize HSIC
                loss_hsic2 = -1.0 * self.hsic_loss(alpha_unbias, alpha_bias2)
                losses.update({"loss_bias2_hsic": loss_hsic2})

        if self.alternative:
            # Here, we use odd iterations for minimizing hsic_f, and use even iterations for maximizing hsic_g
            assert 'iter' in kwargs, "iter number is missing!"
            loss_mask = kwargs['iter'] % 2
            loss_hsic = loss_mask * loss_hsic_f + (1 - loss_mask) * loss_hsic_g
            losses.update({'loss_hsic': loss_hsic})

        for k, v in losses.items():
            losses.update({k: v * self.loss_factor})
        return losses
```
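The observation in this issue can be checked numerically. The NumPy sketch below is an illustration, not the repo's code. It applies the same 3×3 spatial convolution to every frame, which is what a Conv3d with kernel (1, 3, 3) does (stride 1 is used for brevity), and then averages over (T, H, W).

- Shuffling the frames only permutes the per-frame outputs, so the pooled feature is unchanged.
- The per-channel BatchNorm and elementwise ReLU inside ConvModule treat all frames alike, so they do not change this (the sketch omits them).
- The same per-frame computation is what f3_conv2d performs after reshaping to (B*T, C, H, W).
- With a temporal kernel size larger than 1, the pooled result would in general depend on frame order.

```python
import numpy as np


def conv2d_3x3(frame, weight):
    """Zero-padded, stride-1 3x3 convolution of one frame (C, H, W) with weight (C_out, C, 3, 3)."""
    _, h, w = frame.shape
    padded = np.pad(frame, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for di in range(3):
        for dj in range(3):
            # accumulate the contribution of kernel tap (di, dj) for all output channels
            out += np.einsum("oc,chw->ohw", weight[:, :, di, dj], padded[:, di:di + h, dj:dj + w])
    return out


def branch(feat, weight):
    """Per-frame conv (equivalent to a (1, 3, 3) Conv3d) followed by global average pooling."""
    per_frame = np.stack([conv2d_3x3(feat[:, t], weight) for t in range(feat.shape[1])], axis=1)
    return per_frame.mean(axis=(1, 2, 3))  # pool over T, H, W -> (C_out,)


rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 7, 7))    # (C, T, H, W) for one clip
weight = rng.normal(size=(6, 4, 3, 3))  # shared spatial kernel
perm = rng.permutation(feat.shape[1])   # temporal shuffle, as applied before f2_conv3d
print(np.allclose(branch(feat, weight), branch(feat[:, perm], weight)))  # True
```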