DL: RefineNet. A detailed guide to the RefineNet and Light-Weight RefineNet algorithms: introduction (paper overview), architecture details, example applications, and figures

便当超级 2022-09-19 14:26:44  66670


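Introduction to the RefineNet and Light-Weight RefineNet algorithms (paper overview)

RefineNet

Light-Weight RefineNet

0.1、Experimental results

0.2、Experimental performance of Light-Weight RefineNet

1、Drawbacks of previous networks

Detailed architecture of the RefineNet algorithm

Example applications of the RefineNet algorithm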

Introduction to the RefineNet and Light-Weight RefineNet algorithms (paper overview)

To be updated……

RefineNet

Abstract
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular, we achieve an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset, which is the best reported result to date.
Conclusion
We have presented RefineNet, a novel multi-path refinement network for semantic segmentation and object parsing. The cascaded architecture is able to effectively combine high-level semantics and low-level features to produce high-resolution segmentation maps. Our design choices are inspired by the idea of identity mapping which facilitates gradient propagation across long-range connections and thus enables effective end-to-end learning. We outperform all previous works on seven public benchmarks, setting a new mark for the state of the art in semantic labeling.

Paper
Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
https://arxiv.org/abs/1611.06612

Light-Weight RefineNet

Light-Weight RefineNet is a lightweight modification of the RefineNet model.

Conclusions
In this work, we tackled the problem of rethinking an existing semantic segmentation architecture into the one suitable for real-time performance, while keeping the performance levels mostly intact. We achieved that by proposing simple modifications to the existing network and highlighting which building blocks were redundant for the final result. Our method can be applied along with any classification network for any dataset and can further benefit from using light-weight backbone networks, and other compression approaches. Quantitatively, we were able to closely match the performance of the original network while significantly surpassing its runtime and even acquiring 55 FPS on 512×512 inputs (from initial 20 FPS). Besides that, we demonstrate that having convolutions with large kernel sizes can be unnecessary in the decoder part of segmentation networks, and we will devote future work to further cover this topic.
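The remark about large kernels in the decoder can be made concrete with a parameter count. The sketch below is only an illustration, not the authors' code: the helper `conv_params` and the layer width of 256 channels are assumptions chosen for the example. It relies solely on the standard formula for a 2D convolution, k × k × C_in × C_out weights plus C_out optional biases.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int,
                bias: bool = True) -> int:
    """Number of learnable parameters in a standard 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Hypothetical decoder layer width; not taken from the paper.
channels = 256

params_3x3 = conv_params(3, channels, channels, bias=False)
params_1x1 = conv_params(1, channels, channels, bias=False)

print(params_3x3)                 # 589824
print(params_1x1)                 # 65536
print(params_3x3 // params_1x1)   # 9
```

At equal channel width, a 1×1 convolution carries one ninth of the weights of a 3×3 convolution, which is why shrinking decoder kernels is an effective way to cut parameters and floating point operations.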
Abstract
We consider an important task of effective and efficient semantic image segmentation. In particular, we adapt a powerful semantic segmentation architecture, called RefineNet [46], into the more compact one, suitable even for tasks requiring real-time performance on high-resolution inputs. To this end, we identify computationally expensive blocks in the original setup, and propose two modifications aimed to decrease the number of parameters and floating point operations. By doing that, we achieve more than twofold model reduction, while keeping the performance levels almost intact. Our fastest model undergoes a significant speed-up boost from 20 FPS to 55 FPS on a generic GPU card on 512×512 inputs with solid 81.1% mean iou performance on the test set of PASCAL VOC [18], while our slowest model with 32 FPS (from original 17 FPS) shows 82.7% mean iou on the same dataset. Alternatively, we showcase that our approach is easily mixable with light-weight classification networks: we attain 79.2% mean iou on PASCAL VOC using a model that contains only 3.3M parameters and performs only 9.3B floating point operations.

Paper
Vladimir Nekrasov, Chunhua Shen, Ian Reid
Light-Weight RefineNet for Real-Time Semantic Segmentation. BMVC 2018
http://bmvc2018.org/contents/papers/0494.pdf

0.1、Experimental results

1、Experiments

Six popular datasets for semantic segmentation of indoor and outdoor scenes (NYUDv2, PASCAL VOC 2012, SUN-RGBD, PASCAL-Context, Cityscapes, ADE20K MIT)
One dataset for object parsing, called Person-Part

2、Object parsing results on the Person-Part dataset

3、Prediction examples on the Person-Part dataset

4、Results on the PASCAL VOC 2012 test set (IoU scores): RefineNet achieves the best performance (IoU 83.4)

Table 5. Results on the PASCAL VOC 2012 test set (IoU scores). Our RefineNet achieves the best performance (IoU 83.4).
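For reference, the IoU score in this table is computed per class as the number of pixels in the intersection of prediction and ground truth divided by the number of pixels in their union, then averaged over classes. A minimal pure-Python sketch follows. The function name `mean_iou` and the toy label maps are illustrative assumptions. The official PASCAL VOC evaluation also accumulates counts over the whole test set and ignores pixels labeled as void, both of which are omitted here.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target.

    pred and target are equally sized 2D lists of integer class labels.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel counts toward the union of both classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy 2x3 label maps with two classes (0 = background, 1 = object).
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]

# Class 0: IoU = 2/3, class 1: IoU = 3/4, mean = 17/24.
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.7083
```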

5、Our prediction examples on VOC 2012 dataset

6、Prediction examples on the Cityscapes dataset

0.2、Experimental performance of Light-Weight RefineNet

1、Quantitative results on PASCAL VOC

The fastest model reaches 55 FPS on a generic GPU card on 512×512 inputs with a solid 81.1% mIoU, while the slowest model, at 32 FPS, reaches 82.7% mIoU.
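Frame rates are easier to compare as per-frame latency. The short sketch below converts the before/after figures quoted in the Light-Weight RefineNet abstract (20 to 55 FPS and 17 to 32 FPS) into milliseconds per frame and a speed-up factor. The helper `latency_ms` and the dictionary layout are illustrative assumptions. Only the FPS values come from the paper.

```python
def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# (original FPS, light-weight FPS) as quoted in the paper's abstract.
models = {"fastest": (20, 55), "slowest": (17, 32)}

for name, (before, after) in models.items():
    speedup = after / before
    print(f"{name}: {latency_ms(before):.1f} ms -> {latency_ms(after):.1f} ms "
          f"({speedup:.2f}x)")
```

For the fastest model this is a drop from 50 ms to roughly 18 ms per frame, a 2.75× speed-up.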

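1、Drawbacks of previous networks

ResNet: the feature-map resolution decreases progressively.
Dilated convolutions: computationally expensive, high memory footprint (intermediate feature maps must be stored), and coarse subsampling.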
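Both drawbacks can be illustrated with simple arithmetic on feature-map shapes. The sketch below assumes a 512×512 input, the usual ResNet-50/101 stage widths (256, 512, 1024, 2048 channels) with output strides 4, 8, 16 and 32, and a dilated variant that keeps the last two stages at stride 8. The helper names and the choice of output stride 8 are assumptions made for illustration. The numbers cover only one float32 output map per stage for a single image, not the full training memory.

```python
def feature_map_shapes(input_size, strides, channels):
    """Return (channels, height, width) per stage for a square input."""
    return [(c, input_size // s, input_size // s)
            for s, c in zip(strides, channels)]


def activation_mib(shape, bytes_per_value=4):
    """Memory of one float32 feature map in MiB."""
    c, h, w = shape
    return c * h * w * bytes_per_value / 2**20


input_size = 512
channels = [256, 512, 1024, 2048]  # ResNet-50/101 stage widths

# Plain ResNet: resolution halves at every stage (output stride 4 to 32).
plain = feature_map_shapes(input_size, [4, 8, 16, 32], channels)
# Dilated variant (assumed output stride 8): last two stages stay at 1/8.
dilated = feature_map_shapes(input_size, [4, 8, 8, 8], channels)

for name, shapes in (("plain", plain), ("dilated", dilated)):
    for shape in shapes:
        print(name, shape, f"{activation_mib(shape):.1f} MiB")
```

Under these assumptions the last plain ResNet stage is only 16×16 (2 MiB), whereas the dilated variant keeps a 2048-channel map at 64×64 (32 MiB), sixteen times larger, and the prediction is still at 1/8 of the input resolution.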