Study notes: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

程序员文章站 2022-07-14 15:19:23


https://zhuanlan.zhihu.com/p/35405071
Paper: MobileNet v1
Howard, Andrew G., et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).

I. Introduction

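MobileNet v1 is a network architecture released by Google in 2017. It aims to make the most of the limited resources of mobile and embedded devices, maximizing model accuracy under those constraints so that it can serve a range of resource-limited use cases. Like other popular models (such as VGG and ResNet), MobileNet v1 can be used to extract convolutional image features for tasks such as classification, detection, embedding, and segmentation.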

II. Core idea: depthwise separable convolution (depthwise conv + pointwise conv)

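[Figure: standard convolution]
The figure above shows a standard convolution.
Each kernel has as many channels as the input (kernel channels = input channels, Ci). It slides over the H x W input image, multiplies corresponding points, and sums the products.
The number of output channels equals the number of kernels (output channels = kernel count, Co).
Multiplication count, assuming a 3 x 3 kernel and an output of the same H x W size: F = [Ci x 3 x 3] x (H x W) x Co  # Co is equivalent to the number of kernels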
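[Figure: depthwise convolution (left) followed by a 1x1 pointwise convolution (right)]
The figure above shows a depthwise convolution: each channel is convolved separately, which still yields an H x W x Ci feature map, and a 1x1 convolution then fuses information across channels.
Multiplication count:

1 Depthwise convolution (left): F1 = [Ci x 3 x 3] x [H x W]
2 1x1 pointwise convolution (right): F2 = [Ci x 1 x 1] x [H x W] x Co

[depthwise + pointwise] / standard convolution = (F1 + F2) / F = 1/Co + 1/9
(Here Co is the number of output channels.) When a depthwise separable convolution replaces a standard convolution, it has one extra BN layer, because the depthwise and the pointwise convolution are each followed by their own BN and ReLU.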
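The cost comparison above can be checked numerically. The following is a minimal sketch in plain Python; the function names and the example sizes (512 input and output channels on a 14 x 14 map) are assumptions chosen for illustration, not values from the original post.

```python
def standard_conv_mults(c_in, c_out, h, w, k=3):
    """Multiplications for a standard k x k convolution with an H x W output."""
    return c_in * k * k * h * w * c_out


def separable_conv_mults(c_in, c_out, h, w, k=3):
    """Multiplications for a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = c_in * k * k * h * w          # F1: one k x k filter per input channel
    pointwise = c_in * 1 * 1 * h * w * c_out  # F2: 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes (assumed for this example only)
c_in, c_out, h, w = 512, 512, 14, 14
ratio = separable_conv_mults(c_in, c_out, h, w) / standard_conv_mults(c_in, c_out, h, w)
expected = 1 / c_out + 1 / 9  # closed form from the text
print(f"measured ratio = {ratio:.4f}, 1/Co + 1/9 = {expected:.4f}")
```

For 3 x 3 kernels the ratio approaches 1/9 as Co grows, so the separable form needs roughly 8 to 9 times fewer multiplications.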

III. Network architecture

Grouping the layers as follows shows that the network is a plain stack of blocks, similar to VGG.

vgg16

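Two hyperparameters are introduced to control model size and computation:

Width multiplier: controls the number of channels, written α. When α < 1 the model becomes thinner, and computation drops to roughly α² of the original.
Resolution multiplier: controls the feature map size, written ρ. Applying it to the feature maps reduces computation (by roughly ρ²).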

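A hyperparameter α ∈ (0, 1] is added to control the number of feature map channels: the smaller α is, the smaller the model. It scales the number of input and output channels of every layer, reducing the number of feature maps and making the network thinner.
Formula from the original paper, in the notation used above:
[αCi x 3 x 3] x [H x W] + [αCi x 1 x 1] x [H x W] x αCo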
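To make the effect of the width multiplier concrete, here is a small sketch in plain Python. The helper names and the rounding rule (round down, keep at least one channel) are assumptions for illustration; the original post does not specify how fractional channel counts are handled.

```python
def scale_channels(channels, alpha):
    """Apply a width multiplier to each layer width (assumed: round down, minimum 1)."""
    return [max(1, int(c * alpha)) for c in channels]


def separable_mults(c_in, c_out, h, w, k=3):
    """Depthwise k x k plus 1x1 pointwise multiplication count."""
    return c_in * k * k * h * w + c_in * c_out * h * w


base = [32, 64, 128, 256, 512, 1024]  # layer widths that appear in MobileNet v1
for alpha in (1.0, 0.75, 0.5, 0.25):
    print(alpha, scale_channels(base, alpha))

# One 512 -> 512 block on a 14 x 14 map, at alpha = 1.0 and alpha = 0.5
full = separable_mults(512, 512, 14, 14)
half = separable_mults(256, 256, 14, 14)
# A little above 0.5 ** 2 = 0.25: the depthwise term scales with alpha, not alpha squared
print(half / full)
```

The pointwise term dominates the cost, which is why the overall reduction stays close to α².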

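Of course, reducing the network's computation comes at a price. Figure 11 shows the performance of MobileNet v1 on ImageNet for different values of α. Even with α = 0.5, MobileNet v1 still reaches 63.7% accuracy on ImageNet.
A hyperparameter ρ is added to control the resolution of the input image: the smaller ρ is, the smaller the input image.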
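The combined effect of both multipliers on a single block can be sketched as follows. This is an illustrative calculation only: `block_mults` is a hypothetical helper, the 512-channel 14 x 14 block is an assumed example, and the input sizes 224, 192, 160, and 128 are the resolutions evaluated in the MobileNet paper.

```python
def block_mults(c_in, c_out, feat, alpha=1.0, rho=1.0, k=3):
    """Multiplications of one depthwise separable block with both multipliers applied."""
    m, n = alpha * c_in, alpha * c_out  # thinned input and output channels
    f = rho * feat                      # shrunken feature map side length
    return k * k * m * f * f + m * n * f * f


base = block_mults(512, 512, 14)
for input_size in (224, 192, 160, 128):
    rho = input_size / 224  # resolution multiplier relative to the 224 x 224 baseline
    rel = block_mults(512, 512, 14, rho=rho) / base
    print(f"input {input_size}: rho = {rho:.3f}, relative cost = {rel:.3f}")
```

Because every term contains the feature map area, the cost scales exactly with ρ², independently of α.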

Results

1 Classification

With computation and parameter count reduced many times over, accuracy is comparable to GoogLeNet and VGG.
Input image size also affects accuracy.

2 Object detection: mAP is noticeably lower than with large networks, but computation does drop substantially.

3 Face classification

Knowledge distillation: FaceNet teaches MobileNet to recognize faces.
The FaceNet model is a state of the art face recognition model [25]. It builds face embeddings based on the triplet loss. To build a mobile FaceNet model we use distillation to train by minimizing the squared differences of the output of FaceNet and MobileNet on the training data. Results for very small MobileNet models can be found in table 14.
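The quoted objective, minimizing the squared differences between the FaceNet and MobileNet outputs, can be sketched in plain Python. This is only an illustration: the function name, the toy embeddings, and the choice to average over the batch are assumptions, since the quote does not specify the reduction.

```python
def squared_difference_loss(teacher_out, student_out):
    """Summed squared difference per sample, averaged over the batch (assumed reduction)."""
    if len(teacher_out) != len(student_out):
        raise ValueError("teacher and student batches must have the same size")
    total = 0.0
    for t_vec, s_vec in zip(teacher_out, student_out):
        total += sum((t - s) ** 2 for t, s in zip(t_vec, s_vec))
    return total / len(teacher_out)


# Toy 2-dimensional embeddings for a batch of two images (made-up numbers)
teacher = [[0.6, 0.8], [1.0, 0.0]]  # stands in for the FaceNet outputs
student = [[0.5, 0.8], [0.8, 0.2]]  # stands in for the MobileNet outputs
loss = squared_difference_loss(teacher, student)
print(loss)
```

In training, the teacher outputs are treated as fixed targets and only the student's weights are updated to reduce this loss.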

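Aside: what is knowledge distillation? (Proposed by Hinton et al. in 2015)

Paper
What knowledge distillation is: originally we train the new model so that its softmax distribution matches the ground-truth labels. Now we only need the new model's softmax distribution to match that of the original model for a given input (the new function approximates the original function).
Let a logit produced by the original model be [formula] and the corresponding logit produced by the new model be [formula].
The function shown as an image in the original post is driven toward 0 using backpropagation.
In chemistry, distillation is an effective way to separate components with different boiling points: the mixture is first heated so that the low-boiling component vaporizes, then cooled so that it condenses, isolating the target substance. In the process described above, we first raise the temperature T and then return to a "low temperature" at test time, thereby extracting the knowledge in the original model. That is why it is called distillation, which is a very apt name.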
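The role of the temperature can be illustrated with a softmax that divides the logits by T before normalizing. This is a minimal sketch in plain Python; the function name and the example logits are assumptions, not values from the original post.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [5.0, 2.0, 0.5]  # made-up teacher logits for three classes
for t in (1.0, 4.0, 20.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 1 almost all probability mass sits on the largest logit. As T rises, the smaller classes receive visible probability, and it is these soft targets that the new model is trained to match.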

import torch.nn as nn


class MobileNetv1(nn.Module):
    def __init__(self):
        super(MobileNetv1, self).__init__()

        # Standard 3x3 convolution followed by BN and ReLU (used for the first layer only)
        def conv_bn(dim_in, dim_out, stride):
            return nn.Sequential(
                nn.Conv2d(dim_in, dim_out, 3, stride, 1, bias=False),
                nn.BatchNorm2d(dim_out),
                nn.ReLU(inplace=True)
            )

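        # Depthwise separable block: 3x3 depthwise conv (groups=dim_in) + BN + ReLU,
        # then 1x1 pointwise conv + BN + ReLU
        def conv_dw(dim_in, dim_out, stride):
            return nn.Sequential(
                nn.Conv2d(dim_in, dim_in, 3, stride, 1, groups=dim_in, bias=False),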
                nn.BatchNorm2d(dim_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(dim_in, dim_out, 1, 1, 0, bias=False),
                nn.BatchNorm2d(dim_out),
                nn.ReLU(inplace=True),
            )
        self.model = nn.Sequential(
            conv_bn(  3,  32, 2),
            conv_dw( 32,  64, 1),
            conv_dw( 64, 128, 2),
            conv_dw(128, 128, 1),
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
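            nn.AvgPool2d(7),  # 7x7 average pool: assumes a 224x224 input, so the map here is 7x7
        )
        self.fc = nn.Linear(1024, 20)  # 20 output classes (GHIM-20 in the experiment below)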

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x
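The fixed nn.AvgPool2d(7) in the code above only fits when the last feature map is 7 x 7. The following sketch traces the spatial size through the strides of the 14 blocks using the standard convolution output-size formula. It is plain Python and does not touch the model itself; `conv_out` and `final_feature_size` are hypothetical helper names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides of the 14 convolutional blocks, in the order they appear in self.model
STRIDES = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def final_feature_size(input_size):
    """Side length of the feature map that reaches the average pooling layer."""
    size = input_size
    for stride in STRIDES:
        size = conv_out(size, stride=stride)
    return size


for input_size in (224, 192, 160, 128):
    print(input_size, "->", final_feature_size(input_size))
```

Only the 224 x 224 input ends at 7 x 7. For the smaller resolutions used with the resolution multiplier, the pooling window would need to shrink accordingly; one option, offered here as a suggestion rather than something the original code does, is nn.AdaptiveAvgPool2d(1).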

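Summary

Depthwise and pointwise convolutions together implement the depthwise separable convolution, reducing computation and model size
(the low latency and model size that the paper stresses repeatedly).
The model is rather old-fashioned: a VGG-style stack with no residual connections, feature fusion, or similar techniques.
Drawback: in a depthwise convolution each channel is processed independently and the kernels are low-dimensional, so each output feature draws on only a few input features. Combined with ReLU, outputs easily become zero, which causes feature extraction to fail and leaves kernels redundant.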

Results of my own experiment with the PyTorch code (trained from scratch, not fine-tuned)

6 mobilenetv1 2080 t-bs:64 v-bs:64 lr:0.01 100epoch

Dataset: GHIM-20, 20-class classification. Training had essentially converged by about epoch 8000 x 64 / 9000 ≈ 57.

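Also: the blog post below covers experiments on the factors that affect training of a basic classification network, studying how batch size and learning rate influence the quality of the trained model.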
https://blog.csdn.net/weixin_44523062/article/details/105457045