【YOLOX】用YOLOv5框架YOLOX

news/2024/7/11 1:41:19 标签: YOLO, python, 深度学习

YOLOX】用YOLOv5框架YOLOX

      • 一、新建common_x.py
      • 二、修改yolo.py
      • 三、新建yolox.yaml
      • 四、训练


最近在跑YOLO主流框架的对比实验,发现了一个很奇怪的问题,就是同一个数据集,在不同YOLO框架下训练出的结果差距竟然大的离谱。我使用ultralytics公司出品的v5、v3框架跑出的结果精度差距是合理的,然而用该Up主写的Yolov4代码,竟与ultralytics公司出品的v5、v3框架跑出的结果精度能低20-30%,帧率低的离谱。并且YOLOX也是一样结果。虽然不知道为什么,但确实无法进行对比实验,于是只能将Yolov4结构与YoloX结构在Yolov5框架中实现。Yolov4在Yolov5框架中的实现我参考了这个博主的博客,大家有需求可以参考:yolov4_u5版复现。是一系列文章。
下面我来实现将YoloX结构移植到Yolov5框架中,以下是结合网络结构以及YoloX源码进行实现:


一、新建common_x.py

python文件存放的是YOLOX中用到的模块,主要包括BaseConv、CSPLayer、Dark,代码如下:

python">import torch
import torch.nn as nn


class SiLU(nn.Module):
    """export-friendly version of nn.SiLU()"""

    @staticmethod
    def forward(x):
        return x * torch.sigmoid(x)


def get_activation(name="silu", inplace=True):
    if name == "silu":
        module = nn.SiLU(inplace=inplace)
    elif name == "relu":
        module = nn.ReLU(inplace=inplace)
    elif name == "lrelu":
        module = nn.LeakyReLU(0.1, inplace=inplace)
    else:
        raise AttributeError("Unsupported act type: {}".format(name))
    return module


class BaseConv(nn.Module):
    """A Conv2d -> Batchnorm -> silu/leaky relu block"""

    def __init__(
            self, in_channels, out_channels, ksize, stride, groups=1, bias=False, act="silu"
    ):
        super().__init__()
        # same padding
        pad = (ksize - 1) // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=ksize,
            stride=stride,
            padding=pad,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = get_activation(act, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))


class DWConv(nn.Module):
    """Depthwise Conv + Conv"""

    def __init__(self, in_channels, out_channels, ksize, stride=1, act="silu"):
        super().__init__()
        self.dconv = BaseConv(
            in_channels,
            in_channels,
            ksize=ksize,
            stride=stride,
            groups=in_channels,
            act=act,
        )
        self.pconv = BaseConv(
            in_channels, out_channels, ksize=1, stride=1, groups=1, act=act
        )

    def forward(self, x):
        x = self.dconv(x)
        return self.pconv(x)


class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(
            self,
            in_channels,
            out_channels,
            shortcut=True,
            expansion=0.5,
            depthwise=False,
            act="silu",
    ):
        super().__init__()
        hidden_channels = int(out_channels * expansion)
        Conv = DWConv if depthwise else BaseConv
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv2 = Conv(hidden_channels, out_channels, 3, stride=1, act=act)
        self.use_add = shortcut and in_channels == out_channels

    def forward(self, x):
        y = self.conv2(self.conv1(x))
        if self.use_add:
            y = y + x
        return y


class ResLayer(nn.Module):
    "Residual layer with `in_channels` inputs."

    def __init__(self, in_channels: int):
        super().__init__()
        mid_channels = in_channels // 2
        self.layer1 = BaseConv(
            in_channels, mid_channels, ksize=1, stride=1, act="lrelu"
        )
        self.layer2 = BaseConv(
            mid_channels, in_channels, ksize=3, stride=1, act="lrelu"
        )

    def forward(self, x):
        out = self.layer2(self.layer1(x))
        return x + out


class SPPBottleneck(nn.Module):
    """Spatial pyramid pooling layer used in YOLOv3-SPP"""

    def __init__(
            self, in_channels, out_channels, kernel_sizes=(5, 9, 13), activation="silu"
    ):
        super().__init__()
        hidden_channels = in_channels // 2
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=activation)
        self.m = nn.ModuleList(
            [
                nn.MaxPool2d(kernel_size=ks, stride=1, padding=ks // 2)
                for ks in kernel_sizes
            ]
        )
        conv2_channels = hidden_channels * (len(kernel_sizes) + 1)
        self.conv2 = BaseConv(conv2_channels, out_channels, 1, stride=1, act=activation)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.cat([x] + [m(x) for m in self.m], dim=1)
        x = self.conv2(x)
        return x


class CSPLayer(nn.Module):
    """C3 in yolov5, CSP Bottleneck with 3 convolutions"""

    def __init__(
            self,
            in_channels,
            out_channels,
            n=1,
            shortcut=True,
            expansion=0.5,
            depthwise=False,
            act="silu",
    ):
        """
        Args:
            in_channels (int): input channels.
            out_channels (int): output channels.
            n (int): number of Bottlenecks. Default value: 1.
        """
        # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        hidden_channels = int(out_channels * expansion)  # hidden channels
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv2 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv3 = BaseConv(2 * hidden_channels, out_channels, 1, stride=1, act=act)
        module_list = [
            Bottleneck(
                hidden_channels, hidden_channels, shortcut, 1.0, depthwise, act=act
            )
            for _ in range(n)
        ]
        self.m = nn.Sequential(*module_list)

    def forward(self, x):
        x_1 = self.conv1(x)
        x_2 = self.conv2(x)
        x_1 = self.m(x_1)
        x = torch.cat((x_1, x_2), dim=1)
        return self.conv3(x)


class Dark(nn.Module):
    def __init__(self, c1, c2, n=1, act="silu"):
        super().__init__()
        self.cv1 = BaseConv(c1, c2, 3, 2, act=act)
        self.cv2 = CSPLayer(c2, c2, n=n, depthwise=False, act=act)

    def forward(self, x):
        return self.cv2(self.cv1(x))

二、修改yolo.py

由于YOLOX里面使用的是Decoupled Head解藕头,所以需要重新设计Detect部分,这里参考了这位博主的博客:YOLO v5 引入解耦头部。
将以下代码,放入yolo.py:

python">class DecoupledHead(nn.Module):
    def __init__(self, ch=256, nc=80, anchors=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.gd = 0.5
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        c_ = int(ch * self.gd)
        self.merge = Conv(ch, c_, 1, 1)
        self.cls_convs1 = Conv(c_, c_, 3, 1, 1)
        self.cls_convs2 = Conv(c_, c_, 3, 1, 1)
        self.reg_convs1 = Conv(c_, c_, 3, 1, 1)
        self.reg_convs2 = Conv(c_, c_, 3, 1, 1)
        self.cls_preds = nn.Conv2d(c_, self.nc * self.na, 1)
        self.reg_preds = nn.Conv2d(c_, 4 * self.na, 1)
        self.obj_preds = nn.Conv2d(c_, 1 * self.na, 1)

    def forward(self, x):
        x = self.merge(x)
        x1 = self.cls_convs1(x)
        x1 = self.cls_convs2(x1)
        x1 = self.cls_preds(x1)
        x2 = self.reg_convs1(x)
        x2 = self.reg_convs2(x2)
        x21 = self.reg_preds(x2)
        x22 = self.obj_preds(x2)
        out = torch.cat([x21, x22, x1], 1)
        return out


class Decoupled_Detect(nn.Module):
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()

        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [torch.zeros(1)] * self.nl  # init anchor grid
        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        self.m = nn.ModuleList(DecoupledHead(x, nc, anchors) for x in ch)
        self.inplace = inplace  # use in-place ops (e.g. slice assignment)

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  # torch 1.8.0
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = torch.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)

    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)
        if check_version(torch.__version__, '1.10.0'):  # torch>=1.10.0 meshgrid workaround for torch>=0.7 compatibility
            yv, xv = torch.meshgrid(y, x, indexing='ij')
        else:
            yv, xv = torch.meshgrid(y, x)
        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid

修改yolo.py中的parse_model函数:
在下面添加BaseConv、CSPLayer、Dark模块
在这里插入图片描述添加Decoupled_Detect模块
在这里插入图片描述

三、新建yolox.yaml

新建yolox.yaml

# parameters
nc: 80  # number of classes
depth_multiple: 0.33  # expand model depth
width_multiple: 0.5  # expand layer channels

# anchors
anchors:
  - [12,16, 19,36, 40,28]  # P3/8
  - [36,75, 76,55, 72,146]  # P4/16
  - [142,110, 192,243, 459,401]  # P5/32

# yolov4l backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3, 1]],  # 0
   [-1, 3, Dark, [128]],  # 1-P1/2
   [-1, 9, Dark, [256]],
   [-1, 9, Dark, [512]],  # 3-P2/4
   [-1, 3, Dark, [1024]],
  ]

# yolov4l head
# na = len(anchors[0])
head:
  [[-1, 1, BaseConv, [512, 1, 1]], # 11
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 3], 1, Concat, [1]],
   [-1, 3, CSPLayer, [512]], # 16

   [-1, 1, BaseConv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],
   [-1, 3, CSPLayer, [256]], # 21

   [-1, 1, BaseConv, [256, 3, 2]],
   [[-1, 9], 1, Concat, [1]],  # cat
   [-1, 3, CSPLayer, [512]], # 25

   [-1, 1, BaseConv, [512, 3, 2]], # route backbone P3
   [[-1, 5], 1, Concat, [1]],  # cat
   [-1, 3, CSPLayer, [1024]], # 29

   [[12,15,18], 1, Decoupled_Detect, [nc, anchors]],   # Detect(P3, P4, P5)
  ]

四、训练

以上配置完之后,其他操作与训练Yolov5步骤一致,最终训练出来的效果,要比原YoloX训练结果好不少,看起来更加合理,与Yolov5训练结果差距也是在合理范围内。


http://www.niftyadmin.cn/n/153079.html

相关文章

CAPL学习之路-通信对象函数(Part1)

通信对象函数,用于canoe的新型通信模式中,在应用层对soa面向服务的通信对象进行操作 1、事件程序(事件函数) 1.1、on Abstract_EventSubscribed 当Consumer使用抽象绑定订阅事件时,在Provider端调用此事件函数 1.2、on Abstract_EventUnsubscribed 当Consumer使用抽象绑…

fl studio怎么设置中文?如何切换flstudio21中文语言详细操作教程

fl studio怎么设置中文?支持多音轨录音时间拉伸和音高移动原始音频编辑,是一款功能强大的软件音乐制作环境或数字音频工作站。最近有不少小伙伴们,咨询我安装fl studio英文版,怎么设置切换fl studio中文版,fl studio总…

IPv6基础(2)

IPv6基础特殊的IPv6地址接口的IP地址IPv6链路本地单播地址IPv6全局单播地址IPv6组播地址保留IPv6组播地址请求节点组播地址无状态自动配置特殊的IPv6地址 接口的IP地址 单个接口上可以同时开启多个任意类型的IPv6地址,但是必须又link-local地址。每个支持IPv6的地址…

Go语言基础(一篇上手go语言基本语法)

Go简介 Go语言的创始人有三位,分别是图灵奖获得者、C语法联合发明人、Unix之父肯汤普森(Ken Thompson)、Plan 9操作系统领导者、UTF-8编码的最初设计者罗伯派克(Rob Pike),以及Java的HotSpot虚拟机和Chrom…

k8s多master节点集群搭建

k8s多master节点集群搭建单master节点集群搭建添加多个master节点可能遇到的错误单master节点集群搭建 首先搭建一个单master节点的集群,具体可以参考博文k8s单master节点集群搭建:使用kubeadm和kubernetes群集部署与测试、以及K8s集群部署中的变化和注…

深度学习面试题汇总(一)

深度学习面试题汇总(一) 文章目录深度学习面试题汇总(一)1.Dropout1.1Dropout在训练的过程中会随机去掉神经元,那么在编码过程中是怎么处理的呢?1.2dropout的训练过程需要做rescale,这个过程是什…

Python控制Anritsu MT8820C 无线通讯测试仪

目录 前言 MT8820C简介 结构图 实现原理 Step 1:添加设备 Step 2:验证 代码介绍 Step 1: 建立连接 Step 2:构建发送SCPI指令的底层私有函数 Step 3:具体功能的封装 后记 前言 因为要开发一款GSM TX In RX B…

硬件系统工程师宝典(15)-----PCB上的EMC设计,“拿捏了”

各位同学大家好,欢迎继续做客电子工程学习圈,今天我们继续来讲这本书,硬件系统工程师宝典。上篇我们说到PCB常用的多层板叠层结构,综合成本、性能、需求考虑选择不同的叠层结构。今天我们来看看为提高EMC性能,在PCB设计…