YOLOv5 Object Detection Study (7): A Brief Analysis of the Validation Script val.py; How the Training, Validation, and Inference Files Relate

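Table of Contents

Preface
I. The overall structure of val.py is as follows:
- 1.0 Preparation
  - 1. Getting the file paths
  - 2. Saving predictions as .txt files
  - 3. Saving predictions as a COCO-format .json file
- 1.1 The main function: parse command-line arguments and call run()
- 1.2 The run function
  - ① Parameters
  - ② Model initialization and device setup; loading the model and data
  - ③ Eval mode, CUDA flag, dataset type, number of classes, and the IoU vector for mAP
  - ④ Dataloader setup and model evaluation
  - ⑤ Computing metrics, printing results, and printing speed
  - ⑥ Plotting, saving the JSON file, and returning the evaluation results
- Summary of run()
II. How the training (train), validation (val), and inference (detect) files relate
- 1. What the three files do
- 2. How the three relate
  - ① The dataset is split into a train set, a val set, and a test set, i.e. the training, validation, and test sets.
  - ② train comes first: after each training epoch, val is run to measure the current model's mAP, confusion matrix, and other metrics and to judge whether the current model and its hyperparameters are the best so far; once the best model is obtained, detect is used for the actual application.
  - ③ When evaluating the final results, run val.py with the best model best.pt produced by training; these results can serve as the final evaluation metrics in a paper, whereas real-world detection tasks are done with detect.py.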

Preface

I. The overall structure of val.py is as follows:

import torch  # needed for the @torch.no_grad() decorator


def save_one_txt(predn, save_conf, shape, file):
    # Save one txt result
    ...


def save_one_json(predn, jdict, path, class_map):
    # Save one JSON result {"image_id": 42,
    #                       "category_id": 18,
    #                       "bbox": [258.15, 41.29, 348.26, 243.78],
    #                       "score": 0.236}
    ...


def process_batch(detections, labels, iouv):
    """
    Return correct predictions matrix. Both sets of boxes are in (x1, y1, x2, y2) format.
    Arguments:
        detections (Array[N, 6]), x1, y1, x2, y2, conf, class
        labels (Array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (Array[N, 10]), for 10 IoU levels
    """
    # One of the key functions for computing the metrics.
    # For each of the 10 IoU thresholds in [0.5:0.95], match labels against predictions
    # and record in a boolean matrix whether each prediction is correct.
    ...


@torch.no_grad()
def run(
        data,
        weights=None,  # model.pt path(s)
        batch_size=32,  # batch size
        imgsz=640,  # inference size (pixels)
        conf_thres=0.001,  # confidence threshold
        iou_thres=0.6,  # NMS IoU threshold
        task='val',  # train, val, test, speed or study
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        workers=8,  # max dataloader workers (per RANK in DDP mode)
        # ... remaining parameters omitted (full signature in 1.2)
):
    """
    Processing flow of run():
        1. Load the model;
        2. Load the data;
        3. Run network prediction and NMS;
        4. Compute AP and mAP;
        5. Plot the metric charts;
        6. Save the results.
    """
    ...


def parse_opt():
    # Define the command-line arguments
    ...


def main(opt):
    # Entry function
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
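The skeleton only describes process_batch() in comments. The sketch below re-implements that matching logic in NumPy so it runs without the YOLOv5 repo or PyTorch. It is modeled on the upstream function but is an illustration, not a copy: box_iou here is a small local helper rather than the one in utils/metrics.py, and the greedy matching order (highest IoU first, each detection and each label used at most once) follows the upstream code as far as I can tell.

```python
import numpy as np


def box_iou(boxes1, boxes2):
    """Pairwise IoU between (M, 4) and (N, 4) xyxy boxes, returned as an (M, N) array."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    top_left = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    bottom_right = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero where the boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)


def process_batch(detections, labels, iouv):
    """detections: (N, 6) [x1, y1, x2, y2, conf, cls]; labels: (M, 5) [cls, x1, y1, x2, y2]."""
    correct = np.zeros((detections.shape[0], iouv.shape[0]), dtype=bool)
    iou = box_iou(labels[:, 1:], detections[:, :4])  # (M, N) label-vs-detection IoU
    same_class = labels[:, 0:1] == detections[:, 5]  # (M, N) class agreement
    for i, threshold in enumerate(iouv):
        li, di = np.nonzero((iou >= threshold) & same_class)  # candidate (label, detection) pairs
        if li.size:
            matches = np.stack([li, di, iou[li, di]], axis=1)  # rows: [label_idx, det_idx, iou]
            matches = matches[matches[:, 2].argsort()[::-1]]  # highest IoU first
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]  # each detection used once
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]  # each label matched once
            correct[matches[:, 1].astype(int), i] = True
    return correct


iouv = np.linspace(0.5, 0.95, 10)
labels = np.array([[0, 0, 0, 10, 10]], dtype=float)
detections = np.array([
    [0, 0, 10, 10, 0.9, 0],  # exact overlap, right class
    [0, 0, 10, 5, 0.8, 0],   # IoU 0.5, right class, but the label is already taken
    [0, 0, 10, 10, 0.7, 1],  # exact overlap, wrong class
], dtype=float)
correct = process_batch(detections, labels, iouv)
print(correct.astype(int))
```

In this example only the first detection is marked correct (at all 10 thresholds). The second reaches IoU 0.5 but loses because the single label was already claimed by the better detection, and the third fails on class.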

1.0 Preparation

1. Getting the file paths

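import os
import sys
from pathlib import Path

FILE = Path(__file__).resolve()  # absolute path of the current file, e.g. D:/yolov5/val.py
ROOT = FILE.parents[0]  # YOLOv5 root directory: the parent directory of the current file, e.g. D:/yolov5/
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to sys.path (the module search path) so the repo's packages can be imported
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative: ROOT expressed relative to the current working directory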
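To see what the last line does, the sketch below replays the same steps on a hypothetical repository location (/home/user/yolov5) and made-up working directories. It uses PurePosixPath and posixpath.relpath instead of Path and os.path.relpath only so the result does not depend on the machine it runs on.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical location of val.py; PurePosixPath keeps the demo OS-independent.
FILE = PurePosixPath("/home/user/yolov5/val.py")
ROOT = FILE.parents[0]  # parent directory of val.py, i.e. the repo root

# os.path.relpath(ROOT, cwd) depends on where the script is launched from:
from_repo = posixpath.relpath(str(ROOT), "/home/user/yolov5")     # launched inside the repo
from_home = posixpath.relpath(str(ROOT), "/home/user")            # launched one level above
from_sibling = posixpath.relpath(str(ROOT), "/home/user/other")   # launched from a sibling directory

print(ROOT, from_repo, from_home, from_sibling)  # /home/user/yolov5 . yolov5 ../yolov5
```

Because ROOT becomes relative, defaults such as ROOT / "runs/val" presumably show up as short paths like runs/val in the logs when the script is launched from the repository root.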

2. Saving predictions as .txt files

import torch
from utils.general import xyxy2xywh  # helper from the YOLOv5 repo (utils/general.py)


def save_one_txt(predn, save_conf, shape, file):
    # Save one txt result
    gn = torch.tensor(shape)[[1, 0, 1, 0]]  # normalization gain whwh: gn = [w, h, w, h] from the image's (h, w) shape, used below for normalization
    for *xyxy, conf, cls in predn.tolist():  # tolist(): convert the tensor to a nested Python list
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh: convert corner format (top-left, bottom-right) to center x, center y, width, height, normalize, then turn into a list
        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format: "class xywh conf" if save_conf is True, otherwise "class xywh"
        with open(file, 'a') as f:
            f.write(('%g ' * len(line)).rstrip() % line + '\n')  # append to the per-image file; with the default project/name this is runs/val/exp*/labels
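The following plain-Python sketch reproduces the normalization and line formatting of save_one_txt() for a single prediction, so the output format can be checked without torch. xyxy_to_xywh and format_txt_line are hypothetical helpers written for this example; shape is assumed to be (height, width), as in the code above.

```python
def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] corners -> [center x, center y, width, height]."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def format_txt_line(pred_row, shape, save_conf=False):
    """pred_row: [x1, y1, x2, y2, conf, cls] in pixels; shape: (height, width) of the image."""
    *xyxy, conf, cls = pred_row
    h, w = shape[0], shape[1]
    gn = [w, h, w, h]  # same gain as torch.tensor(shape)[[1, 0, 1, 0]]
    xywh = [v / g for v, g in zip(xyxy_to_xywh(xyxy), gn)]  # normalized to 0-1
    line = (cls, *xywh, conf) if save_conf else (cls, *xywh)
    return ('%g ' * len(line)).rstrip() % line  # one "%g" field per value, space-separated


print(format_txt_line([100.0, 50.0, 300.0, 250.0, 0.9, 2.0], (480, 640), save_conf=True))
# 2 0.3125 0.3125 0.3125 0.416667 0.9
```

Note that "%g" drops the trailing ".0" of the float class index and keeps six significant digits, which is why 200/480 is written as 0.416667.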

3. Saving predictions as a COCO-format .json file

from utils.general import xyxy2xywh  # helper from the YOLOv5 repo (utils/general.py)


def save_one_json(predn, jdict, path, class_map):
    # Save one JSON result {"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}
    image_id = int(path.stem) if path.stem.isnumeric() else path.stem  # image ID: numeric file names (as in COCO) become ints
    box = xyxy2xywh(predn[:, :4])  # xywh: convert to center coordinates plus width and height
    box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner
    for p, b in zip(predn.tolist(), box.tolist()):
        jdict.append({
            'image_id': image_id,  # image ID
            'category_id': class_map[int(p[5])],  # category, mapped through class_map
            'bbox': [round(x, 3) for x in b],  # predicted box [x_top_left, y_top_left, w, h]
            'score': round(p[4], 5)})  # prediction confidence

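Note: the xyxy format used earlier stores the top-left and bottom-right corners, and xywh stores the box center plus width and height, whereas a COCO JSON bbox is [x, y, w, h] with (x, y) the top-left corner. That is why box[:, :2] -= box[:, 2:] / 2 converts the center coordinates into top-left coordinates.
zip(): on each iteration it takes one item from predn.tolist() and one from box.tolist(), forms a tuple, and assigns them to p and b respectively.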
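A plain-Python sketch of the record that save_one_json() appends for one prediction. to_coco_record is a hypothetical helper; class_map is passed in as in the original. In val.py, a COCO dataset uses coco80_to_coco91_class() to map the 80 contiguous class indices onto COCO's original category IDs, while any other dataset uses list(range(1000)), i.e. the identity mapping used below.

```python
from pathlib import Path


def to_coco_record(pred_row, path, class_map):
    """pred_row: [x1, y1, x2, y2, conf, cls] in pixels -> one COCO-style result dict."""
    x1, y1, x2, y2, conf, cls = pred_row
    stem = Path(path).stem
    image_id = int(stem) if stem.isnumeric() else stem  # "000000000139" -> 139
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2     # xyxy -> center xywh, like xyxy2xywh
    left, top = cx - w / 2, cy - h / 2  # center -> top-left, like box[:, :2] -= box[:, 2:] / 2
    return {
        "image_id": image_id,
        "category_id": class_map[int(cls)],
        "bbox": [round(v, 3) for v in (left, top, w, h)],
        "score": round(conf, 5),
    }


record = to_coco_record([100.0, 50.0, 300.0, 250.0, 0.87654321, 0.0],
                        "images/000000000139.jpg", class_map=list(range(1000)))
print(record)
```

The round trip through the center format looks redundant here, but it mirrors the original, which reuses xyxy2xywh and then shifts the center to the corner.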

1.1 The main function: parse command-line arguments and call run()

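There is not much to add here: the training, validation, and inference scripts all share this structure, where parse_opt() builds the arguments and main() passes them on to run().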
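As an illustration of this shared pattern, here is a minimal argparse sketch. It declares only a handful of flags modeled on val.py (--data, --weights, --batch-size, --imgsz with its --img/--img-size aliases, --task), and run() is a hypothetical stub that simply echoes its arguments.

```python
import argparse


def run(data="data/coco128.yaml", weights=None, batch_size=32, imgsz=640, task="val"):
    """Hypothetical stand-in for val.run(): just echoes the arguments it received."""
    return dict(data=data, weights=weights, batch_size=batch_size, imgsz=imgsz, task=task)


def parse_opt(argv=None):
    # A small subset of val.py's flags; argparse turns "--batch-size" into the attribute batch_size
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, default="data/coco128.yaml")
    parser.add_argument("--weights", nargs="+", type=str, default="yolov5s.pt")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--imgsz", "--img", "--img-size", type=int, default=640)
    parser.add_argument("--task", default="val")
    return parser.parse_args(argv)


def main(opt):
    # vars() turns the Namespace into a dict whose keys match run()'s parameter names
    return run(**vars(opt))


result = main(parse_opt(["--batch-size", "16", "--img", "320", "--weights", "best.pt"]))
print(result)
```

The design relies on argparse deriving each attribute name from the first long option (--batch-size becomes batch_size, --imgsz stays imgsz even when --img is used), so run(**vars(opt)) works as long as the flag names line up with the parameters of run().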

1.2 The run function

① Parameters

def run(
    data,
    weights=None,  # model.pt path(s)
    batch_size=32,  # batch size
    imgsz=640,  # inference size (pixels)
    conf_thres=0.001,  # confidence threshold
    iou_thres=0.6,  # NMS IoU threshold
    max_det=300,  # maximum detections per image
    task="val",  # train, val, test, speed or study
    device="",  # cuda device, i.e. 0 or 0,1,2,3 or cpu
    workers=8,  # max dataloader workers (per RANK in DDP mode)
    single_cls=False,  # treat as single-class dataset
    augment=False,  # augmented inference
    verbose=False,  # verbose output
    save_txt=False,  # save results to *.txt
    save_hybrid=False,  # save label+prediction hybrid results to *.txt
    save_conf=False,  # save confidences in --save-txt labels
    save_json=False,  # save a COCO-JSON results file
    project=ROOT / "runs/val",  # save to project/name
    name="exp",  # save to project/name
    exist_ok=False,  # existing project/name ok, do not increment
    half=True,  # use FP16 half-precision inference
    dnn=False,  # use OpenCV DNN for ONNX inference
    model=None,
    dataloader=None,
    save_dir=Path(""),
    plots=True,
    callbacks=Callbacks(),
    compute_loss=None,
):

② Model initialization and device setup; loading the model and data

    # Initialize/load model and set device
    training = model is not None
    if training:  # called by train.py
        device, pt, jit, engine = next(model.parameters()).device, True, False, False  # get model device, PyTorch model
        half &= device.type != "cpu"  # half precision only supported on CUDA
        model.half() if half else model.float()
    else:  # called directly
        device = select_device(device, batch_size=batch_size)

        # Directories
        save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
        (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

        # Load model
        model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
        stride, pt, jit, engine = model.stride, model.pt, model.jit, model.engine
        imgsz = check_img_size(imgsz, s=stride)  # check image size
        half = model.fp16  # FP16 supported on limited backends with CUDA
        if engine:
            batch_size = model.batch_size
        else:
            device = model.device
            if not (pt or jit):
                batch_size = 1  # export.py models default to batch-size 1
                LOGGER.info(f"Forcing --batch-size 1 square inference (1,3,{imgsz},{imgsz}) for non-PyTorch models")

        # Data
        data = check_dataset(data)  # check
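The branches above decide precision and batch size depending on who called run() and which backend was loaded. The helper below is a hypothetical condensation of just those two decisions, useful for checking the cases by hand; it is not part of YOLOv5 and ignores everything else the branches do (output directories, image-size checks, dataset checks). In particular, when run() is called directly, val.py takes half from model.fp16, which this sketch simply passes through.

```python
def resolve_half_and_batch(called_from_train, device_type, half, batch_size,
                           backend="pt", engine_batch_size=None):
    """Hypothetical helper mirroring the branching in run(); not part of YOLOv5."""
    if called_from_train:
        # train.py passes in its own model; FP16 is only kept on CUDA devices
        return half and device_type != "cpu", batch_size
    # Called directly: `half` would come from model.fp16 in val.py; passed through here
    if backend == "engine":
        return half, engine_batch_size  # TensorRT: use the batch size reported by the loaded engine
    if backend not in ("pt", "jit"):
        return half, 1  # other exported formats: forced to batch size 1
    return half, batch_size


print(resolve_half_and_batch(True, "cpu", True, 32))                     # training on CPU
print(resolve_half_and_batch(False, "cuda", True, 32, backend="onnx"))  # exported ONNX model
```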

③ Eval mode, CUDA flag, dataset type, number of classes, and the IoU vector for mAP

    # Configure
    model.eval()
    cuda = device.type != "cpu"
    is_coco = isinstance(data.get("val"), str) and data["val"].endswith(f"coco{os.sep}val2017.txt")  # COCO dataset
    nc = 1 if single_cls else int(data["nc"])  # number of classes
    iouv = torch.linspace(0.5, 0.95, 10, device=device)  # iou vector for mAP@0.5:0.95
    niou = iouv.numel()
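iouv holds the 10 IoU thresholds that become the 10 columns of the correct matrix built by process_batch(). A NumPy equivalent makes the values easy to inspect; NumPy stands in for torch here only so the snippet runs without PyTorch.

```python
import numpy as np

iouv = np.linspace(0.5, 0.95, 10)  # NumPy counterpart of torch.linspace(0.5, 0.95, 10)
niou = iouv.size                   # counterpart of iouv.numel()

# A matched prediction with IoU 0.72 counts as correct at the first five thresholds (0.50 to 0.70)
row = 0.72 >= iouv
print(iouv.round(2).tolist(), int(row.sum()))
```

Column 0 (IoU 0.5) later gives mAP@0.5, and averaging over all 10 columns gives mAP@0.5:0.95.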

④ Dataloader setup and model evaluation

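Dataloader setup: when not in training mode, several settings are applied first: checking that the --weights were trained on a dataset with the same number of classes as --data, warming up the model, and choosing the padding and rectangular-inference settings (pad=0.0 with square inference for the speed task, otherwise pad=0.5 and rect=True for PyTorch models).
The task is then normalized to train, val, or test, and the dataloader is created.
Evaluation: initialize variables such as the confusion matrix, class names, and class map.
For each batch from the dataloader, run the evaluation steps: data preparation, inference, loss computation, non-maximum suppression (NMS), and metric bookkeeping.
Match predictions against labels to collect the statistics from which precision, recall, mAP, and so on are computed later; optionally save results to .txt or JSON files and plot sample batches.
Callbacks are run at specific points during evaluation (batch start and end, end of each image) to handle those events.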

    # Dataloader
    if not training:
        if pt and not single_cls:  # check --weights are trained on --data
            ncm = model.model.nc
            assert ncm == nc, (
                f"{weights} ({ncm} classes) trained on different --data than what you passed ({nc} "
                f"classes). Pass correct combination of --weights and --data that are trained together."
            )
        model.warmup(imgsz=(1 if pt else batch_size, 3, imgsz, imgsz))  # warmup
        pad, rect = (0.0, False) if task == "speed" else (0.5, pt)  # square inference for benchmarks
        task = task if task in ("train", "val", "test") else "val"  # path to train/val/test images
        dataloader = create_dataloader(
            data[task],
            imgsz,
            batch_size,
            stride,
            single_cls,
            pad=pad,
            rect=rect,
            workers=workers,
            prefix=colorstr(f"{task}: "),
        )[0]

    seen = 0
    confusion_matrix = ConfusionMatrix(nc=nc)
    names = model.names if hasattr(model, "names") else model.module.names  # get class names
    if isinstance(names, (list, tuple)):  # old format
        names = dict(enumerate(names))
    class_map = coco80_to_coco91_class() if is_coco else list(range(1000))
    s = ("%22s" + "%11s" * 6) % ("Class", "Images", "Instances", "P", "R", "mAP50", "mAP50-95")
    tp, fp, p, r, f1, mp, mr, map50, ap50, map = 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    dt = Profile(device=device), Profile(device=device), Profile(device=device)  # profiling times
    loss = torch.zeros(3, device=device)
    jdict, stats, ap, ap_class = [], [], [], []
    callbacks.run("on_val_start")
    pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)  # progress bar
    for batch_i, (im, targets, paths, shapes) in enumerate(pbar):
        callbacks.run("on_val_batch_start")
        with dt[0]:
            if cuda:
                im = im.to(device, non_blocking=True)
                targets = targets.to(device)
            im = im.half() if half else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            nb, _, height, width = im.shape  # batch size, channels, height, width

        # Inference
        with dt[1]:
            preds, train_out = model(im) if compute_loss else (model(im, augment=augment), None)

        # Loss
        if compute_loss:
            loss += compute_loss(train_out, targets)[1]  # box, obj, cls

        # NMS
        targets[:, 2:] *= torch.tensor((width, height, width, height), device=device)  # to pixels
        lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling
        with dt[2]:
            preds = non_max_suppression(
                preds, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls, max_det=max_det
            )

        # Metrics
        for si, pred in enumerate(preds):
            labels = targets[targets[:, 0] == si, 1:]
            nl, npr = labels.shape[0], pred.shape[0]  # number of labels, predictions
            path, shape = Path(paths[si]), shapes[si][0]
            correct = torch.zeros(npr, niou, dtype=torch.bool, device=device)  # init
            seen += 1

            if npr == 0:
                if nl:
                    stats.append((correct, *torch.zeros((2, 0), device=device), labels[:, 0]))
                    if plots:
                        confusion_matrix.process_batch(detections=None, labels=labels[:, 0])
                continue

            # Predictions
            if single_cls:
                pred[:, 5] = 0
            predn = pred.clone()
            scale_boxes(im[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred

            # Evaluate
            if nl:
                tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
                scale_boxes(im[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
                labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
                correct = process_batch(predn, labelsn, iouv)
                if plots:
                    confusion_matrix.process_batch(predn, labelsn)
            stats.append((correct, pred[:, 4], pred[:, 5], labels[:, 0]))  # (correct, conf, pcls, tcls)

            # Save/log
            if save_txt:
                (save_dir / "labels").mkdir(parents=True, exist_ok=True)
                save_one_txt(predn, save_conf, shape, file=save_dir / "labels" / f"{path.stem}.txt")
            if save_json:
                save_one_json(predn, jdict, path, class_map)  # append to COCO-JSON dictionary
            callbacks.run("on_val_image_end", pred, predn, path, names, im[si])

        # Plot images
        if plots and batch_i < 3:
            plot_images(im, targets, paths, save_dir / f"val_batch{batch_i}_labels.jpg", names)  # labels
            plot_images(im, output_to_target(preds), paths, save_dir / f"val_batch{batch_i}_pred.jpg", names)  # pred

        callbacks.run("on_val_batch_end", batch_i, im, targets, paths, shapes, preds)
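The least obvious step in this loop is the label bookkeeping: targets has one row per object, [image_index, class, x, y, w, h] with normalized center coordinates, and it is scaled to pixels, split per image, and converted to corner format before matching. A NumPy sketch of just that path follows; labels_per_image and the local xywh2xyxy are hypothetical helpers, and the sketch skips the scale_boxes() call that val.py uses to map boxes back to the original image size.

```python
import numpy as np


def xywh2xyxy(x):
    """Convert (K, 4) [cx, cy, w, h] boxes to [x1, y1, x2, y2]."""
    y = x.copy()
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # x1 = cx - w/2
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # y1 = cy - h/2
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # x2 = cx + w/2
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # y2 = cy + h/2
    return y


def labels_per_image(targets, nb, width, height):
    """targets: (K, 6) rows [image_index, class, cx, cy, w, h], coordinates normalized to 0-1."""
    targets = targets.copy()
    targets[:, 2:] *= np.array([width, height, width, height])  # to pixels, as in val.py
    out = []
    for si in range(nb):
        labels = targets[targets[:, 0] == si, 1:]  # [class, cx, cy, w, h] for image si
        tbox = xywh2xyxy(labels[:, 1:5])  # target boxes in corner format
        out.append(np.concatenate([labels[:, 0:1], tbox], axis=1))  # [class, x1, y1, x2, y2]
    return out


targets = np.array([
    [0, 1, 0.5, 0.5, 0.25, 0.5],
    [1, 0, 0.25, 0.25, 0.5, 0.5],
])
per_image = labels_per_image(targets, nb=2, width=640, height=480)
print(per_image[0], per_image[1])
```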

⑤ Computing metrics, printing results, and printing speed

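Computing metrics: convert the collected statistics to NumPy arrays (stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]). From them, compute per-class precision, recall, F1 score, AP, and so on (tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)), then the mean precision, mean recall, and mAP.
Printing results: count the targets per class (nt = np.bincount(stats[3].astype(int), minlength=nc)). Print the overall results and the per-class results, including the number of targets, precision, recall, and AP.
If no labels are found, print a warning.
Printing speed: compute the pre-processing, inference, and NMS time and print the processing time per image.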

    # Compute metrics
    stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]  # to numpy
    if len(stats) and stats[0].any():
        tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
    nt = np.bincount(stats[3].astype(int), minlength=nc)  # number of targets per class

    # Print results
    pf = "%22s" + "%11i" * 2 + "%11.3g" * 4  # print format
    LOGGER.info(pf % ("all", seen, nt.sum(), mp, mr, map50, map))
    if nt.sum() == 0:
        LOGGER.warning(f"WARNING ⚠️ no labels found in {task} set, can not compute metrics without labels")

    # Print results per class
    if (verbose or (nc < 50 and not training)) and nc > 1 and len(stats):
        for i, c in enumerate(ap_class):
            LOGGER.info(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))

    # Print speeds
    t = tuple(x.t / seen * 1e3 for x in dt)  # speeds per image
    if not training:
        shape = (batch_size, 3, imgsz, imgsz)
        LOGGER.info(f"Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {shape}" % t)
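A NumPy sketch of the bookkeeping in this step, using made-up numbers: how zip(*stats) regroups the per-image tuples into four arrays, how np.bincount counts targets per class, how AP@0.5 and AP@0.5:0.95 are read off the AP matrix that ap_per_class() returns (its internals are not reproduced here; the matrix below is invented), and how accumulated times become milliseconds per image.

```python
import numpy as np

nc = 3
# Hypothetical per-image tuples: (correct[npr, 10], conf[npr], pred_cls[npr], target_cls[nl])
stats = [
    (np.ones((1, 10), dtype=bool), np.array([0.9]), np.array([0.0]), np.array([0.0, 0.0])),
    (np.zeros((0, 10), dtype=bool), np.zeros(0), np.zeros(0), np.array([2.0])),  # no predictions
]
# Same idea as [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]: one array per field
stats = [np.concatenate(x, 0) for x in zip(*stats)]
nt = np.bincount(stats[3].astype(int), minlength=nc)  # number of targets per class

# Hypothetical AP matrix: one row per class in ap_class, one column per IoU threshold
ap = np.array([[0.9] * 5 + [0.5] * 5, [0.6] * 10])
ap50, ap = ap[:, 0], ap.mean(1)       # AP@0.5 and AP@0.5:0.95 per class
map50, map_ = ap50.mean(), ap.mean()  # map_ avoids shadowing the built-in map()

# Speed: the three Profile objects accumulate seconds; divide by images seen, convert to ms
seen = 4
total_seconds = (0.02, 0.10, 0.01)  # hypothetical pre-process, inference, NMS totals
t = tuple(x / seen * 1e3 for x in total_seconds)  # ms per image
print(nt.tolist(), ap50, ap, map50, map_, t)
```

The image without predictions still contributes its target class to stats[3], which is why class 2 is counted in nt even though nothing was detected there.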

⑥ Plotting, saving the JSON file, and returning the evaluation results

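Plotting: if plots=True, plot the confusion matrix and run the end-of-validation callback.
Saving JSON: if save_json=True and there are predictions (len(jdict) > 0), save the predictions to a JSON file.
Then evaluate them with pycocotools against the COCO annotation file, compute mAP@0.5:0.95 and mAP@0.5, and print the evaluation summary.
Returning results: convert the model back to FP32 (model.float()) so that training can continue when run() was called from train.py.
If not in training mode, log where the results were saved. In either mode, run() then returns the mean precision, mean recall, mAP@0.5, mAP@0.5:0.95 and the averaged validation losses (box, obj, cls; zero unless compute_loss was passed in), the per-class mAP array, and the per-image processing times.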

    # Plots
    if plots:
        confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
        callbacks.run("on_val_end", nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)

    # Save JSON
    if save_json and len(jdict):
        w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else ""  # weights
        anno_json = str(Path("../datasets/coco/annotations/instances_val2017.json"))  # annotations
        if not os.path.exists(anno_json):
            anno_json = os.path.join(data["path"], "annotations", "instances_val2017.json")
        pred_json = str(save_dir / f"{w}_predictions.json")  # predictions
        LOGGER.info(f"\nEvaluating pycocotools mAP... saving {pred_json}...")
        with open(pred_json, "w") as f:
            json.dump(jdict, f)

        try:  # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
            check_requirements("pycocotools>=2.0.6")
            from pycocotools.coco import COCO
            from pycocotools.cocoeval import COCOeval

            anno = COCO(anno_json)  # init annotations api
            pred = anno.loadRes(pred_json)  # init predictions api
            eval = COCOeval(anno, pred, "bbox")
            if is_coco:
                eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.im_files]  # image IDs to evaluate
            eval.evaluate()
            eval.accumulate()
            eval.summarize()
            map, map50 = eval.stats[:2]  # update results (mAP@0.5:0.95, mAP@0.5)
        except Exception as e:
            LOGGER.info(f"pycocotools unable to run: {e}")

    # Return results
    model.float()  # for training
    if not training:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ""
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    maps = np.zeros(nc) + map
    for i, c in enumerate(ap_class):
        maps[c] = ap[i]
    return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t
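The last lines build maps, an array with one mAP@0.5:0.95 value per class. The sketch isolates that logic in a hypothetical per_class_maps helper: classes that have an AP entry get their own value, and every other class falls back to the overall mAP.

```python
import numpy as np


def per_class_maps(nc, overall_map, ap_class, ap):
    """Mirror of the end of run(): per-class mAP@0.5:0.95 array of length nc."""
    maps = np.zeros(nc) + overall_map  # classes without an AP entry default to the overall mAP
    for i, c in enumerate(ap_class):
        maps[c] = ap[i]  # ap[i] belongs to class ap_class[i]
    return maps


maps = per_class_maps(4, 0.5, ap_class=[0, 2], ap=[0.8, 0.3])
print(maps)
```

As far as I know, train.py uses this per-class array when --image-weights is enabled, so that images containing poorly detected classes are sampled more often.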

Summary of run()

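The run function carries out the evaluation of the model on the validation set in the following key steps:

Dataloader setup: when run() is called directly, build a dataloader for the validation set in preparation for evaluation (when called from train.py, the existing dataloader is passed in).
Model evaluation: run inference, loss computation, non-maximum suppression, and metric bookkeeping on every batch to produce the evaluation statistics.
Computing the metrics: compute per-class precision, recall, mAP, and other metrics from the statistics, and print the results.
Plotting and saving results: as requested, plot the confusion matrix, save the predictions to a JSON file, and run the COCO evaluation with pycocotools.
Returning results: return the mean precision, mean recall, mAP@0.5, mAP@0.5:0.95 and validation losses, the per-class mAP values, and the processing speed.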