Notes on training yolov8+SAM

2024/7/10 22:41:03 Tag: YOLO

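1- First, rename the files in the datasets I received (dataset1: images cropped after color correction; dataset2: original images)
Code that renames the image files, starting from 1.jpg: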

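import os
import shutil

# Source folder with the original .jpg files, and destination folder for the renamed copies
folder_path = r'C:\Users\23608\Desktop\Luli_work\data\fanStudent\tongueseg\Fan\Fan\.jpg'
new_folder = r'C:\Users\23608\Desktop\Luli_work\data\fanStudent\tongueseg\imgOrig'

# Create the destination folder if it does not exist yet
os.makedirs(new_folder, exist_ok=True)

# Sort the names so the numbering is reproducible (os.listdir gives no guaranteed order)
jpg_files = sorted(f for f in os.listdir(folder_path) if f.endswith('.jpg'))

for n, jpg_file in enumerate(jpg_files, start=1):
    new_filename = f'{n}.jpg'

    # Build the full paths of the original file and the new file
    original_path = os.path.join(folder_path, jpg_file)
    new_path = os.path.join(new_folder, new_filename)

    # Copy the file into the new folder under its new name
    shutil.copy(original_path, new_path)

print("Renaming complete!")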

2- Upload the preprocessed data to the server, then run the yolov8SAM code to generate the tongue masks:
Data locations
imgOrig: /share1/luli/tongueseg/data/dataset2/imgOrig/
imgCrop: /share1/luli/tongueseg/data/dataset1/imgCrop/

Looking up how to fine-tune SAM

How to Fine-Tune Segment Anything

1. How to Fine-Tune Segment Anything

We gave an overview of the SAM architecture in the introduction section. The image encoder has a complex architecture with many parameters. To fine-tune the model, it makes sense for us to focus on the mask decoder which is lightweight and therefore easier, faster and more memory efficient to fine-tune.

In order to fine-tune SAM, we need to extract the underlying pieces of its architecture (image and prompt encoders, mask decoder). We cannot use SamPredictor.predict for two reasons:

· We want to fine tune only the mask decoder
· This function calls SamPredictor.predict_torch, which has the @torch.no_grad() decorator and therefore prevents us from computing gradients

Thus, we need to examine the SamPredictor.predict function and call the appropriate functions with gradient calculation enabled on the part we want to fine tune (the mask decoder). Doing this is also a good way to learn more about how SAM works.
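
The idea described above (encoders run without gradients, only the mask decoder is trained) might be sketched as follows. This is a minimal sketch, not the article's exact code. It assumes `sam_model` is a `Sam` instance from the segment_anything package, that `image` has already been preprocessed to the model's input size, and that `boxes` is a tensor of XYXY box prompts in that same input frame. Using boxes as the prompt is an assumption, and both helper names are hypothetical.

```python
def trainable_mask_decoder_params(sam_model):
    """Freeze every parameter except those under `mask_decoder`.

    Returns the list of parameters that stay trainable, which can be
    handed to an optimizer (for example torch.optim.Adam).
    """
    trainable = []
    for name, param in sam_model.named_parameters():
        # Sam registers its three parts as image_encoder, prompt_encoder, mask_decoder
        param.requires_grad = name.startswith("mask_decoder.")
        if param.requires_grad:
            trainable.append(param)
    return trainable


def decoder_forward(sam_model, image, boxes):
    """Forward pass that builds a gradient graph only through the mask decoder."""
    import torch  # imported lazily so the helper above has no hard dependency

    # Encoders are treated as fixed feature extractors: no gradients needed
    with torch.no_grad():
        image_embedding = sam_model.image_encoder(image)
        sparse_embeddings, dense_embeddings = sam_model.prompt_encoder(
            points=None, boxes=boxes, masks=None
        )

    # The mask decoder runs with gradients enabled so it can be fine-tuned
    low_res_masks, iou_predictions = sam_model.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=sam_model.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=False,
    )
    return low_res_masks, iou_predictions
```

The returned `low_res_masks` are low-resolution logits, so they would still need to be upscaled and compared against the ground-truth mask with a loss of your choice before calling `backward()`.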

2. Creating a Custom Dataset

We need three things to fine-tune our model: (the article does not actually say here what concrete format the ground truth or the dataset should have)

· Images on which to draw segmentations
· Segmentation ground truth masks
· Prompts to feed into the model
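
Of the three items above, the prompt can often be derived from the ground-truth mask itself. The sketch below computes a bounding box in XYXY order (the box format SAM accepts) from a 2-D binary mask. Choosing a box as the prompt is an assumption here, and `mask_to_box` is a hypothetical helper name. The mask is taken as a list of rows so the example needs no extra libraries.

```python
def mask_to_box(mask):
    """Return [x_min, y_min, x_max, y_max] covering the nonzero pixels of a 2-D mask.

    `mask` is a sequence of rows; returns None when the mask is empty.
    """
    # Rows and columns that contain at least one foreground pixel
    rows = [r for r, line in enumerate(mask) if any(line)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    return [cols[0], rows[0], cols[-1], rows[-1]]


example_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 0, 255, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(example_mask))  # [2, 1, 3, 2]
```

In training it is common to jitter such a box by a few pixels so the decoder does not rely on perfectly tight prompts, but that is a design choice rather than a requirement.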

Later, in the article's code, you can see that the images are in PNG format and the masks are black-and-white binary images.

My tongue annotations were made with labelme and are stored as JSON files, so the JSON files need to be converted into binary images.
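
To make the JSON-to-binary-mask conversion mentioned above concrete, here is a dependency-free sketch. It assumes the usual labelme JSON layout (`imageWidth`, `imageHeight`, and a `shapes` list whose entries carry `points` and `shape_type`) and only handles polygon shapes. The function names are hypothetical, and in practice a library routine such as Pillow's `ImageDraw.Draw.polygon` would normally do the rasterization. Writing the PNG uses Pillow, imported lazily.

```python
import json
import math


def fill_polygon(mask, points):
    """Set pixels inside the polygon to 255 (even-odd rule, sampled at pixel centres)."""
    height, width = len(mask), len(mask[0])
    n = len(points)
    for row in range(height):
        y = row + 0.5
        crossings = []
        for i in range(n):
            (x1, y1), (x2, y2) = points[i], points[(i + 1) % n]
            if (y1 <= y) != (y2 <= y):  # this edge crosses the scanline
                crossings.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # Fill between each pair of crossings
        for left, right in zip(crossings[0::2], crossings[1::2]):
            start = max(0, math.ceil(left - 0.5))
            stop = min(width, math.ceil(right - 0.5))
            for col in range(start, stop):
                mask[row][col] = 255


def labelme_json_to_mask(json_path):
    """Read one labelme JSON file and return a 0/255 mask as a list of rows."""
    with open(json_path, encoding="utf-8") as f:
        ann = json.load(f)
    mask = [[0] * ann["imageWidth"] for _ in range(ann["imageHeight"])]
    for shape in ann["shapes"]:
        # Assumption: only polygon annotations matter; other shape types are skipped
        if shape.get("shape_type", "polygon") == "polygon":
            fill_polygon(mask, shape["points"])
    return mask


def save_mask_png(mask, out_path):
    """Write the mask as an 8-bit grayscale PNG (requires Pillow)."""
    from PIL import Image
    img = Image.new("L", (len(mask[0]), len(mask)))
    img.putdata([value for row in mask for value in row])
    img.save(out_path)
```

A typical use would be `save_mask_png(labelme_json_to_mask("1.json"), "1.png")`, where the file names are placeholders.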

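This involves 3 data format conversions:

· PNG image to JSON file (that is, first use yolov8+SAM to segment the dataset roughly, then refine the result by hand; the manual refinement is done in labelme, whose format is JSON; this step uses the "马赛克json" (mosaic JSON) code from TongueSAM)
· JSON file to PNG image, using labelme's own code (reference: "[labelme] json格式批量转换为mask.png", on batch-converting JSON files to mask.png). The steps are as follows:
1. Use labelme to create the semantic segmentation dataset, which produces .json files, and put all of them in one folder.
2. Find the json_to_dataset.py file in the labelme installation directory (the Everything search tool can help),
and replace its contents with the code below: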

import argparse
import json
import os
import os.path as osp
import warnings
import copy
import numpy as np
import PIL.Image
from skimage import io
import yaml
from labelme import utils

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('json_file')   # folder that contains the annotation JSON files
    parser.add_argument('-o', '--out', default=None)
    args = parser.parse_args()

    json_file = args.json_file

    list = os.listdir(json_file)   # list of the JSON files in the folder
    for i in range(0, len(list)):
        path = os.path.join(json_file, list[i])  # full path of each JSON file
        filename = list[i][:-5]       # name without the ".json" suffix, used later when saving the label image
        extension = list[i][-4:]
        if extension