YOLO-NAS: A New Object Detection Model That Surpasses YOLOv8

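► Introduction

In deep learning, designing a neural network architecture is an important and difficult problem. The traditional approach is to design the architecture by hand or to pick one using rules of thumb, which tends to require a great deal of time and expertise. To address this, the 2016 paper Neural Architecture Search with Reinforcement Learning proposed an approach known as Neural Architecture Search (NAS): an optimization algorithm explores a space of candidate networks and automatically selects the best architecture it finds, improving model performance and efficiency. This article introduces YOLO-NAS, an object detection model built with NAS.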

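► What is YOLO-NAS?

Object detection is a core computer vision task that lets machines identify and locate objects in images or video. It plays an important role in applications such as self-driving cars and facial recognition systems. A key driver of progress in object detection has been the discovery of powerful neural network architectures such as Faster R-CNN and YOLO.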

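YOLO (You Only Look Once) is one of the mainstream object detection methods. Its first version, released in 2016, changed how detection is performed by treating it as a single regression problem: the image is divided into a grid, and bounding boxes and class probabilities are predicted simultaneously. Since that first architecture, many YOLO-based variants have been developed, known for their accuracy, real-time performance, and suitability for object detection on both edge devices and the cloud. At the time of writing, the most advanced versions are YOLOv5, YOLOv6, YOLOv7, and YOLOv8.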
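To make the grid idea concrete, here is a minimal sketch (not code from any YOLO release) of how an object's center can be assigned to a grid cell. The function name `grid_cell_for_center` is hypothetical; the 7x7 grid and 448x448 image follow the setup of the original YOLO paper.

```python
def grid_cell_for_center(cx, cy, img_w, img_h, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center (cx, cy)."""
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("box center lies outside the image")
    col = int(cx * grid_size / img_w)
    row = int(cy * grid_size / img_h)
    return row, col


# A box centered at (300, 100) in a 448x448 image falls in row 1, column 4
print(grid_cell_for_center(300, 100, 448, 448))
```

In this scheme, the cell that contains the center is the one responsible for predicting that object's box and class.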

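However, existing YOLO models still face limitations, such as inadequate quantization support, imprecise localization, and an unsatisfactory trade-off between accuracy and latency. After YOLOv8, the deep learning company Deci.ai therefore developed YOLO-NAS, a new object detection model based on YOLOv6, to address these problems in earlier YOLO models.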

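Deci.ai generated the YOLO-NAS models with AutoNAC, its proprietary neural architecture search technology. The AutoNAC engine determines the optimal size and structure of each stage, including the block type, the number of blocks, and the number of channels per stage. From the region of promising architectures found by the search, Deci.ai sampled three points to create models of three sizes: YOLO-NAS-S, YOLO-NAS-M, and YOLO-NAS-L. The YOLO-NAS models are pre-trained on well-known datasets including COCO, Objects365, and Roboflow 100.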
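AutoNAC is proprietary, so its internals cannot be shown here. The toy sketch below only illustrates the general idea of searching over per-stage block type, block count, and channel count. The search space, the `proxy_cost` objective, and all names are made up for illustration, and plain random search stands in for whatever optimizer AutoNAC actually uses.

```python
import random

# Hypothetical search space: per-stage choices of block type, depth and width.
SEARCH_SPACE = {
    "block_type": ["conv", "rep_conv", "bottleneck"],
    "num_blocks": [1, 2, 3, 4],
    "channels": [32, 64, 128, 256],
}


def sample_architecture(rng, num_stages=4):
    """Draw one candidate: a list of per-stage configurations."""
    return [
        {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}
        for _ in range(num_stages)
    ]


def proxy_cost(arch):
    """Stand-in objective (lower is better). A real NAS would estimate
    accuracy and measure latency on the target hardware instead."""
    compute = sum(stage["num_blocks"] * stage["channels"] for stage in arch)
    capacity = sum(stage["channels"] for stage in arch)
    return compute / 1000.0 - capacity / 500.0


def random_search(num_trials=50, seed=0):
    """Keep the lowest-cost architecture among randomly sampled candidates."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_trials)]
    return min(candidates, key=proxy_cost)


best = random_search()
print(best, proxy_cost(best))
```

A production search replaces both pieces: a smarter search strategy instead of random sampling, and a measured accuracy/latency objective instead of the arithmetic proxy.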

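According to Deci.ai's published benchmarks, YOLO-NAS delivers state-of-the-art (SOTA) performance, combining higher accuracy with lower latency than YOLOv5, YOLOv6, YOLOv7, and YOLOv8.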

Model	mAP	Latency (ms)
YOLO-NAS S	47.5	3.21
YOLO-NAS M	51.55	5.85
YOLO-NAS L	52.22	7.87
YOLO-NAS S INT-8	47.03	2.36
YOLO-NAS M INT-8	51.0	3.78
YOLO-NAS L INT-8	52.1	4.78

The table above comes from the official GitHub repository. It lists mAP on the COCO 2017 val dataset and latency measured on an Nvidia T4 GPU with 640x640 input images.
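As a quick reading aid, the sketch below computes what INT-8 quantization buys for each model size, using only the numbers from the table above. The variable and function names are my own.

```python
# (mAP, latency in ms) copied from the table above
results = {
    "S": {"base": (47.5, 3.21), "int8": (47.03, 2.36)},
    "M": {"base": (51.55, 5.85), "int8": (51.0, 3.78)},
    "L": {"base": (52.22, 7.87), "int8": (52.1, 4.78)},
}


def int8_tradeoff(base, int8):
    """Return (speedup factor, mAP drop) of the INT-8 variant vs. the base model."""
    (base_map, base_ms), (int8_map, int8_ms) = base, int8
    return base_ms / int8_ms, base_map - int8_map


for size, r in results.items():
    speedup, map_drop = int8_tradeoff(r["base"], r["int8"])
    print(f"YOLO-NAS {size}: {speedup:.2f}x faster, mAP drop {map_drop:.2f}")
```

On these figures the speedup grows with model size, from about 1.36x for S to about 1.65x for L, while the mAP drop stays at or below about 0.55 points.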

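► Implementing YOLO-NAS

You can write and run the code in Google Colab. To run it on your own computer, you need a PyTorch build that matches your Nvidia GPU; follow the steps below:

Step 1. On your own computer, install Anaconda and create an environment. If you are using Google Colab, skip to Step 3.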

conda create --name YoloNas python=3.8 -y
conda activate YoloNas

Step 2. Install PyTorch (choose the build that matches your Nvidia GPU)

Step 3. Install super-gradients

pip install super-gradients

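With these three steps, the YOLO-NAS environment is ready.

Next comes the code. The code below expects a dataset in COCO format.
First, set the dataset location and related parameters: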
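Before moving on, it can help to confirm that both packages are importable in the active environment. This is a small standard-library sketch; the helper name `missing_packages` is hypothetical. Note that the pip package `super-gradients` is imported as `super_gradients`.

```python
import importlib.util


def missing_packages(names=("torch", "super_gradients")):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("torch and super_gradients are importable")
```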

from super_gradients.training.datasets.detection_datasets.coco_format_detection import COCOFormatDetectionDataset
from super_gradients.training.transforms.transforms import DetectionMosaic, DetectionRandomAffine, DetectionHSV, \
    DetectionHorizontalFlip, DetectionPaddedRescale, DetectionStandardize, DetectionTargetsFormatTransform 
from super_gradients.training.utils.detection_utils import DetectionCollateFN, CrowdDetectionCollateFN
from super_gradients.training import dataloaders
from super_gradients.training.datasets.datasets_utils import worker_init_reset_seed


trainset = COCOFormatDetectionDataset(data_dir="./aicheckout",
                                      images_dir="train",
                                      json_annotation_file="train/_annotations.coco.json",
                                      input_dim=(640, 640),
                                      ignore_empty_annotations=False,
                                      transforms=[
                                          DetectionMosaic(prob=1., input_dim=(640, 640)),
                                          DetectionRandomAffine(degrees=0., scales=(0.5, 1.5), shear=0.,
                                                                target_size=(640, 640),
                                                                filter_box_candidates=False, border_value=128),
                                          DetectionHSV(prob=1., hgain=5, vgain=30, sgain=30),
                                          DetectionHorizontalFlip(prob=0.5),
                                          DetectionPaddedRescale(input_dim=(640, 640), max_targets=300),
                                          DetectionStandardize(max_value=255),
                                          DetectionTargetsFormatTransform(max_targets=300, input_dim=(640, 640),
                                                                          output_format="LABEL_CXCYWH")
                                      ])


valset = COCOFormatDetectionDataset(data_dir="./aicheckout",
                                    images_dir="valid",
                                    json_annotation_file="valid/_annotations.coco.json",
                                    input_dim=(640, 640),
                                    ignore_empty_annotations=False,
                                    transforms=[
                                        DetectionPaddedRescale(input_dim=(640, 640), max_targets=300),
                                        DetectionStandardize(max_value=255),
                                        DetectionTargetsFormatTransform(max_targets=300, input_dim=(640, 640),
                                                                        output_format="LABEL_CXCYWH")
                                    ])

train_loader = dataloaders.get(dataset=trainset, dataloader_params={
    "shuffle": True,
    "batch_size": 4,
    "drop_last": False,
    "pin_memory": True,
    "collate_fn": CrowdDetectionCollateFN(),
    "worker_init_fn": worker_init_reset_seed,
    "min_samples": 512,
})

valid_loader = dataloaders.get(dataset=valset, dataloader_params={
    "shuffle": False,
    "batch_size": 4,
    "num_workers": 2,
    "drop_last": False,
    "pin_memory": True,
    "collate_fn": CrowdDetectionCollateFN(),
    "worker_init_fn": worker_init_reset_seed
})
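The datasets above request targets in the `LABEL_CXCYWH` format, that is, the class label followed by the box center and size, whereas COCO annotation files store each box as [x_min, y_min, width, height]. The library handles this conversion itself; the sketch below only illustrates the arithmetic, and the function name is hypothetical.

```python
def coco_box_to_label_cxcywh(label, bbox):
    """Convert a COCO box [x_min, y_min, width, height] to (label, cx, cy, w, h)."""
    x_min, y_min, w, h = bbox
    return (label, x_min + w / 2, y_min + h / 2, w, h)


# A 40x20 box whose top-left corner is at (100, 50) is centered at (120, 60)
print(coco_box_to_label_cxcywh(3, [100, 50, 40, 20]))
```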

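Set the training parameters. Adjust max_epochs to set the maximum number of training epochs; num_classes and num_cls are the number of classes and must match your dataset (11 in this example).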

from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

train_params = {
    "warmup_initial_lr": 1e-6,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "AdamW",
    "zero_weight_decay_on_bias_and_bn": True,
    "lr_warmup_epochs": 3,
    "warmup_mode": "linear_epoch_step",
    "optimizer_params": {"weight_decay": 0.0001},
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    "max_epochs": 10,
    "mixed_precision": True,
    "loss": PPYoloELoss(use_static_assigner=False, num_classes=11, reg_max=16),
    "valid_metrics_list": [
        DetectionMetrics_050(score_thres=0.1, top_k_predictions=300, num_cls=11, normalize_targets=True,
                             post_prediction_callback=PPYoloEPostPredictionCallback(score_threshold=0.01,
                                                                                    nms_top_k=1000, max_predictions=300,
                                                                                    nms_threshold=0.7))],

    "metric_to_watch": 'mAP@0.50'}
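To see what the learning-rate settings above amount to, here is a rough per-epoch approximation: a linear warmup from `warmup_initial_lr` to `initial_lr` over `lr_warmup_epochs`, followed by cosine decay toward `initial_lr * cosine_final_lr_ratio`. This is a sketch under my own assumptions; SuperGradients updates the rate inside its own callbacks, and its exact formula may differ.

```python
import math


def lr_at_epoch(epoch, max_epochs=10, warmup_epochs=3,
                warmup_initial_lr=1e-6, initial_lr=5e-4, final_lr_ratio=0.1):
    """Approximate per-epoch learning rate: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_initial_lr up to initial_lr
        fraction = epoch / warmup_epochs
        return warmup_initial_lr + fraction * (initial_lr - warmup_initial_lr)
    # Cosine decay from initial_lr down to initial_lr * final_lr_ratio
    progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    final_lr = initial_lr * final_lr_ratio
    return final_lr + (initial_lr - final_lr) * cosine


for epoch in range(11):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

With the values used in this article, the rate starts at 1e-6, peaks at 5e-4 when warmup ends, and approaches 5e-5 by the end of training.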

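Start training the model. num_classes is the number of classes and must match your dataset, pretrained_weights selects which pre-trained weights to start from, and the Models entry selects which model size to train (YOLO-NAS-S, YOLO-NAS-M, or YOLO-NAS-L). Running the code starts training.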

from super_gradients.training import Trainer
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.processing import ComposeProcessing

net = models.get(Models.YOLO_NAS_S, num_classes=11, pretrained_weights="coco")
trainer = Trainer(experiment_name="AICHECKOUT", ckpt_root_dir="./checkpoints/")
trainer.train(model=net, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)
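The value 11 passed as num_classes and num_cls has to agree with the dataset. One way to check it is to read the `categories` list from the COCO-format annotation file. The sketch below is illustrative: the helper name is hypothetical, and the demo writes a tiny made-up annotation file so that it runs anywhere. Look at the printed names rather than trusting the count alone, since an export may contain categories you do not intend to train on.

```python
import json
import os
import tempfile


def coco_category_names(annotation_path):
    """Return the category names in a COCO-format annotation file, ordered by id."""
    with open(annotation_path, encoding="utf-8") as f:
        coco = json.load(f)
    categories = sorted(coco.get("categories", []), key=lambda c: c["id"])
    return [c["name"] for c in categories]


# Demo with a tiny made-up annotation file; in practice point the function at
# your own file, e.g. ./aicheckout/train/_annotations.coco.json
sample = {"images": [], "annotations": [],
          "categories": [{"id": 1, "name": "cola"}, {"id": 0, "name": "chips"}]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "_annotations.coco.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    names = coco_category_names(path)

print(len(names), names)
```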

Run a test image through the model to check the results:

import os
net = models.get(Models.YOLO_NAS_S, num_classes=11, checkpoint_path=os.path.join(trainer.checkpoints_dir_path, "ckpt_best.pth"))
prediction = net.predict("test/test.jpg", fuse_model=False)
prediction.show()
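When judging predictions, it helps to know what the thresholds used earlier measure. The watched metric, mAP@0.50, counts a prediction as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5, and `nms_threshold=0.7` is likewise an IoU threshold. Below is a minimal IoU sketch for boxes in corner format; the function name is my own and this is not the library's implementation.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping by half: IoU = 50 / 150, below the 0.5 cutoff
print(iou_xyxy((0, 0, 10, 10), (5, 0, 15, 10)))
```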

Export to ONNX; from there you can convert the model to other formats yourself:

import torch
net.eval()
net.prep_model_for_conversion(input_size=[1, 3, 320, 320])
dummy_input = torch.randn([1, 3, 320, 320], device="cpu")
torch.onnx.export(net, dummy_input, "yolo_nas_s-sg.onnx", opset_version=11)

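► Summary

YOLO-NAS extends the YOLO family of models using NAS technology and offers excellent performance and efficiency in object detection. You can use the code here as a starting point for training your own model. I hope this post was helpful. Thanks for reading!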

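► Q&A

Q: What is the full name of YOLO-NAS?
A: The full name of YOLO-NAS is You Only Look Once-Neural Architecture Search.

Q: What is YOLO-NAS?
A: YOLO-NAS is an object detection method based on neural architecture search (NAS). It automatically designs efficient and accurate neural networks for real-time object detection tasks.

Q: What are the advantages of YOLO-NAS?
A: YOLO-NAS can find the object detection model best suited to a given task and set of resource constraints. It saves the time and cost of designing models by hand and improves model performance and generalization.

Q: Which scenarios is YOLO-NAS suited to?
A: Scenarios that require fast and accurate object detection, such as security surveillance, medical imaging, and face recognition.

Q: How much time and how many resources does YOLO-NAS need?
A: The time and resources consumed by the architecture search behind YOLO-NAS depend on the size of the search space, the complexity of the evaluation function, and the stopping condition. In general, it takes anywhere from a few hours to a few days, and one or more GPUs.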
