How to capture only the new objects from the last frame in a variable in OpenCV object detection

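My problem is that I want my program to say only the new objects that appear in front of the camera, instead of repeating everything it sees over and over. For example, when it sees a person it should say "person" only once, and when something new such as a cup enters the frame it should say "cup" only once. I was told that to do this I should capture the objects from the last frame in a variable, so that only the new ones get spoken. But I am not clear on how to do that in code. I am using opencv, yolov3 and pyttsx3.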

import cv2
import numpy as np
import pyttsx3


net = cv2.dnn.readNet('yolov3-tiny.weights', 'yolov3-tiny.cfg')

classes = []
with open("coco.names.txt", "r") as f:
   classes = f.read().splitlines()

cap = cv2.VideoCapture(0)
font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(100, 3))

while True:
    _, img = cap.read()
    height, width, _ = img.shape

    blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0,0,0), swapRB=True, crop=False)
    net.setInput(blob)
    output_layers_names = net.getUnconnectedOutLayersNames()
    layerOutputs = net.forward(output_layers_names)

    boxes = []
    confidences = []
    class_ids = []

    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.2:
                center_x = int(detection[0]*width)
                center_y = int(detection[1]*height)
                w = int(detection[2]*width)
                h = int(detection[3]*height)

                x = int(center_x - w/2)
                y = int(center_y - h/2)

                boxes.append([x, y, w, h])
                confidences.append((float(confidence)))
                class_ids.append(class_id)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.2, 0.4)

    if len(indexes)>0:
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = str(round(confidences[i],2))
            color = colors[i]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            cv2.putText(img, label + " " + confidence, (x, y + 20), font, 2, (255,255,255), 2)
            engine = pyttsx3.init()
            engine.say(label)
            engine.runAndWait()

    cv2.imshow('Image', img)
    key = cv2.waitKey(1)
    if key==27:
        break
cap.release()
cv2.destroyAllWindows()

Answer:

There are entire PhD theses about this, and not just a few. It is almost a whole research field in itself.

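You need to track objects. YOLO processes images, not sequences of images. From YOLO's point of view, there is no relationship at all between what happens in one frame and what happens in the next. The frames could be random, unrelated images picked from a photo collection, and it would make no difference.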
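Since YOLO keeps no memory between frames, any memory has to live in your own code. The simplest form, which is roughly what the asker was advised to do, is to keep the previous frame's labels in a variable and speak only the difference. The following is a minimal sketch under that assumption; `newly_appeared` and the simulated `frames` list are hypothetical names for illustration, not part of OpenCV or pyttsx3.

```python
def newly_appeared(previous_labels, current_labels):
    """Return labels present in the current frame but absent from the previous one."""
    return set(current_labels) - set(previous_labels)


# Simulated per-frame label lists, standing in for the labels YOLO returns.
frames = [
    ["person"],
    ["person"],
    ["person", "cup"],
    ["person", "cup"],
]

previous = set()   # labels seen in the last frame
spoken = []        # what would have been said aloud
for labels in frames:
    for label in sorted(newly_appeared(previous, labels)):
        spoken.append(label)  # in the real loop: engine.say(label)
    previous = set(labels)

print(spoken)  # ['person', 'cup']
```

In the question's loop, the per-frame list would be built from the detections kept after NMS, for example `[classes[class_ids[i]] for i in indexes.flatten()]`. This only works while every object is detected in every frame, which is rarely the case in practice.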

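So you need a second stage after YOLO, a much more complex one, to track the objects. And since YOLO is not perfectly reliable (nor is any other CNN detector), that second stage cannot simply update a dictionary of detected objects. Some objects will only be detected in half of the frames, and you need to keep tracking them anyway. Others (for example, two pedestrians walking side by side) may get confused with each other, because pedestrian B in frame t+1 is at the same position as pedestrian A was in frame t. Yet you still want to track each of them.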
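The dropout problem described above can be softened with a grace period: a label is only forgotten after it has been missing for several consecutive frames. This is a rough sketch, not real tracking. It works per class name rather than per object, so it cannot tell two pedestrians apart. `LabelMemory` and `max_missing` are hypothetical names chosen for this example.

```python
class LabelMemory:
    """Remember labels for a few frames so brief detection dropouts are ignored."""

    def __init__(self, max_missing=5):
        self.max_missing = max_missing
        self.missing = {}  # label -> consecutive frames without a detection

    def update(self, current_labels):
        """Feed one frame's labels; return the labels that count as new."""
        current = set(current_labels)
        new = sorted(label for label in current if label not in self.missing)
        for label in current:
            self.missing[label] = 0
        # Age the labels that were not seen and forget the stale ones.
        for label in list(self.missing):
            if label not in current:
                self.missing[label] += 1
                if self.missing[label] > self.max_missing:
                    del self.missing[label]
        return new


memory = LabelMemory(max_missing=2)
frames = [["person"], [], ["person"], ["person", "cup"], [], [], [], ["person"]]
spoken = [label for labels in frames for label in memory.update(labels)]
print(spoken)  # ['person', 'cup', 'person']
```

The one-frame dropout of "person" is ignored, while the three-frame gap exceeds the grace period, so "person" is announced again when it comes back.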

I can't just give you some code as an answer. This is much bigger than what fits into a Q&A like this one.

But I can point you to an algorithm that is commonly combined with YOLO to track objects, namely SORT. One implementation of it is this.
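To give a feel for what the tracking stage does, here is a deliberately simplified stand-in. It is not SORT: SORT predicts box motion with a Kalman filter and solves the matching with the Hungarian algorithm, whereas this sketch only matches boxes greedily by IoU overlap. `GreedyIouTracker`, `iou_threshold` and `max_missing` are hypothetical names, and the boxes use the same `(x, y, w, h)` layout as the question's code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    def __init__(self, iou_threshold=0.3, max_missing=5):
        self.iou_threshold = iou_threshold
        self.max_missing = max_missing
        self.tracks = {}  # track id -> {"label", "box", "missing"}
        self.next_id = 0

    def update(self, detections):
        """detections: list of (label, (x, y, w, h)). Returns labels of new tracks."""
        # Score every same-label (track, detection) pair with enough overlap.
        pairs = []
        for tid, track in self.tracks.items():
            for di, (label, box) in enumerate(detections):
                if label == track["label"]:
                    score = iou(track["box"], box)
                    if score >= self.iou_threshold:
                        pairs.append((score, tid, di))
        pairs.sort(reverse=True)  # best overlap first

        matched_tracks, matched_dets = set(), set()
        for _, tid, di in pairs:
            if tid in matched_tracks or di in matched_dets:
                continue
            matched_tracks.add(tid)
            matched_dets.add(di)
            self.tracks[tid]["box"] = detections[di][1]
            self.tracks[tid]["missing"] = 0

        # Age unmatched tracks and drop those missing for too long.
        for tid in list(self.tracks):
            if tid not in matched_tracks:
                self.tracks[tid]["missing"] += 1
                if self.tracks[tid]["missing"] > self.max_missing:
                    del self.tracks[tid]

        # Every unmatched detection starts a new track and gets announced.
        new_labels = []
        for di, (label, box) in enumerate(detections):
            if di not in matched_dets:
                self.tracks[self.next_id] = {"label": label, "box": box, "missing": 0}
                self.next_id += 1
                new_labels.append(label)
        return new_labels


tracker = GreedyIouTracker()
print(tracker.update([("person", (100, 100, 50, 100))]))  # ['person']
print(tracker.update([("person", (105, 100, 50, 100))]))  # []
print(tracker.update([("person", (110, 100, 50, 100)),
                      ("person", (400, 100, 50, 100))]))  # ['person']
```

Unlike the label-only approaches, a second person entering the frame is announced even though a "person" is already being tracked. Greedy matching can still swap identities when objects cross, which is one of the cases SORT's motion model is meant to handle better.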

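You can easily find many write-ups on how to combine YOLO + SORT. Like this one (a recent one. YOLO + SORT was already the classic approach before 2020, so I don't know what this particular 2022 article adds. Maybe nothing; not every article is new, even if it should be. I just picked one at random.)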

Or, since people nowadays often prefer videos to written documents, you can watch this tutorial.

But, well, however you learn this, you will find that it takes more than just asking on SO.

