Real Time Custom Object Detection

A guide to create a Custom Object Detector — Part 2

Jun 10 · 8 min read

Photo by Jacek Dylag on Unsplash

In this article we will test the Custom trained Darknet model from my previous article

Citations: The video output feed is available on YouTube by Bloomberg Quicktake. The image of a window is a screenshot of my personal computer. The output image feed is taken from an open source dataset on Kaggle. Huge thanks to Shauryasikt Jena.

In my last article, we saw how to create a custom mask detector using darknet. It would be more fun to see it in action, wouldn't it ;)

So let’s make it work. And yeah, the steps are way easier than the ones for training the model, because if you have followed my previous article you have already installed the required libraries (Phew!).

If you haven’t, keep calm :) you can check out everything in detail in that article.

Here’s my previous article —

This entire code is executed using a CPU. If you are writing the video output to a file, you don’t need a GPU: the video is written according to your preferred frames-per-second value. For writing a video file, check out step 9.

Contents

  1. Importing Libraries
  2. Getting the generated files from training
  3. Reading the net
  4. Some Preprocessing of input images
  5. Confidence scores, ClassId, Coordinates of Bounding Boxes
  6. Non Maximum Suppression (NMS)
  7. Drawing bounding boxes
  8. Usage
  9. Writing to a file (optional)

Okay… let’s make it work! Please go through the entire article so that you don’t miss out anything. Thanks :)

1. Importing Libraries

Please import these libraries.

import tensorflow as tf
import numpy as np
import cv2
import pandas as pd
import time
import os
import matplotlib.pyplot as plt
from PIL import Image

Note: You also need ffmpeg==4.2.2+ to write the video output file. Please go through my previous article if you’re having any issues. Once you have ffmpeg, make sure you run everything in the same anaconda environment in which you installed it.

That’s all you need, let’s go to the important next step!

2. Getting the generated files from training

This is a very crucial step for our object detector to roll. I am listing the required files below; ensure you have all of them.

  1. Custom .names file
  2. Custom .cfg file

Note: We created these files just before our training, so if you are missing any one of them, your model will give you a hard time. These two files are very specific to your custom object detector; my previous article will guide you through the changes to be made. You can chill out! It takes just a minute to create these files if you have followed every detail :)

  3. Custom .weights file

Okay… let’s pause here for a minute to understand exactly how you get it.

This is known as the weights file; it is generally a large file whose size depends on your training setup (for me it was 256 MB). You get this file when your training has completed. In my case, the file I used was yolov3_custom_train_3000.weights. Here, ‘3000’ means that the file was generated after 3000 training iterations. If you have gone through the .cfg file, you’ll find max_batches set to 6000.

So why didn’t I go with ‘yolov3_custom_train_6000.weights’?

Because after some testing I found that the weights file generated after 3000 iterations actually had the best accuracy of every weights file generated, not just the ‘6000’ one.

Okay. So more training should mean more accuracy, right?

Not really :(

More training can also mean overfitting, which can drastically reduce accuracy. My training data might have had some duplicate images, or I might have labelled some of them incorrectly (yeah I know.. it was a tedious task, so uh.. you know how the mind deviates, right), which had a direct impact on accuracy.

3. Reading the net

Now.. the testing part starts. I will try my best to make it easy and simple to follow and, obviously, to understand side by side :)

Luckily, cv2 has a built-in function.

# Read the network architecture (.cfg) and the trained weights from disk
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
# Collect the names of the YOLO output layers for the forward pass
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

These beautiful functions make our day way easier by directly reading the network model stored in the Darknet model files and setting it up for our detector code (Yaaasss!!).
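One caveat from my side (an addition, not part of the original flow): newer OpenCV releases changed getUnconnectedOutLayers() to return a flat array of integer indices instead of Nx1 arrays, which breaks the list comprehension above. A version-agnostic sketch:

ln = net.getLayerNames()
try:
    # Older OpenCV: getUnconnectedOutLayers() returns Nx1 arrays
    ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
except IndexError:
    # Newer OpenCV: it returns a flat array of integer indices
    ln = [ln[i - 1] for i in net.getUnconnectedOutLayers()]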

For more info on the function —

You also need to get the labels from the ‘yolo.names’ file:

LABELS = open(labelsPath).read().strip().split("\n")

Note: configPath, weightsPath and labelsPath contain the paths to the respective files.
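For example (the .cfg file name below is my placeholder; the other two names appear earlier in this article, so adjust all three to wherever you saved your files):

labelsPath = 'yolo.names'                         # custom .names file
configPath = 'yolov3_custom_train.cfg'            # custom .cfg file
weightsPath = 'yolov3_custom_train_3000.weights'  # trained weights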

4. Some Preprocessing of input images

These are some steps we need so that our model gets preprocessed images. blobFromImage can perform mean subtraction and scaling; in our call we only scale the pixel values by 1/255 and resize the image to 416×416 (no mean is passed).

(H, W) = image.shape[:2]
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
       swapRB=True, crop=False)
net.setInput(blob)
layerOutputs = net.forward(ln)

# Initializing for getting box coordinates, confidences, classid
boxes = []
confidences = []
classIDs = []
threshold = 0.15

5. Getting some confidence

Yeah… literally, after this step we will have some confidence in our code and a better understanding of what we have done and of what we are going to do next.

for output in layerOutputs:
    for detection in output:
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        if confidence > threshold:
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")           
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))    
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)

So what exactly is this code doing?

layerOutputs contains a huge array of floats, from which we need the coordinates of our “to be drawn” bounding boxes, the classID, and the confidence score of each prediction, or we can say each detection :)
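To make that layout concrete, here is how each detection vector is organised (this matches the indexing used in the loop above; the shape shown is just an example for a 416×416 input):

# Each detection in an output array is one candidate box, laid out as:
#   detection[0:4] -> box centre x, centre y, width, height (normalised 0..1)
#   detection[4]   -> objectness score
#   detection[5:]  -> one confidence score per class in your .names file
# That is why the loop reads scores = detection[5:] and multiplies
# detection[0:4] by [W, H, W, H] to recover pixel coordinates.
print(len(layerOutputs))      # 3 output scales for yolov3
print(layerOutputs[0].shape)  # e.g. (507, 5 + number_of_classes)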

6. Non Maximum Suppression (NMS)

Oh yeah.. this step gave me a hard time initially when I was not providing the correct input data type to it. I also tried some pre-written NMS functions, but my object detection was so slow…

Photo by Nick Abrams on Unsplash

After hitting my head for some time (not literally..), I was able to get the correct input datatype by writing the code given in the previous step for this super-fast life-saving function.

# boxes and confidences come from step 5; 0.1 is the NMS overlap threshold
idxs = cv2.dnn.NMSBoxes(boxes, confidences, threshold, 0.1)

You can find some info here —

So what is NMS?

The model returns more than one prediction for the same object, hence more than one box ends up around a single object. We surely don’t want that. Thanks to NMS, we get back the single best bounding box for that object.

To get a deep understanding of NMS and how it works —
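In the meantime, here is a toy greedy NMS in plain numpy, just to illustrate the idea behind cv2.dnn.NMSBoxes (a simplified sketch, not the OpenCV implementation):

import numpy as np

def iou(a, b):
    # a and b are [x, y, w, h] boxes; returns intersection-over-union
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, confidences, nms_threshold):
    # Keep the highest-scoring box, drop boxes that overlap it too much,
    # then repeat with whatever is left
    order = list(np.argsort(confidences)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order
                 if iou(boxes[best], boxes[j]) < nms_threshold]
    return keep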

7. Drawing the Bounding Boxes

Aahhaa.. the interesting part. Let’s get our detector running now!

mc = 0
nmc = 0

if len(idxs) > 0:
    for i in idxs.flatten():
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])

        if LABELS[classIDs[i]] == 'OBJECT_NAME_1':
            mc += 1
            color = (0, 255, 0)
            cv2.rectangle(image, (x, y), (x + w, y + h), color, 1)
            text = "{}".format(LABELS[classIDs[i]])
            cv2.putText(image, text, (x + w, y + h),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)

        if LABELS[classIDs[i]] == 'OBJECT_NAME_2':
            nmc += 1
            color = (0, 0, 255)
            cv2.rectangle(image, (x, y), (x + w, y + h), color, 1)
            text = "{}".format(LABELS[classIDs[i]])
            cv2.putText(image, text, (x + w, y + h),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)

text1 = "No. of people wearing masks: " + str(mc)
text2 = "No. of people not wearing masks: " + str(nmc)
color1 = (0, 255, 0)
color2 = (0, 0, 255)

cv2.putText(image, text1, (2, 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color1, 2)
cv2.putText(image, text2, (2, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color2, 2)

Done!! This code will give you an image/frame containing your bounding boxes.

Note: Be sure to change OBJECT_NAME_1 and OBJECT_NAME_2 according to your object names. This will make your code easier to understand ;)

Tip: I would recommend creating a function to which you pass an image, because later you can use that function for video input as well as for a single image ;)
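Here is a minimal sketch of such a wrapper, stitched together from steps 4-7 (the name detector is my own choice, not a fixed API, and it assumes net, ln, LABELS and threshold are already defined as above):

def detector(image):
    # Step 4: preprocess the frame into a blob and run the forward pass
    (H, W) = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)

    # Step 5: collect boxes, confidences and class ids
    boxes, confidences, classIDs = [], [], []
    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            if confidence > threshold:
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    # Step 6: non maximum suppression
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, threshold, 0.1)

    # Step 7: draw the boxes and counters exactly as shown above
    # ...
    return image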

8. Usage

The above code can be used in two ways —

  1. Real time, that is, passing a video to the detector

This can be done by just reading each frame from a video. You can also resize the frames if you want, so that your ‘cv2.imshow’ displays the output at a quicker rate, i.e. more frames per second. To read a video using cv2 —
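A minimal sketch of that loop (the file name and resize dimensions are placeholders, and detector is the wrapper function from the tip in step 7):

cap = cv2.VideoCapture('input_video.mp4')  # or 0 for a webcam feed
while True:
    grabbed, frame = cap.read()
    if not grabbed:                         # end of the video
        break
    frame = cv2.resize(frame, (640, 360))   # optional, for faster display
    frame = detector(frame)
    cv2.imshow('output', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()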

Note: You don’t need to convert the frames obtained to grey-scale.

Now just pass the frame to the function (mentioned in the tip) and boom.. you have your real time object detector ready!

Output Video —

Video output written at 20 fps

Note: The above video output is smooth because I saved the frames by writing them to a .mp4 file at 20 frames per second (fps).

2. Image

You can also test your object detector by just passing a single image (yeah.. less fun). To read an image using cv2 —
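A minimal sketch (the file name is a placeholder, and detector is again the wrapper function from step 7):

image = cv2.imread('test_image.jpg')
image = detector(image)
cv2.imshow('output', image)
cv2.waitKey(0)
cv2.destroyAllWindows()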

Output Image —

Single Image output of object detector

9. Writing to a file (optional)

You might be wondering how I got the video output so smooth, right? Here’s a trick you can use to get your smooth video output…

OpenCV has a function called cv2.VideoWriter(). You can write your frames by specifying the file name, codec id, fps, and the same resolution as your input feed.

Define the variable out outside the while loop in which you are reading each frame of the video:

out = cv2.VideoWriter('file_name.mp4', -1, fps,
                      (int(cap.get(3)), int(cap.get(4))))

Note: The second parameter ‘-1’ is the codec id to be given; it worked fine for me on my computer, but the codec id can be different on your computer. Please visit this site for debugging —
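If ‘-1’ doesn’t work for you, one alternative is to pass an explicit FourCC code (whether a given codec is available depends on your OpenCV/ffmpeg build, so treat this as a sketch):

fps = 20
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # a common codec for .mp4 files
out = cv2.VideoWriter('file_name.mp4', fourcc, fps,
                      (int(cap.get(3)), int(cap.get(4))))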

The last parameter will help you to get the resolution of your input video. After this, put the code below in the while loop where your detector function is being called.

while True:
  ....
  ....  
  image = detector(frame) 
  out.write(image)
  ....
  ....

Note: Your detector function should return an image.

Tip: You can also use ‘moviepy’ to write your frames into video…
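A sketch of the moviepy route (it assumes you collected your processed frames in a Python list called frames, which is my assumption, not code from this article; note that moviepy expects RGB while OpenCV gives BGR):

from moviepy.editor import ImageSequenceClip

# Convert BGR (OpenCV) frames to RGB before handing them to moviepy
rgb_frames = [cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in frames]
clip = ImageSequenceClip(rgb_frames, fps=20)
clip.write_videofile('file_name.mp4')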

Inference time graph from pjreddie darknet

So that’s it! I hope you have your own custom object detector by now. Cheers!

Thank you for going through the entire article; I hope you found it informative. If you have any feedback, it is most welcome!
