For many ML projects, measuring accuracy is fairly straightforward. With object detection, though, there are a lot of nuances once you start to think about it. Here is a quick overview of what goes into the calculations and what we are reporting. There are many good articles that go into the subject in more depth if you are interested.
Here are some terms to start with:
GroundTruth - An image that was labeled by hand. As far as the neural network is concerned, when it is evaluated against an image like this, the hand labels are the truth.
Predicted objects - The objects the trained neural network detects in an image. Usually this is a GroundTruth image given to the network without its bounding boxes, so that the predictions can be compared against the hand labels to see how the network is doing.
Epoch - When you provide a dataset of labeled images for training, we use 70% of the images as input to the neural network for training and hold back the other 30% for validating the trained neural network. One Epoch is one complete pass through all of the training images. A neural network gets better as it goes through many cycles of training and validating.
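As a rough illustration, the 70/30 split can be sketched in Python, assuming the labeled images are just a list of file paths. The function name, the seed, and the file names are hypothetical, and the training tool you use may handle the split itself.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(image_paths: Sequence[str], train_fraction: float = 0.7,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the labeled images, then split them into training and validation sets."""
    shuffled = list(image_paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names for 10 labeled images.
train_images, val_images = split_dataset([f"image_{i:03d}.jpg" for i in range(10)])
```

With 10 images this puts 7 in the training set and 3 in the validation set. Shuffling before splitting helps avoid putting all the frames from one part of a video into only one of the two sets.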
We have set up our training to calculate the accuracy information every 100 Epochs.
Now for some of the nuances in looking at the accuracy. Let’s say we have a GroundTruth image with 5 cargo and 4 hatches labeled on it. To start with, let’s just look at cargo. The system compares the hand-labeled bounding boxes on the GroundTruth image with the predicted bounding boxes. The prediction shows 4 cargo: 3 are actual cargo and 1 is a person in the stands wearing an orange shirt.
There are two interesting numbers here. The first is Recall: out of the 5 true cargo, the network recognized 3, so the Recall is 3/5. The second is Precision: out of the 4 recognized cargo, 3 were correct, so the Precision is 3/4. These two numbers often turn out to be inversely related. When recall goes up, precision may go down because there are more false positives, and when recall goes down, precision may go up because there are fewer false positives.
To represent these two numbers together, another number is calculated, called the Average Precision (AP). This is done by measuring the precision at each level of recall to create a Precision-Recall curve, and then summarizing the area under that curve as a single number. The calculation for this is very interesting and worth looking up if you are interested. In our case, all of this is done every 100 Epochs. It is done separately for each of the classes, first Cargo and then Hatches, and the mean of the per-class AP values is the mean Average Precision, or mAP.
This gives us a pretty good feeling for the accuracy, but there is one more consideration: location. If you take the GroundTruth bounding box for an object and the predicted bounding box and overlap them, you want them to overlap enough. The measure used for this is called Intersection over Union (IoU): the area where the two boxes overlap divided by the total area covered by both boxes together. If the IoU is too low, the predicted object is not accurate enough to be considered a match. Different accuracy checks use different IoU thresholds, but we are going to stick with an IoU greater than 50%.
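The pieces above (IoU, matching, Precision, Recall, AP, and mAP) can be sketched in plain Python. This is a simplified illustration under several assumptions: boxes are hypothetical `(x_min, y_min, x_max, y_max)` tuples, predictions are assumed to already be sorted from most to least confident, matching is greedy, and AP uses all-point interpolation of the Precision-Recall curve. Established evaluators such as the Pascal VOC and COCO tools differ in details, and all function names and box coordinates here are made up to mirror the 5-cargo, 4-prediction example.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union: overlap area divided by the area covered by both boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_predictions(preds: List[Box], truths: List[Box],
                      iou_threshold: float = 0.5) -> List[bool]:
    """Flag each prediction as a true positive or not.

    Assumes preds are sorted from most to least confident. Each GroundTruth
    box can be claimed by at most one prediction.
    """
    claimed = set()
    flags = []
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, t in enumerate(truths):
            score = iou(p, t)
            if j not in claimed and score > best_iou:
                best_iou, best_j = score, j
        is_match = best_j is not None and best_iou > iou_threshold
        if is_match:
            claimed.add(best_j)
        flags.append(is_match)
    return flags


def precision_recall(flags: List[bool], num_truths: int) -> Tuple[float, float]:
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0  # correct / predicted
    recall = tp / num_truths if num_truths else 0.0  # correct / labeled
    return precision, recall


def average_precision(flags: List[bool], num_truths: int) -> float:
    """Area under the Precision-Recall curve (all-point interpolation)."""
    if num_truths == 0 or not flags:
        return 0.0
    points, tp = [], 0
    for i, hit in enumerate(flags, start=1):
        tp += hit
        points.append((tp / num_truths, tp / i))  # (recall, precision) after i predictions
    ap, prev_recall = 0.0, 0.0
    for k, (recall, _) in enumerate(points):
        # Interpolated precision: the best precision at this recall or any higher recall.
        best_precision = max(p for _, p in points[k:])
        ap += (recall - prev_recall) * best_precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: the plain average of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Made-up boxes mirroring the cargo example: 5 labeled cargo and 4 predictions,
# where the third prediction is the person in the orange shirt.
truths = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10), (80, 0, 90, 10)]
preds = [(1, 1, 11, 11), (21, 0, 31, 10), (200, 200, 220, 240), (40, 1, 50, 11)]

flags = match_predictions(preds, truths)
precision, recall = precision_recall(flags, len(truths))
cargo_ap = average_precision(flags, len(truths))
```

With these made-up boxes, Precision comes out to 0.75 and Recall to 0.6, matching the 3/4 and 3/5 worked out above, and the AP for this hypothetical confidence order is about 0.55. To get the mAP you would pass one AP for Cargo and one for Hatches to `mean_average_precision`.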
So now if we look at the mAP for an IoU greater than 50% (often written as mAP@0.5), it gives us a level of accuracy that we can understand. When training a network, you want this number to go up over the number of Epochs, but not too fast. Many times there will be an early peak in this number, which can be due to what is called overfitting: the network is just matching the training images, but won’t work as well on the general case of the video camera on your robot. So you want to train until this number stops going up when looking at a smoothed curve. If the accuracy never gets very high, you may need to change some of your hyperparameters, like the number of Epochs or the Batch size. Or you may need more labeled, or better labeled, images in your dataset. This tuning is part of the “magic” of machine learning. Over time you certainly get a better intuition for how to tune a network, but at this time there is no direct rule that says when you see this type of curve, change this parameter to improve it.
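The “train until it stops going up on a smoothed curve” check can be sketched in Python, assuming the mAP values from each 100-Epoch evaluation are collected into a list. The exponential moving average and the `weight`, `patience`, and `min_delta` values are illustrative choices, not settings from our training setup; the smoothing is similar in spirit to the smoothing slider in TensorBoard.

```python
from typing import List


def smooth(values: List[float], weight: float = 0.6) -> List[float]:
    """Exponential moving average: blend each new value with the running smoothed value."""
    smoothed: List[float] = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed


def has_plateaued(map_history: List[float], patience: int = 3,
                  min_delta: float = 0.005, weight: float = 0.6) -> bool:
    """True if the smoothed mAP has not beaten its earlier best by min_delta
    in the last `patience` checkpoints (one checkpoint per 100 Epochs here)."""
    s = smooth(map_history, weight)
    if len(s) <= patience:
        return False  # not enough history to judge yet
    best_before = max(s[:-patience])
    return max(s[-patience:]) < best_before + min_delta
```

You would call `has_plateaued` after each evaluation with all of the mAP values so far; once it returns `True`, the smoothed curve has stopped going up and more training may not help much. A larger `weight` smooths out more of the noise but reacts more slowly to real changes.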