EfficientDet Implementation for Object Detection

I have been interested in machine learning for years but never got around to practicing it. I finally decided to start training myself, to gain insight into how data is used and to be able to build models on my own. I found an interesting image competition and started with it. The competition has already ended, but the data is still available and I can still submit predictions and get a score.

The tasks are straightforward: object detection and classification. EfficientDet is a popular model that handles both tasks, so I decided to build a model with it. However, the implementation was extremely hard. Some errors could only be fixed by editing the imported packages, which I couldn't manage in a Kaggle notebook. Therefore, I restarted the EfficientDet implementation with a simple dataset.

A useful example I found is a blog post written in Japanese in October 2021. The task is to detect a red circle on a black square background. Most of the source code in the post worked, but I ran into some errors when I tested it in April 2023. Below are my notes on the errors, their remedies, and the accuracy of the result.

# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9f3114f08672> in <cell line: 19>()
     33     targets['cls'] = targets['cls']
     34     optimizer.zero_grad()
---> 35     losses = bench(inputs, targets)
     36     loss = losses['loss']
     37     loss.backward()
/usr/local/lib/python3.9/dist-packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    396                     cls_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    397                 box_targets_out[level_idx].append(
--> 398                     box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    399                 count += steps
    400                 if last_sample:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
# Line 398, before correction
# box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
# After correction
box_targets[count:count + steps].reshape([feat_size[0], feat_size[1], -1]))
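
The difference between the two calls can be reproduced outside effdet. The following is a minimal sketch, assuming only that PyTorch is installed. The transposed tensor and the `try_view` helper are illustrative stand-ins, not the actual `box_targets` from `anchors.py`.

```python
import torch

# A transposed tensor shares memory with the original but is not contiguous.
base = torch.arange(12).reshape(3, 4)
non_contiguous = base.t()  # shape (4, 3), strides no longer row-major


def try_view(t, shape):
    """Return t.view(shape), or None if view() cannot reinterpret the memory."""
    try:
        return t.view(shape)
    except RuntimeError:
        return None


# view() never copies, so it fails when the memory layout is incompatible.
viewed = try_view(non_contiguous, [-1])

# reshape() returns a view when possible and falls back to a copy otherwise.
reshaped = non_contiguous.reshape([-1])

print(non_contiguous.is_contiguous(), viewed is None, reshaped.shape)
```

Calling `.contiguous()` before `.view(...)` would probably work as well; `.reshape(...)` is simply the change the error message itself suggests.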
The dataset applies augmentation before returning each sample. The first augmentation step randomly crops the input image, which sometimes removes the bounding boxes and labels when a box falls outside the cropped region. The sample code handles this case, but it only defines a replacement bounding box and not a replacement label, which causes the error below.

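class CircleDataset(Dataset):
  def __getitem__(self, idx):
    # ... (earlier part of the method omitted)
    if bboxes.shape[0] == 0:
      # Only the boxes get a placeholder here; the labels stay empty
      bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    # ... (rest of the method omitted)
    return x, y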
# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-9f3114f08672> in <cell line: 19>()
     28   t = tqdm(loader, leave=False)
---> 30   for inputs, targets in t:
     31     inputs = inputs
     32     targets['bbox'] = targets['bbox']
/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/collate.py in collate_tensor_fn(batch, collate_fn_map)
    161         storage = elem.storage()._new_shared(numel, device=elem.device)
    162         out = elem.new(storage).resize_(len(batch), *list(elem.size()))
--> 163     return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [0] at entry 0 and [1] at entry 1
# After correction
if bboxes.shape[0] == 0:
    bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    labels = torch.FloatTensor(np.array([0])) # Added
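
The collate error can be reproduced in isolation. This is a small sketch, assuming only PyTorch; the tensors `labels_cropped_out` and `labels_normal` and the helper `can_stack` are hypothetical names standing in for the per-sample labels that the DataLoader tries to batch.

```python
import torch

# Hypothetical labels for two samples: the first lost its box to cropping.
labels_cropped_out = torch.zeros([0])
labels_normal = torch.ones([1])


def can_stack(tensors):
    """Report whether torch.stack accepts the given per-sample tensors."""
    try:
        torch.stack(tensors, 0)
        return True
    except RuntimeError:
        return False


# Sizes [0] and [1] differ, so stacking fails as in the traceback.
before = can_stack([labels_cropped_out, labels_normal])

# Same idea as the correction: give the empty sample one placeholder label.
if labels_cropped_out.shape[0] == 0:
    labels_cropped_out = torch.zeros([1])
after = can_stack([labels_cropped_out, labels_normal])

print(before, after)
```

The placeholder keeps every sample's label tensor the same length as its box tensor, which is what the default collate function needs in order to stack them.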
I obtained a prediction by randomly taking one image from the training set and feeding it into the trained model.
Prediction uses DetBenchPredict from the effdet package. A single image has shape (3, 512, 512), while DetBenchPredict takes a batch as its input, so I added a batch dimension with the 'unsqueeze' function.
For each image in the batch, DetBenchPredict outputs an (N, 6) tensor. N is the number of predicted bounding boxes, and the six elements are:
x-coordinate of the bounding box's top-left corner
y-coordinate of the bounding box's top-left corner
x-coordinate of the bounding box's bottom-right corner
y-coordinate of the bounding box's bottom-right corner
confidence score of the detection
predicted class
The code is below. Bounding boxes are drawn if the score is over 50%.

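import torch
import matplotlib.pyplot as pp
import matplotlib.patches as patches
from effdet import DetBenchPredict

image, targets = dataset[0]
image = image.unsqueeze(0)  # (3, 512, 512) -> (1, 3, 512, 512)
bench = DetBenchPredict(model)
bench.eval()  # make sure layers such as batch norm run in inference mode
with torch.no_grad():
  output = bench(image)
# Draw the predictions with a score over 50%
fig, ax = pp.subplots()
ax.imshow(image[0].permute(1, 2, 0).numpy())  # imshow expects (H, W, C)
for i in range(output.shape[1]):
  if output[0, i, 4] > 0.5:
    # Convert (x1, y1, x2, y2) to the (x, y, width, height) that Rectangle takes
    x1 = int(output[0, i, 0])
    y1 = int(output[0, i, 1])
    width = int(output[0, i, 2] - output[0, i, 0])
    height = int(output[0, i, 3] - output[0, i, 1])
    rect = patches.Rectangle((x1, y1), width, height, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    print(output[0,i,:])
pp.show()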
Output of print(output[0, i, :]):
tensor([ 14.0453, 114.7553,  26.5884, 158.7972,   0.6781,   1.0000])
tensor([144.7045, 129.4016, 182.4770, 259.8239,   0.6156,   1.0000])
tensor([ -0.6067, 162.9664,  68.7289, 175.3027,   0.5549,   1.0000])
tensor([ -4.6260,   7.1583, 156.3810, 120.1586,   0.5246,   1.0000])
tensor([ 29.6035,  88.9964,  99.8469, 168.4458,   0.5069,   1.0000])
tensor([182.1268, 257.2897, 182.7585, 465.5251,   0.5004,   1.0000])
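
To read these rows more easily, they can be unpacked into top-left corner, width, and height. The sketch below copies the six printed rows into plain Python lists. The helper `to_clipped_xywh` is hypothetical, and clipping the coordinates to the 512-pixel image is my own assumption, not something the original code does.

```python
# The six rows printed above, copied as plain Python lists.
detections = [
    [14.0453, 114.7553, 26.5884, 158.7972, 0.6781, 1.0],
    [144.7045, 129.4016, 182.4770, 259.8239, 0.6156, 1.0],
    [-0.6067, 162.9664, 68.7289, 175.3027, 0.5549, 1.0],
    [-4.6260, 7.1583, 156.3810, 120.1586, 0.5246, 1.0],
    [29.6035, 88.9964, 99.8469, 168.4458, 0.5069, 1.0],
    [182.1268, 257.2897, 182.7585, 465.5251, 0.5004, 1.0],
]

IMAGE_SIZE = 512  # the images are (3, 512, 512)


def to_clipped_xywh(row, size=IMAGE_SIZE):
    """Clip (x1, y1, x2, y2) to the image and return (x, y, w, h, score, cls)."""
    x1, y1, x2, y2, score, cls = row
    x1, y1 = max(0.0, x1), max(0.0, y1)
    x2, y2 = min(float(size), x2), min(float(size), y2)
    return x1, y1, x2 - x1, y2 - y1, score, int(cls)


boxes = [to_clipped_xywh(row) for row in detections]
for x, y, w, h, score, cls in boxes:
    print(f"class={cls} score={score:.2f} x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```

Two details stand out in the rows themselves: the third and fourth boxes start slightly outside the image (negative x), and the last box is less than one pixel wide, so its width becomes 0 after the `int()` conversion in the drawing code.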