I have been interested in machine learning for years but never got around to it. I finally decided to teach myself so that I could learn how data is used and be able to write the code on my own. I found an interesting image competition and started there. The competition had already ended, but the data is still available, and I can still submit predictions and get a score.

The tasks are straightforward: object detection and classification. EfficientDet is a widely used model that handles both, so I decided to build on it. However, the implementation was extremely hard. Some errors required editing the imported packages, which I couldn't manage in a Kaggle notebook. So I restarted the EfficientDet implementation with a simple dataset.

A useful example I found is a blog post written in Japanese in October 2021. The task is to detect a red circle on a black square background. Most of the source code in that post worked, but I hit some errors when I tested it in April 2023. Below are my notes on the errors and their fixes, and on the accuracy of the result.

# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9f3114f08672> in <cell line: 19>()
     33     targets['cls'] = targets['cls']
     34     optimizer.zero_grad()
---> 35     losses = bench(inputs, targets)
     36     loss = losses['loss']
     37     loss.backward()
/usr/local/lib/python3.9/dist-packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    396                     cls_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    397                 box_targets_out[level_idx].append(
--> 398                     box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    399                 count += steps
    400                 if last_sample:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
> Line 398 Before correction
  #box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
> After correction
  box_targets[count:count + steps].reshape([feat_size[0], feat_size[1], -1]))
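
The difference between the two calls can be reproduced on a small tensor. This is only a sketch: the transposed tensor is my stand-in for a non-contiguous tensor and is not taken from the effdet code.

```python
import torch

# A transposed tensor shares storage with the original but is non-contiguous.
base = torch.arange(6).reshape(2, 3)
transposed = base.t()  # shape (3, 2)
print(transposed.is_contiguous())  # False

# view() never copies, so it fails when the memory layout does not allow it.
try:
    transposed.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() returns a view when it can and falls back to a copy otherwise.
flat = transposed.reshape(6)
print(view_failed, flat.tolist())
```

Because reshape only copies when it has to, swapping it in for view should not change the result where view already worked.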

The dataset applies augmentation before returning each sample. The first step randomly crops the input image, and when the bounding box falls outside the crop, the bounding-box and label information is lost. The sample code handles this case, but it only defines a replacement bounding box and not a replacement label, which causes the error below.

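# Before correction (excerpt)
class CircleDataset(Dataset):
  def __getitem__(self, idx):
    # ... (augmentation omitted)
    if bboxes.shape[0] == 0:
      # Only the bounding box is replaced; labels stays empty
      bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    # ...
    return x, y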
# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-9f3114f08672> in <cell line: 19>()
     28   t = tqdm(loader, leave=False)
---> 30   for inputs, targets in t:
     31     inputs = inputs
     32     targets['bbox'] = targets['bbox']
/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/collate.py in collate_tensor_fn(batch, collate_fn_map)
    161         storage = elem.storage()._new_shared(numel, device=elem.device)
    162         out = elem.new(storage).resize_(len(batch), *list(elem.size()))
--> 163     return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [0] at entry 0 and [1] at entry 1
# After correction
if bboxes.shape[0] == 0:
    bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    labels = torch.FloatTensor(np.array([0])) # Added
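
The failure and the fix can be checked outside the training loop. In the sketch below, `pad_empty_target` is a hypothetical helper name of mine, and the two samples are made-up values; only the shapes matter.

```python
import torch

def pad_empty_target(bboxes, labels):
    """Hypothetical helper: give a sample without boxes one dummy box and one dummy label."""
    if bboxes.shape[0] == 0:
        bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
        labels = torch.zeros([1], dtype=labels.dtype)
    return bboxes, labels

# One sample lost its box in the crop, the other kept it (made-up values).
empty = (torch.zeros([0, 4]), torch.zeros([0]))
kept = (torch.tensor([[10.0, 20.0, 30.0, 40.0]]), torch.tensor([1.0]))

# Stacking labels of size [0] and [1] fails, as in the traceback above.
try:
    torch.stack([empty[1], kept[1]], 0)
    stack_failed = False
except RuntimeError:
    stack_failed = True

# After padding, every sample has the same shape and can be batched.
padded = [pad_empty_target(b, l) for b, l in (empty, kept)]
bbox_batch = torch.stack([b for b, _ in padded], 0)
label_batch = torch.stack([l for _, l in padded], 0)
print(stack_failed, bbox_batch.shape, label_batch.shape)
```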

To check the result, I took one image at random from the training set and fed it to the trained model.

Prediction uses DetBenchPredict from the effdet package. A single image has shape (3, 512, 512), while DetBenchPredict takes a batch as its input, so I added a batch dimension with the unsqueeze function.
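
As a minimal sketch of this step, assuming a zero-filled tensor as a stand-in for the real image:

```python
import torch

image = torch.zeros(3, 512, 512)  # one image: (channels, height, width)
batch = image.unsqueeze(0)        # add a batch dimension at position 0
print(batch.shape)

# squeeze(0) removes the batch dimension again.
restored = batch.squeeze(0)
print(restored.shape)
```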

For each image, DetBenchPredict outputs an (N, 6) tensor, so the output for a batch has shape (batch size, N, 6). N is the number of bounding boxes predicted, and the six elements mean:

  • x-coordinate of the bounding box's top-left corner
  • y-coordinate of the bounding box's top-left corner
  • x-coordinate of the bounding box's bottom-right corner
  • y-coordinate of the bounding box's bottom-right corner
  • confidence score (probability) of the prediction
  • predicted class

The code is below. Bounding boxes are drawn if the score is over 50%.

    import torch
    import matplotlib.pyplot as pp
    import matplotlib.patches as patches
    from effdet import DetBenchPredict

    # Take one sample and add a batch dimension: (3, 512, 512) -> (1, 3, 512, 512)
    image, targets = dataset[0]
    image = image.unsqueeze(0)
    bench = DetBenchPredict(model)
    with torch.no_grad():
      output = bench(image)
    # Draw the predictions with over 50% probability
    fig, ax = pp.subplots()
    # imshow expects (height, width, channels), so move the channel axis last
    ax.imshow(image[0].permute(1, 2, 0).numpy())
    for i in range(output.shape[1]):
      if output[0, i, 4] > 0.5:
        x1 = int(output[0, i, 0])
        y1 = int(output[0, i, 1])
        width = int(output[0, i, 2] - output[0, i, 0])
        height = int(output[0, i, 3] - output[0, i, 1])
        rect = patches.Rectangle((x1, y1), width, height, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        print(output[0, i, :])
    pp.show()
    
    Printed values of output[0, i, :]:
    tensor([ 14.0453, 114.7553,  26.5884, 158.7972,   0.6781,   1.0000])
    tensor([144.7045, 129.4016, 182.4770, 259.8239,   0.6156,   1.0000])
    tensor([ -0.6067, 162.9664,  68.7289, 175.3027,   0.5549,   1.0000])
    tensor([ -4.6260,   7.1583, 156.3810, 120.1586,   0.5246,   1.0000])
    tensor([ 29.6035,  88.9964,  99.8469, 168.4458,   0.5069,   1.0000])
    tensor([182.1268, 257.2897, 182.7585, 465.5251,   0.5004,   1.0000])
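
The rows printed above can also be processed without matplotlib. The sketch below copies the first three rows and converts them from corner format to (x, y, width, height), clipping to the 512-pixel image. The function name `to_xywh` and the clipping step are my own additions, not part of effdet.

```python
# First three rows of the printed output: [x1, y1, x2, y2, score, class]
detections = [
    [14.0453, 114.7553, 26.5884, 158.7972, 0.6781, 1.0],
    [144.7045, 129.4016, 182.4770, 259.8239, 0.6156, 1.0],
    [-0.6067, 162.9664, 68.7289, 175.3027, 0.5549, 1.0],
]

def to_xywh(rows, score_threshold=0.5, image_size=512):
    """Keep rows above the threshold and convert corners to (x, y, w, h, score, class)."""
    boxes = []
    for x1, y1, x2, y2, score, cls in rows:
        if score <= score_threshold:
            continue
        # Clip the corners to the image, since predictions can fall slightly outside.
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(float(image_size), x2), min(float(image_size), y2)
        boxes.append((x1, y1, x2 - x1, y2 - y1, score, int(cls)))
    return boxes

boxes = to_xywh(detections, score_threshold=0.6)
for box in boxes:
    print(box)
```

Raising the threshold to 0.6 keeps only the first two rows here, which is a quick way to see how sensitive the drawn boxes are to the cutoff.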