RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[4, 1, 1080, 1920] to have 64 channels, but got 1 channels instead

I want to train a U-Net segmentation model on the German Asphalt Pavement Distress (GAPs) dataset. I'm trying to modify the model at https://github.com/khanhha/crack_segmentation to train on that dataset.

Here is the folder containing all the related files and folders: https://drive.google.com/drive/folders/14NQdtMXokIixBJ5XizexVECn23Jh9aTM?usp=sharing

I modified the training file and renamed it "train_unet_GAPs.py". When I try to train on Colab using the following command:

!python /content/drive/Othercomputers/My\ Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py -data_dir "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/GAPs/" -model_dir /content/drive/Othercomputers/My\ Laptop/crack_segmentation_khanhha/crack_segmentation-master/model/ -model_type resnet101

I get the following error:

total images = 2410
create resnet101 model
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100% 171M/171M [00:00<00:00, 212MB/s]
Started training model from epoch 0
Epoch 0:   0% 0/2048 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py", line 259, in <module>
    train(train_loader, model, criterion, optimizer, validate, args)
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py", line 118, in train
    masks_pred = model(input_var)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/unet/unet_transfer.py", line 224, in forward
    conv2 = self.conv2(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/models/resnet.py", line 144, in forward
    out = self.conv1(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[4, 1, 1080, 1920] to have 64 channels, but got 1 channels instead
Epoch 0:   0% 0/2048 [00:08<?, ?it/s]

I think this is because the images of the GAPs dataset are grayscale (one channel), while ResNet expects RGB images with 3 channels.

How can I solve this issue? How can I modify the model to accept grayscale images instead of RGB images? I need help with that. I have no experience with PyTorch, and I think this implementation uses the built-in ResNet model.

This error typically means that there is a channel mismatch between your input and a certain layer. To help you more, could you provide the stack trace, so we can find the problematic layer and the model implementation code? – Max D. Jun 10, 2022 at 7:57

@MaxD. Thanks for your comment. I edited my question and added the stack trace. I look forward to getting your answer. – Mohamed Hedeya Jun 10, 2022 at 8:08

The convolution op wants weights of shape [out_channels, in_channels/groups, kernel_height, kernel_width] and an input of shape [batch, in_channels, height, width]. – n. m. Jun 10, 2022 at 8:19

@n.1.8e9-where's-my-sharem. Thanks. I understand this. However, I need help on how to solve the issue. – Mohamed Hedeya Jun 10, 2022 at 9:07

Your weights (not images) are in the wrong shape. I have no idea why they came to be this way, but you need to fix them. The error has nothing to do with RGB or grayscale images. Nothing at this point has or expects 3 channels. – n. m. Jun 10, 2022 at 9:30
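A tiny sketch of the shape rule from the comment above (hypothetical layer, small spatial size to keep it light), reproducing the error in the question:

import torch
import torch.nn as nn

# Conv2d weight shape is [out_channels, in_channels/groups, kH, kW];
# the input must carry in_channels channels.
conv = nn.Conv2d(64, 64, kernel_size=1)  # weight: [64, 64, 1, 1]
ok = conv(torch.rand(4, 64, 32, 32))     # 64 input channels: works
bad = conv(torch.rand(4, 1, 32, 32))     # 1 channel: raises the same RuntimeError as above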

I figured out a few things with your code.

According to the traceback, you are using a ResNet-based U-Net model.

Your current model's forward method is defined as:

def forward(self, x):
    #conv1 = self.conv1(x)
    #conv2 = self.conv2(conv1)
    conv2 = self.conv2(x)
    conv3 = self.conv3(conv2)
    conv4 = self.conv4(conv3)
    conv5 = self.conv5(conv4)

Your error comes from self.conv2(x): conv2 expects an input with 64 channels. That means something is missing, or rather, commented out :)

By changing

    #conv1 = self.conv1(x)
    #conv2 = self.conv2(conv1)
    conv2 = self.conv2(x)

to

    conv1 = self.conv1(x)
    conv2 = self.conv2(conv1)

you will fix the problem of 64 channels as input. But there is another problem:

Using an input of (B, 1, H, W), no matter what B, H, and W are, won't work with your current architecture. Why? Because of this:

import torchvision

resnet34 = torchvision.models.resnet34(pretrained=False)
resnet101 = torchvision.models.resnet101(pretrained=False)
resnet152 = torchvision.models.resnet152(pretrained=False)
print(resnet34.conv1)
# Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
print(resnet101.conv1)
# Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
print(resnet152.conv1)
# Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

In every case, the conv1 layer of ResNet takes a 3-channel input.
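As a side note, the usual workaround for a plain torchvision ResNet (outside this U-Net wrapper) is to swap out that stem; a minimal sketch, assuming a 1-channel input is what you need:

import torch
import torch.nn as nn
import torchvision

# Replace the 3-channel stem with a 1-channel one so grayscale tensors are accepted.
resnet = torchvision.models.resnet101(pretrained=False)
resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
out = resnet(torch.rand(4, 1, 224, 224))
print(out.shape)  # torch.Size([4, 1000])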

Once you have made those modifications, you should also try your network with a dummy example like:

model = UNetResNet(34, num_classes=2)
out = model(torch.rand(4, 3, 1920, 1920))
print(out.shape)
# torch.Size([4, 2, 1920, 1920])  -> (batch_size, num_classes, H, W)

Why are the width and height the same here? Because your current architecture only supports square images.

For example:

(1080, 1920) -> dimension mismatch during the concatenation part
(1920, 1920) -> success
(108, 192)   -> dimension mismatch during the concatenation part
(192, 192)   -> success
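A minimal preprocessing sketch (a hypothetical pad_to_square helper, plain PyTorch) that zero-pads the 1080x1920 GAPs frames into squares before they reach the network:

import torch
import torch.nn.functional as F

def pad_to_square(img: torch.Tensor) -> torch.Tensor:
    """Zero-pad a (C, H, W) tensor so that H == W."""
    _, h, w = img.shape
    size = max(h, w)
    pad_h, pad_w = size - h, size - w
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(img, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2))

img = torch.rand(1, 1080, 1920)   # one grayscale GAPs frame
print(pad_to_square(img).shape)   # torch.Size([1, 1920, 1920])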

Conclusion:

  • Modify your network to accept grayscale images if your dataset is made of grayscale images.
  • Preprocess your images so that Width == Height (for example with the padding sketch above).
  • Edit (device mismatch):

    class UNetResNet(nn.Module):
        def __init__(self, encoder_depth, num_classes, num_filters=32, dropout_2d=0.2,
                     pretrained=False, is_deconv=False):
            super().__init__()
            self.num_classes = num_classes
            self.dropout_2d = dropout_2d
            if encoder_depth == 34:
                self.encoder = torchvision.models.resnet34(pretrained=pretrained)
                bottom_channel_nr = 512
            elif encoder_depth == 101:
                self.encoder = torchvision.models.resnet101(pretrained=pretrained)
                bottom_channel_nr = 2048
            elif encoder_depth == 152:
                self.encoder = torchvision.models.resnet152(pretrained=pretrained)
                bottom_channel_nr = 2048
            else:
                raise NotImplementedError('only 34, 101, 152 version of Resnet are implemented')
            self.pool = nn.MaxPool2d(2, 2)
            self.relu = nn.ReLU(inplace=True)
            #self.conv1 = nn.Sequential(self.encoder.conv1,
            #                           self.encoder.bn1,
            #                           self.encoder.relu,
            #                           self.pool)
            self.conv1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False),  # 1 here is for grayscale images; replace with 3 for RGB/BGR
                                       nn.BatchNorm2d(64),
                                       nn.ReLU(),
                                       self.pool)
            self.conv2 = self.encoder.layer1
            self.conv3 = self.encoder.layer2
            self.conv4 = self.encoder.layer3
            self.conv5 = self.encoder.layer4
            self.center = DecoderBlockV2(bottom_channel_nr, num_filters * 8 * 2, num_filters * 8, is_deconv)
            self.dec5 = DecoderBlockV2(bottom_channel_nr + num_filters * 8, num_filters * 8 * 2, num_filters * 8, is_deconv)
            self.dec4 = DecoderBlockV2(bottom_channel_nr // 2 + num_filters * 8, num_filters * 8 * 2, num_filters * 8,
                                       is_deconv)
            self.dec3 = DecoderBlockV2(bottom_channel_nr // 4 + num_filters * 8, num_filters * 4 * 2, num_filters * 2,
                                       is_deconv)
            self.dec2 = DecoderBlockV2(bottom_channel_nr // 8 + num_filters * 2, num_filters * 2 * 2, num_filters * 2 * 2,
                                       is_deconv)
            self.dec1 = DecoderBlockV2(num_filters * 2 * 2, num_filters * 2 * 2, num_filters, is_deconv)
            self.dec0 = ConvRelu(num_filters, num_filters)
            self.final = nn.Conv2d(num_filters, num_classes, kernel_size=1)
        def forward(self, x):
            conv1 = self.conv1(x)
            conv2 = self.conv2(conv1)
            conv3 = self.conv3(conv2)
            conv4 = self.conv4(conv3)
            conv5 = self.conv5(conv4)
            pool = self.pool(conv5)
            center = self.center(pool)
            dec5 = self.dec5(torch.cat([center, conv5], 1))
            dec4 = self.dec4(torch.cat([dec5, conv4], 1))
            dec3 = self.dec3(torch.cat([dec4, conv3], 1))
            dec2 = self.dec2(torch.cat([dec3, conv2], 1))
            dec1 = self.dec1(dec2)
            dec0 = self.dec0(dec1)
            return self.final(F.dropout2d(dec0, p=self.dropout_2d))
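A quick smoke test of this grayscale variant, mirroring the dummy example above (the expected output shape is an assumption based on that earlier run):

model = UNetResNet(34, num_classes=2)
out = model(torch.rand(4, 1, 1920, 1920))   # 1-channel, square input
print(out.shape)                            # expected: torch.Size([4, 2, 1920, 1920])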
I understand that I should modify the network to accept grayscale images. However, I don't know how I can do this. I don't have experience with PyTorch, especially since I understand these are built-in models. Could you please advise what modification I can make in the code so the model accepts grayscale images? – Mohamed Hedeya Jun 10, 2022 at 11:41

I managed to modify the model to accept grayscale images by adding self.conv1 = torch.nn.Conv2d(1, 64, (7, 7), (2, 2), (3, 3), bias=False) as the first line in forward. However, now I'm getting the following error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same – Mohamed Hedeya Jun 10, 2022 at 12:05

This error means your input data is on CUDA (data.to(torch.device("cuda:0"))) but your model isn't. You need to push it to your GPU: model.to(torch.device("cuda:0")) – Max D. Jun 10, 2022 at 13:09

The code already had model.cuda() in train_unet_GAPs.py, and replacing it with model.to(torch.device("cuda:0")) did not help either. I look forward to receiving further advice from you. Thanks. – Mohamed Hedeya Jun 10, 2022 at 14:11
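A minimal sketch of the device rule from Max D.'s comment (hypothetical layer and tensors, not the repo's code): the model, including any layer created after model.cuda(), and the input batch must live on the same device:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

layer = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
layer.to(device)                              # move the weights to the GPU (if available)...
x = torch.rand(4, 1, 64, 64, device=device)   # ...and create the input on the same device
out = layer(x)                                # no FloatTensor / cuda.FloatTensor mismatch
print(out.shape, out.device)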
            
