python - PyTorch error in trying to backward through the graph a second time

I'm trying to run this code: https://github.com/aitorzip/PyTorch-CycleGAN
I modified only the dataloader and transforms to be compatible with my data. When trying to run it I get this error:

Traceback (most recent call last):
  File "models/CycleGANs/train", line 150, in <module>
    loss_D_A.backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

This is the train loop up to the point of error:

for epoch in range(opt.epoch, opt.n_epochs):
    for i, batch in enumerate(dataloader):
        # Set model input
        real_A = Variable(input_A.copy_(batch['A']))
        real_B = Variable(input_B.copy_(batch['B']))

        ##### Generators A2B and B2A #####
        optimizer_G.zero_grad()
        # Identity loss
        # G_A2B(B) should equal B if real B is fed
        same_B = netG_A2B(real_B)
        loss_identity_B = criterion_identity(same_B, real_B)*5.0
        # G_B2A(A) should equal A if real A is fed
        same_A = netG_B2A(real_A)
        loss_identity_A = criterion_identity(same_A, real_A)*5.0
        # GAN loss
        fake_B = netG_A2B(real_A)
        pred_fake = netD_B(fake_B)
        loss_GAN_A2B = criterion_GAN(pred_fake, target_real)
        fake_A = netG_B2A(real_B)
        pred_fake = netD_A(fake_A)
        loss_GAN_B2A = criterion_GAN(pred_fake, target_real)
        # Cycle loss
        # TODO: cycle loss doesn't allow for multimodality. I leave it for now but needs to be thrown out later
        recovered_A = netG_B2A(fake_B)
        loss_cycle_ABA = criterion_cycle(recovered_A, real_A)*10.0
        recovered_B = netG_A2B(fake_A)
        loss_cycle_BAB = criterion_cycle(recovered_B, real_B)*10.0
        # Total loss
        loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + loss_cycle_ABA + loss_cycle_BAB
        loss_G.backward()
        optimizer_G.step()

        ##### Discriminator A #####
        optimizer_D_A.zero_grad()
        # Real loss
        pred_real = netD_A(real_A)
        loss_D_real = criterion_GAN(pred_real, target_real)
        # Fake loss
        fake_A = fake_A_buffer.push_and_pop(fake_A)
        pred_fale = netD_A(fake_A.detach())
        loss_D_fake = criterion_GAN(pred_fake, target_fake)
        # Total loss
        loss_D_A = (loss_D_real + loss_D_fake)*0.5
        loss_D_A.backward()

I don't understand what this error means. My guess is that it has something to do with fake_A_buffer, which is just fake_A_buffer = ReplayBuffer():

class ReplayBuffer():
    def __init__(self, max_size=50):
        assert (max_size > 0), 'Empty buffer or trying to create a black hole. Be careful.'
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, data):
        to_return = []
        for element in data.data:
            element = torch.unsqueeze(element, 0)
            if len(self.data) < self.max_size:
                self.data.append(element)
                to_return.append(element)
            else:
                if random.uniform(0,1) > 0.5:
                    i = random.randint(0, self.max_size-1)
                    to_return.append(self.data[i].clone())
                    self.data[i] = element
                else:
                    to_return.append(element)
        return Variable(torch.cat(to_return))

Error after setting loss_G.backward(retain_graph=True):

Traceback (most recent call last):
  File "models/CycleGANs/train", line 150, in <module>
    loss_D_A.backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3, 64, 7, 7]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

And after setting torch.autograd.set_detect_anomaly(True):

/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py:130: UserWarning: Error detected in MkldnnConvolutionBackward. Traceback of forward call that caused the error:
  File "models/CycleGANs/train", line 115, in <module>
    fake_B = netG_A2B(real_A)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Histology-Style-Transfer-Research/models/CycleGANs/models.py", line 67, in forward
    return self.model(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Histology-Style-Transfer-Research/models/CycleGANs/models.py", line 19, in forward
    return x + self.conv_block(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
 (Triggered internally at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(
Traceback (most recent call last):
  File "models/CycleGANs/train", line 133, in <module>
    loss_G.backward(retain_graph=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'MkldnnConvolutionBackward' returned nan values in its 2th output.

Does this answer your question? Pytorch - RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed
– Chris Stryczynski
Dec 5, 2020 at 15:35

I tried that but unfortunately it doesn't work. It shows exactly the same error in the same place.
– Jarartur
Dec 1, 2020 at 10:53

would criterion cycle call backwards for any reason? Can you show the full stack trace as well plz
– user13392352
Dec 1, 2020 at 12:49

i must have done something wrong before, now it shows different error but in the same place. I updated original question with it. criterion is just standard nn.MSELoss and nn.L1Loss
– Jarartur
Dec 1, 2020 at 13:16

Try setting realA.grad = None and realB.grad = None after optimizer_D_A.zero_grad(). Doing second order backprop can cause some weird stuff to happen and setting the labels / inputs grad to None has worked for me in the past
– user13392352
Dec 1, 2020 at 14:17
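
One likely cause, judging only from the loop as posted: the discriminator step assigns the new prediction to pred_fale, while loss_D_fake is still computed from pred_fake, the tensor left over from the generator step. That tensor is attached to the generator graph that loss_G.backward() has already consumed, which would explain the first error. With retain_graph=True the graph survives, but optimizer_G.step() has by then updated the generator weights in place, which would explain the second error. Below is a minimal sketch that reproduces both. The small networks netG and netD, the tensor shapes, and the SGD optimizers are hypothetical stand-ins, not the repository's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for netG_B2A and netD_A (the real networks are conv nets).
netG = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
netD = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer_G = torch.optim.SGD(netG.parameters(), lr=0.1)
optimizer_D = torch.optim.SGD(netD.parameters(), lr=0.1)

real = torch.randn(2, 4)
target_real = torch.ones(2, 1)
target_fake = torch.zeros(2, 1)


def generator_step(retain_graph):
    """Run one generator update and return (fake, pred_fake)."""
    optimizer_G.zero_grad()
    fake = netG(real)
    pred_fake = netD(fake)  # still attached to the generator's graph
    criterion(pred_fake, target_real).backward(retain_graph=retain_graph)
    optimizer_G.step()  # updates the generator weights in place
    return fake, pred_fake


def stale_discriminator_backward(fake, pred_fake):
    """Repeat the question's discriminator step and return the error text."""
    optimizer_D.zero_grad()
    pred_fale = netD(fake.detach())  # typo: the fresh prediction is never used
    loss_D_fake = criterion(pred_fake, target_fake)  # stale generator-step tensor
    try:
        loss_D_fake.backward()
    except RuntimeError as exc:
        return str(exc)
    return None


# Without retain_graph, the generator graph's saved tensors are already freed.
first_error = stale_discriminator_backward(*generator_step(retain_graph=False))

# With retain_graph=True the graph survives, but optimizer_G.step() has
# already changed weights that the graph saved for its backward pass.
second_error = stale_discriminator_backward(*generator_step(retain_graph=True))

# Likely fix: use one name, so the loss depends only on the detached fake.
fake, _ = generator_step(retain_graph=False)
optimizer_D.zero_grad()
pred_fake = netD(fake.detach())
loss_D_fake = criterion(pred_fake, target_fake)
loss_D_fake.backward()
optimizer_D.step()

print(first_error)
print(second_error)
```

If this is indeed the cause, renaming pred_fale to pred_fake in the train loop should make retain_graph=True unnecessary. ReplayBuffer looks unrelated: push_and_pop builds its result from data.data, so the tensor it returns carries no graph.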