can't find the inplace operation: one of the variables needed for gradient computation has been modified by an inplace operation


I am trying to compute a loss on the jacobian of the network (i.e. to perform double backprop), and I get the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I can't find the inplace operation in my code, so I don't know which line to fix.

The error occurs in the last line: loss3.backward()

            inputs_reg = Variable(data, requires_grad=True)
            output_reg = self.model.forward(inputs_reg)
            num_classes = output.size()[1]
            jacobian_list = []
            grad_output = torch.zeros(*output_reg.size())
            if inputs_reg.is_cuda:
                grad_output = grad_output.cuda()
                jacobian_list = jacobian.cuda()
            for i in range(10):
                zero_gradients(inputs_reg)
                grad_output.zero_()
                grad_output[:, i] = 1
                jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                                  inputs=inputs_reg,
                                                  grad_outputs=grad_output,
                                                  only_inputs=True,
                                                  retain_graph=True,
                                                  create_graph=True)[0])
            jacobian = torch.stack(jacobian_list, dim=0)
            loss3 = jacobian.norm()
            loss3.backward()
                grad_output.zero_() seems like an in-place operation. You might also have in-place operations in self.model.
– Shai
                Dec 9, 2018 at 11:31
                grad_output.zero_() is the in-place operation. In PyTorch, in-place operations end with an underscore. I think you wanted to write `grad_output.zero_grad()`.
– kHarshit
                Dec 9, 2018 at 11:32
                I need to zero grad_output before I set the new column (corresponding to the output that I want the gradient to be calculated for) to ones. So I changed grad_output.zero_() to grad_output[:, i-1] = 0 and it did not help.
– Einav
                Dec 9, 2018 at 12:53
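
To make the distinction the comments are drawing concrete, here is a minimal sketch (not from the original thread): methods ending with an underscore mutate the tensor they are called on, while their out-of-place counterparts return a new tensor and leave the original untouched.

import torch

x = torch.randn(2, 3)
x.zero_()                  # in-place: mutates x itself (trailing underscore)
x[:, 0] = 1                # also in-place: indexing assignment writes into x
y = torch.zeros_like(x)    # out-of-place: returns a new tensor, x is unchanged
z = x + 1                  # out-of-place: allocates a new tensor for the result
x.add_(1)                  # in-place again: equivalent to x += 1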

You can make use of the set_detect_anomaly function available in the autograd package to find out exactly which line is responsible for the error.

Here is the link which describes the same problem and a solution using the above-mentioned function.
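
For reference, a minimal sketch of enabling anomaly detection (the tensors below are placeholders, not the question's model):

import torch

# Enable anomaly detection: a failing backward pass will then report the
# forward-pass operation that produced the offending tensor.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(3, requires_grad=True)   # placeholder, not from the question
loss = (x * 2).sum()
loss.backward()

# The checks are slow, so you can also scope them with the context manager:
# with torch.autograd.detect_anomaly():
#     loss3.backward()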

grad_output.zero_() is in-place and so is grad_output[:, i-1] = 0. In-place means "modify a tensor instead of returning a new one which has the modifications applied". An example of a solution which is not in-place is torch.where. Here it is used to zero out column 1:

import torch
t = torch.randn(3, 3)
ixs = torch.arange(3, dtype=torch.int64)
zeroed = torch.where(ixs[None, :] == 1, torch.tensor(0.), t)
zeroed
tensor([[-0.6616,  0.0000,  0.7329],
        [ 0.8961,  0.0000, -0.1978],
        [ 0.0798,  0.0000, -1.2041]])
t
tensor([[-0.6616, -1.6422,  0.7329],
        [ 0.8961, -0.9623, -0.1978],
        [ 0.0798, -0.7733, -1.2041]])

Notice how t retains the values it had before and zeroed has the values you want.
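
Applied to the question's setup, the same pattern can build the one-hot grad_outputs tensor without mutating anything. A sketch, where output_reg and the loop index i are assumed from the question:

import torch

# Build a (1, num_classes) mask selecting column i, then broadcast it to the
# shape of output_reg. No existing tensor is written to.
cols = torch.arange(output_reg.size(1), device=output_reg.device)
grad_output = torch.where(cols[None, :] == i,
                          torch.tensor(1., device=output_reg.device),
                          torch.tensor(0., device=output_reg.device))
grad_output = grad_output.expand_as(output_reg)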

I replaced the problematic in-place operation on grad_output with:

            inputs_reg = Variable(data, requires_grad=True)
            output_reg = self.model.forward(inputs_reg)
            num_classes = output_reg.size()[1]
            jacobian_list = []
            grad_output = torch.zeros(*output_reg.size())
            if inputs_reg.is_cuda:
                grad_output = grad_output.cuda()
            for i in range(5):
                zero_gradients(inputs_reg)
                # clone() gives a fresh tensor on every iteration, so the tensor
                # autograd saved in the previous grad() call is never modified
                grad_output_curr = grad_output.clone()
                grad_output_curr[:, i] = 1
                jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                                         inputs=inputs_reg,
                                                         grad_outputs=grad_output_curr,
                                                         only_inputs=True,
                                                         retain_graph=True,
                                                         create_graph=True)[0])
            jacobian = torch.stack(jacobian_list, dim=0)
            loss3 = jacobian.norm()
            loss3.backward()
                Please note the grad_output_curr[:, i] = 1 line is still an in-place operation and may (or may not) cause trouble further down the line.
– Jatentaki
                Dec 9, 2018 at 13:32
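
If that remaining in-place write ever becomes a problem, the one-hot rows can be built without any indexing assignment at all. A sketch, where output_reg and i are assumed from the answer above, and F.one_hot requires a reasonably recent PyTorch:

import torch
import torch.nn.functional as F

batch_size, num_classes = output_reg.shape
# A vector holding the class index i, one entry per batch element
idx = torch.full((batch_size,), i, dtype=torch.long, device=output_reg.device)
# one_hot returns a new integer tensor; cast it to the output's dtype
grad_output_curr = F.one_hot(idx, num_classes).to(output_reg.dtype)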

I hope your problem got solved. I had this problem too, and solutions like using clone() did not work for me, but when I installed PyTorch version 1.4 it went away.
I think this problem is some kind of bug in the step() function. Oddly, the bug appears when you use PyTorch version 1.5 but not in v1.4.
You can see all released versions of PyTorch at this link.

I met this error when I was doing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. The target network starts with the same parameter values as the main network, and during training the target network's parameters are assigned to the main network every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
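
A minimal sketch of the periodic parameter copy that answer describes, using load_state_dict. The networks and the interval are placeholders, not the code from the linked notebook, and the copy direction should follow whatever your algorithm requires:

import torch.nn as nn

main_net = nn.Linear(4, 2)                        # placeholder "main" network
target_net = nn.Linear(4, 2)                      # placeholder "target" network
target_net.load_state_dict(main_net.state_dict()) # start from identical parameters

SYNC_EVERY = 100                                  # placeholder interval
for step in range(1000):
    # ... compute the loss with main_net, backward(), optimizer step ...
    if step > 0 and step % SYNC_EVERY == 0:
        target_net.load_state_dict(main_net.state_dict())  # periodic synchronization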
