I am trying to compute a loss on the jacobian of the network (i.e. to perform double backprop), and I get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
I can't find the inplace operation in my code, so I don't know which line to fix.
The error occurs in the last line, loss3.backward():
inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)
num_classes = output.size()[1]

jacobian_list = []
grad_output = torch.zeros(*output_reg.size())

if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()
    jacobian_list = jacobian.cuda()

for i in range(10):
    zero_gradients(inputs_reg)
    grad_output.zero_()
    grad_output[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                             inputs=inputs_reg,
                                             grad_outputs=grad_output,
                                             only_inputs=True,
                                             retain_graph=True,
                                             create_graph=True)[0])

jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
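You can use the set_detect_anomaly function from the autograd package (torch.autograd.set_detect_anomaly(True)) to find exactly which line is responsible for the error. With it enabled, the backward error is accompanied by a traceback of the forward operation whose result was later modified.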
Here is the link which describes the same problem and a solution using the abovementioned function.
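As a minimal sketch of how that looks in practice (the tensors `x` and `y` are made up for illustration and are not from the question), the following triggers the same error on purpose and captures the message. `exp` keeps its output around for the backward pass, so modifying that output in place invalidates it:

```python
import torch

x = torch.ones(3, requires_grad=True)
error_message = None

# set_detect_anomaly can be called as a function or used as a context manager.
with torch.autograd.set_detect_anomaly(True):
    y = x.exp()   # exp saves its result for the backward pass
    y.add_(1)     # in-place edit of that saved result
    try:
        y.sum().backward()
    except RuntimeError as err:
        error_message = str(err)

print(error_message)
```

With anomaly detection on, PyTorch should also print a warning containing the stack trace of the forward call (`x.exp()` here) whose saved tensor was modified. Anomaly detection slows things down, so it is best switched on only while debugging.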
grad_output.zero_() is in-place, and so is the indexed assignment grad_output[:, i] = 1. In-place means "modify the existing tensor instead of returning a new one with the modifications applied". One way to get the same result without an in-place operation is torch.where. For example, to zero out the column at index 1 (the second column):
import torch

t = torch.randn(3, 3)
ixs = torch.arange(3, dtype=torch.int64)
zeroed = torch.where(ixs[None, :] == 1, torch.tensor(0.), t)

zeroed
tensor([[-0.6616,  0.0000,  0.7329],
        [ 0.8961,  0.0000, -0.1978],
        [ 0.0798,  0.0000, -1.2041]])

t
tensor([[-0.6616, -1.6422,  0.7329],
        [ 0.8961, -0.9623, -0.1978],
        [ 0.0798, -0.7733, -1.2041]])
Notice how t retains the values it had before and zeroed has the values you want.
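Applied to the question, the same idea can build the one-hot `grad_output` for class `i` as a brand-new tensor on every iteration. This is only a sketch: the helper name `one_hot_columns` and the sizes are hypothetical, not from the original code.

```python
import torch

def one_hot_columns(batch_size, num_classes, i):
    """Return a new (batch_size, num_classes) tensor with ones in column i."""
    cols = torch.arange(num_classes)
    return torch.where(cols[None, :] == i,
                       torch.ones(batch_size, num_classes),
                       torch.zeros(batch_size, num_classes))

# Each call allocates a new tensor, so nothing autograd saved earlier is touched.
grad_output = one_hot_columns(batch_size=2, num_classes=3, i=1)
print(grad_output)
```

Because nothing is overwritten, the tensors that earlier `torch.autograd.grad` calls saved for the double-backward pass stay valid.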
I replaced the in-place operations on grad_output with a fresh clone for each iteration:
inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)
num_classes = output.size()[1]

jacobian_list = []
grad_output = torch.zeros(*output_reg.size())

if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()

for i in range(5):
    zero_gradients(inputs_reg)
    grad_output_curr = grad_output.clone()
    grad_output_curr[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                             inputs=inputs_reg,
                                             grad_outputs=grad_output_curr,
                                             only_inputs=True,
                                             retain_graph=True,
                                             create_graph=True)[0])

jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
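For anyone who wants to try this outside the original class, here is a self-contained sketch of the same pattern. The tiny model, the random `data`, and the shapes are stand-ins I made up; only the loop structure mirrors the code above. With `create_graph=True`, autograd keeps each `grad_outputs` tensor for the second backward pass, which is why every iteration needs its own tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for self.model and data.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
data = torch.randn(2, 4)

inputs_reg = data.clone().requires_grad_(True)
output_reg = model(inputs_reg)
num_classes = output_reg.size(1)

jacobian_list = []
for i in range(num_classes):
    # A new tensor per class: it is filled before autograd ever sees it,
    # and never touched again afterwards.
    grad_output_curr = torch.zeros_like(output_reg)
    grad_output_curr[:, i] = 1
    (grad_i,) = torch.autograd.grad(outputs=output_reg,
                                    inputs=inputs_reg,
                                    grad_outputs=grad_output_curr,
                                    retain_graph=True,
                                    create_graph=True)
    jacobian_list.append(grad_i)

jacobian = torch.stack(jacobian_list, dim=0)  # (num_classes, batch, features)
loss3 = jacobian.norm()
loss3.backward()  # double backprop: gradients reach the model parameters
```

`torch.zeros_like` also creates the tensor on the same device as `output_reg`, so this sketch does not need the separate `.cuda()` branch.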
I hope your problem got solved. I had the same error, and solutions such as using clone() did not work for me, but installing PyTorch version 1.4 made it go away.
I think this problem is related to the step() function. The odd thing is that the error shows up with PyTorch 1.5 but not with 1.4.
You can see all released versions of PyTorch in this link.
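One pattern that, as far as I know, raises this error on PyTorch 1.5 and later is calling optimizer.step() between building a graph and backpropagating through it, because step() updates the parameters in place. Whether that is what happened in the answer above is a guess on my part. The sketch below uses a made-up model and two made-up losses to reproduce the error, then shows an ordering that avoids it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model and data, for illustration only.
model = nn.Sequential(nn.Linear(2, 3), nn.Tanh(), nn.Linear(3, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 2)

# Two graphs are built from the same parameter values.
loss_a = model(x).pow(2).mean()
loss_b = model(x).abs().mean()

loss_a.backward()
optimizer.step()  # updates the weights in place

step_error = None
try:
    loss_b.backward()  # this graph still refers to the old weights
except RuntimeError as err:
    step_error = str(err)
print(step_error)

# One way around it: finish every backward pass before calling step().
optimizer.zero_grad()
loss_a = model(x).pow(2).mean()
loss_b = model(x).abs().mean()
(loss_a + loss_b).backward()
optimizer.step()
```

If this is the cause, reordering the backward() and step() calls is probably a better fix than pinning an older PyTorch version.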
I ran into this error while implementing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the target network parameters are assigned to the main network every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
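A rough sketch of the two-network idea follows. It is not taken from the linked notebook: the network, the dummy loss, and `SYNC_EVERY` are placeholders, and the copy direction (main into target) is my assumption based on how target networks are usually kept in sync.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

main_net = nn.Linear(4, 2)            # placeholder for the trained policy
target_net = copy.deepcopy(main_net)  # starts with identical parameters
for p in target_net.parameters():
    p.requires_grad_(False)           # the target copy is never optimized

optimizer = torch.optim.SGD(main_net.parameters(), lr=0.1)
SYNC_EVERY = 5                        # hypothetical sync interval

for step in range(1, 11):
    states = torch.randn(8, 4)
    with torch.no_grad():
        targets = target_net(states)  # no graph is built through the target
    # Dummy loss for illustration only; a real PPO loss is more involved.
    loss = (main_net(states) - targets + 1.0).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % SYNC_EVERY == 0:
        # Copy parameters across; assumed direction is main -> target.
        target_net.load_state_dict(main_net.state_dict())
```

Because the targets are computed under torch.no_grad() by a separate copy, the in-place parameter updates made by optimizer.step() never touch tensors that a pending backward pass still needs.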