PyTorch backward error: one of the variables needed for gradient computation has been modified by an inplace operation

I'm new to PyTorch; I've been trying to implement a text summarization network. When I call loss.backward(), an error appears:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1, 1, 400]], which is output 0 of UnsqueezeBackward0, is at version 98; expected version 97 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

It's a seq2seq model, and I think the problem comes from this code snippet:

    final_dists=torch.zeros((batch_size,dec_max_len,extended_vsize)) #to hold the model outputs with extended vocab
    attn_dists=torch.zeros((batch_size,dec_max_len,enc_max_len)) #to retain the attention weights over decoder steps        
    coverages=torch.zeros((batch_size,dec_max_len,enc_max_len))  #the coverages are retained to compute coverage loss
    inp=self.emb_dropout(self.embedding(dec_batch[:,0])) #starting input: <SOS> shape [batch_size]
    #self.prev_coverage is the accumulated coverage
    coverage=None #initially none, but accumulates                              
    with torch.autograd.set_detect_anomaly(True):
        for i in range(1,dec_max_len):
            #NOTE: the outputs, attn_dists, p_gens assignments start from i=1 (DON'T FORGET!)
            vocab_dists,hidden,attn_dists_tmp,p_gen,coverage=self.decoder(inp,hidden,enc_outputs,enc_lens,coverage)
            attn_dists[:,i,:]=attn_dists_tmp.squeeze(1)
            coverages[:,i,:]=coverage.squeeze(1)
            #vocab_dists: [batch_size, 1, dec_vocab_size] Note: this is the normalized probability
            #hidden: [1,batch_size, dec_hid_dim]
            #attn_dists_tmp: [batch_size, 1, enc_max_len]
            #p_gen: [batch_size, 1]
            #coverage: [batch_size, 1, enc_max_len]
            #===================================================================
            #To compute the final dist in pointer-generator network by extending vocabulary 
            vocab_dists_p=p_gen.unsqueeze(-1)*vocab_dists  #[batch_size,1,dec_vocab_size] note we want to maintain vocab_dists for teacher_forcing_ratio
            attn_dists_tmp=(1-p_gen).unsqueeze(-1)*attn_dists_tmp #[batch_size, 1, enc_max_len] note we want to maintain attn_dists for later use
            extra_zeros=torch.zeros((batch_size,1,max_art_oovs)).to(self.device)
            vocab_dists_extended=torch.cat((vocab_dists_p,extra_zeros),dim=2) #[batch_size, 1, extended_vsize]
            attn_dists_projected=torch.zeros((batch_size,1,extended_vsize)).to(self.device)
            indices=enc_batch_extend_vocab.clone().unsqueeze(1) #[batch_size, 1,enc_max_size]
            attn_dists_projected=attn_dists_projected.scatter(2,indices,attn_dists_tmp)
            #We need this otherwise we would modify a leaf Variable inplace
            #attn_dists_projected_clone=attn_dists_projected.clone()
            #attn_dists_projected_clone.scatter_(2,indices,attn_dists_tmp) #this will project the attention weights 
            #attn_dists_projected.scatter_(2,indices,attn_dists_tmp) 
            final_dists[:,i,:]=vocab_dists_extended.squeeze(1)+attn_dists_projected.squeeze(1) 
            #===================================================================
            #teacher forcing, whether or not should use pred or dec sequence label        
            if random.random()<teacher_forcing_ratio:
                inp=self.emb_dropout(self.embedding(dec_batch[:,i]))
            else:
                inp=self.emb_dropout(self.embedding(vocab_dists.squeeze(1).argmax(1)))

If I remove the for loop and just do one step of updating attn_dists[:,1,:] etc., with a toy loss on the outputs returned by forward, then it works fine. Does anyone have an idea what is wrong here? There is no inplace operation here as far as I can see. Many thanks!
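
For reference, here is a toy reduction of the seq2seq pattern I mean (made-up shapes, not my real model); as far as I can tell it is the same slice assignment, and it runs backward() without error:

    import torch

    # toy reduction: preallocate, slice-assign in a loop, then backward
    w = torch.randn(4, 4, requires_grad=True)
    x = torch.randn(1, 4)
    outputs = torch.zeros(3, 4)  # preallocated, like final_dists/attn_dists above
    for i in range(3):
        x = torch.tanh(x @ w)      # tanh saves its output, but x is rebound here, not modified
        outputs[i] = x.squeeze(0)  # slice assignment into the preallocated tensor
    outputs.sum().backward()       # completes without error
    print(w.grad.shape)            # torch.Size([4, 4])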

From looking at your code, the problem likely comes from the following lines:

    attn_dists[:,i,:]=attn_dists_tmp.squeeze(1)
    coverages[:,i,:]=coverage.squeeze(1)

You are performing an in-place operation that conflicts with the graph created by PyTorch for backprop. It should be solvable by concatenating the new info at every loop iteration instead (though you may run out of memory very soon!):

    # both step tensors are already [batch_size, 1, enc_max_len], so no squeeze is needed
    attn_dists = torch.cat((attn_dists, attn_dists_tmp), dim=1)
    coverages = torch.cat((coverages, coverage), dim=1)

You should change their initialization as well (start both tensors with size 0 along dim 1); otherwise you will end up with a tensor that is twice the size you were accounting for.
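
Here is a minimal sketch of the full pattern, with made-up toy shapes standing in for your decoder outputs:

    import torch

    batch_size, dec_max_len, enc_max_len = 10, 5, 400  # toy sizes

    # start with length 0 along dim 1 and grow one step per iteration
    attn_dists = torch.zeros((batch_size, 0, enc_max_len))
    coverages = torch.zeros((batch_size, 0, enc_max_len))

    for i in range(dec_max_len):
        # stand-ins for the attn_dists_tmp and coverage returned by your
        # decoder, both shaped [batch_size, 1, enc_max_len]
        attn_dists_tmp = torch.softmax(torch.randn(batch_size, 1, enc_max_len), dim=2)
        coverage = coverages.sum(dim=1, keepdim=True) + attn_dists_tmp

        attn_dists = torch.cat((attn_dists, attn_dists_tmp), dim=1)
        coverages = torch.cat((coverages, coverage), dim=1)

    print(attn_dists.shape)  # torch.Size([10, 5, 400])

Since nothing is ever written into a preallocated tensor, there is no in-place slice assignment left for autograd to complain about.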

Thanks for the reply. torch.cat will expand the dimension, which is not what I wanted. Also, attn_dists[:,i,:]=attn_dists_tmp.squeeze(1) assigns into a slice of a different tensor. I've seen a seq2seq example in which outputs is created and the outputs[i] slice is assigned to a prediction inside the loop, and I don't see a difference here. Or am I missing something? Thanks a lot. – Xue Tintin Jul 15, 2020 at 9:46

I tried it: same problem, so I don't think it's the slice assignment. And thanks for the advice about using torch.cat, the code would look cleaner. – Xue Tintin Jul 15, 2020 at 10:12

I just noticed the line attn_dists_tmp=(1-p_gen).unsqueeze(-1)*attn_dists_tmp. Have you tried changing it to attn_dists_tmp2=(1-p_gen).unsqueeze(-1)*attn_dists_tmp? You should be looking for inplace operations. – Victor Zuanazzi Jul 15, 2020 at 10:20
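
A quick way to see what the version numbers in the error message mean: every in-place write bumps a per-tensor counter that autograd checks during backward. Here is a standalone toy sketch (unrelated to this model) that trips the same check, using the internal _version attribute:

    import torch

    x = torch.randn(3, requires_grad=True)
    y = torch.exp(x)    # exp saves its *output* for the backward pass
    print(y._version)   # 0
    y[0] = 0.0          # slice assignment is an in-place write: version bumps
    print(y._version)   # 1
    y.sum().backward()  # RuntimeError: one of the variables needed for gradient
                        # computation has been modified by an inplace operation

Printing tensor._version before and after suspect lines, or rerunning under torch.autograd.set_detect_anomaly(True) as the question already does, narrows down exactly which write left the tensor at version 98 when the graph expected 97.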
