RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [896, 1]] is at version 11539; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

During the forward pass: enable autograd anomaly detection

torch.autograd.set_detect_anomaly(True)

During the backward pass: keep detection on while computing gradients

with torch.autograd.detect_anomaly():
    loss.backward()
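
Putting the two switches together, here is a minimal sketch that also reproduces this class of error; the exp()/in-place example is my own illustration, not code from the original post:

import torch

torch.autograd.set_detect_anomaly(True)   # forward pass: record a traceback for every op

a = torch.randn(3, requires_grad=True)
b = a.exp()    # exp() saves its output b for use in the backward pass
b += 1         # in-place op bumps b's version counter, invalidating the saved tensor

with torch.autograd.detect_anomaly():     # backward pass: keep detection on while differentiating
    b.sum().backward()                    # raises the "modified by an inplace operation" RuntimeError;
                                          # anomaly mode also prints the forward traceback of the op
                                          # (here exp) whose saved tensor was changed in place

Replacing the in-place b += 1 with the out-of-place b = b + 1 lets the backward pass run through.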

A few related notes on NaN losses and gradient clipping:

Use a saturating loss. For binary classification, do not define the loss so that the model is pushed to output +∞ for class 1 and −∞ for class 0: such a loss never saturates, the model keeps pushing its outputs further, and NaN appears very easily. Instead, put a sigmoid on the model output, so that the target is 1 for class 1 and 0 for class 0. Because sigmoid saturates, the input does not need to be +∞ to reach an output of 1; inputs around 6-8 already give an output of essentially 1. Note, however, that once the sigmoid output can numerically reach exactly 0 (or 1), taking its log produces inf/NaN (a small numerical illustration follows at the end of this section).

1. If NaN shows up within the first ~100 iterations, the usual cause is a learning rate that is too high. Keep lowering the learning rate until the NaN disappears; reducing it by a factor of up to 10 from the current value is normally enough.
2. If the network is an RNN-style recurrent network, NaN may instead come from exploding gradients; an effective remedy is gradient clipping.

Gradient clipping in PyTorch: torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2). It clips the gradient norm of an iterable of parameters, where the norm is computed over all gradients together, as if they were concatenated into a single vector. This guards against exploding gradients by putting an upper bound on the gradient magnitude: when the total gradient norm exceeds max_norm, all gradients are rescaled so that the total norm equals max_norm.

On norms: the most commonly used is the p-norm. For a vector $x=\left[x_{1}, x_{2}, \cdots, x_{n}\right]^{\mathrm{T}}$, the p-norm is defined as

$\|x\|_{p}=\left(\left|x_{1}\right|^{p}+\left|x_{2}\right|^{p}+\cdots+\left|x_{n}\right|^{p}\right)^{\frac{1}{p}}$
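
For clip_grad_norm_, a minimal sketch of where the call goes in a training step, between backward() and step(); the toy linear model and max_norm=1.0 are illustrative choices, not values from the original text:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

out = model(torch.randn(4, 10))
loss = out.pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Rescale all gradients in place so that their combined 2-norm is at most max_norm;
# the function returns the total norm measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2)
optimizer.step()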
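
For the saturating-loss advice above, a small numerical illustration. Using binary_cross_entropy_with_logits, which fuses the sigmoid and the log in a numerically stable way, is my suggestion here, not something stated in the original text:

import torch
import torch.nn.functional as F

logits = torch.tensor([[40.0], [-40.0]])   # very confident raw model outputs
targets = torch.tensor([[1.0], [0.0]])

# Hand-rolled sigmoid + log: torch.sigmoid(40.0) is exactly 1.0 in float32, so the
# (1 - target) * log(1 - p) term for the first sample becomes 0 * log(0) = nan.
p = torch.sigmoid(logits)
unstable = -(targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).mean()
print(unstable)   # nan

# The fused version computes the same loss without ever evaluating log(0).
stable = F.binary_cross_entropy_with_logits(logits, targets)
print(stable)     # small finite value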