RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [896, 1]] is at version 11539; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
torch.autograd.set_detect_anomaly(True)
with torch.autograd.detect_anomaly():
    loss.backward()
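For reference, a minimal self-contained sketch of this debugging pattern; the model and data below are hypothetical stand-ins, not taken from the original code:

import torch
import torch.nn.functional as F

torch.autograd.set_detect_anomaly(True)      # global switch; slows training, keep it only while debugging

model = torch.nn.Linear(4, 1)                # placeholder model and data
x, y = torch.randn(8, 4), torch.randn(8, 1)

with torch.autograd.detect_anomaly():
    loss = F.mse_loss(model(x), y)
    loss.backward()                          # if backward() fails, anomaly mode also prints the traceback of the forward op that produced the bad value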
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = method(y)
The errors above, and how to fix them:
Keep in mind that Python is clever, but it can still run into things it cannot recognize on its own, so it is best to tell it explicitly…
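For example, the NumPy FutureWarning above usually comes from comparing an array against something it cannot compare element by element. A small illustrative sketch (behavior of older NumPy releases; the values are made up):

import numpy as np

arr = np.array([1, 2, 3])
mask = arr == "3"   # older NumPy: FutureWarning, comparison falls back to a single scalar False
mask = arr == 3     # being explicit about the compared type yields the intended mask [False, False, True]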
Error message:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [544, 768]], which is output 0 of ViewBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detect…
PyTorch gradient clipping: torch.nn.utils.clip_grad_norm_
Gradient clipping:
Backpropagation can produce vanishing gradients (partial derivatives arbitrarily close to 0, so long-range memory stops being updated) as well as exploding ones. The simplest brute-force remedy is to set a threshold: whenever the gradient norm exceeds the threshold, the gradient is scaled back down to that threshold (this bounding of the gradient norm is what gradient clipping does):
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2)
Function definition: clips the gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector.
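A minimal usage sketch; the model, data, and hyperparameters below are placeholders:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = F.mse_loss(model(x), y)
loss.backward()
# Rescale all gradients together so their combined 2-norm is at most max_norm;
# the function returns the total norm measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0, norm_type=2)
optimizer.step()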
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient.
Always use a saturating loss. For binary classification, for example, do not define the loss so that class 1 pushes the model output toward positive infinity and class 0 toward negative infinity; such a loss never saturates, the model keeps pushing the outputs further and further, and NaN appears very easily. Instead, put a sigmoid on top of the model output, so that the target is 1 for class 1 and 0 for class 0. For a saturating function like sigmoid the output reaches roughly 1 without the input having to be huge; inputs around 6, 7, or 8 already give an output of nearly 1. The flip side is that the output can also reach exactly 0, and taking its log then produces NaN.
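As a hedged illustration of that point (the logits and targets below are made up), the numerically stable route in PyTorch is to let the loss apply the sigmoid internally:

import torch
import torch.nn.functional as F

logits = torch.tensor([100.0, -100.0])   # raw, unbounded model outputs (made-up values)
targets = torch.tensor([1.0, 1.0])

# Naive version: sigmoid saturates to exactly 1.0 and 0.0 in float32, so log(0) appears and the mean becomes NaN.
probs = torch.sigmoid(logits)
naive = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()   # nan

# Stable version: sigmoid and log are fused inside the loss, so it stays finite.
stable = F.binary_cross_entropy_with_logits(logits, targets)                          # about 50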
1. Error message
FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite…
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [4, 512, 512]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient.
On the problem of multiple backward() calls in PyTorch: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
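A sketch of one common multiple-backward pitfall (the two-loss setup below is hypothetical, not from the original post): run every backward pass before the in-place optimizer.step(), otherwise the retained graphs refer to parameters whose version counters have already moved on.

import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)

loss1 = model(x).pow(2).mean()
loss2 = model(x).abs().mean()   # second loss built from the same, not-yet-updated weights

loss1.backward()
# Calling opt.step() here would modify the weights in place; loss2.backward() afterwards
# would then raise "modified by an inplace operation", because its graph saved the old weights.
loss2.backward()                # so: finish every backward pass first...
opt.step()                      # ...then apply the in-place parameter update once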
1. If NaN shows up within the first 100 iterations, the usual cause is a learning rate that is too high: lower it, and keep lowering it until the NaN disappears; generally a factor of 1-10 below the current learning rate is enough.
2. If the network is an RNN-style recurrent network, the NaN may instead come from exploding gradients; an effective remedy is to add gradient clipping.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 256, 20, 20]], which is output 0 of struct torch::autograd::CopySlices, is at version 3; expected version 1 instead.
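CopySlices in that message points at an in-place slice assignment. A minimal, hypothetical reproduction and the usual fix:

import torch

x = torch.randn(2, 3, requires_grad=True)
y = torch.sigmoid(x)          # sigmoid's backward pass needs its own output y

# Problematic: the in-place slice write (CopySlices) bumps y's version counter,
# so the later backward() raises "modified by an inplace operation".
# y[:, 0] = 0.0
# y.sum().backward()

# Fix: modify a copy, leaving the tensor saved by autograd untouched.
y_safe = y.clone()
y_safe[:, 0] = 0.0
y_safe.sum().backward()       # works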
import pickle as pkl

save_path = "../../data/Enron/graph.pkl"
with open(save_path, "wb") as f:
    pkl.dump(graphs, f)  # serialize the processed graph list
print("Processed Data Saved at {}".format(save_path))
To prevent gradient explosion, set an upper bound on the gradient magnitude: whenever the gradient norm exceeds this bound, the gradients are rescaled so that their norm equals the bound.
Supplement: norms
The most commonly used is the p-norm: for a vector $x=\left[x_{1}, x_{2}, \cdots, x_{n}\right]^{\mathrm{T}}$, the p-norm is defined as
$$\|x\|_{p}=\left(\left|x_{1}\right|^{p}+\left|x_{2}\right|^{p}+\cdots+\left|x_{n}\right|^{p}\right)^{\frac{1}{p}}$$
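A quick numeric check of this definition (the values are chosen arbitrarily):

import torch

x = torch.tensor([3.0, -4.0])
p = 2
manual = x.abs().pow(p).sum().pow(1.0 / p)      # (|3|^2 + |-4|^2)^(1/2) = 5.0
builtin = torch.linalg.vector_norm(x, ord=p)    # PyTorch's built-in gives the same value
# clip_grad_norm_ applies exactly this norm to all parameter gradients,
# treated as if they were concatenated into a single vector.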