torch.nn.NLLLoss() and torch.nn.CrossEntropyLoss()

torch.nn.NLLLoss()

class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
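The signature includes ignore_index=-100. The following is a minimal sketch of what that parameter does. The logits are made up for illustration and are passed through log_softmax first. Any sample whose target equals ignore_index is skipped. Its per-sample loss is 0, and it is left out of the denominator of the 'mean' reduction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Log-probabilities for 3 samples over 3 classes (illustrative logits)
log_probs = F.log_softmax(torch.tensor([[1.0, 2.0, 3.0],
                                        [0.5, 0.5, 0.5],
                                        [2.0, 0.0, 1.0]]), dim=1)
# The middle sample is marked with the default ignore_index (-100)
target = torch.tensor([2, -100, 0])

# Default settings: ignore_index=-100, reduction='mean'
loss = nn.NLLLoss()(log_probs, target)
# Per-sample losses: the ignored entry contributes 0
per_sample = nn.NLLLoss(reduction='none')(log_probs, target)

# Same value by hand: average over the two non-ignored samples only
manual = -(log_probs[0, 2] + log_probs[2, 0]) / 2
print(loss, manual, per_sample)
```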
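Formula: loss(input, class) = -input[class]
Example: with input = [-0.1187, 0.2110, 0.7463] and target = [1], the loss is -input[1] = -0.2110.
Intuition: it is like converting target into a one-hot vector, taking the dot product with input, and negating the result.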
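The one-hot intuition can be checked directly. The sketch below uses the example values from the text and the real helper torch.nn.functional.one_hot. It compares the negated dot product against F.nll_loss.

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[-0.1187, 0.2110, 0.7463]])
target = torch.tensor([1])

# Turn the target into a one-hot row vector: [[0., 1., 0.]]
one_hot = F.one_hot(target, num_classes=3).to(inp.dtype)

# Negated dot product with the input, averaged over the batch
manual = -(one_hot * inp).sum(dim=1).mean()
builtin = F.nll_loss(inp, target)
print(manual, builtin)  # both equal -0.2110
```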
nn.NLLLoss takes a vector of log-probabilities and a target label as input. NLLLoss() is the negative log likelihood loss.
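The NLLLoss() formula, as stated in the PyTorch documentation, covers a batch of N samples with inputs x, targets y and optional class weights w:
$$\ell_n = -w_{y_n}\, x_{n, y_n}$$
$$\ell(x, y) = \sum_{n=1}^{N} \frac{\ell_n}{\sum_{m=1}^{N} w_{y_m}} \quad (\text{reduction='mean'}), \qquad \ell(x, y) = \sum_{n=1}^{N} \ell_n \quad (\text{reduction='sum'})$$
With reduction='none', the result is the vector (ℓ_1, …, ℓ_N). Without weights, every w is 1, so 'mean' is the plain average of -x[n, y_n].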
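NLLLoss is commonly used for multi-class classification. Before input is passed to NLLLoss, it should go through log_softmax. This step turns input into a probability distribution and then takes the natural logarithm (base e).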
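Here is a minimal sketch of that recommended pipeline. It reuses the raw output values from the example below as logits. After log_softmax, NLLLoss returns -log of the predicted probability of the target class. That value is always non-negative.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1187, 0.2110, 0.7463]])  # raw network output
target = torch.tensor([1])

# Step 1: log_softmax turns the logits into log-probabilities (natural log)
log_probs = F.log_softmax(logits, dim=1)
# Step 2: NLLLoss picks out -log_probs[target]
loss = F.nll_loss(log_probs, target)

# The loss equals -log(p_target), where p = softmax(logits)
p_target = torch.softmax(logits, dim=1)[0, 1]
print(loss, -torch.log(p_target))
```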
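import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(2019)
output = torch.randn(1, 3)  # network output (used as-is, without log_softmax, only to illustrate -input[class])
target = torch.ones(1, dtype=torch.long).random_(3)  # ground-truth label, a random integer in [0, 3)
print(output)
print(target)
# Functional call
loss = F.nll_loss(output, target)
print(loss)
# Module instance
criterion = nn.NLLLoss()
loss = criterion(output, target)
print(loss)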
Output:
tensor([[-0.1187,  0.2110,  0.7463]])
tensor([1])
tensor(-0.2110)
tensor(-0.2110)
Example 2:
If input has shape M x N, the loss defaults to the mean of the M per-sample losses. With reduction='none', all M losses are returned individually.
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(2019)
output = torch.randn(2, 3)  # network output, shape M x N = 2 x 3
target = torch.ones(2, dtype=torch.long).random_(3)  # ground-truth labels
print(output)
print(target)
# Functional call, default reduction='mean'
loss = F.nll_loss(output, target)
print(loss)
# Module with reduction='none': one loss per sample
criterion = nn.NLLLoss(reduction='none')
loss = criterion(output, target)
print(loss)
Output:
tensor([[-0.1187,  0.2110,  0.7463],
        [-0.6136, -0.1186,  1.5565]])
tensor([2, 0])
tensor(-0.0664)
tensor([-0.7463,  0.6136])
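Example 2 shows two of the reduction modes. The sketch below adds 'sum' and a class weight, using the input and target values printed above. The weight vector w is made up for illustration. When weights are given, 'mean' divides by the total weight of the target classes, not by M.

```python
import torch
import torch.nn as nn

# Same raw inputs and targets as Example 2 (used as-is, as in the example)
inp = torch.tensor([[-0.1187, 0.2110, 0.7463],
                    [-0.6136, -0.1186, 1.5565]])
target = torch.tensor([2, 0])

per_sample = nn.NLLLoss(reduction='none')(inp, target)  # one loss per row
total = nn.NLLLoss(reduction='sum')(inp, target)        # sum of per-sample losses
mean = nn.NLLLoss(reduction='mean')(inp, target)        # total / M

# Hypothetical class weights: 'mean' now divides by w[target].sum()
w = torch.tensor([1.0, 1.0, 3.0])
weighted = nn.NLLLoss(weight=w, reduction='mean')(inp, target)
manual_weighted = (w[target] * per_sample).sum() / w[target].sum()
print(per_sample, total, mean, weighted)
```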
Reference: https://blog.csdn.net/weixin_40476348/article/details/94562240
torch.nn.CrossEntropyLoss() 
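CrossEntropyLoss applies softmax to the input, then log, then NLLLoss. Its relationship to nn.NLLLoss can be written as:
log(softmax(x)) + nn.NLLLoss ====> nn.CrossEntropyLoss
Therefore, you do not need to apply softmax to the network output yourself. nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies log-softmax internally.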
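This minimal sketch illustrates the last point. The logits are made up for illustration. If softmax is applied before nn.CrossEntropyLoss, softmax is effectively applied twice, and the resulting loss no longer matches the correct one.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

# Correct: pass raw logits; cross_entropy applies log_softmax internally
correct = F.cross_entropy(logits, target)

# Common mistake: softmax first, so softmax ends up applied twice
probs = torch.softmax(logits, dim=1)
double_softmax = F.cross_entropy(probs, target)

print(correct, double_softmax)
```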
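The expression for nn.CrossEntropyLoss() (single sample, no class weights):
$$\text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right) = -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)$$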
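For a single sample without class weights, the cross-entropy expression is loss(x, class) = -x[class] + log(Σ_j exp(x[j])). The following sketch evaluates this by hand for a small made-up batch and compares the result with F.cross_entropy. The log-sum-exp term uses torch.logsumexp, which computes it in a numerically stable way.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [0.5, -1.0, 2.0]])
target = torch.tensor([2, 0])

# -x[class] + log(sum_j exp(x[j])), computed per sample
picked = x.gather(1, target.unsqueeze(1)).squeeze(1)
per_sample = -picked + torch.logsumexp(x, dim=1)
manual = per_sample.mean()  # matches reduction='mean'

print(manual, F.cross_entropy(x, target))
```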
import torch
import torch.nn as nn

a = torch.tensor([[1.0, 2.0, 3.0]])  # logits for one sample, 3 classes
target = torch.tensor([2])           # class indices must be int64 (long)
logsoftmax = nn.LogSoftmax(dim=1)    # normalize over the class dimension
ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()
# Test CrossEntropyLoss
cel = ce(a, target)
print(cel)
# Output: tensor(0.4076)
# Test LogSoftmax + NLLLoss
lsm_a = logsoftmax(a)
nll_lsm_a = nll(lsm_a, target)
print(nll_lsm_a)
# Output: tensor(0.4076)
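So nn.CrossEntropyLoss gives exactly the same result as nn.LogSoftmax followed by nn.NLLLoss. To see why, recall the cross-entropy formula:
$$H(y, x) = -\sum_i y_i \log x_i$$
Here y is the (one-hot) label, and x is the predicted probability distribution. Because y is 1 only at the target position, the cross-entropy loss reduces to -log of the prediction x at that position. That is exactly what LogSoftmax() followed by NLLLoss() computes.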
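This sketch checks the reduction numerically on the same logits as the example above. It takes x = softmax(logits) and builds y as a one-hot label. The full sum -Σ y_i log x_i, the single term -log x[target], and F.cross_entropy on the raw logits should all agree.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

# x: predicted probability distribution; y: one-hot label
x = torch.softmax(logits, dim=1)
y = F.one_hot(target, num_classes=3).to(x.dtype)

# H(y, x) = -sum_i y_i * log(x_i); only the target term survives
ce_formula = -(y * torch.log(x)).sum(dim=1).mean()
# -log of the predicted probability at the target position
neg_log_target = -torch.log(x[0, target[0]])
ce_builtin = F.cross_entropy(logits, target)

print(ce_formula, neg_log_target, ce_builtin)
```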
References:
 https://blog.csdn.net/watermelon1123/article/details/91044856
 https://blog.csdn.net/weixin_40522801/article/details/106616295