PyTorch: LSTM (nn.LSTM & nn.LSTMCell)

Relationship between output, h_n and c_n

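h_n: the hidden state at the last time step. It has shape (num_layers * num_directions, batch, hidden_size); for a single-layer, unidirectional LSTM with batch_first=True, h_n[-1] equals output[:, -1, :], and it can usually be fed straight into a following fully connected layer.
c_n: the cell state of the LSTM at the last time step (rarely needed downstream).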

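import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
x = torch.randn(5, 4, 2)   # (batch=5, seq_len=4, input_size=2), since batch_first=True
h0 = torch.randn(1, 5, 3)  # (num_layers * num_directions=1, batch=5, hidden_size=3); batch_first does not apply here
c0 = torch.randn(1, 5, 3)
output, (hn, cn) = lstm(x, (h0, c0))  # output: (5, 4, 3); hn, cn: (1, 5, 3)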
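A quick check of the h_n relationship, as a minimal sketch with the same sizes (the seed is an arbitrary choice for reproducibility): for a single-layer, unidirectional LSTM with batch_first=True, h_n[-1] (shape (batch, hidden_size)) matches the last time step of output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
x = torch.randn(5, 4, 2)              # (batch, seq_len, input_size)
output, (hn, cn) = lstm(x)            # h0/c0 default to zeros when omitted

print(output.shape)  # torch.Size([5, 4, 3])
print(hn.shape)      # torch.Size([1, 5, 3])
# hn[-1] is the final hidden state of the (only) layer: (batch, hidden_size)
last_step = output[:, -1, :]
print(torch.allclose(hn[-1], last_step))  # True
```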

With two layers

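With multiple layers, output still holds only the last (top) layer's hidden state at every time step, while h_n keeps the final hidden state of every layer.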
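The two-layer case can be checked the same way. This sketch assumes the same toy sizes as above plus num_layers=2: output keeps only the top layer, h_n gets one entry per layer, and the top layer's entry matches output's last step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm2 = nn.LSTM(input_size=2, hidden_size=3, num_layers=2, batch_first=True)
x = torch.randn(5, 4, 2)                 # (batch, seq_len, input_size)
output, (hn, cn) = lstm2(x)

print(output.shape)  # torch.Size([5, 4, 3]): only the top layer, every time step
print(hn.shape)      # torch.Size([2, 5, 3]): final step, one entry per layer
# hn[0] is layer 0's final state, hn[1] (== hn[-1]) is the top layer's
print(torch.allclose(hn[-1], output[:, -1, :]))  # True
```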

If the LSTM is bidirectional

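With a bidirectional LSTM, output concatenates the two directions along the last dimension. The backward direction runs in reverse time order, so its final hidden state lines up with the first time step of output, not the last.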
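To make "the direction is reversed" concrete, here is a minimal sketch with toy sizes (chosen for illustration only). The first hidden_size features of output belong to the forward direction and the rest to the backward one; h_n[0] matches the forward half at the last step, while h_n[1] matches the backward half at step 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 3
bilstm = nn.LSTM(input_size=2, hidden_size=H, batch_first=True, bidirectional=True)
x = torch.randn(5, 4, 2)                 # (batch, seq_len, input_size)
output, (hn, cn) = bilstm(x)

print(output.shape)  # torch.Size([5, 4, 6]): [forward | backward] on the last dim
print(hn.shape)      # torch.Size([2, 5, 3]): hn[0] forward, hn[1] backward

forward_out = output[:, :, :H]
backward_out = output[:, :, H:]
# the forward direction finishes at the last time step...
print(torch.allclose(hn[0], forward_out[:, -1, :]))  # True
# ...while the backward direction finishes at time step 0
print(torch.allclose(hn[1], backward_out[:, 0, :]))  # True
```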

Bidirectional LSTM (BiLSTM)

It is simple: just pass bidirectional=True.

import torch
from torch import nn
lstm = nn.LSTM(input_size=512, hidden_size=256, num_layers=2, batch_first=True, bidirectional=True)
print(lstm)
x = torch.randn(40, 25, 512)  # (batch, seq_len, input_size)
out, (h_n, c_n) = lstm(x)
print(out.shape, h_n.shape, c_n.shape)
# torch.Size([40, 25, 512]) torch.Size([4, 40, 256]) torch.Size([4, 40, 256])

One thing to note: with a bidirectional RNN, the last dimension of output is 2*hidden_size.

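The last dimension of h_n and c_n stays hidden_size, but their first dimension changes: it is num_layers, multiplied by 2 in the bidirectional case (num_layers * num_directions).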
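Because that first dimension mixes layers and directions, it helps to split it. The PyTorch docs describe separating h_n with view(num_layers, num_directions, batch, hidden_size); the sketch below does that for the 2-layer BiLSTM sizes used above. Concatenating both directions of the top layer is just one common choice of sequence representation, not the only one.

```python
import torch
from torch import nn

num_layers, hidden_size, batch = 2, 256, 40
lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)
x = torch.randn(batch, 25, 512)
out, (h_n, c_n) = lstm(x)

print(h_n.shape)  # torch.Size([4, 40, 256]) = (num_layers * 2, batch, hidden_size)
# split the first dim into (layer, direction); PyTorch stores it layer-major
h_split = h_n.view(num_layers, 2, batch, hidden_size)
top_forward, top_backward = h_split[-1, 0], h_split[-1, 1]   # each (batch, hidden_size)
# one common sequence representation: concat both directions of the top layer
sentence_vec = torch.cat([top_forward, top_backward], dim=-1)
print(sentence_vec.shape)  # torch.Size([40, 512])
```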

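With a Bi-LSTM the parameter count doubles for a single layer; with num_layers > 1 it grows by more than 2x, because every layer above the first receives 2*hidden_size inputs.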

from torch import nn
def print_params(model):
    total_params = sum(p.numel() for p in model.parameters())
    print(f'{total_params:,} total parameters.')
    print(f'{total_params/1e6:.2f}M total parameters.')  # M = million
lstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True, num_layers=2)
lstm_bi = nn.LSTM(input_size=512, hidden_size=256, batch_first=True, bidirectional=True, num_layers=2)
for model in [lstm, lstm_bi]:
    print_params(model)
# 1,314,816 total parameters. / 1.31M total parameters.
# 3,153,920 total parameters. / 3.15M total parameters.
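A closed-form count makes the Bi-LSTM comparison precise. Each layer and direction has weight_ih (4*hidden_size x layer_input), weight_hh (4*hidden_size x hidden_size) and two bias vectors of length 4*hidden_size. A single layer therefore exactly doubles, while for num_layers=2 the ratio is higher because the second layer's input becomes 2*hidden_size. lstm_param_count is a hypothetical helper written for this sketch, and it assumes the default proj_size=0.

```python
from torch import nn

def lstm_param_count(input_size, hidden_size, num_layers=1, bidirectional=False, bias=True):
    """Hypothetical helper: closed-form parameter count for nn.LSTM (assumes proj_size=0)."""
    num_directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # layer 0 sees the raw input; deeper layers see the concatenated directions
        layer_in = input_size if layer == 0 else hidden_size * num_directions
        per_direction = 4 * hidden_size * (layer_in + hidden_size)  # weight_ih + weight_hh
        if bias:
            per_direction += 2 * 4 * hidden_size                    # bias_ih + bias_hh
        total += per_direction * num_directions
    return total

for bi in (False, True):
    model = nn.LSTM(input_size=512, hidden_size=256, num_layers=2,
                    batch_first=True, bidirectional=bi)
    actual = sum(p.numel() for p in model.parameters())
    print(bi, lstm_param_count(512, 256, 2, bi), actual)
# False 1314816 1314816
# True 3153920 3153920
```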

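import torch
from torch import nn
print('one layer lstm')
cell = nn.LSTMCell(input_size=100, hidden_size=20)
h = torch.zeros(3, 20)       # (batch, hidden_size)
c = torch.zeros(3, 20)
x = torch.randn(10, 3, 100)  # (seq_len, batch, input_size)
for xt in x:                 # one call per time step; xt: (batch, input_size)
    h, c = cell(xt, (h, c))
print('h.shape: ', h.shape)  # torch.Size([3, 20])
print('c.shape: ', c.shape)  # torch.Size([3, 20])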
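The manual LSTMCell loop computes the same thing as a single-layer nn.LSTM. As a sketch of that equivalence (sizes follow the one-layer example; the seed and the 1e-5 tolerance are choices made here), copy nn.LSTM's layer-0 parameters into the cell and compare both the per-step outputs and the final states.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=100, hidden_size=20)     # seq-first: (seq_len, batch, input_size)
cell = nn.LSTMCell(input_size=100, hidden_size=20)

# copy layer-0 parameters of nn.LSTM into the cell (same shapes, same gate order)
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(10, 3, 100)
output, (h_n, c_n) = lstm(x)

h = torch.zeros(3, 20)
c = torch.zeros(3, 20)
steps = []
for xt in x:                         # manual loop over the 10 time steps
    h, c = cell(xt, (h, c))
    steps.append(h)
manual_output = torch.stack(steps)   # (seq_len, batch, hidden_size)

print(torch.allclose(manual_output, output, atol=1e-5))  # True
print(torch.allclose(h, h_n[0], atol=1e-5), torch.allclose(c, c_n[0], atol=1e-5))  # True True
```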

import torch
from torch import nn
x = torch.randn(10, 3, 100)  # (seq_len, batch, input_size)
print('two layer lstm')
cell1 = nn.LSTMCell(input_size=100, hidden_size=30)
cell2 = nn.LSTMCell(input_size=30, hidden_size=20)  # layer 2 consumes layer 1's h, so input_size=30
h1 = torch.zeros(3, 30)
c1 = torch.zeros(3, 30)
h2 = torch.zeros(3, 20)
c2 = torch.zeros(3, 20)
for xt in x:
    h1, c1 = cell1(xt, (h1, c1))
    h2, c2 = cell2(h1, (h2, c2))
print('h.shape: ', h2.shape)  # torch.Size([3, 20])
print('c.shape: ', c2.shape)  # torch.Size([3, 20])
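The two-cell loop generalizes to any depth. The sketch below wraps it in a module; StackedLSTMCells is a hypothetical name, and unlike nn.LSTM it allows a different hidden size per layer, as in the 30/20 example. It returns the top layer's h at every step (the analogue of output) and the final (h, c) of each layer.

```python
import torch
from torch import nn

class StackedLSTMCells(nn.Module):
    """Hypothetical helper: stacks LSTMCells with per-layer hidden sizes."""
    def __init__(self, input_size, hidden_sizes):
        super().__init__()
        in_sizes = [input_size] + hidden_sizes[:-1]   # layer k's input is layer k-1's h
        self.cells = nn.ModuleList(
            [nn.LSTMCell(i, h) for i, h in zip(in_sizes, hidden_sizes)])

    def forward(self, x):                   # x: (seq_len, batch, input_size)
        batch = x.size(1)
        states = [(x.new_zeros(batch, cell.hidden_size), x.new_zeros(batch, cell.hidden_size))
                  for cell in self.cells]
        outputs = []
        for xt in x:                        # loop over time
            inp = xt
            for k, cell in enumerate(self.cells):   # loop over layers
                states[k] = cell(inp, states[k])
                inp = states[k][0]          # this layer's h feeds the next layer
            outputs.append(inp)
        return torch.stack(outputs), states  # top-layer h per step, final (h, c) per layer

model = StackedLSTMCells(100, [30, 20])
out, states = model(torch.randn(10, 3, 100))
print(out.shape)                                   # torch.Size([10, 3, 20])
print(states[-1][0].shape, states[-1][1].shape)    # torch.Size([3, 20]) torch.Size([3, 20])
```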
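PyTorch's `nn.LSTM` module is a class for building long short-term memory (LSTM) networks, a special type of recurrent neural network (RNN) that can learn `long-term dependencies` in sequence data.
LSTM networks are widely used in `time-series forecasting`, `natural language processing`, `speech recognition` and other fields. Below I briefly introduce the basic concepts of `nn.LSTM` and how to use it in PyTorch.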
CLASS torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[...
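The torch.nn.LSTMCell class is a single LSTM cell. Its equations are:
$$
\begin{array}{ll}
i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\
f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\
g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\
o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\
c' = f * c + i * g \\
h' = o * \tanh(c')
\end{array}
$$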
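The cell equations can be checked by hand. In this sketch (toy sizes and seed chosen for illustration) the four gates are computed in one matrix product: PyTorch stacks the gate weights in the order i, f, g, o inside weight_ih, weight_hh and the biases, so chunk(4) recovers them. The result is compared against nn.LSTMCell itself.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=4, hidden_size=3)
x = torch.randn(2, 4)          # (batch, input_size)
h = torch.randn(2, 3)          # (batch, hidden_size)
c = torch.randn(2, 3)

# weight_ih stacks [W_ii | W_if | W_ig | W_io] along dim 0; same for weight_hh and the biases
gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
i, f, g, o = gates.chunk(4, dim=1)
i, f, g, o = torch.sigmoid(i), torch.sigmoid(f), torch.tanh(g), torch.sigmoid(o)
c_new = f * c + i * g            # c' = f * c + i * g
h_new = o * torch.tanh(c_new)    # h' = o * tanh(c')

h_ref, c_ref = cell(x, (h, c))
print(torch.allclose(h_new, h_ref, atol=1e-6), torch.allclose(c_new, c_ref, atol=1e-6))  # True True
```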
>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
An RNN can be viewed as an ordinary network (such as a CNN) replicated many times along the time axis; ...
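from torch.autograd import Variable  # legacy: Variable is deprecated since PyTorch 0.4; plain tensors can be passed directly
rnn = nn.LSTM(10, 20, 2)  # build the model: input_size (input feature count), hidden_size (output feature count), num_layers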
input of shape (seq_len, batch, input_si...
1.1 Constructor
nn.LSTM can build a stack of several LSTM layers directly. The three constructor arguments are the same as for nn.RNN, namely:
[feature_len, hidden_len, num_layers]
where h...
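As usual, the official documentation first.
These are the parameters used when instantiating nn.LSTM. For example, lstm = nn.LSTM(10, 20, 2) creates an LSTM with input_size=10, hidden_size=20 and num_layers=2: the input dimension is 10, each hidden layer has 20 units, and there are 2 hidden layers in total.
How is an instantiated LSTM used? The inputs are listed below. h0 and c0 are optional; the key one is input, a tensor of input-sequence features with shape (seq_len, batch, input_size) when batch_first=False. For example, continuing the example above, x = to...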
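For each element of the input sequence, every LSTM layer computes the following:
$h_t$ is the hidden state at time $t$, $c_t$ is the cell state at time $t$, and $x_t$ is the previous layer's hidden state at time $t$ (or, for the first layer, the input at time $t$). $i_t$, $f_t$, $g_t$, $o_t$ are the input gate, forget gate, cell gate and output gate, respectively.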
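The per-step equations themselves did not survive extraction. For reference, here they are in the notation of the PyTorch documentation, using the symbols defined just above ($\sigma$ is the sigmoid and $\odot$ is element-wise multiplication):

```latex
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
h_t = o_t \odot \tanh(c_t)
\end{array}
```

These are the single-cell equations applied at every time step $t$, with $h_{t-1}, c_{t-1}$ playing the role of $h, c$.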
Parameters:
input_size – the input feature dimension (the length of the feature vector, e.g. 2048)
Equation (1), input gate
$i_t = \sigma(W_{ii}x_t + W_{hi}h_{t-1})$ (biases omitted); the input-gate parameters of the LSTM are $W_{ii}$ and $W_{hi}$
Equation (2), forget gate
f_t = σ(W_{if}x...
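num_layers: number of stacked LSTM layers. Default: 1
bias: if False, the layer uses no biases (b_ih and b_hh). Default: True
batch_first: if True, input and output tensors are laid out as (batch, seq, feature) instead of (seq, batch, feature); h_n and c_n are unaffected. Default: False
dropout: if non-zero, applies dropout to the outputs of every LSTM layer except the last. Default: 0
bidirectional: if True, the LSTM is bidirectional. Default: False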
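batch_first only changes the layout of input and output, not the computation. As a sketch (toy sizes, seed chosen here), two LSTMs sharing the same weights, one with batch_first=True and one with the default, give the same results on transposed inputs, and their h_n stays (num_layers, batch, hidden_size) in both cases.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm_bf = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
lstm_sf = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)   # default batch_first=False
lstm_sf.load_state_dict(lstm_bf.state_dict())                     # identical weights

x_bf = torch.randn(3, 5, 10)              # (batch=3, seq_len=5, input_size=10)
x_sf = x_bf.transpose(0, 1)               # (seq_len=5, batch=3, input_size=10)

out_bf, (h_bf, c_bf) = lstm_bf(x_bf)
out_sf, (h_sf, c_sf) = lstm_sf(x_sf)

print(out_bf.shape, out_sf.shape)  # torch.Size([3, 5, 20]) torch.Size([5, 3, 20])
print(torch.allclose(out_bf, out_sf.transpose(0, 1), atol=1e-6))  # True
# h_n / c_n are (num_layers, batch, hidden_size) either way: batch_first does not touch them
print(h_bf.shape, torch.allclose(h_bf, h_sf, atol=1e-6))  # torch.Size([2, 3, 20]) True
```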
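nn.LSTM in PyTorch has 7 main parameters (input_size, hidden_size, num_layers, bias, batch_first, dropout, bidirectional). Only input_size and hidden_size are required, since num_layers defaults to 1; PyTorch 1.8+ also adds proj_size.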
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stackin
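This post records some notes from using LSTM. The main reference is the official PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html?highlight=lstm#torch.nn.LSTM.
Contents: Multi-layer LSTM, Weight shapes, batch_first, Input shape, Output shape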
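Multi-layer LSTM
A multi-layer LSTM looks like this:
and not like this:
Apart from the biases, the weights above can be grouped into 3 kinds: U (input only), V (target output only) and W (between hidden layers). There is no target output here, however, so only...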
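input_size: number of input features
hidden_size: number of hidden-state features
num_layers: the number of LSTM layers in the model, i.e. how many LSTMs are stacked on top of each other; the default is 1
bias: whether to use biases; used by default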
bat...
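nn.LSTM(in_dim, hidden_dim, n_layer, batch_first=True): LSTM recurrent neural network
input_size: the number of features of the input matrix
hidden_size: the number of features of the output matrix
num_layers: how many LSTM layers are stacked, default 1
bias: True or False, whether to use biases
batch_first: True or False; by default nn.LSTM() expects input as (sequence length, batch, input dimension), which differs from the way we feed a CNN...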
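`nn.LSTM()` returns two parts: `output` and the final states `(h_n, c_n)`. `output` holds the top layer's hidden state at every time step and can be used, for example, to predict the next word or, via its last step, for classification. `h_n` and `c_n` hold the hidden and cell states of every layer at the last time step and carry the network's internal state, e.g. to initialize the next call.
Concretely, if the input sequence has length `seq_len`, each input word vector has dimension `input_size`, and the hidden layer has `hidden_size` units, then with the default `batch_first=False` and a unidirectional LSTM, `output` has shape `(seq_len, batch_size, hidden_size)`, where `batch_size` is the batch size of the input.
Note that `output` and the hidden state generally do not have the same shape: `output` is `(seq_len, batch_size, num_directions * hidden_size)`, while `h_n` and `c_n` are `(num_layers * num_directions, batch_size, hidden_size)`, where `num_directions` is 1 or 2 depending on whether the LSTM is bidirectional.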