Implementing Recurrent Neural Networks (RNNs) in PyTorch

2 years ago · From the column "Deep Learning Notes"

This article shows how to implement vanilla RNNs, stacked (multi-layer) RNNs, bidirectional RNNs, and stacked bidirectional RNNs in PyTorch. The focus is on the shapes and meanings of the input, the output, and the hidden state.

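Types of RNNs

RNNs are mainly used for sequence data such as time series and natural language (NLP). Depending on how many time steps the input and the output contain, RNN tasks roughly fall into the following categories.

  • For time-series data
  1. Forecasting: many-to-many or many-to-one
  2. Classification: many-to-one
  • For natural language processing
  1. Text classification: many-to-one
  2. Text generation: many-to-many
  3. Machine translation: many-to-many
  4. Named entity recognition: many-to-many
  5. Automatic image captioning: one-to-many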
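
As a rough illustration of the difference between many-to-many and many-to-one, the sketch below runs one nn.RNN and then either keeps the output at every time step or keeps only the last one. The sizes are arbitrary values chosen for the example, not something prescribed by the categories above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions made for this sketch only).
batch, seq_len, input_size, hidden_size = 4, 5, 1, 2

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
x = torch.randn(batch, seq_len, input_size)
out, h_n = rnn(x)  # out: (batch, seq_len, hidden_size)

# Many-to-many: keep the output at every time step.
many_to_many = out

# Many-to-one: keep only the output at the last time step.
many_to_one = out[:, -1, :]

print(many_to_many.shape)  # torch.Size([4, 5, 2])
print(many_to_one.shape)   # torch.Size([4, 2])
```

In practice a task-specific layer (for example nn.Linear) would usually be applied on top of either tensor.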

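Structure of stacked RNNs

Stacking several RNN layers is generally done to improve performance. Each layer takes the per-time-step outputs of the layer below it as its input sequence.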
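
One way to see what stacking means in nn.RNN is to rebuild a 2-layer RNN from two single-layer RNNs that share its weights. This is only a sanity-check sketch: the sizes are arbitrary, and the weight copying relies on PyTorch's parameter naming (weight_ih_l0, weight_ih_l1, and so on).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

stacked = nn.RNN(input_size=1, hidden_size=2, num_layers=2, batch_first=True)

# Two single-layer RNNs that reuse the stacked model's weights.
layer0 = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
layer1 = nn.RNN(input_size=2, hidden_size=2, batch_first=True)
with torch.no_grad():
    for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(layer0, f"{name}_l0").copy_(getattr(stacked, f"{name}_l0"))
        getattr(layer1, f"{name}_l0").copy_(getattr(stacked, f"{name}_l1"))

x = torch.randn(4, 5, 1)
out_stacked, h_stacked = stacked(x)
out0, h0 = layer0(x)     # layer 0 reads the raw input
out1, h1 = layer1(out0)  # layer 1 reads layer 0's per-step outputs

# The stacked output is the output of the top layer; h_n holds one entry per layer.
print(torch.allclose(out_stacked, out1, atol=1e-5))
print(torch.allclose(h_stacked, torch.cat([h0, h1], dim=0), atol=1e-5))
```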

Structure of bidirectional RNNs

A bidirectional RNN uses two RNNs: one reads the input sequence in its original order, and the other reads it in reverse order.
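
The "reverse order" part can be checked directly. The sketch below copies the backward-direction weights of a bidirectional nn.RNN into a plain unidirectional RNN and feeds it the flipped sequence. The sizes are arbitrary, and the `_reverse` suffix is PyTorch's naming for the backward-direction parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 2  # hidden size (illustrative)

birnn = nn.RNN(input_size=1, hidden_size=H, batch_first=True, bidirectional=True)

# A unidirectional RNN that reuses the weights of the backward direction.
back = nn.RNN(input_size=1, hidden_size=H, batch_first=True)
with torch.no_grad():
    for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(back, f"{name}_l0").copy_(getattr(birnn, f"{name}_l0_reverse"))

x = torch.randn(4, 5, 1)
out_bi, _ = birnn(x)

out_rev, _ = back(torch.flip(x, dims=[1]))  # feed the sequence in reverse order
out_rev = torch.flip(out_rev, dims=[1])     # re-align with the original time axis

# The last H features of the bidirectional output are the backward direction.
print(torch.allclose(out_bi[:, :, H:], out_rev, atol=1e-5))
```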

PyTorch code examples for each kind of RNN:

Import libraries

import torch
import torch.nn as nn

Task description

Time-series forecasting: use the data from 5 time steps (Sequence Length = 5) to predict the data for the next two time steps.

Input data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
print("Data:", data, "Data Shape:", data.shape, sep="\n")

Output:

Data:
tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])
Data Shape:
torch.Size([20])

Split it into one batch of 4 sequences, each 5 time steps long:

[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]]

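Batch Size = 4
Sequence Length = 5
Input Size = 1 (each time step in this example is a single scalar, so the feature dimension is 1)

Hidden Size = 2 (the feature dimension of the hidden state)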

INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
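
The task statement (5 steps in, 2 steps out) needs input/target pairs, which the non-overlapping split above does not build. One possible way to build them, shown here as an assumption rather than the author's method, is a sliding window with Tensor.unfold. The names IN_STEPS and OUT_STEPS are made up for this sketch.

```python
import torch

data = torch.arange(1, 21, dtype=torch.float32)  # 1..20, same values as in the article

IN_STEPS, OUT_STEPS = 5, 2  # hypothetical names: 5 steps in, 2 steps out

# Each row of `windows` holds 7 consecutive values; the window moves by 1 each time.
windows = data.unfold(0, IN_STEPS + OUT_STEPS, 1)  # (14, 7)
x = windows[:, :IN_STEPS].unsqueeze(-1)            # (14, 5, 1), RNN input with batch_first=True
y = windows[:, IN_STEPS:]                          # (14, 2), targets

print(x.shape, y.shape)
print(x[0].flatten(), y[0])  # tensor([1., 2., 3., 4., 5.]) tensor([6., 7.])
```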

Implementing vanilla RNNs in PyTorch

torch.nn.RNN takes two inputs:

  1. input: the input to the RNN, with shape (seq_len, batch, input_size). With batch_first=True the shape is (batch, seq_len, input_size) instead.
  2. h_0: the initial hidden state, with shape (num_layers * num_directions, batch, hidden_size). num_layers is the number of stacked RNN layers; num_directions = 2 for a bidirectional RNN and 1 for a unidirectional one. If h_0 is omitted, it defaults to zeros.

torch.nn.RNN returns two outputs:

  1. out: the output of the last RNN layer at every time step, with shape (seq_len, batch, num_directions * hidden_size). With batch_first=True the shape is (batch, seq_len, num_directions * hidden_size).
  2. h_n: the hidden state of every RNN layer at its final time step, with shape (num_layers * num_directions, batch, hidden_size). The shape of h_n is not affected by batch_first=True.
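
To make the h_0 shape concrete, here is a small sketch that passes an explicit all-zero h_0 and compares the result with the default call. The sizes mirror the article's constants; the variable names are only for this example. Note that h_0 keeps the (num_layers * num_directions, batch, hidden_size) layout even with batch_first=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
BATCH, SEQ, IN, HID, LAYERS = 4, 5, 1, 2, 1

rnn = nn.RNN(input_size=IN, hidden_size=HID, num_layers=LAYERS, batch_first=True)
x = torch.randn(BATCH, SEQ, IN)

num_directions = 2 if rnn.bidirectional else 1
# h_0 is never batch-first: (num_layers * num_directions, batch, hidden_size).
h_0 = torch.zeros(LAYERS * num_directions, BATCH, HID)

out_default, h_default = rnn(x)        # h_0 omitted, zeros are used
out_explicit, h_explicit = rnn(x, h_0)  # h_0 passed explicitly

print(h_0.shape)  # torch.Size([1, 4, 2])
print(torch.allclose(out_default, out_explicit))
```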

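The figure at this point in the original post shows the forward pass of an LSTM with batch size = 1. An LSTM carries two states, (h, c), whereas an RNN and a GRU each carry only one hidden state, h.

To emphasize:

out is the output of the last RNN layer at every time step.
h_n is the hidden state of every RNN layer at its final time step.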
# Initialize the RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)

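The shapes of the input, the output, and the hidden state are:

input shape = [4, 5, 1]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 1 is the feature dimension at each time step.
out shape = [4, 5, 2]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 2 is the hidden size, i.e. the dimension of the hidden state at each time step.
h_n shape = [1, 4, 2]: 1 = num_layers * num_directions (one layer, one direction), and only the final time step is kept; 4 is the batch size; 2 is the hidden size.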
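
To tie these shapes to the actual computation, the sketch below replays the recurrence h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh) by hand, using the weights stored in the module, and compares it with what nn.RNN returns. It assumes the default tanh nonlinearity and a zero initial state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=2, num_layers=1, batch_first=True)
x = torch.arange(1, 21, dtype=torch.float32).view(4, 5, 1)
out, h_n = rnn(x)

W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0  # shapes (2, 1) and (2, 2)
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0      # shapes (2,) and (2,)

h = torch.zeros(4, 2)  # default initial hidden state: zeros
steps = []
for t in range(5):
    # One recurrence step for the whole batch.
    h = torch.tanh(x[:, t, :] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
    steps.append(h)
manual_out = torch.stack(steps, dim=1)  # (4, 5, 2), same layout as out

print(torch.allclose(out, manual_out, atol=1e-5))
# For a single-layer unidirectional RNN, h_n is just out at the last time step.
print(torch.allclose(h_n[0], out[:, -1, :]))
```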

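Implementing bidirectional RNNs in PyTorch

Compared with the vanilla RNN, a bidirectional RNN only requires setting bidirectional=True when instantiating the model.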

rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
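input shape = [4, 5, 1]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 1 is the feature dimension at each time step.
out shape = [4, 5, 4]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 4 = 2 * 2 because there are 2 directions (forward and backward), each with a hidden size of 2.
h_n shape = [2, 4, 2]: 2 = 1 * 2 is one layer times 2 directions (forward and backward), each contributing its final hidden state; 4 is the batch size; 2 is the hidden size. Note that the backward direction finishes at the first time step of the sequence.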
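
The relationship between out and h_n in the bidirectional case is easy to get wrong, so here is a short check. It reuses the article's data and sizes; H is just a local name for the hidden size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 2
rnn = nn.RNN(input_size=1, hidden_size=H, batch_first=True, bidirectional=True)
x = torch.arange(1, 21, dtype=torch.float32).view(4, 5, 1)
out, h_n = rnn(x)

# Forward direction: its final state sits at the LAST time step, first H features.
print(torch.allclose(h_n[0], out[:, -1, :H]))
# Backward direction: its final state sits at the FIRST time step, last H features.
print(torch.allclose(h_n[1], out[:, 0, H:]))
```

This is why out[:, -1, :] alone is not "the final state" of a bidirectional RNN: its backward half has only seen one time step.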

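Separating the results of the two directions of a bidirectional RNN (BiRNN)

As the code below shows, the forward and backward parts of out and h_n can be separated. When doing this, always check whether batch_first=True was set when the model was instantiated, so that each dimension keeps a consistent meaning.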

#out 
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
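
As an alternative to view, the two directions of out can also be split along the last dimension with torch.chunk. The sketch below is not from the original article; it simply checks that both approaches give the same tensors for a batch_first=True model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, H = 4, 5, 2  # batch size, sequence length, hidden size
rnn = nn.RNN(input_size=1, hidden_size=H, batch_first=True, bidirectional=True)
out, h_n = rnn(torch.randn(B, T, 1))

# view-based split, as in the article
out_view = out.view(B, T, 2, H)

# chunk-based split: cut the last dimension into two equal halves
out_fwd, out_bwd = torch.chunk(out, 2, dim=-1)

print(torch.equal(out_view[:, :, 0, :], out_fwd))
print(torch.equal(out_view[:, :, 1, :], out_bwd))
```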

Implementing stacked bidirectional RNNs in PyTorch

Just set bidirectional=True and num_layers=3 when instantiating the model.

rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
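input shape = [4, 5, 1]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 1 is the feature dimension at each time step.
out shape = [4, 5, 4]: 4 is the batch size, 5 is the number of time steps (SEQ_LENGTH), and 4 = 2 * 2 because the last BiRNN layer has 2 directions (forward and backward), each with a hidden size of 2.
h_n shape = [6, 4, 2]: 6 = 3 * 2 is 3 BiRNN layers times 2 directions (forward and backward), each contributing its final hidden state; 4 is the batch size; 2 is the hidden size.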

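Separating the results of a stacked bidirectional RNN (BiRNN)

As the code below shows, the forward and backward parts of out and h_n can be separated. When doing this, always check whether batch_first=True was set when the model was instantiated, so that each dimension keeps a consistent meaning.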

#out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
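
A common use of the separated h_n is to build one fixed-size summary per sequence from the last layer only. The sketch below is an illustration under that assumption, not part of the original article: it concatenates the final forward and backward states of the top layer and checks them against out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, H, L = 4, 5, 2, 3  # batch size, sequence length, hidden size, layers
rnn = nn.RNN(input_size=1, hidden_size=H, num_layers=L,
             batch_first=True, bidirectional=True)
out, h_n = rnn(torch.randn(B, T, 1))

h_n_layers = h_n.view(L, 2, B, H)  # (num_layers, num_directions, batch, hidden)
last_fwd = h_n_layers[-1, 0]       # top layer, forward direction (same as h_n[-2])
last_bwd = h_n_layers[-1, 1]       # top layer, backward direction (same as h_n[-1])

# One vector of size 2 * H per sequence.
summary = torch.cat([last_fwd, last_bwd], dim=1)

print(summary.shape)  # torch.Size([4, 4])
print(torch.allclose(last_fwd, out[:, -1, :H]))
print(torch.allclose(last_bwd, out[:, 0, H:]))
```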

Complete code for the article

#%%
import torch
import torch.nn as nn
data = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
INPUT_SIZE = 1
SEQ_LENGTH = 5
HIDDEN_SIZE = 2
NUM_LAYERS = 1  # single-layer RNN
BATCH_SIZE = 4
#%% RNN.
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers = 1, batch_first=True)
# input size : (batch, seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
print('input size:',inputs.shape)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('out size:',out.shape)
print('h_n size:',h_n.shape)
#%% BiRNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 1, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
print('Input: ', inputs.shape, '\n', inputs)
print('\nOutput: ', out.shape, '\n', out)
print('\nHidden: ', h_n.shape, '\n', h_n)
#%% BiRNN separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(1, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)
#%% Stacked Bidirectional RNN
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, batch_first=True, num_layers = 3, bidirectional = True)
# input size : (batch_size , seq_len, input_size)
inputs = data.view(BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE)
# out shape = (batch, seq_len, num_directions * hidden_size)
# h_n shape  = (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(inputs)
#%% Stacked BiRNN Separated out
out_reshaped = out.view(BATCH_SIZE, SEQ_LENGTH, 2, HIDDEN_SIZE)
print("Shape of the output after directions are separated: ", out_reshaped.shape)
out_forward = out_reshaped[:, :, 0, :]
out_backward = out_reshaped[:, :, 1, :]
print("Forward output: ", out_forward.shape, "\n", out_forward)
print("\n\nBackward output: ", out_backward.shape, "\n", out_backward)
h_n_reshaped = h_n.view(3, 2, BATCH_SIZE, HIDDEN_SIZE)
print("Shape of the hidden after directions are separated: ", h_n_reshaped.shape)
h_n_forward = h_n_reshaped[:, 0, :, :]
h_n_backward = h_n_reshaped[:, 1, :, :]
print("Forward h_n: ", h_n_forward.shape, "\n", h_n_forward)
print("\n\nBackward h_n: ", h_n_backward.shape, "\n", h_n_backward)


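That's all, thanks for reading!

Link to the original English article:


Addendum: for a bidirectional LSTM, how to take the last-step result of each direction from output and from the hidden state.

Source: "pytorch 中LSTM模型获取最后一层的输出结果,单向或双向" (Getting the last-layer output of an LSTM model in PyTorch, unidirectional or bidirectional)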

import torch.nn as nn
import torch
seq_len = 20
batch_size = 64
embedding_dim = 100
num_embeddings = 300
hidden_size = 128
number_layer = 3
input = torch.randint(low=0,high=256,size=[batch_size,seq_len])  #[64,20]
embedding = nn.Embedding(num_embeddings,embedding_dim)
input_embeded = embedding(input)  #[64,20,100]
# Transpose to swap batch_size and seq_len (not needed here because batch_first=True)
# input_embeded = input_embeded.transpose(0,1)
# input_embeded = input_embeded.permute(1,0,2)
# Instantiate the LSTM
lstm = nn.LSTM(input_size=embedding_dim,hidden_size=hidden_size,batch_first=True,num_layers=number_layer,bidirectional=True)
output,(h_n,c_n) = lstm(input_embeded)
print(output.size()) #[64,20,128*2]       [batch_size,seq_len,hidden_size*2]
print(h_n.size()) #[3*2,64,128]           [number_layer*2,batch_size,hidden_size]
print(c_n.size()) # same shape as h_n
# Last output of the backward direction: time step 0, last hidden_size features
output_last = output[:,0,-128:]
# h_n of the backward direction of the last layer
h_n_last = h_n[-1]
print(output_last.size())
print(h_n_last.size())
# The last backward output equals the backward h_n of the last layer
print(output_last.eq(h_n_last))
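
The snippet above only extracts the backward direction. A matching sketch for both directions is given below. It uses smaller, made-up sizes and random inputs instead of the embedding layer to keep the example short; the indexing is the part that matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Smaller illustrative sizes than in the snippet above.
batch_size, seq_len, embedding_dim, hidden_size, num_layers = 8, 20, 16, 32, 3

lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True, bidirectional=True)
x = torch.randn(batch_size, seq_len, embedding_dim)
output, (h_n, c_n) = lstm(x)

# Forward direction: last time step, first hidden_size features; matches h_n[-2].
forward_last = output[:, -1, :hidden_size]
# Backward direction: first time step, last hidden_size features; matches h_n[-1].
backward_last = output[:, 0, hidden_size:]

print(torch.allclose(forward_last, h_n[-2]))
print(torch.allclose(backward_last, h_n[-1]))
```

torch.allclose returns a single boolean, which is easier to read than the element-wise tensor printed by eq.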