Keras implementation of an attention-based BiGRU (ithicker's blog)

from tensorflow.python.keras.layers import Input, GRU, Dense, Concatenate, TimeDistributed, Bidirectional
from tensorflow.python.keras.models import Model
from layers.attention import AttentionLayer
def define_nmt(hidden_size, batch_size, en_timesteps, en_vsize, fr_timesteps, fr_vsize):
    """ Defining a NMT model """
    # Define an input sequence and process it.
    if batch_size:
        encoder_inputs = Input(batch_shape=(batch_size, en_timesteps, en_vsize), name='encoder_inputs')
        decoder_inputs = Input(batch_shape=(batch_size, fr_timesteps - 1, fr_vsize), name='decoder_inputs')
    else:
        encoder_inputs = Input(shape=(en_timesteps, en_vsize), name='encoder_inputs')
        decoder_inputs = Input(shape=(fr_timesteps - 1, fr_vsize), name='decoder_inputs')
    # Encoder GRU
    encoder_gru = Bidirectional(GRU(hidden_size, return_sequences=True, return_state=True, name='encoder_gru'), name='bidirectional_encoder')
    encoder_out, encoder_fwd_state, encoder_back_state = encoder_gru(encoder_inputs)
    # Set up the decoder GRU, using the concatenated forward/backward encoder states as its initial state.
    decoder_gru = GRU(hidden_size*2, return_sequences=True, return_state=True, name='decoder_gru')
    decoder_out, decoder_state = decoder_gru(
        decoder_inputs, initial_state=Concatenate(axis=-1)([encoder_fwd_state, encoder_back_state]))
    # Attention layer
    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([encoder_out, decoder_out])
    # Concat attention output and decoder GRU output
    decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_out, attn_out])
    # Dense layer
    dense = Dense(fr_vsize, activation='softmax', name='softmax_layer')
    dense_time = TimeDistributed(dense, name='time_distributed_layer')
    decoder_pred = dense_time(decoder_concat_input)
    # Full model
    full_model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_pred)
    full_model.compile(optimizer='adam', loss='categorical_crossentropy')
    full_model.summary()
    """ Inference model """
    batch_size = 1
    """ Encoder (Inference) model """
    encoder_inf_inputs = Input(batch_shape=(batch_size, en_timesteps, en_vsize), name='encoder_inf_inputs')
    encoder_inf_out, encoder_inf_fwd_state, encoder_inf_back_state = encoder_gru(encoder_inf_inputs)
    encoder_model = Model(inputs=encoder_inf_inputs, outputs=[encoder_inf_out, encoder_inf_fwd_state, encoder_inf_back_state])
    """ Decoder (Inference) model """
    decoder_inf_inputs = Input(batch_shape=(batch_size, 1, fr_vsize), name='decoder_word_inputs')
    encoder_inf_states = Input(batch_shape=(batch_size, en_timesteps, 2*hidden_size), name='encoder_inf_states')
    decoder_init_state = Input(batch_shape=(batch_size, 2*hidden_size), name='decoder_init')
    decoder_inf_out, decoder_inf_state = decoder_gru(
        decoder_inf_inputs, initial_state=decoder_init_state)
    attn_inf_out, attn_inf_states = attn_layer([encoder_inf_states, decoder_inf_out])
    decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_inf_out, attn_inf_out])
    decoder_inf_pred = TimeDistributed(dense)(decoder_inf_concat)
    decoder_model = Model(inputs=[encoder_inf_states, decoder_init_state, decoder_inf_inputs],
                          outputs=[decoder_inf_pred, attn_inf_states, decoder_inf_state])
    return full_model, encoder_model, decoder_model
if __name__ == '__main__':
    """ Checking nmt model for toy examples """
    define_nmt(64, None, 20, 30, 20, 20)
GitHub link: https://github.com/Razzaghnoori/mt_biGRU_attention_keras/blob/master/model.py
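To make the tensor bookkeeping in `define_nmt` easier to follow, here is a small sketch that computes the per-sample shapes (batch axis omitted) at each stage of the training graph. The helper `nmt_shapes` is hypothetical and not part of the repository. It also assumes that the attention layer returns one context vector per decoder step with the same width as the encoder output, plus the attention weights themselves; that matches how the outputs are used in the code, but the repository's `AttentionLayer` is the authority.

```python
def nmt_shapes(hidden_size, en_timesteps, en_vsize, fr_timesteps, fr_vsize):
    """Per-sample tensor shapes (batch axis omitted) in the training graph."""
    enc_dim = 2 * hidden_size      # forward and backward GRU outputs, concatenated
    dec_steps = fr_timesteps - 1   # the code feeds fr_timesteps - 1 decoder steps
    return {
        "encoder_inputs": (en_timesteps, en_vsize),
        "encoder_out": (en_timesteps, enc_dim),
        "decoder_init_state": (enc_dim,),
        "decoder_out": (dec_steps, enc_dim),
        # Assumption: one context vector per decoder step, plus the weights.
        "attn_out": (dec_steps, enc_dim),
        "attn_states": (dec_steps, en_timesteps),
        "decoder_concat_input": (dec_steps, 2 * enc_dim),
        "decoder_pred": (dec_steps, fr_vsize),
    }


# Same hyperparameters as the toy call define_nmt(64, None, 20, 30, 20, 20).
shapes = nmt_shapes(64, 20, 30, 20, 20)
for name, shape in shapes.items():
    print(f"{name:22s} {shape}")
```

This also shows why the decoder GRU uses `hidden_size*2` units: its initial state is the concatenation of the two encoder states, so the widths have to match.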
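The IMDB movie review sentiment dataset contains 25,000 reviews from IMDB, each labeled as positive or negative. The reviews have already been preprocessed into sequences of word indices. For convenience, a word's index reflects how frequently it occurs in the dataset; for example, the integer 3 encodes the third most frequent word.
By convention, 0 does not stand for any specific word; it encodes any unknown word.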
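The indexing convention described above can be illustrated with a toy corpus. This is only a sketch of the idea (rank by frequency, 0 for unknown words), not the dataset's actual preprocessing code; the helper names `build_index` and `encode` and the sample reviews are made up for the example.

```python
from collections import Counter


def build_index(tokenized_reviews):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    # most_common() sorts by descending count; ties keep first-seen order.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(review, index):
    """Replace words by their rank; words missing from the index become 0."""
    return [index.get(word, 0) for word in review]


reviews = [
    ["the", "movie", "was", "great"],
    ["the", "movie", "was", "the", "worst"],
    ["the", "plot", "was", "thin"],
]
index = build_index(reviews)
print(index["the"], index["was"], index["movie"])      # 1 2 3
print(encode(["the", "ending", "was", "great"], index))  # [1, 0, 2, 4]
```

Here "ending" never appears in the toy corpus, so it is encoded as 0.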
$ python imdb_attention.py
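Results table (only partly recoverable). Columns: training time (per epoch), validation accuracy, validation loss, epochs required. Surviving values, in column order, from a row whose label is missing: validation accuracy 0.8339, validation loss 0.3815. A further row labeled "Bidirectional LSTM" has lost its values.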
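AI project practice, self-attention: automatic relation extraction with BiGRU + an attention mechanism
The dataset is a publicly available online dataset (relation2id, test, and train in the origin_data folder).
vec.txt: Google's public word-vector set
The model follows the two papers in the paper folder (Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Neural Relation Extraction with Selective Attention over Instances); summaries of both are also included.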
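This code is a hands-on attention example based on Keras. Environment: Win10, CPU i7-6700, PyCharm 2018, Python 3.6, numpy 1.14.5, Keras 2.0.2, Matplotlib 2.2.2. The editor has tested it personally and it runs; it suits beginners who want to understand the attention mechanism from the code.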
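3. Related techniques
Compared with LSTM, a GRU achieves comparable results with little loss of accuracy, and it is easier to train, which can considerably improve training efficiency. GRU is therefore often preferred when hardware resources are limited.
GRU structure diagram (figure not preserved in this copy).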
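One concrete reason a GRU is cheaper to train is that it has three gate blocks where an LSTM has four, so a layer of the same width has about three quarters of the parameters. The sketch below counts weights for the textbook formulation with one bias vector per gate block; the function names are illustrative, and some implementations keep separate input and recurrent biases for the GRU, which adds `3 * units` parameters.

```python
def lstm_params(input_dim, units):
    """4 gate blocks: input, forget, cell candidate, output."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units):
    """3 gate blocks: update, reset, candidate (one bias vector each)."""
    return 3 * (units * (input_dim + units) + units)


input_dim, units = 100, 128
print(lstm_params(input_dim, units))  # 117248
print(gru_params(input_dim, units))   # 87936
```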
4. Complete code and steps
The code depends on the following environment:
tensorflow==2.5.0
numpy==1.19.5
keras==2.6.0
matplotlib==3.5.2
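A quick way to compare an environment against these pins is to query the installed package metadata. The following is a minimal sketch using only the standard library; the `PINNED` dictionary simply repeats the list above, and `check_versions` is a hypothetical helper name.

```python
from importlib import metadata

PINNED = {
    "tensorflow": "2.5.0",
    "numpy": "1.19.5",
    "keras": "2.6.0",
    "matplotlib": "3.5.2",
}


def check_versions(pinned):
    """Return {package: (pinned, installed_or_None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name:12s} wanted {wanted:8s} "
          f"installed {installed or 'not installed':14s} {status}")
```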
————————————————
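Copyright notice: this is an original article by CSDN blogger 「AI信仰者」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/qq_30803353/article/details/129108978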
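A Keras solution for Chinese named entity recognition. Supported downstream models are BiLSTM-CRF, BiGRU-CRF, IDCNN-CRF, and single-CRF; the pretrained language models come from the BERT family (Google's pretrained language models: BERT, RoBERTa, and ALBERT are supported). If it helps you, a Star is welcome~
For the Chinese documentation, see: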
Future: project currently under migration […] tensorflow […] which […]
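A short-term power load forecasting method based on an Attention-augmented convolutional neural network (CNN) and GRU (gated recurrent unit) is proposed. The method takes historical load data as input and builds a CNN architecture, composed of one-dimensional convolution layers, pooling layers, and so on, to extract high-dimensional features that reflect the complex dynamic changes of the load. The extracted feature vectors are arranged as a time series and fed to the GRU network, which models the internal dynamics of the features. An Attention mechanism is then introduced which, through mapping-based weighting and a learned parameter matrix, assigns the GRU hidden […]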
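Multivariate regression prediction | Matlab implementation based on a bidirectional gated recurrent unit (BiGRU); BiGRU regression prediction with a multiple-input, single-output model.
Evaluation metrics include MAE, MSE, RMSE, and MAPE. The code is described as very high quality and easy to learn from and to adapt to other data. Requires Matlab 2020 or later.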
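For reference, the four metrics named above can be computed as follows. The snippet is a plain-Python sketch rather than the Matlab code of that project; `regression_metrics` is an illustrative name, and MAPE is reported in percent under the assumption that no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and MAPE (percent) for two equal-length lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when a true value is 0; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With errors of 10, -10, and 0, MAE is 20/3, MSE is 200/3, and MAPE is 5%.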
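Two weekends ago, I decided to take another shot at person-relation classification.
Why "another"? Because I had already written an article, NLP (21): A Hands-On Attempt at Person Relation Extraction. At that time there were about 2,900 labeled samples, and the model was fairly simple: a BERT+Bi-GRU+Attention+FC structure in which BERT served as a feature extractor. That model reached an F1 of 79% on the original dataset.
After a year of on-and-off effort, the number of labeled samples has grown to more than 3,900. Since I have already done BERT fine-tuning work, I naturally wanted to try again on this dataset.
The current person-relation dataset has a little over 3,900 samples, distributed as follows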
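```python
from keras.layers import Input, Embedding, LSTM, Dense, Dot, Flatten, Softmax
from keras.models import Model
from keras.optimizers import Adam
```
Then, we define our model:
```python
# Parameters
vocab_size = 10000  # assumed vocabulary size (not given in the original); match your tokenizer
max_sequence_length = 100
embedding_dim = 100
lstm_units = 128
attention_dim = 50
output_dim = 1
# Input: a padded sequence of integer word indices
input_sequences = Input(shape=(max_sequence_length,), dtype='int32')
# Embedding layer: turns the integer sequence into dense vectors.
# mask_zero is not set, because the attention layers below do not handle
# Keras masks; padded steps therefore also receive some attention weight.
x = Embedding(input_dim=vocab_size + 1,
              output_dim=embedding_dim,
              name='Embedding')(input_sequences)
# LSTM layer: one hidden state per time step, shape (batch, steps, lstm_units)
lstm = LSTM(units=lstm_units,
            return_sequences=True,
            name='LSTM')(x)
# Attention: score every time step, then normalize the scores over time
attention = Dense(units=attention_dim, activation='tanh', name='Attention')(lstm)
attention = Dense(units=1, name='Score')(attention)        # (batch, steps, 1)
attention = Softmax(axis=1, name='Softmax')(attention)     # weights sum to 1 over steps
# Weighted sum of the LSTM outputs using the attention weights
context = Dot(axes=1, name='Context')([attention, lstm])   # (batch, 1, lstm_units)
context = Flatten(name='Flatten')(context)                 # (batch, lstm_units)
# Final output: sigmoid activation for binary classification
output = Dense(units=output_dim, activation='sigmoid', name='Output')(context)
# Define the model
model = Model(inputs=input_sequences, outputs=output)
# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```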
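This code defines a model with an embedding layer, an LSTM layer, an attention layer, and an output layer. The embedding layer converts the input integer sequence into dense vectors. The LSTM layer processes the sequence. The attention layer assigns a weight to each time step. The output layer uses a sigmoid activation for binary classification. Finally, the model is compiled with the Adam optimizer, binary cross-entropy loss, and accuracy as the evaluation metric, so it is ready for training.
Note: the code above is one common way to implement LSTM attention; other implementations are possible.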
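To see what the attention step does numerically, the following plain-Python sketch walks through it for three time steps with two hidden units each. It is a simplified illustration, not the Keras layers themselves: the scoring uses a fixed `score_vector` in place of learned Dense weights, and the final sigmoid uses unit weights, both chosen only for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, score_vector):
    """Weight each time step by softmax(tanh(h) . v) and sum the states."""
    scores = [sum(math.tanh(h_i) * v_i for h_i, v_i in zip(h, score_vector))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.3, -0.2]]  # 3 steps, 2 units
score_vector = [1.0, 1.0]                              # hypothetical parameters
weights, context = attention_pool(hidden_states, score_vector)
probability = 1.0 / (1.0 + math.exp(-sum(context)))    # sigmoid output unit
print(weights, context, probability)
```

The second time step has the largest activations, so it receives the largest weight and dominates the context vector that feeds the sigmoid.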