LSTM and GRU in TensorFlow

GRU

Looking only at the output, a GRU behaves like a simple RNN: it has just one hidden state. So in TensorFlow its output is the same as that of a plain RNN layer:

import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])
gru = tf.keras.layers.GRU(4)
output = gru(inputs)
print(output.shape)
>> (32, 4)
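
For comparison, a SimpleRNN layer with the same number of units gives exactly the same output shape (a small sketch reusing inputs from the snippet above):

rnn = tf.keras.layers.SimpleRNN(4)
print(rnn(inputs).shape)
>> (32, 4)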

There are two arguments you can pass to GRU: return_state and return_sequences, both booleans. If you only pass return_sequences=True, the layer still returns a single value, namely the output at every time step:

gru = tf.keras.layers.GRU(4, return_sequences=True)
output = gru(inputs)
print(output.shape)
>> (32, 10, 4)

If you only pass return_state=True, the layer returns two values. The official docs describe the argument as "Boolean. Whether to return the last state in addition to the output. Default: False.", i.e. the output and the final hidden state are returned together, and the output is equal to final_state:

gru = tf.keras.layers.GRU(4, return_state=True)
output, final_state = gru(inputs)
print(output.shape)
print(final_state.shape) # output=final_state
>> (32, 4)
>> (32, 4)
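
A quick sanity check (a minimal sketch reusing output and final_state from the snippet above) confirms that the two returned tensors hold the same values:

import numpy as np

print(np.allclose(output.numpy(), final_state.numpy()))
>> True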

Likewise, if you only pass return_sequences=True to an LSTM, it returns just the whole sequence:

lstm = tf.keras.layers.LSTM(4, return_sequences=True)
inputs = tf.random.normal([32, 10, 8])
whole_seq_output = lstm(inputs)
print(whole_seq_output.shape)
>> (32, 10, 4)

What if both are set to True? Then the layer returns two outputs: the first is the whole sequence, the second is the final state. Note that there is no separate output anymore, because the output is simply the last time step of the sequence, i.e. sequence[:, -1, :]:

gru = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = gru(inputs)
print(whole_sequence_output.shape)
print(final_state.shape)
>> (32, 10, 4)
>> (32, 4)
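
This is easy to verify (reusing the variables from the snippet above): the final state is just the last time step of the full sequence:

import numpy as np

print(np.allclose(whole_sequence_output[:, -1, :].numpy(), final_state.numpy()))
>> True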

LSTM

Now for the LSTM. Because its architecture differs somewhat from the GRU, its return values include one extra state: the carry_state.

For the details of the LSTM computation, refer to the blog post; a minimal one-step sketch is also given below.
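
To see where the two states come from without leaving this page, here is a minimal single-step sketch in plain TensorFlow ops (the weight names W, U, b are made up for illustration and do not correspond to Keras' internal variables):

import tensorflow as tf

units, features, batch = 4, 8, 32
x_t = tf.random.normal([batch, features])    # input at a single time step
h_prev = tf.zeros([batch, units])            # previous hidden (memory) state
c_prev = tf.zeros([batch, units])            # previous carry (cell) state

# hypothetical weights, one set per gate: input (i), forget (f), output (o), candidate (c)
W = {k: tf.random.normal([features, units]) for k in "ifoc"}
U = {k: tf.random.normal([units, units]) for k in "ifoc"}
b = {k: tf.zeros([units]) for k in "ifoc"}

i = tf.sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])      # input gate
f = tf.sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])      # forget gate
o = tf.sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])      # output gate
c_tilde = tf.tanh(x_t @ W["c"] + h_prev @ U["c"] + b["c"])   # candidate update

c_t = f * c_prev + i * c_tilde    # new carry state
h_t = o * tf.tanh(c_t)            # new hidden/memory state -- this is what the layer outputs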

In TensorFlow it has the same return_state and return_sequences arguments:

inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
>> (32, 4)

If you only pass return_state=True, the difference from the GRU is that the LSTM returns two states: a memory_state and a carry_state.

lstm = tf.keras.layers.LSTM(4, return_state=True)
output, final_memory_state, final_carry_state = lstm(inputs)
print(output.shape)
print(final_memory_state.shape) # final_memory_state=output
print(final_carry_state.shape)
>> (32, 4)
>> (32, 4)
>> (32, 4)

If both are set to True:

lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
print(whole_seq_output.shape)
print(final_memory_state.shape) # final_memory_state=whole_seq_output[:,-1,:]
print(final_carry_state.shape)
>> (32, 10, 4)
>> (32, 4)
>> (32, 4)
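
As before, the relationship between the three return values can be checked directly (reusing the variables above); note that for random inputs the carry state will generally not coincide with the memory state:

import numpy as np

print(np.allclose(whole_seq_output[:, -1, :].numpy(), final_memory_state.numpy()))
print(np.allclose(final_memory_state.numpy(), final_carry_state.numpy()))
>> True
>> False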

GRU vs LSTM

As for which cell to choose as the RNN cell when training a model, the cs224n course gives this answer:

Researchers have proposed many gated RNN variants, but LSTM and GRU are the most widely-used.

Rule of thumb: LSTM is a good default choice (especially if your data has particularly long dependencies, or you have lots of training data); Switch to GRUs for speed and fewer parameters.

LSTM doesn’t guarantee that there is no vanishing/exploding gradient, but it does provide an easier way for the model to learn long-distance dependencies.
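
The "fewer parameters" part of the rule of thumb is easy to check; a small sketch comparing the two layers at the same size (the GRU numbers assume the default reset_after=True):

import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
gru = tf.keras.layers.GRU(4)
_ = lstm(inputs)   # call once so the layers are built and count_params() works
_ = gru(inputs)
print(lstm.count_params())
print(gru.count_params())
>> 208
>> 168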

As of 2023, the LSTM is no longer the model researchers favor; the most popular model has become the Transformer.

Here I also include the latest WMT results from 2022.