If you pass return_state=True on its own, the layer returns two values. The official docs describe the argument as: "Boolean. Whether to return the last state in addition to the output. Default: False." That is, the output and the final hidden state are returned together, and the output is equal to the final state:
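A minimal sketch of this behavior (my own example; the layer size and input shapes are arbitrary):

```python
import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])  # (batch, time_steps, features)

# return_state=True alone returns [output, final_state]; since
# return_sequences is False, the output is the final hidden state itself.
rnn = tf.keras.layers.SimpleRNN(4, return_state=True)
output, final_state = rnn(inputs)
print(output.shape, final_state.shape)       # (32, 4) (32, 4)
print(tf.reduce_all(output == final_state))  # True
```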
Researchers have proposed many gated RNN variants, but LSTM and GRU
are the most widely-used.
Rule of thumb: LSTM is a good default choice (especially if your data
has particularly long dependencies, or you have lots of training data);
switch to GRUs for speed and fewer parameters.
LSTM doesn’t guarantee that there is no vanishing/exploding gradient,
but it does provide an easier way for the model to learn long-distance
dependencies.
The Region-based CNN (R-CNN) approach [13] to bounding-box object
detection is to attend to a manageable number of candidate object
regions [42, 20] and evaluate convolutional networks [25, 24]
independently on each RoI. R-CNN was extended [18, 12] to allow
attending to RoIs on feature maps using RoIPool, leading to fast speed
and better accuracy. Faster R-CNN [36] advanced this stream by learning
the attention mechanism with a Region Proposal Network (RPN). Faster
R-CNN is flexible and robust to many follow-up improvements (e.g., [38,
27, 21]), and is the current leading framework in several
benchmarks.
The Mask R-CNN paper proposes a new RoIAlign layer, mainly to fix the problems with the RoI pooling layer in the Faster R-CNN network. As a refresher, here is how RoI pooling turns RoIs (regions of interest) of different sizes into fixed-size feature maps:
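A rough NumPy sketch of the idea (my own illustration, not code from any detection library): the RoI is snapped to the feature-map grid, divided into a fixed number of bins, and each bin is max-pooled, so any RoI size yields the same output size. The two rounding steps here are exactly the quantization that RoIAlign later removes:

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=2):
    """feature_map: (H, W); roi: (x1, y1, x2, y2) in feature-map coordinates."""
    # 1st quantization: snap the RoI boundaries to the feature-map grid.
    x1, y1, x2, y2 = [int(round(c)) for c in roi]
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # 2nd quantization: split the region into output_size x output_size bins.
    ys = np.linspace(0, h, output_size + 1, dtype=int)
    xs = np.linspace(0, w, output_size + 1, dtype=int)
    out = np.zeros((output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            out[i, j] = region[ys[i]:ys[i+1], xs[j]:xs[j+1]].max()
    return out

fmap = np.arange(64, dtype=float).reshape(8, 8)
print(roi_pool(fmap, (0.6, 0.6, 5.4, 6.2)))  # always a 2x2 output, whatever the RoI size
```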
对于"many to
many"类型的网络,有可能输入的长度不等于输出的长度,在机器翻译的任务中很常见。这种网络也叫Sequence
to
Sequence,首先该网络会经由encoder对输入进行编码,然后再有decoder进行sequence的生成。但是这种网络在长句子中表现很差,如果输入句子的长度很长,encoder网络就很难记忆住所有信息,从而在decoder中翻译出准确的词语。由此,需要用到attention
model。从计算角度来说就是encoder每次都会产生一个固定长度的vector,这对于长句子来说fixed
length的向量很难记住很早之前的信息:
To overcome the weakness that a single fixed-length vector struggles to remember earlier information, the attention mechanism was born! Concretely, at every time step of the decoding stage a different context is produced, and producing this context is exactly the attention computation. The main idea is to compute attention weights before producing y: when computing y at some time step, the weights say how much attention we should pay to each word of the input sentence; more attention means a larger weight. So we compute the weights from the initial decoder state (the previous hidden state of the (post-attention) LSTM) and the encoder outputs, using a dense layer. The weights sum to 1.
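As a sketch of that computation (my own minimal Keras-style illustration of additive attention, not the exact course code): concatenate the previous decoder state with every encoder output, push the result through a small dense network to get alignment scores, and softmax them so the weights sum to 1:

```python
import tensorflow as tf

def attention_weights(enc_outputs, prev_dec_state, dense1, dense2):
    """enc_outputs: (batch, Tx, units); prev_dec_state: (batch, units)."""
    # Repeat the decoder state across all Tx encoder steps and concatenate.
    s = tf.repeat(prev_dec_state[:, None, :], tf.shape(enc_outputs)[1], axis=1)
    e = dense2(dense1(tf.concat([s, enc_outputs], axis=-1)))  # (batch, Tx, 1) scores
    alphas = tf.nn.softmax(e, axis=1)                         # weights sum to 1 over Tx
    context = tf.reduce_sum(alphas * enc_outputs, axis=1)     # (batch, units) context
    return alphas, context

dense1 = tf.keras.layers.Dense(10, activation='tanh')
dense2 = tf.keras.layers.Dense(1)
enc = tf.random.normal([2, 5, 8])
s0 = tf.random.normal([2, 8])
alphas, ctx = attention_weights(enc, s0, dense1, dense2)
print(tf.reduce_sum(alphas, axis=1))  # ~[[1.], [1.]]
```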
In the Transformer paper, the authors open by noting that the dominant sequence transduction models are based on complex RNNs or CNNs with an encoder and a decoder, and that the best-performing models also connect the encoder and decoder through an attention mechanism. The paper proposes a new, simple network architecture called the Transformer, based entirely on attention, "dispensing with recurrence and convolutions entirely"!
No recurrence and no convolutions at all. What a remarkable network!
Before reading this paper, you should be familiar with my other blog post, Attention and transformer model, which covers how researchers in the translation field moved from RNNs to CNNs and finally to the current reign of the Transformer. The technology has gone through round after round of iteration; after each foundational architecture is proposed, a stream of papers follows with improvements. There are thousands of papers and no way to read them all, so it is best to study a few classics carefully. Vaswani's paper is a must-read in the NMT field. It is short, only 12 pages including references, but the introduction is so brief that (in my personal opinion) the barrier to entry is quite high. I started with this paper, found I could not get through it, and went looking for other material; many resources helped me a great deal:
Each block in the decoder has one more sub-layer than a block in the encoder. Both the self-attention and the encoder-decoder attention are multi-head attention layers, except that the first multi-head attention layer in the decoder is a masked multi-head attention, to keep future information from leaking into the present ("prevent positions from attending to the future").
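For example, such a look-ahead (causal) mask can be built as a lower-triangular matrix (a small sketch of the idea; the exact masking API differs between implementations):

```python
import tensorflow as tf

def look_ahead_mask(size):
    # Lower-triangular: position i may attend only to positions <= i.
    return tf.linalg.band_part(tf.ones((size, size)), -1, 0)

print(look_ahead_mask(4))
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
```

Back to the reference implementation: the remainder of multihead_attention merges the heads back together, followed by the position-wise feed-forward network and the encoder layer.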
```python
        # Merge the multi-head back to the original shape
        out = tf.concat(tf.split(out, self.h, axis=0), axis=2)  # [bs, q_size, d_model]

        # The final linear layer and dropout.
        # out = tf.layers.dense(out, self.d_model)
        # out = tf.layers.dropout(out, rate=self.drop_rate, training=self._is_training)

        return out

    def feed_forwad(self, inp, scope='ff'):
        """
        Position-wise fully connected feed-forward network, applied to each
        position separately and identically. It can be implemented as
        (linear + ReLU + linear) or (conv1d + ReLU + conv1d).
        Args:
            inp (tf.tensor): shape [batch, length, d_model]
        """
        out = inp
        with tf.variable_scope(scope):
            # out = tf.layers.dense(out, self.d_ff, activation=tf.nn.relu)
            # out = tf.layers.dropout(out, rate=self.drop_rate, training=self._is_training)
            # out = tf.layers.dense(out, self.d_model, activation=None)

            # by default, use_bias=True
            out = tf.layers.conv1d(out, filters=self.d_ff, kernel_size=1, activation=tf.nn.relu)
            out = tf.layers.conv1d(out, filters=self.d_model, kernel_size=1)
        return out

    def encoder_layer(self, inp, input_mask, scope):
        """
        Args:
            inp: tf.tensor of shape (batch, seq_len, embed_size)
            input_mask: tf.tensor of shape (batch, seq_len, seq_len)
        """
        out = inp
        with tf.variable_scope(scope):
            # One multi-head attention + one feed-forward, each wrapped in
            # a residual connection and layer normalization.
            out = self.layer_norm(out + self.multihead_attention(out, mask=input_mask))
            out = self.layer_norm(out + self.feed_forwad(out))
        return out
```
The encoder starts by processing the input sequence. The output of the
top encoder is then transformed into a set of attention vectors K and V.
These are used by each decoder in its "encoder-decoder attention"
layer, which helps the decoder focus on appropriate places in the input
sequence.
```python
    def decoder_layer(self, target, enc_out, input_mask, target_mask, scope):
        out = target
        with tf.variable_scope(scope):
            out = self.layer_norm(out + self.multihead_attention(
                out, mask=target_mask, scope='self_attn'))
            # Use the encoder output as the attention memory here.
            out = self.layer_norm(out + self.multihead_attention(
                out, memory=enc_out, mask=input_mask))
            out = self.layer_norm(out + self.feed_forwad(out))
        return out

    def decoder(self, target, enc_out, input_mask, target_mask, scope='decoder'):
        out = target
        with tf.variable_scope(scope):
            for i in range(self.num_enc_layers):
                out = self.decoder_layer(out, enc_out, input_mask, target_mask, f'dec_{i}')
        return out
```
The Transformer implementation above still feels a bit complicated to me. After all, TensorFlow 2.0+ ships an official layers.MultiHeadAttention, which should greatly simplify the implementation, especially the def multihead_attention(self, query, memory=None, mask=None, scope='attn'): above. As that implementation shows, apart from a slight difference in the attention computation of the second sub-layer of each decoder block, every attention computation is exactly the same. I looked at quite a few TF 2.0 Transformer implementations on GitHub (of the standard Attention Is All You Need model) and found many of them mediocre; in the end, the tutorial in the official TensorFlow documentation is the best written.
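For reference, the attention classes in that tutorial share a small base class roughly like the following (reproduced from memory, so treat it as a sketch rather than the exact tutorial code); the CausalSelfAttention below then subclasses it:

```python
import tensorflow as tf

class BaseAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__()
        # Every attention variant shares these three pieces:
        # multi-head attention, a residual add, and layer normalization.
        self.mha = tf.keras.layers.MultiHeadAttention(**kwargs)
        self.layernorm = tf.keras.layers.LayerNormalization()
        self.add = tf.keras.layers.Add()
```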
```python
class CausalSelfAttention(BaseAttention):
    def call(self, x):
        # The causal mask ensures that each location only has access
        # to the locations that come before it.
        attn_output = self.mha(query=x, value=x, key=x, use_causal_mask=True)
        x = self.add([x, attn_output])
        x = self.layernorm(x)
        return x
```
One thing worth adding here: while moving from encoder-decoder models to the Transformer, my lingering question was, why use a Transformer at all? Why did the method introduced in the paper Effective Approaches to Attention-based Neural Machine Translation gradually fall out of use? I first went to the original Transformer paper, found its introduction very brief, and so went looking for blog posts; I found one explaining why the Transformer is faster than an LSTM.
It says that in the traditional LSTM encoder-decoder architecture for machine translation, introduced in the paper Neural Machine Translation by Jointly Learning to Align and Translate, one problem is that an RNN needs the hidden state of the previous time step to compute the current one. This means training cannot be parallelized: you must finish all the preceding steps before you can compute the current one. The Transformer solves this by discarding the RNN structure entirely; it is basically a fully connected network. The input to the model is a sequence whose words have each been converted into a word vector. A traditional RNN would consume these vectors one at a time, but the Transformer instead projects the embeddings into three vector spaces, namely the Q, K, V we will see later.
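Concretely, "projecting the embedding into three vector spaces" is just three learned linear maps applied to the whole sequence at once, which is why there is no previous time step to wait for (a minimal sketch with made-up dimensions):

```python
import tensorflow as tf

d_model = 512
x = tf.random.normal([2, 7, d_model])  # (batch, seq_len, d_model): the whole sentence at once

# Three independent linear projections produce Q, K, V in parallel;
# nothing here depends on a "previous time step".
wq = tf.keras.layers.Dense(d_model, use_bias=False)
wk = tf.keras.layers.Dense(d_model, use_bias=False)
wv = tf.keras.layers.Dense(d_model, use_bias=False)
q, k, v = wq(x), wk(x), wv(x)
print(q.shape, k.shape, v.shape)  # (2, 7, 512) each
```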
- Non sequential: sentences are processed as a whole rather than word by word.
- Self Attention: this is the newly introduced "unit" used to compute similarity scores between words in a sentence.
- Positional embeddings: another innovation introduced to replace recurrence. The idea is to use fixed or learned weights which encode information related to a specific position of a token in a sentence (see the sketch after this list).
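The fixed variant is the sinusoidal encoding from the Transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A short sketch:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]  # (max_len, 1)
    i = np.arange(d_model)[None, :]    # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions: cosine
    return pe

print(positional_encoding(50, 512).shape)  # (50, 512), added to the token embeddings
```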
In machine translation, CNNs were once also widely used to handle long-sentence dependencies. Beyond that, CNNs share parameters and can be parallelized on a GPU. CNNs learn dependencies with kernels of different widths: width=2 learns the dependency between two words, width=3 between three, and so on. But a long sentence can contain many dependency combinations, which would require many kernels of different widths, and that is impractical. Although CNNs are rarely used for seq2seq problems nowadays, I see them as a bridge between RNN-style models and the Transformer, and they also help in understanding the Attention Is All You Need paper. If you are interested, read Convolutional Sequence to Sequence Learning, 2017.
The Transformer (which will be referred to as
"vanilla Transformer" to distinguish it from other enhanced versions; Vaswani, et al., 2017) model
has an encoder-decoder architecture, as commonly used in many NMT
models. Later, simplified Transformers were shown to achieve great
performance in language modeling tasks, as in the encoder-only BERT
or the decoder-only GPT.
First, we use a language modeling objective on the unlabeled data to
learn the initial parameters of a neural network model. Subsequently, we
adapt these parameters to a target task using the corresponding
supervised objective.
OpenAI GPT uses a Transformer Decoder architecture
as opposed to BERT's
Transformer Encoder architecture. I have already covered the difference
between the Transformer Encoder and Decoder in this
post; in short, it is as follows:
The Transformer Encoder is essentially a
Bidirectional Self-Attentive Model that uses all the tokens in a
sequence to attend to each token in that sequence,
i.e. for a given word, the attention is computed using all the words
in the sentence and not just the words preceding it in one of the
left-to-right or right-to-left traversal orders.
The Transformer Decoder, in contrast, is a Unidirectional
Self-Attentive Model that uses only the tokens preceding a given token
in the sequence to attend to that token,
i.e. for a given word, the attention is computed using only the words
preceding it in that sentence according to the traversal order,
left-to-right or right-to-left.
Thus, GPT gets its auto-regressive nature from this
directionality provided by the Transformer Decoder as it uses
just the previous tokens from the sequence to predict the next
token.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019
This paper came out shortly after OpenAI's GPT model; its feature-based predecessor is ELMo.
There are two ways to apply pre-trained language representations to downstream tasks: feature-based and fine-tuning. The representative of the first is ELMo, which feeds the pre-trained representations into the downstream task as additional features. In the second, after obtaining the pre-trained representations, the model is plugged into the downstream task and all parameters are fine-tuned together.
Improving Language Understanding by Generative Pre-Training, 2018
GPT stands for "generative pre-trained transformer" (from the paper's "generative pre-training").
The paper gives two main challenges in leveraging more than word-level information from unlabeled text data:
First, it is unclear what type of optimization objectives are most effective at learning text representations that are useful for transfer, i.e. we lack a well-established optimization objective.
Second, there is no consensus on the most effective way to transfer these learned representations to the target task, i.e. there is not yet a good method for transferring the pre-trained knowledge to the target task.
In the introduction, the authors emphasize that their model needs only minor tweaks to its architecture at fine-tuning time to adapt to the target task. For the knowledge transfer, they follow the paper Reasoning about entailment with neural attention, which processes structured text input as a single contiguous sequence of tokens.
Related work: LSTMs had been used as the pre-trained network to capture the representations, but they come with many restrictions. This paper uses Transformer networks instead, which allow capturing longer-range linguistic structure. Further, during fine-tuning the model requires only minimal changes to the architecture, without introducing a substantial number of new parameters.
So far, if the downstream task is classification, fine-tuning the model is simple: take the vector of the last token from the decoder output and feed it into a classifier. But for tasks like QA or textual entailment, the input is usually a pair of sentences or a question-answer combination. The authors' solution is to treat them as one contiguous sequence, with randomly initialized special vectors, such as start/end tokens, inserted in between.
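A toy sketch of this packing (the special token strings below are hypothetical placeholders; the paper actually uses randomly initialized start, delimiter, and extract embeddings):

```python
# Hypothetical special tokens standing in for the paper's randomly
# initialized start / delimiter / extract vectors.
START, DELIM, EXTRACT = '<s>', '<$>', '<e>'

def pack_pair(premise_tokens, hypothesis_tokens):
    # e.g. textual entailment: pack both sentences into one contiguous sequence.
    return [START] + premise_tokens + [DELIM] + hypothesis_tokens + [EXTRACT]

print(pack_pair(['a', 'man', 'sleeps'], ['someone', 'is', 'awake']))
# ['<s>', 'a', 'man', 'sleeps', '<$>', 'someone', 'is', 'awake', '<e>']
```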
The authors also note that earlier papers did a fair amount of research in this direction, but they all "re-introduce a significant amount of task-specific customization and does not use transfer learning for these additional architectural components".
So how exactly are the model inputs lightly restructured for each task? As shown in the figure:
As the number of transferred decoder layers increases, performance gets better and better. The authors conclude: "This indicates that each layer in the pre-trained model contains useful functionality for solving target tasks".
From the slide above you can see that originally there was only the variable q, which evolved from the initial h. To "add more expressivity to the layer", two changes were made: 1. before the input x goes into the FC that produces the alignment scores, another, separate FC is applied to it; 2. a completely different FC layer is also applied to x before the weighted sum with the attention weights. These two FC layers are there to increase the model's expressive power, and their outputs are what we call the key and the value. The rest of the process is then clear: first compute the attention weights from the query and the key, then combine the attention weights with the value to obtain the context.
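Putting that slide into code (my own sketch; the sizes and layer names are illustrative): two extra dense layers turn the input x into keys and values, the query is scored against the keys, and the softmaxed weights average the values into the context:

```python
import tensorflow as tf

d = 64
to_key = tf.keras.layers.Dense(d)    # extra FC #1: x -> keys, used for scoring
to_value = tf.keras.layers.Dense(d)  # extra FC #2: x -> values, used for the weighted sum

x = tf.random.normal([1, 6, d])      # encoder-side inputs
query = tf.random.normal([1, 1, d])  # e.g. derived from the previous decoder state

keys, values = to_key(x), to_value(x)
scores = tf.matmul(query, keys, transpose_b=True)  # (1, 1, 6) alignment scores
weights = tf.nn.softmax(scores, axis=-1)           # attention weights, summing to 1
context = tf.matmul(weights, values)               # (1, 1, d) context vector
print(context.shape)
```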
An attention layer does a fuzzy lookup like this, but it's not just
looking for the best key. It combines the values based on
how well the query matches each key.
How does that work? In an attention layer the query,
key, and value are each vectors. Instead of
doing a hash lookup the attention layer combines the query
and key vectors to determine how well they match, the
"attention score". The layer returns the average across all the
values, weighted by the "attention scores
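In toy NumPy form (my own illustration of the paragraph above, with made-up keys and values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A "fuzzy dictionary": three key vectors, each with an associated value vector.
keys = np.array([[1., 0.], [0., 1.], [1., 1.]])
values = np.array([[10., 0.], [0., 10.], [5., 5.]])
query = np.array([0.9, 0.1])

scores = softmax(keys @ query)  # how well the query matches each key
output = scores @ values        # average of the values, weighted by the scores
print(scores, output)
```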