Custom Layers, Models & Training in TF2

While taking the advanced TensorFlow course on Coursera, the instructor briefly introduced custom layers and custom models, but in hindsight the course coverage was fairly shallow: beyond the two overridable functions __init__ and call, nothing else was introduced. I later came across a blog post that explains in detail how to build models in TensorFlow with sub-classing. It is very well written, so I am linking it here.

As we know, TensorFlow offers three ways to build a model: 1) the Sequential API: first create a Sequential instance, then add layers to the model one by one via add, for example:

# declare input shape 
seq_model = tf.keras.Sequential()
seq_model.add(tf.keras.Input(shape=input_dim))

# Block 1
seq_model.add(tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu"))
seq_model.add(tf.keras.layers.MaxPooling2D(3))
seq_model.add(tf.keras.layers.BatchNormalization())

# Block 2
seq_model.add(tf.keras.layers.Conv2D(64, 3, activation="relu"))
seq_model.add(tf.keras.layers.BatchNormalization())
seq_model.add(tf.keras.layers.Dropout(0.3))

# Apply global max pooling.
seq_model.add(tf.keras.layers.GlobalMaxPooling2D())

# Finally, we add a classification layer.
seq_model.add(tf.keras.layers.Dense(output_dim))
The Sequential style is not used much among researchers; as models have grown more complex, you can see that the official model code in TensorFlow's applications module no longer takes this form.

2) Functional API. As the name suggests, the model is built through function calls:
# declare input shape 
input = tf.keras.Input(shape=input_dim)

# Block 1
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(input)
x = tf.keras.layers.MaxPooling2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)

# Block 2
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.3)(x)

# Apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(x)

# Finally, we add a classification layer.
output = tf.keras.layers.Dense(output_dim)(gap)

# bind all
func_model = tf.keras.Model(input, output)
Note: with this style, you must finally call tf.keras.Model() to tie the inputs and outputs together.

3) Model sub-classing API. The third way is the most widely used today. I used to not understand the difference between the Layer and Model flavors of sub-classing; both looked like a series of computations that takes inputs and returns outputs. But a class that subclasses Layer carries state, i.e. the weights we are familiar with. Take the Dense layer: it performs a linear transformation plus an activation function, and its weights are the coefficients assigned to each input feature. Often, though, we want more than this built-in family of computations, for example:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleQuadratic(Layer):

    def __init__(self, units=32, activation=None):
        '''Initializes the class and sets up the internal variables'''
        super(SimpleQuadratic, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        '''Create the state of the layer (weights)'''
        # a and b are initialized with random normal, c (the bias) with zeros;
        # all three are trainable.
        a_init = tf.random_normal_initializer()
        b_init = tf.random_normal_initializer()
        c_init = tf.zeros_initializer()

        self.a = tf.Variable(name="a",
                             initial_value=a_init(shape=(input_shape[-1], self.units), dtype="float32"),
                             trainable=True)

        self.b = tf.Variable(name="b",
                             initial_value=b_init(shape=(input_shape[-1], self.units), dtype="float32"),
                             trainable=True)

        self.c = tf.Variable(name="bias",
                             initial_value=c_init(shape=(self.units,), dtype="float32"),
                             trainable=True)

    def call(self, inputs):
        '''Defines the computation from inputs to outputs: x^2 @ a + x @ b + c'''
        result = tf.matmul(tf.math.square(inputs), self.a) + tf.matmul(inputs, self.b) + self.c
        return self.activation(result)
The code above squares the inputs and multiplies them by a, adds the product of the inputs and b, then adds c, and returns the sum: a quadratic transform that no layer in tf.keras.layers provides out of the box. This is exactly where defining a custom layer becomes very convenient. Another convenience: many models are organized into blocks, and the layers inside a block tend to look alike. We can wrap the layers of such a block into a Layer subclass (a module), and once all the modules are defined, wrap them with a Model to get the final model. This is where the difference between subclassing Model and Layer shows: both take inputs, run a series of computations, and return the result, but the Layer subclass allows more flexible computation, while the Model subclass is usually the class that, once every block is defined, defines the model we will actually train.

> In general, we use the Layer class to define the inner computation blocks and will use the Model class to define the outer model, practically the object that we will train. (quoted from the blog)
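To make the block pattern concrete, here is a minimal sketch of inner Layer modules composed by an outer Model; the names ConvBlock and MiniCNN are hypothetical, not from the course or the blog:

import tensorflow as tf

class ConvBlock(tf.keras.layers.Layer):
    '''Inner computation block: the repeated Conv + BatchNorm pattern.'''
    def __init__(self, filters):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(filters, 3, activation="relu")
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=False):
        return self.bn(self.conv(inputs), training=training)

class MiniCNN(tf.keras.Model):
    '''Outer model: composes the blocks; this is the object we train.'''
    def __init__(self, num_classes):
        super().__init__()
        self.block1 = ConvBlock(32)
        self.block2 = ConvBlock(64)
        self.pool = tf.keras.layers.GlobalMaxPooling2D()
        self.classifier = tf.keras.layers.Dense(num_classes)

    def call(self, inputs, training=False):
        x = self.block1(inputs, training=training)
        x = self.block2(x, training=training)
        return self.classifier(self.pool(x))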

> You can treat any model as if it were a layer by invoking it on an Input or on the output of another layer. By calling a model you aren't just reusing the architecture of the model, you're also reusing its weights.

It is also worth noting that a Model can itself be invoked like a layer, functional-API style, for example:

from tensorflow import keras
from tensorflow.keras import layers

encoder_input = keras.Input(shape=(28, 28, 1), name="original_img")
x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name="encoder")
encoder.summary()

decoder_input = keras.Input(shape=(16,), name="encoded_img")
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)

decoder = keras.Model(decoder_input, decoder_output, name="decoder")
decoder.summary()

autoencoder_input = keras.Input(shape=(28, 28, 1), name="img")
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name="autoencoder")
autoencoder.summary()

A model defined with sub-classing cannot simply call summary() to inspect its architecture; the author gives a workaround for this too: github comments

The fix is to add a build_graph method to the Model subclass:

def build_graph(self, raw_shape):
    x = tf.keras.layers.Input(shape=raw_shape)
    return tf.keras.Model(inputs=[x], outputs=self.call(x))
With this, we can call summary() as usual:
# e.g. raw_input = (28, 28, 1) for image inputs
model.build_graph(raw_input).summary()

# We can also use tf.keras.utils.plot_model to render the architecture as a PNG
tf.keras.utils.plot_model(
    model.build_graph(raw_input),  # here is the trick (for now)
    to_file='model.png', dpi=96,   # saving
    show_shapes=True, show_layer_names=True,  # show shapes and layer names
    expand_nested=False            # whether to expand nested blocks
)
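Putting it together, here is a minimal sketch of where build_graph lives inside a subclassed model; TinyNet and its layers are hypothetical, for illustration only:

import tensorflow as tf

class TinyNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))

    def build_graph(self, raw_shape):
        # wrap call() in a functional Model so summary() has a graph to show
        x = tf.keras.layers.Input(shape=raw_shape)
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

model = TinyNet()
model.build_graph((784,)).summary()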

The author also recommends a blog post covering the various ways of saving models in TensorFlow: blog link. Well worth reading.

To summarize:

1. For a model created with the Functional API, the best way to save and reload it is:
model.save('path_to_my_model.h5')
del model
model = keras.models.load_model('path_to_my_model.h5')

This saves everything: the model architecture, the weights, and the training configuration (i.e. what was set in model.compile()).

2. For a model created by sub-classing, the recommended approach is save_weights:
model.save_weights('path_to_my_weights', save_format='tf')

To load the weights back, you must have the original sub-classing code that defined the model. What's more, you need to use that code to build the model first, so that it knows the shape and dtype of the input tensors; without this build step the program raises an error.

new_model = MiniInception()
new_model.build((None, *x_train.shape[1:]))  # or new_model.build(x_train.shape)
new_model.load_weights('path_to_my_weights')

tf.function

When defining a custom training procedure, we frequently use the @tf.function decorator:

@tf.function
def train_step(step, x, y):
    '''
    input: x, y <- typically batches
    input: step <- batch step
    return: loss value
    '''
    # start the scope of gradient
    with tf.GradientTape() as tape:
        logits = model(x, training=True)       # forward pass
        train_loss_value = loss_fn(y, logits)  # compute loss

    # compute gradient
    grads = tape.gradient(train_loss_value, model.trainable_weights)

    # update weights
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # update metrics
    train_acc_metric.update_state(y, logits)

    # write training loss and accuracy to the tensorboard
    with train_writer.as_default():
        tf.summary.scalar('loss', train_loss_value, step=step)
        tf.summary.scalar(
            'accuracy', train_acc_metric.result(), step=step
        )
    return train_loss_value
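For context, a hypothetical driver loop for train_step might look like the following; epochs and train_dataset are my assumptions (train_dataset being a tf.data.Dataset yielding (x, y) batches), and model, loss_fn, optimizer, train_acc_metric and train_writer are assumed to exist as above:

epochs = 5  # assumption for illustration

global_step = 0
for epoch in range(epochs):
    for x_batch, y_batch in train_dataset:
        # pass the step as an int64 tensor: a new Python int on every call
        # would trigger a retrace of the tf.function
        loss = train_step(tf.constant(global_step, dtype=tf.int64),
                          x_batch, y_batch)
        global_step += 1
    print(f"epoch {epoch}: last batch loss = {float(loss):.4f}")
    train_acc_metric.reset_states()  # start each epoch's accuracy fresh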

First, look at what happens when the function is not decorated:

def f(x):
    print("Traced with", x)

for i in range(5):
    f(2)

f(3)

The output is:

Traced with 2
Traced with 2
Traced with 2
Traced with 2
Traced with 2
Traced with 3

Now with the decorator:

@tf.function
def f(x):
    print("Traced with", x)

for i in range(5):
    f(2)

f(3)

The output is:

Traced with 2
Traced with 3

Notice that in the decorated version, even though we looped five times, only one line printed 2: the Python print only runs while tf.function traces the function, and tracing happens once per new input value, after which the cached graph is reused.

If we now add a tf.print line to the function body:

@tf.function
def f(x):
    print("Traced with", x)
    # add tf.print
    tf.print("Executed with", x)

for i in range(5):
    f(2)

f(3)

The output becomes:

Traced with 2
Executed with 2
Executed with 2
Executed with 2
Executed with 2
Executed with 2
Traced with 3
Executed with 3

So tf.print behaves normally and runs on every loop iteration, because it is compiled into the graph itself. One caveat: a function decorated with @tf.function should contain only operations; it must not create new state such as a tf.Variable() each time it runs.
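A minimal sketch of the usual pattern (my example, not from the blog): create the variable once, outside the traced function, and only operate on it inside:

import tensorflow as tf

v = tf.Variable(1.0)  # created once, outside the traced function

@tf.function
def add(x):
    # reading/updating an existing variable inside tf.function is fine;
    # creating a new tf.Variable on every call would raise an error
    return v.assign_add(x)

print(add(tf.constant(2.0)))  # tf.Tensor(3.0, shape=(), dtype=float32)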