How To Use TensorFlow BatchNormalization With GradientTape?
Solution 1:
In gradient tape mode, the BatchNormalization layer should be called with the argument training=True.
Example:
from tensorflow.keras import layers as KL
from tensorflow.keras import models as KM

inp = KL.Input((64, 64, 3))
x = inp
x = KL.Conv2D(3, kernel_size=3, padding='same')(x)
x = KL.BatchNormalization()(x, training=True)
model = KM.Model(inp, x)
Then the moving variables are properly updated:
>>> model.layers[2].weights[2]
<tf.Variable 'batch_normalization/moving_mean:0' shape=(3,) dtype=float32, numpy=array([-0.00062087,  0.00015137, -0.00013239], dtype=float32)>
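To sanity-check this, here is a minimal sketch of my own (the optimizer, dummy batch, and dummy loss are illustrative assumptions, not part of the original answer) that runs one GradientTape step on the model above and confirms that moving_mean moves:

import numpy as np
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-3)               # assumed optimizer
images = np.random.rand(8, 64, 64, 3).astype('float32')  # dummy batch

before = model.layers[2].weights[2].numpy()  # moving_mean before the step
with tf.GradientTape() as tape:
    out = model(images)                       # BN was built with training=True
    loss = tf.reduce_mean(tf.square(out))     # dummy loss, for illustration only
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
after = model.layers[2].weights[2].numpy()

print('moving_mean updated:', not np.allclose(before, after))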
Solution 2:
I just gave up. I spent quite a bit of time trying to make sense of a model that looks like this:
model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(),
])
And I did give up, because of how the result looked. My intuition was that BatchNorm these days is not as straightforward as it used to be, and that is why it rescales the original distribution but not so much a new distribution (which is a shame), but ain't nobody got time for that.
Edit: the reason for that behavior is that BN only computes batch moments and normalizes with them during training. During training it also maintains running averages of the mean and variance, and once you switch to evaluation those running averages are used as constants; evaluation should not depend on the batch, because it may be run on a single input and cannot rely on batch statistics. Since those constants were computed on a different distribution, you get a higher error during evaluation.
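To make that concrete, here is a small sketch of my own (the data is arbitrary) showing a single BatchNormalization layer applied to the same batch in both phases:

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
batch = np.random.normal(loc=5.0, scale=2.0, size=(32, 4)).astype('float32')

train_out = bn(batch, training=True)    # normalized with this batch's mean/variance
eval_out = bn(batch, training=False)    # normalized with the moving averages (still near 0/1)

print(train_out.numpy().mean(), train_out.numpy().std())  # roughly 0 and 1
print(eval_out.numpy().mean(), eval_out.numpy().std())    # close to the raw batch statistics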
Solution 3:
In gradient tape mode, you would usually compute gradients like this:
with tf.GradientTape() as tape:
    y_pred = model(features)
    loss = your_loss_function(y_pred, y_true)
gradients = tape.gradient(loss, model.trainable_variables)
train_op = model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
However, if your model contains a BatchNormalization or Dropout layer (or any layer that behaves differently between the train and test phases), then tf will fail to build the graph.
A good practice is to pass the training parameter explicitly when obtaining output from a model: use model(features, training=True) when optimizing and model(features, training=False) when predicting, in order to explicitly choose the train/test phase for such layers.
For the PREDICT and EVAL phases, use:
training = (mode == tf.estimator.ModeKeys.TRAIN)
y_pred = model(features, training=training)
For the TRAIN phase, use:
with tf.GradientTape() as tape:
    y_pred = model(features, training=training)
    loss = your_loss_function(y_pred, y_true)
gradients = tape.gradient(loss, model.trainable_variables)
train_op = model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
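Putting the two phases together, here is a self-contained sketch of my own (the architecture, loss, and random data are arbitrary assumptions, not from the original answer) of one custom training step followed by an evaluation call:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

features = np.random.rand(32, 8).astype('float32')   # dummy data
y_true = np.random.rand(32, 1).astype('float32')

# TRAIN phase: training=True, so BN uses batch statistics and Dropout is active.
with tf.GradientTape() as tape:
    y_pred = model(features, training=True)
    loss = loss_fn(y_true, y_pred)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# PREDICT/EVAL phase: training=False, so BN uses moving statistics and Dropout is off.
y_eval = model(features, training=False)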
Note that iperov's answer works as well, except that you will need to set the training phase manually for those layers:
x = BatchNormalization()(x, training=True)
x = Dropout(rate=0.25)(x, training=True)
x = BatchNormalization()(x, training=False)
x = Dropout(rate=0.25)(x, training=False)
I'd recommend having one get_model function that returns the model, and switching the phase with the training parameter when calling the model.
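As a sketch of that recommendation (the architecture inside get_model is an arbitrary assumption of mine), something like:

import tensorflow as tf

def get_model():
    # Build the architecture once; the phase is NOT baked into the layers here.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(1),
    ])

model = get_model()
features = tf.random.uniform((32, 8))           # dummy input
train_preds = model(features, training=True)    # training phase
eval_preds = model(features, training=False)    # inference phase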
Note: if you use model.variables when computing gradients, you'll get this warning:
Gradients do not exist for variables ['layer_1_bn/moving_mean:0', 'layer_1_bn/moving_variance:0', 'layer_2_bn/moving_mean:0', 'layer_2_bn/moving_variance:0'] when minimizing the loss.
This can be resolved by computing gradients only against the trainable variables, i.e. by replacing model.variables with model.trainable_variables.
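As a quick check of my own (a toy model, just to show which variables each collection contains):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.BatchNormalization()])
model.build((None, 4))  # create the variables

trainable = {v.name for v in model.trainable_variables}
# The moving statistics are non-trainable, so tape.gradient never sees them.
print([v.name for v in model.variables if v.name not in trainable])
# -> ['batch_normalization/moving_mean:0', 'batch_normalization/moving_variance:0']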