Dropout Regularization in Deep Learning

Erika Oliver · 3 min read

This course is part of a series of courses on deep learning with TensorFlow.

Regularization: Dropout

Regularization is a set of techniques that prevent the learning process from fitting the model completely to the training data, which can lead to overfitting. It makes the model simpler and smooths out the learning curve, hence making it more ‘regular’. There are many regularization techniques, such as simplifying the model, adding weight regularization, weight decay, and so on. One of the most common regularization methods is dropout.
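As a quick illustration of weight regularization, here is a minimal sketch of an L2 penalty on a Keras layer; the layer size and the penalty coefficient are illustrative choices, not values from this lesson:

```python
import tensorflow as tf

# A dense layer with an L2 weight penalty: 1e-4 * sum(w**2) is added
# to the loss, discouraging large weights. Both the layer size (64)
# and the coefficient (1e-4) are illustrative.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),
)
```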

Dropout is a technique that randomly ignores, or “drops out”, a number of a layer’s outputs by setting them to zero. The dropout rate is the fraction of layer outputs set to zero, usually between 20% and 50%.

In Keras, we can add dropout by inserting a Dropout layer into the model.

Next, we introduce dropout layers:

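Here is a minimal sketch of what such a model could look like; the layer sizes, the 0.3 dropout rate, and the number of input features are illustrative choices, not necessarily those used in this lesson:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_features = 20  # illustrative; use the lesson's feature count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),  # zeroes 30% of this layer's outputs during training
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1),      # single output for regression
])

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```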

For this model, we get the learning curve in Figure 2. The validation MAE we get with dropout is lower than without it. Note that the validation error is also lower than the training error in this case. One explanation is that dropout is active only during training: part of the network is zeroed out while the training error is computed, whereas the full network is used during validation/testing. (Keras compensates during training by scaling the surviving outputs up by 1/(1 − rate), so no rescaling is needed at test time.)

[Figure 2: learning curves (training and validation MAE) for the model with dropout]
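To see this train/test asymmetry directly, here is a small sketch using the Keras Dropout layer; note that Keras implements “inverted” dropout, scaling the surviving activations up during training rather than scaling outputs down at inference:

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 6))

# Training mode: roughly half the values are zeroed, and the survivors
# are scaled up by 1 / (1 - 0.5) = 2 to preserve the expected activation.
print(dropout(x, training=True))   # e.g. [[2. 0. 2. 2. 0. 0.]]

# Inference mode: dropout is a no-op and the full network is used.
print(dropout(x, training=False))  # [[1. 1. 1. 1. 1. 1.]]
```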

Baselines: how do we know the performance is reasonable?

Why do we need a baseline? Suppose our data consists of 90% dog images and 10% cat images. An algorithm that simply predicts the majority class for every data point will achieve 90% accuracy on this dataset! That might sound good, but predicting the majority class is hardly a useful classifier; a real model needs to perform better than this.
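A quick sketch of that majority-class baseline using scikit-learn’s DummyClassifier; the 90/10 dog/cat split is mocked up here with placeholder features:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Mock dataset: 90 dogs (label 0) and 10 cats (label 1); the features
# are placeholders, since this baseline ignores them anyway.
X = np.zeros((100, 1))
y = np.array([0] * 90 + [1] * 10)

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
print(baseline.score(X, y))  # 0.9 -- always predicting "dog" is 90% accurate
```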

A baseline result is the simplest possible prediction. For some problems this may be a random prediction; for others, it may be the most common class. Since we are focused on a regression task in this lesson, we can use a central tendency measure of the target distribution, such as the mean or the median, as the prediction for every example.

Scikit-learn provides DummyRegressor, which serves as a baseline regression algorithm. We’ll choose the mean (average) as our central tendency measure.

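A minimal sketch of this baseline with stand-in data; in the lesson, X and y would be the actual feature matrix and target prices:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Stand-in data; replace with the lesson's features and prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.normal(loc=10_000, scale=3_000, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy="mean")  # always predicts the training mean
baseline.fit(X_train, y_train)
print(mean_absolute_error(y_test, baseline.predict(X_test)))
```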

The baseline’s MAE is $9,190, and our previous experiments in this lesson did considerably better, at around $3,000.


About Erika Oliver

Erika Oliver is a successful entrepreneur. She is the founder of Acme Inc, a bootstrapped business that builds affordable SaaS tools for local news, indie publishers, and other small businesses.
