How to use Neural Networks

Joshua Wood

7 min read

Introduction

In the previous course, we learned about loss functions, backpropagation, and gradient descent. In this blog post, we will learn about neural networks and how to implement them, so let's get started.

A neural network, just like any machine learning method, learns how to perform tasks by processing data and adjusting its parameters to best predict the desired outcome. The most popular machine learning tasks are:

  1. Classification: given data and true labels or categories for each data point, train a model that predicts for each data example what its label should be. For example, given data on previous fire hazards, a model can learn to predict whether a fire will occur on a given day in the future, taking all relevant factors into account.
  2. Regression: given data and a true continuous value for each data point, train a model that can predict values for each data example. For example, given previous stock market data, we can build a regression model that forecasts the stock price at a specific point in the future.

Parametric models such as neural networks are described by parameters: configuration variables representing the model’s knowledge. We can tweak the parameters using the training data and we can evaluate the performance of the model using hold-out test data the model has not seen during training.

[Figure: the main components of a neural network learning pipeline]

Take a look at the main components of a neural network learning pipeline depicted in the figure above:

  • Input data: to train a neural network model, you need to provide it with training data.
  • An optimizer: an algorithm that, based on the training data, adjusts the parameters of the network in order to perform the task at hand.
  • A loss or cost function: this informs the optimizer whether it is doing a good job on the training data and how to adjust the parameters in the right direction.
  • Evaluation metrics: these tell us how well the current model performs on validation data. For example, mean absolute error for regression tells us how far the predictions are on average from the true values.

Predicting medical costs: loading the data

Every machine learning pipeline starts with data and a task. Let’s take a look at the Medical Cost Personal Datasets data, which consists of seven columns with the following descriptions:

  • age: age of the primary beneficiary
  • sex: gender of the policyholder
  • bmi: body mass index
  • children: number of children/dependents covered by the insurance
  • smoker: whether the beneficiary smokes
  • region: the beneficiary’s residential area in the US (northeast, northwest, southeast, southwest)
  • charges: individual medical costs billed by health insurance

We would like to predict the individual medical costs (charges) given the rest of the columns/features. Since charges represent continuous values (in dollars), we’re performing a regression task. Think about the potential implications of using sex or bmi to predict what an individual's insurance charges should be. Should they even be used for prediction?

Our data is in the .csv format and we load it with pandas:

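A minimal sketch of this step; the file name insurance.csv is an assumption based on the usual name of this Kaggle dataset:

```python
import pandas as pd

# load the CSV into a DataFrame (the file name is an assumption)
dataset = pd.read_csv('insurance.csv')
```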

Next, we split the data into features (independent variables) and the target variable (dependent variable):

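A sketch of the split, assuming the DataFrame is named dataset as above:

```python
# features: every column except the target
features = dataset.drop(['charges'], axis=1)
# labels: the continuous target variable we want to predict
labels = dataset['charges']
```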

The pandas shape property tells us the shape of our data: a tuple of two values, the number of samples and the number of features. To check the shape of our dataset, we can do:

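For example, assuming the names used above:

```python
print(dataset.shape)  # prints (number of samples, number of columns)
```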

Or, to make things clearer:

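A sketch that unpacks the shape of the feature matrix into named values:

```python
num_samples, num_features = features.shape
print(f"Number of samples: {num_samples}")
print(f"Number of features: {num_features}")
```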

To see useful summary statistics of the dataset, we do:

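For example, using the dataset DataFrame from above:

```python
# count, mean, std, min, quartiles, and max for each numeric column
print(dataset.describe())
```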

Data preprocessing: one-hot encoding and standardization

One-hot encoding of categorical features

Since neural networks cannot work with string data directly, we need to convert our categorical features (such as “region”) into numerical ones. One-hot encoding creates a binary column for each category. For example, since the “region” variable has four categories, one-hot encoding will result in four binary columns: “northeast”, “northwest”, “southeast”, and “southwest”, as shown in the table below.

region       northeast   northwest   southeast   southwest
northeast        1           0           0           0
northwest        0           1           0           0
southeast        0           0           1           0
southwest        0           0           0           1

One-hot encoding can be accomplished by using the pandas get_dummies() function:

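A sketch; note that get_dummies() converts every string column it finds, including “region”:

```python
# create one binary column per category
features = pd.get_dummies(features)
```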

Split data into train and test sets:

In machine learning, we train a model on training data, and we evaluate its performance on a held-out set of data, our test set, not seen during learning:

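A sketch using scikit-learn's train_test_split; the random_state value of 42 is an arbitrary choice:

```python
from sklearn.model_selection import train_test_split

# hold out 33% of the data for testing; random_state makes the shuffle reproducible
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.33, random_state=42)
```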

Here we chose the test size to be 33% of the total data, and the random state controls the shuffling applied to the data before the split.

Standardize/normalize numerical features:

The usual preprocessing step for numerical variables, among others, is standardization, which rescales features to zero mean and unit variance. Why do we want to do that? Well, our features have different scales or units: “age” spans the interval [18, 64], while the “children” column’s interval is much smaller, [0, 5]. When features have differing scales, the optimizer might update some weights faster than others.

Normalization is another way of preprocessing numerical data: it scales the numerical features to a fixed range, usually between 0 and 1.

So which should you use? Well, there isn’t always a clear answer, but you can try them all out and choose the method that gives the best results.

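A sketch of this step, matching the column transformer described just below; the variable names are assumptions:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

# apply Normalizer() to the numeric columns and pass the rest through unchanged;
# note that sklearn's Normalizer rescales each sample (row) to unit norm
ct = ColumnTransformer(
    [('only numeric', Normalizer(), ['age', 'bmi', 'children'])],
    remainder='passthrough')

# fit on the training data only, then apply the fitted transformer to both sets
features_train_scaled = ct.fit_transform(features_train)
features_test_scaled = ct.transform(features_test)
```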

The column transformer is named “only numeric”; it applies a Normalizer() to the ‘age’, ‘bmi’, and ‘children’ columns, and it simply passes the rest of the columns through. ColumnTransformer() returns NumPy arrays, so we convert them back to a pandas DataFrame to see some useful summaries of the scaled data.

To convert a NumPy array back into a pandas DataFrame, we can do:

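A sketch, assuming scikit-learn >= 1.0, whose get_feature_names_out() recovers the (reordered) column names from the transformer:

```python
# ColumnTransformer returns a NumPy array with the transformed columns first,
# so we take the column names from the transformer itself
features_train_scaled = pd.DataFrame(
    features_train_scaled, columns=ct.get_feature_names_out())
print(features_train_scaled.describe())
```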

Note that we fit the scaler to the training data only, and then we apply the trained scaler onto the test data. This way we avoid “information leakage” from the training set to the test set. These two datasets should be completely unaware of each other!

Neural network model: tf.keras.Sequential

Now that we have our data preprocessed, we can start building the neural network model. The most frequently used model in TensorFlow is the Keras Sequential model. A sequential model, as the name suggests, allows us to create models layer-by-layer in a step-by-step fashion. This model can have only one input tensor and only one output tensor.

To design a sequential model, we first need to import Sequential from keras.models:

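With TensorFlow 2, the import is typically written as:

```python
from tensorflow.keras.models import Sequential
```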

To improve readability, we will design the model in a separate Python function called design_model(). The following command initializes a Sequential model instance my_model:

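A sketch of the function; the model name string is an arbitrary choice:

```python
def design_model():
    # name is an optional, purely descriptive label
    my_model = Sequential(name="my_first_model")
    return my_model
```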

name is an optional argument to any model in Keras.

Finally, we invoke our function in the main program with:

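For example:

```python
my_model = design_model()
```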

The model’s layers are accessed via the layers attribute:

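For example:

```python
print(my_model.layers)  # an empty list for now, since no layers have been added yet
```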

In the next lesson, we will learn about layers in neural networks.


About Joshua Wood

Joshua is a Microsoft Azure Certified Cloud Professional and a Google Certified Associate Cloud Engineer. He is a data analyst at Acme, specializing in the use of cloud infrastructure for machine learning and deep learning operations at scale.
