Math for Deep Learning

Erika Oliver


Introduction

In this lesson, we hope to clarify the fundamental concepts that make deep learning possible, so you can build your own learning models with more precision than “pour the data into this big pile of linear algebra.”

Scalars, Vectors, and Matrices

To start, let us go over a few topics that are integral to understanding the mathematical operations in deep learning, including how data is represented:

Scalars

A scalar is a single quantity that you can think of as a number. In machine learning models, we can use scalar quantities to manipulate data, and we often modify them to improve our model’s accuracy. We can also represent data as scalar values depending on what dataset we are working with.

x = 5

Vectors

Vectors are arrays of numbers. In Python, we often represent vectors as NumPy arrays. Each value in the array can be identified by its index (its location within the array).

import numpy as np
x = np.array([1, 2, 3])

Matrices

Matrices are grids of information with rows and columns. We can index a matrix just like an array; however, when indexing on a matrix, we need two arguments: one for the row and one for the column.

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x[1, 2]  # row 1, column 2 -> returns 6


Tensors

Scalars, vectors, and matrices are foundational objects in linear algebra. Understanding the different ways they interact with each other and can be manipulated through matrix algebra is integral before diving into deep learning. This is because the data structure we use in deep learning is called a tensor, which is a generalized form of a vector and matrix: a multidimensional array.

A tensor allows for more flexibility with the type of data you are using and how you can manipulate that data.
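As a minimal sketch (the values and shape here are illustrative), a tensor with three dimensions can be built in NumPy by nesting matrices:

import numpy as np

x = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])
x.shape  # (2, 2, 3): two matrices, each with 2 rows and 3 columns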


Matrix Algebra

The following examples walk through matrix addition, scalar multiplication, matrix multiplication, and the transpose. You can also perform element-wise operations on tensors using matrix algebra.

Matrix Addition:

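Two matrices of the same shape are added element by element. A quick NumPy sketch with illustrative values:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A + B  # [[ 6,  8], [10, 12]]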

Scalar Multiplication:

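Multiplying a matrix by a scalar scales every entry by that amount:

A = np.array([[1, 2], [3, 4]])
2 * A  # [[2, 4], [6, 8]]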

Matrix Multiplication:

This is the most complicated of the operations, so spend time understanding how it is done: each entry of the product is the dot product of a row of the first matrix with a column of the second, which means the number of columns in the first matrix must equal the number of rows in the second.
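A NumPy sketch with illustrative values (the @ operator performs matrix multiplication):

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A @ B  # [[19, 22], [43, 50]], e.g. 19 = 1*5 + 2*7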

Transpose:

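The transpose flips a matrix over its diagonal, turning its rows into columns. In NumPy (illustrative values):

A = np.array([[1, 2, 3], [4, 5, 6]])
A.T  # [[1, 4], [2, 5], [3, 6]]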

This is all of the matrix algebra we need to proceed with the rest of our deep learning investigation! These concepts are the fundamental building blocks of what makes deep learning models so powerful. When we train our models, we are performing operations on tensors; this data is analyzed, manipulated, and shaped by the matrix algebra we have just gone over.

Neural Networks Concept Overview


Let’s take a look at the journey our inputs take inside a neural network! By an input, we mean a data point from our dataset. An input can have many different features, so in our input layer, each node represents a different input feature. For example, if we were working with a dataset of different types of food, some of our features might be size, shape, and nutrition, where the value of each feature would be held in an input node.

Besides an input layer, our neural network has two other types of layers:

  • Hidden layers are layers that come between the input layer and the output layer. They introduce complexity into our neural network and help with the learning process. You can have as many hidden layers as you want in a neural network (including zero of them).
  • The output layer is the final layer in our neural network. It produces the final result, and every neural network has exactly one output layer.

Each layer in a neural network contains nodes, and nodes in adjacent layers are connected by weights. These are the learning parameters of our neural network, determining the strength of the connection between each linked node.

Between each pair of layers, a weighted sum of the node values and the weights is calculated. For example, from our input layer, we take the weighted sum of the inputs and our weights with the following equation:

weighted_sum = (inputs ⋅ weight_transpose) + bias

We then apply an activation function to it.

Activation(weighted_sum)
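As a minimal sketch of these two formulas in NumPy (the layer sizes, weights, and inputs below are illustrative assumptions, not values from a real model):

import numpy as np

inputs = np.array([[2.0, 3.0]])      # one data point with 2 features
weights = np.array([[0.5, -1.0],
                    [1.5, 0.5],
                    [-0.5, 1.0]])    # 3 nodes in the next layer, 2 weights each
bias = np.array([0.1, 0.2, 0.3])     # one bias per node in the next layer

weighted_sum = np.dot(inputs, weights.T) + bias  # (inputs ⋅ weight_transpose) + bias
# an activation function is then applied to weighted_sum (see below)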

The two formulas we have gone over take all the inputs through one layer of a neural network. Aside from the activation function, all of the transformations we have done so far are linear. Activation functions introduce nonlinearity in our learning model, creating more complexity during the learning process.

This is what makes activation functions important. A composition of linear transformations is itself just a single linear transformation, so a neural network with many hidden layers but no activation functions would collapse into a series of successive layers no more effective or accurate than simple linear regression.

An activation function decides what is fired to the next neuron based on the weighted sum it receives. Various types of activation functions can be applied at each layer. The most popular one for hidden layers is ReLU.
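ReLU simply passes positive values through and zeroes out negative ones. A one-line NumPy sketch:

def relu(z):
    return np.maximum(0, z)  # returns z where z > 0, otherwise 0

relu(np.array([-2.0, 0.5]))  # array([0. , 0.5])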


Others commonly used, often for the output layer, are sigmoid and softmax. You will learn more about these functions as you use them later in this course.
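Both can also be sketched in a few lines of NumPy (this softmax subtracts the max before exponentiating, a common numerical-stability convention rather than anything specified here):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes each value into (0, 1)

def softmax(z):
    e = np.exp(z - np.max(z))    # shift by the max for numerical stability
    return e / e.sum()           # outputs are positive and sum to 1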


Let's Visualize This Process

Let’s bring all of these concepts together and see how they function in a neural network with one hidden layer, following the inputs, weights, and calculations from the starting point all the way to the end!

The process we have been going through is known as forward propagation. Inputs are moved forward from the input layer through the hidden layer(s) until they reach the output layer.
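Here is a minimal end-to-end sketch of forward propagation through one hidden layer (the layer sizes and random weights are illustrative assumptions, not values from a trained model):

import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)

x = np.array([[0.5, -1.2, 2.0]])        # one data point with 3 input features

W_hidden = rng.standard_normal((4, 3))  # hidden layer with 4 nodes
b_hidden = np.zeros(4)
W_output = rng.standard_normal((1, 4))  # output layer with 1 node
b_output = np.zeros(1)

hidden = relu(np.dot(x, W_hidden.T) + b_hidden)  # input layer -> hidden layer
output = np.dot(hidden, W_output.T) + b_output   # hidden layer -> output layer
print(output)                                    # the network's final result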

Up next, we will learn the fundamentals that underlie every deep learning model.


About Erika Oliver

Erika Oliver is a successful entrepreneur. She is the founder of Acme Inc, a bootstrapped business that builds affordable SaaS tools for local news, indie publishers, and other small businesses.
