Further to last week's fun kickstart into the world of TensorFlow, I'm excited to continue the journey today, and I hope you are too. Let's buckle up for the ride.

Things get a bit deeper this time, so take your time over it, boot up your lappy, and enjoy the learning experience :)

**Neural Networks**

Sorry Sam I've completely forgotten what we are doing.. What's a neural network?!

Neural networks are a type of machine learning model used to recognize patterns and relationships in large amounts of data.

Despite the name (and lots of talk suggesting so), a neural network does not work the way a human brain works. At best, you could say it is inspired by our understanding of how the brain works.

Last time we got ourselves looping through a series of beautiful handwritten digits called the MNIST data set. In case you missed the introduction, grab the link below:

Now that we know and have seen the data we are using to train our neural network, we can start to build it.

**Let's start the modeling**

This piece of code builds the model, which is the neural network we will go on to train. There is quite a lot going on here, so let's take the time to understand each argument.

```
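import tensorflow as tf

model = tf.keras.models.Sequential([
    # Input: unroll each 28 x 28 image into a vector of 784 values
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Hidden layer: 128 neurons with ReLU activation
    tf.keras.layers.Dense(128, activation='relu'),
    # Randomly drop 20% of the hidden outputs during training
    tf.keras.layers.Dropout(0.2),
    # Output: one raw score (logit) per digit class, 0 to 9
    tf.keras.layers.Dense(10)
])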
```

Hmm, why a sequential model, and what are these layers, and just - what is all this about!?

Digging in...

**Layers**

This model has a bunch of layers, why? The building blocks of a neural network are layers. There are three types of layers in a neural network:

**Input layer**: The first layer that receives input data and passes it on to the next layer.

**Hidden layer(s)**: The intermediate layer(s) that perform calculations to find hidden features and patterns. This is where the learning happens in a neural network.

**Output layer**: The last layer that produces the final output or prediction.

Each layer performs a specific function, and the model encapsulates the input, the output, and the hidden layers.

You can see from the code above that there are four layers in this network.

Flatten, Dense, Dropout, and, err, Dense again.

Flatten is our input layer.

The first Dense, together with Dropout, makes up the hidden part of the network.

The second Dense is the output layer.

We'll get into what these mean shortly.
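To get a feel for the size of this network before we dig in, here is a rough sketch that counts the trainable numbers (weights and biases) in each layer. The helper name `dense_params` is my own invention for illustration, not part of TensorFlow, and it assumes each Dense layer has one weight per input-output pair plus one bias per neuron.

```python
# Count the trainable parameters in each layer of the model above.
# dense_params is a hypothetical helper, not a TensorFlow function.
def dense_params(n_inputs, n_outputs):
    # One weight per (input, output) pair, plus one bias per output neuron.
    return n_inputs * n_outputs + n_outputs

flatten_outputs = 28 * 28                         # Flatten has no weights; it only reshapes
hidden_params = dense_params(flatten_outputs, 128)
output_params = dense_params(128, 10)             # Dropout has no weights either
total_params = hidden_params + output_params

print(flatten_outputs, hidden_params, output_params, total_params)
# 784 100480 1290 101770
```

Almost all of the numbers the network has to learn live in that first Dense layer.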

**What is the Sequential model?**

When we talk about models, we're talking about the type of structure a neural network can use.

There are a number of neural network structures, several of which we will come across at a later time (Recurrent, Convolutional etc.). For now, the best way to think of it is that there isn't one type of neural network, there are lots of flavours depending on what you want to do. Each has strengths and weaknesses dependent on the task at hand.

Despite the name, the Keras Sequential model has nothing to do with sequence data such as text, speech, or time series. "Sequential" describes how the layers are arranged: a plain stack, where each layer takes exactly one input and passes exactly one output on to the next layer in the list.

That makes it a good fit for a simple network like ours, where the data flows straight through Flatten, Dense, Dropout, and Dense, in that order. Models that need branches, shared layers, or multiple inputs and outputs use the Keras functional API instead. Networks built for ordered data, like the recurrent ones used in natural language processing, are a story for another day. Oooh, natural language processing you say! I'm interested in that :D ... cool, but later.
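In Keras terms, `Sequential` chains layers so that each layer's output becomes the next layer's input. Here is a tiny pure-Python sketch of that idea. The functions `double`, `add_one`, and `run_sequential` are made-up stand-ins for illustration, not real Keras layers or APIs.

```python
# Hypothetical stand-ins for layers: each takes a list of numbers and returns a list.
def double(values):
    return [2 * v for v in values]

def add_one(values):
    return [v + 1 for v in values]

def run_sequential(layers, inputs):
    """Apply each layer in order, feeding every output into the next layer."""
    outputs = inputs
    for layer in layers:
        outputs = layer(outputs)
    return outputs

result = run_sequential([double, add_one], [1, 2, 3])
print(result)  # [3, 5, 7]
```

Swap the order of the two "layers" and you get a different answer, which is why the order of the list matters.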

Okay, so talk to me about the layers.

**Flatten**

Flatten (input_shape = (28, 28))

So this one is a bit of a strange concept... As humans, we see the image below and it makes sense to us that this is a 7. We can see the shape of it through the 28 * 28 two-dimensional image. It aligns with our learned structure of what a seven looks like.

The Flatten layer turns this two-dimensional image into a one-dimensional vector: a stream of pixel values between 0 and 255.

That is how our machine is going to see it, not as the 7 we see before us, but as a stream of 784 numbers (784 because 28 * 28 = 784).

Fun huh, that's the input :).
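If you'd like to see the flattening for yourself, here is a small NumPy sketch. It uses a made-up 28 x 28 array rather than a real MNIST digit, and NumPy's `reshape` as a stand-in for what the Flatten layer does to a single image.

```python
import numpy as np

# A hypothetical 28 x 28 "image" with pixel values 0-255 (not a real MNIST digit).
image = np.arange(28 * 28).reshape(28, 28) % 256

# Flattening lays the rows end to end: row 0 first, then row 1, and so on.
flat = image.reshape(-1)

print(image.shape)  # (28, 28)
print(flat.shape)   # (784,)

# The pixel at row 3, column 5 ends up at position 3 * 28 + 5 in the flat vector.
print(flat[3 * 28 + 5] == image[3, 5])  # True
```

Nothing is lost or changed here, the same 784 values are simply lined up in a row.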

**Dense**

Dense(128, activation='relu')

A bit of unpacking is needed for this one. Ha ha. I seriously find myself quite funny with these puns. Maybe too funny.

A dense layer is a type of layer that connects each input to each output with a weight.

This line of code means that you are creating a dense layer with 128 neurons and using the rectified linear unit **(ReLU)** activation function.

Wait, what? Let's break it down a bit, don't worry if it doesn't make sense at first. It's a tricky beast to get your head around. Just read it a couple of times, and if it doesn't make sense come back to it later.

A ReLU activation function is a type of function that returns the input if it is positive, and zero otherwise.
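To make "connects each input to each output with a weight" a bit more concrete, here is a rough NumPy sketch of what one dense layer with ReLU computes for a single flattened image. The random input, the weight scale of 0.05, and the zero biases are assumptions for illustration only, not the values Keras would start with.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is repeatable

x = rng.random(784)                        # one flattened image (hypothetical values)
W = rng.normal(0, 0.05, size=(784, 128))   # one weight per input-to-neuron connection
b = np.zeros(128)                          # one bias per neuron

z = x @ W + b                              # weighted sum for each of the 128 neurons
hidden = np.maximum(0, z)                  # ReLU: keep positives, zero out negatives

print(hidden.shape)        # (128,)
print(hidden.min() >= 0)   # True
```

So 784 numbers go in, 128 numbers come out, and none of them are negative.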

**Why 128 neurons?**

The number of neurons is a choice we make, and by convention it is often a power of 2. The right size depends on how complicated the question we are trying to answer is and how much data we have to train on. There is some logic behind why we use 128 and not, say, 64 or 256, but this is a topic for another day :)

**What is an activation function?**

An activation function is a function that determines the output of a neuron in a neural network based on the input it receives.

It helps to introduce non-linearity into the network, which allows it to learn complex patterns and relationships in the data. Some activation functions can also be thought of as deciding whether a neuron should be activated or not, by applying a threshold to the input.

If the input is above the threshold, the neuron fires, otherwise it remains inactive. For ReLU, that threshold is zero.

There are many types of activation functions, but we'll be focusing on the ReLU to begin with.

**Tell me more about ReLU**

ReLU stands for Rectified Linear Unit and it is defined as: ReLU(x) = max(0, x)

This means that ReLU will output the input value if it is positive, and zero if it is negative.

For example, ReLU(5) = 5, ReLU(0) = 0, and ReLU(-5) = 0.
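That definition translates into Python almost word for word. A minimal sketch, with `relu` being my own little helper rather than the TensorFlow implementation:

```python
def relu(x):
    """Return x when it is positive, otherwise 0."""
    return max(0, x)

print(relu(5), relu(0), relu(-5))  # 5 0 0
```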

ReLU is one of the most commonly used activation functions in neural networks, and we use it to provide the following properties to our models:

It is easy to compute and has a simple gradient. What do you mean by gradient?! A gradient in a neural network is a measure of how much the error of the network changes with respect to a small change in one of the weights. ReLU's own slope is about as simple as it gets: 1 for positive inputs and 0 for negative ones.

It helps to reduce the vanishing gradient problem, which occurs when the gradients become very small and the network stops learning. This is important and we will talk more about this later.

It introduces non-linearity and sparsity in the network, which can improve performance and generalization. Sparsawhaty?

**THISS ISS SSPPAAARRR..sity.**

Okay let's pretend I understood any of that, what does it mean by non-linearity and sparsity?!

**Sparsity**

Sparsity means that most of the values in a neural network are zero or close to zero. For example, if you have a neural network with 100 neurons and only 10 of them are activated (have a non-zero output) for a given input, then this is a sparse network.

Because inactive neurons contribute nothing to the next layer, sparsity means fewer values actually take part in each prediction, which can make the network more efficient and less prone to overfitting.

Again, overfitting is important and we will come back to that soon.

Sparsity also helps to focus on the most relevant features and patterns in the data and ignore the noise and redundancy.
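Here is a quick way to see sparsity appear, sketched in NumPy. The 100 random numbers are made-up stand-ins for the values arriving at 100 neurons, so the exact count you get depends entirely on that assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
pre_activations = rng.normal(size=100)        # hypothetical inputs to 100 neurons
activations = np.maximum(0, pre_activations)  # apply ReLU

# Every neuron whose input was negative now outputs exactly zero.
active = np.count_nonzero(activations)
print(f"{active} of {activations.size} neurons are active")
```

With inputs centred on zero like these, you would expect something in the region of half the neurons to go quiet.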

**Non-linearity**

Non-linearity means that the output of a neural network cannot be reproduced by a simple linear combination of the inputs. It allows the neural network to model more complicated and realistic relationships between the input and the output.
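One way to convince yourself why this matters: without an activation function, stacking layers gains you nothing, because two linear layers collapse into a single one. The sketch below uses tiny hand-picked weights (my own numbers, chosen so the arithmetic is easy to follow) and leaves out biases for simplicity.

```python
import numpy as np

x = np.array([1.0, -1.0])                 # a tiny hypothetical input
W1 = np.array([[1.0, 2.0], [3.0, 4.0]])   # weights of a first layer
W2 = np.array([[1.0], [1.0]])             # weights of a second layer

# Two linear layers with no activation in between...
two_layers = (x @ W1) @ W2
# ...give exactly the same result as one linear layer with combined weights.
one_layer = x @ (W1 @ W2)

# Put ReLU between the layers and that shortcut no longer works.
with_relu = np.maximum(0, x @ W1) @ W2

print(two_layers, one_layer, with_relu)   # [-4.] [-4.] [0.]
```

The ReLU version gives a different answer, which is exactly the point: it can express things a single linear layer cannot.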

Moving on...

**Dropout**

Dropout(0.2)

The dropout layer has a rate parameter that sets the fraction of the input units to drop, as a number between 0 and 1. Here, 0.2 means roughly 20% of the values coming out of the previous layer are dropped on each training step. A higher rate means more regularization, but also more information loss.

The dropout layer is used in neural networks to prevent overfitting. During training, it randomly nullifies the contribution of some neurons to the next layer, as if they were not part of the network. This reduces the dependency of the network on specific neurons and makes it more robust.

Wait does this have something to do with overfitting then?

YES! It does! :D

Dropout and overfitting are connected because dropout is a technique to prevent overfitting in neural networks.

Overfitting is a problem that occurs when a neural network learns the noise or specific details of the training data and fails to generalize well to new or unseen data.

Dropout is a way to regularize or simplify the neural network by randomly dropping out some of the neurons or inputs during training. This forces the network to learn more robust and general features that are not dependent on a few specific neurons.
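To see the idea in action, here is a rough NumPy sketch of dropout. It is my own simplified version, not the Keras source: the `dropout` function and its arguments are hypothetical, and it assumes the common "inverted dropout" approach, where the surviving values are scaled up by 1 / (1 - rate) so the overall total stays about the same.

```python
import numpy as np

def dropout(values, rate, training, rng):
    """Inverted dropout: zero a random fraction of values, rescale the rest."""
    if not training:
        return values                       # nothing is dropped at prediction time
    keep_mask = rng.random(values.shape) >= rate
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(3)
hidden = np.ones(1000)                      # hypothetical hidden-layer outputs

dropped = dropout(hidden, rate=0.2, training=True, rng=rng)
print(np.mean(dropped == 0))                # fraction dropped, roughly 0.2
print(dropped.max())                        # survivors are scaled up to about 1.25
```

Run it a few times with different seeds and a different set of neurons goes missing each time, which is what stops the network leaning too heavily on any one of them.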

**Dense**

Dense(10)

**Dense AGAIN?! Why, and no ReLU function this time?!**

The reason why the last dense layer does not have a ReLU activation function is that it is the output layer of the neural network, and it needs to produce a different kind of output than the previous, hidden layers.

The output layer of this neural network has 10 neurons, corresponding to the 10 possible classes of digits (0 to 9).

With no activation specified, the output layer produces one raw score, known as a logit, for each of the 10 neurons. Logits can be any positive or negative number, so they are not probabilities yet.

To turn them into probabilities that sum up to 1, the logits are passed through a function called softmax as a separate step. That function gives us a probability distribution over the 10 classes, indicating how confident the network is that the input belongs to each class.

For example, if the probabilities come out as [0.1, 0.2, 0.05, 0.05, 0.1, 0.1, 0.3, 0.05, 0.05, 0.0], it means that the network thinks that the input has a 30% chance of being a 6, a 20% chance of being a 1, a 10% chance of being a 0, 4, or 5, and so on.
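The usual function for turning raw scores into probabilities like these is called softmax. Here is a minimal NumPy sketch of it, using three made-up logits rather than real output from our model. The `softmax` helper is written by hand for illustration, and subtracting the maximum first is a common trick to keep the exponentials from overflowing.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)       # subtracting the max avoids overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])          # hypothetical raw scores for 3 classes
probs = softmax(logits)

print(probs.round(3))    # [0.659 0.242 0.099]
print(probs.argmax())    # 0, the class with the largest logit
```

Notice that the biggest logit still wins, softmax just rescales everything so the numbers read as percentages.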

The ReLU function, on the other hand, is more suitable for hidden layers, where the network needs to learn non-linear features and patterns from the input data. On the output layer, ReLU would clip every negative score to zero and throw away information we need to compare the classes.

Therefore, the output layer does not use a ReLU activation function.

**Okay, that's our model.**

So that's our model in detail. Our next line of code uses the model, still untrained, to make predictions about the first image in our training data.

```
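# Run the first training image through the (still untrained) model.
# x_train[:1] keeps the batch dimension, so the input shape is (1, 28, 28).
predictions = model(x_train[:1]).numpy()
predictions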
```

If you run that you should see something like this...

```
array([[ 0.3977482 , 0.05716583, 0.2599235 , -0.2866959 , -0.8014836 ,
0.4897349 , -0.54852885, -0.20476958, -0.03569295, 0.16433768]],
dtype=float32)
```

You'll notice there are 10 outputs here, which is exactly what we would expect as our final Dense layer specified 10 neurons.

What these random-looking values mean is another story, however!

**Summary**

So that was a pretty in-depth tour of the model creation.

Next time we will look at what these outputs mean, as well as start to train and evaluate our neural network!

Exciting times. See you soon :D
