Neural Network Absolute Basics
Before we build an Artificial Neural Network, let’s take a moment to appreciate our own, real one.
The most extraordinary thing in the Universe is inside your head. You can travel through every inch of outer space and very possibly nowhere find anything as marvelous and complex and high functioning as the three pounds of spongy mass between your ears.
The Body - By Bill Bryson
The great paradox of the brain is that everything you know about the world is provided to you by an organ that has itself never seen that world. The brain exists in silence and darkness, like a dungeoned prisoner. It has no pain receptors, literally no feelings. It has never felt the warmth of the sun or a soft breeze. To your brain, the world is just a stream of electrical pulses, and out of this, it creates for you a vibrant, 3D, sensually engaging universe.
The Body - By Bill Bryson
There are more neuron connections in a single cubic centimeter of brain tissue than there are stars in the Milky Way.
The goal of this post is to develop some intuition behind neural networks. If you have the patience to read it end to end, you will understand the basic building block of a neural network: the perceptron.
What comes to mind when you think of a Neural Network?
The heart of deep learning is this wonderful object called a neural network. You might have heard it so many times that it has become a cliché: neural networks vaguely mimic how the brain operates, with neurons that fire bits of information. It sounds unbelievable, but take a look at this image; this is how a neural network looks.
As a matter of fact, this one is a deep neural network. It has lots of nodes, edges, and layers, with information flowing into and out of the nodes, so it looks quite complicated. But when you look at neural networks for a while, you will realize that they are much simpler than they appear.
So, when I think of a neural network, this is the image that comes to my mind (top). Imagine your kid playing with some blue and red sea shells on a beach. You ask your kid to draw a line that separates the blue shells from the red ones, and your kid draws this line (top image). That’s it: that’s exactly what a neural network does.
Given some data in the form of blue or red points, the neural network looks for the best line that separates them. If the data is more complicated (bottom right), we need a more complicated algorithm, such as a deep neural network, which finds a more complex boundary that separates the points.
Understanding the Classification Model
Let’s start with a classification example. Say we are the admissions office at a university and our job is to accept or reject students. To evaluate each student, we have two pieces of information: the result of a test and their grades in school. Let’s take a look at some sample students.
We’ll start with Student 1, who got 9 out of 10 on the test and 8 out of 10 in grades. That student did quite well and got accepted. Then we have Student 2, who got 3 out of 10 on the test and 4 out of 10 in grades, and that student got rejected. Now we have a new Student 3, who got 7 out of 10 on the test and 6 out of 10 in grades, and we’re wondering whether this student gets accepted.
Our first step is to plot the students on a graph, with the horizontal axis corresponding to the test score and the vertical axis corresponding to the grades. The student who got 3 and 4 is located at the point (3, 4), and the student who got 9 and 8 is located at the point (9, 8).
How do we decide? We’ll do what we do in most of our algorithms: look at the previous data. This is how the previous data looks (see fig above). These are all the previous students who got accepted or rejected. The blue points correspond to students who got accepted, and the red points to students who got rejected. We can see in this diagram that students who did well on both the test and grades are more likely to get accepted, and students who did poorly on both are more likely to get rejected.
This data can be separated reasonably well with a straight line, shown below. Most students above the line get accepted and most students below the line get rejected, so this line is going to be our model. The model makes a few mistakes, since there are a few blue points under the line and a few red points over it, but we won’t worry about those just yet. It is reasonable to predict that if a point is over the line, the student gets accepted, and if it is under the line, the student gets rejected.
So, based on the model, we look at a new student, see whether they lie above or below the line, and make our decision.
Big Question: How do we find the line?
We can eyeball it, but a computer can’t. There are algorithms that find this line, not only for this example but for more general and complicated cases.
We will start by labeling the horizontal axis (test scores) as $x_1$ and the vertical axis (grades) as $x_2$. The boundary line that separates the points then has a linear equation, say $2x_1 + x_2 - 18 = 0$. What does this mean? It means that our method for accepting or rejecting students simply says the following: take the left-hand side of this equation as the score, $\text{Score} = 2 \cdot \text{Test} + \text{Grades} - 18$. If $\text{Score} \ge 0$, accept; if $\text{Score} < 0$, reject.
So, that linear equation is our model!
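Because the model is just an equation, it fits in a couple of lines of code. This is a minimal sketch: `admission_score` and `decide` are hypothetical names, the weights 2 and 1 and the constant 18 come from the line above, and a score of exactly 0 is treated as Accept, as the perceptron section does later. It scores the three students from the example:

```python
def admission_score(test: float, grades: float) -> float:
    """Score = 2*Test + Grades - 18, i.e. the left-hand side of the boundary line."""
    return 2 * test + grades - 18


def decide(test: float, grades: float) -> str:
    # Points on or above the line (score >= 0) are accepted; points below are rejected.
    return "Accept" if admission_score(test, grades) >= 0 else "Reject"


# (test, grades) for the three students described earlier.
students = {"Student 1": (9, 8), "Student 2": (3, 4), "Student 3": (7, 6)}
for name, (test, grades) in students.items():
    print(name, admission_score(test, grades), decide(test, grades))
```

The model reproduces the known decisions for Students 1 (score 8) and 2 (score -8), and it predicts that Student 3, with a score of 2, gets accepted.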
In the more general case, our boundary equation will be of the form $w_1x_1 + w_2x_2 + b = 0$. It can be expressed in vector notation as $Wx + b = 0$, where $W = (w_1, w_2)$ is the weight vector and $x = (x_1, x_2)$ is the feature vector; $Wx$ is simply the dot product of the two vectors, $w_1x_1 + w_2x_2$. Here, $x$ is the input, $W$ holds the weights, and $b$ is the bias. The same notation works with $n$ features: the boundary becomes $w_1x_1 + \dots + w_nx_n + b = 0$.
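In code, the vector form is a single dot product. This is a minimal sketch assuming NumPy is available; `linear` is a hypothetical helper name, and the weights are the admissions line $2x_1 + x_2 - 18 = 0$. With more features, `W` and `x` simply get longer and the same call computes $w_1x_1 + \dots + w_nx_n + b$.

```python
import numpy as np

# Weight vector W = (w1, w2) and bias b for the line 2*x1 + x2 - 18 = 0.
W = np.array([2.0, 1.0])
b = -18.0


def linear(x: np.ndarray) -> float:
    """Compute Wx + b as a dot product; works for any number of features."""
    return float(np.dot(W, x) + b)


x = np.array([7.0, 6.0])                   # feature vector (test, grades) for Student 3
expanded = 2.0 * 7.0 + 1.0 * 6.0 - 18.0    # w1*x1 + w2*x2 + b written out term by term
print(linear(x), expanded)                 # prints: 2.0 2.0
```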
$y$ is the label that we are trying to predict: if $y = 1$, the student is accepted. $\hat{y}$ (y-hat) is our prediction, that is, what the algorithm predicts the label will be. In this case, when the algorithm predicts $\hat{y} = 1$, the student gets accepted, which means the point lies above the line. Similarly, if the algorithm predicts $\hat{y} = 0$, the student gets rejected, which means the point lies below the line.
The goal is to make $\hat{y}$ as close to $y$ as possible.
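One simple way to check how close $\hat{y}$ is to $y$ is the fraction of points the model labels correctly. This is a minimal sketch with NumPy; it uses only the two students whose outcomes the example gives, and accuracy is our choice of measure here, not something defined in this post.

```python
import numpy as np

W = np.array([2.0, 1.0])  # weights from the admissions example
b = -18.0                 # bias

# Students 1 and 2, with their known labels y (1 = accepted, 0 = rejected).
X = np.array([[9.0, 8.0],
              [3.0, 4.0]])
y = np.array([1, 0])

# Prediction y_hat: 1 if the point is on or above the line (Wx + b >= 0), else 0.
y_hat = (X @ W + b >= 0).astype(int)

print(y_hat)                # [1 0]
print(np.mean(y_hat == y))  # fraction of correct predictions: 1.0
```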
Let’s introduce the notion of a perceptron, which is the building block of a neural network. As you will see, a perceptron is just an encoding of our mathematical equation as a graph.
The way we build the perceptron is quite simple. Take the model we just fitted and put it inside a node. Next, we add nodes for the inputs: test and grades. Here is an example where Test = 7 and Grades = 6. What the perceptron does is plot the point (7, 6) and check whether it lies above or below the line. If it is in the positive zone, it returns Yes; if it is in the negative zone, it returns No.
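Being in the positive zone and lying above the line are the same condition. Solving the boundary for $x_2$ makes this explicit for the example point:

$$
\begin{aligned}
2x_1 + x_2 - 18 > 0 &\iff x_2 > 18 - 2x_1 \qquad \text{(the point is above the line)} \\
x_1 = 7 &\implies 18 - 2 \cdot 7 = 4 < 6 = x_2 \\
2 \cdot 7 + 6 - 18 &= 2 > 0 \implies \text{Yes}
\end{aligned}
$$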
Let’s recall that our equation is $\text{Score} = 2 \cdot \text{Test} + 1 \cdot \text{Grades} - 18$, and that our prediction consists of accepting the student if $\text{Score} \ge 0$ and rejecting if $\text{Score} < 0$. We label the edges coming from the Test and Grades nodes with their weights, 2 and 1. The bias can be represented as a node with value 1 whose edge has weight $-18$. The node then takes the sum of the products of each input and its weight and checks whether the resulting score is greater than or equal to 0. If so, it outputs 1, indicating Yes; if the score is less than 0, it outputs 0, indicating No.
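Treating the bias as an extra input node that always holds 1 means the node only ever computes a weighted sum of its inputs. This is a minimal sketch in plain Python; `admissions_perceptron` is a hypothetical name, and the weights are the ones on the edges described above.

```python
def admissions_perceptron(test: float, grades: float) -> int:
    # Input nodes, including the constant bias node with value 1.
    inputs = [test, grades, 1.0]
    # Edge weights: 2 for Test, 1 for Grades, -18 on the bias edge.
    weights = [2.0, 1.0, -18.0]
    # Sum of the products of each input and the weight on its edge.
    score = sum(x * w for x, w in zip(inputs, weights))
    # Output 1 (Yes) if the score is >= 0, otherwise 0 (No).
    return 1 if score >= 0 else 0


print(admissions_perceptron(7, 6))  # 1 (Yes): the score is 2
print(admissions_perceptron(3, 4))  # 0 (No): the score is -8
```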
More generally, this is how the nodes look:
The node then calculates the linear expression $Wx + b$ and checks whether the value is greater than or equal to 0. If it is, the node returns 1; otherwise, it returns 0.
Note that we are implicitly using a function here called the step function. The step function returns 1 if its input is positive or zero, and 0 if its input is negative.
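As a formula, the step function and the prediction the perceptron makes with it can be written as follows (the argument name $t$ is just a placeholder):

$$
\text{step}(t) = \begin{cases} 1 & \text{if } t \ge 0 \\ 0 & \text{if } t < 0 \end{cases}
\qquad\qquad
\hat{y} = \text{step}(Wx + b)
$$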
So, in reality, a perceptron can be seen as a combination of two nodes: the first node calculates the linear equation from the inputs and the weights, and the second node applies the step function to the result.
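Splitting the perceptron into those two nodes maps directly onto two small functions. This is a sketch assuming NumPy; `linear_node`, `step`, and `perceptron` are hypothetical names, and the example reuses the admissions weights. Replacing `step` with a different function is the only change needed to try another activation.

```python
import numpy as np


def linear_node(x: np.ndarray, W: np.ndarray, b: float) -> float:
    """First node: the linear combination Wx + b."""
    return float(np.dot(W, x) + b)


def step(t: float) -> int:
    """Second node: the step function, 1 for t >= 0 and 0 for t < 0."""
    return 1 if t >= 0 else 0


def perceptron(x: np.ndarray, W: np.ndarray, b: float) -> int:
    # A perceptron is the step function applied to the linear node's output.
    return step(linear_node(x, W, b))


W = np.array([2.0, 1.0])
b = -18.0
for point in ([9.0, 8.0], [3.0, 4.0], [7.0, 6.0]):
    print(point, perceptron(np.array(point), W, b))
```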
More generally, these perceptrons can be graphed as follows:
In the future, we will use activation functions other than the step function.
We started by finding a line that best divides our data, and then developed a mathematical notation for representing that boundary in n dimensions. Next, we introduced the notion of a perceptron with its nodes and weights, and finally learned about one simple activation function: the step function.