If you like reading tech news or articles, you've probably come across the term neural networks at some point. People love throwing this term around and treating it like a black box that can perform any task, but there is much more to it. In this post, I will give an intuitive and formal explanation of neural networks. This post talks about what neural networks are, not how they are trained or why they work so well. I will dissect those concepts in later posts.
P.S. If you have already studied neural networks, I would recommend FORGETTING whatever you have learned before reading this. It'll help you get a new perspective.
Academics describe neural networks as a class of functions represented by directed acyclic graphs, where vertices represent constrained functions of some kind and edges represent the composition of those functions. This explanation is intuitively useless (although very important in a formal setting). When most people describe a neural network, they are actually describing a fully connected neural network, which leads to confusion later on.
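If you want to see what that formal picture actually means, here is a toy sketch of my own (the functions and values are made up purely for illustration): each vertex is a small constrained function, and following the edges of the graph simply composes them into one bigger function.

```python
# A toy "network": three vertex functions chained together.
# Each vertex applies a small, constrained function; each edge feeds
# one vertex's output into the next, i.e. function composition.

def scale(x, w=2.0):        # vertex 1: a simple linear map
    return w * x

def shift(x, b=1.0):        # vertex 2: another simple map
    return x + b

def clip_negative(x):       # vertex 3: a basic nonlinearity
    return max(0.0, x)

# Walking the edges of the graph = composing the vertex functions.
def network(x):
    return clip_negative(shift(scale(x)))

print(network(3.0))   # 7.0
print(network(-3.0))  # 0.0
```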
To answer the question of what exactly a neural network is, we will take small steps, starting with the idea of a family of functions.
For those who don't remember, a family of functions is a group or set of functions with a similar characteristic feature. For example, f(x) = ax + 2 represents a family of functions. f(x) = x + 2 and f(x) = 2x + 2 are part of the same family. Here, the parameter a parameterizes the family and ties all the functions together (the similar characteristic feature I mentioned), and a takes values from a parameter set A. The parameter set refers to the set of values that the parameter(s) is/are allowed to take.
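As a small sketch of my own (the names here are illustrative, not standard), the same family can be written in code with the parameter made explicit:

```python
# The family f(x) = a*x + 2, parameterized by a.
# Choosing a value of a from the parameter set picks one member of the family.

def make_member(a):
    """Return the member of the family f(x) = a*x + 2 selected by a."""
    def f(x):
        return a * x + 2
    return f

f1 = make_member(1)   # f(x) = x + 2
f2 = make_member(2)   # f(x) = 2x + 2

print(f1(3), f2(3))   # 5 8
```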
To get closer to answering that question, I will go on a small tangent; please bear with me. What if I told you that every single phenomenon or mechanism in this world has an underlying function? Better yet, what if I told you that it is possible to approximate these functions? Let me explain with an example:
Suppose I take all the pictures in the universe and, for each one, have to decide whether it contains a car or not. To do so, I will look for the features of a car: four wheels, a medium-sized body, windows, headlights, etc. We look for these features because they make sense to us; it's how we have been taught to identify cars. But maybe there are more abstract features that we cannot identify ourselves, yet are unique to the given task. Moreover, these features can be used by a machine or mathematical model to identify whether a car is present in the picture. So when I talk about the underlying function, I mean a function that can extract these underlying features from the image and use them to identify the presence of a car. More often than not, these functions exist and are smooth and well-behaved.
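To make the idea slightly more concrete, here is a deliberately naive sketch of my own; the feature names and the decision rule are invented for illustration, and the real underlying function would work on raw pixels and on far more abstract features than these.

```python
# A hand-written caricature of the "underlying function" for car detection.
# The true function maps raw pixels -> {car, no car}; here we pretend the
# human-interpretable features have already been extracted for us.

def looks_like_a_car(num_wheels, has_windows, has_headlights):
    score = 0
    score += 1 if num_wheels >= 4 else 0
    score += 1 if has_windows else 0
    score += 1 if has_headlights else 0
    return score >= 2   # a crude decision rule

print(looks_like_a_car(4, True, True))    # True
print(looks_like_a_car(2, False, False))  # False
```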
Neural networks try to approximate these complex underlying functions by manipulating their parameters. I will not go into details about how, because that requires its own blog; for now, just take my word for it (or do some research of your own). Now, let us try to formulate our very own definition of neural networks one more time:
A neural network with a given architecture and parameter set represents a parameterized family of functions. Given a task T, the neural network tries to approximate the underlying function responsible for T by manipulating its parameters in accordance with a training algorithm A.
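Read as code, the definition might look something like the minimal NumPy sketch below; the layer sizes, the random initialization, and the placeholder training step are my own assumptions for illustration, not a prescribed recipe. The architecture fixes the family, the current parameter values pick one member of it, and the training algorithm A is some rule for nudging those values toward the underlying function of the task T.

```python
import numpy as np

# Architecture: two layers with fixed shapes. Fixing the architecture
# fixes the *family* of functions the network can represent.
IN, HIDDEN, OUT = 3, 4, 1

# Parameters: one particular choice of weights selects one member of the
# family. The parameter set is all real-valued arrays of these shapes.
params = {
    "W1": np.random.randn(HIDDEN, IN),
    "b1": np.zeros(HIDDEN),
    "W2": np.random.randn(OUT, HIDDEN),
    "b2": np.zeros(OUT),
}

def network(x, p):
    """The member of the family selected by parameters p, evaluated at x."""
    h = np.maximum(0.0, p["W1"] @ x + p["b1"])   # hidden layer with ReLU
    return p["W2"] @ h + p["b2"]

# Training algorithm A: some rule that updates the parameters so that
# network(., params) moves closer to the underlying function of task T.
# How A actually does this is the subject of a later post.
def training_step(p, data):
    ...  # deliberately left abstract here
    return p
```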
So the next time somebody asks you what a neural network is (Yes, I mean in interviews as well), rattle off something like the above definition and I guarantee they will be impressed.
But in all honesty, there is still a lot that I have not said about neural networks, as I only wrote this blog intending to create a general and easy-to-understand definition of them. I will not be explaining fully connected neural networks because there is plenty of material on them already. However, I will be talking about the implications of different architectures, loss functions, and the representation perspective of neural networks in the future.
Until then, happy learning :)