Neural Networks have always been sort of a black box when it comes to it’s implementation, and how it produces good results. I came across some material that shows visually, how the neural networks morph the problem space so that they are separable.
Simple Data
Here’s a sample graph that is not linearly separable:

When we try to use a linear model to discriminate the two data, we get a poorly separated model:

Neural Networks, with the interactions of their hidden layers and nodes, are able to learn more complex information about the graph to plot a non-linear separation:

What a Neural Networks does is that it warps the space of the problem so that it becomes more separable. The hidden layers in the network transforms the problem space by representing it in a different way

By warping the problem space with the hidden layers, we see that it’s able to linearly separate the two distributions. That’s pretty cool! So what the neural network is doing is finding the most optimal way to represent the problem that is discriminative.
So what happens if the data distribution is too complex, or your neural network model is too simple (too shallow) that it can’t properly represent the data?
Spiral Data
Given a complex data set that resembles a spiral shape, and a neural network model that is too simple, we can see it struggling to find a representation that is separable. This means that there is not enough hidden layers and hidden nodes to transform the data. We need to go deeper!

Here’s the same spiral graph, but with enough hidden layers and nodes to transform the spiral data to a separable space. We can see the model separating the data very clearly

More Complex Data
In the last example, we see a more complex example, and see how a neural network can separate the data.

Given a circular topology data, a shallow neural network will have difficulties trying to separate the data from the inside and outer ring. We see it trying to pull apart the data like how it did with the spiral data, but it fails to do so

By introducing more hidden layers and nodes and going deeper, we see that the data is able to be extracted out into another dimension, making it separable!

Conclusion
In this post, I wanted to show how neural networks warp the space of the data make them separable, and how a shallow network might fail to perform well.
By adding more hidden layers and nodes, we are able to morph and warp the space into different dimensions, representing them differently and making them discriminative
Leave a Reply