Neural Networks
What are neural networks?
Earlier, we touched upon neural networks, and mentioned how this key innovation was a major factor in the rise of modern AI. Now, it's time to look at this technology in more detail.
It's based on an idea that first cropped up in the 1940s – that's around the same time that Alan Turing was active. It was put forward by Warren McCulloch – a professor of psychiatry – and Walter Pitts – a student mathematician.
Their idea was this: neurons in the brain could basically be viewed as binary gates, just like the ones in a computer. By extension, if you built a man-made network of binary gates, connected together with great complexity, it would potentially be able to perform the same processes as a brain.
About a decade and a half later, in 1957, an American psychologist called Frank Rosenblatt managed to put the ideas put forward by McCulloch and Pitts into practice.
He constructed a network of node-like neurons, which he referred to as the Mark I Perceptron. Incredibly, this network used photocells to 'look' at images, and recognize objects within them.
The Mark I Perceptron was only one layer thick – imagine a 2D net of nodes, as opposed to the 3D web of a real human brain. This limited the number of connections between nodes, which in turn limited the model's potential for human-like cognitive processes.
But nowadays, thanks to hundreds of innovations, we've found ways to build multilayer networks. They're still a long way away from the complex connections of a human brain. But they have enough connections to perform some pretty powerful processes.
It's worth pointing out that a neural network isn't usually a physical object. These artificial neurons aren't physical nodes linked together in a physical web.
Instead, it's a computational model: a set of digital nodes in a digital web. Just think of it like a piece of software. You can even download some neural networks, and install them on your personal computer.
Physical neural networks (PNNs) are occasionally used as well. But as you can probably imagine, they're much more fiddly to build than their digital counterparts, and harder to run at the equivalent level of complexity.
Layers
The layers in a modern neural network are usually arranged like this. You have an input layer, one or more hidden layers, and an output layer.
When you ask an AI to do something, you're interacting with the input layer. For example, you might show it a photo of an animal, and ask it "is this a cat or a dog?"
The input layer will send that data down into the hidden layers. As this data bounces through the web of nodes, the network is effectively 'thinking'. Assuming this model was designed to identify cats from dogs, it will try to work out what kind of animal is present in your photograph.
Eventually, the data hits the output layer. "It's a cat," the AI announces.
Interestingly, while each hidden layer might have hundreds of nodes, an output layer could have as few as two or three.
For example, in that example model we talked about, which tells the difference between cats and dogs, there are only three possible outputs: "it's a cat", "it's a dog", or "it's neither". All that 'thinking' in the hidden layers is just filtering to one of those options.
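The shape of that cat/dog model – an input layer, a hidden layer, and a three-node output layer – can be sketched in a few lines of Python. This is a hypothetical toy network with randomly invented weights, not the real model; the layer sizes and input values are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Convert the output layer's raw scores into probabilities that sum to 1
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# A tiny feedforward network: 4 input features,
# one hidden layer of 8 nodes, and 3 output nodes
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def forward(x):
    hidden = np.maximum(0, W1 @ x + b1)   # hidden layer with ReLU activation
    return softmax(W2 @ hidden + b2)      # probabilities for the 3 outputs

labels = ["it's a cat", "it's a dog", "it's neither"]
probs = forward(np.array([0.2, 0.7, 0.1, 0.5]))
print(labels[int(np.argmax(probs))])
```

An untrained network like this one gives essentially random answers; the training process described later in this section is what turns those arbitrary weights into useful ones.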
Depending on the nature of the input, the network will take a different path through the hidden layers. If you fed it a photo of a greyhound, for example, it would 'think' about that photo in a different way than it might think about a photo of a chihuahua.
But both paths would still lead to the same output node. The AI would announce: "It's a dog".
That cat/dog model is just a simple example. Another neural network might have hundreds of nodes in the output layer. It depends how many possible outputs the model needs to produce.
It's the same with the number of hidden layers. A simple neural network might only have one, but a more complex model might have hundreds. As a general rule, more hidden layers mean more possible paths through the web of nodes, and more powerful decision-making processes.
This principle is what brought us some of the world's most famous AI models, like AlphaGo and ChatGPT. Supposedly, the latest version of ChatGPT (GPT-4) uses a neural network with 120 hidden layers, and an enormous number of nodes.
Parameters
So, a neural network is a series of layers. These layers are made of interconnected nodes.
And here's an important thing to add: every connection between two different nodes has a numerical parameter attached to it. This numerical parameter is what scientists call a weight.
As data flows through the hidden layers, along the connections from node to node, these weights decide how strongly each connection passes its signal on. Strictly speaking, the signal travels along every connection at once – but connections with more weight carry more influence, so heavily weighted routes dominate the outcome. That's how the network is programmed to behave.
You can think of the connections in a neural network like a tangled forest. When the network has to 'think', it's like following a path through that forest.
This path has lots of different branches. Some of them are narrow and overgrown, while others are wide and open. If you were walking, you'd probably take the open branch, just as an AI is more likely to choose a connection with more weight.
This process is essentially how a neural network makes decisions. Whichever path it takes through the web of nodes will result in a different output.
Weights aren't the only type of parameter that you'll find in a neural network. The other main one is something called a bias.
Unlike weights, which are attached to the connections between nodes, a bias is attached to a node itself. Biases are basically there to give the network an extra little nudge in one direction or another.
Say you had two possible connections, each with a weight of 1. The network might struggle to decide which connection to follow. But the bias nudges it down the second connection. To continue with that forest analogy, it's like a little signpost: "if in doubt, go here."
Biases can also be negative. "If in doubt, do not go here."
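The tie-breaking role of a bias can be shown with a single artificial neuron. The weights and inputs here are invented so that the weighted sum comes out to exactly zero, leaving the bias to decide everything.

```python
def neuron(inputs, weights, bias):
    # A node's output is the weighted sum of its inputs, plus its bias,
    # passed through a simple threshold: it "fires" if the total is positive
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Two incoming connections with equal weight; the inputs cancel out,
# so the weighted sum alone is exactly zero
inputs = [1.0, -1.0]
weights = [1.0, 1.0]

print(neuron(inputs, weights, bias=0.5))   # positive bias: the node fires (1)
print(neuron(inputs, weights, bias=-0.5))  # negative bias: it stays quiet (0)
```

In real networks the bias doesn't just break ties – it shifts the threshold at which a node fires, which gives the network much finer control over its behaviour.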
Deep learning
Now, you might remember what we said at the start. Neural networks were the driving force behind the modern AI spring. But why are these models so important?
As it happens, these webs of nodes are extremely good at learning.
This learning is most effective when a neural network has lots of hidden layers. Deep learning is the official name for it. 'Deep' because of all those layers.
Remember: when we say that a machine is 'learning', we really just mean that numerical parameters are changing. And that's exactly what happens with a neural network: the model is able to adjust its weights and biases.
Once a neural network has performed a task, it can compute its loss function afterwards – a score for how far off its output was. For example, if it was solving a complex math problem, how close did it get to the right answer?
After checking the loss function, the neural network uses a technique called backpropagation. This is a special algorithm which travels back up the path that the AI just took through all those layers of nodes.
Along the way, it adjusts the weights and biases according to the size of the loss function. “Actually, this was a bad path to take – let's lower the weight on this one, and this one, and bump up the bias right here."
With plenty of time, and thousands of iterations, a neural network can fine-tune its parameters to the point that it starts reliably following the most effective path.
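That check-the-loss-then-adjust cycle can be sketched with a single parameter. This toy loop learns one weight by gradient descent – the numerical machinery that backpropagation applies across a whole network; the data and learning rate are invented for illustration.

```python
# Toy training loop: learn a weight w so that w * x matches the target 2 * x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, desired output) pairs

w = 0.0      # start with an untrained weight
lr = 0.01    # learning rate: how big each adjustment is

for step in range(1000):
    for x, target in data:
        pred = w * x
        loss = (pred - target) ** 2       # how wrong was this prediction?
        grad = 2 * (pred - target) * x    # direction and size of the error
        w -= lr * grad                    # nudge the weight to shrink the loss

print(round(w, 3))  # converges to 2.0
```

A real network repeats exactly this idea, but for millions or billions of weights and biases at once, using backpropagation to work out each parameter's share of the loss.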
That's not always the same path. Different inputs will require different paths. That's what the model is ultimately learning – for every single input it could possibly receive, it needs to know exactly which path to take in order to produce the best and most appropriate output.
Imagine, for example, that you wanted your AI to tell the difference between types of fish. You input thousands of photos of different fish, and it learns the best path for each of them. If it sees a fish with these markings, it should take this path. If it sees a fish with these fin-shapes, it should take this path.
Eventually, it will choose the right path for every input you throw at it.
Just to be clear: neural networks aren't the only type of AI model which is capable of machine learning. But as things stand, they have a couple of advantages over a lot of other approaches.
First of all, these networks are extremely versatile. You can train them to analyze data for you. You can also train them to play games, or control self-driving vehicles. You can train them to speak, or recognize images. The list goes on and on.
They're also extremely powerful, especially deep learning models. More layers mean more nodes, and more weights and biases. In other words, more detailed and complex ways for the AI to learn to behave.
According to some numbers leaked in 2023, ChatGPT uses a neural network with more than a trillion different parameters. Just imagine how many paths you could take through such a complex neural network.
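To see how parameter counts balloon with size, you can count a fully connected network's weights and biases directly: each pair of adjacent layers contributes (inputs × outputs) weights, plus one bias per node in the receiving layer. The layer sizes below are invented for illustration.

```python
def count_parameters(layer_sizes):
    # Each adjacent pair of layers contributes n_in * n_out weights,
    # plus one bias for every node in the receiving layer
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# A modest network: 784 inputs, two hidden layers, 10 outputs
print(count_parameters([784, 256, 128, 10]))  # 235146 parameters
```

Even this small example has over 235,000 parameters – scale the layer widths and depths up by a few orders of magnitude and trillion-parameter counts stop sounding so far-fetched.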
Types of neural network
It's worth pointing out that there are a few different types of neural network.
Earlier we mentioned physical neural networks (PNNs). These are pieces of hardware, which use networks of physical nodes and connections, rather than the digital versions which are much more commonly used.
Another example is a recurrent neural network (RNN). This one is actually quite simple. In a classic multilayer neural network, data is passed from hidden layer to hidden layer in one direction. Let's say from layer 1, to layer 2, to layer 3.
But in a recurrent neural network, the data will also loop back to previous layers. Effectively, this gives the network a memory – each loop reminds the previous layers what kind of data has already come through.
The loop-back function of an RNN is useful in loads of contexts.
Imagine, for example, that you want an AI to finish this sentence: "The color of the sky is [something]."
If it only remembers the final word ("is"), it might output something random like "yellow" or "tasty", which logically follows "is", but doesn't make sense in the context of the sentence as a whole.
If each word is looped back though, and 'remembered' by the network, it's more likely to give an answer that fits the context of the sentence as a whole: "The color of the sky is blue."
Here's another example. This time, imagine some satellite images of a hurricane out at sea. You want your AI to calculate whether the hurricane will hit any landmasses.
If you gave the AI just one image of that hurricane, in its current position, it would be hard to predict the trajectory.
But if you gave it ten images, showing the hurricane's progress from initial position to current position – and your AI could 'remember' these positions, in order – it would do a much better job predicting where the hurricane will go.
Any time you're working with a sequence of data – be it words, or images, or something else – a recurrent approach is often more effective than a classic neural network.
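The 'memory' of a recurrent network comes from a hidden state that's carried from one step of the sequence to the next. Here's a minimal sketch of a single recurrent cell, with invented weights, processing a sequence one item at a time.

```python
import math

# A one-node recurrent cell: its output depends on the current input
# AND on its own previous output (the carried-over 'memory')
w_in, w_rec, bias = 0.8, 0.5, 0.0

def rnn_step(x, h_prev):
    # The new hidden state mixes fresh input with the previous state
    return math.tanh(w_in * x + w_rec * h_prev + bias)

sequence = [1.0, 0.0, 0.0, 0.0]
h = 0.0
for x in sequence:
    h = rnn_step(x, h)

# Even though the later inputs are all zero, the first input
# still echoes through the hidden state
print(round(h, 4))
```

A feedforward network handed the same items one at a time would treat each one in isolation; the recurrent connection is what lets earlier items in the sequence influence later decisions.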
Throughout the rest of this pathway, we'll also encounter a few other types of neural network.
Another important one is a convolutional neural network (CNN). We'll take a proper look at CNNs when we get to our tile on Computer Vision, but for now, the main thing you need to know is that they're amazing at analyzing images.
One more type is a generative adversarial network (GAN). We'll be learning more about this one in our tile on generative AI – but it essentially works by taking a pair of neural networks, and tasking them to compete against one another.
There are plenty of other examples. Part of the reason why neural networks are so popular is the fact they can be used in so many different ways.