Deep learning typically involves multiple layers of processing and uses algorithms inspired by the structure and function of the brain.
What is deep learning?
Deep learning is a family of techniques within machine learning that can process vast amounts of data to find relationships and patterns. ‘Deep’ refers to the organization of neural networks into multiple layers, meaning that there are many steps between the input and output.
With the power of modern hardware and the availability of data, neural networks are recognized by many AI researchers as offering the **best performance in key pattern-matching activities**, including perceptual activities, such as computer vision and speech and language processing.
Deep learning techniques are typically referred to as neural networks due to early work on modeling neurons within the brain – yet the resemblance to such structures is now considered **superficial**.
Within a very short time, deep learning, built on neural networks, has overtaken established machine learning techniques in many AI specialisms. Google’s search capabilities, for example, draw on many of the cutting-edge technologies created by AI leaders at DeepMind as part of its commitment to *“solving intelligence, to advance science, and benefit humanity.”*
What is a neural network?
A neural network is not a physical network, like the connections between neurons in the human brain. Instead, it is best defined as a **complex mathematical object** – a set of computational algorithms able to learn and recognize patterns in data.
Each network is built from a set of **artificial neurons or nodes** – algorithms specialized in recognizing aspects of the data that the network is designed to handle. For example, a visual system is likely to look for low-level features such as edges and straight lines, combining them at higher levels into increasingly complex patterns, such as a ball, car, face, and so on.
During the training phase, the network’s answers are compared with the correct answers (typically supplied by humans), and the difference is fed back, changing the weightings (the strengths of connections between the neurons) until, over time, the network regularly achieves the correct answer.
These adaptations and improvements during training sharpen the network’s pattern matching and reinforce the appropriate connections – a process consistent with learning.
Onions, ogres, cakes and ... deep learning?
Deep learning typically refers to a **layered approach to learning**. Complex algebraic circuits are stacked to form multiple steps between input and output, giving processing power far superior to the short, one-step paths of linear and logistic regression.
**Simple feedforward networks only have connections in 1 direction**. Each layer of nodes multiplies its input by ‘weights,’ adds a bias, and applies a non-linear activation function to the result before passing it on to the next layer within the network.
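To make that concrete, here is a minimal sketch of a single feedforward layer in Python with NumPy. The layer sizes, the random weights, and the ReLU activation are illustrative choices, not anything specified above.

```python
import numpy as np

def relu(z):
    # Non-linear activation: pass positive values through, clamp negatives to 0.
    return np.maximum(0, z)

def dense_layer(x, W, b):
    # One feedforward layer: multiply the input by the weights, add a bias,
    # then apply the non-linear activation to the result.
    return relu(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # 3 input features
W = rng.normal(size=(4, 3))   # 4 neurons, each with 3 weights
b = np.zeros(4)               # one bias per neuron
print(dense_layer(x, W, b))   # this layer's output becomes the next layer's input
```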
Layers that sit between the input and output layers are called ‘hidden layers.’ According to the ‘universal approximation theorem,’ a sufficiently large feedforward network with appropriate activation functions and enough hidden units can approximate any continuous function to any desired level of accuracy.
To improve prediction in neural networks, **backpropagation algorithms**, along with plenty of complex maths, adjust weights backward from output toward input, narrowing the difference between the system’s outputs and the desired outputs. Over time, the network’s answers get nearer to the correct ones.
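As a toy illustration (the tiny 2-layer network, the squared loss, and the learning rate are all assumptions made for this example), here is backpropagation adjusting weights backward from the output toward the input:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)                  # one training input
y_true = 1.0                            # the desired output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=4), 0.0
lr = 0.01                               # learning rate (an illustrative choice)

for step in range(100):
    # Forward pass
    h_pre = W1 @ x + b1
    h = np.maximum(0, h_pre)            # ReLU hidden layer
    y = W2 @ h + b2                     # network output
    loss = (y - y_true) ** 2

    # Backward pass: the chain rule, applied from output back toward input
    d_y = 2 * (y - y_true)              # how the loss changes with the output
    d_W2, d_b2 = d_y * h, d_y
    d_h = d_y * W2
    d_pre = d_h * (h_pre > 0)           # gradient flows only through active units
    d_W1, d_b1 = np.outer(d_pre, x), d_pre

    # Small steps downhill narrow the gap between output and desired output
    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1

print(float(loss))                      # after training, the loss is near zero
```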
Convolutional neural networks
When analyzing an image, the adjacency of pixels really matters. And yet, to a fully connected network, it wouldn’t matter whether the pixels had been scrambled or left undisturbed – the results would be the same.
Another issue is scale: a typical one-megapixel color image has around 3 million input values, and fully connecting those to a first hidden layer of the same size would require roughly 3 million × 3 million = 9 trillion weights – that’s huge, both in terms of memory and processing power.
One solution to this problem can be found with **Convolutional Neural Networks (CNNs)**. CNNs can take account of the spatial relationships between pixels, and therefore can understand the complexity of an image much better than a traditional neural network.
Convolutional networks and kernels
Convolutional Neural Networks (CNNs) use ‘kernels’ to capture spatially related local information, at least in the early layers.
A **kernel** is a small filter that moves across an image, capturing the features within a local area. For example, one kernel might respond to edges, while another picks out a particular texture or patch of color.
The word ‘convolution’ refers to the **process of applying the kernel to the pixels of the image**.
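Here is a minimal sketch of that process in NumPy. The 6 × 6 image and the vertical-edge kernel are invented for illustration, and real libraries use fast vectorized routines rather than Python loops.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel across the image; at each position, take the weighted
    # sum of the local patch it covers – 'applying the kernel' to the pixels.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic vertical-edge kernel: it responds where brightness changes
# from left to right within its local area.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # dark left half, bright right half
print(convolve2d(image, edge_kernel))    # strong responses along the edge
```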
The first layer, known as the ‘convolutional layer,’ acts as a filter for the scanned image. Next, a pooling process summarizes sets of adjacent units from the previous layer, extracting dominant features that are robust to small changes in position and orientation – meaning, for example, it recognizes a gesture across a range of positions and angles.
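A minimal sketch of that pooling step, assuming the common ‘max pooling’ variant over 2 × 2 blocks (the feature map values are invented):

```python
import numpy as np

def max_pool(feature_map, size=2):
    # Summarize each size-by-size block of adjacent units by its maximum,
    # keeping the dominant feature while discarding its exact position.
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.array([[0., 1., 0., 0.],
               [0., 9., 0., 0.],
               [0., 0., 0., 2.],
               [0., 0., 3., 0.]])
print(max_pool(fm))
# [[9. 0.]
#  [0. 3.]]  – the strong responses survive small shifts in position
```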
Recurrent networks
You are using ‘**recurrent neural networks**’ (RNNs) more than you realize. RNNs are a type of artificial neural network designed for sequential or time series data, and they are found in everything from language translation to speech recognition. The reason is that **they feed intermediate or final outputs back into their own inputs**, so they are very good at modeling sequence data.
It means that an **RNN is a dynamic network with its own internal state or memory**, with knowledge of where it has come from and the ability to predict where it is going. For example, if you have multiple snapshots of a ball rolling down a hill, you should have a better chance of determining its origin and its destination than if you had just one snapshot.
Additionally, each ‘cycle’ has a delay, with inputs from an earlier time affecting the network’s response to its current input. Several specialized RNNs, such as long short-term memory (LSTM) networks, have even been created to preserve information over many time steps, extending the duration of the network’s memory.
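A minimal sketch of one recurrent step, with invented sizes and random weights; the point is simply that the hidden state `h` (the network’s memory) is fed back in at every step:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    # The new hidden state mixes the current input with the previous state –
    # the network's memory of where the sequence has been so far.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(3)
Wx = rng.normal(size=(4, 2)) * 0.5      # input-to-hidden weights
Wh = rng.normal(size=(4, 4)) * 0.5      # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

sequence = [rng.normal(size=2) for _ in range(5)]   # e.g., snapshots over time
h = np.zeros(4)                          # the internal state starts empty
for x_t in sequence:
    h = rnn_step(x_t, h, Wx, Wh, b)      # output fed back in at the next step
print(h)                                 # a summary of the whole sequence so far
```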
Training neural networks
To train a neural network, we must modify its parameters to minimize its **‘loss function’** – ‘loss’ refers to the difference between the expected outcome and the outcome produced by the machine learning model.
A neural network is typically initialized with random weights when we start out. In turn, its results will be poor and its loss function high. Over time, with training, and as we fine-tune the network’s weights, both the results and the loss improve. ‘Gradient-based optimizers’ help minimize the loss, so that the model performs better, by back-propagating error information from the output layer to the hidden layers.
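As a toy illustration (the 1-weight linear model, the mean-squared-error loss, and the learning rate are assumptions for this example), here is a loss that starts high under random weights and shrinks as a gradient-based update fine-tunes them:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=100)
y = 3 * X - 1                            # the rule the model must discover

w, b = rng.normal(), rng.normal()        # random initial weights -> poor results
lr = 0.1

def loss(w, b):
    # Mean squared error: the gap between expected and predicted outcomes
    return np.mean((w * X + b - y) ** 2)

print(f"initial loss: {loss(w, b):.3f}")
for epoch in range(200):
    err = w * X + b - y
    # Gradient-based update, driven by error information fed backward
    w -= lr * np.mean(2 * err * X)
    b -= lr * np.mean(2 * err)
print(f"final loss: {loss(w, b):.6f}  (w is near 3, b near -1)")
```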
Most of us don’t realize that, when we use our smartphones or the internet, **we are most likely adding data to a deep learning system** – potentially one we unknowingly trained in the first place.
Generalization
While training our neural network is crucial, we shouldn’t forget that our goal is to ‘generalize’ to a new set of previously unseen data. This involves several key stages.
We begin by identifying the right **network architecture**, one that generalizes well – taking into account that this may vary depending on the data type. For example, convolutional architectures generalize well on images, with feature extractors functioning across a spatial grid. Limiting the specificity of the model (known as **regularization**), typically by adding a penalty term to the loss, also aids generalization, improving the overall training of the neural network.
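A minimal sketch of one common penalty, L2 regularization, in which large weights are ‘charged’ in the loss; the `lam` value is an arbitrary choice for illustration:

```python
import numpy as np

def regularized_loss(pred, target, weights, lam=0.01):
    # Data term: how far the predictions are from the targets.
    mse = np.mean((pred - target) ** 2)
    # Penalty term: charging for large weights limits how specific the
    # model can become (L2 regularization, also called weight decay).
    return mse + lam * np.sum(weights ** 2)

weights = np.array([0.5, -3.0, 10.0])    # the large weight is penalized most
print(regularized_loss(np.array([1.0, 2.0]), np.array([1.1, 1.9]), weights))
```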
Finally, ‘dropout’ reduces the test-set error of a network: at each training step, back-propagation is applied with a randomly chosen set of units switched off. In effect, we are adding noise at training time to help the network become more robust, forcing it to consider multiple features rather than relying on just one.
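A minimal sketch of dropout, assuming the common ‘inverted’ variant in which the surviving units are rescaled so the layer’s expected output is unchanged:

```python
import numpy as np

rng = np.random.default_rng(4)

def dropout(activations, p=0.5):
    # Switch off a randomly chosen set of units; each unit survives with
    # probability 1 - p, and survivors are scaled up to compensate.
    mask = rng.random(activations.shape) > p
    return activations * mask / (1 - p)

h = np.ones(8)               # a hidden layer's activations
print(dropout(h))            # roughly half the units are zeroed this step
print(dropout(h))            # ... and a different set the next step
```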
Unsupervised learning in neural networks
As with other forms of machine learning, we can train neural networks without labeling each example with a value for the target function.
Unsupervised learning algorithms take a training set of unlabeled examples and attempt to learn new features that make identification easier. They then discover a generative model, usually in the form of a probability distribution, to identify future data.
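As a toy illustration of that idea, here is an algorithm fitting a simple probability distribution (a Gaussian, chosen purely for simplicity) to unlabeled data and then using it to judge future observations:

```python
import numpy as np

# Unlabeled examples: nothing tells the algorithm what they represent.
rng = np.random.default_rng(5)
data = rng.normal(loc=10.0, scale=2.0, size=1000)

# 'Discover' a generative model: estimate the distribution's parameters.
mu, sigma = data.mean(), data.std()

def likelihood(x):
    # How plausible is a new observation under the learned distribution?
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(likelihood(10.0))      # close to what was seen before: plausible
print(likelihood(30.0))      # unlike anything seen: essentially impossible
```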
After all, this is more like the way our brains work. If a young child is shown a hippopotamus, they can recognize one in future situations – they don’t need to sit and look through thousands of zoo pictures.
**Unsupervised learning removes the need to label lots of data, as does transfer learning – reusing the experience of one learning task to help with another.**
Deep learning applications
Deep learning has proven beneficial in solving problems that are challenging for more traditional AI approaches. Indeed, the success of the *AlexNet* deep learning system showed the potential of neural networks at the 2012 ImageNet competition, where it was tasked with learning from 1.2 million training images across 1,000 categories. Its error rate was only 15.3%, compared with more than 25% for the next best system.
And that was only the beginning. Since then, improved network design, training methods, and computing power have dropped the error rate at the top to **less than 2%**. Not only does that seem good, but it’s even below the typical human error rate of around 5%.
Such accuracy is valuable in multiple domains, particularly in self-driving cars, where real-time, near-perfect accuracy is crucial. CNNs are particularly adept at recognizing the different parts of a road scene, analyzing moving images, and informing appropriate driving decisions.
Deep learning potential
If you take 2 passport photos of the same person seconds apart, they will look almost identical to a human observer. And yet, at the pixel level, they differ considerably. A conventional program, comparing pixel by pixel, would see them as 2 very different images. Deep learning neural networks can identify the features that show that both are pictures of the same person.
Through being able to spot patterns, even ones that humans miss, **deep learning is helping with everything from robot vacuum cleaners to creating musical compositions.**
Indeed, MuseNet, a deep neural network, is creating 4-minute-long musical compositions for up to 10 different instruments. And yet, it has not been programmed to understand music; instead, through unsupervised learning, it has discovered harmony, style, and rhythm.
It is even possible for ‘human’ composers to work with MuseNet, asking it to create compositions in the style of everyone from Chopin to Taylor Swift. As a tool to support human creativity, we are only beginning to scratch the surface of what AI can do.