The first programmer
AI theory and concepts have a surprisingly long history, dating back to the earliest pioneers of computing in the 1800s. In fact, Augusta Ada King, Countess of Lovelace – widely regarded as the first computer programmer, and possibly the holder of the fanciest job title ever – recognized that machines had applications beyond pure calculation.
But she also identified limitations that would still hold more than a century later. In 1843, Lovelace wrote that the 4-ton Analytical Engine she was programming had “no pretensions whatever to originate anything […] it can do whatever we know how to order it to perform.”
It wasn’t until much later that humans became better at telling machines what to do, or that computers could take on, and win at, human-like challenges such as games.
Who was Alan Turing?
We cannot downplay the importance and legacy of Alan Turing – his work at Bletchley Park was vital to code-breaking during the Second World War, and his vision of computing and AI was in so many ways ahead of his time.
Turing provided a ‘recipe’ for how machines could take over the work of human ‘computers’ by being given a set of internal rules or algorithms. Such ‘Turing Machines’ became the template for almost all modern computers. After the war, in 1948, he described how ‘simple Boolean neural networks’ – consisting of artificial ‘neurons’ that each combine 2 inputs into a single output and are connected to form a pattern – could be educated through a series of on and off switches.
Yet, perhaps Turing’s greatest gift to AI was his 1950 paper, entitled Computing Machinery and Intelligence, which opened with the words, “I propose to consider the question ‘Can machines think?’” It describes the ‘Turing Test’ that is still used today to assess whether we can really ascribe intelligence to a machine.
McCulloch and Pitts and the first neural network
Despite the vastly different backgrounds of 42-year-old Warren McCulloch and 18-year-old Walter Pitts, they became close friends after meeting at the University of Chicago in the early 1940s. From there, they went on to develop groundbreaking theories at odds with existing wisdom.
They believed the workings of the brain could be explained through logic and complex networks of neurons and synapses. Each artificial neuron in their model could be set to ‘on’ or ‘off’ – dictated by the presence or absence of stimulation from neighboring neurons.
Crucially, they recognized that any computable function could be computed by a network of connected neurons, and that such a network had the potential to learn by modifying the connection strengths between its neurons.
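The on/off behavior of a McCulloch-Pitts-style neuron is simple enough to sketch in a few lines of Python. This is a loose illustration, not their original notation – the function name and thresholds below are invented for the example:

```python
# A minimal sketch of a McCulloch-Pitts-style neuron: each unit is 'on' (1)
# or 'off' (0), and it fires only when the summed stimulation from its
# neighboring neurons reaches a threshold.

def mcp_neuron(inputs, threshold):
    """Fire (return 1) if enough input neurons are 'on'."""
    return 1 if sum(inputs) >= threshold else 0

# With 2 inputs, a threshold of 2 behaves like logical AND,
# and a threshold of 1 behaves like logical OR.
print(mcp_neuron([1, 1], threshold=2))  # AND(1, 1) -> 1
print(mcp_neuron([1, 0], threshold=2))  # AND(1, 0) -> 0
print(mcp_neuron([1, 0], threshold=1))  # OR(1, 0)  -> 1
```

Simple as it looks, this is the sense in which any computable function can be built up: networks of such thresholded units, suitably connected, can implement arbitrary logic.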
Building on an earlier paper by Turing, McCulloch and Pitts showed the immense computational power of simple elements connected in a network.
Inventing Artificial Intelligence
If you put enough clever people in the same place, at the same time, with the same impossibly ambitious goal, something incredible is almost inevitable. And that’s exactly what happened in 1956.
Computer scientists and information theorists, including Marvin Minsky and Claude Shannon, and 2 future Nobel prize winners, Herbert Simon and John Nash, gathered at Dartmouth College in Hanover, New Hampshire, to create the field of Artificial Intelligence.
In fact, while other terms were also offered up, the phrase ‘Artificial Intelligence (AI)’ stuck and a new discipline was born. But not only that, these passionate eggheads also came up with AI’s defining goals, including computer vision, machine translation, speech recognition, robotics, and machine learning.
And if that weren’t enough, a program known as the ‘Logic Theorist’ was demoed at the conference and is now considered the first AI program ever developed.
While some considered the Dartmouth conference a disappointment, with even the phrase Artificial Intelligence coming under criticism, many now recognize it as a vital signpost of things to come.
The 50s and 60s weren’t all rockabilly and hippies
As computers transformed from monster machines based on vacuum tubes to faster, smaller ones built from integrated circuits, their storage capacity and processing power exploded.
During this time, there was a proliferation of papers and books on the field of AI, mostly falling into 1 of 2 camps.
The first approach, led by Marvin Minsky, suggested that AI should be based on symbolic programming, lines of commands, and ‘if-then’ statements.
Such methods led to ‘Eliza’ becoming an unlikely hit and the world’s first chatbot in 1965. Created by MIT professor Joseph Weizenbaum, Eliza used basic concepts to emulate the dialogue of a psychoanalyst – convincing some clients that the program was indeed a real person.
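Weizenbaum’s trick was pattern matching: spot a keyword in the user’s sentence, then reflect their own words back as a question. A toy sketch of the idea in Python – the rules below are invented for illustration, and the original script was far more elaborate:

```python
import re

# A toy Eliza-style exchange: the 'psychoanalyst' matches keyword patterns
# and reflects the user's words back as a question. These 3 rules are
# invented for illustration only.
rules = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r".*mother.*", re.I),  "Tell me more about your family."),
]

def respond(sentence):
    for pattern, template in rules:
        match = pattern.match(sentence)
        if match:
            # Substitute the captured words back into the canned reply.
            return template.format(*match.groups())
    return "Please go on."  # default when no rule matches

print(respond("I feel anxious"))    # -> Why do you feel anxious?
print(respond("I am tired"))        # -> How long have you been tired?
print(respond("hello"))             # -> Please go on.
```

There is no understanding anywhere in this loop, which is precisely why Eliza’s apparent humanity unsettled Weizenbaum himself.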
Frank Rosenblatt championed the second approach. He believed that AI should use brain-like neural networks, which he called ‘perceptrons.’ And, while his Mark I Perceptron showed some potential, even using cameras to differentiate between 2 images, its single layer offered little cognitive ability.
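Rosenblatt’s learning rule can be sketched in a few lines: make a prediction with a weighted sum passed through a step function, then nudge the weights whenever the prediction is wrong. A minimal illustration – the toy data, epoch count, and learning rate here are my own choices, not Rosenblatt’s:

```python
# A minimal sketch of perceptron learning on a tiny, linearly separable task.

def train_perceptron(samples, epochs=10, lr=0.1):
    """samples: list of (inputs, label) pairs with label in {0, 1}."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            # Single-layer prediction: weighted sum through a step function.
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            # Nudge the weights toward the correct answer when wrong.
            error = label - prediction
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Logical OR is linearly separable, so a single layer can learn it.
or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(or_data)
```

A single layer can learn OR this way, but not XOR – exactly the limitation Minsky and Papert would seize on.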
Minsky himself had experimented with neural nets years earlier, building a crude one from hundreds of vacuum tubes and leftover parts from a B-24 bomber. But in 1969 he used his book Perceptrons, co-written with Seymour Papert, to criticize Rosenblatt’s work, and the idea of neural nets was shelved until the 1980s.
Winter is coming and a game of chess
In the 3 decades that followed the 1956 Dartmouth Conference, money, time, and resources were thrown at the ‘problem’ of AI. After all, its highly intellectual attendees had expected that major breakthroughs were just around the corner, promising great insights into AI’s potential and an answer to a question that had long gone unanswered – what goes on inside our heads?
And yet, it wasn’t forthcoming. Money, and even interest, hit rock bottom.
While interest in AI waned – a period known as the ‘AI winter’ – work continued in other areas, such as ‘expert systems.’ It wasn’t until the 1990s that early predictions started to come true. And, in 1997, a curious game of chess caught the attention of millions around the world.

The chess Grandmaster Garry Kasparov was beaten in 2 of 6 games during a match against IBM’s Deep Blue, which had the raw computing power to process 200 million positions per second. In an interview sometime later, Kasparov said that what was worse was that neither he nor the AI had played well.
And yet, despite very little real intelligence or even the ability to learn from experience, Deep Blue’s win led to a significant shift in public opinion and, perhaps more importantly, in scientific thinking about the potential for AI.
The real world is messy
While the goals of AI have remained mostly the same, its approach has changed dramatically. It has become increasingly clear that trying to solve the whole problem from the top down, breaking it into ever smaller pieces, may not be the best approach when dealing with the messiness of the real world.
In fact, relatively simple statistical tricks, plus a whole lot of data, can deliver behavior that is incredibly complex and appears intelligent.
Let’s take driving. You could spend the next few years writing out line after line of statements defining what to do and not to do, and you still wouldn’t capture its complexity and unpredictability.
You edge your car out at a junction based on information, experience, and what is happening around you. According to David Danks from Carnegie Mellon University, “a 4-year-old has more theory of mind than a driverless car.”
And yet, in 2009, in an attempt to reduce the 1.25 million worldwide deaths from car accidents, Google launched its self-driving car project, incorporating rounded car shapes to maximize the field of view for rooftop sensors, along with laser-based LiDAR, radar, and cameras detecting objects in all directions.
Elementary, my dear Watson
When, in 2007, IBM first came up with the idea of training their AI, Watson, to play a game show, let alone win it, there was some hilarity. After all, it had begun as a relatively blunt instrument, once answering ‘Wonder Woman’ when asked to name the first woman in space. Now that’s a DC film I’m yet to see!
And yet, undeterred, in 2011 ‘Watson’ took on the challenge of playing the US quiz show Jeopardy armed with a handful of rules, vast amounts of memory and processing power, and the capacity to crunch lots of data.
It had at its ‘fingertips’ 200 million pages of text, including newspapers, books, and encyclopedias. It was also programmed with probabilistic reasoning – the ability to parse human language and interpret word-play, including puns. It was ready!
Competing against 2 of the show’s greatest champions, Watson did more than take the lead; it won first place – a prize of $1 million. That’s a lot of driverless cars for Watson to cruise around in.
2,500 years to master the game
While the challenge of winning a game show based on trivia was considerable, attempting to beat experts at ‘Go’, a 2,500-year-old game with some 10^170 potential board positions, was to dwarf anything that had gone before.
And yet, in 2016, experts at DeepMind created the AI ‘AlphaGo,’ using reinforcement learning to choose actions that maximized game score, taking on revered South Korean Go master Lee Sedol on home turf. Despite a confident prediction by Lee of a landslide win, he suffered a decisive 4-1 defeat.
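AlphaGo’s actual training combined deep neural networks with tree search, but the core idea of reinforcement learning – learning which actions maximize long-run reward – can be sketched with tabular Q-learning on a toy problem. The corridor environment and parameters below are invented for illustration:

```python
import random

# Toy reinforcement learning: a 5-square corridor where only reaching the
# rightmost square earns a reward. The agent explores at random while the
# Q-learning update learns the value of each (state, action) pair.
random.seed(0)
n_states, actions = 5, [-1, +1]        # actions: step left or step right
q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma = 0.5, 0.9                # learning rate and discount factor

for _ in range(200):                   # episodes of purely random exploration
    s = 0
    while s != n_states - 1:
        a = random.choice(actions)
        s2 = min(max(s + a, 0), n_states - 1)  # walls clamp the position
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move the estimate toward the reward plus the
        # discounted value of the best action available in the next state.
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)])
        s = s2

# The greedy policy read off the learned values moves right from every square.
policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in range(n_states - 1)}
print(policy)
```

Scaling this table-based idea up to Go’s 10^170 positions is exactly what required the deep networks and look-ahead search that DeepMind added.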
Subsequently, in 2019, DeepMind created MuZero, which combines a learned model that focuses on the most critical aspects of the environment with AlphaGo’s powerful look-ahead tree search – mastering Go, chess, and shogi without being given any rules, strategic or otherwise, or even being told how to play the games.
Incredible steps forward in natural language processing
AI continues to develop apace. Natural language processing is one of the most exciting and topical research areas and is likely to significantly impact how we interact with technology and our environment.
Advances in ‘large language models’ make it possible not only to understand conversation and requests for actions to be performed but also to comprehend the sometimes incomprehensible language of scientific papers.
Artificially intelligent agents such as PaLM, DALL-E 2, GPT-3, and SPECTER can find semantically related texts and even explain their relevance to searches. They are supported by transfer learning, which leverages existing labeled data from a related domain or task, and by ‘transformers’, which use attention to boost the speed at which models can be trained.
These large language models have the potential to be excellent tutors, identifying and correcting learning paths and giving students what they need when they need it.