What Is Cognitive Load Theory, and How Does It Explain Difficulties with Multitasking and Short-Term Memory?

How cognitive load can get in the way of learning, and how to harness it for your own good.


Have you ever tried to learn something and failed because it suddenly got too hard? Have you ever given up after the introduction to a course in a new area, because you no longer understood what was going on?

The chances are you have – well over 90% of people typically drop out of an online course within the first few weeks.

Cognitive load theory suggests that people have a limited amount of cognitive resources, or mental processing power, which affects how much information they can take in and remember at one time.

The theory has been used to explain a variety of phenomena, including why people have difficulty multitasking and why many people complain about their short-term memory.

If you’ve ever walked into a room and immediately forgotten why you were there, or opened your phone to message someone and instead found yourself scrolling through Reddit 10 minutes later, then you’ve experienced what having too much cognitive load feels like.

To understand cognitive load, we need to first understand the difference between long-term and working memory, and how the two relate to one another.

Long-Term Memory and Schemas

The most influential researcher on cognitive load theory is its originator, the Australian educational psychologist John Sweller. Sweller’s most famous assertion is that **“If nothing has changed in long-term memory, nothing has been learned.”** It sounds simple, but many educational practices and assessment systems don’t take this basic principle into account.

Long-term memory stores knowledge in the form of schemas. A schema is an encoded pattern of knowledge that helps us organize information and make sense of the world. It does this by grouping related knowledge and memories together, which makes it easier to categorize new knowledge in the future. At their most basic, schemas can be thought of as the stuff you ‘just know’ about a particular area.

Say you get invited to a party, at a house you’ve never been to before. Despite having no knowledge of what will actually happen at this party, you still have a rough idea of what to expect – drinks, food, conversation, maybe some music.

This is because, based on prior experience, you have a schema in your head of what constitutes a party. Schemas like this form the building blocks for new knowledge.

When you learn new information, your starting point is your pre-existing schemas about the topic. Adding to long-term memory is a process of refining those schemas and then building on them further.

Working Memory

When learning new things, we require both working memory and processing power.

Let’s start with working memory. Imagine you’re learning how to write basic code. Your teacher has just taught you how to write a ‘for’ loop and a ‘while’ loop. You need to hold both in memory if you want to use them later in the lesson.

Working memory is the primary structure that processes incoming information from the environment, but it is very limited in both capacity and duration. When presented with a list of items, most people can remember around 4 of them a few seconds later, and without rehearsal they can usually retain those 4 for only about 20 seconds.

In this example, working memory is the part of your brain that holds the ‘for’ and ‘while’ loops in your head until you come to use them. However, at the end of the lesson your teacher also gives you a problem to solve without specifying how to use these loops. This is where memory alone isn’t enough: you need to use your brain’s processing power.

Processing New Information

Like working memory, your brain’s processing power can only handle small bits of new information at a time. In fact, we can only process 2 to 3 items of new information at once. This means that learning new things can be very hard, particularly when we have to remember several concepts at the same time, as is often the case in mathematics or the sciences.

Cognitive load is the term used to describe this twin problem: in terms of both processing and working memory, our brains can only handle so much information at once.

Holding information in your working memory is much like juggling. You can handle 3, or at a push 4, pieces of information at once; any more and things come crashing down. Sure, there are people out there who can juggle 9 burning chainsaws, just as there are people who can work on quantum theory or play Rachmaninoff on the piano.

But for most of us, 3 to 4 ‘balls in the air’ at once in our working memory is about the maximum cognitive load that we can handle.

Intrinsic and Extraneous Cognitive Load

There are 2 ways that information can create cognitive load in our brains.

Intrinsic cognitive load is caused by the nature of the information to be learned, irrespective of how it is presented or taught. It cannot really be simplified away, because it comes from the inherent complexity of the subject matter. For example, many people find mathematics, physics and chemistry to be the topics with the highest cognitive load.

Extraneous cognitive load comes from the way information is presented, rather than from the information itself. Much of the art of good teaching lies in minimizing this extraneous cognitive load. Fortunately, there is a well-established playbook to help with this. Let’s look at some techniques.

Element Interactivity

Working memory can store around 4 individual elements at a time. Now imagine you are learning a completely new topic, say quantum computing.

You might start learning about qubits, vectors, computational basis states, bra-ket notation, and superposition. Feeling lost already?

Even a simple sentence about quantum computing may require you to hold all of these terms comfortably in working memory at once.

This is an example of **element interactivity: the degree to which elements must be processed simultaneously in working memory because they are logically related.** As with juggling, when we have more elements than we can handle, some balls have to drop.

Element interactivity can be reduced if the interacting elements are first taught separately. The best way to teach quantum computing would not be to hit the reader with a list of terms, as above, but to explain each of those terms on its own first.

This allows the learner to gradually build a schema for each concept before moving on to higher-level concepts that combine them.

Multimodal Information Processing

One area where our brains are actually quite good at handling multiple elements at once is in processing different modes of information.

**Our working memory is multimodal.** This means that our brains have separate processors for what we hear (auditory information) and what we see (visual information). One influential model of working memory, proposed by Baddeley and Hitch, describes it as having a ‘phonological loop’ for sound and a ‘visuospatial sketchpad’ for images. With careful design, we can fit more than just 4 things into working memory by combining the auditory and visual channels.

The easiest way to create multimodal learning is to complement spoken teaching with visual cues. Learners can easily process images while listening to a lecture, and doing so actually lets more information enter working memory without the same damaging cognitive load. However, the two channels should not duplicate each other (for example, the audio should not simply narrate on-screen text), as this causes a ‘redundancy effect.’

The Split Attention Effect

Imagine you are labeling a pie chart describing people’s favorite scents. If you want people to remember that 60% of respondents prefer floral, 25% love woody and only 15% like oriental, then you should put each percentage label directly next to its segment of the chart.

Unfortunately, Microsoft Excel doesn’t do this by default; it places the labels in a separate legend away from the segments themselves. This small difference can make the information much harder to remember. It is an example of the ‘split attention’ effect, and fixing it can be one of the quickest ways to make information more memorable.

Split attention occurs when **content forces learners to split their attention between at least 2 sources of information.** To learn from such material, we have to mentally integrate the separate sources. Whenever possible, related pieces of information should therefore be placed close together. Forcing the learner to find the links in scattered, disorganized information creates major extraneous cognitive load.

Optimizing Problem-Solving for Cognitive Load

Limited working memory is a big obstacle when solving problems. It’s hard to keep all the relevant information in mind while also working through potential solutions. There are 2 ways to overcome this: worked examples and goal-free problems.

The first is to **provide worked examples** that show a step-by-step, expert solution to a problem for the learner to emulate. In other words, this is learning by copying.

Of course, the problem with this is that learners can easily skip straight to the solution without understanding how it was reached.

One way to prevent such passive learning is to pair each worked example with a similar problem that the learner then solves on their own. Another is to use completion problems: partial worked examples where learners must complete some key solution steps themselves.

Goal-Free Problems

Another approach to problem-solving that avoids too much cognitive load is to create **a goal-free problem**. This means replacing a conventional goal-oriented problem with one that has a non-specific goal.

For example, rather than asking someone a question such as ‘Here is a right-angled triangle. Find the length of the hypotenuse,’ you would say ‘Here is a right-angled triangle. Tell us everything you can about it from the information given.’

This style of teaching has 2 advantages. First, it allows learners to develop schemas more naturally, without using up working memory on the thought ‘I need to solve problem x.’

Second, it allows them to develop broader schemas. Because there is no single answer to work backwards from, they have to think about a broad range of possible problems and solutions.
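To make the contrast concrete, here is the triangle example worked through. The text gives no measurements, so this assumes (hypothetically) a right-angled triangle with legs a = 3 and b = 4, with α the angle opposite side a. The goal-specific version stops at the hypotenuse; a goal-free answer might list everything that follows from the same two facts.

```latex
\begin{aligned}
&\text{Given: } a = 3,\quad b = 4,\quad \text{right angle between } a \text{ and } b,\quad \alpha = \text{angle opposite } a \\[4pt]
&\text{Goal-specific: } c = \sqrt{a^2 + b^2} = \sqrt{9 + 16} = 5 \\[4pt]
&\text{Goal-free (everything derivable):} \\
&\quad c = 5, \qquad \text{perimeter} = a + b + c = 12, \qquad \text{area} = \tfrac{1}{2}ab = 6, \\
&\quad \sin\alpha = \tfrac{a}{c} = \tfrac{3}{5}, \qquad \cos\alpha = \tfrac{b}{c} = \tfrac{4}{5}, \qquad \tan\alpha = \tfrac{a}{b} = \tfrac{3}{4}, \\
&\quad \text{altitude to the hypotenuse: } h = \tfrac{ab}{c} = \tfrac{12}{5}
\end{aligned}
```

The goal-free learner still finds the hypotenuse, but along the way practices several other relationships in the same triangle, which is the broader schema-building described above.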

Optimizing Intrinsic Cognitive Load

Intrinsic cognitive load that exceeds working memory capacity disrupts learning. While extraneous cognitive load can be decreased by presenting material in a smart way, there is not much that can be done to lower intrinsic cognitive load.

Intrinsic cognitive load should be **optimized rather than decreased.** For example, if the intrinsic cognitive load of a task requires fewer working memory resources than are available, increasing it can actually enhance learning. This can be achieved by increasing element interactivity (for example, introducing 2 simple, related concepts together, or reducing or varying the level of support in worked examples and problem-solving).

Another approach is to help learners develop basic and specific prior knowledge (e.g. definitions, vocabulary and basic concepts) through pre-training before material with high element interactivity is introduced.

This is what we are currently working on with Kinnu – we build and consolidate your knowledge over time, so that you can continue learning increasingly complex concepts.

Cognitive Load Optimization from Novice to Expert

Many of the ways to reduce extraneous cognitive load work particularly well with learners who are completely new to a topic. Lowering extraneous cognitive load through better content and problem design is a great place to start.

However, as learners gain expertise in a topic, **adaptive fading** should be used: the support they are given is gradually withdrawn. Earlier we mentioned the redundancy effect, which can be a major source of cognitive load. It occurs when material includes information that learners already know.

Imagine you want to learn about the Battle of Stalingrad – but the first lesson you have about it starts with ‘what is a battle?’ Not only would this be frustrating, but it would also create cognitive load, making it harder to learn the non-obvious stuff.

Over time, that means fading out more and more material: you don’t need to keep being told what an equilateral triangle is when you’re 3 months into your geometry course.
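One way to picture adaptive fading, combined with the completion problems described earlier, is as a simple schedule that hides more of a worked example as the learner succeeds. The sketch below is illustrative only: the function name `faded_example`, the rule of hiding one more step from the end after each success (sometimes called backward fading), and the three-step example are all assumptions, not a prescribed method.

```python
from typing import List

PLACEHOLDER = "[your turn]"  # marks a step the learner must complete themselves


def faded_example(steps: List[str], successes: int) -> List[str]:
    """Return a worked example with its last `successes` steps hidden.

    Hypothetical backward-fading rule: each successful attempt hides one more
    step from the end, so the worked example gradually becomes a completion
    problem and finally a problem the learner solves unaided.
    """
    n_hidden = min(max(successes, 0), len(steps))  # clamp to [0, len(steps)]
    n_shown = len(steps) - n_hidden
    return steps[:n_shown] + [PLACEHOLDER] * n_hidden


# Hypothetical worked example: hypotenuse of a right-angled triangle with legs 3 and 4.
steps = [
    "a^2 + b^2 = c^2",
    "3^2 + 4^2 = 9 + 16 = 25",
    "c = sqrt(25) = 5",
]

for successes in range(len(steps) + 1):
    print(successes, faded_example(steps, successes))
```

Hiding steps from the end first means early attempts only ask the learner to finish a nearly complete solution. A real course would presumably tie `successes` to measured performance rather than a simple counter.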

‘Borrowing and Reorganizing’

The borrowing and reorganizing principle is quite simple. It states that most of the information in our schemas is borrowed from other people’s schemas. This borrowed information is also reorganized by each person to better suit them and their environment.

For example, we discussed earlier how you probably understand the concept of a ‘party’ as a part of your schema. You weren’t taught this concept in school – you absorbed it from the people around you growing up, perhaps your parents, and from popular culture.

You ‘borrowed’ this knowledge from the schemas of those around you, and in turn it became part of your own schema.

The ‘reorganization’ aspect is illustrated by the fact that your understanding of the word ‘party’ differs somewhat from your parents’. You may have different ideas of what to wear, how to greet your friends, or what music to dance to.

So, even though your understanding of what a party is came from them and from other people, you’ve reorganized that knowledge in your schema to fit your own values and priorities.

‘Randomness-as-Genesis’

While borrowing and reorganizing knowledge accounts for a large amount of learned concepts, it does not actually allow for the creation of completely new ideas.

Cognitive load theory suggests that totally new information is created in the brain through the ‘randomness-as-genesis’ principle. A simple way of understanding randomness-as-genesis is that it is basically guessing until something sticks.

To return once again to the party example – maybe you were never taken to any parties as a child, and your parents didn’t socialize much. In this instance you have no prior schema about what parties are and what people do at them.

The only way to learn in this scenario is by trial and error. So you go to as many parties as you can, trying out different behaviors and topics of conversation. Each time, you come up with your own ideas about what to do at a party, and over many iterations those ideas are either validated or disproven.

This is randomness-as-genesis – your brain naturally selecting what does and doesn’t work from a series of randomly generated ideas over time.
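The trial-and-error loop in the party example can be sketched as a toy generate-and-test procedure. This is an analogy rather than a model of how the brain works: the function `randomness_as_genesis`, the list of behaviors and the `works` feedback check are all hypothetical stand-ins. Ideas are picked at random, tested against the environment, and only the ones that work are kept.

```python
import random
from typing import Callable, List, Set


def randomness_as_genesis(
    options: List[str],
    works: Callable[[str], bool],
    trials: int,
    rng: random.Random,
) -> Set[str]:
    """Toy generate-and-test loop: guess at random and keep what survives testing."""
    schema: Set[str] = set()    # ideas validated so far
    rejected: Set[str] = set()  # ideas disproven so far
    for _ in range(trials):
        idea = rng.choice(options)  # randomly generate a candidate idea
        if idea in schema or idea in rejected:
            continue  # already tested; nothing new to learn from it
        if works(idea):  # test the idea against feedback from the environment
            schema.add(idea)  # keep what works
        else:
            rejected.add(idea)  # discard what doesn't
    return schema


# Hypothetical behaviors to try at a party, and which ones happen to go down well.
behaviors = [
    "bring a drink",
    "ask people about themselves",
    "compliment the host",
    "talk only about yourself",
    "recite the tax code",
    "leave without saying goodbye",
]
well_received = {"bring a drink", "ask people about themselves", "compliment the host"}

learned = randomness_as_genesis(
    behaviors, lambda b: b in well_received, trials=50, rng=random.Random(0)
)
print(sorted(learned))
```

With few trials, some behaviors that would have worked may never be tried at all; with more trials, the kept set fills out toward everything that works.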
