Our human ability to share what we are thinking requires complex planning and production processes
The importance of language production
Perhaps surprisingly, more is known about language comprehension than production, despite their evident interrelatedness. The study of speech comprehension concerns the mental processes that turn sounds into ideas, while the study of speech production concerns how we turn ideas into sounds.
Though the two may appear to mirror each other, the latter involves many motivational and social factors in the mind of the speaker, which are more difficult for the researcher to control.
Speech production may seem effortless, but at approximately **150 words a minute**, processing demands are high and occur at multiple levels. The first, the ‘**semantic level**,’ concerns the meaning behind what is to be said.
The second, the ‘**syntactic level**,’ concerns the structure of the sentence. The third, the ‘**morphological level**,’ deals with the basic units of meaning, or ‘morphemes,’ and the fourth, the ‘**phonological level**,’ with the basic units of sound, or ‘phonemes.’
While the order may seem logical, in reality, it is messier. Processes frequently overlap and often start before the preceding one has finished.
Planning what we are going to say
**Speech production typically begins with planning**. And research confirms what we would expect: the more complex the syntactic structure of what we want to say, the longer that planning takes.
And **planning occurs at multiple levels** – both at the level of the clause (setting up the subject and verb) and at the level of the phrase (expressing a single idea).
The errors we make in our speech production offer further insight. For example, swapping 2 sounds in a sentence typically occurs over shorter distances than exchanging entire words. The former is likely to be the result of late planning immediately before speech occurs.
And yet, when we prioritize starting to speak quickly over speaking fluently, we may sacrifice planning and risk errors in what we say or how we say it. **Speech planning, then, is typically flexible and shaped by immediate goals and situational demands**.
The errors in what we say
Much of what we say is **coherent and accurate** – but not always. Indeed, a study in 2002 suggested that our speech error rate may be as high as 1 in every 500 sentences.
If relatively infrequent, why do such mistakes matter to us? Because when things go wrong, they offer insight into the underlying processes, especially as errors are typically not random but systematic.
**‘Spoonerisms’** occur when the initial sounds of 2 words are swapped, such as saying ‘Hobin Rood’ instead of ‘Robin Hood.’ **‘Freudian slips,’** or spoken distortions, can hint at something we want to say but feel we can’t, or at feelings we have not yet recognized. One example is calling your spouse by the name of your ex, which is always a dangerous move!
**‘Semantic substitution’** occurs when a word is replaced by another of similar meaning, such as asking “Where is my tennis bat?” rather than ‘racquet.’
And yet, while we make errors, **we are also good at spotting and correcting them**. However, we are even better at identifying others’ mistakes than our own.
Multiple theories of speech production
Even if we agree on the 4 potential levels of processing in speech production (**semantic, syntactic, morphological, and phonological**), theories differ on how they relate.
The **‘spreading-activation theory’** suggests that processing occurs **simultaneously** across levels, with each able to influence the others. Such a setup offers flexibility but also, potentially, a degree of chaos.
On the other hand, the **‘WEAVER’** (Word-form Encoding by Activation and VERification) computational model posits a **multi-stage feedforward system** in which processing flows in one direction, from meaning to sound.
We begin by choosing a word based on meaning and context, then check for errors and define its form and sound before final articulation. This model implies more structure and rigidity.
The 2 models are not as dissimilar as they initially seem, and there may be a meeting point between them. After all, too little interaction between processing levels could inhibit speakers’ creativity and production of new sentences, while too much could result in excessive speech errors.
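To make the contrast concrete, here is a minimal toy sketch in Python. It is purely illustrative and is not an implementation of WEAVER or of any published spreading-activation model; the level names come from the article, while the weights and update rule are assumptions chosen only to show the difference between one-way and interactive processing.

```python
# Toy contrast between the two architectures discussed above.
# Deliberately simplified; the weights are arbitrary illustrative values.

LEVELS = ["semantic", "syntactic", "morphological", "phonological"]

def feedforward(message: str) -> str:
    """WEAVER-style sketch: each level completes before the next begins,
    and information flows only from meaning toward sound."""
    representation = message
    for level in LEVELS:
        representation = f"{level}({representation})"  # one-way, no feedback
    return representation

def interactive(passes: int = 3) -> dict:
    """Spreading-activation-style sketch: all levels are active at once,
    with activation flowing both downward and back upward."""
    activation = {level: 0.0 for level in LEVELS}
    activation["semantic"] = 1.0  # the intended meaning drives the process
    for _ in range(passes):
        for i, level in enumerate(LEVELS):
            if i > 0:                        # influence from the level above
                activation[level] += 0.50 * activation[LEVELS[i - 1]]
            if i < len(LEVELS) - 1:          # feedback from the level below
                activation[level] += 0.25 * activation[LEVELS[i + 1]]
    return activation

print(feedforward("'the cat sat'"))
print(interactive())
```

In the first sketch, anything that goes wrong can only be passed downstream; in the second, strong activation at the sound level can feed back and nudge word choice, which is one way interactive models account for the systematic nature of speech errors.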
Brain-damaged patients
As with other areas of cognitive neuropsychological research, brain-damaged patients offer opportunities to challenge and validate theories of language production. And we have much to learn from **‘aphasia’ – the difficulty with language or speech** that results from such injury.
Patients with damage to **Broca’s area** of the brain struggle to form syntactically correct sentences, despite no loss of language comprehension. In contrast, individuals with an injury to **Wernicke’s area** can produce grammatical speech, but it lacks semantic meaning. Together, these patterns suggest **a high degree of localization for syntactic and semantic processing**.
And yet, not all brain injuries are the same, and neither are the resulting deficits. After all, while language processing is distributed, its components appear interconnected in highly complex ways.
As a result, research has shifted toward more specific types of aphasia, such as **‘anomia,’ difficulty in naming objects**, and **‘agrammatism,’ in which sentences contain nouns and verbs but lack function words** such as ‘the,’ ‘over,’ and ‘in.’
Moving away from the monologue
**Most speech occurs in conversation with one or more people rather than in monologue**. After all, unless we’re Hamlet of Denmark, we don’t typically spend our time walking around making speeches, devoid of interruptions, questions, answers, and challenges to what we are saying.
British philosopher of language **Paul Grice** proposed 4 maxims that a speaker should observe for successful communication. The first, the maxim of **‘relevance,’** involves staying appropriate to the situation, while the second, the maxim of **‘quantity,’** requires being as informative as necessary.
The third, the maxim of **‘quality,’** means remaining truthful, and the fourth and final, the maxim of **‘manner,’** requires making each contribution easy to understand.
And yet, the need for 4 maxims has been challenged – the first maxim, relevance, implies the other 3 and may make them unnecessary. Also, motivation is an important factor.
If, as a salesperson, our goal is to sell our audience something, or, as a politician, we wish to influence or even mislead, we may distort the truth.
Audience design
Truly effective communication requires appropriate ‘**audience design**’ – meaning that we account for the specific needs of the listener. Creating a common ground for those listening to what we say requires us to make **assumptions regarding their general knowledge, shared personal experiences, and even their preferred language**.
While speakers sometimes begin by planning what they will say without considering audience design, they typically monitor the feedback they receive and shift toward more common ground.
And yet, revising plans while speaking is computationally difficult, so tactics such as priming, which introduces concepts early on, help reduce the cognitive demands of speech production. **Syntactic priming involves reusing elements of the structure of what the other person is saying.**
Additionally, the speaker is more able to focus on audience design when the listener’s needs are clear and straightforward. For example, bilingual speakers find it relatively easy to confine themselves to speaking a specific language when their listener is monolingual.
Non-verbal communication
**Not all communication is verbal**. Indeed, a great deal can be shared by how we stand, the expressions on our faces, and the gestures we use. After all, before our ancestors could vocalize what they thought, they most likely relied heavily on non-verbal communication.
A 2011 study found that **gestures used to describe an elaborate dress became significantly more informative when the speaker could see the person they were communicating with**.
And it seems that gestures varied depending on the feedback received: when the listener showed an understanding of what was being said, the number of gestures decreased, yet when the listener sought further clarification or correction, the gestures became more precise and elaborate.
Perhaps surprisingly, speakers who have been blind since birth also make use of such non-verbal communication, even when in dialogue with blind listeners.
It seems that **gestures are an essential aspect of communication** and a worthwhile investment in increasing the clarity of what is being said, especially when flexible and responsive.
Finding the groove
The words we choose are vital when we wish to share a thought verbally, yet so too is ‘how’ we say them – this is known as ‘**prosody**.’ **Rhythm, stress, and intonation** can entirely change the meaning conveyed; we can suggest anger, shock, upset, and even switch between an exclamation and a question.
After all, does the sentence “The grumpy men and women sat together” mean both the men and the women were grumpy or only the former? Clarity can be improved by how the sentence is spoken without changing its structure.
Similarly, “He shot the man with the rifle” raises the question, “who had the rifle?” And yet, with appropriate stress, the right interpretation can be encouraged.
However, **speakers even use prosody when there is no ambiguity**, suggesting that it is not simply a response to the listener’s needs but must form part of the speaker-planning process.
**Prosody is a particular problem for synthetic speech**. Without it, AI-produced voices can sound fake and unengaging, put emphasis in the wrong places, and struggle with unusual words. This can be especially problematic when trying to deliver something dramatic or emotional. So, don’t even think about asking Alexa or Siri to deliver your eulogy.
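To illustrate how text-to-speech systems are commonly given prosody hints, here is a minimal sketch, assuming a Python script that builds SSML (the W3C Speech Synthesis Markup Language). The tags shown (prosody, emphasis, and break) are part of that standard, but the specific values and the example sentence are purely illustrative.

```python
# A minimal, illustrative sketch of prosody markup for a text-to-speech engine.
# The tag names come from the W3C SSML standard; the values are arbitrary
# choices made only for this example.

ssml = """
<speak>
  <prosody rate="slow" pitch="low">
    He was, <break time="400ms"/> above all,
    <emphasis level="strong">kind</emphasis>.
  </prosody>
</speak>
"""

print(ssml)  # a real engine would render this markup as speech
```

Even with such markup, deciding where the pauses and emphasis should fall remains the hard part, which is exactly the planning problem human speakers solve implicitly.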
Discourse markers
When we listen back to a recording of ourselves in conversation or presenting, we may be frustrated by the number of times we utter words such as ‘um’ and ‘eh’ – or ‘ano’ for Japanese speakers! And yet, ‘**discourse markers**,’ as they are known, despite seeming irrelevant to the speaker’s message, **do assist communication**.
In fact, they help clarify the speaker’s intentions. For example, ‘oh’ and ‘um’ can signal that we are having trouble deciding what to say next, while phrases such as ‘you know’ can indicate the speaker is confirming understanding or looking for approval.
It seems they also vary depending on who the conversation is most relevant to. **A 2006 study found that ‘oh’ is used 98.5% of the time when a new topic is directly relevant to the speaker, while ‘so’ is used 96% of the time when the topic most concerns the listener.**
While not contributing directly to the content, **discourse markers are an important aspect of conversation**.