In speech, the main domain of interpersonal communication, understanding can only function if all essential aspects of a message are successfully conveyed. Besides the linguistic content of an utterance, paralinguistic aspects also play a role, such as the emotional state of the speaker, his or her gender, approximate age, etc. This information makes it possible for the receiver to interpret the context of the situation. It is the prosody of an utterance that reveals the emotion the speaker attaches to the message. However, an attentive listener may perceive more than this by detecting emotional content which the speaker does not necessarily intend to convey.
To approach certain aspects of the complex issue of how speech is affected by emotions, I refer in this essay to several papers but focus mainly on one, which deals with how psychosocial factors (in this case, experimentally induced psychological stress) can affect the production and recognition of speech from both perspectives, that of the speaker and that of the receiver. I will refer to the study of main interest as study 1 and to the other two as study 2 and study 3 respectively, although it is important to note that they are completely independent of each other and differ in numerous aspects.
The goal of study 1 was to explore how induced stress changes the production and recognition of vocalized emotions; the hypothesis was that stress would affect both. The data was analyzed acoustically and then statistically in order to identify significant correlations between the factors. The study was divided into two parts. In the first part, results demonstrated that naive listeners (neither professional actors nor trained) could detect that naive speakers who had been put under stress sounded more stressed. In addition, negative emotions produced by stressed speakers were not recognized as easily as the same emotions produced by non-stressed speakers, and positive emotions produced by stress-induced speakers were recognized more easily than negative emotions from the same group. The reason for this, as proposed in the paper, might be that the variation in volume produced by the speakers did not match the variation expected by the perceivers.
Another explanation offered in the paper was that speakers suffering from mild stress found it relieving to express positive emotions in this situation. In any case, this outcome showed that the judgement made by the receiver is affected by the stress level of the speaker. In the second part of the study, participants who were later to perform a prosody recognition task were put under stress before the task and subsequently performed worse than participants who were not under stress. (For the task material, speakers had read sentences in an angry, disgusted, pleasantly surprised, fearful, happy and neutral tone of voice, giving the receivers a wide range of emotions to recognize.) Overall, therefore, the findings indicate that interpersonal sensitivity in communication deteriorates under induced stress.
Study 2 hypothesised that emotion influences speech recognition accuracy (particularly in the domain of automatic speech recognition) and, in its acoustic investigation, focused mainly on pitch as an important parameter indicating the differences. Moreover, the study aimed to explore how emotional states influence continuous speech recognition performance (unlike in study 1, what was in question here was the accuracy with which the content was recognized) and found that angry, happy and interrogative sentences lead to lower recognition accuracy compared to the neutral sentence model. In study 2, the speakers were trained to utter the sentences in a particular emotional state, whereas in study 1 they were not. To summarize the results briefly, emotional states lead to variation in speech parameters, and this causes a problem for speech recognition systems which use baseline models. It is therefore important to find out how emotion influences the parameters and to systematize those changes, which however remains a difficult task because of the large database it requires and other systematic difficulties.
Study 3, briefly, was another analysis of articulatory variability in emotional speech. The study took the view that studying acoustics alone is not sufficient when other factors such as the talker, the linguistic conditions and the type of emotion have so much influence. Therefore, direct measurements of the articulatory system were used, obtained by electromagnetic articulography and real-time MRI, which make the static and dynamic processes of the organs visible. Part of the video material was collected into an openly accessible corpus for further systematic research on articulation and prosody (all of the data was taken from professional actors and actresses). The target emotions here were angry, happy, sad and neutral.
There are some interesting details to add about study 1. As the participants were untrained speakers who had to pronounce different sentences in the tone of different emotions, it might be argued that the data cannot be applied precisely to real-life emotional prosody. However, the speakers were required to imagine themselves in situations in which they felt the emotions in question before speaking, which might have improved their performance considerably, although this remains speculative. In order to induce stress in the participants, a subpart of the Trier Social Stress Test was used, in which the participant had to solve an arithmetic task, namely counting backwards from 1022 in steps of 13. If an incorrect answer was given, the participant had to start again from 1022. The level of stress was measured subjectively on a 0-15 scale. Some of the participants were not susceptible to priming, as the stress induction did not work in their cases. Their data was excluded from the analysis.
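The restart rule of the arithmetic task can be made concrete with a minimal Python sketch. It is only an illustration of the procedure as described above, not the protocol used by the authors: the function name `score_countdown`, the idea of counting restarts and the longest correct run, and the assumption that the first expected answer is 1022 - 13 = 1009 are all mine.

```python
def score_countdown(answers, start=1022, step=13):
    """Score a list of spoken answers for the serial subtraction task.

    Assumption (not from the paper): the first expected answer is
    start - step, and every wrong answer sends the participant back
    to the beginning. Returns (restarts, longest_correct_run).
    """
    expected = start - step
    restarts = 0
    run = 0
    longest_run = 0
    for answer in answers:
        if answer == expected:
            # Correct: extend the current run and move to the next target.
            run += 1
            longest_run = max(longest_run, run)
            expected -= step
        else:
            # Incorrect: the participant has to start again from `start`.
            restarts += 1
            run = 0
            expected = start - step
    return restarts, longest_run


# Hypothetical answers: two correct, one slip (980 instead of 983),
# then three correct after starting over.
restarts, longest_run = score_countdown([1009, 996, 980, 1009, 996, 983])
print(restarts, longest_run)
```

In this hypothetical run the participant is sent back once and then manages three correct answers in a row, which shows why the task is demanding: a single slip discards all progress.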
As to the selection of test material, there were no prior guidelines on how emotional sentences should sound when pronounced by stressed versus non-stressed speakers. For that reason, the material was classified statistically on the basis of 7 standard acoustic parameters, namely the mean, minimum and maximum of pitch, the mean, minimum and maximum of intensity, and the mean duration. The acoustic analysis in study 1 showed that angry, fearful or happy utterances are characterized by higher pitch and a louder voice. Sad expressions are generally pronounced with lower pitch, reduced volume and more slowly. Stressed speakers who expressed disgust, pleasant surprise or happiness used a reduced pitch range.
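To illustrate what the 7 acoustic parameters amount to, here is a minimal Python sketch that computes them from already extracted measurements. How pitch and intensity were extracted in study 1 is not described here, so the inputs (a pitch contour in Hz, an intensity contour in dB and a list of durations in seconds), the function name `acoustic_summary` and the dictionary keys are assumptions made for the example only.

```python
from statistics import mean


def acoustic_summary(pitch_hz, intensity_db, durations_s):
    """Return the 7 summary parameters named in the text.

    Assumption: the contours have already been extracted by some
    acoustic analysis tool, and "mean duration" is taken as the mean
    over the durations passed in.
    """
    return {
        "pitch_mean": mean(pitch_hz),
        "pitch_min": min(pitch_hz),
        "pitch_max": max(pitch_hz),
        "intensity_mean": mean(intensity_db),
        "intensity_min": min(intensity_db),
        "intensity_max": max(intensity_db),
        "duration_mean": mean(durations_s),
    }


# Hypothetical values for illustration only, not data from the study.
features = acoustic_summary(
    pitch_hz=[200, 220, 240],
    intensity_db=[60, 65, 70],
    durations_s=[1.5, 2.5],
)
# The pitch range mentioned for stressed speakers is max minus min.
pitch_range = features["pitch_max"] - features["pitch_min"]
print(features, pitch_range)
```

A feature vector of this kind, one per utterance or per speaker, is the sort of input on which a statistical classification of stressed versus non-stressed material could operate.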
There are references to other studies which suggest that women, in general, use lower pitch and intensity when speaking in stressful situations and do not use as much of their aerodynamic capacity as men. In study 1, all of the participants were female owing to a lack of male volunteers, and indeed for some of the emotions the stress-induced group tended to use lower pitch. For future studies, study 1 suggests using different procedures to evoke emotions in participants and balancing the number of male and female participants in order to obtain better data.
One possible explanation for the observed phenomena is that stress disturbs the speaker's ability to control the vocal apparatus in the way he or she would in a non-stressful situation. According to the message and the context, we adjust our highly sophisticated articulatory mechanism, and this process is easily affected by our emotional state and, of course, by the emotion we want to convey. This leads to various challenges for classification and speech recognition. All in all, the data collected in study 1 is intended to provide insight into how stress affects our encoding and decoding abilities.