A machine learning approach to laughter
By Disa Sauter, Roza Kamiloğlu and Rui Sun
Think of the last time you laughed. What was it like? What made you laugh? What did your laugh sound like? Most of us laugh dozens of times every day, but not every laugh is the same; some are contagious chuckles, others are deep belly laughs, yet others are conceited snickers or polite giggles.
In this project, we wanted to find out whether there are different kinds of laughter: are laughs that are triggered by different kinds of events (e.g., tickling, jokes) systematically different in their acoustic structure? There has been some discussion of this question in the scientific literature, because laughter is an evolutionarily ancient behavior that is produced during play in many species of non-human animals. At the same time, laughter is a socially complex behavior that occurs in many situations that are specific to humans, like verbal jokes. Are these laughs variations of the same thing or are they in fact fundamentally different kinds of behaviors?
Past attempts to create taxonomies of laughter have been based on researchers’ intuitions about which distinctions are meaningful. In this project, we instead wanted to understand laughter from the bottom up, letting the data speak for themselves. Thanks to a grant from the eScience Center’s Small Scale Initiatives call, we were able to use machine learning to address the question of how to understand laughter. Computational methods are relatively new in the social sciences, and particularly so in research on social signals like facial and vocal expressions. Yet laughter is an ideal domain for machine learning: this approach can reveal systematic structure in multidimensional data, such as hundreds of different acoustic features measured from laughter clips.
Our first challenge was establishing a sufficiently large corpus of real-world laughter clips. We scoured thousands of YouTube videos to collect laughter clips that were clean enough to analyze acoustically. The videos also had to be clear enough for us to judge the context: was the laughing person watching a funny cat video, chuckling at seeing someone fall over, being tickled, or hearing a joke? After analyzing more than 800 videos, we could discern four kinds of laughter triggers from the situations in which people were laughing: being tickled, verbal jokes, someone else’s misfortune, and watching something funny. But were the laughs in these situations different, or was each one just an idiosyncratic version of the same behavior? To find out, we measured dozens of acoustic features from each laughter clip.
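To give a flavour of what this measurement step can look like, here is a minimal sketch in Python using the librosa library; the library choice and the handful of features shown are illustrative, not the exact toolchain and feature set we used.

```python
# Minimal sketch: a few acoustic features from one laughter clip.
# librosa and the specific features are illustrative; the real feature set was much larger.
import numpy as np
import librosa

def extract_features(path):
    y, sr = librosa.load(path, sr=None)  # load the clip at its native sampling rate
    f0 = librosa.yin(y, fmin=75, fmax=600, sr=sr)  # pitch (fundamental frequency) contour
    rms = librosa.feature.rms(y=y)[0]  # frame-wise energy
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # spectral "brightness"
    return {
        "duration_s": librosa.get_duration(y=y, sr=sr),
        "f0_mean": float(np.mean(f0)),
        "f0_sd": float(np.std(f0)),
        "rms_mean": float(rms.mean()),
        "centroid_mean": float(centroid.mean()),
    }

# e.g. features = extract_features("laugh_clip_001.wav")  # path is a placeholder
```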
We visualized these high-dimensional data using a statistical method known as t-distributed stochastic neighbor embedding (t-SNE), which assigns each data point a two-dimensional, and thus visualizable, position. This visualization suggested that tickling laughter was quite clearly distinct from laughter produced in the other three types of situations: the most meaningful distinction was tickling laughter versus the rest. We then used supervised machine learning methods (such as random forest analyses) to test the extent to which the acoustic patterns were predictive of the behavioral contexts in which the laughs had occurred, drawing on several machine learning libraries to estimate predictive distributions.
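For those curious about the nuts and bolts: a t-SNE embedding of a feature matrix can be computed in a few lines with scikit-learn. The sketch below is illustrative only; X and labels stand in for our acoustic feature matrix and the context of each clip, and the exact preprocessing and parameter settings we used may differ.

```python
# Illustrative t-SNE embedding of an acoustic feature matrix.
# X: (n_clips, n_features) array of acoustic measurements; labels: one context per clip.
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X_scaled = StandardScaler().fit_transform(X)  # put all features on a comparable scale
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(X_scaled)  # 2-D position per laughter clip

for context in sorted(set(labels)):
    idx = [i for i, lab in enumerate(labels) if lab == context]
    plt.scatter(embedding[idx, 0], embedding[idx, 1], label=context, s=10)
plt.legend()
plt.show()
```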
Our mentors at the eScience Center (Patrick Bos, Florian Huber, and Jisk Attema) gave us practical guidance in the use of such libraries, which was an invaluable learning opportunity. For instance, we learned how data quality (e.g., sample size, uneven distribution of certain features) can affect machine learning outputs, which led us to take the time to evaluate and scope our data through careful data integration and exploration. They also helped our conceptual understanding: what could the results actually tell us? Our machine learning results confirmed quantitatively what the t-SNE had shown qualitatively: tickling laughter was acoustically distinct from the other three types, while laughter produced in reaction to verbal jokes, someone else’s misfortune, and watching something funny was not systematically different across those contexts. An experiment with human participants also confirmed that tickling laughter is perceptually distinct from the other types: listeners could tell with remarkable accuracy whether or not a laugh was produced by a person who was being tickled. These results made a lot of sense: tickling is an evolutionarily ancient play behavior shared with other animals, whereas the other kinds of situations are all much more cognitively demanding, and probably unique to humans.
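As a concrete illustration of the classification step, and of how uneven group sizes can be handled, here is a hedged sketch of a random forest with stratified cross-validation and balanced class weights; the exact settings and libraries we used may differ, and X and y again stand in for the feature matrix and context labels.

```python
# Sketch: predicting laughter context from acoustic features with a random forest.
# X: (n_clips, n_features) feature matrix; y: context label per clip (e.g. "tickling").
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

clf = RandomForestClassifier(
    n_estimators=500,
    class_weight="balanced",  # compensate for unevenly sized context categories
    random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # keep class proportions per fold
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print(f"Balanced accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```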
Extracting feature importances told us which acoustic features contributed most to this distinction, pointing to the possibility that tickling laughter is less controlled than other kinds of laughter. To really understand what differentiates laughter produced in tickling contexts from other situations, we complemented the computational analyses with human perceptual judgments. We ran a new listening task in which naive participants (who did not know about the context in which the laughs were produced) were asked to judge the extent to which the laughter sounded controlled, energetic, and so on. The results showed that laughter produced during tickling was judged to sound like the laughing person was not in control of their actions, in a state of high arousal, and in a situation involving physical contact with a familiar other.
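For reference, one simple way of ranking features by importance in a fitted random forest is sketched below; X, y, and feature_names are placeholders, and our actual analysis may have used a different importance measure.

```python
# Sketch: ranking acoustic features by their contribution to the classifier.
# Assumes X, y, and feature_names (list of column names) as in the earlier sketches.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                             random_state=0).fit(X, y)
order = np.argsort(clf.feature_importances_)[::-1]  # most important feature first
for i in order[:10]:
    print(f"{feature_names[i]}: {clf.feature_importances_[i]:.3f}")
```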
Meanwhile, we also analyzed the visual content of the videos to see whether the types of situations we had inferred qualitatively could be distinguished by a quantitative analysis of what was actually in the videos. Maybe the verbal-joke videos would feature more conversation, and the videos of someone laughing at another’s misfortune would feature more people slipping? To test this, we ran the videos through the Google Video Intelligence API, which picks out categories of objects and events. For example, this analysis revealed that the tickling videos involved a lot of body parts, while the videos of people laughing at something funny often involved screens and animals. Machine learning analyses showed that the four types of situations could be differentiated well from the visual contextual information in the video clips alone, demonstrating that the distinctions we had made were indeed meaningful, even though some of the differences in context did not translate into acoustically different types of laughter.
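For readers who want to try something similar, label detection with the Google Video Intelligence API looks roughly like the sketch below; the storage path is a placeholder, and the real pipeline around it involved considerably more bookkeeping.

```python
# Sketch: asking the Google Video Intelligence API which objects/events appear in a clip.
# The gs:// path is a placeholder; authentication via a Google Cloud service account is assumed.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/laughter_video_001.mp4",
    }
)
result = operation.result(timeout=300)  # block until the annotation job finishes

for label in result.annotation_results[0].segment_label_annotations:
    confidence = max(seg.confidence for seg in label.segments)
    print(f"{label.entity.description}: {confidence:.2f}")
```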
And there you have it! We bet that after this blog and our analyses you’ll be thinking twice about your laughter. What makes you giggle? What makes your belly laugh? You’re also probably wondering, what’s next?
The next step for this project will be to tie the different strands together into a manuscript, accompanied by interactive online illustrations, which will be submitted for publication in a peer-reviewed journal. The consultation with our excellent mentors at the eScience Center provided an inspiring setting for discussing our ideas in a constructive and fun atmosphere. The guidance we received will be useful not only for this project but also for our future research.
Dr. Disa Sauter is an Associate Professor in the Department of Psychology at the University of Amsterdam. She studies emotions, focusing on nonverbal expressions with a particular interest in positive emotions.
Roza Kamiloğlu is a PhD candidate in psychology at the University of Amsterdam. Her research interests include nonverbal expressions, emotion, and computational modeling.
Dr. Rui Sun is a guest researcher at the Department of Psychology, University of Amsterdam. She is interested in positive emotion, wellbeing, and social media research.