Psycasm is the exploration of the world psychological. Everyday phenomena explained and manipulated to one's own advantage. Written by a slightly overambitious undergrad, Psycasm aims at exploring a whole range of social and cognitive processes in order to best understand how our minds, and those mechanisms that drive them, work.
My posts are presented as opinion and commentary and do not represent the views of LabSpaces Productions, LLC, my employer, or my educational institution.
So there’s been a bit of hype surrounding a paper entitled “Humor ability reveals intelligence, predicts mating success, and is higher in males.” It seems a lot of people don’t like it, but I fear their dislike is something of a knee-jerk reaction, and I feel a little information could at least add nuance to the critics’ position.
Don’t get me wrong. I don’t like the paper, either. It’s just that most of the criticisms I’ve read seem to trend along these lines:
a) Scientists waste grant money proving something we already know (i.e. Women find funny men attractive); or
b) You can’t measure how funny someone is, therefore their conclusions are wrong; or
c) More Evo Psychology crap; or
d) Any combination of the above
Here’s why I don’t like that paper:
a) Given the data and the methodology, the evolutionary hypothesis is overstated and, I feel, out of place.
b) I feel that the methodology could’ve been far more rigorous and more real-world appropriate
c) The conclusion that humour predicts mating success (‘such as lifetime number of sexual partners’, as per the abstract) is misleading (though not entirely untrue, given the data)
d) Some of the correlation and alpha values are not convincingly strong (but I’m no expert)
Here are some key points lost in the reporting:
a) The relationship between Intelligence (specifically, verbal intelligence) and Mating Success is mediated by Humour
b) These should be considered preliminary findings, not definitive.
So let’s get into the guts of this paper, and address as many concerns as possible.
200 men and 200 women (undergrads) were recruited to take part in the study (Average age was 20.6 years (+/−4.7, range 18–57)). Here I quote:
Participants' self-reported ethnicity was 58% white, 29% Hispanic, 5% Asian-American, 4% American Indian, 3% African American, and 2% other. UNM is a large state university with low entrance requirements, and many minority, nontraditional, mature, and first-generation students. Thus, UNM students show high variance (and low restriction of range) in intelligence, sexual attitudes, mating strategies, political values, religiosity, and other demographic, psychometric, and mating-relevant variables.
To use some technical terminology, that sample is fucking huge. Is it too big? I’m not sure. It can arise that with a big enough sample you’ll find just about anything. Want to find a correlation between limping and hair-length? Test 100,000 people and you might just find it. This is why it’s super important to have directional hypotheses. I don’t know if this sample is too big, but I would find it more convincing if they found the effect with a sample only half or a third as big. However, let’s not let this cloud the issue.
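To put a rough number on the sample-size worry, here is a small sketch that estimates the smallest correlation that would come out significant at p < .05 (two-tailed) for a given n. It relies on the Fisher z approximation rather than an exact t-test, and the function name `smallest_significant_r` is my own invention, not anything taken from the paper.

```python
import math
from statistics import NormalDist

def smallest_significant_r(n, alpha=0.05):
    """Approximate smallest |r| that reaches two-tailed significance at `alpha`.

    Assumes the Fisher z approximation: atanh(r) has a standard error
    of 1 / sqrt(n - 3) when the true correlation is zero.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return math.tanh(z_crit / math.sqrt(n - 3))

# Compare a small sample, a sample of 400, and a very large one.
for n in (50, 400, 100_000):
    print(n, round(smallest_significant_r(n), 3))
```

With n = 400 the threshold sits at roughly r = 0.1, so fairly weak correlations will clear the bar; with n = 100,000 almost any nonzero correlation will. Whether a correlation that small matters is a separate question from whether it is significant.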
Intelligence was measured, in part, with a composite score of 12 items from Raven’s Matrices, which is a good measure of abstract reasoning.
Intelligence was also measured on a 46-item verbal task which asked participants to match two words of similar meaning.
Based on the description of the task, it may look something like this: … the word Magniloquent is closest in meaning to:
Both of these measures are good quick and dirty approximations of IQ. They’re not perfect, but they are consistent and reflect the relationship with IQ on abstract reasoning and verbal aptitude respectively. Some may argue that IQ is a questionable index, but it is a useful measure and I feel they haven’t over-extended what it means in this context.
Humour was examined through the use of a cartoon captioning context. Participants were given 3 cartoons from previous New Yorker cartoon captioning contests. Participants were given 10 minutes to write as many funny captions as possible (for all three cartoons). Unsurprisingly, most captions were rated as ‘not funny’. In some ways this is a good thing – you’re only going to be able to find this effect at the top end of the spectrum. It’s also a fairly artificial way of being funny, so the result is not necessarily unexpected, yet it was a way of maintaining consistency for all participants. Furthermore, I see no reason why intelligence and humour production couldn’t be related in this context (as opposed to humour generated through slap-stick, which may not rely on one’s IQ as heavily).
Does it matter that humour is subjective? That what I find funny is different from what you find funny? Perhaps, perhaps not. However, given that 6 people rated the captions (and that 4 of them were male!) I think this is a genuine concern. Their results (and hypothesis) are fundamentally about gender differences, so the gender balance of the raters should reflect that.
At an absolute minimum I feel the number of raters should’ve been doubled, with a 50/50 gender split.
A more acceptable option would’ve been to get every participant back in the lab rating a couple dozen cartoons each. The mean number of captions for each cartoon was 3.5; across three cartoons each participant produced ~11.5 captions. If each participant rated ~72 cartoons (6 other participants) then I feel we would have much tighter and more valid mean values for humour, with variation for preference and style (potentially attributed to gender, as well as other differences) spread through the volume of ratings.
In the world of limitless time and money I would hook each participant up to an EEG as they did this, and record their behavioural responses, too. I feel this could give a more honest indication of humour appreciation than a self-reported critique of ‘is this funny?’.
Finally, I wonder how much redundancy was evident. How many people made the same joke? And, was this joke (the repetitive one) a good one? Potentially the most obvious joke would not be related to IQ, and so greater deviation from the obvious joke may more honestly reflect IQ. I’m not sure, but I suspect there was huge redundancy, and I suspect there’s something interesting in that data alone…
Returning to how they examined humour (and the world of limitless time and money), an alternative (to my mind) would be getting groups of people together and providing some exercises that involve each person giving a short speech (tell us why you choose to study x, what’s your favourite pastime / hobby and why, etc…) and recording (without their knowledge) the proceedings. Independent raters could assess the number of jokes, attempts vs. successes, amount of laughter, and nature of the humour, etc, and potentially get a more real-world measure of how people use humour, and not just if they’re capable of it. This, however, has its own downsides.
These points should address the criticisms that humour can’t be measured, and therefore invalidate the results. Humour can be measured. It has been measured here. The problem is, IMO, that it was not rated appropriately.
Continuing on with their ratings of humour, they include the following:
“Funniness ratings were highly skewed, with most captions rated not funny at all, and even the funniest students producing only a few captions per cartoon that were even moderately funny. So, from each judge's ratings of each caption for each cartoon, we took the highest rated caption as most representative of the participant's humor ability. Then we averaged these high scores across the six judges and the three cartoons to yield an overall humor ability score. Internal consistency scores (Cronbach's alphas) of ratings across the six judges averaged 0.72 across the three cartoons, which is somewhat higher than in other cartoon-captioning studies (Feingold & Mazzella, 1993; Masten, 1986).”
The skew, which I’ve mentioned previously, is unsurprising. I included this because… I have no idea what they’re talking about. An alpha of .72 is probably appropriate given that humour is so subjective, but I have to wonder if it was a product of whatever they did to the data. Perhaps this is a standard procedure, I don’t know, I can’t follow the process. However, I would think that on something like judging humour an alpha value ought to stay relatively consistent between studies. It’s not like they’ve manipulated the participants or the judges to be funny or to be susceptible to mirth, all they’ve got is an original set of cartoons and corresponding humour judgements – why should it be higher than other studies?
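For what it's worth, here is my best reading of the quoted procedure as a minimal sketch. The nested-list layout, the function names, and the toy ratings are all made up for illustration (the quoted passage doesn't give the rating scale or the software the authors used), and the alpha function is the standard Cronbach's formula treating each judge as an 'item'.

```python
from statistics import mean, variance

def humor_score(ratings):
    """Humour score for ONE participant.

    ratings[judge][cartoon] is the list of ratings that judge gave to each
    caption the participant wrote for that cartoon (hypothetical layout).
    Take the best-rated caption per judge per cartoon, then average
    those bests across all judges and cartoons.
    """
    bests = [max(captions) for judge in ratings for captions in judge]
    return mean(bests)

def cronbach_alpha(scores_by_judge):
    """Cronbach's alpha with judges as items.

    scores_by_judge[k][i] is judge k's score for participant i.
    """
    k = len(scores_by_judge)
    sum_item_vars = sum(variance(judge) for judge in scores_by_judge)
    totals = [sum(col) for col in zip(*scores_by_judge)]  # per-participant sums
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Toy example: two judges, three cartoons, one participant.
one_participant = [
    [[1, 3], [2], [1, 1, 4]],  # judge 1: bests are 3, 2, 4
    [[2, 2], [1], [3, 1, 2]],  # judge 2: bests are 2, 1, 3
]
print(humor_score(one_participant))
```

If every judge gave identical scores, alpha would be 1; the more the judges disagree about who is funny, the lower it drops. Note that keeping only each participant's best caption throws away most of the ratings, which may be part of why the value differs from other studies, though that is speculation on my part.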
Questions aside, here’s the good stuff.
Vocabulary and Raven’s are correlated significantly, at a moderate value. My understanding of intelligence is that this is probably appropriate. Given that Raven’s was designed to be administered without any language, it’s unsurprising it’s not related more highly. If the test was of mathematical reasoning we could probably expect a higher r value, moderated by something like years in school. Raven’s and Vocabulary measures are quite distinct, yet (ideally) are part of the same, broader, construct.
Intelligence is correlated weakly to moderately with humour ability, with a greater correlation for Vocabulary than for abstract reasoning. This is consistent.
The surprise is that mating success is not significantly correlated with humour ability. This should be surprising, given that this paper has widely been reported as humour => mating success.
First, however, are the questions (and data) that relate to ‘mating success’:
How reliable is this data? My guess is reasonably so. It is self-report, so the title of the paper should read ‘Humour ability reveals intelligence, predicts self-reported mating success, and is higher in males’. That – right there – is why this paper has drawn negative criticism. It kinda feels dishonest. The conclusion people drew - the conclusion people were inevitably going to draw from that title - is that Funny = Smart + Sexy.
What they found, validly, was that Humour mediates the effect of intelligence on [self reported] mating success.
Wiki explains mediation:
In statistics, a mediation model is one that seeks to identify and explicate the mechanism that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediational model hypothesizes that the independent variable causes the mediator variable, which in turn causes the dependent variable. The mediator variable, then, serves to clarify the nature of the relationship between the independent and dependent variables.
The long and short of it is that this was an indirect effect. Not an easy A => B situation, but A => B =>C with no A=>C relationship (where C = mating success, A = Intelligence, B = Humour).
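A toy version of that A => B => C chain, for the stats-curious. Everything here is simulated rather than the paper's data, the effect sizes (0.5 and 0.4) are invented, and the helper names are mine. It uses the regression-based (product-of-coefficients) view of mediation, which may not be the exact procedure the authors ran.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope(x, y):
    """OLS slope for y ~ x (with intercept)."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x1, x2, y):
    """OLS slopes for y ~ x1 + x2 (with intercept), via the normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated world: A drives B, B drives C, and A has no direct path to C.
random.seed(0)
n = 400
intelligence = [random.gauss(0, 1) for _ in range(n)]                # A
humour = [0.5 * a + random.gauss(0, 1) for a in intelligence]        # B
mating = [0.4 * b + random.gauss(0, 1) for b in humour]              # C

a_path = slope(intelligence, humour)                     # A -> B
total = slope(intelligence, mating)                      # A -> C, ignoring B
direct, b_path = two_predictor_slopes(intelligence, humour, mating)
indirect = a_path * b_path                               # A -> B -> C

print("total effect:   ", round(total, 3))
print("direct effect:  ", round(direct, 3))
print("indirect effect:", round(indirect, 3))
```

In a setup like this the direct effect should hover near zero and the indirect effect should carry most of the total, which is what 'A => B => C with no A => C' looks like in regression terms. For ordinary least squares, total = direct + indirect holds exactly.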
However, here’s the kicker: they didn’t, and couldn’t (given their methods), assess the degree to which females preferred the humour generated by funny men (or funny women; or the preferences of men for funny women or other men). What did they do? They correlated humour rating with measures of IQ and self-reported measures of sexual success.
What they really, really needed to do was to take a range of captions from men and women, and present them to independent men and women (probably from a whole new sample) and associate the captions with either a particular man or a particular woman (much like a dating study). Then ask the participants to rate their perceptions of the joker on values of i) humour (obviously); ii) intelligence; iii) mate potential in a) the long, and b) the short term; and iv) likeability.
Perhaps some jokes are funny when said by men, some by women. I’ve already suggested there’s probably a lot of redundancy – did the redundancy trend along gender lines? Are men generally funnier than women, and so an average joke by a man is funny when the same joke by a woman is not? Is a woman, who is associated with male-generated humour, perceived in the same way as a man with the same joke? And vice versa? Maybe men prefer jokes by men, and women enjoy female humour. There probably are answers to these questions… though I didn’t see any report of them in the paper.
Maybe funny men are perceived as a higher quality mate. That would lend support to the hypothesis. Maybe funny women are somehow handicapped by being like a funny man (this is a possible implication from background data cited in the introduction). There’s a lot of ifs and maybes here, and I would like to see these conclusions hold beyond self-report.
Finally, you may be thinking ‘Ok, ok, ok, but they still claimed that this was an evolutionary advantage’… and you’d be right. They did. They said:
A good sense of humor is sexually attractive, perhaps because it reveals intelligence, creativity, and other ‘good genes’ or ‘good parent’ traits….
These results suggest that the human sense of humor evolved at least partly through sexual selection as an intelligence-indicator….
Also, sex differences in reproductive strategies may explain why females value humor production ability more in mates (Lundy, Tan, & Cunningham, 1998), why females laugh and smile more during conversations, especially in response to humor produced by the opposite sex (Provine, 2000), and why women tend to like a man who will make them laugh, while men want a woman who will laugh at their humor (Bressler et al., 2006). …
Humor, intelligence, and mating success may have especially important relationships, which this paper investigates. Intelligence has been much better studied than humor as a mental fitness indicator: general intelligence is one of the most sexually desirable traits for both sexes (Buss, 1989), is highly heritable (Plomin & Spinath, 2004), and is correlated with many fitness-related traits such as physical health and longevity (Gottfredson & Deary, 2004), body symmetry (Banks, Batchelor, & McDaniel, 2010), physical attractiveness (Langlois et al., 2000) and semen quality (Arden, Gottfredson, Miller, & Pierce, 2008). If humor production ability is an honest indicator of intelligence, humor production ability should positively correlate with intelligence.
I have no reason to doubt any of those claims. Yet at best all this information serves to support the proposal of an hypothesis that humour has adaptive qualities in mate selection. I feel this is probably true, and I suspect better papers have linked the two previously (though I stand to be corrected). This paper, however, does not. It shows that humour mediates the effect of intelligence on self-reported mating success. It does not demonstrate an evolutionary link. It really doesn’t even try. I wrote about a paper in my last post that shows an implicit link between humour styles and shared genetic and environmental factors. They used twins and found a small effect. This paper, in my opinion, is an example of a bad evo psychology paper. It’s papers like this, with an ad hoc kind of ‘it’s probably adaptive’ just-so story, that tarnish all the work of other authors who attempt to bring multiple converging lines of evidence together to support further enquiry. The claim that [humour] evolved at least partly through sexual selection as an intelligence-indicator isn’t even fair. There’s no reason it couldn’t have evolved for a myriad other reasons and just happened to be related to intelligence.
You know there’s a correlation between foot size and intelligence? It’s true. As your feet get bigger, you get smarter. This correlation exists because foot size increases as you age, as does your intelligence. Though humour is a more complicated topic than foot size, it would be wrong to suggest that ‘foot size evolved at least partly through sexual selection as an intelligence-indicator…’.
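The foot-size example is a case of a third variable (age) producing a correlation on its own, and a partial correlation is one way to see that. The sketch below uses the standard first-order partial correlation formula; the three correlation values are numbers I made up to illustrate the point, not real data.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values: foot size and intelligence each track age closely,
# and their raw correlation is exactly what age alone would produce.
r_foot_age = 0.8
r_iq_age = 0.7
r_foot_iq = r_foot_age * r_iq_age  # the raw correlation, 0.56 here

print(round(r_foot_iq, 2))
print(round(partial_correlation(r_foot_iq, r_foot_age, r_iq_age), 6))
```

A raw correlation of about 0.56 drops to essentially zero once age is held constant. The same logic is why a correlation between humour and intelligence, on its own, says little about why either one evolved.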
So, in summary, I agree. This is not a great paper. It seems to deserve criticism. However, the reporting of it sucked, and the re-transmission of it sucked even more. It presents a small mediational relationship between intelligence, humour, and sexual behaviour which is potentially thought provoking and can lead to some more interesting hypotheses. However, I do feel that some of the criticisms were unjust even though the conclusions of many were probably fair. I do feel, however, that a little more depth in enquiry is the way to treat this topic.
Yes, it was a bad evo paper. No, it wasn’t something people already knew. No, humour isn’t some impossible magic construct beyond the understanding of science. Yes, the title is misleading, but No, the results (in light of certain considerations) are valid.
My thoughts are that evo psych has a lot to offer the study of human behaviour, but the more people who run around half-cocked with just-so stories, the less reputable and less valuable the field becomes as a whole.
I’m sure there are biologists who cringe when a new astrobiology paper hits the media – arsenic life, fossils in a meteor, anyone? Many biologists probably scream silently ‘No, for the love of science, this is not good research! Stop paying attention, stop publishing!’. Is this special pleading for a field without repute? I don’t think so. Sometimes it’s bad science, sometimes it’s bad reporting, sometimes it’s just wrong. I feel that evo psych is the same thing. There’s the good, the bad, and the ugly. The good is sound and rigorous (and often palatable), and the ugly is the stuff that might be true but people don’t like… and this paper is just bad. It wouldn’t be half as bad though if it didn’t make a partial claim on the ultimate origins of humour. It probably wouldn’t have even been noticed.
Greengross, G., & Miller, G. (2011). Humor ability reveals intelligence, predicts mating success, and is higher in males. Intelligence, 39(4), 188-192. DOI: 10.1016/j.intell.2011.03.006
Mmm. Psycasm, did they mention the power of the study anywhere in the article? (sorry I can't get to the article at this moment).
Just to reinforce some of your points, and criticize others...
The n is somewhat irrelevant, it depends on how much difference you want to evaluate. Just playing with Gpower3, a difference of 0.3 with a power of 80% and an alpha of 5% gives you a total sample of 143 (for X2). But if the difference is smaller (like detecting small behavior changes/characteristics) 0.1, then the sample is more than 1200. The other thing is that they don't seem to have corrected for multiple testing, which makes the pvalue that is statistically significant much lower (although again, I can only go with what you showed). Did you see something as a Bonferroni correction somewhere?
My main critique about your statements above is this. Are you more convinced if the effect was found with a smaller sample? I think you are using your statistical knowledge in a twisted fashion. If the effect is true (let's say that it's actually reality) having a smaller sample will give you a higher variance, with a higher risk of giving values that are way off the "true" value, so it could be all over the range of possible results. The increase in sample size will not make the sample result more or less real; what it does is decrease the effect of variance on the sample result, giving you more (not less) confidence that the result is "true".
What you are saying about the relationship between limping and hair-length is not an issue of sample size, it's an issue of multiple testing without correction, which can be solved by either increasing the size of the sample or correcting for the strength of the correlation. For the best ever explanation of this check http://xkcd.com/882/
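To put numbers on the multiple-testing point, here is a minimal sketch. It assumes the tests are independent, and the function names are just illustrative labels of mine. The 20 tests echo the jelly bean colours in the xkcd strip.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / m

m = 20  # number of uncorrected tests
print(round(familywise_error(0.05, m), 2))  # about 0.64
print(bonferroni_threshold(0.05, m))
```

So with 20 uncorrected tests at alpha = .05 there is roughly a 64% chance of at least one spurious "finding", and the Bonferroni fix is to demand p < .0025 from each test instead.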
Ah, statistics. My guilty pleasure...and even with that I still get it wrong half of the time. Hopefully (50% of chance as per my previous statement) I am not wrong this time.
And let's not talk about the wrong assumptions leading to the hypothesis of this article. Agree with you. The article is terrible. But I shouldn't say anything, as I am terribly biased against evopsych. Unless proven otherwise.
You make some excellent points, and I concede to just about all of them.
I guess I pointed out the n because it's so big. I can't imagine a reason why they'd need it so big on such a simple experiment; nor do they explain the recruiting procedures.
Additionally, no, there's no power values or any indication they used corrections. I must admit it's good to have someone correct me when I make mistakes like that. Though I've done all my undergrad stats courses, it's a skill that declines rapidly without regular use... and it's clear I've not been working on it enough lately, to make such mistakes.
Personally, I feel this paper didn't even need an evo explanation (crap attempt, though it was). It's basically a social paper. Women like smart men, we know this. Humour is a desirable trait, too - again, we know this. It's hardly a leap to suggest the two may be related, and for it to be reflected in the way women choose men. I mean, go to any online dating site - what do women list as important? 'Intelligence and a good sense of humour'....