
“Science Says…” Except It Probably Doesn’t


It’s not for nothing that my Second Law of the Media is: when an article claims “science says…” or “a new study shows…”, assume that it doesn’t until proven otherwise. Especially when the headline is making an extraordinary, attention-grabbing claim. And doubly especially when it’s from the field of social “science”.

Claims such as “Women With Tattoos More Approachable, Study Finds”, “Science Proves It: Men Really Do Find High Heels Sexier”, and “Guitarists Really Are Hot, Studies Confirm”.

All of those claims were trumpeted by the media, based on studies by French psychologist Nicolas Gueguen. The only problem is that they’re almost certainly bullshit. The “guitarists” study was eventually retracted, and there’s plenty of reason to regard Gueguen’s work with suspicion – not least his ridiculously prolific output: he publishes a huge number of papers per year, usually listed as the sole author. Some studies appear to list his own laboratory as ethical supervisor, often for research that would struggle to pass an ethics committee, since it placed female confederates in sexualised situations, such as lying in public in bikinis, waiting in bars, or being filmed from behind to judge the “sexiness” of their walk.

Since 2015, a pair of scientists, James Heathers and Nick Brown, have been looking closely at the results in Gueguen’s work. What they’ve found raises a litany of statistical and ethical questions. In some cases, the data is so perfectly regular, or so full of oddities, that it’s difficult to see how the experiment Gueguen describes could have generated it.

One study that caught their eye claimed that men were more likely to help a woman depending on how she wore her hair.

The study had tested whether women’s hairstyles influenced people’s inclination to be helpful. On a busy city street, a female collaborator wore her hair loose, in a ponytail, or in a bun, and dropped her glove while walking. The bystanders were given a score that indicated how helpful they had been – if they returned the glove, they got three points; if they warned the woman that she’d dropped it, that was two points; and if they did nothing, one point.

The only problem was that the scores seemed suspiciously regular. Working backwards, Brown and Heathers reconstructed the sets of raw scores that could have produced the reported results (when Gueguen later sent them the raw data, their reconstruction turned out to be exactly right). What did they find?

What this turned up was even stranger. There was only one combination of scores that worked: for every condition, each score appeared 6, 12, 18 or 24 times. For example, women in the bun condition had 12 scores of 1, and 18 scores of 2. “The chances of this happening randomly for all six combinations of participant sex and hairstyle are [one in 170 million],” write Brown and Heathers.
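
Brown and Heathers have since released general-purpose tools for exactly this kind of reconstruction (SPRITE is the best known). Here’s a minimal sketch of the brute-force idea in Python – the mean and SD below are illustrative values I’ve chosen so that the bun condition’s 12 ones and 18 twos fall out as a valid solution, not figures taken from the paper:

```python
from statistics import mean, stdev

# Illustrative values, not from the paper: 30 participants in a condition,
# mean and SD rounded to two decimals, as journals typically report them.
N = 30
REPORTED_MEAN = 1.60
REPORTED_SD = 0.50

def matching_distributions(n, m, sd, tol=0.005):
    """Brute-force every way to split n scores among {1, 2, 3} and keep
    the splits whose mean and SD round to the reported values."""
    matches = []
    for ones in range(n + 1):
        for twos in range(n - ones + 1):
            threes = n - ones - twos
            data = [1] * ones + [2] * twos + [3] * threes
            if abs(mean(data) - m) <= tol and abs(stdev(data) - sd) <= tol:
                matches.append((ones, twos, threes))
    return matches

for ones, twos, threes in matching_distributions(N, REPORTED_MEAN, REPORTED_SD):
    print(f"{ones} scores of 1, {twos} scores of 2, {threes} scores of 3")
```

Run against every condition in a paper, a check like this quickly shows whether any raw data at all could have produced the published summary statistics.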

That one-in-170-million calculation makes certain statistical assumptions that might not be justified, but, regardless of the precise odds, it’s certainly a surprising set of scores.
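
To get a feel for where a number like that comes from, here’s a rough Monte Carlo version of the same idea. Everything in it is an assumption on my part – six conditions of 30 participants each, scores drawn independently with fixed, made-up probabilities – so treat the output as an order-of-magnitude sanity check, nothing more:

```python
import random

# Assumed, not from the paper: 6 conditions of 30 participants each,
# scores 1-3 drawn independently with fixed, made-up probabilities.
N, WEIGHTS, TRIALS = 30, [0.4, 0.5, 0.1], 200_000

def all_counts_multiples_of_6(n, weights):
    """Simulate one condition: is every score's count 0, 6, 12, 18...?"""
    scores = random.choices([1, 2, 3], weights=weights, k=n)
    return all(scores.count(s) % 6 == 0 for s in (1, 2, 3))

p_one = sum(all_counts_multiples_of_6(N, WEIGHTS) for _ in range(TRIALS)) / TRIALS
print(f"one condition: ~{p_one:.4f}")
print(f"all six conditions: ~1 in {1 / p_one ** 6:,.0f}")
```

The exact figure swings with the assumed probabilities – which is precisely the caveat above – but any plausible choice puts the combined odds somewhere in the hundreds of millions to billions against.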

Then they dug back into Gueguen’s papers published in the last five years, singling out 10 in particular that had curious statistics.

Nearly impossible means and standard deviations, as well as odd-looking data sets, were problems that cropped up repeatedly throughout the 10 papers. And the results derived from that data are often oddly dramatic. In the paper about hairstyles and helpfulness, the difference between how often men helped the glove-dropper compared to how often women helped her was huge.
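
“Nearly impossible means” sounds hand-wavy, but it can be checked mechanically. Brown and Heathers’s own GRIM test rests on a simple observation: the mean of n integer responses must equal some whole number divided by n, so many decimal values are simply unreachable. A minimal sketch, with example numbers that are mine rather than from the papers:

```python
def grim_consistent(reported_mean, n, decimals=2, max_score=10):
    """GRIM check: the mean of n integer responses is k/n for an integer k.
    Is there any k whose rounded value matches the reported mean?"""
    return any(
        round(k / n, decimals) == round(reported_mean, decimals)
        for k in range(n * max_score + 1)
    )

# Illustrative examples: a mean of 2.47 from 20 integer scores is
# impossible, while 2.45 (= 49/20) is fine.
print(grim_consistent(2.47, 20))  # False
print(grim_consistent(2.45, 20))  # True
```

A mean that fails GRIM for the reported sample size cannot have come from the data as described – no statistics beyond division required.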

Brown and Heathers point to a statistical measure called “Cohen’s d”, which expresses how far apart the scores of two groups are, to show how wide that gap was.

For instance, an easily noticed effect such as the average height difference between men and women scores around d = 1.80.

Cohen himself gave a rule of thumb about effect sizes. He suggested d = 0.20 was a small effect of no real importance, while d = 0.50 was a medium effect, and d = 0.80 was a large effect.

The difference between men’s and women’s helpfulness when faced with a lost glove belonging to a woman with loose hair was d = 2.44. This “would constitute a remarkably large effect in any form of science,” write Brown and Heathers, “let alone social psychology”, a field that typically deals in more subtle effects.
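
For reference, Cohen’s d is nothing exotic: the difference between the two group means, divided by their pooled standard deviation. A minimal sketch with toy helpfulness scores (the numbers are made up, chosen only to show how a lopsided split produces an enormous d):

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d: difference in group means over the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = (((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Toy data, not from the paper: helpfulness scores (1-3) for men vs. women.
men = [3, 3, 2, 3, 3, 2, 3, 3]
women = [1, 2, 1, 1, 2, 1, 2, 1]
print(f"d = {cohens_d(men, women):.2f}")  # ~2.8: the distributions hardly overlap
```

A d of 2.44 means the men’s and women’s helpfulness distributions sit further apart than men’s and women’s heights – for returning a dropped glove. That alone should have raised eyebrows before publication.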

Another Gueguen study claimed that women were more likely to give out their phone numbers when the sun was shining.

After being asked for their numbers, the paper reports, women were asked for their ages. All of the women who were stopped were between 18 and 25 years old.

It seems surprising that the experimenter hadn’t stopped a single woman outside the target age bracket – after all, guessing a stranger’s age precisely is not an easy task. But it’s also odd that every single woman answered the question of how old she was – even those who had refused to give out their phone number.

“This implies that not one woman, when approached by the confederate who asked for her number, decided to walk away and ignore him,” they write.

– Ars Technica

Does anyone think that that’s remotely likely?

Of course, no such questions ever seem to have occurred to the editors of the peer-reviewed journals, so the odds that the scientific illiterates of the legacy media will stop to think about them are about as good as the odds of a random woman handing over her phone number.

Don’t take anything the legacy media say about science at face value.
