Sunday, April 18, 2010

Do celebrities die in threes?

Check out purple line on the left in the plot above, which shows the arrangement throughout the year of the dates of death of some of the celebrities who died in 2009. It kind of looks like the deaths are bunched together. Then look to the right. Those are randomly generated death dates, which, because human brains like to see patterns, also look like there is some clustering.

It seems like every time two celebrities die, there is speculation about who will be the third, as though two celebrity deaths necessarily means a third is on its way. Although it would be nice to wait to post this until this old superstition gets dug up again when two celebrities die in close time proximity, it will certainly happen again and probably soon.

An ideal time to have posted this would have been in June of last year when Michael Jackson and Farrah Fawcett died on the same day, the 25th, and the internets were a-twitter with talk of this old wives' tail. Depending on how you count it, this supposed death troika was rounded out by Ed McMahon (the 23rd) or Billy Mays (the 28th). Although lots of other people have posted about the invalidity of this superstition, I have yet to see any plots depicting the statistical insignificance of this event. And, as I learned in 2nd grade when I failed to actually show that I had indeed mentally carried that 1, the policy is no work, no credit. So, here we go for full points, please...

In order to do any sort of testing, we have to define what it means to "die in threes." Seriously, what does that mean? It isn't enough that they die in clusters of any size, in which case, I would probably be talking about self-exciting processes... yes, I did just throw that in so I could say "self-exciting processes." No, the superstition is specifically that they die in threes.

What I propose as a definition of this is that for any three deaths to count as a triple, the time from the first death to the last death in this set must be less than or equal to the time elapsed from the last event prior to the triple, and it must also be less than or equal to the time until the next death succeeding the triple. The three deaths have to be separated in time from the other deaths.

For example, let's consider the {Ed, Michael, Farrah} candidate triple, in which case the time from the first (Ed) to the last (Michael and Farrah.. I'm only counting this down to a resolution of one day) is two days. In order for this to count as a triple, no celebrities could have died within one day of either end of this triple-- there must not have been any celebrity deaths on the 22nd or the 26th. In order for the {Michael, Farrah, Billy} candidate triple to be a triple by this definition, no other celebrities would have died from the 23rd until the 30th.

One other piece that needs defining is who counts as a celebrity. I used this website (and I really do apologize for the lovely anus ad at the top of that), so that I could not be accused of cherry-picking my list of celebrities. You could still make that claim because I removed a few people I did not consider celebrities: children and criminals, for example. I just didn't feel right including a child. And yes, I hand transcribed all of the celebrity deaths in 2009-- that is truly a labor of love.

I calculated that there were 28 triples by my above definition in this data set of the 157 celebrity deaths of 2009. I then randomly generated 10,000 sets of 157 death dates, where the dates are randomly selected (with replacement, of course) over the course of the entire year, and I calculated the number of triples in each of these completely random data sets.

This histogram of the number of triples from each of the randomly generated death dates shows (1) a remarkably normal shape and (2) that 28 triples is a totally reasonable number to have seen if celebrities die at random days in the year. The number of triples last year (the pink line) falls in about the 70th percentile of what we would expect under completely random death dates-- far from anything anyone would consider statistical significance.

One might argue that many of the people on that list are not celebrities. I actually don't know who most of them are. So, I re-ran this simulation, using only the deaths I had heard of. (This should, sadly, sync up pretty nicely with the list of deaths reported on perezhilton, as that is one of my few sources of "news".) A similar histogram to that shown above appears below for the analogous simulation with 23 deaths. Again, nothing spectacularly exciting is going on. We would expect to see this number of clusters under complete randomness.

So, there you have it. You can be the judge, but as far as I'm concerned, I'm convinced. There isn't significant evidence that celebrities die in threes.


  1. Love it. Also, apparently this funny statistic (the count of the number of triples in a random sample of N deaths), is asymptotically normal. Can't say I saw that coming.

  2. You have shattered my world, dear... shattered it. ;)

  3. Manually transcribing all the deaths ? Holy crap girl, I hope Alan does not read your blog ;)

  4. I think you can find some work by Deheuvels who uses empirical process to study the clustering of i.i.d. uniform random variables. I think there is quite a bit of litterature in the math-stat field of empirical process, about that kind of results :)

    See for example the study of the law of uniform spacings:

  5. Sweet, JuJu, thanks!! I'll have a look at that in my copious spare time.. oh wait, writing this crap does imply that there is, in fact, copious spare time.

  6. May I know the reason behind surveying celebrity deaths? Though graphical way used here is quite good but find it little out of context as you are here talking about deaths and unless you are trying to establish some fact no point in displaying it graphically.