I wonder if that will convince my wife. Probably not. Her stats superstitions drive me crazy. Ever heard of the Chinese birth calendar? For example: http://www.webwomb.com/chinesechart.htm. 90%+ accuracy should be an easy claim to bust. Unfortunately for me it’s been right for our kids 2 out of 2 times. Why are stats always so hard to sell over anecdotal experience?
OK, great. This seems easy enough to test. What follows is my first episode of
(btw, Jamie and Adam, if you are out there, can I please pretty please be the MythBusters’ statistician? You can even use me as Buster II if you want, as long as I get to do math-fun while being blown to smithereens.)
I downloaded data from the Centers for Disease Control and Prevention website. Specifically, I used the 2006 data set on births in the US territories, because it was both recent and smallish. I wrote some quick pythony-goodness to clean that up so I could move it directly over to R, my one true love.
I only consider births in which all of the necessary fields (sex of baby, date of mother’s last menstrual cycle, age of mother at time of birth) are complete, which leaves me with a sample size of 50,079 birth records to play with. Fun!
Ready for the results? The Chinese birth calendar was correct with 49.70% accuracy on this dataset. With this many observations, the only point of a hypothesis test will be to have one more darn example of a hypothesis test for proportions on the internet. I say the more fun statistics floating around the better, so...
Let’s start by being lenient and test, in the classical way, whether the Chinese birth calendar is up to anything but complete random chance. We'll even give it credit if it does a good job of predicting the opposite! At least then we'd know it is useful for something.
That is, the null hypothesis is that the probability of the Chinese birth calendar being correct is p0 = .5, tested against the two-sided alternative that p ≠ .5. Relying upon asymptotic normality (I’d say that 50,079 is pretty darn close to infinity), the fact that I still remember this stuff after four years of grad school, and Wikipedia (it does not lie!), we have a z statistic of -1.32, which falls in about the 9th percentile of the standard normal distribution, implying a two-sided p-value of 0.187. To use normal stats lingo, we fail to reject the hypothesis that the Chinese birth calendar is nothing but a complete load of baloney. My poor Chinese granny probably just rolled over in her grave. Maybe that wasn't normal stats lingo. Oops.
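For anyone who wants to replay the arithmetic, here is a minimal sketch of the one-sample z-test for a proportion, using only the Python standard library. The post only reports the rounded accuracy, so the counts below are an assumption: they are back-derived from the beta(24891, 25187) posterior quoted further down (with a beta(1,1) prior, that implies 24,890 hits and 25,186 misses). They sum to 50,076, slightly fewer than the 50,079 records mentioned above, so treat them as illustrative rather than as the exact data. The function names are made up for this example.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of H0: p = p0 for a binomial proportion.

    Uses the standard error under the null, sqrt(p0 * (1 - p0) / n).
    Returns (z statistic, two-sided p-value).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    p_two_sided = 2.0 * normal_cdf(-abs(z))
    return z, p_two_sided


# ASSUMPTION: counts implied by the beta(24891, 25187) posterior,
# not taken directly from the CDC file.
correct, wrong = 24890, 25186
z, p_value = proportion_z_test(correct, correct + wrong, p0=0.5)

print(f"accuracy = {correct / (correct + wrong):.4f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

With those assumed counts, the z statistic rounds to the -1.32 quoted above. The p-value can differ from 0.187 in the third decimal place, depending on whether z is rounded before the normal lookup.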
Again, for the sake of more fun stats floating around somewhere, what about testing the hypothesis that p ≥ .9, as is claimed on the website? Well, in the classical hypothesis testing framework, I think that would either require integration or a likelihood ratio test, to which I am morally opposed. So, as a shout-out to my Bayesian homies, I’ll just slap a conjugate prior on p (a beta(1,1) = uniform). This results in a posterior distribution for p, p | data ~ beta(24891, 25187), which implies that the posterior probability that p ≥ .90 is about nil. Yep, zero. No fucking chance does that calendar predict births with greater than 90% accuracy. So, there we go, I’m going to go ahead and call this one busted, Jamie.
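To put a number on "about nil" without any extra libraries, here is a rough sketch that bounds the posterior tail probability instead of integrating it. It is not the calculation behind the post, just one way to check it. It relies on a simple fact: for a beta(a, b) with a, b > 1 the density is decreasing to the right of its mode, so the area above .9 is at most (1 - .9) times the density at .9. The work is done on the log scale because the answer is far too small for a floating-point number. The helper names are invented for this example.

```python
import math


def log_beta_pdf(p, a, b):
    """Natural log of the Beta(a, b) density at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1.0 - p)


def log10_tail_bound(threshold, a, b):
    """log10 of an upper bound on P(p >= threshold) under Beta(a, b).

    Valid when a, b > 1 and threshold lies above the mode: the density
    is then decreasing on [threshold, 1], so the tail area is at most
    (1 - threshold) * pdf(threshold).
    """
    mode = (a - 1) / (a + b - 2)
    if not (a > 1 and b > 1 and mode < threshold < 1):
        raise ValueError("bound needs a, b > 1 and mode < threshold < 1")
    log_bound = math.log(1.0 - threshold) + log_beta_pdf(threshold, a, b)
    return log_bound / math.log(10.0)


# Posterior from the post: uniform beta(1,1) prior updated with the data.
a, b = 24891, 25187
log10_bound = log10_tail_bound(0.9, a, b)

print(f"posterior mean = {a / (a + b):.4f}")
print(f"P(p >= 0.9 | data) <= 10^{log10_bound:.0f}")
```

The bound is crude, but crude is plenty here: the exponent it prints is negative and runs to thousands of digits, which is as close to zero as anyone should need.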
(Drats, there I go dreaming again...)