Tuesday, April 20, 2010
Princely programming hotness
Monday, April 19, 2010
The Chinese birth calendar is total bunk
Hoelle said:
I wonder if that will convince my wife. Probably not. Her stats superstitions drive me crazy. Ever heard of the Chinese birth calendar? For example: http://www.webwomb.com/chinesechart.htm. 90%+ accuracy should be an easy claim to bust. Unfortunately for me it’s been right for our kids 2 out of 2 times. Why are stats always so hard to sell over anecdotal experience?
OK, great. This seems easy enough to test. What follows is my first episode of MathBusters.
(btw, Jamie and Adam, if you are out there, can I please pretty please be the MythBusters’ statistician? You can even use me as Buster II if you want, as long as I get to do math-fun while being blown to smithereens.)
I downloaded data from the website for the Centers for Disease Control and Prevention. Specifically, I used the data set on 2006 births in the US territories because it was both recent and smallish. I wrote some quick pythony-goodness to clean that up so I could move it directly over to R, my one true love.
I only consider births in which all of the necessary fields (sex of baby, date of mother’s last menstrual cycle, age of mother at time of birth) are complete, which leaves me with a sample size of 50,079 birth records to play with. Fun!
Ready for the results? The Chinese birth calendar was correct with 49.70% accuracy on this dataset. With this many observations, the only point of a hypothesis test will be to have one more darn example of a hypothesis test for proportions on the internet. I say the more fun statistics floating around the better, so...
Let’s start by being lenient and test, in the classical way, the hypothesis that the Chinese birth calendar is up to anything other than complete random chance. We'll even give it credit if it can do a good job at predicting the opposite! At least then we'd know that it's useful for something.
That is, the null hypothesis is that the probability of the Chinese birth calendar being correct is p0 = .5. Relying upon asymptotic normality (I’d say that 50,079 is pretty darn close to infinity), the fact that I still remember this stuff after four years of grad school, and Wikipedia (it does not lie!), we have a z statistic of -1.32, which falls at about the 9th percentile of the standard normal distribution, implying a two-sided p-value of 0.187. To use normal stats lingo, we fail to reject the hypothesis that the Chinese birth calendar is nothing but a complete load of baloney. My poor Chinese granny probably just rolled over in her grave. Maybe that wasn't normal stats lingo. Oops.
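For anyone who wants to replay the z-test at home, here is a minimal Python sketch using only the standard library (the original analysis was done in R, so this is a stand-in, not the actual code). The counts are an assumption: 24,890 correct calls out of 50,076 records, backed out of the beta(24891, 25187) posterior quoted further down rather than read from the raw data, which is why the total sits slightly off the 50,079 mentioned above. The function name `proportion_z_test` is made up for illustration.

```python
import math

def proportion_z_test(successes, n, p0=0.5):
    """One-sample z-test for a proportion, using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)          # standard error under the null
    z = (p_hat - p0) / se
    # Standard normal CDF evaluated at z, written with the error function
    lower_tail = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    p_two_sided = 2 * min(lower_tail, 1 - lower_tail)
    return z, lower_tail, p_two_sided

# Assumed counts (inferred from the posterior, not taken from the CDC file)
z, lower_tail, p_value = proportion_z_test(24890, 50076)
print(f"z = {z:.2f}, lower tail = {lower_tail:.3f}, two-sided p = {p_value:.3f}")
```

With these assumed counts the z statistic lands at about -1.32 and the lower-tail area at about 0.09, in line with the numbers above.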
Again, for the sake of more fun stats floating around somewhere, what about testing the hypothesis that p ≥ .9, as is claimed on the website? Well, in the classical hypothesis testing framework, I think that would require either integration or a likelihood ratio test, to which I am morally opposed. So, as a shout-out to my Bayesian homies, I’ll just slap a conjugate prior on p (a beta(1,1) = uniform). This results in a posterior distribution for p, p | data ~ beta(24891, 25187), which implies that the posterior probability that p ≥ .90 is about nil. Yep, zero. No fucking chance does that predict births with greater than 90% accuracy. So, there we go, I’m going to go ahead and call this one busted, Jamie.
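To get a feel for just how nil "about nil" is, here is a rough Python sketch (standard library only, and not the original R code). It takes the beta(24891, 25187) posterior as given, measures how many posterior standard deviations 0.9 sits above the posterior mean, and then draws from the posterior to count how often a draw reaches 0.9. The seed and the number of draws are arbitrary choices of mine.

```python
import math
import random

a, b = 24891, 25187                      # posterior parameters quoted in the text

post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
sds_away = (0.9 - post_mean) / post_sd   # distance from the mean to 0.9, in posterior sds

# Monte Carlo check: what share of posterior draws is at or above 0.9?
rng = random.Random(2010)                # arbitrary seed, only for reproducibility
draws = [rng.betavariate(a, b) for _ in range(10_000)]
share_above = sum(d >= 0.9 for d in draws) / len(draws)

print(f"posterior mean = {post_mean:.4f}, sd = {post_sd:.4f}")
print(f"0.9 is {sds_away:.0f} posterior sds above the mean")
print(f"share of draws at or above 0.9: {share_above}")
```

The posterior is so tightly packed around one half that 0.9 sits well over a hundred standard deviations out, so no draw comes anywhere near it.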
(Drats, there I go dreaming again...)
Sunday, April 18, 2010
Do celebrities die in threes?
Tuesday, April 13, 2010
Someday I hope I can be this badass
For those of you without easy access to academic journals, I reproduce for you a few of the choicest excerpts.
The abstract:
Sunday, April 11, 2010
Seasonal Insults
Wednesday, April 7, 2010
A princess story part II
In terms of modeling assumptions, the slight tweak required to accommodate our dear princess's independent spirit (or desire to never have to utter the words "I don't know who my babydaddy is."-- thanks, L, for that.) is the addition of a component that determines the wait time between relationships. In the current incarnation of this simulation, the princes arrive back to back, and she is constantly collecting data. Exhausting!
Alternatively, let's consider a model in which there is a period of latency between princes. At the end of each prince's tenure, let's now suppose that the princess waits some exponentially distributed amount of time, with a mean that is independent of her mean relationship duration.
How does this change the best strategy?
As expected, she should auto-reject fewer princes on average if she's going to wait a long time between them. Makes sense. Keep in mind that this is under the assumption that she is not collecting data on anyone during the waiting period.
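Part of the intuition is simply that waiting eats game time, so fewer princes fit into the game. The toy Python sketch below only counts princes; it is not the simulation behind the result above and says nothing about the best strategy itself. It assumes the 12-year limit from the original princess story, and the 180-day mean relationship, the 365-day mean wait, and the function names are all made up for illustration.

```python
import random

def princes_met(horizon_days, mean_relationship_days, mean_wait_days, rng):
    """Count the princes whose relationships start before the time limit."""
    t = 0.0
    count = 0
    while t < horizon_days:
        count += 1
        # Relationship length: exponential with the given mean
        t += rng.expovariate(1.0 / mean_relationship_days)
        # Latency before the next prince: exponential, independent of the above
        if mean_wait_days > 0:
            t += rng.expovariate(1.0 / mean_wait_days)
    return count

def average_princes(mean_wait_days, trials=2000, seed=0):
    """Average number of princes met over many simulated 12-year games."""
    rng = random.Random(seed)
    horizon = 12 * 365
    total = sum(princes_met(horizon, 180.0, mean_wait_days, rng)
                for _ in range(trials))
    return total / trials

print("no waiting:      ", average_princes(0))
print("mean wait 1 year:", average_princes(365))
```

With fewer candidates arriving before the clock runs out, burning many of them on pure data collection gets more expensive, which fits with auto-rejecting fewer princes.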
Tuesday, April 6, 2010
A princess story
But ahhh! In high school, back when this whole game started, Lance Bass was pretty much my ideal man. (Don't judge! My bff had already called JT.) We all know how well that would have worked out for me if I'd gotten my wish. Thank goodness I've gotten to update my understanding of which man-attributes I like since then. Turns out shy and effeminate isn't quite as appealing to me as I once thought... Strange.
Lastly, let's ground ourselves in reality. As much as I'd like to keep playing until I find the perfect person, there is only finite time in which to play this game. While long-term relationships use up a lot of your allotted game time, they also allow you to learn a lot more about the attributes you appreciate in a person.
Feel free to skip to the bottom now; you won't hurt my feelings. But, for the brave...
I will include parameters that dictate:
- The noise with which the princess observes her utility for a prince after knowing him for only one day.
- The average duration of a relationship. (This will be modeled with the exponential distribution, though really I think a mixture distribution with a point mass at 1 day would be pretty appropriate for most of my friends. Not me, of course. I'm a lady.)
- The number of attributes that go into a princess's weighting of how much she likes a guy. According to the partner in crime in this project, there should only be two attributes... jerk! jk
- Your time limit for picking a mate. I'm setting this to 12 years.
- How sure you are about your initial guess at the importance of different attributes, and how far away this is from the truth.
- What percentile of awesome does the guy you end up with have to be in to make you happy? If you get someone who is top 10 out of 100, is that good enough? (I'm setting this to be top 5%. No soulmates here!! Why 5%? Ask whoever made it the magic number in hypothesis testing, I don't know.)
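For the code-minded, the parameter list above could be bundled up something like this. It is only a sketch of one possible container in Python: every field name is hypothetical, and the only values taken from the list are the 12-year time limit and the top 5% threshold. Everything else is a placeholder, not a setting used in the simulation.

```python
from dataclasses import dataclass

@dataclass
class PrincessParams:
    """Hypothetical container for the simulation parameters listed above."""
    first_day_noise_sd: float = 1.0        # placeholder: noise in the day-one utility reading
    mean_relationship_days: float = 180.0  # placeholder: mean of the exponential duration
    n_attributes: int = 5                  # placeholder: attributes in the utility weighting
    time_limit_years: float = 12.0         # from the list: time limit for picking a mate
    prior_confidence: float = 1.0          # placeholder: how sure the initial guess is
    prior_error: float = 1.0               # placeholder: how far the guess is from the truth
    top_fraction: float = 0.05             # from the list: "top 5%" is good enough

    def time_limit_days(self) -> float:
        """Time limit in days, ignoring leap years."""
        return self.time_limit_years * 365

params = PrincessParams()
print(params.time_limit_days(), params.top_fraction)
```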
But, before we get to my decision re: the manfriend, let's look at how one's strategy should change over different values of one of the parameters just to get some intuition about how this simulation is working...