Thursday, January 17, 2013

How many eligible universities for the Google US/Canada Fellowship?

Google has offered this fellowship the last few years to badasses in lots of Googly fields. Thing is, only "eligible schools" are allowed to nominate students (two apiece)... and this list of eligible schools is apparently super top secret. Wikipedia doesn't even know, so the information probably doesn't exist.

Why do I care, you might wonder. To get a sense of the chances that someone gets picked, given they've made it past their university's nomination stage, duh. Imagine my frustration at not being able to find the necessary denominator!

All I can find is a list of past fellows and their institutions. This seems like the perfect opportunity to whip out some stats magic (and the most magical of stats methods, at that) to make a guess. So, here's the plan. I'm going to look at the universities that got picked each year (if more than one student from the same university is picked in a given year, that university just gets counted once) to make this estimate.  
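
For the curious, here's a minimal sketch of that counting rule in Python. The records below are made-up placeholders (not the real list of fellows), and the function name is just something I picked for illustration.

```python
from collections import defaultdict

# Hypothetical (year, fellow, university) records; placeholders, not real data.
fellows = [
    (2010, "fellow_a", "Univ A"),
    (2010, "fellow_b", "Univ A"),  # second fellow from the same school, same year
    (2010, "fellow_c", "Univ B"),
    (2011, "fellow_d", "Univ B"),
    (2011, "fellow_e", "Univ C"),
]

def universities_by_year(records):
    """Collapse fellows into one set of universities per year.

    Using a set means a university with several fellows in the same
    year is only counted once for that year.
    """
    by_year = defaultdict(set)
    for year, _fellow, university in records:
        by_year[year].add(university)
    return dict(by_year)

lists = universities_by_year(fellows)

# Every university that shows up at least once across all years.
ever_picked = set().union(*lists.values())
```

Each year's set plays the role of one "list" in what follows.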

Capture-recapture. Multiple systems estimation. Two names for one of the more surprisingly cool uses of a simple glm. The idea is that if you have several lists of a finite group of items, by looking at the overlaps among the lists, you can estimate the total number of items. In this case, items are eligible institutions. In other cases, items are fish in ponds.
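
To make the overlap idea concrete, here's a sketch of the simplest two-list version (the classic Lincoln-Petersen estimator, plus Chapman's small-sample correction). This is not the glm mentioned above, just the textbook special case. The reasoning: if a fraction m/n2 of the second list also showed up on the first list, then the first list probably covers about that fraction of everything, so n1/N is roughly m/n2. The school names are placeholders.

```python
def lincoln_petersen(list_1, list_2):
    """Estimate the total number of items from two lists via their overlap."""
    first, second = set(list_1), set(list_2)
    n1, n2 = len(first), len(second)
    m = len(first & second)  # items that appear on both lists
    if m == 0:
        raise ValueError("no overlap between the lists; estimate is undefined")
    return n1 * n2 / m

def chapman(list_1, list_2):
    """Chapman's variant, which behaves better when the overlap is small."""
    first, second = set(list_1), set(list_2)
    n1, n2 = len(first), len(second)
    m = len(first & second)
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical lists of schools picked in two different years.
year_1 = {"A", "B", "C", "D", "E", "F"}
year_2 = {"D", "E", "F", "G", "H"}

print(lincoln_petersen(year_1, year_2))  # 6 * 5 / 3 = 10.0
print(chapman(year_1, year_2))           # 7 * 6 / 4 - 1 = 9.5
```

Eight distinct schools were seen here, and the overlap suggests roughly two more are hiding.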

As stats methods are wont to have, there are some built-in assumptions:
(1) the list of eligible schools doesn't change throughout the years 
(2) each eligible university has the same chance of being picked 
(3) picking a university in one year doesn't affect the chances of being picked in the other years
(4) the universities have unique names and can be identified as the same (or different) from one year to the next

Most of these assumptions probably aren't true. I expect they've added schools to the list over the years, and it seems as though some schools have a better chance of being picked each year than others (Stanford got picked every year, but Purdue only got picked once). (3) is reasonable-- maybe they don't like to pick the same places twice in a row... or maybe the good people come in streaks? No clue here. (4) Check. 

But if they're approximately true, maybe that's good enough. So, let's just run with it, ignoring the possible modifications that could be made to remedy the likely infractions....
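
If you want to play along at home, here's a rough sketch of a multi-list estimator that leans on exactly those assumptions (independent lists, every school equally likely to be picked in a given year). To be clear, this is a swapped-in technique and not the glm fit behind the numbers in this post: it solves the standard maximum-likelihood equation for that simple model, (1 - r/N) = product of (1 - n_i/N), where r is the number of distinct schools seen and n_i is the size of list i. The function name and the toy lists are mine.

```python
from math import prod

def estimate_total(lists, tol=1e-9):
    """Estimate the total number of items from several independent lists.

    Solves (1 - r/N) = prod_i (1 - n_i/N) for N by bisection, where r is
    the number of distinct items seen and n_i is the size of list i.
    """
    sets = [set(s) for s in lists]
    sizes = [len(s) for s in sets]
    seen = len(set().union(*sets))
    if sum(sizes) == seen:
        raise ValueError("lists never overlap; no finite estimate")

    def gap(n):
        # Zero at the estimate, negative below it, positive above it.
        return (1 - seen / n) - prod(1 - size / n for size in sizes)

    lo, hi = float(seen), 2.0 * seen
    if gap(lo) >= 0:  # one list already contains everything seen
        return lo
    while gap(hi) <= 0:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical schools picked in three different years.
year_1 = {"A", "B", "C", "D", "E", "F"}
year_2 = {"D", "E", "F", "G", "H"}
year_3 = {"A", "D", "G", "I"}

print(round(estimate_total([year_1, year_2, year_3]), 2))
```

With only two lists this collapses to n1 * n2 / m, the usual fish-in-a-pond formula. A log-linear glm is the more flexible route, since it can relax some of the assumptions above.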

Data. Code. doot doot doot doot doot doot doot..... Results...

I think there are probably between 28 and 38 eligible universities with a point estimate of 31. That's anywhere between 2 and 12 more than the 26 that have already been picked. 

Seems like the chances are pretty decent once you get to that stage. Aaaaaaaand I'm satisfied. Back to real work. 

Who am I kidding? Back to watching old episodes of One Tree Hill...