Thursday 13 April 2017

Hypothesis testing: open data edition

The graph looked plausible. It didn't really fit my experience, but it didn't seem implausible either.

So I took a 5-minute jaunt over to Berkeley's SDA engine, which draws on data from the US General Social Survey (GSS). First I ran a basic regression of happiness on age and age squared, with a high score on wordsum (a vocabulary test) as my nerd interaction term.



Berkeley lets you run the regression right on the website. I'm not sure I've got this one right: I can't guarantee that I properly excluded the observations where they used a code for missing data.
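For anyone who'd rather see the specification written out, here is a rough sketch of that kind of regression in plain Python. It is not what SDA runs internally. I'm assuming the "nerd" term is a 0/1 dummy for a high wordsum score, interacted with both age and age squared; the function names, the centring of age at 50 (in decades, purely to keep the arithmetic well behaved), and the made-up coefficients are all illustrative, and the data are synthetic rather than GSS records.

```python
def design_row(age, high_wordsum):
    """Intercept, age, age squared, plus the same terms interacted with the dummy."""
    a = (age - 50) / 10.0  # assumption: centred at 50, in decades, for conditioning only
    d = 1.0 if high_wordsum else 0.0
    return [1.0, a, a * a, d, d * a, d * a * a]


def ols(rows, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Synthetic, noise-free data generated from made-up coefficients (not GSS results).
TRUE_BETA = [2.0, 0.05, -0.01, 0.1, 0.02, 0.03]
X = [design_row(age, d) for age in range(18, 90) for d in (False, True)]
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in X]

beta = ols(X, y)
print([round(b, 4) for b in beta])
```

If happiness really were quadratic in age, the age-squared coefficient would come out clearly different from zero; if nerds differed, the interaction coefficients would too.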

Then I downloaded the data to plot things a bit more nicely in Excel, since you can generate custom data extracts on the fly. I'd not done that before; learning how to do it took 5 minutes (mostly because I forgot to include the happiness variable the first time through). I deleted all the lines with missing data. Then a quick pivot table let me plot average happiness for the top wordsum scorers against average happiness across all wordsum scores, by age:


The orange line is noisier - as you'd expect as there are fewer observations over which it averages at each age group.
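The same clean-then-pivot step can be sketched outside Excel in a few lines of Python. Everything specific here is an assumption on my part rather than something checked against the codebook: that valid happiness codes are 1 to 3 (with lower meaning happier), that valid wordsum scores run 0 to 10, that valid ages run 18 to 89, and that "top scorer" means a wordsum of 9 or more. The rows are invented for illustration.

```python
from collections import defaultdict

HAPPY_VALID = {1, 2, 3}   # assumption: anything else is a missing-data code
TOP_WORDSUM = 9           # hypothetical cutoff for "top scorers"


def is_valid(row):
    """Keep only rows where age, happiness and wordsum all hold real values."""
    return (
        18 <= row["age"] <= 89
        and row["happy"] in HAPPY_VALID
        and 0 <= row["wordsum"] <= 10
    )


def mean_happy_by_age(rows, top_only=False):
    """Average happiness code per age, optionally restricted to top wordsum scorers."""
    totals = defaultdict(lambda: [0, 0])  # age -> [sum of codes, count]
    for row in rows:
        if not is_valid(row):
            continue
        if top_only and row["wordsum"] < TOP_WORDSUM:
            continue
        totals[row["age"]][0] += row["happy"]
        totals[row["age"]][1] += 1
    return {age: s / n for age, (s, n) in sorted(totals.items())}


# Invented rows, including three that should be dropped as missing.
rows = [
    {"age": 30, "happy": 1, "wordsum": 10},
    {"age": 30, "happy": 3, "wordsum": 5},
    {"age": 30, "happy": 9, "wordsum": 9},   # happiness missing
    {"age": 45, "happy": 2, "wordsum": 9},
    {"age": 45, "happy": 2, "wordsum": -1},  # wordsum not asked
    {"age": 99, "happy": 1, "wordsum": 8},   # age missing
]

everyone = mean_happy_by_age(rows)
top = mean_happy_by_age(rows, top_only=True)
print(everyone, top)
```

Because the top-scorer series averages over a subset of the rows at each age, it will always bounce around more than the all-respondents series.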

What do we take from this? Happiness is not quadratic in age. Maybe wellbeing is, but I don't think that's in this dataset. And I can't make diddly out of any difference between top scorers and the average. I'm sure there are plenty of refinements that people could make: there's great educational attainment data and other stuff in there that could better identify nerds, and maybe there is something in there that's closer than happiness is to wellbeing. But I didn't have more than 10 minutes to play with it.

Quick and simple real data test of a plausible-looking hypothesis. And there are plenty of reasons why splitting happiness data up by education, or vocabulary, and by age would be of more serious research interest.

Here in New Zealand, getting access to the Confidentialised Unit Record File from Statistics New Zealand would take a 3-week application process, signing a pile of forms, promising to delete the data when done with it, making sure the data is stored securely, and checking with them before publishing any results from it...
