## Monday, 8 February 2010

### Youth rates revisited

My prior post noted the big jump in youth unemployment rates since the abolition of the separate youth minimum wage. Let's go back to this briefly.

If we assume that the youth rate will always be some fixed amount above the adult rate, then the current run-up, as I noted earlier, is highly anomalous and seems very plausibly explained by the minimum wage change.

Some folks reckon the better measure is the ratio: the youth unemployment rate will always be some multiple of the adult rate. If you measure the ratio of the two over time, the current ratio is high, but there isn't an obvious break point in 2008.

The graph below has (thanks Stephen Hickson!) the unemployment rate for those aged 15-19 and the unemployment rate for everyone else (aged 20 and up). It looks to me like the proper relationship is a combination of a level shift and a multiplicative effect. When the adult rate is very low - below four percent or so - the youth rate bounces around at a point about 10 to 12 points higher than the adult rate. When the adult rate is high, the youth rate exceeds that constant by a multiple of the adult rate.

As always, I take this kind of thing over to Stata to find out what's going on. First, let's rule out that what we have going on is only a level shift or only a multiplicative effect. I run ordinary least squares with the youth unemployment rate (15-19 year olds) as dependent variable and the adult rate (20 and up) and a constant as independent variables. If it's just a level shift, the coefficient on the adult rate will be significant and close to 1 in magnitude, with a significant constant term around 10. If it's just a ratio effect, the constant will be insignificant and we'll have a coefficient somewhere around 3.
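For anyone who wants to see what the two benchmark cases look like, here is a minimal Python sketch (not the Stata run). The numbers are made up purely for illustration and are not the Household Labour Force Survey series; `fit_line` is a hypothetical helper that computes the OLS constant and slope by hand and does no significance testing.

```python
def fit_line(x, y):
    """OLS of y on a constant and x; returns (constant, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    return constant, slope

# Made-up adult unemployment rates (percent); NOT real data.
adult = [3.0, 4.0, 5.0, 6.0, 8.0]

# Pure level shift: youth rate is always 10 points above the adult rate.
level_only = [a + 10.0 for a in adult]

# Pure ratio: youth rate is always 3 times the adult rate.
ratio_only = [3.0 * a for a in adult]

print("level shift only:", fit_line(adult, level_only))
print("ratio only:      ", fit_line(adult, ratio_only))
```

In the level-shift case the fit recovers a constant near 10 and a slope near 1; in the ratio case it recovers a constant near 0 and a slope near 3. Those are the two patterns the regression is meant to tell apart.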

Both the constant and the adult rate come up highly significant. So, over the period 1986 to present, we can expect the youth rate to be 1.44 times the adult rate (the multiplicative effect - about 44% above the adult rate) plus a constant of 9 percentage points. So if the adult rate is 5, the youth rate should be 16.2. We've ruled out the "it's just ratios" argument - there is a constant term in there; we've also ruled out that it's just a level shift because the coefficient is significantly greater than 1.
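The arithmetic can be checked in a couple of lines of Python. This sketch only plugs in the rounded coefficients quoted above (9 and 1.44); the function name is my own, and the adult rates of 3 and 8 are arbitrary illustrations.

```python
CONSTANT = 9.0  # percentage points, rounded, as quoted above
SLOPE = 1.44    # coefficient on the adult rate, as quoted above

def predicted_youth_rate(adult_rate):
    """Youth unemployment rate implied by the simple fitted line."""
    return CONSTANT + SLOPE * adult_rate

for adult_rate in (3.0, 5.0, 8.0):
    youth = predicted_youth_rate(adult_rate)
    # The implied youth/adult ratio is not constant: it falls as the adult rate rises.
    print(adult_rate, round(youth, 2), round(youth / adult_rate, 2))
```

Because of the constant term, the implied youth-to-adult ratio shrinks as the adult rate climbs, which is why looking at the ratio alone can mislead.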

Moreover, when we plot the residuals, we find something pretty interesting. Recall that the residuals are the actual youth unemployment rate minus the model's expected youth unemployment rate. A positive residual means that youth unemployment was higher than the model predicted; negative means it was lower.
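To pin down the sign convention, here is a small Python sketch using the rounded coefficients from the regression (9 and 1.44). The youth and adult rates passed in are hypothetical examples, not observations from the data.

```python
def residual(actual_youth_rate, adult_rate, constant=9.0, slope=1.44):
    """Actual youth rate minus the rate the simple line predicts."""
    predicted = constant + slope * adult_rate
    return actual_youth_rate - predicted

# Hypothetical quarter: adult rate 5, youth rate 18 -> youth higher than predicted.
print(round(residual(18.0, 5.0), 2))

# Hypothetical quarter: adult rate 5, youth rate 15 -> youth lower than predicted.
print(round(residual(15.0, 5.0), 2))
```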

If we look at the top graph, we see youth unemployment rates went up a lot during the recession of the early 1990s. But over that period, youth unemployment rates were never more than a couple of points above what the very simple model predicted (residuals graph, above). In recessions, it does look like the youth rate gets hit harder than the adult rate. But look at what happens starting around fourth quarter 2008. We now have residuals that blow up the model. Something really weird starts happening to the youth unemployment rate at the end of 2008: the residual is telling us that the current youth unemployment rate is about 10 points higher than would be expected given the prior relationship between the youth and adult unemployment rates.

I tried a few different variations allowing the constant and the slope to shift for high and for low levels of adult unemployment.  But none of that made any substantial difference.  Putting in a variable allowing the slope and constant to vary with regime (youth rate or no youth rate) made a big difference, but you'd of course expect that given the residuals plot above.
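One way to read "allowing the slope and constant to vary with regime" is a regression with a regime dummy and its interaction with the adult rate. For the point estimates, that is equivalent to fitting the simple line separately within each regime, which is what this Python sketch does. The data are made up, the 0/1 regime coding is my own assumption, and nothing here reproduces the actual Stata results.

```python
def fit_line(x, y):
    """OLS of y on a constant and x; returns (constant, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_regime(adult, youth, regime):
    """Fit the simple line separately for each regime label."""
    fits = {}
    for label in sorted(set(regime)):
        xs = [a for a, r in zip(adult, regime) if r == label]
        ys = [v for v, r in zip(youth, regime) if r == label]
        fits[label] = fit_line(xs, ys)
    return fits

# Made-up data. Assumed coding: 0 = separate youth minimum wage in place,
# 1 = after its abolition.
adult = [3.0, 4.0, 6.0, 8.0, 4.0, 5.0, 6.0]
regime = [0, 0, 0, 0, 1, 1, 1]
youth = [9.0 + 1.4 * a if r == 0 else 15.0 + 2.0 * a
         for a, r in zip(adult, regime)]

fits = fit_by_regime(adult, youth, regime)
base_constant, base_slope = fits[0]
constant_shift = fits[1][0] - base_constant  # coefficient on the regime dummy
slope_shift = fits[1][1] - base_slope        # coefficient on dummy x adult rate
print(base_constant, base_slope, constant_shift, slope_shift)
```

The standard errors from two separate fits would not match those from a single pooled regression, so this is only a sketch of the coefficients, not of the inference.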

This remains very much a first cut: something I may someday assign as an honours project for more thorough sorting out.  The econometrics here are very simplistic and do nothing to account for differences in labour force participation rates or the obvious problem of serial correlation in the time series data.  But the simple model is still pretty telling.  If we allow youth unemployment rates to vary both as a level shift above the adult rate and as a multiple of the adult rate, which is what we're doing when we run the simple regression with a constant term, we still have a jump in the current youth unemployment rate that is well above that seen in prior recessions.

My first cut explanation remains the abolition of the youth minimum wage.

1. Couple of criticisms:

The numbers on sickness & invalid benefits have increased significantly over the last decade, now approx. 80% higher. These need to be added into the general unemployment rate.

The number of 15 - 19 yo.s entering the job market is not constant. Due to NZ's demographics we had a low number of them entering the market during the 1995 - 2005 period; followed by higher numbers entering after 2005. Also we had a glut in 1990.

Correcting the graph for these will reduce the ratio in 1990, increase the ratio in 1995 - 2005 and reduce the peak in 2009. Basically make it look more like a slope and less like a hockey stick.

2. This is Household Labour Force Survey data, not numbers of unemployment beneficiaries. But you're right - somebody on long term benefit is unlikely to be listing himself as in the labour force looking for work. I'd be very surprised if correction for either of those would remove the big kink at end 2008 though. It might be a bit smaller in magnitude, but unless those changes hit with particular force end-2008, it won't change the inflection point and won't do much to the slope. From the other side, what's happened to DPB rates for youth? It'll have the same effect but in the opposite direction of your corrections, right?

3. Young mothers are not likely to participate in the labour market, so whether they are on DPB or not seems irrelevant.

Suppose a shift was used to progressively hide unemployment (compared to that acknowledged pre-1999) in the sickness/invalid lists. The 1990 recession (youth rate 23%, total 10%) compares to the current recession (26%, 6%), but if you correct the later figure for the 50,000 or so who have been moved to the sickness/invalid lists, our current recession becomes much more comparable (26%, 6% + 2%).

And on to the demographics, it's not a constant supply of new entrants.

Numbers of 15 - 19 yo.s in NZ:

1986 ~ 300,000
1991 ~ 290,000
1996 ~ 270,000
2001 ~ 280,000
2006 ~ 315,000
2011 ~ 320,000 (projected)

4. DPB recipients are unlikely to be in the labour market in the same way that sickness/invalid beneficiaries are unlikely to be in the labour market.

Would those 50K or so be in the labour force absent being on those benefits? All of them, or just some? Are they basically the group that would have been exempt from welfare work requirements in the US because of large numbers of barriers to work?

If your demographics story is the right one, then the kink in the curve should have come at 2006, not at end-2008. Numbers look stagnant over the relevant period....

5. There is significant age based disparity in job market participation across 15 - 19 year olds that perhaps induces lag...

6. If the bulk of workforce entry hits at age 17, it would require that a huge blip of 17 year olds hit the workforce end '08.

7. A recession occurred in 2008.

The normalisation is flawed. 2006 data shows a 12.5% increase in 15-19 yo.s and you are inferring this to be insignificant. However you do find a derived constant of 9% of the same 15-19 yo.s to be "highly significant" and use this to justify your conclusion. 12.5% > 9%.

8. Look at that residuals graph again. If your story were right, we'd expect the residuals to be tracking all over with the changes in the 15-19 age group. So we should have a dropping residual from '86 to '96, then increasing slowly to '01, then the jump to '06 and leveling off. Instead, it rises sharply from '86 to '94, levels off through '00, then lots of noise around a zero mean through '07, then slight rise before the big spike end '08.

Why does the dropping proportion of 15-19 year olds '86-'96 not result in a drop in the residual?

There are a billion things that a more thorough analysis could and ought to correct for. But eyeballing the path of the residuals doesn't suggest that the age-cohort numbers is a big omitted variable problem.

9. "Why does the dropping proportion of 15-19 year olds '86-'96 not result in a drop in the residual?"

What? You mean it's a straight line when it comes to justifying your argument, but a complex, subtly variable dataset when useful in dismissing demographic change? Cool.

Yes, if a residual model of employment market is accepted, then incorporating demographics predicts a slight decrease followed by an abrupt spike. Which I challenge you to find less accurate a prediction than a straight line.

There is only one significant increase in the size of the 15-19 yo. demographic over the observed period which (with correction for lag) falls bang on the only significant spike in youth unemployment. Incorporating demographics makes a better model.

10. BTW - from the other posting, I don't see a spike in 1994.

11. Think hard about what a regression residual means and what omitted variable bias looks like in a plot of residuals.

I don't know what you mean by "residual model of unemployment". Again: the plot above shows the difference between the actual youth unemployment rate and the one that's predicted by a simple model that uses only a constant and the adult unemployment rate.

Simple example of what a serious omitted variable problem would look like: Suppose that, for whatever reason, youth unemployment would jump up in any year that ends in the number 8, and I didn't correct for years ending in 8 in my model. We'd then expect a big positive residual in any year that ends in 8. Suppose that we see a big jump at the end of 2008 and someone said "Aha! That's just because it ends in 8 and everyone knows that it goes up in years ending in 8!" But if we don't see the big spikes up in the residual in other years ending in 8, then that probably isn't something that's really causing a big omitted variable problem surrounding years ending in 8 (and the posited "8-related unemployment hypothesis" is likely false). There could of course be two omitted variables, with one cancelling the other out in all of the other "8" years, but that's less likely than that it's just not that big a problem.
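The "years ending in 8" illustration can be simulated in Python. Everything below is made up: the adult-rate series, the 5-point bump, and the 1980-2009 window are arbitrary assumptions chosen only to show what the residuals would look like in the two cases.

```python
def fit_line(x, y):
    """OLS of y on a constant and x; returns (constant, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Actual minus fitted values from the simple line."""
    constant, slope = fit_line(x, y)
    return [yi - (constant + slope * xi) for xi, yi in zip(x, y)]

years = list(range(1980, 2010))
# Made-up adult rate cycling between 4 and 10 percent; NOT real data.
adult = [4.0 + (yr % 7) for yr in years]

def simulate(bump_years, bump=5.0):
    """Youth rate from the simple line, plus a bump in the given years."""
    return [9.0 + 1.44 * a + (bump if yr in bump_years else 0.0)
            for yr, a in zip(years, adult)]

years_ending_in_8 = {yr for yr in years if yr % 10 == 8}

# Case 1: a genuine omitted "8" effect hits every year ending in 8.
res_every_8 = dict(zip(years, residuals(adult, simulate(years_ending_in_8))))
# Case 2: something unrelated to "8" happens in 2008 only.
res_only_2008 = dict(zip(years, residuals(adult, simulate({2008}))))

for yr in sorted(years_ending_in_8):
    print(yr, round(res_every_8[yr], 2), round(res_only_2008[yr], 2))
```

In the first case every year ending in 8 shows a large positive residual; in the second only 2008 does, which is the pattern that would count against an "8-related" explanation.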

From your numbers, the 15-19 population group drops by 30K over the period 86-96. Over that same period, the residual rose considerably. So if there's an omitted variable bias induced by population, it's suggesting that over this period, dropping population in that age cohort correlates with increased youth unemployment. For the 2001-2011 period, your numbers suggest a 40K increase (plausible that the increase to end 2009 is about the same magnitude as the prior drop), and you're saying that that's what's causing the current very high positive residual.

I'm suggesting that it makes no sense at all to expect that the omitted variable problem, if there is one, switches sign halfway through the time series. It's worse than the "missing 8 variable" illustration above: it's as though the prior 8s were associated with negative residuals rather than positive ones.

If the residual were declining rather than increasing from 86-96, I'd go back and re-run the regression - I'd then suspect potential omitted variable problems. But as it's increasing over that period, I can't see it being the cause of major concern (though you should feel free to go and run your own regressions if you feel otherwise).

12. I mean that the residual constant and multiplication you apply presents a false picture. Therefore the decrease in supply 86-96 cannot be considered an inverse analogy to the increase 01-11.

If the market is near or over saturation, a decrease in supply will have little effect and an increase in supply will have a great effect on unemployment. The period 86-96 is one of increasingly high total unemployment, which suggests a high degree of saturation.

For analogy's sake imagine there were a car company called Chysler that makes plasticky trash product and in 2008 the bottom fell out of the car market. Now Chysler have always had an inventory problem so they had derived a formula based on an industry 20-yr average inventories multiplied by a factor and with an add-on constant to define the scope of the problem. So when recession hits Chysler turn to the man who devised the formula and ask him "What should we do?" and the man says "Don't cut production, because according to my formula last time there was a market slow down and you cut production your inventory increased." The man points to a graph showing increased inventory during the previous slowdown, as clear and unequivocal proof that a cut in production resulted in an increase in inventory. The man (citing some sweet omitted variable theory) determines that a cut in production led to an increase of inventory and therefore to solve the current problem says "Chysler must increase production".

13. I am rather convinced you're wrong. But I encourage you to get a decent statistical package, get the data, and show me otherwise.