The Impact of Vocational Training for the Unemployed in Turkey: an inside look at my latest paper

My latest working paper (joint with Sarojini Hirshleifer, Rita Almeida and Cristobal Ridao-Cano) presents results from an impact evaluation of a large-scale vocational training program for the unemployed in Turkey. I thought I’d briefly summarize the study, and then discuss a few aspects that may be of more general interest.

The study

Intervention: vocational training programs, averaging 3 months in duration, covering a wide range of vocations, with accounting and computer skills the most common. The courses are offered by a mix of private and public providers. The Turkish Government dramatically increased access to these training programs from 32,000 individuals trained in 2008 to 214,000 in 2010 and 250,000 in 2011. They are available for free to the unemployed, who also receive a small stipend to attend.

Study population: Unemployed individuals who applied for 130 of these courses offered throughout Turkey between October and December 2010. The average applicant is 27 years old, and approximately 73 percent of them have completed high school. Sixty-one percent of the evaluation sample is female. 63 percent have worked before.

Sample size: 5,902 applicants – 3,001 allocated to Treatment, 2,901 to control (randomization at the individual level within course).

Take-up rate: 72% of the treatment completed training, 69% got a certificate; 3% of the control took a training course.

Follow-up: Follow-up survey at 1 year, with 6% attrition; administrative data linked to the social security records allows us to trace impacts on formal employment for up to 3 years post treatment.

Results: The average impact of training on employment is positive, but close to zero and not statistically significant, although there is a weakly significant impact on formal employment/job quality: the figure below shows treatment impacts on key employment outcomes (along with 90 percent confidence intervals). The program results in a 2 percentage point increase in the likelihood of working at all one year after training, with a 95% confidence interval of [-0.5, +4.4] percentage points. This is relative to a control mean of 49.4 percent working. There is a 5.5 percent increase in labor income, which is also not significant. The impacts are similar in magnitude when we consider formal employment and formal income, although these impacts are significant at the 10 percent level. The impact on an aggregate index of all labor outcomes (including also hours worked, job quality, being employed at least 20 hours a week) is also positive and significant at the 10 percent level, but relatively small in magnitude.
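To see how the 95% interval quoted above relates to the 90% intervals in the figure, here is a minimal Python sketch of the arithmetic. It assumes the reported interval is a symmetric, normal-based interval, which is my assumption for illustration and not something stated in the paper; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def critical_value(level):
    """Two-sided normal critical value, e.g. about 1.96 for level=0.95."""
    return STD_NORMAL.inv_cdf(0.5 + level / 2)


def implied_se(lower, upper, level=0.95):
    """Back out a standard error from a symmetric normal-based interval."""
    return (upper - lower) / (2 * critical_value(level))


def interval(center, se, level):
    """Rebuild a symmetric interval at another confidence level."""
    half_width = critical_value(level) * se
    return center - half_width, center + half_width


# Reported in the post: 95% CI of [-0.5, +4.4] percentage points.
lower95, upper95 = -0.5, 4.4
center = (lower95 + upper95) / 2          # midpoint of the reported interval
se = implied_se(lower95, upper95)         # standard error implied by the interval
lo90, hi90 = interval(center, se, 0.90)   # the same estimate at 90% confidence

# Two-sided p-value under the same normal approximation.
p_value = 2 * (1 - STD_NORMAL.cdf(abs(center / se)))

print(f"midpoint={center:.2f}, implied se={se:.2f}")
print(f"90% CI=[{lo90:.2f}, {hi90:.2f}], p={p_value:.3f}")
```

Under these assumptions the implied standard error is roughly 1.25 percentage points, and the 90% interval still includes zero, which is consistent with the statement that the impact on working at all is not statistically significant.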

When we examine heterogeneity, impacts are stronger (and statistically significant) for courses offered by private (rather than public) providers – and this continues to hold even after controlling for observable differences in the types of courses offered by private providers, and in the characteristics of people applying for these different courses.

Using administrative data, we see that after three years there are no significant impacts on formal employment or formal earnings (about 80% of employment is formal).

Overall then, these results suggest relatively modest impacts of vocational training (see below for whether we should think of them as small or large).

General points of interest to people working on Impact Evaluations
There are a few things we do in the paper and issues that come up that I thought would be of special interest to people working on impact evaluations.

1. Use of a pre-analysis plan: This is the first impact evaluation I’ve completed that has a registered pre-analysis plan. It was a useful roadmap for the paper, and I don’t think it restricted us too much. We found stronger effects for private providers, and so we then explore in the paper why this might be, explicitly noting that these explorations weren’t pre-specified. The one thing that I still haven’t completely figured out is what to do about specifications in your pre-analysis plan that are predicated on finding a treatment effect in the first place, and then build on this to see if it in turn affects other outcomes. We said we would look beyond labor market outcomes to see if higher employment in turn affected subjective well-being, mental health, household expenditure, and empowerment – but of course since we find very little in the labor domain, there is very little reason to expect to find the changes in these other domains that might come about from higher labor incomes. We report some of these (insignificant) results anyway, and footnote others, but perhaps the answer is to be a bit more conditional in the pre-analysis plan (e.g. if we find an impact on X, we will also look to see if there are impacts on Y and Z, but if we don’t find the impact on X, we won’t look at or report impacts on Y and Z).

2. Randomizing into treatment, control, and wait list: we had course providers select at least 2.2 times as many people as there were spaces, and then randomized applicants into three groups: treatment (T), control (C), and a waitlist. It is common in many training interventions for providers to be paid according to the number of people who show up, so if not all of the treatment group show up, providers scramble to fill the spots with others. By giving them a waitlist group they could use for this purpose, we avoided having the control group pulled into treatment, which would have reduced our power.
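The three-way split within a course can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used in the study: the function name, the seed handling, and the choice to make the treatment and control groups each equal to the number of spaces (with the remainder going to the waitlist) are all my assumptions.

```python
import random


def assign_within_course(applicant_ids, n_spaces, seed=0):
    """Randomly split one course's applicants into treatment, control, waitlist.

    Assumption for illustration: treatment and control each get `n_spaces`
    people, and everyone left over goes to the waitlist.
    """
    if len(applicant_ids) < 2 * n_spaces:
        raise ValueError("need at least twice as many applicants as spaces")
    rng = random.Random(seed)       # seeded so the lottery can be reproduced
    shuffled = list(applicant_ids)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return {
        "treatment": shuffled[:n_spaces],
        "control": shuffled[n_spaces:2 * n_spaces],
        "waitlist": shuffled[2 * n_spaces:],
    }


# Hypothetical course with 10 spaces and 22 applicants (2.2 times the spaces).
groups = assign_within_course(list(range(1, 23)), n_spaces=10, seed=2010)
for name, members in groups.items():
    print(name, len(members), sorted(members))
```

Running the lottery separately for each course is what makes the randomization "at the individual level within course"; the waitlist gives providers a pool to draw from for no-shows without touching the control group.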

3. We had 173 individuals apply to more than one course. We control for multiple lotteries following the approach suggested in this old blog post.

4. Comparing results to expectations: it is always tough to know ex post whether any effect you find should be considered small or large. In this paper we provide one way to benchmark results: comparing them to the expectations of the applicants, and to those of the policymakers in charge of the program. We asked the applicants at baseline what they thought the percent chance was that they would be employed in one year if they received the program, and if they didn’t receive the program. Similarly, we asked staff at ISKUR, the Turkish employment agency, what they thought the employment rate of the control group would be, and the additional likelihood of being employed for the treatment group.

Expectations were reasonably accurate for the control group status: the control group thought they had a 31 percent chance of being employed without the training; in reality it was 36 percent (working 20+ hours a week).

Both applicants and policymakers dramatically overestimated the treatment effects: applicants thought they would be 32.4 percentage points more likely to be employed; ISKUR staff thought they would be 24.3 percentage points more likely to be employed – whereas our LATE estimate is 1.9 percentage points. So the effect sizes are much smaller than those offering the course and those applying for it expect.
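The gap between expected and estimated effects is easy to put in numbers. The Python sketch below uses only figures quoted in this post; the one assumption (mine, not the paper's) is that the LATE is scaled by the difference in training completion between treatment and control (72% versus 3%), which is one plausible first stage but not necessarily the one used in the paper. The function name is hypothetical.

```python
def wald_late(itt, takeup_treat, takeup_control):
    """Wald estimator: intent-to-treat effect scaled by the take-up difference."""
    return itt / (takeup_treat - takeup_control)


late_pp = 1.9  # estimated LATE, in percentage points
expected_pp = {"applicants": 32.4, "ISKUR staff": 24.3}

# How many times larger were the expected effects than the estimate?
overestimate = {who: exp / late_pp for who, exp in expected_pp.items()}

# Assumed first stage: 72% of treatment vs 3% of control completed training.
takeup_treat, takeup_control = 0.72, 0.03
implied_itt_pp = late_pp * (takeup_treat - takeup_control)

for who, ratio in overestimate.items():
    print(f"{who}: expected effect is about {ratio:.0f}x the LATE")
print(f"implied ITT under the assumed first stage: {implied_itt_pp:.2f} pp")
```

On these numbers, applicants expected an effect about 17 times the estimate and ISKUR staff about 13 times. The implied intent-to-treat effect is smaller than the LATE because not everyone offered a place completed training.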

5. The not completely realized promise of admin data: we were excited about the possibility of using administrative data from the social security records to track the trajectory of impacts. However, it took more than 2 years to get this data, because the systems were being upgraded and because of other delays, and in the end we didn’t get everything we wanted. But the data is still incredibly useful, both for tracking the trajectory of impacts and for allowing us to check that there is no selective attrition on outcomes: we find that those who don’t respond to the follow-up survey don’t have different employment status from those who do respond.

Comments

"The Iron Law of Evaluation: “The expected value of any net impact assessment of any large scale social program is zero.”

The Iron Law arises from the experience that few impact assessments of large scale social programs have found that the programs in question had any net impact. The law also means that, based on the evaluation efforts of the last twenty years, the best a priori estimate of the net impact assessment of any program is zero, i.e., that the program will have no effect.

The Stainless Steel Law of Evaluation: “The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.”

This law means that the more technically rigorous the net impact assessment, the more likely are its results to be zero - or no effect. Specifically, this law implies that estimating net impacts through randomized controlled experiments, the avowedly best approach to estimating net impacts, is more likely to show zero effects than other less rigorous approaches. [pg5]

The Brass Law of Evaluation: “The more social programs are designed to change individuals, the more likely the net impact of the program will be zero.”

This law means that social programs designed to rehabilitate individuals by changing them in some way or another are more likely to fail. The Brass Law may appear to be redundant since all programs, including those designed to deal with individuals, are covered by the Iron Law. This redundancy is intended to emphasize the especially difficult task in designing and implementing effective programs that are designed to rehabilitate individuals.

The Zinc Law of Evaluation: "Only those programs that are likely to fail are evaluated".

Of the several metallic laws of evaluation, the zinc law has the most optimistic slant since it implies that there are effective programs but that such effective programs are never evaluated. It also implies that if a social program is effective, that characteristic is obvious enough and hence policy makers and others who sponsor and fund evaluations decide against evaluation."