Is it reasonable to expect that accountability sanctions by themselves should have any significant impact on pupil performance?

Since the 1980s, the use of accountability regimes has grown significantly worldwide. In theory, accountability could compensate for schools' lack of extrinsic incentives to improve outcomes by using carrots and sticks. On the other hand, accountability may have deleterious unintended effects, such as decreasing intrinsic motivation among school staff to do their jobs well, shifting focus to pupils on the borderline of different grade boundaries, excluding poorly performing pupils from accountability assessments, and outright cheating.
In a new study, economists Thomas Ahn and Jacob Vigdor investigate the effects of one of the largest, and possibly most contentious, accountability programmes worldwide: the No Child Left Behind Act of 2001 (NCLB). NCLB required states to evaluate performance among schools receiving federal Title I funding, which the federal government distributes to schools with high proportions of disadvantaged pupils. If performance falls below an established threshold, schools are threatened with a string of sanctions, which begin to kick in if a school performs under the threshold for a second consecutive year. These become more draconian for each year the school remains underperforming.
Using data from North Carolina, the authors analyse the impact of different threats and sanctions on pupil performance. Since threats and sanctions are only triggered if schools fall below a certain threshold, the authors can compare schools very close to the threshold on each side. Such schools are very similar apart from their exposure to threats and sanctions, so the difference between them can be used to estimate the impact of accountability sanctions. Overall, pupils attending schools that barely missed the threshold improve slightly more in mathematics than pupils attending schools that just met the requirements. The effect size amounts to about 0.02 standard deviations (SD), which is equivalent to about 2 PISA points. However, there is no effect on reading scores.
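
Comparing schools just either side of a threshold is a regression discontinuity design. The sketch below illustrates the idea on simulated data only; it is not the authors' code or specification. The function names, the simulated data-generating process, the bandwidths, and the PISA standard deviation of 100 points (implied by the 0.02 SD to 2 points conversion) are all assumptions made for illustration.

```python
import random

# Assumption: PISA scores have a standard deviation of roughly 100 points,
# which is what the "0.02 SD is about 2 PISA points" conversion implies.
PISA_SD = 100.0


def sd_to_pisa_points(effect_in_sd):
    """Convert an effect size in standard deviations to approximate PISA points."""
    return effect_in_sd * PISA_SD


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(margin, outcome, bandwidth):
    """Estimate the jump in the outcome at margin = 0.

    margin < 0 means the school missed the threshold (threat or sanction applies).
    A separate line is fitted on each side within the bandwidth, and the
    estimate is the gap between the two fitted values at the threshold.
    """
    below = [(m, y) for m, y in zip(margin, outcome) if -bandwidth <= m < 0]
    above = [(m, y) for m, y in zip(margin, outcome) if 0 <= m <= bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above


def simulate(n, true_effect, noise_sd=0.1, seed=0):
    """Hypothetical data: the outcome rises smoothly with the margin, plus a
    jump of true_effect for schools that fall below the threshold."""
    rng = random.Random(seed)
    margin, outcome = [], []
    for _ in range(n):
        m = rng.uniform(-1.0, 1.0)
        jump = true_effect if m < 0 else 0.0
        margin.append(m)
        outcome.append(0.3 * m + jump + rng.gauss(0.0, noise_sd))
    return margin, outcome


if __name__ == "__main__":
    margin, outcome = simulate(20000, true_effect=0.02)
    # Re-estimating with several bandwidths shows how sensitive the result is
    # to how close to the threshold "close" is taken to be.
    for bandwidth in (0.25, 0.5, 1.0):
        estimate = rd_estimate(margin, outcome, bandwidth)
        print(bandwidth, round(estimate, 3), round(sd_to_pisa_points(estimate), 1))
```

A narrower bandwidth keeps the compared schools more alike but leaves fewer observations, so estimates become noisier. This is one reason small effects can lose statistical significance depending on the bandwidth chosen.
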
The problem is that these results lump together all the different threats and sanctions that are triggered when schools fall under the threshold, which makes it impossible to distinguish differential impacts. When the authors investigate the impact of specific sanctions and threats, a more nuanced picture emerges. Among schools that fall under the threshold for the first time, the effect amounts to about 0.05 SD in mathematics when the threat is that pupils will be allowed to transfer to other, non-failing public schools if the school falls under the threshold the following year as well. When the first threat is instead mandatory tutoring, the impact is about 0.03 SD. In reading, the effect size is about 0.02 SD regardless of threat type, but it is sensitive to the specific bandwidth used around the threshold and is not always statistically significant.
When analysing actual exposure to sanctions rather than threats, however, there is no impact at all, apart from the two final sanctions. Schools that fail to reach the threshold a fifth time in a row are required to draw up a restructuring plan, with the last remaining threat being its actual implementation. This sanction has a negative impact on reading performance, amounting to about 0.06 SD, and no impact in mathematics. At the same time, the final penalty, the actual implementation of the restructuring plan, which applies to schools that fail to reach the threshold six years in a row, has a positive impact on pupil performance in both subjects, amounting to 0.03 SD in reading and 0.05 SD in mathematics. The authors also provide evidence that restructuring significantly increases the likelihood of staff turnover, suggesting that part of the effect is likely to be due to this channel. The positive effects in mathematics are generally concentrated among lower-performing pupils, suggesting that schools do start focusing more on these pupils when faced with a sanction.
Indeed, schools also start to focus on the pupil sub-groups that caused them to miss the threshold. However, the effects are never negative among high-performing pupils, which in turn suggests that the gains among low-performing pupils do not come at their expense. In reading, on the other hand, effects are very similar across the board, but rarely statistically significant. Again, the main exception is the impact of the final punishment, the implementation of the restructuring plan, which has similar effects across the board in mathematics (and to some extent also in reading).
Overall, therefore, it is clear that NCLB has had very small positive effects on pupil achievement in North Carolina, although the final punishment, restructuring, does have a more consistent and larger positive impact. At the same time, lower-performing pupils tend to gain the most, while high-performing pupils do not lose out. The reason might be that North Carolina has its own accountability system, which focuses on improvements in pupil test scores rather than on absolute achievement, and thus gives schools incentives to focus on high-performing pupils as well.
While the study provides important new findings for the accountability literature, it does not deal persuasively with potential gaming that could inflate the scores. For this reason, we cannot be sure that the small positive effects that did occur are not due to different forms of gaming. Nevertheless, it is intriguing that the clearest positive effects are found when schools are forced to actually implement a restructuring plan, which is often accompanied by staff turnover. This supports prior findings in the literature, which suggest that school turnaround does not occur unless new teachers and headteachers are brought in. It is also interesting that the threat, but not the actual sanction, of giving pupils choice is enough to improve schools more than most other threats and sanctions.
If we ignore potential gaming, the policy implications are that accountability sanctions might improve lower-performing pupils’ scores somewhat, without hurting high-performing pupils (at least with the proper design), but that it might be better simply to require restructuring than to rely on other elaborate sanctions that have little impact on their own. Since choice threats, but not their actual implementation, have positive effects on results, it is also plausible that schools reacted to potential competition, which then never materialised. One way to improve this mechanism is to increase the number of choices available to pupils within a reasonable distance, for example by making it easier for private providers to enter the market. At the same time, the effects are hardly transformative, so there is little reason to expect accountability by itself to push schools to produce higher achievement; hoping to radically improve poorly performing schools via accountability does not appear to be realistic.
Gabriel H. Sahlgren
This comment piece is also the Editor's Pick in the CMRE Monthly Research Digest (October 2014). The piece reviews a paper by economists Thomas Ahn and Jacob Vigdor, 'The Impact of No Child Left Behind's Accountability Sanctions on School Performance: Regression Discontinuity Evidence from North Carolina', NBER Working Paper No. 20511.

