Yesterday the National Education Policy Center (NEPC) released a critique of the notorious analysis of teachers’ value-added published by the Los Angeles Times last summer. (For the original story, see here. For the NEPC critique, see here.) Two professors at the University of Colorado re-analyzed the data on which the Times’ report was based, and concluded that the Times’ research was “demonstrably inadequate to support the published rankings.” This apparently straightforward statement conflates two different claims, which in honesty ought to be treated separately.
The first of these is an argument about research. The re-analysis of the Times data under somewhat different assumptions produced results that departed in significant ways from the results published by the Times. Most notably, a significant number of teachers rated “effective” by the Times researchers were rated “ineffective” by the Colorado scholars, and vice versa. In addition, the learning gains produced by about half of all teachers in the sample could not be statistically differentiated from the gains produced by the “average” teacher at the 95 percent level of confidence.
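To make the 95 percent point concrete, here is a minimal sketch of how such a classification typically works: a teacher counts as different from the "average" teacher only if the confidence interval around his or her estimated value-added excludes zero. The teacher labels, estimates, and standard errors below are invented for illustration and are not drawn from the Times or NEPC analyses, and the normal approximation (a critical value of 1.96) is an assumption rather than either team's actual method.

```python
# Hypothetical value-added estimates, measured relative to the average
# teacher (zero), paired with hypothetical standard errors.
# None of these numbers come from the Times or NEPC analyses.
teachers = {
    "A": (0.30, 0.10),
    "B": (0.08, 0.10),
    "C": (-0.05, 0.12),
    "D": (-0.40, 0.15),
}

Z_95 = 1.96  # two-sided normal critical value for 95 percent confidence


def classify(estimate, std_error, z=Z_95):
    """Label a teacher relative to the average (zero) teacher."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    if lower > 0:
        return "above average"
    if upper < 0:
        return "below average"
    # The interval contains zero, so the estimate cannot be told
    # apart from the average at this confidence level.
    return "indistinguishable from average"


labels = {name: classify(est, se) for name, (est, se) in teachers.items()}
for name, label in labels.items():
    print(name, label)
```

In this made-up sample, teachers B and C land in the undifferentiated middle even though their point estimates differ, because their intervals straddle zero. Larger standard errors push more teachers into that middle group.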
To scholars who have paid attention to the growing body of research on value-added assessment this is tediously familiar. Estimates of teachers’ value-added have been shown over and over again to be highly sensitive to model specification. Moreover, even the most sophisticated models produce results that at best clearly distinguish teachers at the very top and very bottom of the effectiveness distribution while leaving most teachers in the undifferentiated middle.
From a scholar’s point of view there is no surprise in the NEPC findings, and no grounds at all for casting aspersions on the quality of the Times research or the integrity of their research team. Of course the Times research has “serious weaknesses”—so has the research conducted by the Colorado team, and so has every other piece of research conducted on teachers’ value-added. Different scholars adopt different assumptions, and different assumptions produce different results. This is hardly news.
The second claim is an argument about policy, and its relationship to research. Given the fragility of the Times’ estimates, the question of whether they provide sufficient support for the rankings that were published in the newspaper has an easy answer: they do not. But this raises some further questions.
One of these would ask whether research on teachers’ value-added will ever produce results that support perfectly stable and reliable rankings of individual teachers. A second might ask whether the Colorado scholars commissioned by NEPC would endorse the use of their own research findings to rank LA teachers. If the answers to these questions are negative then the NEPC critique leaves us face to face with an ugly syllogism: “Imperfect research should not guide policy. No research is perfect. Therefore, research should not guide policy.” I don’t ascribe this position to NEPC, but it follows directly from their critique of the Times.
There is another way to think about the problem, however. Instead of asking whether teachers’ value-added scores are a perfectly valid and reliable indicator of teacher performance, as NEPC does, we might instead ask what kinds of imperfect information should be brought to bear in making judgments about teacher effectiveness. As we know well, data on teachers’ credentials and experience offer almost no useful information about their performance, and reports from “drive-by” observations by overworked administrators offer little more. Under these circumstances, the flawed and fragile information we obtain from value-added assessments may offer a useful complement and corrective to the imperfect information we obtain from other sources. (For a quick review of many of the issues associated with value-added assessment, see here.)
Needless to say, this does not mean that teachers’ value-added scores should be published in the Los Angeles Times. Ranking individual teachers by name in the newspapers on the basis of a single number was and remains a bad idea for many reasons, but the quality of the research on which the rankings were based is not among them. NEPC’s critique misses the point.