Test Scores Key to Evaluating Teachers

We can’t keep pretending that all teachers are the same.


One of the largest debates raging in education centers on whether teachers can be rated effectively by the test score gains of their students. In wonky parlance, “value-added” analysis looks at students’ baseline test results, then measures those students again after a year with a particular teacher.

The idea is to measure the teacher effect – essentially, to strip out any disadvantage that comes from teaching kids who start the year farther behind by measuring the gains of the kids rather than the absolute results.
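For readers who want to see the arithmetic, here is a minimal sketch of the gain idea in Python. The teachers, scores, and function names are all made up for illustration, and a plain average of gains is a deliberate simplification: real value-added models are more elaborate than this.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, baseline score, end-of-year score).
# These numbers are invented for illustration only.
records = [
    ("Teacher A", 40, 55),
    ("Teacher A", 50, 63),
    ("Teacher B", 80, 85),
    ("Teacher B", 90, 93),
]

def average_final_by_teacher(rows):
    """Average end-of-year score per teacher (the absolute result)."""
    finals = defaultdict(list)
    for teacher, _baseline, final in rows:
        finals[teacher].append(final)
    return {teacher: mean(scores) for teacher, scores in finals.items()}

def average_gain_by_teacher(rows):
    """Average (final - baseline) per teacher (the simplified gain)."""
    gains = defaultdict(list)
    for teacher, baseline, final in rows:
        gains[teacher].append(final - baseline)
    return {teacher: mean(values) for teacher, values in gains.items()}

print(average_final_by_teacher(records))
print(average_gain_by_teacher(records))
```

In this invented example, Teacher B's students finish with higher scores, but Teacher A's students, who started farther behind, gain more over the year. That difference is what measuring gains rather than absolute results is meant to surface.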

The arguments against the use of this data range from the intentionally misleading (using test scores will discourage teachers from working with poor kids) to the weak (we all hate standardized tests!) to the more interesting (value-added scores are too volatile and don’t align with good teaching).

Friday’s release of preliminary results from the Gates Foundation’s Measures of Effective Teaching project strikes a blow against value-added opponents’ strongest arguments. The Foundation’s researchers noted several interesting findings, including the general consistency of teacher results from year to year and the alignment between student perceptions of their teachers (measured through surveys) and the teachers’ value-added data.

In the New York Times, Sam Dillon wrote about the student survey finding:

Teachers whose students described them as skillful at maintaining classroom order, at focusing their instruction and at helping their charges learn from their mistakes are often the same teachers whose students learn the most in the course of a year, as measured by gains on standardized test scores, according to a progress report on the research.

Education Week’s Stephen Sawchuk focused on the consistency of teacher effects from year to year:

The analysts found that, in every grade and subject studied, teachers’ value-added histories were strongly predictive of their performance in other classrooms. While they found a degree of volatility in the estimates from year to year, that volatility “is not so large as to undercut the usefulness of value-added as an indicator of future performance,” the study says.

None of this suggests that teachers should be measured solely through test scores. If anything, it suggests that a number of additional tools – from student and parent perceptions to classroom observations looking for particular teacher actions – may be excellent ways of measuring performance.

But consistency is the hallmark of effective evaluation in any field, and if test score gains show a reasonably high level of consistency – for the same teachers over the years, and with other ways of analyzing effectiveness – then tests have to be part of the equation.

This matters enormously as the nation strives to improve its lackluster public education system (see last week’s international benchmark results if you have any doubt about our mediocrity relative to the rest of the world). We can’t keep pretending that all teachers are the same, and that great teaching is an unknowable art form that can’t be measured (and studied, and improved, and rewarded, and – you get the point).

In a highly partisan world, there is growing bipartisan support for using data to inform decisions about how to better educate our kids. And not a moment too soon, when you see how the kids in China are positively kicking our kids’ collective behinds.
