Thompson: Tunnel Vision in Evaluating D.C.'s IMPACT Evaluations

Traffic control experts know that placing a police car next to a highway will change drivers' behavior, for the better or, if it causes a dangerous traffic jam, for the worse. Nobody doubts that speeds will change, at least temporarily, when drivers approach a police officer. Few people would assume that the driving behavior of motorists passing a cop is representative of how they drove before the officer was sighted.

When Thomas Dee, Brian Jacob, and Justin McCrary, in Manipulation in the Grading of New York’s Regents Examination, studied students who were just below the passing threshold, these students were said to inhabit the “suspect region.” The assumption was that the educators who graded their examinations were tempted to inflate their scores. The researchers estimated that 3 to 5% of those who were within the passing threshold should have been scored below passing. Dee et al. concluded that the pattern was due to “pervasive manipulation of student test scores.”

When Dee and James Wyckoff, in Incentives, Selection, and Teacher Performance: Evidence from IMPACT, studied teachers who were just below the threshold, they found the same pattern. When those teachers’ scores jumped by more than those of teachers outside the suspect region, it was argued that the teachers “undertook steps to meaningfully improve their performance.” (emphasis mine) This was interpreted to mean that IMPACT worked.

Both studies show the same thing. When stakes are placed on words and numbers written on a sheet of paper, those words and numbers change. There is no more evidence of meaningful improvement in the IMPACT study by Dee and Wyckoff than there was in Dee's study of New York Regents results. The far more likely explanation is that the game was played in both cities in precisely the way common sense indicates it would be played.

New Yorkers have always known that test score inflation is inevitable when teachers grade their own students’ tests. Who would claim that the 6,412 History and Geography students who scored around the passing mark of “65” had essays that were “meaningfully” better than those of the 395 who scored “64”? So, surely, it was equally predictable that the teachers and evaluators in D.C. would find a way to nudge a significant number of “Minimally Effective” teachers on the threshold of being fired into “Effective” territory.

No, I am not accusing the individual researchers of anything untoward. They have just confirmed that “to a man with a hammer, every problem looks like a nail.” The edu-politics surrounding their research was pointing towards the tightening of accountability on all fronts, and tunnel vision resulted.

Schooling, like the making and enforcement of traffic laws, is a political process. As anyone who has evaluated students knows, testing and grading are political processes. As anyone who has been subject to a performance evaluation surely knows, such evaluation is a political process too. The question is whether the give and take is constructive or destructive. Those who impose risky and expensive experiments like IMPACT have the burden of proving that the resulting changes are likely to be beneficial. They have shunned that responsibility.

For those who believe that Incentives, Selection, and Teacher Performance documents meaningful improvements in teaching and learning, I have a Regents exam with a grade of "66" that I'd like to sell you. -JT (@drjohnthompson)