Thompson: Re-Evaluating the Gates MET Study
Dana Goldstein’s remarkable contribution to the AEI conference on edu-philanthropy, “Paying Attention to Pedagogy while Privileging Test Scores,” starts with the reminder that (except for Education Week) little of the MET’s media coverage “explained the study’s key methodology of judging all modes of evaluating teachers based on whether they predicted growth in state standardized test scores.”
Neither did the media typically point out that the foundation advocated for the use of test score growth in evaluating teachers before it launched the MET. Legislation requiring the use of student performance was “driven, in part, by close ties between the Gates Foundation and the Obama administration.”
Goldstein thus asks the question that too few have uttered:
How is research received by scholars, policymakers, and practitioners when the sponsor of that research—and political allies including the president of the United States—have already embraced the reforms being studied? And is anyone paying attention when the conclusions of such research appear to contradict, or at least to complicate, some of the core assumptions of that reform agenda?
Goldstein’s narrative is consistent with the equally great analysis by Sarah Reckhow and Megan Tompkins-Stange, which placed the rise of Gates’ advocacy and the pressure for value-added evaluations within the context of “the organizational food chain,” and showed how changes in the status of their policies can be “ascendant and rapid.”
The outcomes produced by the previous Gates small school experiment had been “a disappointment to the resolutely data-driven” organization, and the stars were aligned for a dramatic edu-political push. Reformers like the Education Trust had been pushing for incorporating test score growth into teacher evaluations. And, although the claims made for value-added evaluations were unproven, value-added models (VAMs) represented a ready-made, though untested, tool for advancing a teacher quality agenda.
The MET was under a similarly hurried schedule, with director Tom Kane promising a completed project in two years.
Goldstein explains, however, how the MET, conducted after the fact, did not study the policy issues that would (should?) have informed a more deliberative process regarding value-added evaluations. Unlike schools in the real world, the MET randomly assigned students to teachers, and even in this experimental format it was hard to maintain the randomization. It studied a student sample that did not resemble the demographics of high-poverty schools. The MET also found that “only 14 to 37 percent of the variance in classroom observation scores was due to persistent differences among teachers.”
Had the study been conducted before evaluation laws were changed, it could have been a powerful argument against value-added evaluations.
Perhaps most importantly, the MET study was conducted under low-stakes conditions. Goldstein thus notes, “The MET study is therefore not a test of the high-stakes teacher accountability policies that have been rolled out across the country during the Obama era.”
“The Gates Foundation MET Project: Paying Attention to Pedagogy while Privileging Test Scores” is exceptionally astute in the details about what was and was not learned about classroom observations. Goldstein describes intriguing hints related to a likely flaw in trusting “multiple measures” to offset value-added’s bias against teachers in high-challenge schools; she comments that “teachers with poorer students consistently earned lower observation scores, which seemed to call into question the entire premise that observation is fair.”
As an inner-city teacher, I would add that flaws in observations reinforce the argument against the use of flawed test score growth measures in teacher evaluations. Too often, teachers in high-challenge schools would face a double shotgun blast from two metrics that are unfair in schools where peer pressure washes out the effects of good instruction. Inner-city conditions can also force teachers to use pedagogies that emphasize control, even though they are less effective instructionally.
Too often, I believe, we teachers opposed to corporate reform criticize education experts who work with reformers. Goldstein shows, however, that Robert Pianta, Charlotte Danielson, and other experts “helped move MET toward a conception of using evaluation not just to reward and punish teachers, but also to help them improve.” Sadly, their partnership did not change VAM’s “privileged” position within the study’s methodology.
Goldstein addresses the two major critiques of the MET. First, “the study’s data does not support the Gates Foundation’s strong support of value-added measurement as the ‘privileged’ tool in evaluating teachers.” She cites Jesse Rothstein’s and William Mathis’s arguments that high value-added scores on state tests were only weakly correlated (below 0.5) with student growth on more rigorous exams. Moreover, they show that “different ways of evaluating teachers will yield very different rankings, that none are clearly better than the others.”
Second, Goldstein reminds us, “the very questions the Gates Foundation sought to answer limit policymakers’ conceptions of how to improve student achievement.” The opportunity costs of such a focus on the Gates vision of teacher quality include a shift in attention away from the promise of Dan Willingham’s work on cognitive factors, Richard Rothstein’s proposal for addressing segregation, and the insights of the Consortium on Chicago School Research. She observes that Anthony Bryk and Barbara Schneider show that “’relational trust’ is an essential component of improving student test scores.”
It is hard for me to conceive of a high-stakes value-added regime that does not undermine trusting relationships, and I still find it hard to understand why this dilemma was not obvious to the Gates Foundation. Dana Goldstein helps make sense of this oversight. - JT (@drjohnthompson)