
Thompson: What Does New Orleans Test Score Growth Really Mean?

Let’s recall the excitement in 2007 when Bruce Fuller, Katheryn Gesicki, Erin Kang, and Joseph Wright published Is the No Child Left Behind Act Working? Fuller et al. showed that NAEP test score growth had largely declined after NCLB took effect, even as states reported huge gains on their own standardized tests. Oklahoma, for instance, posted a 48-point gap between its 4th-grade reading NCLB scores and its NAEP results. After NCLB, the state’s 4th-grade reading scores increased 2.3% per year while its NAEP results dropped by 0.3 per year.

Fuller’s blockbuster was a definitive indictment of the reliability of state NCLB test scores; it even got the test-loving Education Trust to question whether bubble-in accountability was working. It seemed only a matter of time before testing was unanimously judged a hopelessly misleading metric. I thought the idea that state test score growth, during an age of test-driven accountability, could stand alone as evidence of increased learning would soon be discredited.

While I must emphasize how much I admire the work of Douglas Harris, I’m dismayed by one passage in his report on the New Orleans model of reform, The Urban Education of the Future? I’ve got no problem with Harris et al. reporting that New Orleans increased student performance, as measured by Louisiana’s embarrassingly primitive state tests, by 0.2 to 0.45 standard deviations. It is a scholar’s responsibility to report such data. However, why would Harris speak as if he assumes that those numbers mean anything? They might mean something or they might not, but certainly they don’t provide evidence that New Orleans’s portfolio model has increased student performance more than early education would have.

Even when they are valid, test scores measure a narrow band of skills and knowledge. They rarely or never reveal what information was retained by a student, or what went in one of a student’s ears and out the other. Nor are NCLB-type test scores likely to say much about whether any alleged learning was meaningful. So, I have been searching for a metaphor to illustrate why test scores, alone, during a time of test-driven accountability, can’t be used to argue that a pedagogy that focuses on raising objective outputs is more effective than early education or any other approach to holistic learning.

NFL running backs share a lot of athletic skills with their counterparts in rugby. So, what would we say about a quantitative analysis estimating that football halfbacks are 0.2 to 0.45 standard deviations more effective in racking up the metrics (yardage, scoring, etc.) on NFL fields than Australian rugby runners would be in competing in the American game under our rules and referees? Wouldn’t the response be, “Well, duh!”?

Of course, if such a study used the word “impressive” to describe our achievement relative to our biggest competitors, that would allow sports announcers, who were brazen enough to do so, to repeatedly praise the impressive achievement of our running backs. But few people who really understand football would give credence to the quantitative analysis.

The same pattern may apply with education experts. Researchers at the ERA conference offered plenty of reasons why there is less than meets the eye to NOLA gains. When that much money and energy is devoted to increasing a single, primitive metric, those scores can’t be assumed to be a measure of real, enduring, life-changing gains. More likely they also reflect the shamefully low pre-Katrina starting point, post-Katrina demographic shifts, the nation’s 3rd highest rate of young people out of school without a job, curriculum narrowing, and a focus on test prep and remediation that doesn’t prepare kids for college or life.

But, will corporate reformers cherry-pick Harris’s words, spin them and characterize the ERA findings as saying that the portfolio system works, and suggest it can now be done on the cheap? 

Getting back to the ERA's explanation of the possible magnitude of the learning gains, I worry that it might reflect an unfortunate change in school cultures over the last decade. By ignoring the history of bubble-in testing, we condemned ourselves to repeat it. In this environment, researchers - even objective ones - can't ignore the questions that are important to their clients. If systems no longer reject teach-to-the-test as an ignoble practice, they will contract with scholars to push their data as far as it can go and try to determine how effective they have been in raising test scores.

When I started teaching, the ubiquitous mantra was Albert Einstein's "Not Everything That Counts Can Be Counted." Sure, we had plenty of worksheet-driven malpractice, but that was seen as a problem, not a solution. As incomprehensible as it seems to teachers of my generation, there are now educators who see test score growth as an end in itself, and not a dubious guesstimate. And that may explain the intense interest over the tenth anniversary of New Orleans's reforms, where advocates grasp at test scores, alone, as evidence that test, sort, reward, and punish has not failed. (@drjohnthompson)

Comments


This piece just shows how little John actually knows about the research Doug Harris presented - i.e., it's just a response based on conjecture and a rehashing of the same hackneyed arguments we've heard all along.

Each of the tired objections he raises - for example, the possible post-Katrina demographic shift question - was addressed by Doug with an explanation of how his research controlled for those variables.

One would think John would read the report FIRST, then make his case against it, but I suspect that would make too much sense.

I sent my draft on ERA data to Harris, several panelists, and several supporters of the portfolio model, and not one has responded with evidence that I've misstated anything. This just further explains why test score growth in schools that stress testing is not meaningful. Reformers have had years to explain why we should assume that those scores mean anything. Have you heard of a single person who has taken up that challenge?


The title of the paper you give, THE URBAN EDUCATION OF THE FUTURE, is the title of the conference, and your link goes to the conference's agenda. I could find no link to Doug Harris' paper, nor do you cite the text of the offending passage to which you refer.

There is a simple question to be asked: Do differences among, or, better yet, gains in, these tests predict later outcomes of interest? Are the coefficients of interest meaningfully large and reasonably bounded?

I have no idea, but presume that people working with them do.

Michael,
Why in the world would any teacher make the presumption that you make?
Regarding your previous post, you are right that my focus was the question about the national relevance of the New Orleans model, and I mostly drew upon the testimony of all the panelists I could see on the web. I'll get to Harris's paper, and discuss it in its own right later. I'll do so with two assumptions. One, he will do a great job regarding his analysis of New Orleans. Second, when discussing New Orleans, he won't have the burden of proof that is required before arguing that the portfolio model should be replicated.

Why in the world would I assume an article that appears plainly to refer to a report would name it correctly, and the link would lead to it, and the author would identify the one passage which dismayed him?

Are you being serious?

The question of whether THESE test scores are meaningful is not answered by what teachers believe.

The absolute deforming of American education by the obsession with test scores does not negate the issue of whether the scores are meaningful, and if there is value in trying to raise them.

Michael,
Take a breath. Calm down and reread the post and response.

You wrote:

Do differences among, or, better yet, gains in, these tests predict later outcomes of interest? Are the coefficients of interest meaningfully large and reasonably bounded?

I have no idea, but presume that people working with them do.

I responded: why would teachers presume that?

There are two issues here, academic research versus real world policy. Actually, there are three: quantitative research, education policy in the inner city, and politics.

I'm looking forward to thoroughly reading Harris's research and his future research. But there's no hurry (and I need to get my book back to the publisher). I'm assuming that Harris's research on New Orleans will be a goldmine for both real world education policy and discussions regarding the best ways to improve the highest-challenge schools.

Then there's politics and the urgency it brings. We can't let it be said unchallenged that NOLA test score increases are evidence that its model was more effective in the real world than early ed.

Reread the post. Why, in an atmosphere that stresses test score gains, would you assume that they say anything about the real world? That question was asked and answered for most educators long before the NOLA study started.

Apparently your "why in the world" referred to my post about test scores, not about the misleading writing leading to my expectations about the report to which you referred. Sorry.

I'll calm down.

The point about what test scores increased by interventions "mean" has intrigued me since the early (crappy) evaluations of Head Start.

Would increases in scores lead to the same gains on subsequent outcomes that the apparent "effects" of existing differences in test scores (cross-sectional) would lead you to expect? I would have to ask Adam Gamoran (who was at the same conference as Doug).

Do the increased scores in New Orleans, however accomplished, portend better school success later, however defined and measured? Hopefully, Doug or someone else is tracking that.

But, you mentioned early childhood education. I have always thought that very well-funded, very well-structured early childhood education was the best possible policy choice. That seemed to be consistent with Art Reynolds' and others' research. (Although, as an economist, Doug might want to do a cost-benefit or efficiency analysis.)

But, I think that raising test scores and doing early childhood education should not be opposed. For teaching purposes, I once took the published results from the Perry Pre-School Project, and calculated a path model relating the program to outcomes over time.

A great deal had been made of the fact that the initial test score gains of the Project kids faded out, but that they nevertheless enjoyed greater success down the road, e.g. lower special ed. placements, less retention, less juvenile delinquency, greater eventual education attainment, etc. The implication drawn was that test scores were not an important outcome.

What I discovered, however, was that the kids who enjoyed those later successes WERE THE KIDS WHOSE INITIAL TEST SCORES WERE RAISED (and later faded out). On average, kids whose scores had not increased did not enjoy the benefits attributed to the program. This is one reason I am not as quick as some to dismiss test scores as unimportant. (Other reasons have to do with the association of higher test scores with higher education attainment, occupational position, and earnings.)

John, the fact that Harris, panelists, and several supporters of the portfolio model didn't respond to your draft isn't evidence that test score growth isn't meaningful, but maybe that they're not concerned with what you have to say on the topic.


Disclaimer: The opinions expressed in This Week In Education are strictly those of the author and do not reflect the opinions or endorsement of Scholastic, Inc.