by Bob Somerby
Caveat lector

9 April 2001

Our current howler: Why Kate can’t read

Synopsis: Who flunked the recent NAEP fourth-grade test? Let’s start with the New York Times.

Gap Between Best and Worst Widens on U.S. Reading Test
Kate Zernike, The New York Times, 4/7/01

Occasionally, angry readers write to ask us when we’re starting the new web site—the site devoted to urban ed issues, which we discussed a few months back. Alas! We’re working hard on a deathless book about the coverage of the 2000 campaign (The Spinning of the President, Year 2000); and we postponed the start of our incomparable new site to review press coverage of the Bush budget plan. But a front-page article in the New York Times made us wish that the site were active. Saturday’s article reviewed Y2K NAEP test results. Let’s take a look at what it said.

"Gap Between Best and Worst Widens on U.S. Reading Test"—so read the article’s page-one headline. Kate Zernike’s opening paragraph:

ZERNIKE (1): Results of nationwide fourth-grade reading tests released yesterday show a widening gap between the very best students and the very worst despite a decadelong emphasis on lifting the achievement of all students.

"From 1992 to 2000," Zernike wrote, "the average reading scores for fourth graders on the National Assessment of Educational Progress, known as the nation’s report card, remained flat. The average score for top students increased while the average score for bottom students declined even more significantly."

Even more significantly! It sounded bad. "The release of the scores led to a round of finger-pointing over the cause of the growing gap," Zernike claimed. In paragraph four, she proved it:

ZERNIKE (4): Kati Haycock, director of the Education Trust, a nonprofit group that advocates for disadvantaged students, said the numbers spoke of "a frightening sort of educational Darwinism," adding, "It would appear that in a deeply misguided response to demands for higher achievement, schools are focusing their efforts and resources on those students most likely to succeed while neglecting the students who most need help."

A frightening sort of educational Darwinism! In paragraph five, Zernike noted other complaints. "Others said the problem was that teachers had failed to learn the best ways to teach reading," she said. (In paragraphs 8 and 9, she cited more claims. For full text, see postscript.)

Let’s review. According to Zernike, there was a widening gap between the best and the worst. Average scores for top students were increasing, while average scores for bottom students declined "even more significantly." This had led to a round of finger-pointing, she said. She quickly gave one excited example—and never cited anyone saying that there may not be a "problem" at all.

Is there anything in the NAEP results to justify this sort of coverage? Alas! A closer look at Zernike’s work suggests where the biggest incompetence may lie—among some of the nation’s excitable adults, not among the nation’s schoolkids.

LET’S TAKE A LOOK AT THE ACTUAL SCORES THAT PRODUCED this coverage in the Times. How did the fourth-graders really do? When we get inside the actual figures, the changes in scores are remarkably small. For example, in paragraph 12, Zernike notes an intriguing fact: "the average score [of all tested students] in 2000 was 217, the same as in 1992," she writes. As it turns out, the scores recorded by top and low scorers hadn’t changed very much, either.

How did the top scorers do? Zernike provides the scores achieved by students at the 90th percentile. In 1992, she says, students at the 90th percentile scored 261 on the NAEP’s 500-point scale. In 2000, the score was 264. She quotes a NAEP spokesman saying that the change in scores is "statistically significant" (no elaboration on that claim is offered), but one can’t help being skeptical, given the tiny score change. Indeed, all the way down through the scores, the changes are very slight. According to a chart which Zernike provides, scores also went up by 3 points at the 75th percentile (from 242 to 245). At the 50th percentile, scores went from 219 to 221. At the 25th percentile, scores dropped by one point; kids at that level scored 194 in 1992, 193 in the year 2000. Surely, even alarmists aren’t going to say that this one-point drop on a 500-point scale is significant. And remember—these scores are derived from a national sample of 8000 kids, from whom Zernike attempts to draw conclusions about the much larger national student population. But there is no such thing as a "perfect" sample; the slightest change in the composition of the 8000-student group could account for a tiny, one-point change. It is almost impossible to draw conclusions from score changes that are as slight as these. It is absurd to draw any sweeping conclusions from the score changes shown at these levels.

Zernike’s overview does reflect one reality. The largest score change comes near the bottom of the scale; at the 10th percentile, the score dropped from 170 to 163. Based on that score change, it may well be that the nation’s lowest-achieving students were somewhat more capable in 1992 than they were in the year 2000. (It’s possible; it’s surely not obvious.) But does that reflect some change in the way the schools have performed? Zernike is quick to quote one loud alarmist crying out about "educational Darwinism"; Haycock, the person Zernike quotes, simply assumes that this change in scores reflects a change in the work of the schools. But that conclusion is far from obvious. For example, has the student population changed in significant ways? If more kids now speak English as a second language, it would hardly be surprising if scores dropped near the bottom of the scale. This would not necessarily reflect any change in how the schools were performing. Are more kids coming from troubled homes, however defined? Again, a change in scores could result from this, not from the conduct of schools. Two things, then, can be said about this change in the score at the 10th percentile. It’s hard to know if this change means anything at all; that is, it’s hard to know if the entire population would have scored like this, if all the nation’s students had been tested. And if the entire population did score like this, it would be hard to know if that change had resulted from a failure on the part of our schools, or from some change in the student population. But Zernike fails to note the problems inherent in sampling, and rushes to quote a loud "finger-pointer," who asserts that the schools have done something that is horribly wrong. There is no evidence offered (none at all) to back up Haycock’s loudmouth assertion. But there it is, in paragraph 4, with no countervailing outlook provided, and with no one ever noting that it is very difficult to draw conclusions from data like these.
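
For readers who want to check the arithmetic, here is a minimal sketch that tabulates the five percentile scores quoted above and the size of the "gap" in each year. The scores come straight from the article and its chart; the names SCORES, point_changes and gap are our own, invented for illustration. The sketch only restates the reported numbers. It cannot say whether any of the changes would survive a different sample.

```python
# Scores quoted in the article and its chart: percentile -> (1992, 2000).
SCORES = {
    90: (261, 264),
    75: (242, 245),
    50: (219, 221),
    25: (194, 193),
    10: (170, 163),
}


def point_changes(table):
    """Return the 1992-to-2000 change in points at each percentile."""
    return {pct: after - before for pct, (before, after) in table.items()}


def gap(table, year_index, top=90, bottom=10):
    """Return the distance between the top and bottom percentile scores.

    year_index is 0 for 1992 and 1 for 2000.
    """
    return table[top][year_index] - table[bottom][year_index]


if __name__ == "__main__":
    for pct, delta in point_changes(SCORES).items():
        print(f"{pct}th percentile: {delta:+d} points")
    print("90th-to-10th gap, 1992:", gap(SCORES, 0))
    print("90th-to-10th gap, 2000:", gap(SCORES, 1))
```

On these figures, the distance between the 90th and 10th percentile scores grew from 91 points to 101. Seven of those ten points come from the single drop at the 10th percentile.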

FINAL POINT: ANYONE WHO FOLLOWS EDUCATION REPORTING will notice one thing—our press corps simply loves to report how dumb our schoolchildren are. But the comedy almost always comes from the incompetence of the reporters themselves. Zernike pens an excited tract based on exceptionally limited data. And how carefully has she reviewed those data? Consider this error-strewn passage:

ZERNIKE (12): The Department of Education reports the scores on a scale of 0 to 500 and by achievement levels: below basic, basic, proficient or advanced. The average score in 2000 was 217, the same as in 1992. The average scores of students in the bottom level dropped 7 points, to 163 from 170, and the scores in the top level rose to 264 from 261. In both cases the changes, while small, were statistically significant, said Gary W. Phillips, the acting commissioner for the department's National Center for Education Statistics.

(13) The percentage of students scoring at the advanced level increased to 8 percent from 6 percent between 1992 and 2000, and the percentage above proficient rose to 32 percent from 29 percent. The percentage below basic, 37 percent, barely changed.

In paragraph 12, Zernike says that the average score of the "advanced" group was 264 in last year’s testing. But the chart included with her article shows that this statement is simply wrong—according to the chart, 264 was the score attained by the 90th percentile child, who was not even part of the eight percent scoring at the "advanced" level (see paragraph 13). Similarly, Zernike says that 163 was the average score of students in the "below basic" group. That’s wrong too—163 was the score recorded by the kid at the 10th percentile (37 percent of the kids scored "below basic"). These errors play a minor role in the overall story, but they show the lack of attention and skill which the Times has brought to this story. Indeed, all over the press corps, reporters and editors who can’t describe the simplest facts offer sweeping judgments about the schools. In this article, the Times fails to report the simplest facts about the scores attained on this test. But there is the Times, out on page one, quoting a set of loudmouth spinners, who make a set of wholly unsupported claims about what these test scores surely must show.

The world isn’t going to come to an end because of Zernike’s article. But routinely, the ed press makes much more significant errors in reporting the state of our schools. Who pays the price for this incompetence? The burden falls where the help is most needed—on children attending our city schools. Those children are part of an educational disaster; they are owed a careful review of their plight. But Zernike can’t even describe simple test scores. What are the odds that she’ll ever figure out what’s up in our troubled city schools?

More excitement: Everyone Zernike quoted or cited was in a tizzy about the new test scores:

ZERNIKE (8): Federal education officials called the scores disturbing and a sign that education colleges were not imparting the latest ways to teach reading. National reports show plenty of evidence about the best methods, they said, but in the field, educators are still warring between whole language and phonics, and the proven methods are not filtering down to those who need them most. The best method, several researchers and national panels have said, is neither pure whole language nor pure phonics but more of a hybrid, which would emphasize teaching children to decode the meaning of words.

(9) "Although we talk about reform, not all the classrooms of America are seeing this reform," said Marilyn Whirry, a teacher in California and a member of the National Assessment Governing Board, which oversees the test.

Everyone Zernike cited assumed that the data showed a real decline among low achievers. And everyone assumed that this real decline was somehow being caused by the schools. No one said that the test score changes were minor. And no one said that the decline in performance, even if real, may not have been caused by the schools.


The occasional update (4/9/01)

Third way: Meanwhile, Andrew Toppo of the Associated Press came up with a third explanation. According to Toppo, that "163" wasn’t the average score of the "below basic" group. And it wasn’t the score at the 10th percentile, either. Toppo penned a third account. Here’s his work from the Chicago Tribune:

TOPPO: But while students in the top 10 percent increased their average score a bit—from 261 to 264—the average scores of readers in the bottom 10 percent dropped from 170 to 163.

According to Toppo, that 163 was the average score of kids in the bottom 10 percent. We are assuming that that’s incorrect, but Andrea Billups of the Washington Times reported it that way too:

BILLUPS: Worse still, the gap between the nation’s best readers and its most struggling readers continues to widen, with students who scored in the top 10 percent of the exams increasing their average score from 261 to 264, while those in the bottom 10 percent fell from 170 to 163.

So, if you were reading about the NAEP test on Saturday, you had your choice about that 163. It was one of these:

  1. The average score of the "below basic" students (the bottom 37 percent)
  2. The average score of the bottom ten percent
  3. The score of the kid at the 10th percentile

Everyone agreed—there was a "163" in the NAEP report. They just couldn’t figure out what it meant.
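
Do the three readings really differ? A small sketch, on made-up numbers, suggests they do. Everything here is an assumption for illustration: the 100 invented scores, the cutoff of 174 (chosen only so that 37 percent of the invented scores fall below it) and the "nearest-rank" rule for finding a percentile score. None of it is NAEP data or NAEP method.

```python
import math
from statistics import mean


def mean_below_cutoff(scores, cutoff):
    """Reading 1: average score of everyone below an achievement cutoff."""
    return mean(s for s in scores if s < cutoff)


def mean_of_bottom(scores, percent):
    """Reading 2: average score of the lowest `percent` of test-takers."""
    ordered = sorted(scores)
    count = max(1, math.ceil(percent * len(ordered) / 100))
    return mean(ordered[:count])


def score_at_percentile(scores, percent):
    """Reading 3: the single score at a percentile (nearest-rank rule)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 100 evenly spaced scores, 100, 102, ..., 298.
SAMPLE = list(range(100, 300, 2))
HYPOTHETICAL_CUTOFF = 174  # puts 37 of the 100 invented scores "below basic"

if __name__ == "__main__":
    print("Average of the below-cutoff group:", mean_below_cutoff(SAMPLE, HYPOTHETICAL_CUTOFF))
    print("Average of the bottom 10 percent: ", mean_of_bottom(SAMPLE, 10))
    print("Score at the 10th percentile:     ", score_at_percentile(SAMPLE, 10))
```

On this invented sample the three readings come out to 136, 109 and 118. The score at the 10th percentile is the highest score in the bottom tenth, so it sits above that group's average. The below-cutoff average covers a much larger group and lands higher still.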

Our question: As we continue to study why Johnny can’t read, shouldn’t we study the press corps too?