Arraying test scores as a function of income:
A definitive view on Maine’s rank in school 'achievement'
Brian Hubbell
February 4, 2011



There are deeply sound reasons for holding fundamental reservations about taking standardized test scores as markers of what’s valuable in education. At best, tests may represent a measure that parallels the actual accomplishments of learning. Often, they may instead represent counterproductive distraction. Worse, they may cause true damage by what they displace.

But, in the bluster of accountability that has rattled school roofs ever since the Bush-Kennedy El Niño of NCLB, the only paradigms left standing are data-driven. And data-driven now means evaluated by standardized tests, whether one believes in them or not.

So, accepting this scuffed-up hardpan as the new cricket pitch, let us resolve to match rigor with some statistical scruple. Here’s an offering in that vein.

Problem and available tools

In Maine as elsewhere, ardent school reformers of many stripes begin with two premises: 1) that our school systems are falling short by standardized measures; and 2) that examples beyond Maine’s borders can be shown as modeling improved performance. If so, those conditions ought to be objectively, reproducibly, and conclusively evident.

At present, for better or worse, there are only three sets of standardized tests that can be used in any way to compare the current performance of Maine schools with those in other states. These are the SAT, the NECAP, and the NAEP tests.

NECAP - narrow comparison in a strong neighborhood

Of these, the newly-adopted NECAP offers only a two-year-long comparison to three other nearby states: Vermont, New Hampshire, and Rhode Island. This test may eventually provide meaningful data about how Maine performs within the educationally strong New England region. (Below I'll explain why I think Vermont may be of particular interest.) But NECAP offers no direct measure on the national field.

SAT - scores skewed by participation rate

It's almost impossible to draw valid comparisons between states using average SAT scores because the pool of test takers is a self-selected subset that varies significantly in every state.

In the western half of the US, where the ACT generally stands as the college admission test, fewer than a quarter of the school population takes the SAT, while those who do have disproportionate ambitions for attending exclusive eastern colleges. With participation rates as low as three percent, these states tend to have the highest average scores.

On the other extreme, Maine is the only state which, for better or worse, requires the SAT of every high school student, even those with substantial learning disabilities and those with an utter absence of college ambitions.

As a result, not surprisingly, with the highest national participation rate, Maine has the lowest average SAT score - a data point that even the Governor was quick to seize, erroneously, as evidence that Maine stands “in the bottom third nationally in achievement.”

From this it may be tempting to speculate about Maine’s relative performance in relation to states with relatively high participation rates. But the College Board definitively warns against this, cautioning that the demographic makeup of each state’s self-selected test-taking subset makes all between-state comparisons inherently invalid.

NAEP - the absolute measure putting Maine in the top third

That leaves the NAEP, which is specifically designed for uniform implementation and valid interstate comparison.

There are four (and only four) NAEP tests that are consistently administered in all 50 states: fourth-grade math, fourth-grade reading, eighth-grade math, and eighth-grade reading. Again for better or worse, these are the only standard tests from which valid comparisons can be made about school performance across state lines. This is what, in the broad “data-driven” world, we have to work with.

The crudest state educational ranking can be made by summing each state’s scale scores from all four tests and sorting the totals.

Doing this with the most recent scores from 2009 shows Maine to rank 13th out of 50 states, tied with Colorado and Ohio. This should serve as an absolute benchmark for placing Maine’s schools in the national context - irrefutably in the top third.
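The summing-and-sorting procedure is simple enough to sketch in a few lines of Python. The four component scores below are illustrative placeholders (chosen so the aggregates match those discussed here), not a transcription of the full 2009 results:

```python
# Crudest ranking: sum each state's four NAEP scale scores, then sort.
# Component scores are illustrative placeholders, not the actual 2009 data.
naep_scores = {
    # state: (grade-4 math, grade-4 reading, grade-8 math, grade-8 reading)
    "Massachusetts": (252, 234, 299, 273),
    "Maine":         (244, 224, 286, 268),
    "Colorado":      (243, 226, 287, 266),
    "Mississippi":   (227, 211, 265, 251),
}

# Aggregate = simple sum of the four scale scores.
aggregate = {state: sum(scores) for state, scores in naep_scores.items()}

# Sort descending; rank 1 = highest aggregate (ties share a total).
ranking = sorted(aggregate, key=aggregate.get, reverse=True)
for rank, state in enumerate(ranking, start=1):
    print(rank, state, aggregate[state])
```

Note that ties (such as Maine and Colorado, both at 1022 in 2009) survive this procedure as equal totals; only the sort order between them is arbitrary.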

NAEP - plotted against affluence

But if, instead of an absolute measure, what we’re searching for is evidence of successful educational practice - the effort and “value added” by each state’s schools - it is necessary to view these aggregated NAEP scores through the lens of the relative advantage of each state’s population of students.

The most straightforward measure of relative advantage is relative income, a measure which correlates broadly with test scores in most investigations.

The graph below plots the aggregated NAEP scores from above against each state’s median household income.

On one axis, the array shows Maine horizontally aligned with its peers in average test performance - a group which includes Ohio, Colorado, Virginia, Kansas, South Dakota, and Maryland.

On the other axis, the array shows Maine vertically aligned with its peers in average affluence: Kansas, South Dakota, Ohio, Idaho, Michigan, and Arizona.

The up-sloping linear trend-line, running from Arkansas on the left side up through Oregon and Colorado, represents the mean relationship between income and test scores. It also provides a reference: lines drawn parallel to the trend-line connect states with equivalent educational outcomes once the baseline advantage of income is adjusted for.

In this respect, the array suggests that Maine’s peers in educational effort and outcome may instead be Kentucky, Indiana, Pennsylvania, and Minnesota. In this frame, Maine schools may in fact be credited with “adding value” beyond those of Connecticut, New Jersey, and New Hampshire, despite those states’ higher aggregate scores.

Arraying the same test scores against each state's percentage of students eligible for federal lunch subsidy shows a similar pattern.

States likely to model for improvement

Strikingly, this suggests the states that Maine should be looking toward for models for improvement - and they are relatively few. It will surprise hardly anyone that Massachusetts and Vermont could be states from which Maine might learn something. But it appears that Montana, Minnesota, and North Dakota could serve as good examples applicable to Maine’s circumstances as well.

Maine's position on the charts above should also serve as a caution against policy changes adopted from other states that risk moving Maine education in the wrong direction relative to the trend-line.

The table below compares some of the characteristics that may bear upon the relative test scores and costs in these nominally high-performing states.


State   Expenditure per pupil   Pupil-teacher ratio   Pop. per sq. mile   Median income   % poor   NAEP aggregate score
NJ      $17,537                 12.0                  1,184.1             $64,143         29%      1041
VT      $14,215                 10.5                  67                  $50,619         30%      1042
MA      $13,586                 13.6                  845                 $59,981         30%      1058
ME      $11,898                 12.1                  43                  $48,032         36%      1022
MN      $10,012                 15.7                  66                  $56,956         32%      1037
MT      $9,695                  13.6                  7                   $42,778         36%      1031
ND      $9,168                  11.6                  9                   $49,450         31%      1033
US      $10,792                 15.3                  n/a                 $49,777         43%      1003
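The trend-line logic above can be sketched as an ordinary least-squares fit of aggregate score on median income, with each state's residual - actual score minus income-predicted score - standing in for "value added." As a hedge, this minimal example fits the line only over the seven states in the table, not the full 50-state data behind the graph, so its slope and residuals are illustrative rather than definitive:

```python
# Median household income and aggregate NAEP score for the seven
# table states (values taken directly from the table above).
incomes = {"NJ": 64143, "VT": 50619, "MA": 59981, "ME": 48032,
           "MN": 56956, "MT": 42778, "ND": 49450}
scores  = {"NJ": 1041, "VT": 1042, "MA": 1058, "ME": 1022,
           "MN": 1037, "MT": 1031, "ND": 1033}

states = list(incomes)
n = len(states)
mean_x = sum(incomes[s] for s in states) / n
mean_y = sum(scores[s] for s in states) / n

# Ordinary least squares: slope = cov(income, score) / var(income).
slope = (sum((incomes[s] - mean_x) * (scores[s] - mean_y) for s in states)
         / sum((incomes[s] - mean_x) ** 2 for s in states))
intercept = mean_y - slope * mean_x

# Residual = actual score minus the income-predicted score.
# Positive residual = scoring above what income alone would predict.
residuals = {s: scores[s] - (intercept + slope * incomes[s]) for s in states}

for s in sorted(residuals, key=residuals.get, reverse=True):
    print(f"{s}: {residuals[s]:+.1f}")
```

Even on this small subsample the pattern in the text emerges: Vermont sits above the fitted line and New Jersey below it, despite New Jersey's higher income.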


Data sources and references:



NAEP Scores

I am in no way qualified to comment on the statistical validity of the NAEP scores. However, I do wonder about the fact that the assessment is not given to all students in all Maine schools every year. Although I understand that mathematically the scores of a few can be extrapolated to the scores of the many, I think it would be good if that detail were mentioned whenever NAEP scores are cited - just as it is noted that the SAT is administered to all 11th-grade Maine students whether they plan to attend college or not.

NAEP sampling and error

That's a good point, Nancy.

More information about the NAEP methodology, sampling, error, and limitations can be found on this NAEP page: Cautions in Interpreting NAEP Results. (I've now added the link to the references above.)

The key point is that, relative imprecision aside, the NAEP is the only assessment designed for consistency over time and across state lines.