Extreme school ratings: Ohio’s proposed gap closure indicator requires greater scrutiny
The U.S. Department of Education recently granted Ohio relief from No Child Left Behind’s (NCLB) most onerous mandates. (Note: while the USDOE has approved the waiver, the Ohio General Assembly has not yet passed the legislation needed to implement it.) To receive relief from NCLB, Ohio was required to present a school accountability plan that would put its 1.75 million students on a college- and career-ready path. Ohio’s NCLB waiver promises a revamped accountability system based on three indicators of school quality: (1) student achievement, (2) student growth, and (3) achievement gap closure. The three indicator scores (reported as percentages) are averaged with equal weight to determine a school’s overall performance.
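The equal-weight averaging can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not ODE’s actual methodology; the function name and the sample scores are invented for the example.

```python
def overall_rating(achievement: float, growth: float, gap_closure: float) -> float:
    """Average the three indicator percentages (0-100) with equal weight."""
    return (achievement + growth + gap_closure) / 3

# A hypothetical school strong on achievement and growth but weak on gap
# closure: the gap closure score drags down a full third of the rating.
print(overall_rating(85.0, 80.0, 25.0))  # ~63.3 percent
```

Because each indicator carries exactly one-third of the weight, an extreme gap closure score (0 or 100 percent) moves a school’s overall rating by up to 33 points.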
The proposed system’s third indicator, gap closure, is a newly conceived measure of how well nationally defined student subgroups (e.g., racial, economically disadvantaged, special education, English language learner) perform on standardized tests relative to a state-designated baseline test score, known as an annual measurable objective (AMO). All school buildings have at least one student subgroup; however, a school is accountable for a subgroup’s scores only if 30 or more of its students belong to that subgroup (there are nine NCLB-defined subgroups).
To gauge how well schools would perform under the proposed accountability system, the Ohio Department of Education (ODE) simulated schools’ performance using 2010-11 report card data. ODE’s simulated results, however, call into question the validity of its gap closure indicator.
Here’s why. Consider the distribution of Ohio school buildings’ overall ratings (Figure 1). The vertical axis indicates the number of school buildings at each rating, and the horizontal axis shows the rating scale, expressed as a percentage. Most school ratings fall within a relatively narrow band between 70 and 90 percent, in a roughly normal distribution with a leftward skew (mean = 74 percent, standard deviation = 16.9).
Figure 1: Overall school building ratings relatively evenly distributed around mean. Distribution of overall school building ratings, ODE simulated results using 2010-11 data.
Source: Ohio Department of Education and author’s calculations. Note: A higher percentage for a building’s rating—the horizontal axes in Figures 1 and 2—corresponds to a higher grade. This is equivalent to how student grades are calculated (e.g., “A” ≡ 90 to 100 percent). The overall school building rating comprises three equally weighted indicators: (1) student achievement, (2) student growth, and (3) gap closure.
Now consider how school buildings are distributed according to Ohio’s proposed gap closure indicator (Figure 2). Again, the vertical axis indicates the number of school buildings, while the horizontal axis indicates a school’s gap closure rating. The distribution of schools’ gap closure ratings looks very different from the distribution of overall ratings. Gap closure ratings are nearly evenly dispersed across the entire rating scale (mean = 64 percent, standard deviation = 33.5). Moreover, a large number of schools fall at the extreme margins of the distribution: 890 out of 3,275 buildings received a 100 percent rating, while 320 received a rating of 25 percent or less.
Figure 2: More than one in three schools rated at extreme margins (indicated in bright red: 100 percent or under 25 percent) for gap closure indicator. Distribution of gap closure rating by building, ODE simulated results using 2010-11 data.
Source: Ohio Department of Education and author’s calculations. Note: Buildings without reported data were removed from calculation.
The distribution of gap closure ratings is an anomaly. Why don’t we see a more balanced, normally distributed dispersion of school ratings, similar to what the overall ratings show—with schools gravitating towards the mean? Moreover, should we conclude that the 890 schools that received a 100 percent rating are marvelously narrowing achievement gaps, while the 320 buildings that received less than 25 percent are miserably failing?
These questions warrant a closer examination of the schools at the extremes. Perhaps we’ll find that the top-performing schools actually have only one or a few subgroups to educate, while those at the bottom of the distribution must educate students across many subgroups.
A preliminary scan of schools supports this hypothesis. Take Midway Elementary School, a rural all-White school in Madison County: it received a 100 percent gap closure rating because it met the test score benchmark for its one subgroup—White students. Meanwhile, Charles Mooney Elementary School in Cleveland received a 0 percent gap closure rating because it did not meet the state standards for any of its five subgroups.
Narrowing achievement gaps for disadvantaged subgroups is a legitimate educational objective, and the U.S. Department of Education is right to require states to include gap closure in annual school and district report cards. But school ratings should also reflect effort. If some schools do in fact receive 33 percent of their overall rating points virtually free—simply because they have few subgroups—ODE should consider adjusting its gap closure rating formula. Perhaps ODE could adjust the ratings of high-subgroup schools upward based on their number of racial minority or special education students. This would amount to a degree-of-difficulty adjustment. (Think figure skating scores: missing a triple axel is punished less than missing a single axel.) Another alternative would be to reduce the weight of the gap closure indicator below 33 percent for single- or low-subgroup school buildings. These adjustments would ensure that low-subgroup schools are not unfairly rewarded and high-subgroup schools are not excessively punished in their overall school building ratings.
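One way to picture the second alternative—scaling the gap closure weight with a school’s number of accountable subgroups—is the sketch below. To be clear, this is a hypothetical scheme of the author’s own construction, not anything in ODE’s proposal; the function name, the five-subgroup threshold for full weight, and the sample scores are all assumptions for illustration.

```python
def adjusted_overall(achievement: float, growth: float, gap_closure: float,
                     n_subgroups: int, full_weight_at: int = 5,
                     base_weight: float = 1 / 3) -> float:
    """Scale the gap closure weight with subgroup count (hypothetical scheme).

    A school with full_weight_at or more subgroups gets the full 1/3 gap
    closure weight; schools with fewer get proportionally less, with the
    remainder split evenly between achievement and growth.
    """
    w_gap = base_weight * min(n_subgroups, full_weight_at) / full_weight_at
    w_other = (1 - w_gap) / 2  # split the rest between the other two indicators
    return w_other * achievement + w_other * growth + w_gap * gap_closure

# A one-subgroup school's perfect gap closure score now counts for only
# 1/15 of its overall rating rather than 1/3:
print(adjusted_overall(70, 70, 100, n_subgroups=1))  # 72.0
print(adjusted_overall(70, 70, 100, n_subgroups=5))  # 80.0 (full 1/3 weight)
```

Under a scheme like this, a single-subgroup school such as Midway Elementary could no longer earn a third of its overall rating from one benchmark, while a five-subgroup school such as Charles Mooney would be rated exactly as the current proposal rates it.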
Ohio’s current accountability system includes four indicators: (1) student achievement, (2) student growth, (3) school performance, which includes graduation and attendance rates, and (4) adequate yearly progress, which includes the performance of racial, special education, and other subgroups. The proposed accountability system, as described in Ohio’s NCLB waiver application, keeps indicator (1), student achievement, the same; modifies indicator (2), student growth, by adding a graduation gap closure measurement; eliminates indicator (3), school performance; and replaces (4), adequate yearly progress, with the new gap closure indicator. The most applicable pages (pp. 51-52) from Ohio’s NCLB waiver, revised May 24, 2012, can be found here.
NCLB, Public Law 107-110, Title IA, Section 1111 (2)(C)(v)(II) defines the student subgroups accountable to an AMO: “The achievement of--(aa) economically disadvantaged students; (bb) students from major racial and ethnic groups; (cc) students with disabilities; and (dd) students with limited English proficiency.” Thus, an all-White school with no special education, economically disadvantaged, or ELL students would have at least one subgroup—White students.
In a follow-up piece, we plan to look at the schools in the extreme margins to ascertain whether those at the top of the distribution (100 percent ratings) are simply schools with only one or a few subgroups, while those at the bottom are schools with numerous subgroups.