Building a test worth teaching to

12.14.2011

“Believing we can improve schooling with more tests,” Robert Schaeffer of FairTest once argued, “is like believing you can make yourself grow taller by measuring your height.”

It’s a great line. Such statements are the seductive battle cries of the anti-standards and anti-assessment crowd. But is there any reason behind this kind of rhetoric?

Parents rarely complain that their babies are being weighed and measured too much, even though it can create an extra burden at an often stressful time in their lives. That’s not because parents naively believe these basic tests will make their babies grow faster or taller, but rather because they trust that their doctor will use the data from these and other tests to flag early problems and develop individualized plans to help their children thrive.

Of course, education assessments—particularly end-of-year summative assessments—are far more complicated than scales. But the purpose of tests in school is no different: to flag problems early and often so that they can be addressed before they become lifelong issues.

In education, as in medicine, there are unintended consequences to relying on a limited number of tests in a narrow range of subjects. According to a report released by Common Core last week, 76 percent of teachers feel that critical subjects like science, history, and art are being “crowded out by extra attention being paid to math and language arts,” and 93 percent of those teachers believe that this crowding is a direct result of the state testing regimes that focus almost exclusively on reading and math.

But, too frequently, people see these unintended consequences and seek to throw the baby out with the bathwater—they argue that we should abandon standards- and assessment-driven reform because our current experiment has so far fallen short. That is a mistake. In the end, our biggest problem isn’t that we test students too often, but rather that the quality and scope of tests we administer year in and year out are poor.

A quick scan of the battery of released reading tests on state websites reveals a distressing array of inane reading passages and low-quality questions that promote exactly the kind of instruction we want to avoid. In reading, for instance, rather than selecting passages for their word length and asking students to make rather empty “text to self” connections, why not select passages based on their literary merit and ask students to analyze the author’s actual words? Or to defend a text-dependent thesis statement? And why not focus informational passages on important and grade-appropriate history and science content, the content that our education standards already ask students to master and that teachers might spend more time teaching if we held students accountable for knowing it?

The reason is simple: too many states have low-quality assessments because too few states (if any) make getting assessment right a top priority. States spend a comparatively minuscule amount of their budgets on assessment. In Ohio, for instance, a back-of-the-envelope calculation reveals that assessment accounts for a mere 0.7 percent of the state’s total education spending. (In other states, I’m sure the figure is similar.) We pay for a household scale, but we want the diagnostic functionality of an MRI.

And yet we all know that, in order for standards to gain traction in the classroom and drive the kind of educational change that reformers on all sides of the debate want to see, teachers must have access to useful and reliable achievement data gathered through sophisticated assessments. They must be able to diagnose where individual students are struggling so that they can target extra help, and they need to be able to identify where the class is struggling so they know when to move on and what to focus on if the group isn’t yet ready.

And so our challenge is not to abandon testing and hope for the best. Encouraging teachers to stop “teaching to the test” makes about as much sense as encouraging doctors to stop “treating to the diagnostic.” The two are—and should be—linked. Instead, our challenge is to develop a test on which only students with deep content mastery can succeed. In short, we simply must develop a test worth teaching to.