This week, we’ve featured principals, teachers, and some of our favorite policy thinkers reflecting on the current state of teacher evaluations—what’s working, and what still needs to change to ensure schools have the information they need to give their students the best teachers they possibly can. Today, we finish our look back at The Widget Effect with a few reflections on what we here at TNTP have learned about teacher evaluation over the last five years.
1. Implementation matters more than design.
Many states and school districts have designed new teacher evaluation systems over the last five years, but not nearly as many have fully implemented them, or implemented them well (at least so far). Rewriting laws or renegotiating contracts to design a new evaluation system can make for high-profile political drama, but in many ways it’s the easiest part of the process—because even a perfectly designed evaluation system is only as good as its implementation.
Requiring a higher number of classroom observations every year won’t ensure principals are actually giving teachers accurate, useful feedback. Deciding that a certain percentage of evaluations should come from student learning measures doesn’t solve the tricky problems of finding good measures for teachers in non-tested grades and subjects. And establishing more rigorous expectations for teachers doesn’t magically eliminate rating inflation (as we’ve seen time and time again over the last few years).
Looking ahead: Plan to focus on implementation over the long run. Many school systems view “implementation” of a new evaluation system as making a huge effort up front to train administrators and explain the new system to teachers. Both are important, but they’re only the first step. Quality implementation, and with it realistic ratings, requires constant follow-up by the people responsible for operating schools, from Chief Academic Officers to middle managers to principals and assistant principals. They all need easy, real-time access to evaluation data, and they need to course-correct as necessary during the school year to ensure fairness and accuracy.
2. Multiple measures—including data about student learning growth—are the way to go.
Until a few years ago, teacher evaluations consisted of a single measure: classroom observations. Many school districts assigned teachers evaluation ratings based on one or two lessons that an administrator happened to watch. Common sense suggests this is not the best way to get a complete picture of a teacher’s performance over the course of an entire year or any sense of how much students learned in that classroom.
That’s why almost every state and school district that has redesigned its teacher evaluation system in recent years has opted for a model that considers multiple measures of a teacher’s classroom performance. These new systems typically supplement classroom observations with objective measures of student learning (sometimes value-added, sometimes other measures). Some also include the results of surveys that capture students’ opinions about their experience in class.
The multiple measures approach, which includes objective student learning data as one of those measures, reflects not only common sense but also findings from the Measures of Effective Teaching (MET) Project, the most comprehensive research to date on measuring teacher performance. Just as importantly, researchers concluded that it’s nearly impossible to differentiate performance using classroom observations alone.
While the issue of evaluation remains hotly debated, multiple measures might be the one place where something resembling a consensus has emerged. That’s a positive thing we should celebrate.
Looking ahead: Use multiple measures, but also multiple sets of eyes. Another finding from the MET Project is that having at least two different people observe the same teacher over the course of the year provides a much clearer picture of that teacher’s performance. In other words, the quantity of observers matters even more than the quantity of observations. Yet this is a lesson that few school systems (with the notable exception of Washington, D.C. Public Schools) have actually taken to heart.
3. You can’t fix evaluations if observers don’t rate accurately.
For all the debate about value-added measures of teacher effectiveness (and it’s a huge chunk of the entire debate about evaluation reform), classroom observations are still the predominant factor in almost every teacher evaluation system. Critics of evaluation reform often suggest that value-added is the primary or even the only factor in new evaluation systems, but the truth is that value-added doesn’t even apply to large numbers of teachers right now. Classroom observations are the only part of evaluations that nearly every teacher experiences.
Observations are also one of the best examples of the gap between design and implementation. If you’re concerned about the potential variability of value-added scores, you should be truly frightened by the statistical Wild West that is classroom observations. In most schools, principals and other observers are still failing to evaluate accurately and with rigor, so even new observation rubrics specifically designed to better distinguish levels of teacher performance have produced the same old results that rate every teacher “good” or “great.” As long as the problems with classroom observations remain unaddressed, The Widget Effect won’t be going anywhere—value-added or no value-added.
Looking ahead: Improve observation rubrics by shrinking them. Many new evaluation systems are built around observation rubrics that measure teachers’ performance in dozens of different areas. As we explained earlier this year, these bloated rubrics tend to muddle the accuracy of ratings and the quality of feedback teachers receive. We’ve found that rubrics with just four or five components—all focused squarely on how students respond to instruction—capture differences in teacher performance just as well as longer rubrics, without all the confusion.
4. Done right, teacher evaluations really can help teachers and students.
It’s important to remember that improving teacher evaluations is just a means to an end: improving the quality of instruction students receive every day. That’s exactly what IMPACT, the teacher evaluation process in Washington, D.C. Public Schools, seems to be doing.
According to a study released last year by two of the most respected education researchers in the country, IMPACT is helping teachers improve their instructional skills and helping DCPS retain far more of its best teachers than its least effective teachers.
IMPACT’s success is a big deal, because it led the current wave of new evaluation systems. It was already in development before we published The Widget Effect and before the Obama administration launched Race to the Top. Like many other evaluation systems that have debuted in recent years, IMPACT was controversial—after all, adjusting to any big change that affects every teacher and principal in a school district will never be easy.
These early road bumps didn’t shake DCPS’ commitment to IMPACT, though. The district worked with teachers and principals year after year to improve the system and improve its implementation to get IMPACT to where it is today. IMPACT shows us where other evaluation systems can be in a few years if leaders in states and districts stick with them.
Looking ahead: Stay the course. Reversing the widget effect through better teacher evaluations is more than a bureaucratic compliance exercise. It represents a sea change in how everyone involved in our public schools thinks about and manages the quality of instruction. A culture of almost total indifference to teacher performance won’t change overnight, and it won’t change because of a new rubric and some memos from the central office. But it is critical that state and district leaders persist through the anxiety and resistance that come along with any big change, learning and adapting as they go, and see through the process of changing the culture. It’s no coincidence, for example, that DCPS became the fastest-improving urban district in the nation after focusing everyone in the system first and foremost on the quality of instruction in classrooms, a direct result of a new evaluation system.