Field Trip: The Data Speaks: Why Teacher Evaluations Are More Accurate Today
No Child Left Behind… Race to the Top… the Every Student Succeeds Act. All have had significant impact on teacher evaluations. Over time, have such laws led to an increase in the quality of teaching in our schools?
This year the Frontline Research & Learning Institute looked at the data around teacher evaluation scores. We speak with Sarah Silverman, Ph.D., of Whiteboard Advisors, the firm that partnered with Frontline to analyze the data. Sarah is the author of the report, “Bending Toward Accuracy: How Teacher Evaluations Are Evolving.” We look at:
- Teacher evaluation scores over a 5-year period and where the data comes from
- Why those scores indicate that teacher evaluations have gotten more accurate over time
- What you can do with teacher evaluation data in your own schools to foster growth in teaching
Read the full report: “Bending Toward Accuracy: How Teacher Evaluations Are Evolving”
Hi everyone, I’m Ryan Estes, the host of this podcast. As I record this, next week is Thanksgiving. So while we’re preparing to make pies and stuffing and cranberry sauce, and while gratitude is on our minds, let me list a few of the things that I’m grateful for. I think I can speak for the entire team here at Frontline Education when I say that we’re grateful for you, the men and women working all over the country to provide a great education for our kids.
More personally, I’m also grateful that you’re listening to Field Trip, and today we’re doing something a little bit different. We’ll be back in December with more stories from leaders who are knocking it out of the park in their schools. But right now, we’re going to look at some kind of surprising data around teacher evaluations in America. If that’s your jam… great! And if not? No problem… we’ll see you in December.
From Frontline Education, this is Field Trip.
Gone are the days of teachers cringing at the thought of evaluation season, wondering what their score will mean for their salary, or if it will hurt their chances of getting that promotion.
Well, maybe not entirely. But that’s the hope, anyway.
In the post-No Child Left Behind era, the Every Student Succeeds Act has returned a lot of decision-making power around teacher evaluations to the states. And policymakers are hoping that teacher evaluations will be used first and foremost as a tool to support teacher growth.
So… is that what’s happening? The Frontline Research & Learning Institute set out to look at the data to find out what kind of impact evaluation reform has had on evaluation scores, and what that can tell us about the effect these changes are having. We spoke with Sarah Silverman, the author of the report, “Bending Toward Accuracy: How Teacher Evaluations Are Evolving.”
Sarah is from Whiteboard Advisors, the social impact consulting firm that partnered with Frontline to analyze this data.
I asked Sarah to first give me a sense of the current landscape around teacher evaluations.
SARAH SILVERMAN: I myself have a kind of storied history with educator evaluation. I started my career as an educator and spent some time in the era right after Race to the Top was launched, which was about 2009. And Race to the Top focused on making some pretty sweeping reforms to the ways that talent management was working in terms of state policies and also at the district level.
So I spent some time during that period working specifically with state leaders and often with committees that included educators, union officials, legislative staff, governors’ offices and school district leaders. We were trying to figure out how to create new educator evaluation systems that, rather than following the time-tested tack of a satisfactory/unsatisfactory checklist, were more focused on differentiating among various characteristics and qualities of teaching. Educators could hopefully use those systems to better inform their thinking about what they were good at and what they needed to improve upon, and the systems could also guide things like coaching between supervisors or instructional leaders and individual educators.
We put a great deal of effort into trying to create those systems in ways that were clear and that were helpful, both to leaders and to educators themselves. And in some cases those efforts, I think, played out fairly well. People found them really valuable and were able to connect them with professional learning.
In other cases, the design or the purpose was less clear, and ultimately, limited capacity inside school buildings and districts kept a lot of folks from implementing the systems as they were designed. And so that sometimes created less positive experiences for educators and for staff.
And in addition, there was a kind of veil of political pressure. Race to the Top was, of course, a federal initiative, and states had the choice to participate in it, but there was this sense that it was about blaming teachers for long-standing challenges in student outcomes.
And while I believe, and I know many do, that our effort was really focused on trying to create something that was going to be more helpful to educators, unfortunately a lot of folks perceived, and in some cases actually experienced, systems that felt more punitive than helpful.
RYAN ESTES: When you set out to look at teacher evaluations today, were there specific questions that you wanted to answer? What made you say, “We need to look at data around this issue”?
SARAH SILVERMAN: Folks who are in education will certainly recognize that there is a tendency for initiatives to pop up and catch fire for a while and then, in a lot of ways, disappear over time.
And we were thinking, you know, it’s been five or six years since folks started to implement these systems. Now that the pressures have shifted, both with the end of the Race to the Top grants and with the new reauthorization of the Elementary and Secondary Education Act, we were really quite curious: whatever came of those efforts? To what extent was the incredible amount of time and energy that district and school officials put into creating these new systems actually delivering on the promise of better information that could help educators improve their practice over time?
And so what we started with was, where are we now? What does the field look like in terms of who’s implementing and what are they implementing? And then we continued down the research path from there.
RYAN ESTES: Before we get into what you found, tell me: how was the study conducted? What data were you looking at?
SARAH SILVERMAN: We decided to use data from the last five years, and we really focused on findings from the last three years. We looked at information we collected from education systems that had implemented electronic evaluation processes: they had adopted some sort of new evaluation rubric, were collecting data on a regular basis, and were using an online system to track and measure progress in their evaluation system. And so we ended up with data from across the United States, from school districts large and small, from urban, suburban and rural districts: a really broad cross-section of school districts that have implemented an electronic evaluation system at some point over the last five years.
And we tried to account for the extent to which folks had been implementing for just a couple of years versus those who had been implementing for a longer period. So we did incorporate districts that started their evaluation processes during one of those years, even if they hadn’t been implementing for the full period.
RYAN ESTES: As a sidebar, why would you say it’s so important to use data as we consider teacher performance?
SARAH SILVERMAN: We all like to believe that we are better at evaluating others’ performance than we probably are, I think. Some recent data bear that out, including findings that have come out a few times from the researcher Marcus Buckingham, who has been at Gallup and ADP and who has done a tremendous amount of work looking at the distinction between what we believe and what we can tell from data. He’s observed that our ability to judge performance in the absence of good information, like data collected regularly against some sort of independent protocol, is not excellent, and other researchers have supported that finding as well.
I think it’s particularly important for us to take a look at data, especially in this format, where we can look not just at how good one individual might be at evaluating another, but at broad trends in what’s happening around the implementation itself and what is happening as a result of implementing these new educator evaluation systems. Looking at those trends helps us spot things that we may not see if we’re just looking at one classroom or one school or even one state.
RYAN ESTES: Okay, so let’s get down to it. What did you find as you conducted the study and wrote this report? Did anything in particular stand out as especially interesting to you?
SARAH SILVERMAN: When educator evaluation systems, let’s call them post-reform systems, were initially implemented about five years ago, what we found was rather disappointing. We had hoped that evaluators, given a thorough training system and a well-vetted performance rubric, would do a better job of identifying discrete and specific skills that educators either could improve upon or were especially good at, so that those educators could serve as exemplars for the rest of the school.
And what instead happened is that, by and large, most educators were evaluated in the top tier, or at least in the top two tiers, of their performance rubric. So they were effectively still getting that “satisfactory” check and not necessarily getting really specific information about their performance. And so folks, I think, concluded back then that the effort was maybe a wash in terms of its ability to create better outcomes, and perhaps a problem in that it increased the amount of work necessary to implement educator evaluation systems.
What we found was a bit surprising. Ratings did not inflate over time, with people just giving educators a “satisfactory” mark and moving on as the pressures wore off. Instead, essentially the opposite happened. As those pressures dissipated and folks were able to use the systems, or felt that they were able to use the systems, as they were originally designed, the average scores actually started to decline.
That started happening a couple years after implementation. About three years ago, average scores actually started to decline, and we think that’s not a bad thing. You might think off the top of your head, “Jeez, if educator evaluation scores are declining, does that mean educators are getting worse?”
And we really don’t think so. We think what’s happening is that these systems are becoming more accurate indicators of performance, and that creates a pretty tremendous opportunity for educators and their instructional leaders to identify where they can most intelligently put their time and resources to work on continuous improvement. We approach this from the mindset that, of course, there are educators who are exemplary, and there are probably many educators who are exemplary in at least one area. But we also know from research that it takes several years to really hone your craft, and that’s true in education as well as in lots of other domains. So we really shouldn’t expect otherwise.
So we think what we’re finding here is that evaluators, who are often principals or other instructional leaders, are able to be more accurate and feel more confident giving those ratings when the outcome is connected to helping people improve, rather than to fear of other, more punitive outcomes that may not be helpful.
I asked Sarah to go out on a limb a little bit and speculate about the reasons behind these findings.
SARAH SILVERMAN: The best we can really do at this point is probably to create an informed hypothesis that we really would love to dig into a little bit further and try to understand more.
But I think there are probably three things at play. Number one, people are getting more used to these new evaluation systems. As I mentioned, we had decades of implementing the satisfactory/unsatisfactory approach to educator evaluation, and that had become normative. It had become an exercise designed not so much for improvement purposes as to document cases where folks’ performance was really below a threshold of acceptability. And certainly we should counsel out folks whose performance is consistently really quite poor.
But the real value is in helping people who come with a variety of different skills ultimately hone the skills they need, the ones best aligned with the students in their schools and classrooms. So number one, I think evaluators are getting more comfortable with the system, and educators themselves are probably getting more comfortable with it, too.
Number two, one of the things I alluded to is this notion of political pressure. The idea that an educator evaluation was going to somehow identify the people who are good and the people who are bad and then maybe fire the bad people or otherwise create conditions that would cause them to leave their roles.
And I think that was a fairly cynical view, although in certain cases that did seem to bear out, so it’s not an unfair concern. But I think that sense of pressure has really gone away.
One of the things we did see in the data is that, in the period immediately after implementation, districts that implemented in the first year of our data set evaluated a number of folks at the bottom of the scale, and those people did appear to be counseled out or to leave the system. So there may have been a number of people continuing to work as educators who probably should have been counseled out much sooner, and this gave school districts an opportunity to do that.
But by and large districts are not using educator evaluation systems as a means to figure out how to get rid of people, so to speak, or for what I would consider not particularly moral purposes. It seems instead that they’re actually using these systems to try to give people good and accurate information.
And I think that has increased trust in the system, and it has also increased trust that the individuals responsible for evaluation will conduct evaluations in a fair and accurate manner.
In addition to comfort with the system and the sense that the pressure is off, I think the third thing is a desire for better and more helpful information. Everywhere we look, more and more folks are expecting better evidence, and, to the question you raised at the beginning of the conversation, data that help check our general intuition, which may not always be reliable. Maybe that’s because there are more data available in more domains. Maybe that’s because, generationally, people seem to be looking for more specific feedback and attention to their performance. I’m not sure exactly what’s driving it, but I think the belief that good information, and more frequent access to it, helps focus energy and attention around performance is causing these scores to bend, as we’ve said, toward accuracy: toward a more appropriate and accurate reflection of people’s performance.
RYAN ESTES: Sarah, one more question here and that is, how do we get these numbers off of the page, off of the spreadsheet? What are action items that we can take away from this, or that educators and school and district leaders can actually take away and do with some of these findings that we’ve seen in this report?
SARAH SILVERMAN: Yeah, that’s one thing we are very hopeful people will do. So number one, we always suggest that people take a look at their own data. We’ve looked at these broad trends to try to draw attention to things that might be happening in one’s own district or one’s own school, or in some cases even in one’s own classroom, but it’s really important to try to contextualize this with data from your own system.
And so number one, take a moment to take a look at those data — and it doesn’t actually take a lot. If you’ve got an electronic evaluation system, you should be able to fairly easily run a report. And for those using Frontline systems, we have some tools that will help you do that. But for the most part, if you’ve got an electronic system, you should be able to run your report and take a look at what the distribution of evaluation scores looks like.
Are you seeing that most people are still scoring in the top tier? Are you seeing that people are starting to move into the second or third tier of performance? And if so, what might be happening in your district or your school over the period covered by that report that could help explain what’s going on?
Then number two, once you’ve taken a look at what your data look like, you can develop a strategy for responding to it. So if you are in a district where the vast majority of folks continue to perform at the highest level, you might want to look at your student performance outcomes and see whether your educator effectiveness ratings and your student performance outcomes align with one another. If your teachers are performing very, very well and your students are really not performing well, it raises the question: what, if anything, could educators be doing differently, or be learning to do differently, to better serve the needs of the students in your school or district?
And then third, once folks have identified what action needs to happen, we always suggest that they develop an action plan and hold themselves accountable by looking at the data regularly to make sure their action plans are actually being implemented and they’re seeing outcomes tied to the goals and objectives they’ve set.
We think that, for the most part, folks are going to find that their data are bending but are still not perfect reflections of educator performance. Not that they’ll ever be perfect, but they may not yet be as accurate as they could be. So one thing folks might want to spend time on is thinking about, “What is the goal of our educator evaluation system, and how far are we from actually meeting that goal?”
If your goal is to create a system that reflects educators’ real performance and provides them with helpful, actionable feedback, it’s going to be important that you’re not pulling punches, so to speak. It’s going to be important that you are really giving accurate information that is aligned, as much as possible, with the rubric you’re using.
That’s what really helps someone understand the difference between a developing skill and an effective skill, or an effective skill and a highly effective skill. Without that, the exercise isn’t really worth folks’ time in the way that it should be, and certainly now, time is highly limited and the number of asks on educators’ time is only increasing. So valuing that time as much as possible and making sure that exercises like evaluation are highly valuable is just so, so critical.
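As a rough illustration of the first step Sarah describes (running a report and checking what share of ratings land in each tier of the rubric), here is a minimal Python sketch. It assumes ratings have been exported from an electronic evaluation system as a plain list of tier labels; the four tier names, the function names, and the sample ratings are hypothetical and are not taken from the report or from any Frontline tool.

```python
from collections import Counter

# Hypothetical four-tier rubric, ordered highest to lowest. Real rubrics
# vary by state and district; substitute your own tier labels.
TIERS = ["Highly Effective", "Effective", "Developing", "Ineffective"]


def tier_distribution(ratings):
    """Return the share of ratings in each tier, in rubric order."""
    counts = Counter(ratings)
    # Catch labels that don't match the assumed rubric (e.g. typos in an export).
    unknown = set(counts) - set(TIERS)
    if unknown:
        raise ValueError(f"Unrecognized tier labels: {sorted(unknown)}")
    total = len(ratings)
    if total == 0:
        return {tier: 0.0 for tier in TIERS}
    return {tier: counts[tier] / total for tier in TIERS}


def top_two_share(distribution):
    """Share of educators rated in the top two tiers of the rubric."""
    return distribution[TIERS[0]] + distribution[TIERS[1]]


if __name__ == "__main__":
    # Made-up ratings for illustration only; not data from the report.
    sample = (["Highly Effective"] * 6 + ["Effective"] * 10
              + ["Developing"] * 3 + ["Ineffective"] * 1)
    dist = tier_distribution(sample)
    for tier, share in dist.items():
        print(f"{tier:<17} {share:6.1%}")
    print(f"Top two tiers: {top_two_share(dist):.1%}")
```

Running the same calculation on each year’s export would show whether the top-two share is holding steady near the top of the scale, the pattern Sarah describes from early implementation, or drifting downward. Either way, the numbers are a starting point for the questions she raises, not an answer to them.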
RYAN ESTES: Sarah, are there any questions that I should ask you about this report that I have not touched on yet?
SARAH SILVERMAN: I love that question. I think the real questions for us are really the ones that we couldn’t answer in this report that we want to try to explore going forward.
For example, it’s one thing to get information that is more accurate and potentially more actionable, and it’s another thing to actually connect that information with action. One thing we’d love to learn more about, both by exploring our data further for future reports and probably by connecting with folks in the field, is this: as you get more accurate information from your evaluation system, how are you connecting it with professional learning in a meaningful way? Historically, so many educators have lamented that professional learning systems, or even individual professional learning activities, are not providing the kind of support and developmental velocity they would like. In some cases, they’re even described as terrible wastes of time, and I imagine that can be the case. But in other cases, there are shortcomings in professional learning systems that good information could help overcome.
For example, as we wrote in a previous report called “Bridging the Gap,” there are a number of metrics, some of which are outlined by ESSA, that give some basic parameters for what high quality professional learning looks like. And as you might imagine, high quality is really about connecting the professional learning to what’s happening in an educator’s daily work and in their professional growth trajectory.
One big question we have is how much this information can be used, and is being used, to inform better professional learning. And if it’s not being used for those purposes, what are some of the barriers to making that happen? What are some of the ways district and school leaders could shift their thinking about professional learning so that it is more valuable and more instructive to educators, and also tightly connected to what we know about their performance and capabilities at any given time?
RYAN ESTES: That’s great stuff. We have been speaking with Sarah Silverman of Whiteboard Advisors about the report from the Frontline Research & Learning Institute called “Bending Toward Accuracy: How Teacher Evaluations Are Evolving.” Sarah, I really appreciate your insight. Thank you for talking with us today.
SARAH SILVERMAN: My pleasure, Ryan. Thanks for the opportunity.
You can read the report “Bending Toward Accuracy” from the Frontline Research & Learning Institute: just go to FrontlineInstitute.com to download it for free. And for more information about Frontline Education and tons of resources for education leaders, visit FrontlineEducation.com.
For Frontline Education, I’m Ryan Estes. Thanks for listening, and have a great day.