Links for Portland Parents of Talented and
and Comments on the studies produced by
the Tennessee Value Added Assessment System (TVAAS)
December 11, 1999 (readings/links updated through
[Below is my summary and comments
on a set of studies that were sent by the office of the TVAAS. I
apologize for any mistakes or misrepresentations. I sent this
summary to the TVAAS director, William Sanders, for corrections,
and his comment is at the end. My own comments appear in italics within
Contents of this page:
Articles in order of publication
What is the TVAAS and how did it begin?
What has the TVAAS
Postscript: Dr. Sanders' reply to my
Articles in order of publication:
1994(a): William L. Sanders
and Sandra P. Horn, "The Tennessee Value-Added Assessment
System (TVAAS): Mixed-Model Methodology in Educational
Assessment," Journal of Personnel Evaluation in Education,
Vol. 8, 299-311
1994(b): William L. Sanders,
Arnold M. Saxton, and others, "Effects of Building Change on
Indicators of Student Academic Growth," Evaluation
1995: William L. Sanders and
Sandra P. Horn, "Educational Assessment Reassessed: The
Usefulness of Standardized and Alternative Measures of Student
Achievement as Indicators for the Assessment of Educational
Outcomes" Education Policy Analysis Archives Vol.
3, no. 6, http://info.asu.edu/asu-cwis/epaa/welcome.html
1996 (a): Samuel E. Bratton,
Jr., Sandra P. Horn, S. Paul Wright, "Using and Interpreting
Tennessee's Value-Added Assessment System: A Primer for Teachers
and Principals" (TVAAS)
1996(b): William L. Sanders
and June C. Rivers, "Cumulative and Residual Effects of Teachers on
Future Student Academic Achievement" (TVAAS)
1997(a): [no author],
"Graphical Summary of Educational Findings from the
Tennessee Value-added Assessment System (TVAAS)"
1997(b): S. Paul Wright,
Sandra P. Horn and William L. Sanders, "Teacher and
Classroom Context Effects on Student Achievement: Implications
for Teacher Evaluation", Journal of Personnel Evaluation
in Education, vol. 11, 57-67
1998: William L. Sanders and
Sandra P. Horn, "Research Findings from the Tennessee
Value-Added Assessment System (TVAAS) Database: Implications for
Educational Evaluation and Research" Journal of
Personnel Evaluation in Education, vol. 12, 247-256
1999: W. L. Sanders and K.J.
Topping, "Teacher Effectiveness and Computer Assessment of
Reading: Relating Value Added and Learning Information System
2004: W. L. Sanders, "A summary of conclusions drawn from
longitudinal analysis of student achievement data over the past 22 years."
Paper presented to Governors Education Symposium, Asheville, NC. http://www.sas.com/govedu/edu/hunt_summary.pdf
2009: A Response to
Criticisms of SAS® EVAAS®
William L. Sanders, S. Paul Wright, June C. Rivers, Jill G. Leandro
Nov. 13, 2009 (with additional references to articles online)
What is the Tennessee Value Added Assessment System (TVAAS) and how did it begin?
In 1992, following a suit filed by small school
districts, the state of Tennessee passed an Education Improvement
Act. This Act equalized funding across the state, and increased
the total amount of school funding. The large increase in funding
was paid for by increased sales taxes. Along with the funding
came a demand for more "accountability" and a system
was set up to measure student achievement, dropout rates,
attendance, and promotion. The TVAAS was set up to measure the
effectiveness of schools in increasing student achievement.(1998,
Tennessee students in grades 3 through 8 take
an achievement test, the Tennessee Comprehensive Assessment
Program (TCAP) every year in five subjects: reading, language
arts, math, science, and social studies. This test is written by
McGraw/Hill, which also provides tests to several other states. It
was chosen because the test covers material that is taught in
Tennessee schools. In addition, high school students are now
being tested in five different areas of mathematics.(1998, p.
The TCAP is a "norm-referenced"
rather than a "criterion-referenced" test: that is,
students are measured against the actual test scores of students
in the same grades across the country, not against a fixed
standard curriculum that every student is expected to master
(1996(a), p. 25). The 1995 article argues that these tests are
more reliable and much less expensive than alternative tests;
they also take much less class time and cover a much broader
range of topics than other assessment methods.(1995, pp.7-8 and
5-6). Costs for certain tests can be as high as $150 per pupil
per test.(1995, p. 7). The cost for the TCAP in 1995 was $3.59
per student, and the cost of the TVAAS reports added $0.60 per
student.(1996(a), p. 30). In Britain, when "standard
assessment tasks," (individual performance tests) were
introduced, it was estimated that the assessments took 2 to 5
weeks of class time.(1995, p.6)
TVAAS studies are based on a huge database that
now contains more than five million records of student test
results (1998, p. 250) and analysis requires a very powerful
The achievement scores of every student are
saved over several years to form a continuous record (a
longitudinal record). Every student's record is also linked to
the school district and school that that student attended, and to
the individual student's teachers. Conclusions are based not only
on each student's growth over the previous year, but also on
averages of the student's growth over a three year period.(1998,
The TVAAS system takes every student at his or
her own starting level and measures how effectively the
teacher/school/district increases what the student
knows. This is the "value added" part of the system. Teachers
and schools are held accountable for making sure that their
students improve in
scores from one test to the next, not for having their students
meet some fixed standard minimum score (1998, p. 250).
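The "value added" idea can be illustrated with a simple gain calculation. This is only a minimal sketch: the TVAAS itself uses a mixed-model statistical methodology (1994(a)), not plain subtraction, and the scores, variable names, and function names below are hypothetical.

```python
from statistics import mean

def yearly_gains(scores):
    """Return year-over-year gains from a chronological list of scale scores."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

def three_year_average_gain(scores):
    """Average the most recent three yearly gains (fewer if the record is shorter)."""
    gains = yearly_gains(scores)
    if not gains:
        raise ValueError("need at least two yearly scores to compute a gain")
    return mean(gains[-3:])

# Hypothetical scale scores for two students over four consecutive years.
low_start = [610, 640, 668, 700]    # starts low, grows quickly
high_start = [720, 735, 748, 760]   # starts high, grows slowly

print(yearly_gains(low_start))      # [30, 28, 32]
print(three_year_average_gain(low_start))
print(three_year_average_gain(high_start))
```

In this sketch the student who starts with lower scores shows the larger average gain, even though that student's scores remain lower every year. A growth measure and a fixed minimum score can therefore rank the same two students in opposite order.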
TVAAS tracks how much time students spend with
each teacher. Teachers are responsible for every student who
spends 150 days in their classes, and reports of teachers are
based on the average growth of their students over the past three
to five years (1994(a), p. 303). Reports from the TVAAS must be
included in teacher evaluations, but cannot be the only
information used in teacher evaluation.(1998, p. 249.)
Every teacher receives a report on the
achievement gains of the students in his/her class.
Teachers and schools receive information about
the average achievement gains of their low, average, and high
achieving students and can compare these to the growth of
students of similar ability from earlier years in the same
class/school. They can also compare their students' improvement
with the national average. These reports help teachers and
schools to pinpoint problems in particular grades or subjects, or
with particular sorts of students.(1998, p. 250, 1997(a), pp
What has the TVAAS found?
The TVAAS has developed several statistical
models that enable it to study the importance of various factors
on student learning. Again, "student learning" is
understood as the increase in achievement test scores
from one year to the next. For example, they have looked at the
effect of small classes compared to large classes, of changing
from one school to another compared to staying in the same
school, and of being in one school district compared to another.
Here are some of the most important findings
and conclusions of the TVAAS studies. These apply only to the
state of Tennessee, but it seems likely, because of the extremely
large database, that they would also apply in other states.
The studies found that the single most
important factor in student achievement gain was the student's
teacher. Two other important factors were the achievement level
of the student, and the school system itself.
The 1996(b) study divided elementary school
mathematics teachers (grades 3 to 5) in two different urban
school systems into "quintiles," or five groups of
teachers. Teachers were assigned to quintiles according to how
much academic growth their students showed during the school
year, from the lowest gain (first quintile) through low, average,
above average, and highest gains (fifth quintile). The study
tracked students to see what sequences of "low,"
"average" or "high" teachers they had, and
then compared the scores of the students.
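The grouping step described above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it ranks teachers by a simple mean student gain, whereas the study estimated teacher effects with its own statistical model. The teacher labels, gain values, and function names are hypothetical.

```python
def assign_quintiles(teacher_gains):
    """Map each teacher to a quintile from 1 (lowest gains) to 5 (highest gains).

    teacher_gains: dict of teacher label -> mean student gain (hypothetical values).
    """
    ranked = sorted(teacher_gains, key=teacher_gains.get)  # lowest gain first
    n = len(ranked)
    return {teacher: (rank * 5) // n + 1 for rank, teacher in enumerate(ranked)}

def teacher_sequence(teachers_by_year, quintiles):
    """Return the quintile of each teacher a student had, in year order."""
    return [quintiles[teacher] for teacher in teachers_by_year]

# Hypothetical mean gains for ten teachers.
gains = {
    "T01": 12.0, "T02": 31.5, "T03": 8.0, "T04": 22.0, "T05": 27.5,
    "T06": 18.0, "T07": 35.0, "T08": 15.5, "T09": 25.0, "T10": 29.0,
}
quintiles = assign_quintiles(gains)

# A student who had teachers T03, T04, and T07 in successive years.
print(teacher_sequence(["T03", "T04", "T07"], quintiles))  # [1, 3, 5]
```

Once each student has a sequence such as [1, 3, 5], students with different sequences can be grouped and their later scores compared, which is the comparison the study reports.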
"Differences in student achievement of 50
percentile points were observed as a result of teacher sequences
after only three years. The effects of teachers on student
achievement are both additive and cumulative with little evidence
of compensatory effects." (1996(b) Summary of Findings).
A "low" teacher lowered the scores
for a student for the year that the student was in that teacher's
classroom, and even if the student had a "high" teacher
the following year, the student did not catch up--a student who
had a "high" teacher for both years or even an
"average" teacher followed by a "high"
teacher, would still have higher test scores. Good teachers could
help their students make progress during the year they had them,
but they couldn't completely erase the effect of lower growth the
year before. The negative effect of poor teachers could still be
seen two years later. The more "low" teachers a student
had, the lower the student's final scores were likely to be.
"Average" teachers were able to do a
good job with "average" students. However, the top
fifth of students did not make the same progress UNLESS they had
High scoring students in mathematics need better teachers to show progress
than do other
"As teacher effectiveness increases, lower
achieving students are the first to benefit.
The top quintile of teachers facilitate
appropriate to excellent gains for students of all achievement
levels."(1996(b) Summary of Findings).
- "a comparison of average student
achievement gains ... shows the first quintile of
teachers [the least effective teachers] to be ineffective
with ALL achievement levels of students.
- The second quintile of teachers
facilitated ... achievement with the lower achieving
group, but became less effective as the achievement level
of the students increased.
- Although the third quintile of teachers
was effective with more achievement levels, lower
achieving students profited more than higher achieving
students when assigned to "average" teachers in
- Teachers in the fourth quintile achieved
target gains with all but the highest level of student
achievers, and again, the lower achieving students were
- The fifth quintile teachers [the most
effective teachers] were generally effective with ALL
student achievement levels, but even the highest
achieving students made less than adequate gains in one
of the two systems [districts].
IN BOTH SYSTEMS, TEACHERS IN THE TWO LOWER
QUINTILES DID NOT FACILITATE TARGET GAINS WITH MOST OF THEIR
AND OVERALL, A GREATER PERCENTAGE OF LOW
ACHIEVING STUDENTS THAN HIGH ACHIEVING STUDENTS MADE SATISFACTORY
GAINS." (1996(b), pp. 4-5).
Poor/Minority students make as much progress as other students with the same teachers
"African American students and white
students with the same level of prior achievement make comparable
academic progress when they are assigned to teachers of
comparable effectiveness. " (1998, p. 254)
If two students, one black and one white,
started a school year with similar achievement scores, they were
likely to make the same growth during the year if they
had equally able teachers. Similarly, if two students, one from a
rich family/neighborhood and another from a poor
family/neighborhood started with similar scores, they would also
make the same growth during the year-- as long as they had
equally able teachers.
However, if one of the students was starting
with a higher test score, that student was likely to show less
growth. Also, if one of the students had a less able teacher,
that student was also less likely to succeed.
In one district where the black students made
up 38% of the total, they were somewhat more likely to have the
worst teachers. Ten percent more black students than expected
were assigned to the least effective teachers. The authors cite
another study by E.M. Bridges that found that when parents and
students complained about poor teachers, the teachers were likely
to be transferred to schools with large numbers of transient,
poor, or minority students.(1997(b), p.5)
[My conclusion is that even in an average
school, a math student in the top fifth of the class has a one in
five chance of finding a teacher who is skilled enough to make it
likely that that student will learn.
A very high-ability math student in a
struggling school is under a double disadvantage. That student
has a greater need than classmates for an excellent teacher, but
is less likely to find one than a student in an average school.
If we assume that the pattern continues,
then "TAG" students, who are in the top tenth may have
even greater disadvantages than students in the top fifth. These
students are very unlikely to succeed without
in Poor/Minority neighborhoods
are as effective as other schools
in fostering student
"The effectiveness of a school cannot be
predicted from a knowledge of the racial composition of the
school population. ... Although sometimes schools with high
proportions of minority students show lower average raw scale
scores, the gains their students make are comparable to those of
schools with a minimal proportion of minority
students."(1997(a), p. 26)
When all subject areas were averaged over three
year periods, schools with high levels of minority students
showed the same overall gains in student performance as other
schools (1997(a), p. 26). Students in minority neighborhoods
might START school with lower scores, but after that they showed
the same amount of GROWTH in learning as students from other
neighborhoods, even if the actual scores remained lower. This was
also true for schools with many students in the free/reduced
price lunch program (1997(a), p. 32). The actual scores might be
lower but the growth was the same.
Schools and teachers in poor neighborhoods
often excuse poor scores by their students by pointing to the
environment the students come from. This study shows that every
school can fairly be held responsible for making sure that its
students show a year's growth in their test results, even when
the students start with low scores.
[My conclusion is that, if we focused our
effort on early childhood/early primary so students didn't enter
school with low scores, we could expect all schools to do well
The best students learn the least
Student achievement level was the second most
important predictor of student learning. The higher the
achievement level, the less growth a student was likely to have.
"Only the most effective teachers--the top 20 per cent--are
providing instruction that produces adequate gain in
high-achieving students, while students in the lower achievement
levels profit from all but the least effective teachers.
Therefore, the majority of the brightest students fail to achieve
to their potential year after year"(1998, p. 254). This
happens in school systems in different parts of the state with
different levels of poverty and of minority students.
"Possible explanations include lack of
opportunity for high-scoring students to proceed at their own
pace, lack of challenging materials, lack of accelerated course
offerings, and concentration of instruction on the average or
below-average student. This finding indicates that it cannot be
assumed that higher-achieving students will "make it on
their own."(1997(b), p. 66)
The actual school "system" [district]
a student was in was the third most important factor.
Moving to middle schools reduces achievement
Transferring from one school to another didn't
make much difference as long as a student transferred to any
grade except the lowest. However, when whole groups of students
moved to the bottom grade of a new school, for example, when
students moved from elementary to middle school, there was a very
serious loss in achievement. This drop was worst in grades 6 and
7--the usual grades for moving to middle school.(1994(b), p. 3).
Class size doesn't matter
Class size was not found to be a significant
factor. [This one really bothered me until I thought about
it, and now I wonder if it was a result of the way they did the
study. The researchers divided classes into only two sizes--10 to
19 students and 20 to 32 students. Perhaps most classes are right
around the dividing point. I can see that if nearly all Tennessee
elementary classrooms have from 18 to 22 students and you divide
those into two groups, you won't see much difference between
them. I find it hard to believe that if you compared the bottom
and top--classes with 10 students vs. classes with 32--you
wouldn't get a significant difference in outcomes--Margaret].
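The reasoning in the bracketed comment above can be sketched numerically. This is only an illustration of the argument: the list of class sizes is hypothetical and says nothing about the real distribution of class sizes in Tennessee or about the study's actual statistical model. The cutoff of 20 and the range 10 to 32 come from the two groups described above.

```python
from statistics import mean

def split_by_size(class_sizes, cutoff=20):
    """Split class sizes into the two groups used above: below cutoff, and cutoff or more."""
    small = [size for size in class_sizes if size < cutoff]
    large = [size for size in class_sizes if size >= cutoff]
    return small, large

# Hypothetical class sizes clustered around the dividing point.
sizes = [18, 19, 19, 20, 21, 22, 19, 21]
small, large = split_by_size(sizes)

# Average difference in class size between the two groups.
gap_between_groups = mean(large) - mean(small)

# Difference between the smallest and largest classes the two groups could contain.
gap_between_extremes = 32 - 10

print(gap_between_groups, gap_between_extremes)
```

If classes really were clustered like this, the "small" and "large" groups would differ by only about two students on average, far less than the 22-student difference between the extremes, so a two-group comparison would have little contrast to detect.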
Diverse classes are as successful as less diverse classes
Classes with students of a wide range of
ability (heterogeneous) were as successful as classes with a
smaller range (homogenous).
[At first sight, this finding seems to
contradict many other studies of ability grouping--we now have
more than 700 grouping studies. Most studies of gifted students
have found very significant benefits for these students. Other
studies have found that grouping does not harm the students who
are not gifted. Robert Slavin's work, which is quoted in 1997(b)
(p. 65), should be used with caution. Slavin did not include
programs for gifted students in his study. Even Slavin now says
that gifted students benefit from accelerated instruction or
"a markedly different curriculum" (See his article,
"Ability Grouping, Cooperative Learning and the
Gifted," Journal for the Education of the Gifted, vol 14,
[When we take another look at the TVAAS
data, it is easy to understand how they came to the conclusion
that grouping did not make a difference.
- First, they did not look at whether
students were being grouped within the classroom.
- Second, they only looked at how much
difference in ability there was in the classroom, not at
how good the class was on average. Both
"remedial" and "advanced" classes
would be averaged together as "homogenous"
classrooms--and compared with "normal"
- Third, they only divided students into
four achievement groups--but the students who have shown
the greatest benefit from grouping are well above the
75th percentile. The difference between the 75th
percentile level and the 98th is several grade levels.
- "Ceiling effects" may not
give reliable information. The tests in use may not test
the full range of knowledge gained by very gifted
accelerated students, because the questions are too easy.
For example an algebra student may find lots of questions
about addition and subtraction, but none on algebra,
geometry and trigonometry. To measure students in
advanced classes, only "out of level" testing
gives good information.
- Most important of all, virtually all
the studies agree that grouping does not help much UNLESS
curriculum and instruction are changed to meet the needs
of the students. That is the whole point of grouping
gifted students in the first place--to give them more
advanced work. There is no evidence that Tennessee
provides this, so it is not surprising that there was
little benefit to being in a "homogenous"
classroom. When the instruction IS changed, studies have
found that gifted students can gain on average a full
year in achievement.
Most writers also stress the importance of
teachers who are trained to teach gifted students. Since the
study shows that gifted students have an even greater need of
good teachers than average students, it suggests that if gifted
students are grouped together and put in a classroom with a
dreadful teacher they will do very badly--even worse than an
average student. Even if they are grouped together and put in a
classroom with an average teacher, they will not do very well. To
succeed, gifted students need grouping AND an appropriate
curriculum AND a good teacher! Otherwise they may not show even
as much growth as average students with an average teacher.
Therefore, it is not surprising that in school districts across
the country, the best students are making the lowest gains.
For further discussion of the ability
grouping issue, see James A. Kulik, An Analysis of
the Research on Ability Grouping: Historical and Contemporary
Perspectives (Storrs, Connecticut: National Research
Center on the Gifted and Talented, 1992) and Tom Loveless,
"The Tracking and Ability Grouping Debate," (1998)
available online at http://www.edexcellence.net/library/track.html
The findings of the TVAAS should be approached with caution. A lack
of information about how the Tennessee school system works, or
about how particular findings were calculated can lead to
unjustified assumptions about what the findings mean. However, it
seems incontestable that the quality of teachers is the single
most important factor in student success.
More important than the actual findings is
the potential value of the METHOD. Using norm-referenced tests
and measuring student growth seems to be a very effective tool
for analyzing whether a school district is doing a good job with
students of most abilities, and for finding places where
improvements can be made. For very gifted students, however,
norm-referenced tests usually used have too low a
"ceiling" for satisfactory results, and the use of
out-of-level testing would yield more useful information.
The "value added" approach seems
to me to be more consistent with the Portland Public School
District's "core objectives" than the use of
"criterion-referenced standards" and single test scores
required by the State.--Margaret]
[I e-mailed the summary above to Dr.
Sanders, and his reply took issue with my comments. His reply is
printed below with his permission.--Margaret DeLacy]
" Over-all I think you have done a good
job with your summary. There are some of your conclusions with
which I would quibble.
You have exaggerated the ceiling effect problem.
This is one area that I have monitored very closely. The only
place that we have found a problem is for measuring the teacher
effects for teachers teaching "real" algebra in the 8th
grade. The elementary tests do not adequately measure this
progress. That is why we use our high school end-of-course tests
for these classes.
Another point, often people incorrectly
perceive that a test has a ceiling effect just because a rather
large number of students have previously scored at the 97
percentile and higher. What is often failed to be considered is
the error of measurement of these tests. The higher up on the
distribution the more error of measurement resulting in the often
mistaken view that there will be a ceiling bias the following
year. If however, one considers each student's entire previous
academic history (we use up to five years), then it can be
demonstrated that progress for groups of the higher end students
can be measured with out bias. This is not to say that all other
tests would enable the above to be true. However, I do have data
from several different achievement tests and have found
consistency with the above.
I don't agree with your class size conclusion.
To my knowledge about the only study that has found a class size
effect was an earlier Tennessee study, Project Star (this study
was done by other researchers at another institution.) In that
study they lowered the class size below 15 to 1. In our study, we
wanted to assess the relationship as it existed within the
regions of the study. The decision to make this variable discrete
was for statistical convenience so that we could express the
potential interactions with other variable more conveniently. If
we had included class size as a continuous variable the same
conclusion would have been reached.
As to the issue of grouping or not grouping of
students, I make the following argument. Each school should
enable appropriate levels of academic growth for all students
regardless of the entering level of the kid. How this is to be
provided can take many different forms. The small rural school
with one teacher per chronological age may have to have a
different strategy than a suburban school with 15 teachers per
grade. I don't think that we should get "hung up" on to
group or not group, rather we should focus on sustained academic
growth for all kids...."
"...I am especially concerned about the
ceiling effect issue. I have heard this so much over the years
and have had to demonstrate this usual non-problem that I would
want to dispel this concern for parents of very high achieving
Since this article was written, a number of Sanders's articles
have been made available online. See http://www.mdk12.org/practices/ensure/tva/
-- Margaret DeLacy
Ohio is one of
ten states that have obtained approval from the U.S. Department of Education
to employ a value-added assessment model for No Child Left Behind compliance.
For several years, Ohio educators and businesses have been working with Battelle
for Kids to implement this model.
The Ohio Association for Gifted Children
(OAGC) has a page on its website featuring links to a variety of value-added
Petrilli and Aaron Churchill of the Fordham Institute published a column "Why
states should use student growth, and not proficiency rates, when gauging school
using growth-based data from Ohio on October 13, 2016, on the Fordham website.
"A More Accurate Growth Model: Using Multigrade Adaptive
Assessments to Measure Student Growth",
a report from the Steering Committee of the
Delaware Statewide Academic Growth Assessment Pilot.
Delaware recently requested permission to use multi-grade computer-adaptive
tests in order to document the performance of both very high-performing and very
District of Columbia is currently using value-added data in combination with
teacher observations as a tool for evaluating teacher performance. In
2010, Superintendent Michelle Rhee announced that 165 teachers would be fired
for poor performance. A discussion of the data that underlay the
evaluations can be found in opposing articles that appeared in commentary from
the Washington Post and Education Week in July of 2010.
Rick Hess added a follow-up in August, and his comment generated further
responses from readers
"Were some D.C.
teachers fired based on flawed calculations?"
"Professor Pallas's Inept, Irresponsible Attack on
Rick Hess, "Value Added:
the Devil's in the Details"
(alphabetically by author)
"Sanders 101" (1999) Jeff Archer,
"Unfinished Business: More
Measured Approaches in Standards-Based Reform" by Paul E.
Barton of the Policy Information Center, Educational Testing Service
(2004). Another very informative summary of the issues raised by
the use of achievement test scores to evaluate school performance.
Longer, more detailed and a bit denser than the NWEA report above.
Recommends a mixed approach to evaluation including ensuring that
testing reflects actual curriculum goals, repeated testing during the
school year, and the use of both gains-based and status-based
'Failing' or 'Succeeding' Schools: How Can We Tell?" by Paul E. Barton
published by the American Federation of Teachers (2006)
Value-Added and Experimental Studies of the Effect of Charter
Schools on Student Achievement: A Literature Review,
Y. Emily Tang,
(December 2008) from the Center on Reinventing Public Education at the
University of Washington, Bothell. Argues that traditional aggregate or
"snapshot" score reports do not provide a good picture of the success of charter
schools because of the variability of their student bodies.
For a recommendation that a value-added approach be adopted in
California, see "Putting Education to the Test: A Value-Added Model for
California" by Harold C. Doran and Lance T. Izumi (June
2004). This article focuses on developing a score that will show whether a
student is on track to reach proficiency by the date required by the state
standards and does not consider what happens with students who are proficient
for many years in advance of those grade levels.
Measuring Teacher Effectiveness,
Preliminary Findings, by the Gates Foundation, (no date, circa December,
2010, too many "authors" to list) Ably summarized in an article by
Jason Felch of the Los Angeles Times (see
below). See also the website for the MET project, listed below under "Research
False Performance Gains: A Critique of Successive Cohort Indicators by
Steven M. Glazerman and Liz
Potamites December 201, Mathematica Policy
Research, Working Paper
Argues that cohort-based accountability measures
provide very misleading information when compared to average gain indicators and
"Passing Muster: Evaluating Teacher Evaluation Systems," by authors from
Mathematica Policy Research, the University of Washington,
Stanford University and others,
published by the Brookings Institution, May 3, 2011
Advice on the best way to use
value-added assessment to evaluate teacher performance,
"When the Stakes are High, can we Rely on
Value-Added? Exploring the Use of Value-Added Models to Inform Teacher
WorkForce Decisions", Dan Goldhaber, Center for American Progress,
"Developing Value-Added Measures for Teachers
and Schools" Eric A. Hanushek and Caroline M. Hoxby a chapter from
REFORMING EDUCATION IN ARKANSAS: Recommendations from the Koret Task Force
(2005) A short and readable outline of the main issues with
Value-Added Measures of Education Performance: Clearing Away
the Smoke and Mirrors Douglas N.
Harris, University of Wisconsin (2010), a Policy Brief from the Pace School
of Education at Stanford University. Summary of a book that discusses the
limitations and best uses of value-added measurement.
IMPACT D.C.’s Model Teacher Evaluation System" by Susan Headden (Education
Sector, 2011) IMPACT includes the use of value-added data on student assessment
as well as expert evaluations of teachers' classroom performance. Many
commentators are describing this as one of the best national accountability
"Value-Added Assessment and Systemic Reform: a Response to
the Challenge of Human Capital Development" (2005) Theodore Hershberg, in the
Phi Delta Kappan
Indispensable Tests: How a Value-Added Approach
to School Testing Could Identify and Bolster Exceptional Teaching
by Robert Holland (December 2001) Lexington
"Individual Growth and School Success."
Northwest Evaluation Association. A very clear and entertaining explanation of why
student gain data is important and should be used to correct the Adequate Yearly
Progress evaluation required by the No Child Left Behind Act. Both the
executive summary and the full report are available at the link below, but the
full report requires free registration.
"Value Lessons" by Lynn Olson, Education Week May 5,
2004. A long article about the implementation of Value-Added Assessment in
Great Britain including both praise and criticism.
Is Your Child’s School Effective?
(2006) Paul E. Peterson and Martin R. West Education Next
Incorporating Student Performance
Measures into Teacher Evaluation Systems, by Jennifer Steele, Laura
S. Hamilton, Brian M. Stecher Center for American Progress/ Rand Corporation,
"Value-Added Assessment: An Accountability
Revolution" by J. E. Stone in the
compilation "Better Teachers, Better Schools" by the
"No Child Left Behind Act: States Face Challenges Measuring
Academic Growth That Education’s Initiatives May Help Address." United
States Government Accountability Office, GAO Report to Congressional Requesters,
"Growth Measures: Don’t Call ’em ‘Value Added" February
Value-Added Evaluation & Those Pesky Collateralized Debt
Obligations by Rick Hess for the Education
Week blog, May 3, 2011
comment on the Brookings study "Passing Muster: Evaluating Teacher Evaluation
'value-added' analysis of teacher effectiveness
By Jason Felch for the Los Angeles Times, Dec.
"Teachers' effectiveness can be reliably estimated by gauging their students'
progress on standardized tests, according to the preliminary findings of a
large-scale study released Friday by leading education researchers. The study,
funded by the Bill and Melinda Gates Foundation, provides some of the strongest
evidence to date of the validity of "value-added" analysis, whose accuracy has
been hotly contested by teachers unions and some education experts who question
the use of test scores to evaluate teachers."
The Los Angeles Times
compiled Value-Added data for individual teachers and
released the results on August 14, 2010. The results were
surprising--there was more variation within schools than between them; students'
economic and ethnic status did not play an important role, and neither did class
size. Strict teachers seemed to be more successful. Below are
two stories about the report. The final link is to an online discussion of these
articles that was initiated by a blog by Kevin Karplus:
The Los Angeles Times released a follow-up report in May, 2011:
"Times updates and expands value-added ratings for Los
Angeles elementary school teachers: New data include ratings for about 11,500
teachers, nearly double the number covered last August. School and civic leaders
had sought to halt release of the data".
FAQ about the analysis:
List of articles about the series:
New York Measuring Teachers by Test Scores
by Jennifer Medina for the New York Times,
Jan 21, 2008
"New York City has embarked on an ambitious experiment, yet to be announced,
in which some 2,500 teachers are being measured on how much their students
improve on annual standardized tests. The move is so contentious that
principals in some of the 140 schools participating have not told their
teachers that they are being scrutinized based on student performance and
"Tennessee seeks to use student tests to
show teacher quality" From Education Week, 05/07/03: an article about
an initiative to use the TVAAS data to show that teachers are "high
performing" if their students make good progress http://www.edweek.org/ew/ewstory.cfm?slug=34esea.h22&keywords=Value%2DAdded
"Tennessee Reconsiders Value-Added Assessment System"
Lynn Olson, Education Week,
March 3, 2004: The headline
exaggerates a bit; what is being reconsidered is an adjustment that is made to
the raw data before it is plugged into the Value-Added equations.
Education Scholars Finding New 'Value' In Student Test Data, Education
Week, 11/20/02, discusses some problems in collecting and analyzing value-added
data as well as benefits:
"Leaders in Ohio and
Pennsylvania are making better sense of their school data" December
The state of Ohio recently adopted a value-added
assessment approach as part of its compliance with the No Child Left Behind Act
and has received a $10m grant from the Battelle Institute to create a
database. As a result, there is a continuing series of articles from Ohio
charting the implementation of the system. A follow-up discussing initiatives in Ohio and several
other states and districts is a series of articles by Brett Schaeffer in
The School Administrator (Web Edition),
For critiques of the traditional TVAAS model methodology
and/or findings see:
"If anything, this first
MET report provides good evidence that simply asking students about their
teachers is a much better idea than going through both statistical and logical
gymnastics to obtain a VAM score."
Eva Baker et al: Problems
with the Use of Student Test Scores to Evaluate Teachers
Co-Authored by Scholars
Convened by The Economic Policy Institute: Eva
L. Baker, Paul E. Barton, Linda Darling-Hammond,
Edward Haertel, Helen F.
Ladd, Robert L. Linn, Diane Ravitch, Richard Rothstein, Richard J. Shavelson,
and Lorrie A. Shepard, August 29, 2010
This policy document
argues that Value-Added Measurement systems hold benefits over other ways to
evaluate test scores, but should not be used as a major component of evaluations
of individual teachers.
"Are we there
yet? What Policymakers Can Learn From Tennessee's Growth Model" by Charles
Barone (March 10, 2009) Education Sector Technical Reports
This is a critique of
Sanders' proposal to use projections of student growth towards benchmarks for
measuring whether schools are meeting the Adequate Yearly Progress (AYP) standard
of the Elementary and Secondary Education Act (ESEA), known as the No Child Left
Behind Act (NCLB). Barone believes this model is too complex and lacks
transparency. The article includes links to a rebuttal by Sanders and a
rejoinder by Barone. They agree that NCLB doesn't address student learning above
state standards: "No one has as yet offered a clear accountability
solution to address the criticism that any NCLB accountability model (i.e., one
targeted only at "proficient") does not provide any incentives to raise
performance once students have met the proficient benchmark. This is likely due
to the fact the federal government has traditionally focused its efforts on the
most at-risk students." (Barone, n. 6).
Damian Betebenner "An Analysis of School District Data Using
Value-added Methodology" CSE Report 622, CRESST/University of Colorado at
Boulder (March 2004)
Center for the Study of Evaluation, National Center for Research on
Evaluation, Standards, and Student Testing (note: this is a very technical
report). I haven't been able to figure out why Betebenner finds that
teachers in a g/t program are at a relative advantage under traditional
VA models (p.21) when many studies have found that these students usually make
much lower achievement test gains. Further enlightenment would be appreciated. http://cresst96.cse.ucla.edu/products/reports_set.htm
Daniel Fallon, "Clarifying How We Think about Teaching and
Learning" (2004) This is an editorial discussion presented at a conference,
not a peer-reviewed article, but it provides a useful and readable account of the
immediate history of TVAAS and relates it to the debate over "teacher effects"
and teacher education.
"Because the promise of value-added research is so great, and
the competing arguments about it so confusing, Carnegie Corporation of New York
awarded a grant to a team of outstanding statisticians at the RAND Corporation,
an independent nonprofit national organization dedicated to research. We asked
the research team to review all of the currently competing statistical
value-added assessment models. We wanted RAND’s opinion on responsible uses of
the method and we wanted their advice on what we can trust. The research team’s
conclusions, published in a little book earlier this year, pointed to a number
of reasons to be cautious when using value-added analysis.
First and foremost, the researchers point out that our ability to draw sensible
conclusions depends ultimately on the quality of the tests that are
administered. Many of the tests in use today are in fact poorly aligned with
state standards and also are not calibrated against the developmental level of
the students at each grade. Some tests do not provide accurate or reliable
measures of what we seek to discover.
Second, the researchers warn that all of the statistical models are relatively
new and have not been broadly applied by large numbers of researchers. Although
each model tries to control for many variables in order to get a clear reading
of a teacher effect, we know that every model currently in use contains some
amount of unplanned statistical bias, and these imperfections are not well
The RAND study concludes that since value-added methods have known shortcomings
they should not be used by themselves for high-stakes policy decisions."
Harcourt Assessment, Inc. "Value-added assessment systems"
(2004) Another easy to read summary of the Rand report
Thomas J. Kane,
Douglas O. Staiger, David Grissmer, Helen F. Ladd, "Volatility in School Test
Scores: Implications for Test-Based Accountability Systems" Source:
Brookings Papers on Education Policy, No. 5 (2002), pp. 235-283
Not an easy-to-read summary,
but important reading nonetheless. Unfortunately, it doesn't seem to be
available in full text on the web (it can be found on JSTOR through the
wonderful Multnomah County Library system). The authors argue that test
scores often vary for reasons that have nothing to do with teacher inputs and
that variability increases as the number of students tested falls. As a result, very small schools,
small classes, and populations that have been disaggregated by ethnicity or
income into smaller groups have test results that are more variable than do
bigger schools, classes, and groups. This increased variability means that
small schools have a much easier time floating to the top of the charts for one
year than do large schools, which are more likely to be closer to average
(small schools are also more likely to sink to the bottom temporarily). Programs
that reward "top" schools are biased towards small schools. Disaggregating
results by student ethnicity has the perverse effect of rewarding segregated
schools. Using student gains in place of aggregate scores tends to
increase volatility and averaging scores over a small number of years doesn't
necessarily help. The authors propose several ways to improve
accountability including introducing some statistical "filters" into the
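
The small-school effect described above can be illustrated with a short simulation. This is only a sketch under my own assumptions, not the authors' method: every student's score is drawn from the same distribution (so no school is truly better than another), and the school sizes, the number of schools, and the function name simulate_top_share are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_top_share(n_schools_each=200, small_size=30, large_size=300,
                       top_fraction=0.1, seed=0):
    """Return the share of 'top' schools that are small.

    Assumption: all students are drawn from one standard normal
    distribution, so differences in school averages are pure chance.
    """
    rng = random.Random(seed)
    schools = []
    for label, size in (("small", small_size), ("large", large_size)):
        for _ in range(n_schools_each):
            # A school's score is the average of its students' scores.
            school_mean = statistics.fmean(
                rng.gauss(0.0, 1.0) for _ in range(size)
            )
            schools.append((school_mean, label))

    # Rank schools from highest to lowest average and keep the top slice.
    schools.sort(reverse=True)
    top = schools[: max(1, int(len(schools) * top_fraction))]
    return sum(1 for _, label in top if label == "small") / len(top)


if __name__ == "__main__":
    share = simulate_top_share()
    print(f"Share of top-ranked schools that are small: {share:.0%}")
```

Although half of the simulated schools are small and half are large, the small schools should take most of the top places, simply because an average over 30 students bounces around more than an average over 300. The same reasoning applies to the bottom of the ranking.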
Haggai Kupermintz, "Teacher Effects as a Measure of Teacher Effectiveness
Construct Validity Considerations in TVAAS (Tennessee Value-Added Assessment
System)" (March 2004) CSE Technical Report 563 CRESST/University of
Colorado at Boulder
Center for the Study of Evaluation, National Center for Research on
Evaluation, Standards, and Student Testing
Abstract: "This report examines the validity of measures of
teacher effectiveness from the Tennessee Value-Added Assessment System (TVAAS).
Specifically, the report considers the following claims regarding teacher
effects: that they adequately capture teachers' unique contributions to student
learning; that they reflect adequate standards of excellence for comparing
teachers; that they provide useful diagnostic information to guide instructional
practice; and that student test scores adequately capture desired outcomes of
teaching. Our analyses of the TVAAS model highlight potential weaknesses and
identify gaps in the current record of empirical evidence bearing on its
Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz, and Laura
S. Hamilton "Evaluating Value-Added
Models for Teacher Accountability" (2003) An important and much-cited
summary of the recent debates on TVAAS (called Value-Added Models or VAM)
commissioned by the Rand Corporation. "The research base is
currently insufficient to support the use of VAM for high-stakes decisions. We
have identified numerous possible sources of error in teacher effects and any
attempt to use VAM estimates for high-stakes decisions must be informed by an
understanding of these potential errors. However, it is not clear that VAM
estimates would be more harmful than the alternative methods currently being
used for test-based accountability." http://www.rand.org/pubs/monographs/2004/RAND_MG158.pdf
The Impact of the No
Child Left Behind Act on Student Achievement and Growth. This report
found that after the implementation of NCLB, student achievement test scores
rose slightly, but student growth decreased, and shows that high-achieving
students made lower gains.
Research and Conferences
From the National Center for the Analysis of Longitudinal Data in Education
Research, includes some value-added research
on Value-Added Modeling, convened by CALDER, April 22-24, 2008
papers, audio and PowerPoint slides.
Measures of Effective
Teaching, A research project funded by the Gates Foundation
Methods, Implications for Policy and Practice, May 2008. Convened by the
Urban Institute. Audio and PowerPoint. Focuses more on politics and
policies, less on equations.
VARC--The Value Added
Research Center at the University of Wisconsin
Research Projects at the Wisconsin Center for Education Research