Use of Alternate Assessment Results in Reporting and Accountability Systems: Conditions for Use Based on Research and Practice
NCEO Synthesis Report 43
Published by the National Center on Educational Outcomes
Prepared by:
Rachel Quenemoen • Susan Rigney • Martha Thurlow
May 2002
Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:
Quenemoen, R., Rigney, S., & Thurlow, M. (2002). Use of alternate assessment results in reporting and accountability systems: Conditions for use based on research and practice (Synthesis Report 43). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/Synthesis43.html
Executive Summary

State assessment systems must address both
technical issues and policy issues as assessments and accountability practices
are developed and implemented. These technical and policy issues have been
expanded from traditional large-scale assessment to the new alternate assessment
approaches required by law and developed in every state. The primary purpose of
federally required alternate assessments in state or district assessment systems
is the same as the primary purpose of federally required large-scale
assessments, that is, accountability. The purpose of both is to provide valid
and reliable assessment data that accurately reflect the state’s learning
standards, and that indicate how a school, district, or state is doing in terms
of overall student performance. From that information, schools can make broad
policy decisions that improve schooling practices so all students are
successful.
As these approaches to assessment are
implemented, states have raised questions about how results from disparate tests
can be combined for the primary purpose of accountability. This report reviews
our current understanding of the technical and policy considerations involved in
high quality alternate assessment. Based on research from early implementation
and what are considered to be best practice approaches, this synthesis describes
five steps in alternate assessment test development processes that allow
interpretation and use of results in reporting and accountability.
Responsible use of any assessment includes
documentation of its purpose and technical quality. In the development of
alternate assessments, many states have worked with researchers and
practitioners to assure that the performance measures used with students who
have significant disabilities are also defensible in terms of reliability and
validity. Once these measurement issues
have been addressed, the issue of how to include the results in school
accountability decisions becomes primarily a policy decision. There are
several different ways that can occur.
Three state examples illustrate different methods of incorporating alternate
assessment results in accountability systems.
In general, these conclusions for practice
can be asserted:
I was pretty worried about alternate assessment requirements to begin with, but as we went through training, several of us started talking about how really alternate assessment just asked us to do what we love and do best – teach children so that they learn, and can show what they know and can do! I know I have to do my job well because my students are going to be counted, and I am accountable for what they have learned. It focused my instruction, and I have more clarity on where I need to head with each student. And now I have data to talk about. I don’t just say ‘She looks better.’ I have something that is measurable. It’s hard work including students who have never been part of the system. It took a lot of advocacy, which has really reenergized my work.
(Thompson, Quenemoen, Thurlow, & Ysseldyke, 2001, composite of teacher comments from surveys after first year of implementation).
During the past decade, the emergence of
standards-based reform has generated debate among policymakers and educators
about how to ensure that all students benefit from the reform. Advocates of a
standards-based system assume that all students can be expected to attain high
standards of learning, although some may need more time and varied instruction.
As a result of federal legislation (IDEA and Title I), state-level policymakers
and educators have been required to determine not whether, but how, all students
will participate fully in instruction, assessment and school accountability
based on high standards. Educators have had to translate the policy commitment
of high expectations for all students into practice by:
State assessment systems must address both
technical issues and policy issues as assessments and accountability practices
are developed and implemented. In a standards-based system, the assessments that
are used to hold schools accountable must accurately reflect the state’s
learning standards in order to be valid. The traditional technical issues of
test validity and reliability have been broadened to include consideration of
the impact of testing, often referred to as consequential validity (Brualdi,
1999; Messick, 1989).
These technical and policy issues have been
expanded from traditional large-scale assessment to the new alternate assessment
approaches required by law and developed in every state. Alternate assessments
provide a mechanism for students with the most significant disabilities to be
included in the assessment system. The primary purpose of federally required
alternate assessments in state or district assessment systems is the same as the
primary purpose of federally required large-scale assessments, that is,
accountability. The purpose of both is to provide valid and reliable assessment
data that accurately reflect the state’s learning standards, and that indicate
how a school, district, or state is doing in terms of overall student
performance. From that information, schools can make broad policy decisions that
improve schooling practices so all students are successful.
As these approaches to assessment are
implemented, states have raised questions about how results from disparate tests
can be combined for the primary purpose of accountability. This report reviews
our current understanding of the technical and policy considerations involved in
high quality alternate assessment. Based on research from early implementation
and what are considered to be best practice approaches, this synthesis describes
alternate assessment test development processes that allow interpretation and
use of results in reporting and accountability; examples of state solutions are
presented, and recommendations for practice are provided.
Alternate Assessment: A Rethinking of Traditional Test Development Processes
Alternate assessments are the focus of
controversy for a number of reasons. The nature of the controversy surrounding
alternate assessments reflects, in part, the nature of the alternate assessments
that are currently in use, and the characteristics of the students for whom they
are designed. To really understand alternate assessments, it is important to
think more broadly about the inclusion of students with disabilities in
educational assessments over the past century. Not only has there been limited
inclusion of most students with disabilities, but those students with the most
significant disabilities have been excluded without exception. We have had at
least half a century to fine-tune how to assess “average” students, but only a
few years to devote to a similar development process for students with complex
disabilities. As our country refined its measures of educational attainment to
focus on standards and what “proficiency” means for the general population of
students, research-based efforts increased, resulting in a literature base that
clarifies what proficient performance is and what instructional strategies and
organizational features are necessary for ensuring that students reach
proficiency. (See Mid-Continent Regional Laboratory Web site for a comprehensive
bibliography of this literature base:
http://www.mcrel.org/standards-benchmarks/docs/reference.asp)
This process is just beginning for alternate
assessments designed for students with significant disabilities. For the most
part, but not entirely, the analogous development process is being carried out
by states. Research-based efforts (see Browder, 2001; Kleinert & Kearns, 2001;
Thompson, Quenemoen, Thurlow, & Ysseldyke, 2001) are clarifying the notion of
proficiency for students with significant disabilities. Some states are
beginning to produce research-based evidence that quality curriculum outcomes
are definable and measurable for students with significant disabilities. Just as
traditional assessment processes and definitions of proficiency have implicit
assumptions (e.g., a proficient student will perform well in the next grade or
educational level), so do alternate assessment processes. These assumptions have
grown out of our professional historical understanding of what proficiency looks
like for these students.
History and Development of Alternate Assessments
Because alternate assessments are new, it is
important to revisit how they have emerged within a historical context. The
definition of optimal outcomes for students with significant disabilities has
shifted since 1975 when these students were first guaranteed a free appropriate
public education. In the 1980s there was a move away from a notion of
“developmental sequence,” where all students were expected to progress through a
“normal” developmental sequence of infanthood, toddlerhood, and so forth.
Instead, there emerged a focus on functional domains that were necessary for
success in current and future environments. As a field, we stopped teaching
students at their “mental age” (e.g., two year old developmental level), and
focused on chronological age functions, preparing them for practical application
of school based learning. During that time, however, we seemed to have lost
touch with academic goals and instruction – focusing on community-based
activities instead of concrete learning of skills and knowledge in a variety of
settings.
With the advent of “inclusion” models in the
1990s, there was a tendency to focus on the importance of “social” progress from
contact with peers; this was often valued over numeracy or literacy skills in a
variety of settings. In many cases, academic content was addressed only incidentally.
Still, some students remained isolated in self-contained classrooms, being kept
warm and comfortable by well intentioned, caring aides, but with very limited
“learning” taking place. (See Browder, 2001, pages 13-17, for an excellent
summary of this historical sequence, and specific linkages to how large-scale
assessments for students with significant disabilities are being developed.)
The development of high quality alternate assessments required a reexamination of these sometimes competing approaches, and put pressure on states and schools to articulate precisely what “learning” meant for this population. As a result, and with thoughtful commitment to bringing all students into the opportunities of standards-based reform, the special education and assessment communities have identified a set of steps for the development of alternate assessments that are analogous to the process used in developing general assessments (see Table 1). In states where a thoughtful process occurred, the five steps are evident.
Table 1. Steps in Development of Alternate Assessments

1. Careful stakeholder and policymaker development of desired student outcomes for the population, reflecting the best understanding of research and practice.
2. Careful development, testing, and refinement of assessment methods.
3. Scoring of evidence according to professionally accepted standards.
4. Standard-setting process to allow use of results in reporting and accountability systems.
5. Continuous improvement of the assessment process.
1. Careful stakeholder and policymaker development of desired student outcomes for the population, reflecting the best understanding of research and practice.
These articulated student outcomes are essential to the development of a
rigorous assessment system. It is in this step where thoughtful linkages to
state standards must be articulated, and it is in this step that many states
have extended their content standards to ensure that students with significant
cognitive disabilities have access to, and make progress in, the general
curriculum. These desired student outcomes, linked to the content standards
defined for all students, are reflected in the rubrics or scoring criteria used
to score the evidence, regardless of the type of evidence gathered. The
development of draft rubrics/criteria and subsequent refinements is how these
outcomes are operationalized in the measurement process. The desired student
outcomes are also reflected in the achievement descriptors and ultimately in the
achievement levels. Although there is general professional congruence on what
these desired outcomes may be (Browder, 2001; Kleinert & Kearns, 1999), the
precise desired outcomes vary from state to state, just as do the content
standards, rubrics, descriptors, and achievement levels for the general
assessment. Research is underway to document the variation in articulated
outcomes and rubrics or scoring criteria across states.
2. Careful development, testing, and refinement of assessment methods.
Typically, the assessment methods are ways to gather evidence, resulting in a
portfolio process, assessment instruments, or other approach that will yield
high quality evidence. This has been the focus of many states’ efforts over the
past few years. We have seen shifts in methodology after pilot years or first
years of implementation (Thompson & Thurlow, 2001). Just as for the general
assessment, the development of a rigorous, valid, and reliable instrument takes
commitment and time, but is one step in the process, not the only step. Not all
states are at this point. In states with a complete alternate assessment
development process, extensive training and support for teachers, parents, and
other IEP team members also occurs.
3. Scoring of evidence according to professionally accepted standards. Once
assessment evidence is gathered, thoughtful states have engaged in rigorous
scoring procedures following rigorous professional standards for assessment
scoring. Scoring training is provided, scorers demonstrate their competency, and
inter-rater reliability tests and rechecks of scorer competency occur throughout
the process. Dual scoring and third-party tie breakers, standard tools of the
assessment trade, should be in evidence in this step (AERA/APA/NCME, 1999;
Thompson, Quenemoen, Thurlow, & Ysseldyke, 2001).
4. Standard-setting process to allow use of results in reporting and accountability
systems. Once scores are assigned, states with a complete alternate
assessment development process have moved on to typical steps in
standard-setting processes, such as reviewing student work to identify initial
“bands” of possible cut scores across achievement descriptors, followed by panel
reviews of student work to arrive at cut scores or panel reviews of hypothetical
cut score parameters (Cizek, 2001; Roeber, 2002; Thompson, Quenemoen, Thurlow, &
Ysseldyke, 2001).
5. Continuous improvement of the assessment process. In every state that has
accomplished this full process, we have seen revision of rubrics, editing of
achievement descriptors, and increased focus on training of teachers and other
IEP team members prior to second-year implementation. And in states where there has
been an emerging body of research, we have seen promising evidence of
reliability, validity, and more importantly, a direct link between improving
alternate assessment scores and improvements in instruction (Kamfer, Horvath,
Kleinert, & Kearns, 2001; Kleinert, Kennedy, & Kearns, 1999; Quenemoen,
Massanari, Thompson, & Thurlow, 2000; Turner, Baldwin, Kleinert, & Kearns,
2000).
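The standard-setting step described above (step 4) amounts to mapping assessment scores onto achievement levels using cut scores arrived at by panel review. A minimal sketch of that mapping follows; the function name, level labels, and cut-score values are purely illustrative assumptions, not any state's actual parameters.

```python
def achievement_level(score, cut_scores):
    """Map a numeric assessment score to an achievement level.

    cut_scores: list of (minimum_score, level_label) pairs in ascending
    order of minimum_score, as a panel-approved standard-setting table
    might be recorded. Labels and values here are hypothetical.
    """
    level = cut_scores[0][1]  # default to the lowest level
    for minimum, label in cut_scores:
        if score >= minimum:
            level = label  # score clears this cut; keep climbing
    return level

# Illustrative cut-score table (not a real state's values).
cuts = [(0, "beginning"), (12, "partially skilled"), (20, "skilled")]
```

For example, under this hypothetical table a score of 15 would fall in the "partially skilled" band, because it clears the 12-point cut but not the 20-point cut.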
This type of development process for
alternate assessments is comparable to that used for regular assessments. A
growing body of literature is demonstrating an emerging consensus on these
characteristics of good alternate assessments (see Appendix).
Several states have followed this cycle of
“test development” with their alternate assessments. Kentucky is the leader, in
part because it was the first state to develop an alternate assessment, and it
has the broadest base of research to back its progress. Other states, including
Massachusetts, Arkansas, and West Virginia, discussed their approaches at a
recent meeting of state alternate assessment staff (ASES SCASS); these are just
some of the states that have completed a full assessment development cycle that
includes standard setting. These states are in good position to build on this
rigorous and research-based approach to improve outcomes for students with the
most significant disabilities.
Status of State Alternate Assessments
NCEO has documented the development of
alternate assessments in states for several years (Thompson, Erickson, Thurlow,
Ysseldyke, & Callender, 1999; Thompson & Thurlow, 2000, 2001). In some states,
the alternate assessment is essentially a paper and pencil test adjusted to a
lower level of standards. It is evident in these states that the alternate
assessment is intended for a group of students that has not kept up with the
majority of students of the same age, either because of low cognitive
functioning or because of other disabilities. They are students for whom the
same general instructional goals have been defined; they are just meeting them
at slower rates than other students. In some states, the alternate assessment is
simply a teacher checklist of developmental skills. The group of students for
whom these types of assessments are intended varies with the nature of the
checklist, which may focus on adaptive behavior in some states and on reading
skills in others.
In most states, the alternate assessment
consists of a body of evidence collected by educators, parents, and the student
to demonstrate and document the student’s skills and growth toward state
standards; sometimes these alternate assessments also incorporate
characteristics of educational supports that the student receives. In most of
the states with this type of alternate assessment, it is clear that the students
for whom the alternate assessment is intended have very complex disabilities –
most often significant cognitive disabilities.
Current Challenges in Alternate Assessments
The variability in alternate assessments
makes them more difficult to understand. Still, we know that alternate
assessments are a problem when they are used for a broader group of students
than they should be, when they are used to lower standards, and when they are a
way to exclude students from the accountability system. While it is difficult to
determine exactly how many students should be in an alternate assessment, the
finding that states are planning on having anywhere from 5% to more than 40% of
their students with disabilities in their alternate assessment system (Thompson
& Thurlow, 1999) suggests that there is a problem with defining a common target
group for the assessment.
Alternate assessments should not be used to
lower standards for students with disabilities. NCEO’s analysis of states’
alternate assessments (Thompson & Thurlow, 2001) revealed an increasing number
of states with a two-prong alternate assessment – one prong for students with
significant and complex disabilities, and the other for students not functioning
on grade level. Not only does this approach increase the number of students in
the alternate assessment, but it brings in students who indeed can take the
regular assessment, though they may not perform very well on it. An alternate
assessment should not be for those students not expected to do well. It should
be for those who are working toward the essence of standards, where standards
have been viewed broadly to encompass those students with very complex disabilities.
To the extent that states develop clear guidelines for who should participate in
the alternate assessment, and to the extent that those guidelines define a group
of students with significant, complex disabilities, it is possible to hold
alternate assessment students to high standards and to document how they can
reach proficient status. By doing this, it is then possible to include these
students in the accountability system in a way that values their attainment of
expanded standards just as much as the system values the attainment of students
in the regular assessment.
Alternate assessments should not be a way to
exclude students from the accountability system. If the alternate assessment is
an avenue to exclusion from accountability, we are likely to see more and more
students inappropriately pushed into the alternate assessment simply because
they are not expected to perform well. Making sure that all students are in the
accountability system – that all students count – is critical to avoiding
corruption in accountability.
Issues Underlying Current Struggles with Including Students with Disabilities in
Accountability Systems
There are many reasons why states and
districts have resisted the participation of students with disabilities in
assessments and accountability systems. Low expectations are a primary reason –
they permeate education and translate into fears about emotional trauma and
other types of emotional abuse from testing. Another reason is that special
education’s history of separating and taking care of students with disabilities
has reinforced the notion that general educators do not know how to provide the
instruction that these students need, and a perception that if they do not have
the needed skills, then certainly they should not be held accountable.
High stakes assessments that carry
significant consequences for students are a major cause of the struggle over the
participation of students with disabilities in assessments and accountability
systems (Heubert & Hauser, 1999). It seems reasonable to argue that students
with disabilities have not had the same access as have other students to
standards-based content, the general curriculum, and good instruction. There is
often a desire to remove them from the accountability system – or to give them
waivers – so that they can graduate or progress from one grade to the next
without having to demonstrate the same knowledge and skills as other students.
This may also limit system incentives to provide the same access to students
with disabilities. Thus, high quality alternate assessment systems are developed
within policy systems that create incentives to provide the same access to all
children, and these policy decisions drive the way the assessment system is
designed and how use of results in reporting and accountability is defined.
Policy Decisions: Three Different Methods of Including Results from an Alternate
Assessment in the State Accountability System
Responsible use of any assessment includes
documentation of its purpose and technical quality. In the development of
alternate assessments, many states have worked with researchers and
practitioners to assure that the performance measures used with students who
have significant disabilities are also defensible in terms of reliability and
validity. Once these measurement issues
have been addressed, the issue of how to include the results in school
accountability decisions becomes primarily a policy decision. There are
several different ways that can occur. The following state examples reflect
three of these possible methods.
The three states discussed here share a firm
policy commitment to include all students with disabilities in assessment and
accountability. At the policy level, these states began from the same place: all
meant all, no exceptions. Each state developed an approach to alternate
assessment that involves a portfolio or body of evidence. Each state assesses
student performance linked to the state standards that apply to all students,
with extensions or linkages. Each state is able to show reliability and validity
of alternate assessment scores. And each of these states includes results from
the alternate assessment in the school accountability system. Each state bases
school accountability on gains in student achievement as demonstrated on state
assessments over time.
Kentucky and North Carolina have been
prominent leaders in school reform, each with a long and productive history of
thoughtful policy and rigorous technical approaches. Wyoming has adopted
procedures to deal with the statistical constraints of small numbers. Each state
employs a different method of incorporating results from the alternate
assessment in the school accountability system. The following are very general
descriptions of the approaches, presented in order to highlight key similarities
and differences, and to show how the solution meets the policy requirements of
holding schools accountable for high expectations for all students, including
those students whose disabilities require an alternate assessment.
These descriptions are based on state
accountability system designs from online descriptions or written materials
publicly shared as of the end of 2001, and do not reflect changes after that
date. Inclusion of these states here does not suggest their accountability
approach will be approved by Title I review – each of these states has features
that may or may not fully conform to the reauthorized ESEA. They all have
attempted to begin with an assumption that all students count, and they have
worked to build an accountability approach that ensures that all students can
benefit from the required improvements resulting from accountability.
WYOMING
Wyoming assessment results are expressed as
performance levels and descriptors. In order to avoid confusion or simplistic
comparisons between the general and alternate assessments, results from the
general assessment are expressed as four performance levels (advanced,
proficient, partially proficient, novice), while those from the alternate
assessment are expressed as three different performance levels (skilled,
partially skilled, beginning) with different descriptors. However, the state
uses both in a complex formula for accountability (Marion, 2001).
School accountability is a two-stage process.
In the first stage, the state computes an index based on combined general
assessment components and evaluates the overall gain from year to year against a
long-range target. School classification is constrained by the level of
participation, for example, no school can be Satisfactory with less than 98% of
all students participating in the assessment system. For schools that are not
making adequate gains, additional evidence is considered, including:
participation rates in the alternate assessment, results on the alternate
assessment, results of assessments given at grades 1 and 2, progress of students
receiving Title I services, and reduction in the percent of students at the “novice”
level. A school can gain or lose points based on a combination of participation
and progress on the alternate assessment.
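Wyoming's two-stage logic can be sketched roughly as follows. The function name, the point adjustments, and the adequate-gain comparison are illustrative assumptions (the state's actual formula combines many more components); only the 98% participation constraint comes from the description above.

```python
def classify_school(general_index_gain, target_gain, participation_rate,
                    alt_participation_points=0.0, alt_progress_points=0.0):
    """Hypothetical sketch of a Wyoming-style two-stage classification.

    Stage 1: compare the gain in the combined general-assessment index
    against a long-range target, constrained by participation (no school
    can be Satisfactory with less than 98% of students participating).
    Stage 2: for schools not making adequate gains, additional evidence
    (here reduced to alternate assessment participation and progress
    points) can raise or lower the school's standing.
    """
    if general_index_gain >= target_gain and participation_rate >= 0.98:
        return "Satisfactory"
    # Stage 2: a school can gain or lose points based on alternate
    # assessment participation and progress.
    adjusted_gain = (general_index_gain
                     + alt_participation_points
                     + alt_progress_points)
    if adjusted_gain >= target_gain and participation_rate >= 0.98:
        return "Satisfactory"
    return "Needs Improvement"
```

Note how, consistent with the observation below, the second stage never touches a school that already made adequate gains: for such schools the alternate assessment results have no effect in this sketch.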
Inclusive Advantages: Wyoming’s emphasis on getting all students into the
system has resulted in a 99% participation rate. Schools can improve their
status through improved alternate assessment scores and through full
participation.
Disadvantages:
Wyoming’s two stage approach effectively weights alternate assessment results
less than general assessment scores; for schools making adequate gains,
alternate assessment results have minimal or no effect.
NORTH CAROLINA
North Carolina employs a different but also
weighted approach to incorporate results from the alternate assessment in the
accountability index. For the general assessment, students are tested in
multiple content areas, and results from each student in each of these areas are
combined in the accountability index. The performance levels are I-IV for the
general assessments. The second academic assessment, the alternate assessment
academic inventory (NCAAAI), is designed for students who may not be able to take
the general assessment but who are not eligible for the alternate assessment
portfolio (NCAAP); it has two levels (low and high) within each of four levels
(novice, apprentice, proficient, distinguished) in multiple content areas
(reading skills, mathematics skills, writing skills). This academic inventory was pilot tested in
2000-01 and is scheduled for inclusion in the performance composite (only) in
2001-02. Schools were held accountable for the performance of students taking
the alternate assessment portfolio by its inclusion in the performance composite
in 2000-01. The levels for the portfolio are novice, apprentice, proficient, and
distinguished. Students receive a score in each of four domains (Communication,
Personal and Home Management, Career and Vocational, and Community); each domain
score contributes one-fourth to the performance composite, so that a student’s
overall performance on the portfolio counts once, rather than four times in the
performance composite.
The North Carolina formula is similar to
Wyoming in that it has a two-stage process. In the North Carolina ABCs, there
are two types of composite scores: growth and the performance composite.
Alternate assessment portfolio scores are not included in the growth composite
at this time; they are included in the performance composite. The total weighted
growth composite for a school is the sum of the weighted growth components.
Components of the model in 2000-01 included:
For computations of the performance
composite, the total number of scores at or above achievement Level III in each
subject included in the ABCs model is divided by the total number of eligible
test-takers (i.e., valid scores, absent students, etc.) for all tests.
Components included in 2000-01 were:
North Carolina averages test results to
arrive at the school index. Students who take the regular assessment contribute
more scores to the school average than students who take the alternate
assessment, who are counted just once. Nevertheless, all students can be included
in the determination of school accountability at the second tier with one score
instead of several.
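The performance composite arithmetic described above (scores at or above Level III divided by eligible test-takers, with each of a portfolio student's four domain scores contributing one-fourth of a count) can be sketched as follows. This is an illustrative simplification, not the official ABCs formula, and the function and argument names are assumptions.

```python
def performance_composite(general_scores, portfolio_scores, n_eligible):
    """Illustrative sketch of a performance-composite calculation.

    general_scores: achievement levels (1-4) from the general assessment,
        one entry per student per tested subject.
    portfolio_scores: one 4-tuple of domain levels per alternate portfolio
        student ("novice", "apprentice", "proficient", "distinguished");
        each domain contributes one-fourth, so a portfolio student's
        overall performance counts once rather than four times.
    n_eligible: total eligible test-takers across all tests.
    """
    # General assessment: Level III or IV counts as a full proficient score.
    proficient = sum(1 for level in general_scores if level >= 3)
    # Portfolio: each qualifying domain score contributes 1/4 of a count.
    for domains in portfolio_scores:
        proficient += sum(0.25 for d in domains
                          if d in ("proficient", "distinguished"))
    return proficient / n_eligible
```

Under this sketch, a school with general-assessment scores of [4, 3, 2], one portfolio student scoring proficient or better in three of four domains, and four eligible test-takers would have a composite of (2 + 0.75) / 4.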
Inclusive Advantages: North Carolina’s approach includes all students in
assessment and reporting.
Disadvantages:
North Carolina’s approach effectively weights alternate assessment results for
students with the most significant disabilities less than scores from the
general assessment. This occurs through the combination of its two-stage
approach and its use of one score rather than multiple content area scores. This
approach weights the scores of students with disabilities proportional to their
presence in the assessment population with each assessment counting once.
KENTUCKY
Kentucky was a pioneer in the development of
fully inclusive assessment and accountability systems. Kentucky has almost 100%
of its students participating in the assessment system. Approximately 0.5% of
the total enrollment (at the benchmark grades) participates in the alternate
assessment. Working in close collaboration with University of Kentucky
researchers, the state developed an alternate portfolio system aligned with the
Kentucky content standards intended for all students. Its assessment system is
a model for thoughtful and technically rigorous assessment.
Kentucky uses four performance levels to
describe student work within a content area: novice, apprentice, proficient, and
distinguished. For each student, points are awarded on the basis of the
performance level attained. Results for students who do not participate in an
assessment are counted as zero and averaged with all other students. Kentucky
took the position that students who demonstrate “proficient” performance within
the structure of the alternate assessment should make the same contribution to
the school accountability index as students scoring at the proficient level on
the regular assessment, so results from the alternate assessment are reported
using the same terminology and point values employed in the regular assessment.
In addition, because a student taking the alternate assessment is required to
demonstrate achievement within multiple content areas, the overall score from
the alternate is entered for each content area represented in the regular
assessment. Essentially, the Kentucky accountability system can be thought of as
averaging students, rather than test scores, and each student receives equal
weight as the school index is computed.
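Kentucky's "average students, not test scores" idea can be sketched as follows. The four performance labels come from the report; the point values and student results are invented for illustration and are not the state's actual weights:

```python
# Hypothetical sketch of per-student averaging: each student contributes one
# averaged result, so every student carries equal weight, and a "proficient"
# alternate result earns the same points as a "proficient" regular result.
# Point values are illustrative only.

POINTS = {"novice": 0, "apprentice": 40, "proficient": 100, "distinguished": 140}

def student_score(levels):
    """Average one student's points across content areas."""
    pts = [POINTS[level] for level in levels]
    return sum(pts) / len(pts)

def school_index_per_student(students):
    """Average per-student scores; each student counts exactly once."""
    return sum(student_score(s) for s in students) / len(students)

students = [
    ["proficient", "apprentice", "novice"],         # regular assessment
    ["distinguished", "proficient", "proficient"],  # regular assessment
    # Alternate assessment: the overall "proficient" result is entered once
    # for each content area represented in the regular assessment.
    ["proficient", "proficient", "proficient"],
]

print(school_index_per_student(students))
```

Because the alternate-assessment student's single result is replicated across content areas before averaging, that student moves the index exactly as much as any regular-assessment student, which is the equal-weighting property described above.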
Inclusive advantages: Kentucky’s system has resulted in nearly 100%
participation and 100% inclusion of results in reporting and accountability.
Disadvantages: There is a perception that their procedures obscure
meaningful differences in performance between the regular and alternate
assessments.
A Fourth but Controversial Method in Use in Some States
There is another approach, used in a few
states, that warrants general discussion. These states set arbitrary values for
all alternate assessment results: for example, all other students are scored
against three or four defined performance levels, but all alternate assessment
results are restricted to the lowest level, or are arbitrarily reported as a
“0” regardless of student performance. The argument is that because students
must be performing at very low levels to be eligible for the alternate
assessment, they are by definition in the lowest level regardless of
performance. This approach might be viewed as the easiest to defend, but it is
likely to defeat the purpose of school accountability for these students.
Automatic lowest-level scores yield no information for identifying improvements
in performance, and thus no way to hold educators responsible for that
improvement.
Does it Matter How the State Incorporates the Alternate Assessment Results into
the Accountability Mix?
Common
objection 1: “Those students will bring our school index down.”
Common
objection 2: “Awarding the same number of points for successful performance
on the alternate devalues proficient performance on the regular assessment.”
In his examination of current state
practices, Richard Hill of the National Center for the Improvement of
Educational Assessment (Hill, 2001) reviews approaches for including alternate
assessment results in an overall index and draws conclusions that address
both of the common objections stated above. Based on accountability simulations
using actual state data, Hill identifies and compares two primary
approaches to scaling results of the alternate assessment for inclusion in
accountability systems:
Option
one: Scale results of the alternate assessment so that the values awarded for
performance levels on the alternate are the same as or similar to the values
awarded for performance levels on the general assessment (similar to Kentucky).
In states that use this procedure, improvement of results on the alternate over
time is rewarded as substantially as improvement on the regular assessment.
Hill sees this approach as more fair because it strengthens the motivation to
include all students and yields accurate gain scores, unless the number of
students participating in the alternate changes dramatically from year to year
in a given school.
Option
two:
Scale results of the assessments so that the performance levels on the alternate
sit at the lower end of the scale and performance levels on the regular
assessment occupy the upper end (with or without overlap between the upper
performance levels on the alternate and the lower levels of the regular
assessment). In states that use this approach, the alternate assessment is
assigned a defined value; for example, some states may scale the assessment
system so that anything less than 50% falls at the lowest performance level.
This correctly measures output, and gains are accurate, but with this approach
an alternate assessment is hardly needed at all, since by definition students
will be performing under 50% and would thus automatically be at the lowest
level.
Hill suggests that as states develop their
accountability formula, the essential questions to be addressed are: What is
fair? What will encourage the greatest improvement for every student? What seems
reasonable? Given the limited evidence that the inclusion of results from
alternate assessment does technical “damage” to school scores, and given the
assumption that the gains for these students are important in improving outcomes
for these students, Hill concludes that the decision about how the state will
incorporate results from the alternate assessment into a school accountability
formula is primarily a policy decision, not a technical decision.
What Do These State Practices Suggest to Policymakers?
How policy decisions are implemented in
accountability calculations can vary, as seen by the three featured states,
Wyoming, North Carolina, and Kentucky. All have made a policy decision to
include all students in accountability; each has selected a different technical
approach to putting that policy into practice. On the technical level several
different approaches are workable, albeit with varying advantages and
disadvantages. Based on initial work by Richard Hill, it appears these
approaches meet basic technical requirements. Each of these states has also
grappled with developing a policy approach that includes all students in
accountability, although only Kentucky has equal weighting of all students.
Many states do not yet include
all students in accountability formulas. A few states have not
completed development of a technically adequate alternate assessment; those
states will have to address these deficiencies immediately. But many states have
alternate assessments that are aligned to state standards and that meet current
technical requirements. These states should continue building
technical soundness over time, but should immediately work on policies that make
use of the alternate assessment results in reporting and accountability. Beyond
compliance issues, if alternate assessment scores are not included in
accountability systems, a troubling policy message is sent that “some” students
will not “count” in a reform based on the belief that all students can
learn.
Conclusions
As states decide how to integrate all scores
into the accountability index, they must address core policy issues. Some
questions that must be answered include:
Each of our example states began with a firm
policy commitment to include all students in assessment and accountability. Each
has committed resources to development of a valid and reliable alternate
assessment aligned to the same challenging content standards set for all
students.
In general, these conclusions for practice
can be asserted:
I think, in our school, for the first time, these students are seen as who they really are, individuals with a unique personality. This happened as soon as more of the staff and community became involved with them through standards-based instruction and alternate assessment. Standards and alternate assessments bring together the best skills of both general and special educators. Parents really love the collections of student work. It showcases a student’s performance and helps us all see the growth that’s really happening.
(Thompson, Quenemoen, Thurlow, & Ysseldyke, 2001, composite of teacher comments from surveys after first year of implementation).
References
AERA/APA/NCME. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association, American Psychological
Association, National Council on Measurement in Education.
Browder, D. (2001). Curriculum and assessment for students with moderate and severe
disabilities. New York: Guilford Press.
Brualdi, A. (1999). Traditional and modern concepts of validity (ERIC/AE Digest). Washington,
DC: ERIC Clearinghouse on Assessment and Evaluation. Retrieved May 20, 2002,
from the World Wide Web:
http://www.ed.gov/databases/ERIC_Digests/ed435714.html.
Cizek, G. (Ed.). (2001). Setting performance standards: Concepts,
methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.
Hill, R. (2001). The impact of including
special education students in accountability systems. National Center for the
Improvement of Educational Assessment. Retrieved from the World Wide Web:
http://www.nciea.org/publications/CCSSOSpecialEd_Hill01.pdf
Heubert, J. P., & Hauser, R. M. (1999). High stakes: Testing for tracking, promotion,
and graduation. Washington, DC: National Academy Press.
Kampfer, S., Horvath, L., Kleinert, H., &
Kearns, J. (2001). Teachers’ perceptions of one state’s alternate portfolio
assessment system: Implications for practice and teacher preparation. Exceptional Children, 67(3), 361-375.
Kentucky state accountability system information taken from:
http://www.lrc.state.ky.us/kar/703/005/020.htm, http://www.ihdi.uky.edu/kap/ ,
http://www.ihdi.uky.edu/kap/history.htm, and
http://www.ihdi.uky.edu/kap/faq.asp#What%20is%20measured%20by%20the%
20Alternate%20Assessment
Kleinert, H., & Kearns, J. (2001). Alternate assessment: Measuring outcomes and
supports for students with disabilities.
Baltimore: Brookes Publishing.
Kleinert, H., & Kearns, J. (1999). A
validation study of the performance indicators and learner outcomes of
Kentucky’s alternate assessment for students with significant disabilities. Journal of The Association for Persons with
Severe Handicaps, 24(2), 100-110.
Kleinert, H., Kennedy, S., & Kearns, J.
(1999). Impact of alternate assessments: A statewide teacher survey. Journal of Special Education, 33(2),
93-102.
Marion, S. (2001). Wyoming’s approach for including all students in the assessment and
accountability system. PowerPoint presentation at CCSSO Large-scale
assessment conference, June 26, 2001.
Messick, S. (1989). Validity. In R. L. Linn
(Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.
Mid-Continent Educational Laboratory. (2000). Content knowledge (3rd ed.). Aurora, CO: McREL. Retrieved May 20, 2002, from
the World Wide Web: http://www.mcrel.org/standards-benchmarks/docs/reference.asp
North Carolina state accountability system
information taken from: http://www.ncpublicschools.org/abcs/ABCsHist.html#aa and
http://www.ncpublicschools.org/accountability/
Quenemoen, R., Massanari, C., Thompson, S., &
Thurlow, M. (2000). Alternate assessment
forum: Connecting into a whole. Minneapolis, MN: University of Minnesota,
National Center on Educational Outcomes.
Roeber, E. (2002). Setting standards on alternate assessments
(Synthesis Report 42). Minneapolis, MN: University of Minnesota, National Center
on Educational Outcomes.
Thompson, S.J., Erickson, R., Thurlow, M.L.,
Ysseldyke, J., & Callender, S. (1999).
Status of the states in the development of alternate assessments (Synthesis
Report 31). Minneapolis, MN: University of Minnesota, National Center on
Educational Outcomes.
Thompson, S.J., Quenemoen, R., Thurlow, M.L.,
& Ysseldyke, J.E. (2001). Alternate
assessments for students with disabilities. Thousand Oaks, CA: Corwin Press.
Thompson, S.J., & Thurlow, M.L. (1999). 1999 State special education outcomes: A
report on state activities at the end of the century. Minneapolis, MN:
University of Minnesota, National Center on Educational Outcomes.
Thompson, S.J., & Thurlow, M.L. (2000). State alternate assessments: Status as IDEA
alternate assessment requirements take effect (Synthesis Report 35).
Minneapolis, MN: University of Minnesota, National Center on Educational
Outcomes.
Thompson, S.J., & Thurlow, M.L. (2001). 2001 State special education outcomes: A
report on state activities at the beginning of a new decade. Minneapolis,
MN: University of Minnesota, National Center on Educational Outcomes.
Turner, M., Baldwin, L., Kleinert, H., &
Kearns, J. (2000). An examination of the concurrent validity of Kentucky’s
alternate assessment system. Journal of
Special Education, 34(2), 69-76.
Appendix
Side-by-Side Comparison of Literature Addressing the Development of Alternate Assessments
Browder, D. (2001). Curriculum and
assessment for students with moderate and severe disabilities. New York:
Guilford Press.
Kleinert, H., & Kearns, J. (2001). Alternate
assessment: Measuring outcomes and supports for students with disabilities.
Baltimore: Brookes Publishing.
Quenemoen, R., Thompson, S., Thurlow, M., &
Lehr, C. (2001). A self-study guide to implementation of inclusive assessment
and accountability systems. Minneapolis, MN: University of Minnesota, National
Center on Educational Outcomes.
http://cehd.umn.edu/NCEO/OnlinePubs/workbook.pdf
Thompson, S.J., Quenemoen, R., Thurlow, M.L.,
& Ysseldyke, J.E. (2001). Alternate assessments for students with disabilities.
Thousand Oaks, CA: Corwin Press.
Thurlow, M., Quenemoen, R., Thompson, S., & Lehr, C. (2001). Principles and characteristics of inclusive assessment and accountability systems (Synthesis Report 40). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
The comparison below is organized by essential alternate assessment development
process. For each process, corresponding material is excerpted from four
sources: Alternate Assessments for Students with Disabilities (Thompson,
Quenemoen, Thurlow, & Ysseldyke, 2001); Alternate Assessment: Measuring
Outcomes and Supports for Students with Disabilities (Kleinert & Kearns, 2001);
Curriculum and Assessment for Students with Moderate and Severe Disabilities
(Browder, 2001); and the NCEO Principles of Inclusive Assessment and
Accountability Systems (Thurlow, Quenemoen, Thompson, & Lehr, 2001) together
with A Self-Study Guide to Implementation of Inclusive Assessment and
Accountability Systems (Quenemoen, Thompson, Thurlow, & Lehr, 2001).
Process: Stakeholder and policymaker development of desired student outcomes

Thompson et al. (2001), Chapter 1: Consider how all of the students in your
school can work toward the same content standards, and how their progress can
be measured. Then, consider how students with significant disabilities are
working toward broad content standards, and how an alternate assessment can
measure their progress. You may find that your students are not getting as many
opportunities to learn to the standards as they should.

Kleinert & Kearns (2001), p. 11: “…the question of what to assess poses
considerable challenges. The question cannot be divorced from the context of
the state or district content standards that are the framework for the general
curriculum and the regular assessment…broadly stated content standards –
focused on the broad application of core content to ‘real-life’ contexts – are
clearly more suited to inclusion in the alternate assessment than are more
narrowly written standards that focus on only a prescribed set of academic
content.” p. 47: “In special education there is a longstanding debate about
what students with disabilities should and do know. We believe strongly that
the general curriculum framework should be the first consideration for IEP
teams.”

Browder (2001), p. 3-4: “This book describes specific methods for conducting
alternate assessment of students with disabilities that will meet state
standards, developing an IEP, and assessing progress on target objectives to
improve student learning. The approach…is consistent with trends in
curriculum-based assessment described for students with mild
disabilities…[but] for students with severe disabilities, professionals first
must define an individualized curriculum before planning specific assessment.
This specific assessment will often use behavioral assessment strategies, but
may also include qualitative appraisals such as portfolio assessment.”

NCEO: Alternate Assessment 1.a. Alternate assessments are aligned with state
standards held for all students, through some process of extension, expansion,
access, or other high performance bridge to the state content standards.
Process: Careful development, testing, and refinement of assessment methods

Thompson et al. (2001), Chapter 1: The nuts and bolts of gathering high quality
assessment data, and assembling them in appropriate ways, is the core of the
process of alternate assessment, but gathering high quality data depends on the
thoughtful preparatory work as the IEP is developed, as instruction is carried
out, and as progress is observed and documented. States and districts vary on
how data will be assembled and handled, so what you do specifically to gather
and prepare the data will vary as well.

Kleinert & Kearns (2001), p. 12: “We believe very strongly that alternate
assessments should be performance based (‘testing methods that require students
to create an answer or product that demonstrates their knowledge or skills,’
U.S. Congress, 1992), as opposed to more ‘paper-and-pencil’ – based measures. .
. Portfolio assessments, which are performance-based collections of student
products, are especially suited for alternate assessments … [as one of several
reasons] ...portfolio assessment enables students and teachers to use multiple
measures…and can provide a broadly defined assessment structure capable of
accommodating a very diverse student population.” p. 13: “Alternate assessments
should allow the student to apply what he or she has learned; skills are not
used in isolation. Instead, they are parts of complex performances that
integrate skills across developmental and academic areas…Alternate assessment
should not be a one-time test or single snapshot of student performance.”

Browder (2001), p. 16-17: Table 1.3, Methods Used to Assess Students with
Moderate and Severe Disabilities, notes that “[alternate assessment] may use
any of the above described assessment methods.” An analysis of shortcomings and
contributions to current practice of each approach is given for the multiple
methods. The book focuses on a blended approach that is relevant both to
students’ life skills needs and to their public school experience.

NCEO: Alternate Assessment 1.c. Alternate assessment options promote the use of
a variety of valid authentic performance-based assessment strategies aligned to
standards, allowing all students to be able to show what they know and are able
to do.
Process: Scoring of evidence according to professionally accepted standards

Thompson et al. (2001), Chapter 1: Once alternate assessment data are gathered,
state, district, and school administrators must have all assessments scored,
and then report to the public and to the parents how their school is doing.
From Chapter 7, these processes vary greatly; for example (excerpts): Teachers
are trained to score all the student work, but no one scores his or her own
students’ or own district’s work, followed by expert rescoring of a sample of
work; in some states, regional panels review all portfolios with at least two
independent scores, and expert scorers rescore those portfolios whenever the
original two scores are in disagreement.

Kleinert & Kearns (2001): (no entry)

Browder (2001), p. 78: “First, performance indicators should be clearly defined
and validated with stakeholders…Second, clear guidelines should be developed
for scoring… A third way to increase reliability and validity in portfolio
assessment is to recruit evaluators (e.g., teachers) who know the types of
programs being assessed, and to train them to reach high levels of agreement in
using a checklist, rubric, or other scoring method. Whatever method is used to
develop and score the alternate assessment, consideration needs to be given to
how the results will be used.” p. 80: “…alternate assessment information can be
used for the benefit of students with disabilities if it becomes a foundation
for quality enhancement for these educational programs.”

NCEO: Principle 3. All students with disabilities are included when student
scores are publicly reported, in the same frequency and format as all other
students, whether they participate with or without accommodations, or in an
alternate assessment. This principle provides the first level of accountability
for the scores of students with disabilities. Regardless of how students
participate in assessments, with or without accommodations, or in an alternate
assessment, students’ scores are reported, or if scores are not reported due to
technical issues or absence, the students are still accounted for in the
reporting system.
Process: Standard setting processes to allow use of results in reporting and
accountability systems

Thompson et al. (2001), Chapter 7: States incorporate their “values” into
rubric content and scoring… After setting criteria, developing rubrics, and
scoring the evidence, another step is taken to refine the “levels” of
performance against the content and performance standards… states and districts
work to develop a common understanding of “levels” of performance for students
participating in alternate assessments. In Chapter 2 we explored the idea that
content and performance standards may be the same for all students, but
performance indicators that demonstrate progress toward the standards will vary
for students who have many challenges for learning… There are many variations
in how levels are determined across the states, and the beliefs and philosophy
of the state stakeholders typically drive these decisions.

Kleinert & Kearns (2001): (no entry)

Browder (2001): (no entry)

NCEO: Principle 4. The assessment performance of students with disabilities has
the same impact on the final accountability index as the performance of other
students, regardless of how the students participate in the assessment system
(i.e., with or without accommodations, or in an alternate assessment). This
principle provides the second level of accountability for students with
disabilities. In order for all students to count in increased expectations for
accountable schools, all student assessment participation and performance data
must be integrated into district and state accountability indices. Federal
Title I requirements specifically require this, but districts and states should
address fully inclusive accountability in any local or state developed
accountability indices to promote equal access and opportunity for all
students. Alternate Assessment 3.d. Scoring and reporting processes include a
detailed approach for administration, with clearly defined performance
standards, scoring and recording procedures, and reliability checks built into
the process.
Continuous improvement of the assessment process |
After completing an assessment year, and before moving into the next year,
take time to check out the benefits and challenges, not only of the
standards and assessment systems, but the benefits of the reform effort for
students and everyone who serves them. |
p. 225 “Performance-based
assessment strategies for students with significant disabilities are hardly
new…What is new, however, is the inclusion of students with significant
disabilities in large-scale educational assessments, especially through the
use of alternate assessments. We do not pretend to have the definitive
answers as to how closely alternate assessments for these students can be
tied to the learner outcomes identified for all students or what the
performance and scoring criteria for these assessments should be. ..
Research in each of these areas is in its infancy but fortunately these
important questions are now being addressed for the good of all students now
and in the future.” See references section for
citations on the Kentucky research on impact of alternate assessment |
p. 80 The starting point in
quality enhancement is to set the goal of having a high-quality program…
Although there may be real problems with both the evaluation and the
resources available, a team that is committed to excellence will respond to
these challenges with problem solving, rather than viewing them as reasons
not to respond. The fact that many states and districts with accountability
systems did not include students with moderate and severe disabilities until
the IDEA 1997 mandates suggests that these students were overlooked in
initial discussions of school reform.” |
Principle 5. There is
improvement of both the assessment system and the accountability system over
time, through the processes of formal monitoring, ongoing evaluation, and
systematic training in the context of emerging research and best practice. This principle addresses the
need to base inclusive assessment and accountability practices on current
and emerging research and best practice, with continuous improvement of
practices as research-based understanding evolves. By working together on
improvement of assessment and accountability systems, stakeholders can
sustain commitment to keeping the standards high and keeping the focus clear
on all students being successful. Ongoing training of IEP team members and
other key partners is an essential component of this effort. |