Models for Reporting the Results of Alternate Assessments within State Accountability Systems
Prepared by:
Sue Bechard
Measured Progress
September 2001
Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:
Bechard, S. (2001). Models for reporting the results of alternate assessments within state accountability systems (Synthesis Report 39). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/Synthesis39.html
Reporting the scores of students with disabilities who participate in alternate assessments raises a number of challenges, including concerns about statistical soundness as well as issues related to the different purposes and focuses that characterize current alternate assessments.
Across the nation, states have reached different decisions about how to report
the results of their alternate assessments.
This report summarizes six models currently under construction, or in some
cases, already being used by states.
Using proficiency levels as a common reporting approach, the six models are:

Model 1: Same proficiency levels for general assessment and alternate assessment
Model 2: Different proficiency levels for general and alternate assessments are treated as the same
Model 3: Different proficiency levels for general assessment and alternate assessment
Model 4: Overlapped proficiency levels for general assessment and alternate assessment
Model 5: Lowest possible proficiency level for alternate assessment
Model 6: No alternate assessment proficiency levels
The pros and cons of each of the six models are
addressed, along with the implications of using each model. It will be important to monitor the
impact of the different approaches over time.
In response to the 1997 reauthorization of the
Individuals with Disabilities Education Act (IDEA 97) and Title I of the
Elementary and Secondary Education Act (ESEA), states are now conducting
alternate assessments for students with disabilities who cannot participate in
general state assessments, even with accommodations or modifications. The work
thus far has involved development of many different assessment strategies
nationally, including checklists, reviews of records, surveys, performance
events, documentation of progress on IEPs, and the collection of various types
of evidence into paper or electronic portfolios (Thompson & Thurlow, 2001).
Since students with disabilities participate in general assessments with and
without accommodations, the alternate assessment population represents only a
segment of students with disabilities and generally a very small segment of the
total student population. Typically, states have identified up to 2.5% of the total student population, or about 20% of students with disabilities, as appropriate for their alternate assessments.
States are at the point of deciding how their alternate
assessments will be scored and reported. Regardless of the manner in which the
assessments were conducted, or the extent to which reliability and validity of
scores have been established, the results are to be reported publicly. IDEA and
Title I requirements are not prescriptive about how results are to be reported.
IDEA 97 (Section 300.139) requires states to publicly report on alternate
assessment participation and performance (see Table 1). Title I (Section 1111)
requires that states disaggregate the results for students with disabilities compared to nondisabled students, and provide for the results to be included in a public report on school progress. According to Summary Guidance
on the Inclusion Requirement for Title I Final Assessments (Cohen, 2000),
“Whatever assessment approach is taken [referring to standard assessment,
assessment with accommodations, or alternate assessment], the scores of students
with disabilities must be included in the assessment system for purposes of
public reporting and school and district accountability” (p. 2).
As states are determining how the results of alternate
assessments will be reported, the question arises as to how the results will be
presented in relation to the reports of their general assessments. This paper
presents six models that are currently in use or being considered to situate the
alternate assessment results within states’ reporting systems.
Issues that Have an Impact on Reporting Decisions
Several factors potentially could have an impact on
decisions about reporting alternate assessment results. The three addressed here
are among the more salient within the context of standards-based reform.
Statistical Soundness
There is quite a bit of controversy over the concept of
“statistically sound.” The discussion relates to the soundness of the
scores on the assessment (reliability and validity), the aggregation of
scores from alternate assessments with those from general assessments, and the
aggregation of scores from general assessments administered under standard and
non-standard conditions (see Thurlow & Wiener, 2000).
It is important to continue to address these issues. My
purpose here is not to explore the technical issues involved in the aggregation
of scores from alternate assessments with scores from general assessments, but
rather to identify different ways in which it could be done. Throughout this
discussion, however, it is important to recognize that the technical issues have
a significant impact on the discussion of how scores are reported. Still, even
when scores are determined not to be “statistically sound” or when it has been
determined that they will not be aggregated with other scores for reporting, the
federal mandates suggest that they be visible.
Purpose and Focus
There are numerous variables that have an impact on a
state’s decision about how scores will be reported. One is the purpose of the
assessment. Different types of reports may be used if the assessment will be
used for instructional programming rather than for accountability purposes, or
to compare schools.
Several types of reports are under discussion in many
states. The following four types exemplify some of the options states are
considering. One type of report includes all students on all assessments (100%
of the total student population). Another includes all students on the general
assessment with or without accommodations, and in some states, with non-standard
accommodations (approximately 98% of the total student population). A third type
shows all students with disabilities (approximately 10% of the total student
population) on all assessments. Last, the report may display the results of
students with disabilities on the alternate assessment, sometimes including
students in the general assessment with non-standard accommodations, or taking
off-level or out-of-level tests. The reports vary almost as much as the
states themselves.
Embedded in the purpose of assessment are determinations
of what is assessed – the focus of the assessment. Some states, such as
Kentucky, South Carolina, Tennessee and Rhode Island, developed rubrics that
focus not only on student achievement but also directly evaluate programs
(Thompson & Thurlow, 2001). This is in contrast to states, such as Massachusetts
and Colorado, which have determined that student achievement will be the only
indicator of program quality or improvement. This focus of the assessment also
has an impact on how scores are reported.
Stakes
The consequences of the assessment bring other
considerations for reporting. A state that uses the assessment to determine
graduation or grade promotion will likely have different reporting requirements
than states with school or district-level consequences. At this time, two states
(Massachusetts and Ohio) are considering the use of their alternate assessments
as a way for students to earn a state diploma. Other states see the alternate
assessment as a path to a different certificate. In some states, the report
format reflects the decision that the skills required for proficiency on the
alternate assessment are at a lower level than the skills required for
proficiency on the general assessment.
This is the first year that most states will produce
reports on alternate assessments. Many approaches to reporting the results of
alternate assessments are emerging as states consider the purposes of their
assessments, the statistics involved, the requirements of their accountability
systems, the federal requirements, and the stakes attached. In viewing the
various approaches, there seem to be six models of reporting currently under
construction. While these models probably are not exhaustive, I present them
here to illustrate simply and graphically some of the options that exist at this
time.
All states have at least three levels of proficiency.
However, most states report their general assessment results using four levels;
some use more. For my purposes here, four levels are used to
demonstrate the relationship of the alternate assessment to the general
assessment. The pros, cons, and implications of each model are presented also.
Model 1
In Model 1 (shown in Figure 1), the scores of students in the alternate assessment are placed into one of four levels of proficiency, just as the scores of the general assessment are. When reported, the alternate assessment scores are aggregated with the general assessment scores in the appropriate corresponding proficiency category. The scores of the alternate assessment carry the same weight in the reporting (and perhaps in the accountability system) as do the scores of the general assessment.
Figure 1. Model 1, Same Proficiency Levels

Proficiency Level 1: Includes total % of all students in Proficiency Level 1
Proficiency Level 2: Includes total % of all students in Proficiency Level 2
Proficiency Level 3: Includes total % of all students in Proficiency Level 3
Proficiency Level 4: Includes total % of all students in Proficiency Level 4

Note: Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following: 1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 4 = exceeds expectations, advanced.
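
As a concrete illustration of the arithmetic behind Model 1, the following minimal sketch pools alternate assessment (AA) scores with general assessment (GA) scores at each proficiency level and reports combined percentages. The counts are hypothetical, not drawn from any state report:

    # Model 1 sketch: AA scores are pooled with GA scores at the same
    # proficiency level; all counts below are hypothetical.
    ga_counts = {1: 120, 2: 310, 3: 420, 4: 150}  # general assessment (GA)
    aa_counts = {1: 8, 2: 6, 3: 4, 4: 2}          # alternate assessment (AA)

    total = sum(ga_counts.values()) + sum(aa_counts.values())
    for level in sorted(ga_counts):
        combined = ga_counts[level] + aa_counts[level]
        print(f"Proficiency Level {level}: {100 * combined / total:.1f}% of all students")

Because every AA score lands in the same four categories as the GA scores, the combined report is indistinguishable from a report of a single assessment.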
Pros. There are several pro-Model 1
statements. Among them are the following reasons why an approach that places all
students in the same proficiency levels might be positive:
• The scores of alternate assessments are valued as equal to the scores of general assessments. The policy benefits of treating the scores as the same are viewed as outweighing the technical soundness concerns about combining scores from different assessments in the same report.
• One policy benefit is that schools are encouraged to take responsibility for the learning of all students.
• The unit of reporting or accountability (classroom, school, or district) does not perceive that “the scores of those students pull down the ratings.” In fact, the alternate assessment scores may actually improve the overall ratings of a classroom, school, or district when the scores from the alternate assessment have an equal chance to be high and are counted the same as a high score from the general assessments.
Cons. There are several statements that
can be made about Model 1 that are contrary to its support. Among them are the
following reasons why an approach that places all students in the same
proficiency levels might be negative:
• The assessments are different but are reported together, an approach that is viewed by some as “statistically unsound.”
• This model may be inappropriate when the state attaches high stakes for students (e.g., a diploma) to its assessments. When students must demonstrate proficiency related to a grade-level benchmark to earn a diploma, a different model may be needed, perhaps one in which skills assessed on the alternate assessment are shown at a level comparable to skills assessed on the general assessment.
Implications. Model 1 has a number of
implications for its use. Among these are the following implications:
• Model 1 currently is considered by states where the unit of reporting or accountability is the school or the district, but not for individual students. These states tend to have a stronger focus on program evaluation and improvement.
• Reports of combined scores may be difficult to interpret and explain unless scores are also disaggregated. Reports are sometimes accompanied by text explaining that different assessments are reflected in the scores; such approaches may be needed for clearer interpretation and understanding.
Model 2
Model 2 (see Figure 2) is described as the “apples + oranges = fruit” model (Roeber, 2001). It acknowledges that the general assessment and alternate assessment are different measures and does not try to mix “apples” and “oranges.” Instead, it allows that a score on the alternate assessment holds the same value as a score in the same proficiency level on the general assessment and can be reported as “fruit.” In other words, the effect of earning a “2” on either assessment would be the same: educators would investigate how instruction might be improved for any student whose score fell below “acceptable” on the relevant scoring system.
Figure 2. Model 2, Different Proficiency Levels Treated as Same

General Assessment Proficiency Level 1 (GA description and GA %) | Alternate Assessment Proficiency Level 1 (AA description and AA %) | Combined: GA + AA, includes total % of students in both
General Assessment Proficiency Level 2 (GA description and GA %) | Alternate Assessment Proficiency Level 2 (AA description and AA %) | Combined: GA + AA, includes total % of students in both
General Assessment Proficiency Level 3 (GA description and GA %) | Alternate Assessment Proficiency Level 3 (AA description and AA %) | Combined: GA + AA, includes total % of students in both
General Assessment Proficiency Level 4 (GA description and GA %) | Alternate Assessment Proficiency Level 4 (AA description and AA %) | Combined: GA + AA, includes total % of students in both

Note: GA = general assessment; AA = alternate assessment.
Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following: 1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 4 = exceeds expectations, advanced.
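
A minimal sketch of Model 2’s “apples + oranges = fruit” reporting, again with hypothetical counts: each level is reported separately for the GA and AA populations (each against its own denominator) and then combined:

    # Model 2 sketch: GA and AA results are reported separately per level
    # ("apples" and "oranges"), then combined ("fruit"); counts are hypothetical.
    ga_counts = {1: 120, 2: 310, 3: 420, 4: 150}
    aa_counts = {1: 8, 2: 6, 3: 4, 4: 2}

    ga_total = sum(ga_counts.values())
    aa_total = sum(aa_counts.values())
    for level in sorted(ga_counts):
        ga_pct = 100 * ga_counts[level] / ga_total
        aa_pct = 100 * aa_counts[level] / aa_total
        combined_pct = 100 * (ga_counts[level] + aa_counts[level]) / (ga_total + aa_total)
        print(f"Level {level}: GA {ga_pct:.1f}% | AA {aa_pct:.1f}% | GA + AA {combined_pct:.1f}%")

The combined column is identical to Model 1’s report; the difference is that the separate GA and AA columns remain visible alongside it.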
Pros. There are several pro-Model 2
statements. Among them are the following reasons why an approach that considers
the proficiency levels to be different, but counts them as the same might be
positive:
• The same value operates in Model 2 as in Model 1 in that the scores of the alternate assessments are valued as equal to the scores of the general assessments.
• This model encourages schools to take responsibility for the learning of all students because all count in the same way.
• The unit of reporting and accountability (classroom, school, or district) does not perceive that “the scores of those students pull down the ratings.” Alternate assessment scores may actually improve the overall ratings of a classroom, school, or district.
• The assessments are different and are reported separately as well as together; this fosters clarity and discourages confusion.
Cons. There are several statements that
can be made about Model 2 that are contrary to its use. Among them are the
following reasons why an approach that places all students in different
proficiency levels, but then merges them might be negative:
• Some might argue that this approach is still “statistically unsound,” in that the aggregation is technically not appropriate.
• When the alternate assessment is used for high stakes purposes for students, a report may be needed that shows the achievement of alternately assessed students at a level comparable to that of generally assessed students.
• The report format may be difficult for parents to interpret.
Implications. Model 2 has a number of
implications for its use. Among these are the following implications:
• This approach reaps the benefits of equitable consequences while avoiding the potential misinterpretation that the knowledge and skills demonstrated on the alternate assessment are the same as those demonstrated on the general assessment.
Model 3
In Model 3 (see Figure 3), there can be no aggregation by proficiency level, since the number of proficiency levels on the alternate assessment is intentionally different from the number of proficiency levels on the general assessment. The total number of students in the denominators of the alternate assessment and the general assessment may or may not be summed to ensure that there is accounting for 100% of the students.
Figure 3. Model 3, Different Proficiency Levels

Alternate Assessment Proficiency Levels:
Alternate Assessment Proficiency Level 1 | Alternate Assessment Proficiency Level 2 | Alternate Assessment Proficiency Level 3

General Assessment Proficiency Levels:
General Assessment Proficiency Level 1 | General Assessment Proficiency Level 2 | General Assessment Proficiency Level 3 | General Assessment Proficiency Level 4
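
Because the two assessments use different numbers of levels under Model 3, each is reported against its own denominator; the only cross-assessment arithmetic is the optional check that the two denominators together account for all students. A minimal sketch with hypothetical counts:

    # Model 3 sketch: separate reports with separate denominators; the AA
    # here uses three levels and the GA four. All counts are hypothetical.
    ga_counts = {1: 120, 2: 310, 3: 420, 4: 150}  # four GA levels
    aa_counts = {1: 9, 2: 7, 3: 4}                # three AA levels

    for name, counts in (("General", ga_counts), ("Alternate", aa_counts)):
        denom = sum(counts.values())
        for level in sorted(counts):
            print(f"{name} assessment, Level {level}: {100 * counts[level] / denom:.1f}%")

    # Optional accounting check: do the two denominators cover all students?
    print("Students accounted for:", sum(ga_counts.values()) + sum(aa_counts.values()))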
Pros. There are several pro statements that can be made about Model 3.
Included among them are the following:
• There is a clear distinction between the assessments. Each operates as a separate entity with separate rating scales.
• The proficiency levels may be named differently, thus avoiding reporting students with significant disabilities in categories labeled as “failing” or “unsatisfactory.”
• Statistical soundness issues resulting from the aggregation of proficiency levels from different assessments are avoided.
Cons. Statements can also be made about
Model 3 that are contrary to its support. The following are among these:
• If states do not sum the number of students in both denominators to create a single denominator, it will be easier to leave some students out of the accountability system.
• Scores on the alternate assessment will not be easy to use for accountability purposes, since they represent a very small number of students who will not fit into the reporting system developed for the majority.
Implications. Model 3 has several
implications for its use. Among the implications are the following:
• It will be difficult to aggregate scores in the future, if that becomes necessary.
• Reports to the public on the achievement of students taking the alternate assessment may be difficult, since the number of students is often so small that it may fall below a state’s minimum number for reporting.
Model 4
Model 4 is shown in Figure 4. This model is based on an alternate assessment development process in which the general standards were expanded for the alternate assessment by being mapped backwards from the grade level benchmarks. This process allows the skills assessed by the alternate assessment to begin at a lower level than the level required to show proficiency on the general assessment. Often, these lower levels on the alternate assessment correspond to the “failing” level of the general assessment. Still, in this model, it is possible for a student who is difficult to assess, such as a Stephen Hawking or a Helen Keller, to use the alternate assessment process to demonstrate achievement on higher level skills comparable to those in the general assessment. If there were high stakes for students, such as earning a diploma, a student in this type of alternate assessment would be able to demonstrate the skills needed to earn a diploma. It is possible to find alternate assessment scores reaching into levels 3 and 4 of the general assessment, which would then be comparable to the skills demonstrated on the general paper and pencil tests.
Figure 4. Model 4, Overlapped Proficiency Levels

General Assessment:   Proficiency Level 1 | Proficiency Level 2 | Proficiency Level 3 | Proficiency Level 4
Alternate Assessment: Proficiency Levels 1, 2, 3 | Proficiency Level 4 | Proficiency Level 5 | Proficiency Level 6
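
One way to operationalize Model 4 is an explicit mapping from each alternate assessment level to the general assessment level it overlaps. The mapping below follows the layout sketched in Figure 4 and is purely illustrative, not a mapping any state has adopted:

    # Model 4 sketch: each AA level maps onto the GA level it overlaps;
    # AA levels 1-3 sit within GA level 1, while AA levels 4-6 overlap
    # GA levels 2-4. Mapping and counts are illustrative only.
    aa_to_ga = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}

    aa_counts = {1: 6, 2: 5, 3: 4, 4: 3, 5: 1, 6: 1}
    ga_equivalent = {}
    for aa_level, n in aa_counts.items():
        ga_level = aa_to_ga[aa_level]
        ga_equivalent[ga_level] = ga_equivalent.get(ga_level, 0) + n

    print(ga_equivalent)  # {1: 15, 2: 3, 3: 1, 4: 1}: most AA scores load on GA level 1

As the output of this sketch suggests, even a well-aligned overlap tends to concentrate alternate assessment results in the lowest general assessment level.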
Pros. Model 4 has several positive
aspects to it. Included among the pro statements that can be made for Model 4
are the following:
• The scales of the alternate assessment and the general assessment are arranged to show an accurate relationship between the different skills demonstrated on the different assessments, based on how the alternate assessment was developed.
• The alternate assessment scale allows skills to be demonstrated on the alternate assessment in the higher levels of the general assessment.
• The names of the three proficiency levels on the alternate assessment can be different from the lowest level of the general assessment levels into which they are embedded, thus avoiding objectionable labels, such as “failing.”
Cons. Statements can also be made about
Model 4 that are contrary to its support. The following are among these:
• Most students in the alternate assessment will be perceived as operating in the “failing” or lowest category.
• If schools are the units of accountability, students in the alternate assessment may be perceived as lowering the ratings of the school.
• Aggregation of scores from the alternate assessment and the general assessment will load on the lowest general assessment proficiency level.
• It is technically challenging to align the two scales accurately, since students take either the general assessment or the alternate assessment.
Implications. Model 4 has several
implications for its use. Among the implications are the following:
• When there are high stakes for students, it will be necessary to validate that scores earned on the alternate assessment in the diploma-granting categories are comparable to scores earned on the general assessment.
• It is important to try to have a group of students who participate in both the alternate assessment and the general assessment. If a group of students participated in both assessments, it would be possible to scale the scores of the alternate assessment and the general assessment on a continuous scale.
Model 5
Model 5 is shown in Figure 5. This model puts all of the scores from the alternate assessment into a proficiency level below all of the proficiency levels on the general assessment. There are no proficiency level differences within the alternate assessment category. All students appear in the denominator.
Figure 5. Model 5, Lowest Possible Proficiency Level for Alternate Assessment

Proficiency Level 0 (Alternate Assessment) | Proficiency Level 1 (General Assessment) | Proficiency Level 2 (General Assessment) | Proficiency Level 3 (General Assessment) | Proficiency Level 4 (General Assessment)
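
In reporting terms, Model 5 collapses every alternate assessment score into a single “Level 0” below the general assessment scale while keeping all students in one denominator. A minimal sketch with hypothetical counts:

    # Model 5 sketch: all AA students are reported at Level 0, below the
    # four GA levels, with one shared denominator. Counts are hypothetical.
    ga_counts = {1: 120, 2: 310, 3: 420, 4: 150}
    aa_student_count = 20  # individual AA performance differences disappear

    report = {0: aa_student_count, **ga_counts}
    total = sum(report.values())
    for level in sorted(report):
        print(f"Proficiency Level {level}: {100 * report[level] / total:.1f}%")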
Pros. There are not as many obvious pro
statements that can be made about the approach represented by Model 5. However,
two statements that have repeatedly been made are the following:
• All students can appear in the denominator.
• This approach maintains the integrity of a single high standard.
Cons. Several statements that are
“cons” to this approach have been identified. They are as follows:
• The alternate assessment does not add value to the assessment system.
• A state may be required to justify that all students who took the alternate assessment are below proficiency level 1 of the general assessment.
• The designation of the scores from the alternate assessment as zero may have the same effect as the practice of exempting students from the assessment.
• Assigning the lowest proficiency level scores provides no incentive for improving services or achievement for students in the alternate assessment because it does not recognize improvement in performance.
Implications. Model 5 has several
implications for its use. Among the implications are the following:
• Educators may perceive the alternate assessment’s purpose solely as satisfying mandates, but providing no useful instructional information.
• The value of assessing, and therefore educating, students who will not achieve a score above a zero may be questioned.
• An alternative to this model is one in which all of the students who took the alternate assessment are lumped together into an “alternately assessed” category, which does not count in terms of their performance.
Model 6
Model 6 puts all of the scores from the alternate assessment into a category called “alternately assessed,” which counts the alternate assessment students as having participated, but does not include any performance information in the reports. All students appear in the denominator.
Figure 6. Model 6, No Alternate Assessment Proficiency Levels

Alternately Assessed | Proficiency Level 1 (General Assessment) | Proficiency Level 2 (General Assessment) | Proficiency Level 3 (General Assessment) | Proficiency Level 4 (General Assessment)
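
Model 6 can be expressed the same way, except that the alternate assessment column carries participation only and no performance information. A minimal sketch with hypothetical counts:

    # Model 6 sketch: AA students appear only in an "Alternately Assessed"
    # participation category; no AA performance data are reported.
    # Counts are hypothetical.
    ga_counts = {1: 120, 2: 310, 3: 420, 4: 150}
    alternately_assessed = 20

    total = sum(ga_counts.values()) + alternately_assessed
    print(f"Alternately Assessed (participation only): {100 * alternately_assessed / total:.1f}%")
    for level in sorted(ga_counts):
        print(f"Level {level} (General Assessment): {100 * ga_counts[level] / total:.1f}%")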
Pros. A few positive statements can be
made about the approach represented by Model 6. Included among them are the
following:
• All students can appear in the denominator.
• There is no statistical confusion, since no results are reported.
Cons. Several negative statements also
can be made about the Model 6 approach. The following are among these:
• The alternate assessment does not add value to the assessment system.
• When no results are published, instructional information is lacking.
• The designation of the scores from the alternate assessment as not counting in any way, other than as participation, may have the same effect as the practice of exempting students from the assessment.
• Counting participation only provides no incentive for improving services or achievement for students in the alternate assessment, because it does not recognize improvement in performance.
Implications. Model 6 has several
implications for its use. Among them are the following:
• Educators may perceive the alternate assessment’s purpose solely as satisfying mandates, but providing no useful instructional information.
• The value of educating or assessing students whose achievement will not be reported may be questioned by educators.
This is the first year, 2001, that most states will
publish public reports of their alternate assessment results. The models
included here reflect a range of approaches that have either been suggested or
implemented by the 50 states. Other models are likely to emerge as states gauge
the impact of the reporting formats they select.
The reporting models that have been identified thus far
bring to light a realization that alternate assessments are part of an
assessment system. While these assessments may have been developed by small
teams of special educators (not in all states, of course, but in many), they
must now be situated within an assessment program that includes all students.
The existence of alternate assessments causes states to reflect on all of the
components of the total system. Conversations about accommodations, non-standard
accommodations and alternate assessment options have been renewed in many states
now that broadly granted exemptions for some special students are no longer
possible.
The variety of methods created to report the results of
alternate assessments demonstrates the struggle of states to incorporate these
new assessments into an existing structure – one that previously did not have to
address the achievement of students with significant needs, or in many cases,
even their presence. There are states that clearly have all of their students in
state reports, and states that have clearly described how all of their students
with disabilities are doing.
There are many ways to make visible the achievement of
students with disabilities in state accountability systems. The interpretation
of federal legislation relative to state practices will surely guide future
practice. Thus, it is important to keep track of the various models that are
used and to explore (as done in this paper) the potential pros and cons of each
approach, as well as the implications of using each. Following this, it
will be extremely important to monitor the impact of the different approaches
over time.
References

Cohen, M. (2000, April 6). Letter and attachment (Summary guidance on the inclusion requirement for Title I final assessments). Washington, DC: Office of the Assistant Secretary for Elementary and Secondary Education.
Thompson, S. J., & Thurlow, M. L. (2001).
2001 State special education outcomes: A report on activities at the
beginning of a new decade. Minneapolis, MN: University of Minnesota,
National Center on Educational Outcomes.
Thurlow, M. L., & Wiener, D. (2000). Non-approved
accommodations: Recommendations for use and reporting (Policy Directions
11). Minneapolis, MN: University of Minnesota, National Center on Educational
Outcomes.