Models for Reporting the Results of Alternate Assessments within State Accountability Systems


NCEO Synthesis Report 39

Published by the National Center on Educational Outcomes

Prepared by:

Sue Bechard
Measured Progress

September 2001


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Bechard, S. (2001). Models for reporting the results of alternate assessments within state accountability systems (Synthesis Report 39). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/Synthesis39.html


Executive Summary

Reporting the scores of students with disabilities who participate in alternate assessments raises a number of challenges, including concerns about statistical soundness and questions related to the different purposes and focuses that characterize current alternate assessments. Across the nation, states have reached different decisions about how to report the results of their alternate assessments. This report summarizes six models currently under construction or, in some cases, already being used by states. Using proficiency levels as a common reporting approach, the six models are:

      Model 1: Same proficiency levels for general assessment and alternate assessment

      Model 2: Different proficiency levels for general and alternate assessments are treated as the same

      Model 3: Different proficiency levels for general assessment and alternate assessment

      Model 4: Overlapped proficiency levels for general assessment and alternate assessment

      Model 5: Lowest possible proficiency level for alternate assessment

      Model 6: No alternate assessment proficiency levels

The pros and cons of each of the six models are addressed, along with the implications of using each model.  It will be important to monitor the impact of the different approaches over time.


Overview

In response to the 1997 reauthorization of the Individuals with Disabilities Education Act (IDEA 97) and Title I of the Elementary and Secondary Education Act (ESEA), states are now conducting alternate assessments for students with disabilities who cannot participate in general state assessments, even with accommodations or modifications. The work thus far has produced many different assessment strategies nationally, including checklists, reviews of records, surveys, performance events, documentation of progress on IEPs, and the collection of various types of evidence into paper or electronic portfolios (Thompson & Thurlow, 2001). Because most students with disabilities participate in general assessments, with or without accommodations, the alternate assessment population represents only a segment of students with disabilities and a very small segment of the total student population. States have generally identified up to 2.5% of the total student population, or about 20% of students with disabilities, as appropriate for their alternate assessments.

States are at the point of deciding how their alternate assessments will be scored and reported. Regardless of the manner in which the assessments were conducted, or the extent to which the reliability and validity of scores have been established, the results are to be reported publicly. IDEA and Title I requirements are not prescriptive about how results are to be reported. IDEA 97 (Section 300.139) requires states to publicly report on alternate assessment participation and performance (see Table 1). Title I (Section 1111) requires states to disaggregate the results of students with disabilities for comparison with those of nondisabled students, and to provide for these results to be included in a public report on school progress. According to Summary Guidance on the Inclusion Requirement for Title I Final Assessments (Cohen, 2000), “Whatever assessment approach is taken [referring to standard assessment, assessment with accommodations, or alternate assessment], the scores of students with disabilities must be included in the assessment system for purposes of public reporting and school and district accountability” (p. 2).

As states determine how the results of alternate assessments will be reported, the question arises as to how those results will be presented in relation to the reports of their general assessments. This paper presents six models, currently in use or under consideration, for situating alternate assessment results within states’ reporting systems.

 

Issues that Have an Impact on Reporting Decisions

Several factors can influence decisions about reporting alternate assessment results. The three addressed here are among the most salient within the context of standards-based reform.

 

Statistical Soundness

There is considerable controversy over the concept of “statistically sound.” The discussion concerns the soundness of the scores on the assessment (reliability and validity), the aggregation of scores from alternate assessments with scores from general assessments, and the aggregation of scores from general assessments given under standard and non-standard administrations (see Thurlow & Wiener, 2000).

It is important to continue to address these issues. My purpose here is not to explore the technical issues involved in the aggregation of scores from alternate assessments with scores from general assessment, but rather to identify different ways in which it could be done. Throughout this discussion, however, it is important to recognize that the technical issues have a significant impact on the discussion of how scores are reported. Still, even when scores are determined not to be “statistically sound” or when it has been determined that they will not be aggregated with other scores for reporting, the federal mandates suggest that they be visible.

 

Purpose and Focus

Many variables affect a state’s decision about how scores will be reported. One is the purpose of the assessment. Different types of reports may be used if the assessment is intended to guide instructional programming rather than to serve accountability purposes or to compare schools.

Several types of reports are under discussion in many states; the following four exemplify the options states are considering. One type of report includes all students on all assessments (100% of the total student population). A second includes all students on the general assessment with or without accommodations and, in some states, with non-standard accommodations (approximately 98% of the total student population). A third shows all students with disabilities (approximately 10% of the total student population) on all assessments. A fourth displays the results of students with disabilities on the alternate assessment, sometimes together with students who took the general assessment with non-standard accommodations or who took off-level or out-of-level tests. There are nearly as many variations in these reports as there are states.
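To make the four report types concrete, here is a minimal Python sketch that filters one set of students into each reporting population. The `Student` record, its field names, the report-type labels, and the 100-student school are all hypothetical. Because states differ on whether students tested with non-standard accommodations are included, that choice is a flag rather than a fixed rule.

```python
from dataclasses import dataclass

# Hypothetical student record; field names are illustrative only.
@dataclass(frozen=True)
class Student:
    has_disability: bool
    assessment: str  # "general", "general_nonstandard", or "alternate"

def report_population(students, report_type, include_nonstandard=True):
    """Select the students who appear in one of the four report types.

    include_nonstandard: whether students tested with non-standard
    accommodations are added to the general and alternate reports
    (state policy varies; this flag is an assumption of the sketch).
    """
    if report_type == "all_students":
        return list(students)
    if report_type == "general_assessment":
        allowed = {"general", "general_nonstandard"} if include_nonstandard else {"general"}
        return [s for s in students if s.assessment in allowed]
    if report_type == "swd_all_assessments":
        return [s for s in students if s.has_disability]
    if report_type == "swd_alternate":
        allowed = {"alternate", "general_nonstandard"} if include_nonstandard else {"alternate"}
        return [s for s in students if s.has_disability and s.assessment in allowed]
    raise ValueError(f"unknown report type: {report_type}")

# Hypothetical school of 100 students, loosely echoing the percentages above.
students = (
    [Student(False, "general")] * 90
    + [Student(True, "general")] * 6
    + [Student(True, "general_nonstandard")] * 2
    + [Student(True, "alternate")] * 2
)
sizes = {rt: len(report_population(students, rt))
         for rt in ("all_students", "general_assessment", "swd_all_assessments", "swd_alternate")}
print(sizes)  # {'all_students': 100, 'general_assessment': 98, 'swd_all_assessments': 10, 'swd_alternate': 4}
```

Setting `include_nonstandard=False` shrinks the general report to 96 students and the alternate report to 2, which shows how a single policy choice changes every denominator downstream.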

Embedded in the purpose of an assessment are decisions about what is assessed, that is, the focus of the assessment. Some states, such as Kentucky, South Carolina, Tennessee, and Rhode Island, developed rubrics that not only focus on student achievement but also directly evaluate programs (Thompson & Thurlow, 2001). In contrast, states such as Massachusetts and Colorado have determined that student achievement will be the only indicator of program quality or improvement. The focus of the assessment also affects how scores are reported.

 

Stakes

The consequences of the assessment bring other considerations for reporting. A state that uses the assessment to determine graduation or grade promotion will likely have different reporting requirements than a state with school- or district-level consequences. At this time, two states (Massachusetts and Ohio) are considering the use of their alternate assessments as a way for students to earn a state diploma. Other states see the alternate assessment as a path to a different certificate. In some states, the report format reflects the decision that the skills required for proficiency on the alternate assessment are at a lower level than the skills required for proficiency on the general assessment.


Models for Reporting

This is the first year that most states will produce reports on alternate assessments. Many approaches to reporting the results of alternate assessments are emerging as states consider the purposes of their assessments, the statistics involved, the requirements of their accountability systems, the federal requirements, and the stakes attached. In viewing the various approaches, there seem to be six models of reporting currently under construction. While these models probably are not exhaustive, I present them here to illustrate simply and graphically some of the options that exist at this time.

All states have at least three levels of proficiency; most report their general assessment results using four levels, and some use more. For my purpose here, four levels are used to demonstrate the relationship of the alternate assessment to the general assessment. The pros, cons, and implications of each model are also presented.

 

Model 1

In Model 1 (shown in Figure 1), the scores of students in the alternate assessment are placed into one of four levels of proficiency, just as general assessment scores are. When reported, the alternate assessment scores are aggregated with the general assessment scores in the appropriate corresponding proficiency category. The scores of the alternate assessment carry the same weight in the reporting (and perhaps in the accountability system) as do the scores of the general assessment.

Figure 1. Model 1, Same Proficiency Levels

Proficiency Level    Students reported
1                    GA + AA: total % of all students in Proficiency Level 1
2                    GA + AA: total % of all students in Proficiency Level 2
3                    GA + AA: total % of all students in Proficiency Level 3
4                    GA + AA: total % of all students in Proficiency Level 4

Note: GA = general assessment; AA = alternate assessment

Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following:   1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement;  3 = proficient, meets expectations;  4 = exceeds expectations, advanced.
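The aggregation in Figure 1 amounts to pooling counts per level and dividing by one combined denominator. The following Python sketch assumes integer levels 1 to 4 on both assessments; the function name and all scores are hypothetical.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4)

def model1_report(ga_levels, aa_levels):
    """Pool GA and AA proficiency levels (both on the same 1-4 scale) and
    return the percentage of all students at each level."""
    total = len(ga_levels) + len(aa_levels)
    if total == 0:
        raise ValueError("no scores to report")
    counts = Counter(ga_levels) + Counter(aa_levels)
    return {lvl: round(100 * counts[lvl] / total, 1) for lvl in LEVELS}

# Hypothetical results: 98 general assessment scores and 2 alternate assessment scores.
ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18
aa = [2, 3]

print(model1_report(ga, aa))  # combined: {1: 10.0, 2: 31.0, 3: 41.0, 4: 18.0}
print(model1_report([], aa))  # AA only:  {1: 0.0, 2: 50.0, 3: 50.0, 4: 0.0}
```

The second call is one way to produce the disaggregated view that helps readers interpret a combined Model 1 report.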

 

Pros. There are several pro-Model 1 statements. Among them are the following reasons why an approach that places all students in the same proficiency levels might be positive:

     The scores of alternate assessments are valued as equal to the scores of general assessments. The policy benefits of treating the scores as the same are viewed as outweighing the technical soundness concerns about combining scores from different assessments in the same report.

     One policy benefit is that schools are encouraged to take responsibility for the learning of all students.

     Educators in the unit of reporting or accountability (classroom, school, or district) have less reason to perceive that “the scores of those students pull down the ratings.” In fact, when alternate assessment scores have an equal chance to be high and count the same as high general assessment scores, they may actually improve the overall ratings of a classroom, school, or district.
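A small arithmetic check illustrates the last point. This Python sketch assumes that “proficient” means level 3 or above and uses hypothetical scores; it compares the percent proficient for general assessment students alone with the Model 1 combined figure.

```python
def percent_proficient(levels, proficient_from=3):
    """Share of students at or above the proficient cut (level 3, by assumption)."""
    if not levels:
        raise ValueError("no scores to report")
    return 100 * sum(1 for lvl in levels if lvl >= proficient_from) / len(levels)

ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18  # 98 hypothetical GA scores
aa = [3, 4]                                      # 2 hypothetical AA scores, both high

print(round(percent_proficient(ga), 1))       # GA alone: 59.2
print(round(percent_proficient(ga + aa), 1))  # GA + AA under Model 1: 60.0
```

The same arithmetic runs the other way: low alternate assessment scores would lower the combined figure, which is why the equal chance of scoring high matters.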

Cons. There are several statements that can be made about Model 1 that are contrary to its support. Among them are the following reasons why an approach that places all students in the same proficiency levels might be negative:

     The assessments are different but are reported together, an approach that is viewed by some as “statistically unsound.”

     This model may be inappropriate when the state attaches high stakes for students (e.g., a diploma) to its assessments. If students must demonstrate proficiency relative to a grade-level benchmark to earn a diploma through the alternate assessment, a different model may be needed, perhaps one that shows skills assessed on the alternate assessment at a level comparable to skills assessed on the general assessment.

Implications. Model 1 has a number of implications for its use. Among these are the following implications:

     Model 1 is currently being considered by states where the unit of reporting or accountability is the school or the district rather than the individual student. These states tend to have a stronger focus on program evaluation and improvement.

     Reports of combined scores may be difficult to interpret and explain unless scores are also disaggregated. Reports are sometimes accompanied by text explaining that the scores reflect different assessments; this approach may be needed for clearer interpretation and understanding.

 

Model 2

Model 2 (see Figure 2) has been described as the “apples + oranges = fruit” model (Roeber, 2001). It acknowledges that the general assessment and the alternate assessment are different measures and does not try to mix “apples” and “oranges.” Instead, it holds that a score on the alternate assessment has the same value as a score in the same proficiency level on the general assessment, so the two can be reported together as “fruit.” In other words, earning a “2” on either assessment would have the same effect for educators: for any student scoring below “acceptable” on the relevant scoring system, they would investigate how instruction might be improved.

Figure 2. Model 2, Different Proficiency Levels Treated as Same

Columns: General Assessment | Combined | Alternate Assessment

Level 1
   General Assessment: GA Proficiency Level 1 (GA description and GA %)
   Combined: GA + AA (total % of students in both GA Proficiency Level 1 and AA Proficiency Level 1)
   Alternate Assessment: AA Proficiency Level 1 (AA description and AA %)

Level 2
   General Assessment: GA Proficiency Level 2 (GA description and GA %)
   Combined: GA + AA (total % of students in both GA Proficiency Level 2 and AA Proficiency Level 2)
   Alternate Assessment: AA Proficiency Level 2 (AA description and AA %)

Level 3
   General Assessment: GA Proficiency Level 3 (GA description and GA %)
   Combined: GA + AA (total % of students in both GA Proficiency Level 3 and AA Proficiency Level 3)
   Alternate Assessment: AA Proficiency Level 3 (AA description and AA %)

Level 4
   General Assessment: GA Proficiency Level 4 (GA description and GA %)
   Combined: GA + AA (total % of students in both GA Proficiency Level 4 and AA Proficiency Level 4)
   Alternate Assessment: AA Proficiency Level 4 (AA description and AA %)


Note: GA = general assessment; AA = alternate assessment

Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following:   1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement;  3 = proficient, meets expectations;  4 = exceeds expectations, advanced.
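Arithmetically, the combined column of Figure 2 is computed the same way as in Model 1; what differs is that each assessment keeps its own level definitions and is also reported in its own column. This Python sketch assumes four integer levels per assessment; the function names and scores are hypothetical.

```python
from collections import Counter

def distribution(levels, n_levels=4):
    """Percentage of the given students at each level 1..n_levels."""
    if not levels:
        return {lvl: 0.0 for lvl in range(1, n_levels + 1)}
    counts = Counter(levels)
    return {lvl: round(100 * counts[lvl] / len(levels), 1) for lvl in range(1, n_levels + 1)}

def model2_report(ga_levels, aa_levels):
    """Report each assessment on its own scale, plus a combined column that
    treats GA level k and AA level k as equivalent ('fruit')."""
    return {
        "general": distribution(ga_levels),
        "combined": distribution(list(ga_levels) + list(aa_levels)),
        "alternate": distribution(aa_levels),
    }

ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18  # hypothetical GA levels
aa = [2, 3]                                      # hypothetical AA levels
for column, dist in model2_report(ga, aa).items():
    print(column, dist)
```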

Pros. There are several pro-Model 2 statements. Among them are the following reasons why an approach that considers the proficiency levels to be different, but counts them as the same might be positive:

     The same value operates in Model 2 as in Model 1 in that the scores of the alternate assessments are valued as equal to the scores of the general assessments.

     This model encourages schools to take responsibility for the learning of all students because all count in the same way.

     Educators in the unit of reporting and accountability (classroom, school, or district) have less reason to perceive that “the scores of those students pull down the ratings.” Alternate assessment scores may actually improve the overall ratings of a classroom, school, or district.

     The assessments are different and are reported separately as well as together; this fosters clarity and discourages confusion.

Cons. There are several statements that can be made about Model 2 that are contrary to its use. Among them are the following reasons why an approach that places all students in different proficiency levels, but then merges them might be negative:

     Some might argue that this approach is still “statistically unsound,” in that the aggregation is technically not appropriate.

     When the alternate assessment carries high stakes for students, the combined report may show alternately assessed students at an achievement level that appears comparable to that of generally assessed students.

     The report format may be difficult for parents to interpret.

Implications. Model 2 has a number of implications for its use. Among these are the following implications:

     This approach reaps the benefits of equitable consequences while avoiding the potential misinterpretation that the knowledge and skills demonstrated on the alternate assessment are the same as those demonstrated on the general assessment.

 

Model 3

In Model 3 (see Figure 3), there can be no aggregation by proficiency level, since the number of proficiency levels on the alternate assessment is intentionally different from the number on the general assessment. The numbers of students in the alternate assessment and general assessment denominators may or may not be summed to ensure that 100% of the students are accounted for.

Figure 3. Model 3, Different Proficiency Levels

Alternate Assessment Proficiency Levels
   Alternate Assessment Proficiency Level 1 | Alternate Assessment Proficiency Level 2 | Alternate Assessment Proficiency Level 3

General Assessment Proficiency Levels
   General Assessment Proficiency Level 1 | General Assessment Proficiency Level 2 | General Assessment Proficiency Level 3 | General Assessment Proficiency Level 4


Pros. There are several pro statements that can be made about Model 3. Included among them are the following:

     There is a clear distinction between the assessments. Each operates as a separate entity with separate rating scales.

     The proficiency levels may be named differently, thus avoiding reporting students with significant disabilities in categories labeled as “failing” or “unsatisfactory.”

     Statistical soundness issues resulting from the aggregation of proficiency levels from different assessments are avoided.

Cons. Statements can also be made about Model 3 that are contrary to its support. The following are among these:

     If states do not sum the number of students in both denominators to create a single denominator, it will be easier to leave some students out of the accountability system.

     Scores on the alternate assessment will not be easy to use for accountability purposes, since they represent a very small number of students who will not fit into the reporting system developed for the majority.

Implications. Model 3 has several implications for its use. Among the implications are the following:

     It will be difficult to aggregate scores in the future, if that becomes necessary.

     Reports to the public on the achievement of students taking the alternate assessment may be difficult, since the number of students is often so small that it falls below a state’s minimum number of students for reporting.
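The two practical issues raised for Model 3, a single denominator and small group sizes, can be sketched together. This Python example is illustrative only: the level labels, the `enrolled` count, and the minimum group size of 10 are hypothetical assumptions, not any state’s actual rules.

```python
from collections import Counter

GA_LEVELS = ("GA-1", "GA-2", "GA-3", "GA-4")
AA_LEVELS = ("AA-1", "AA-2", "AA-3")  # intentionally a different number of levels

def model3_report(ga_levels, aa_levels, enrolled, min_n=10):
    """Report each assessment on its own scale with no cross-scale aggregation.

    enrolled: number of students who should be accounted for (hypothetical input).
    min_n: hypothetical state minimum group size; smaller groups are suppressed.
    """
    def dist(levels, labels):
        if len(levels) < min_n:
            return f"suppressed (n < {min_n})"
        counts = Counter(levels)
        return {lab: round(100 * counts[lab] / len(levels), 1) for lab in labels}

    tested = len(ga_levels) + len(aa_levels)  # single combined denominator
    return {
        "general": dist(ga_levels, GA_LEVELS),
        "alternate": dist(aa_levels, AA_LEVELS),
        "accounted_for": tested,
        "unaccounted_for": enrolled - tested,
    }

ga = ["GA-1"] * 10 + ["GA-2"] * 30 + ["GA-3"] * 40 + ["GA-4"] * 17
aa = ["AA-2", "AA-3"]
report = model3_report(ga, aa, enrolled=100)
print(report["alternate"])        # suppressed (n < 10)
print(report["unaccounted_for"])  # 1
```

Summing both groups against enrollment is what surfaces the one missing student; reporting the two scales with separate denominators alone would not.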

 

Model 4

Model 4 is shown in Figure 4. This model is based on an alternate assessment development process in which the general standards were expanded for the alternate assessment by mapping backward from the grade-level benchmarks. This process allows the skills assessed by the alternate assessment to begin at a lower level than a student must reach to show proficiency on the general assessment. Often, these lower levels on the alternate assessment correspond to the “failing” level of the general assessment. Still, in this model, it is possible for a student who is difficult to assess, such as a Dr. Stephen Hawking or a Helen Keller, to use the alternate assessment process to demonstrate achievement on higher-level skills comparable to those in the general assessment. If there were high stakes for students, such as earning a diploma, a student in this type of alternate assessment would be able to demonstrate the skills needed to earn one. Alternate assessment scores can reach into levels 3 and 4 of the general assessment, where they would be comparable to the skills demonstrated on the general paper-and-pencil tests.

 

Figure 4. Model 4, Overlapped Proficiency Levels

General Assessment Proficiency Level 1: Alternate Assessment Proficiency Levels 1, 2, and 3
General Assessment Proficiency Level 2: Alternate Assessment Proficiency Level 4
General Assessment Proficiency Level 3: Alternate Assessment Proficiency Level 5
General Assessment Proficiency Level 4: Alternate Assessment Proficiency Level 6

Pros. Model 4 has several positive aspects to it. Included among the pro statements that can be made for Model 4 are the following:

     The scales of the alternate assessment and the general assessment are arranged to show an accurate relationship between the different skills demonstrated on the different assessments based on how the alternate assessment was developed.

     The alternate assessment scale allows skills to be demonstrated on the alternate assessment in the higher levels of the general assessment.

     The three alternate assessment proficiency levels embedded within the lowest general assessment level can carry names different from that level, thus avoiding objectionable labels such as “failing.”

Cons. Statements can also be made about Model 4 that are contrary to its support. The following are among these:

     Most students in the alternate assessment will be perceived as operating in the “failing” or lowest category.

     If schools are the units of accountability, students in the alternate assessment may be perceived as lowering the ratings of the school.

     When scores from the alternate assessment and the general assessment are aggregated, alternate assessment scores will be concentrated in the lowest general assessment proficiency level.

     Aligning the two scales accurately is technically challenging, since each student takes either the general assessment or the alternate assessment, not both.
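The concentration effect is easy to see once alternate assessment levels are placed on the general assessment scale. This Python sketch assumes the level correspondence shown in Figure 4 (alternate levels 1 to 3 inside general level 1, alternate levels 4 to 6 aligned with general levels 2 to 4); the mapping table, function name, and scores are illustrative assumptions.

```python
from collections import Counter

# Assumed correspondence from Figure 4: AA levels 1-3 sit inside GA level 1,
# and AA levels 4-6 line up with GA levels 2-4.
AA_TO_GA = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}

def model4_aggregate(ga_levels, aa_levels):
    """Place each AA level on the GA scale, then count students per GA level."""
    mapped = [AA_TO_GA[lvl] for lvl in aa_levels]
    counts = Counter(ga_levels) + Counter(mapped)
    return {lvl: counts[lvl] for lvl in (1, 2, 3, 4)}

ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18
aa = [1, 2, 3, 3, 5]  # hypothetical: most AA students score in AA levels 1-3
print(model4_aggregate(ga, aa))  # {1: 14, 2: 30, 3: 41, 4: 18}
```

Four of the five hypothetical alternate assessment students land in general level 1, even though their alternate assessment levels differ.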

Implications. Model 4 has several implications for its use. Among the implications are the following:

     When there are high stakes for students, it will be necessary to validate that scores earned on the alternate assessment in the diploma-granting categories are comparable to scores earned on the general assessment.

     It is important to try to have a group of students participate in both the alternate assessment and the general assessment. With such a group, it would be possible to place the scores of both assessments on a common, continuous scale.
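The paper does not name a scaling method. One simple option, assumed here purely for illustration, is single-group linear linking: when the same students take both assessments, alternate assessment scores can be mapped onto the general assessment scale by matching means and standard deviations. Operational scaling would require a much more careful psychometric design; the function name and scores below are hypothetical.

```python
from statistics import mean, pstdev

def linear_link(aa_scores, ga_scores):
    """Return a function mapping AA scores onto the GA scale by matching the
    mean and standard deviation observed for one group that took both tests."""
    if len(aa_scores) != len(ga_scores) or len(aa_scores) < 2:
        raise ValueError("need scores on both assessments for at least two students")
    mu_a, sd_a = mean(aa_scores), pstdev(aa_scores)
    mu_g, sd_g = mean(ga_scores), pstdev(ga_scores)
    if sd_a == 0:
        raise ValueError("AA scores have no spread; cannot link")
    return lambda x: mu_g + (sd_g / sd_a) * (x - mu_a)

# Hypothetical scores for three students who took both assessments.
to_ga = linear_link([10, 20, 30], [200, 240, 280])
print(round(to_ga(25), 1))  # 260.0
```

A linear link is the simplest choice; with real data, the shape of the two score distributions would likely call for a nonlinear (e.g., equipercentile) approach.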

 

Model 5

Model 5 is shown in Figure 5. This model puts all of the scores from the alternate assessment into a proficiency level below all of the proficiency levels on the general assessment. There are no proficiency level differences within the alternate assessment category. All students appear in the denominator.

Figure 5. Model 5, Lowest Possible Proficiency Level for Alternate Assessment

Proficiency Level 0: Alternate Assessment
Proficiency Level 1: General Assessment
Proficiency Level 2: General Assessment
Proficiency Level 3: General Assessment
Proficiency Level 4: General Assessment

 

Pros. There are not as many obvious pro statements that can be made about the approach represented by Model 5. However, two statements that have repeatedly been made are the following:

     All students can appear in the denominator.

     This approach maintains the integrity of a single high standard.

Cons. Several statements that are “cons” to this approach have been identified. They are as follows:

     The alternate assessment does not add value to the assessment system.

     A state may be required to justify that all students who took the alternate assessment are below proficiency level 1 of the general assessment.

     The designation of the scores from the alternate assessment as zero may have the same effect as the practice of exempting students from the assessment.

     Assigning the lowest proficiency level scores provides no incentive for improving services or achievement for students in the alternate assessment because it does not recognize improvement in performance.

Implications. Model 5 has several implications for its use. Among the implications are the following:

     Educators may perceive the alternate assessment’s purpose solely as satisfying mandates while providing no useful instructional information.

     The value of assessing, and therefore educating, students who will not achieve a score above a zero may be questioned.

     An alternative to this model is one in which all of the students who took the alternate assessment are lumped together into an “alternately assessed” category, which does not count in terms of their performance.
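The defining feature of Model 5 is that the alternate assessment score itself is discarded: every alternate assessment student lands in Level 0. This Python sketch, with a hypothetical function name and scores, makes that explicit by returning the same report whatever the alternate assessment results are.

```python
from collections import Counter

def model5_report(ga_levels, aa_scores):
    """Place every alternate assessment student in Level 0, ignoring the AA
    score itself; all students remain in the denominator."""
    total = len(ga_levels) + len(aa_scores)
    if total == 0:
        raise ValueError("no students to report")
    counts = Counter(ga_levels)
    counts[0] = len(aa_scores)  # AA performance is discarded; only the count is kept
    return {lvl: round(100 * counts[lvl] / total, 1) for lvl in range(5)}

ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18
print(model5_report(ga, aa_scores=[1, 3]))  # {0: 2.0, 1: 10.0, 2: 30.0, 3: 40.0, 4: 18.0}
```

Because improved alternate assessment scores cannot change this output, the sketch also illustrates the “no incentive” concern listed among the cons.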

 

Model 6

Model 6 puts all of the scores from the alternate assessment into a category called “alternately assessed,” which counts the alternate assessment students as having participated, but does not include any performance information in the reports. All students appear in the denominator.

Figure 6. Model 6, No Alternate Assessment Proficiency Levels

Alternately Assessed
Proficiency Level 1: General Assessment
Proficiency Level 2: General Assessment
Proficiency Level 3: General Assessment
Proficiency Level 4: General Assessment

 

Pros. A few positive statements can be made about the approach represented by Model 6. Included among them are the following:

     All students can appear in the denominator.

     There is no statistical confusion, since no alternate assessment performance results are reported.

Cons. Several negative statements also can be made about the Model 6 approach. The following are among these:

     The alternate assessment does not add value to the assessment system.

     When no results are published, instructional information is lacking.

     The designation of the scores from the alternate assessment as not counting in any way, other than as participation, may have the same effect as the practice of exempting students from the assessment.

     Counting alternate assessment students only as participants provides no incentive for improving services or achievement for students in the alternate assessment, because it does not recognize improvement in performance.

Implications. Model 6 has several implications for its use. Among them are the following:

     Educators may perceive the alternate assessment’s purpose solely as satisfying mandates while providing no useful instructional information.

     The value of educating or assessing students whose achievement will not be reported may be questioned by educators.
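Model 6 differs from Model 5 only in where alternate assessment students are placed: in a participation-only category rather than a performance level. This Python sketch, with a hypothetical function name, category labels, and scores, shows that no alternate assessment performance information reaches the report.

```python
from collections import Counter

def model6_report(ga_levels, aa_scores):
    """Count AA students only as 'alternately assessed' (participation);
    no AA performance information enters the report."""
    total = len(ga_levels) + len(aa_scores)
    if total == 0:
        raise ValueError("no students to report")
    counts = Counter(ga_levels)
    report = {"alternately assessed": round(100 * len(aa_scores) / total, 1)}
    for lvl in (1, 2, 3, 4):
        report[f"level {lvl} (GA)"] = round(100 * counts[lvl] / total, 1)
    return report

ga = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 18
print(model6_report(ga, aa_scores=[1, 3]))
```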


Conclusions

This is the first year (2001) that most states will publish public reports of their alternate assessment results. The models included here reflect a range of approaches that have been suggested or implemented across the 50 states. Other models are likely to emerge as states gauge the impact of the reporting formats they select.

The reporting models that have been identified thus far bring to light a realization that alternate assessments are part of an assessment system. While these assessments may have been developed by small teams of special educators (not in all states, of course, but in many), they must now be situated within an assessment program that includes all students. The existence of alternate assessments causes states to reflect on all of the components of the total system. Conversations about accommodations, non-standard accommodations and alternate assessment options have been renewed in many states now that broadly granted exemptions for some special students are no longer possible.

The variety of methods created to report the results of alternate assessments demonstrates the struggle of states to incorporate these new assessments into an existing structure, one that previously did not have to address the achievement of students with significant needs or, in many cases, even their presence. There are states that clearly include all of their students in state reports, and states that have clearly described how all of their students with disabilities are doing.

There are many ways to make visible the achievement of students with disabilities in state accountability systems. The interpretation of federal legislation relative to state practices will surely guide future practice. Thus, it is important to keep track of the various models that are used, to explore (as done in this paper) the potential pros and cons about each approach, as well as the implications of the use of each. Following this, it will be extremely important to monitor the impact of the different approaches over time.


References

Cohen, M. (2000, April 6). Letter and attachment (Summary guidance on the inclusion requirement for Title I final assessments). Washington, DC: Office of the Assistant Secretary for Elementary and Secondary Education.

Thompson, S. J., & Thurlow, M. L. (2001). 2001 State special education outcomes: A report on activities at the beginning of a new decade. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Thurlow, M. L., & Wiener, D. (2000). Non-approved accommodations: Recommendations for use and reporting (Policy Directions 11). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.