Testing Students Out of Level in Large-Scale Assessments: What States Perceive and Believe


Out-of-Level Testing Project Report 5

Published by the National Center on Educational Outcomes

Prepared by Jane Minnema, Martha Thurlow, and Jim Scott

March 2001


This document has been archived by NCEO because some of the information it contains may be out of date.


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Minnema, J., Thurlow, M., & Scott, J. (2001). Testing students out of level in large-scale assessments: What states perceive and believe (Out-of-Level Testing Project Report 5). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/OOLT5.html


Executive Summary

Out-of-level testing is the practice of administering a test at a level above or below the level that is generally recommended for a student based on his or her age or grade. Originally introduced in the 1970s as a way to more precisely measure student achievement growth as an index of program efficacy, out-of-level testing has since become an option for testing students with disabilities in large-scale assessment programs. Currently, 12 states (Alaska, Arizona, California, Connecticut, Delaware, Iowa, Louisiana, North Dakota, South Carolina, Utah, Vermont, and West Virginia) test students out of level in their state assessments. However, the recent increase in out-of-level testing has not occurred without controversy.

Complicating the controversy over using out-of-level testing in large-scale assessment programs is the lack of definitive research results that point to practical solutions. In fact, the existing literature on out-of-level testing seems to raise more questions about testing students out of level than it answers. Without the benefit of research-based guidelines or recommendations, differing opinions about out-of-level testing have arisen among policymakers, educators, and parents nationwide. To date, no study has described the differing opinions, perceptions, and practices involved in testing students out of level in large-scale assessment programs. Therefore, the purpose of this study is to describe state-level perspectives from the states that currently test students with disabilities out of level in large-scale assessment programs.

This report describes the results of a series of telephone interviews with state-level personnel from state educational agencies in those states that use out-of-level testing. An analysis of the narrative data yielded overarching themes of results that are organized according to the rationale for testing students out of level, the advantages and disadvantages to out-of-level testing, and the political context that surrounds the implementation of out-of-level testing programs. These results point to four concerns that are important considerations for policymakers and practitioners who strive to include all students with disabilities in large-scale assessments. First, fundamental issues about the effects of out-of-level testing on students with disabilities are not known. Second, the test validity and reliability issues inherent to testing students out of level are also not resolved in the literature. Third, sound procedures for reporting out-of-level test scores and using the test results for accountability purposes have not been determined. Finally, the overall merit of out-of-level testing has not been fully described or empirically tested. At best, these results suggest that testing students with disabilities out of level in large-scale assessment programs should be done cautiously.


Overview

Out-of-level testing, or the administration of a test at a level above or below the level generally recommended for students based on their age-grade level, is an assessment practice that dates back to the 1970s. Originally, out-of-level testing was used to measure individual student progress or program efficacy. After receiving little attention in the literature throughout the 1980s, out-of-level testing re-emerged as an assessment practice in the late 1990s. Today, as educational reform efforts increasingly emphasize student and system accountability, the use of out-of-level testing has expanded as states strive to include all students in their statewide assessment programs. However, the implementation of out-of-level testing has not grown without controversy (Thurlow, Elliott, & Ysseldyke, 1999). In fact, this approach to testing remains a contentious issue among policymakers, educators, school administrators, parents, and other community members.

Proponents of out-of-level testing identify multiple benefits for this testing approach while opponents claim various disadvantages (Minnema, Thurlow, Bielinski, & Scott, 2000). The situation is further complicated by the multiple questions that persist in practice without research evidence to point to conclusive answers. For instance, when students are tested on-level, how useful are test results if the items do not measure what a student knows? Or, is it in the student’s best interest to experience frustration or anxiety when tested on material that is too difficult? Moreover, is it helpful to have test results that do not guide appropriate instructional decisions? On the other hand, how valid and reliable are the out-of-level test scores when converted back to in-level scores? If students are not tested on-level, will their classroom instruction challenge them to meet grade-level standards? How can out-of-level test scores be used appropriately within a high stakes accountability system? Further, how can out-of-level test scores be publicly reported in a meaningful way for accountability purposes?

These are some of the issues that surround the practice of testing students out of level. The need for empirical research that answers these important questions is critical. An important first step toward resolving some of these dilemmas is to better understand the current status of testing students out of level in statewide, large-scale assessment programs. The purpose of this report is to describe state-level perspectives that highlight the differing opinions and perceptions about testing students out of level. We also use these perspectives to identify key issues and make general recommendations that are important considerations when testing students out of level.


Method

In this qualitative study we used three data collection strategies to collect narrative data from state educational agencies within the 12 states that use out-of-level testing in large-scale statewide assessments (Alaska, Arizona, California, Connecticut, Delaware, Iowa, Louisiana, North Dakota, South Carolina, Utah, Vermont, and West Virginia). The primary source for these data was the state assessment directors. While we conducted the three data collection strategies simultaneously, we describe each strategy separately here.

First, we collected Internet data from the individual state Web sites that described the large-scale assessment programs for each state where out-of-level testing is currently implemented. The purpose of this data collection strategy was to provide a context for understanding the implementation of out-of-level testing within each state individually. These data included a description of the assessment(s) used (test name, type of test, subject areas tested, grades tested, dates of administration, and high stakes impact). We also collected policy information about test accommodations and out-of-level testing. In addition, we collected data that described the public reporting practices of assessment results at both the state and local level. It should be noted, however, that the Internet data collected for Iowa differed from those collected for the other 11 states, since Iowa is one of two states nationwide that do not have a mandated standards-based, large-scale assessment program used statewide.

Second, we conducted a series of telephone interviews with one person in each state educational agency who was knowledgeable about the state’s out-of-level testing policy and practice. The same recruitment e-mail was sent to each state assessment director with a follow-up telephone call to schedule an interview appointment. Interview questions were emailed to each participant prior to the telephone interview. (See Appendix A for the telephone interview protocol.) It typically took 20 to 30 minutes to conduct an interview. All interviews were tape recorded and then transcribed for qualitative data analysis.

In five of the states, we interviewed either the state assessment director or the assistant director. The state assessment directors in the remaining seven states referred us to other staff who were more familiar with testing students out of level in their states. These participants included a program administrator, a program director, a consultant, an educational associate, an educational specialist, a coordinator, and a university professor. In one state, we used input from two people. All participants received a copy of this report for participating in this study. As a final data collection strategy, we emailed each participant follow-up questions to the telephone interview. These questions either addressed missing interview information or clarified interview content to ensure accurate results.

In producing the final results of this study, we considered two levels of information. The first level of information was a more detailed look at the states that test students out of level to compare their policies and practices (see Out-of-Level Testing Report 4). A second level of information was more global in nature, which required analysis and interpretation of the telephone interviews as a composite set of data. Themes of results were generated through qualitative data analysis to begin describing the practice of testing students out of level nationwide. These themes were organized according to the five primary telephone interview questions. To complete this analysis, we read and re-read the interview data, question by question, to code the categories of information for all of the interviews as an entire group. We then merged these categories of information into meaningful themes of results. The final stage of this qualitative analysis involved having a reviewer conduct an independent analysis of one-fifth of the data set to verify these results. The final analysis was adjusted according to any discrepancies between these two reviews.


State Perspectives on Out-of-Level Testing

These themes of narrative results are organized according to the rationale for testing students out of level, the advantages and disadvantages to out-of-level testing, and the political context surrounding out-of-level testing policy. Each of these topics was the focus of one telephone interview question.

 

Rationale for Testing Students Out of Level

The qualitative analysis of the data set for this interview question yielded four themes of results (see Table 1). One state’s responses were not included in the analysis of the responses to this interview question. While this state has a long history of testing students out of level, the participant from that SEA was hesitant to fully answer all of the interview questions due to the Title I peer review process, which seems to discourage out-of-level testing. “Two weeks ago I would have had ‘a’ great answer … but since the end of June, we’ve had some conversations with Title I at the federal level. I’m not convinced that our position on out-of-level [testing] will remain the same.” The following themes of results are based on the responses from the remaining 11 states.

 

Table 1. Themes of Results on the Rationale for Testing Students Out of Level
(Number of states responding per theme; total N = 12)

Theme 1 – Some students’ assessment needs are met by testing at a lower level than the assigned grade level. (9 states)
Theme 2 – Out-of-level testing is a means of including all students in an accountability system. (7 states)
Theme 3 – Out-of-level testing is a practical solution to a costly assessment problem. (1 state)
Theme 4 – The policy to test students out of level is a mandated policy. (3 states)

Note: Several states’ responses fell into more than one of the themes.

 

Theme 1 – Some students’ assessment needs are met by testing at a lower level than the assigned grade level

“We test out of level for one primary reason, and that’s that we believe students need and should be tested at their instructional level. We don’t think it’s right to give students algebra problems when they’re working on third grade math.” Those states that test out of level because they believe the out-of-level tests are fair and appropriate expressed concern that the “regular statewide assessments were not appropriate” for all students. Testing below grade level was described as an appropriate and fair approach for testing those students who are striving to meet grade level content standards, but at a slower learning pace than their same grade peers.

One aim of out-of-level testing is to match a student’s instructional level to the test item content. Test results are then more usable for guiding classroom teachers in making good instructional decisions. Interviewees also reported that when test items measure the curricular content of a student’s instructional level, the testing experience for the student is less frustrating and causes less emotional trauma. “For our state testing program … there was concern among constituents, parents, students, and teachers that there were some students who were going to be unable to perform at a [certain] grade level and [on-level testing] would be inappropriate and frustrating for some students who were not operating at that level.”

 

Theme 2 – Out-of-level testing is a means of including all students in an accountability system

Several states viewed out-of-level testing as a “unique” accommodation for students with disabilities. “We are attempting to … be as inclusive as possible to students who are unable to participate in the on-level testing system.” Out-of-level testing provides a test score for those students who might otherwise not perform well enough to obtain a score on a grade level test. With testing “options in … our accountability assessments,” such as out-of-level testing, every student is entered into a local and state level database. When all students are considered for accountability purposes, states are able “to ensure the integrity of the assessment system.”

Out-of-level testing “ensures that we are not putting [students] into an alternate assessment who really don’t need that type of assessment. But at the same time to ensure that they’re not sitting down in front of a test where they can’t answer any of the questions.” States viewed out-of-level testing as an assessment option for those students whose academic skills fall in between a grade level assessment and an alternate assessment. These students are not striving to meet a different set of content standards as are those students for whom an alternate assessment is intended. Students for whom an out-of-level test is intended are striving to meet grade level standards, but at a lower performance level than the level tested by a grade level test.

Out-of-level testing was also viewed as maintaining the integrity of the assessment system when the psychometric properties of an assessment system are considered. Few states test student performance at all grade levels. Since adjacent grades are rarely tested in large-scale assessment programs, it is unlikely that the tests have overlapping items. Vertical equating is thus problematic. A common scale can only be calculated if the highest level questions on one test level overlap with the lowest level questions on the next grade tested. For instance, the highest level test items that measure 4th grade performance would not overlap with the lowest level test items on an 8th grade assessment. The gap between 4th grade and 8th grade is too large to allow for common performance levels. Out-of-level tests that test performance within the “gap” between grades were thought to have “improved the accuracy of measurement” and yielded “usable data.”
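To make the equating point more concrete, one simplified form of linear (mean–sigma) linking can illustrate why overlapping anchor items matter; the notation below is ours, offered only as a sketch, and is not drawn from any state’s actual procedures. If a set of anchor items appears on both a lower level form X and an upper level form Y, a score x on form X can be placed on the Y scale as

    y* = mu_Y + (sigma_Y / sigma_X) (x - mu_X)

or, in LaTeX notation, \( y^{*} = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) \), where the means and standard deviations are computed from examinees’ performance on the shared anchor items for each form. When the grades tested share no items at all, as in the 4th grade and 8th grade example above, these anchor statistics cannot be estimated and no common scale can be constructed.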

Theme 3 – Out-of-level testing is a practical solution to a costly assessment problem

Another rationale for administering out-of-level tests is characterized “as compensating for the inadequacy of the regular assessment system.” States have invested extensive time and resources in developing large-scale assessment systems, but find that the existing systems do not adequately meet all students’ assessment needs. However, these “assessments have so many positive points” that it would not be feasible or practical to “build one [assessment system] where out-of-level tests weren’t necessary.” Some of these respondents viewed out-of-level tests as an affordable solution that “still … have valid, accurate accountability data” by “giving school districts options so that they can appropriately assess kids.”

Theme 4 – The policy to test students out of level is a mandated policy

Some states indicated that out-of-level tests are only administered because the state board of education had mandated the policy or a “command decision … was made here at the [state] department” to test out of level. It is interesting to note that both respondents that spoke to a mandated out-of-level testing policy also registered dissatisfaction with the testing policy. A third respondent indicated that there was no “state policy that defines a rationale for testing students out of grade level.” This respondent stated further that testing out of level is a local decision made by special education teachers where “the state policy … has encouraged districts or schools to not test out of grade level” by treating the out-of-level test as a nonstandard test administration.


Advantages of Out-of-Level Testing

The distinction between a rationale for out-of-level testing and the advantages of testing students out of level is an artificial one. Some of the statements that explain the rationale for out-of-level testing overlap with the advantages to out-of-level testing. Two themes did emerge, however, from the qualitative analysis of the responses to this interview question (see Table 2). These themes increase our understanding of a state-level perspective on out-of-level testing.

 

Table 2. Themes of Results on the Advantages of Testing Students Out of Level
(Number of states responding per theme; total N = 12)

Theme 1 – Out-of-level testing provides a more accurate measure of ability, which is better for students, parents, teachers, and policymakers. (8 states)
Theme 2 – Out-of-level testing has no advantages. (2 states)

Note: Several states did not make any statements that reflected either of these themes.

 

Theme 1 – Out-of-level testing provides a more accurate measure of ability, which is better for students, parents, teachers, and policymakers

Out-of-level testing provides the flexibility necessary to measure performance at the point where students are accessing the general education curriculum. When students are “tested on things that they should know,” the test results contain “very specific information that teachers and parents can get from participation in the out-of-level test.” Teachers, parents, and students understand academic performance according to the standards that the student is striving to achieve. Policymakers receive information about statewide academic achievement for making within-state comparisons. One respondent summarized this claim by saying, “I think that you have two choices if you don’t test out of level. You either have to develop a different assessment for those students or you have to test them on grade level on inappropriate material.”

Theme 2 – Out-of-level testing has no advantages

Two of the 12 states indicated no advantages to testing students out of level. When asked to speak to the advantages of out-of-level testing, one participant responded, “I don’t really like out-of-level testing. It kind of muddies the water for us …”


Disadvantages of Out-of-Level Testing

Numerous disadvantages of testing students out of level emerged from this interview process regardless of whether the participants favored testing students out of level. Only one state saw no disadvantages to out-of-level testing. A second participant indicated that it was too early in the state’s experience of testing students out of level to identify any disadvantages. Three themes illustrate the suggested disadvantages of out-of-level testing according to the remaining 11 participants (see Table 3).

 

Table 3. Themes of Results on the Disadvantages of Testing Students Out of Level
(Number of states responding per theme; total N = 12)

Theme 1 – Out-of-level testing results do not necessarily add value to a large-scale assessment system. (3 states)
Theme 2 – Incorporating out-of-level test scores into system and student accountability systems is problematic. (9 states)
Theme 3 – In allowing out-of-level testing, large-scale assessment programs are vulnerable to assessment misuses. (3 states)

Note: Several states’ responses fell into more than one of the themes.

 

Theme 1 – Out-of-level testing results do not necessarily add value to a large-scale assessment system

One of the aims of out-of-level testing is to obtain more accurate and usable test results. However, these participants indicated that out-of-level testing programs do not always achieve this purpose. By adjusting “the testing program to meet the needs of the student … you don’t get a totally accurate picture of the student’s abilities.” Students who are tested out of level receive tests that contain material that is not age-appropriate. In addition, the test is labeled with a lower grade level than the student’s assigned grade. If students “become defensive and do not want to take that test,” the resulting test scores could be inaccurate.

After administering out-of-level tests in its statewide assessment program, one state summed up its experience by saying, “We’ve learned after the first year of implementation [that] the way our program is constructed, the results are not particularly enlightening.” Similarly, another participant responded, “The main disadvantage [to testing students out of level] is that it’s hard to interpret the meaning of the test score once you get more than one grade level out.”

According to these participants, interpreting the meaning of out-of-level test scores is problematic for two reasons. First, the curricular constructs measured by an out-of-level test differ from those constructs measured by an on-grade-level test. For instance, when a 4th grade student is tested on 1st grade reading material, the test results indicate that the 4th grade student is learning to read, not that he or she is able to read for meaning, as would be expected for a 4th grade student. The out-of-level test score doesn’t yield “any diagnostic information of what we can do to address the 4th grader’s reading issues.” In other words, testing students out of level may not provide usable test information. “The test isn’t long enough or specific enough in terms of reading competencies to help the teachers know what to do after they get that test score.”

Second, a student who is tested out of level is not compared to a grade level normative group. A norm-referenced test provides a “snapshot of where the student is on some continuum when compared with age-normal peer groups.” However, when a student is tested out of level, the reference group shifts to a younger normative population. Thinking again about the 4th grade student, an out-of-level test score would only indicate, for example, that he or she performed better than 50 percent of the 1st grade students who were tested. The out-of-level test score only indicates that a student is achieving at a level below his or her assigned grade, and above a segment of students who are enrolled in a lower age-grade level. Teachers, parents, and students alike are confused when they do not know how a student is achieving in comparison to his or her “grade normal peer group.”

Theme 2 – Incorporating out-of-level test scores into system and student accountability systems is problematic

One of the major dilemmas raised about out-of-level testing was the uncertainty about how to report out-of-level test results for system accountability purposes. “Questions still exist. How will all this [out-of-level test scores] be aggregated for state and federal reporting?” One participant stated that, “We don’t know what to do with the results here. ... But when you’re doing state accountability, out-of-level testing doesn’t make a lot of sense to me.” Other participants indicated that their “reporting mechanisms … are under development now, and one of the areas that we will be considering is how [to do the] reporting [for] all of the students who are taking out-of-level assessments.” Yet another participant, from a state that did have procedures in place for reporting out-of-level test scores, was concerned “that the child [who is tested out of level] is going to be counted in level one,” the lowest level of performance. “… [A]s far as accountability, the school will get a little bit of credit for including the child in testing regardless of what their performance was.”

Generally speaking, most participants indicated that “out-of-level testing is the kind of thing you would do at the local level because you need to make some curriculum decisions.” Even so, these participants were able to identify disadvantages when using out-of-level testing scores for student accountability. For instance, some participants noted that state assessment programs contain inequitable features. In one state, students must meet the standard in reading and mathematics for grade promotion. As in most states, students are selected for out-of-level testing by an IEP team. Team members can make a student’s grade promotion decision, thereby eliminating any negative consequences or high stakes impact for the student with disabilities. However, for those students who are tested out of level, but do not have an IEP, grade promotion is dependent on passing statewide assessments. Some students have “very specific consequences relating to retention” and other students do not. Yet another participant asked, “What message are you sending students if you allow out-of-level testing at grades 3, 6, and 8, and then don’t allow it at the high school level?” Testing practices appear to be inequitable when out-of-level testing is administered as part of a statewide assessment program.

A caveat to the above discussion must be highlighted to fairly present the patterns of results for this interview question. Not every state that allows out-of-level testing in statewide assessments raised concerns about using out-of-level test scores for accountability purposes. Four participants stated that out-of-level test scores could be reported appropriately in both student and system accountability programs. In these cases, states use equating procedures to convert out-of-level test scores to in-level test scores for public reporting so that all students who are tested receive an on-grade level test score. One participant summarized this point of view by saying, “I think that with caution and with structure and with a lot of monitoring to make sure that procedures are being implemented correctly, that it is in fact possible to make that balance between meeting kids’ needs and providing accurate scores for accountability.” It is interesting to note that the participants who indicated that out-of-level test scores could be reported appropriately in accountability programs did not always advocate for testing students out of level.

Reflecting a final disadvantage, participants also raised an instructional concern about allowing out-of-level testing in large-scale assessments. “We think that the main disadvantage is that the out-of-level testing does not address the curriculum material of the grade in which the student is enrolled.” When test items are not aligned with curricular content, “People may use out-of-level testing to not set challenging goals for students.” When “… you don’t know how every kid’s operating on their grade level with their same age peers,” instructional delivery may focus on a lower set of content standards than those standards expected for grade level performance.

Theme 3 – In allowing out-of-level testing, large-scale assessment programs are vulnerable to assessment misuses

Possibly the most serious concern about administering out-of-level tests within large-scale assessment programs is the temptation to exclude low scoring students from state level, aggregated performance reporting. “There’s always inappropriate uses … it’s a way to keep lower kids’ scores out of the mix for score accountability.” In this way, “Out-of-level would be used to inflate a school’s scores or to make it look like a school is doing better than it might be doing.” Excluding lower performance test results is particularly tempting for those states that have high stakes for local school districts. In some states, administrators and teachers suffer the consequences of declining performance on statewide assessments. However, when out-of-level test scores are not properly reported for low-achieving students, true group performance is masked. The purpose of system accountability is defeated when school systems cannot be held accountable for all students’ academic performance, which includes any student who is not mastering on-grade level state standards at the same pace as his or her same age peers. Educators’ jobs may be saved, but at the cost of inaccurate public reporting and ineffective system accountability.

To complicate the situation further, low achieving students oftentimes have disabilities. In many states, students must have a disability and an IEP to be selected for out-of-level testing. IEP teams are expected to make accurate decisions about testing a student out of level. All states disseminate out-of-level testing information to local school personnel through statewide trainings, mailings, or Internet postings. Even so, one participant reported, “The ability of local administrators or educators to make decisions that … actually improve the quality of the instruments being used is questionable in some circumstances.”

In addition, as is true for all educational policy, out-of-level testing policy that is written at the state level cannot ensure consistent or high-quality implementation at the local level. “We know it’s [policy implementation] uneven, but how to address the unevenness is a problem when you’ve got essentially a moving target.” One participant aptly summed up the challenge to out-of-level testing policy by suggesting, “Without policies and procedures and without an audit and without documentation of eligibility, there’s clearly the potential for misuse of out-of-level assessment.”


Political Context Surrounding Out-of-Level Testing

Over the past few years, anecdotal reports have surfaced from both policymakers and educators that refer to the political nature of testing students out of level. It is generally known that some out-of-level testing policies were developed and implemented in contentious environments. However, to date, there are no data to substantiate these contentions. Thus, it is important to describe both the political context of testing students out of level and the resulting effects on assessment systems. Four themes of results are a first step toward verifying the political climate surrounding out-of-level testing within those states that test students out of level (see Table 4).

 

Table 4. Themes of Results on the Political Context of Testing Students Out of Level
(Number of states responding per theme; total N = 12)

Theme 1 – The discussion about out-of-level testing occurred within multiple groups of stakeholders who held diverse opinions. (6 states)
Theme 2 – Stakeholders react emotionally to the issues related to out-of-level testing. (5 states)
Theme 3 – The political context surrounding out-of-level testing has systemic ramifications throughout all levels of an educational system. (8 states)
Theme 4 – States that have tested students out of level for numerous years experience negligible political effects. (3 states)

Note: Several states’ responses fell into more than one of the themes.

 

Theme 1 – The discussion about out-of-level testing occurred within multiple groups of stakeholders who held diverse opinions

The participants in this series of telephone interviews reported that a variety of special interest groups had a stake in testing students out of level. These special interest groups included parents, teachers, and legislators. One participant indicated that when developing an out-of-level testing policy, “The initial reaction and the most aggressive was from parents who were in favor of out-of-level testing.” Another participant reported that “there were a number of parents in the state who called me as we were preparing our participation guidelines.” These parents were generally concerned “that their children would be put through a test that meant nothing … that the children knew and the parents knew … ahead of time that if they were forced to take an on-level test they were not going to pass that test.” Practitioners were also identified as voicing an opinion about implementing an out-of-level testing policy. “A teacher … wrote a letter to the Commissioner [of Education] and said that she did what she thought she was supposed to do and she felt like she betrayed her students because kids even with accommodations … were completely frustrated by tests that were completely above their level.” Regarding legislative involvement, our telephone interviews did not identify the specific issues raised by state legislators about out-of-level testing. However, two participants indicated that “it was talked about a lot in the legislature.” In both of these instances, concerns centered on how “all these children were being included into the testing.”

Our interviews also suggested that these stakeholder groups represented an array of opinions about out-of-level testing. Definite opinions were articulated as out-of-level testing policies developed. However, there was no consistent pattern to these opinions, so that identifying a specific group of stakeholders with a particular opinion about out-of-level testing was not possible: “… the parents were not unified for or against out-of-level. The district personnel were not unified for or against out-of-level. It was more a mixed bag.”

Further, there was no consistent pattern across states in the settings for these conversations about out-of-level testing. Participants, however, did identify both specific and general locations. Specifically, out-of-level testing was discussed at town meetings, parent meetings, and state school board meetings. Generally speaking, some participants indicated that out-of-level testing was discussed at either the local or state level, depending on the urgency of the issue. For instance, there was a sense that out-of-level testing was discussed “on an ongoing” basis at either the local or state level where the testing policy had been implemented for a number of years. In other states, however, “the current understanding that we have [is] that [out-of-level testing] just came across our plates.” While these conversations may have begun in either state legislatures or state educational agencies, discussions expanded to local educational agencies and the general public as out-of-level testing policy developed. These participants also reported that the settings for these conversations have changed over time. “Last year it was at the school and district level but this year predominately it’s right now at the state level.”

Theme 2 – Stakeholders react emotionally to the issues related to out-of-level testing

These participants described out-of-level testing as evoking a variety of feelings within each of the stakeholder groups. For the most part, parents of students who could participate in out-of-level testing reacted with relief, as indicated in the following statement that described briefing a parent about out-of-level testing, “Frankly it [out-of-level testing] seems to give them some comfort.” One participant indicated that “Parents at this point seem to be fairly grateful. There was really some concern about whether their children would be put through a test that meant nothing.” In terms of the teachers who administer out-of-level tests, one respondent suggested that the out-of-level testing policy in his or her state was for the most part “to pacify the teachers who believe their students cannot work on grade level.” Teachers in some states were described as an “advocacy group” who engaged in “some advocating [for] the inclusiveness side because of their understanding of what they believe it [out-of-level testing] will do for kids in the long run.”

Stakeholder groups tended to respond to the out-of-level testing policy at varying levels of intensity. One participant, in referring to attendance at a workshop for special educators and test coordinators, stated that “… there’s some grumbling going on but they’re generally accepting it [out-of-level testing] as a set of requirements.” A more intense emotive reaction is also represented in the following response, “ … our solution was to have an out-of-level program and that met the needs of some of those people but it also made others very angry.”

Theme 3 – The political context surrounding out-of-level testing has systemic ramifications throughout all levels of an educational system

Eight of the participants in this study indicated that testing students out of level had ramifications beyond the initial testing situation. In one state where the consequences are high for those schools that do not comply with state regulations, “the pressure becomes greater and greater to get test scores up [and] to exclude kids. . . . ” Another participant reported that students who are tested out of level are “going to be taken out of a certain graduation track.” Yet another participant suggested that “ … where it becomes political is at the reporting.” While out-of-level testing may appear more fair for some students, the test results are not necessarily “giving credit where credit’s due to the general population of students and their achievement patterns....” In other words, when the effects of this testing approach are considered systemically, the true achievement for all students may be distorted by test scores that do not represent on-grade level progress toward achieving a set of state standards.

Theme 4 – States that have tested students out of level for numerous years experience negligible political effects

Of the SEA personnel interviewed for this study, three participants indicated that “… out-of-level testing per se is really not debated within the public.” Currently, “It [out-of-level testing] doesn’t seem to be one of the big special ed issues.” Each of these three states has tested students out of level for at least three years.

Overall, the results of this study identify the wide variability in state perspectives and policies on out-of-level testing across the states that test students out of level in large-scale assessments. These results also point to the unresolved issues that surround out-of-level testing policies. The following section of this report describes a series of key issues that are currently unresolved across states that test students out of level.


Key Issues Raised by the Interviews

In the following section, we discuss five key issues that emerged from the telephone interview process (see Table 5). These issues were raised by the participants as either general concerns about testing students out of level or specific problems that states have encountered when implementing an out-of-level testing program.

 

Table 5. Key Issues Raised by the Telephone Interviews

Issue 1 – The practice of out-of-level testing is vulnerable to misuse because:
(1)      many states do not have written procedures about documenting out-of-level testing in student records.
(2)      not all states monitor the number of students tested out of level.
(3)      IEP teams may decide to test a student out of level as an easy solution to a complicated assessment problem.
 
Issue 2 – The IEP team decision making process to test a student out of level is complicated by:
(1)      the lack of concrete criteria for selecting students for out-of-level tests.
(2)      an inadequate consideration of the future consequences of testing students out of level.
(3)      the concern that parents may not understand the effects of testing students out of level.
Issue 3 – Out-of-level test scores are difficult to report at the state level because:
(1)      it is difficult to count students tested out of level accurately at the local level.
(2)      aggregating out-of-level test scores with in-level test scores may result in reporting biased scores.
(3)      reporting disaggregated out-of-level test scores separately does not yield group performance data.
 
Issue 4 – Using out-of-level test scores for accountability purposes is difficult because:
(1)      establishing test validity may not account for the purpose of the assessment.
(2)      out-of-level tests designed to measure student progress may not be useful for system accountability purposes.
(3)      school improvement plans are constrained when grade level scores do not include grade level test results.
(4)      teachers do not always understand the need to test students at the local level for system accountability purposes.
 
Issue 5 – There is wide variability across states in the practices used to test students out of level. 


Issue 1 – The practice of out-of-level testing is vulnerable to misuse

Some participants expressed concern about the possible misuse of out-of-level testing. This issue is difficult to explain in general terms because of the discrepancies in out-of-level testing policies across states. For instance, all 12 states that test students out of level require documentation in a student’s IEP that a statewide assessment will not be administered on grade level. However, many states indicated that there were no specific procedures in place to do so. One state indicated that conversations are ongoing to develop documentation guidelines, but the challenge lies in translating these conversations into consistent assessment practices. Even when out-of-level tests are consistently documented, only half of the states have developed monitoring procedures to ensure that the number of students tested out of level is not excessive. A few states have set numeric limits on the number of students to test out of level, but this practice has not been adopted in most states that currently test students out of level. Misuse of out-of-level testing is likely if consistent documentation and monitoring procedures are not established to guard against inappropriate testing.

The manner in which decisions are made about testing students out of level in most states is open to misuse as well. In all states, the decision to test out of level is made by an IEP team. It is quite possible that well-meaning educators or family members may assume that some students with disabilities could not participate in statewide tests. In these cases, exclusion from testing may seem to be in the students’ best interest. However, recent research has demonstrated that students with disabilities, who were once thought to be unable to participate in large-scale assessments, can not only participate but also perform better than expected (Bielinski & Ysseldyke, 2000). Further, supporting the participation and performance of these students may seem difficult if the student needs an accommodated test. From this standpoint, out-of-level testing may appear to be an easier approach to including some students with disabilities in large-scale assessments.

Issue 2 – The IEP team decision making process to test a student out of level is complicated

It is assumed that IEP teams make good decisions about testing students out of level. However, there are no research studies that confirm or disconfirm this assumption. At best, the decision to test a student out of level is made subjectively. But again, no research study has described how and why these assessment decisions are made. Just as importantly, there are no investigations that describe whether students or parents participate in the decision-making process or, if they do participate, how well informed the decisions are to use out-of-level tests.

To complicate the decision-making process further, only a portion of the states that test students out of level have developed criteria to use in identifying students for these assessments. Most of these sets of criteria lack concrete determinants necessary for separating students for out-of-level testing from students who are more appropriately tested on level. Further, only a few states require documentation in student files separate from the IEP itself. There are some states that have developed out-of-level test forms that require IEP teams to follow specific steps in selecting students appropriately for out-of-level tests. But for those states that do not use out-of-level testing paperwork, there are no assurances about how teams select students for out-of-level tests.

Further, there are no guarantees that the team has considered the long-range implications of testing a student out of level. There are also no guarantees that the parents of students who are tested out of level understand the consequences of out-of-level testing. For instance, some states do not grant a regular high school diploma to those students who have taken a statewide assessment out of level. It is essential that IEP team members, including the student and the student’s parents, understand the ensuing ramifications of taking out-of-level tests. To avoid unintended consequences of testing students with disabilities out of level, it is imperative that all team members fully participate in the decision to use out-of-level tests, which includes selecting students appropriately for this testing approach.

Issue 3 – Out-of-level test scores are difficult to report at the state level

Out-of-level testing also introduces several problems for states in reporting assessment results. The procedures involved in accurately transferring the number of students tested out of level from the classroom level to the state level for reporting purposes are complicated. One state expressed concern about the difficulty of obtaining an exact number of students who participated in out-of-level testing. Problems arise in schools on the day of testing, such as student or teacher absenteeism, inaccurate counts of out-of-level tests, or improper submission of the number of tests to the SEA.

Once the SEA receives the out-of-level test scores, the procedures to report assessment results are also complicated. States have resolved this issue through a variety of means. One state has developed transformation rules for entering out-of-level test scores in an accountability index. Other states that administer norm-referenced tests as a statewide assessment use the test company’s recommended procedures for converting out-of-level test scores to in-level test scores. However, the majority of states expressed concern about how to report out-of-level test scores. For instance, aggregating out-of-level test scores with in-level test scores may result in reporting biased test scores (Bielinski, Thurlow, Minnema, & Scott, 2000). Alternatively, disaggregating out-of-level test scores and reporting these scores separately does not provide group performance data that are representative of all students in a given grade.
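As a purely illustrative sketch of the kind of conversion step described above, the fragment below converts a hypothetical out-of-level raw score to an in-level scale score through a lookup table with linear interpolation. The table name, function, and all numeric values are invented for this example; actual conversions would come from a test publisher’s equating tables, not from code like this.

```python
# Hypothetical illustration of converting an out-of-level raw score to an
# in-level scale score via a publisher-style lookup table. All values are
# invented; real conversions come from the test vendor's equating study.

# Maps (test level, raw score) -> equivalent scale score on the enrolled-grade scale.
CONVERSION_TABLE = {
    ("grade_3_form", 10): 412,
    ("grade_3_form", 20): 455,
    ("grade_3_form", 30): 498,
}

def to_in_level_scale(level: str, raw_score: int) -> int:
    """Return the in-level scale-score equivalent for an out-of-level raw score.

    Uses the exact tabled value when available, clamps to the ends of the
    table otherwise, and linearly interpolates between bracketing entries.
    """
    tabled = sorted((r, s) for (lvl, r), s in CONVERSION_TABLE.items() if lvl == level)
    if not tabled:
        raise KeyError(f"No conversion table for level {level!r}")
    for r, s in tabled:
        if r == raw_score:
            return s
    if raw_score <= tabled[0][0]:
        return tabled[0][1]
    if raw_score >= tabled[-1][0]:
        return tabled[-1][1]
    for (r_lo, s_lo), (r_hi, s_hi) in zip(tabled, tabled[1:]):
        if r_lo < raw_score < r_hi:
            frac = (raw_score - r_lo) / (r_hi - r_lo)
            return round(s_lo + frac * (s_hi - s_lo))

if __name__ == "__main__":
    # A 4th grader tested on the grade 3 form with a raw score of 25.
    print(to_in_level_scale("grade_3_form", 25))  # -> 476 with these invented values
```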

Issue 4 – Using out-of-level test scores for accountability purposes is difficult

In thinking a step beyond public reporting of assessment results to using test scores for accountability purposes, out-of-level testing again creates problems for states. Some participants indicated a need to better consider the purpose of large-scale assessments when making decisions about an out-of-level testing program. One participant suggested that decisions made to improve the validity of an instrument rarely take into account the purpose of the test. It is possible to develop an assessment instrument that is valid and reliable within one context but not another. For instance, a criterion-referenced test taken out of level and used for student accountability purposes is presumed to measure a student’s ability with greater validity. If the test items are linked to the student’s curriculum, test results are thought to be more useful for making instructional decisions than the results from an in-level test. However, when the same test results are applied to a system accountability program, the test results are thought to be less useful. It is difficult to make school improvement decisions for certain grades when all of the scores reported are not on level. In fact, some participants indicated that their state did not know how to use out-of-level test scores for system accountability purposes. This concern seemed to be more problematic for those states that do not convert out-of-level test scores to in-level scores. For the states that equate out-of-level test scores, using the test scores for both student and system accountability purposes seemed to be less problematic.

Also tied to these accountability issues is the concern that teachers lack system accountability literacy. Some of the participants identified the need for additional resources to assist teachers in understanding the connection between testing an individual student at the classroom level and initiating school improvement through system accountability. These participants further asserted that with system accountability literacy, teachers would make better use of in-level testing. This belief may be particularly true for students with disabilities who, with the exception of a small percentage of students within each school district, are capable of participating in the regular assessment (Bielinski & Ysseldyke, 2000).

Issue 5 – There is wide variability across states in the practices used to test students out of level

A final issue that is important in understanding the status of out-of-level testing is the wide variability in the practice of testing students out of level. For example, procedures used to administer out-of-level tests differ from state to state as do the procedures used to report out-of-level test results. The type of test used for out-of-level testing varies also. States seem even to disagree as to whether out-of-level testing is an accommodation that results in a standard test presentation or a modification that yields a nonstandard test presentation. The ramifications of this variability in out-of-level testing practices are extensive for educators, policymakers, and researchers. One such effect concerns investigating specific aspects of testing students out of level. There is a critical need for empirical research to sort out the difficult issues that surround out-of-level testing. However, researching the construct is hampered by the lack of consistency in the practice of out-of-level testing nationwide.

 

Study Constraints

There are several study constraints that are important to highlight when considering the implications of these results. First, this research design is a cross-sectional view of the current status of testing students out of level nationally. In other words, these data only describe state practices and perspectives on testing students out of level at one point in time. Cross-sectional research designs are particularly problematic for policy research, since educational policy is implemented within a context of rapidly changing public opinions, attitudes, and values. The interview data collected for this study only describe the context of conducting out-of-level testing for a limited period of time. A series of interviews with the same SEA would more aptly capture the rapidly changing context that surrounds testing students out of level.

A cross-sectional design for policy research is problematic for a second reason. The development and implementation of educational policy is a process that evolves over time. In other words, state perspectives on out-of-level testing and the practices used to implement the policy are not static, but change with time. From this perspective, a cross-sectional look at out-of-level testing does not capture the element of change that is central for understanding the policy in its entirety. Since data were collected at only one point in time, the developmental aspect of policy is not fully explained. Again, a series of interviews would better explicate the aspect of change in out-of-level testing policies nationwide.

A final constraint on this study concerns the purposive sample of participants. These participants were either state assessment directors or state personnel nominated by a state assessment director. Since only one participant from each SEA was interviewed, the final data set represented the perspectives or practices of only one person from each state. Moreover, participants were recruited from either assessment or special education divisions of SEAs. Within a single SEA, it is likely that one division’s opinions, perspectives, and knowledge do not necessarily match those of another division. Possibly, if additional participants were interviewed from each SEA who represented different divisions within the SEA, this study would have garnered a more complete description of state perspectives and practices regarding out-of-level testing.

 

Recommended Next Steps in Research

Additional research on out-of-level testing is crucial if appropriate decisions are to be made about including all students in large-scale assessment programs. To better understand the status of out-of-level testing nationwide, it is important to describe how many students are tested out of level and who they are. In addition to documenting the prevalence of out-of-level testing, there is a critical need to determine whether intended or unintended consequences are occurring when students are tested out of level. Finally, it is essential for the field to develop well-researched guidelines and parameters to guide states in making decisions about out-of-level testing.


Conclusions

Of the states that test out of level, some do so because a governing body has mandated the testing policy. Other states have a history of testing out of level so that the testing policy is an integral component of a statewide assessment program. The remaining states have elected to develop an out-of-level testing program based on either stakeholder input or action research. If states continue to implement out-of-level testing in large-scale assessments, it is essential to consider the four concerns that follow.

First, there are no research studies to date that demonstrate the value, or lack thereof, of testing students out of level (Minnema, Thurlow, Bielinski, & Scott, 2000). Minnema et al. (2000) further state that the field has yet to determine a set of guidelines that support the appropriate use of out-of-level testing. Without a research base to guide decisions about how best to test students out of level in large-scale assessments, it behooves policymakers and practitioners to implement an out-of-level testing program cautiously. At a minimum, out-of-level testing policy should be written to discourage testing students out of level. Again, while not yet documented in the research base, a fundamental concern about out-of-level testing is that teachers will reduce their instructional expectations for a student who takes a test at a grade level lower than his or her assigned grade. Testing a negligible number of students out of level will ensure that more students are challenged to strive toward grade level standards.

Second, one rationale for testing students out of level is the desire to include all students in statewide assessment programs. Out-of-level testing appears to be a logical solution for assessing those students who are striving to meet grade level standards, but at a slower learning pace than their same age peers. The decision to test students out of level is further reinforced by those test companies that market out-of-level tests. However, it has been shown that states may not receive all of the psychometric information necessary to make informed decisions about an out-of-level testing program (Bielinski, Thurlow, Minnema, & Scott, 2000). Compounding the situation further is the lack of focused research studies that explicate the psychometric properties of out-of-level tests. It is therefore essential that states proceed cautiously in testing students out of level until the inherent validity and reliability issues are sorted out in the research base.

Another consideration for states that test students out of level is the unresolved dilemma about how to use out-of-level test scores for system accountability purposes. To further complicate the problem, there are no agreed upon reporting procedures that can be used as a recommended format for either student or system accountability purposes. Linked to these accountability problems is the concern that out-of-level testing will become a means to exclude lower performing students from accountability systems. This concern is particularly relevant for those states that have high stakes system accountability procedures in place. It is therefore suggested that if states do test students out of level, they publicly report all student test results, including out-of-level test scores. It is further recommended that systematic monitoring procedures be put in place to ensure that the selection of students for out-of-level testing is as appropriate as possible. Structured monitoring practices can guard against excluding large numbers of students who are tested out of level from district and state reporting systems.

A final concluding consideration is the current status of the research base that supports the practice of out-of-level testing. While out-of-level testing is a contentious issue in policymaking and practitioner circles, the merits of out-of-level testing could be debated continuously. Without data to substantiate either position, the issues cannot be definitively resolved. It is important for all state level assessment and special education personnel to access current research results that inform their understanding of the issues that surround out-of-level testing, and support them in making appropriate policy decisions about testing students out of level.


References

Bielinski, J., & Ysseldyke, J. (2000). Interpreting trends in the performance of special education students (Technical Report 27). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Bielinski, J., Thurlow, M., Minnema, J., & Scott, J. (2000). How out-of-level testing affects the psychometric quality of test scores (Out-of-Level Testing Report 2). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Minnema, J., Thurlow, M., Bielinski, J., & Scott, J. (2000). Past and present understandings of out-of-level testing: A research synthesis (Out-of-Level Testing Report 1). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Thurlow, M., Elliott, J., & Ysseldyke, J. (1999). Out-of-level testing: Pros and cons (Policy Directions 9). Minneapolis: University of Minnesota, National Center on Educational Outcomes.


Appendix A

Telephone Interview Protocol

(1)     Why does your state use out-of-level testing?

(2)     Probe: Are these reasons the complete rationale for testing out of level in your state?

(3)     What guidelines and policies are written about out-of-level testing in your state? How are school districts informed about these guidelines and policies?

(4)     Describe the procedures used when testing out of level in your state.

   Probe: How are students selected for testing out of level?

   Probe: How many levels below or above grade level does your state test?

   Probe: What is done with the out-of-level score after testing?

   Probe: If norm referenced testing is not done in your state, what makes up an out-of-level test in your state?

(5)     In your opinion, what are the advantages of using out-of-level testing? What are the disadvantages of using out-of-level testing?

(6)     Is out-of-level testing used for system accountability in your state? If so, how? Or for student accountability? And again, if so, how?

(7)     Does out-of-level testing impact grade promotion in your state? If so, how? Does out-of-level testing impact meeting graduation requirements? Again, if so, how?

(8)     Are there auditing or quality control procedures in place in your state to make sure that out-of-level testing is used appropriately? If so, what are they?

(9)     What are the requirements for documenting out-of-level testing in students’ Individual Education Plans or other student records?

(10)  Are the results of out-of-level testing reported to parents? If so, how? And, when? Also, are the results of out-of-level testing reported to the public? If so, how? And, when?

(11)  As the final question, please describe the settings in which out-of-level testing is discussed in your state.

   Probe: For instance, are policymakers discussing out-of-level testing informally?

   Probe: Is there a public reaction to testing out of level in your state?

   Probe: Would anyone else be aware of other settings in which out-of-level testing is discussed?