Testing Students Out of Level in Large-Scale Assessments: What States Perceive and Believe

Out-of-Level Testing Project Report 5

Published by the National Center on Educational Outcomes

Prepared by Jane Minnema, Martha Thurlow, and Jim Scott

March 2001

This document has been archived by NCEO because some of the information it contains may be out of date.

Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Minnema, J., Thurlow, M., & Scott, J. (2001). Testing students out of level in large-scale assessments: What states perceive and believe (Out-of-Level Testing Project Report 5). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/OOLT5.html

Executive Summary

Out-of-level testing
is the practice of administering a test at a level above or below the level that is
generally recommended for a student based on his or her age or grade. Out-of-level testing was
originally introduced in the 1970s as a way to measure student achievement growth more precisely, as
an index of program efficacy. Today it has become an option for
testing students with disabilities in large-scale assessment programs. Currently, 12 states
(Alaska, Arizona, California, Connecticut, Delaware, Iowa, Louisiana, North
Dakota, South Carolina, Utah, Vermont, and West Virginia) test students out of level
in their state tests. However, the recent increase in the use of out-of-level testing has not
occurred without controversy.

Complicating the
controversy over using out-of-level testing in large-scale assessment programs is the lack
of definitive research results that point to practical solutions. In fact, the existing
literature on out-of-level testing seems to raise more questions about testing students
out of level than it answers. Without the benefit of research-based guidelines or
recommendations, differing opinions about out-of-level testing have arisen among
policymakers, educators, and parents nationwide. To date, no study has described the
differing opinions, perceptions, and practices involved in testing students out of level
in large-scale assessment programs. Therefore, the purpose of this study is to describe
state-level perspectives from the states that currently test students with disabilities
out of level in large-scale assessment programs.

This report
describes the results of a series of telephone interviews with state-level personnel from
state educational agencies in those states that use out-of-level testing. An analysis of
the narrative data yielded overarching themes of results that are organized according to
the rationale for testing students out of level, the advantages and disadvantages to
out-of-level testing, and the political context that surrounds the implementation of
out-of-level testing programs. These results point to four concerns that are important
considerations for policymakers and practitioners who strive to include all students with
disabilities in large-scale assessments. First, fundamental issues about the effects of
out-of-level testing on students with disabilities are not known. Second, the test
validity and reliability issues inherent to testing students out of level are also not
resolved in the literature. Third, sound procedures for reporting out-of-level test scores
and using the test results for accountability purposes have not been determined. Finally,
the overall merit of out-of-level testing has not been fully described or empirically
tested. At best, these results suggest that testing students with disabilities out of
level in large-scale assessment programs should be done cautiously.

Overview

Out-of-level
testing, or the administration of a test at a level above or below the level generally
recommended for students based on their age-grade level, is an assessment practice that
dates back to the 1970s. Originally, out-of-level testing was used to measure individual
student progress or program efficacy. After receiving little attention in the literature
throughout the 1980s, out-of-level testing re-emerged as an assessment practice in the
late 1990s. Today, as educational reform efforts increasingly emphasize student and system
accountability, the use of out-of-level testing has expanded as states strive to include
all students in their statewide assessment programs. However, the implementation of
out-of-level testing has not grown without controversy (Thurlow, Elliott, & Ysseldyke,
1999). In fact, this approach to testing remains a contentious issue among policymakers,
educators, school administrators, parents, and other community members. Proponents of
out-of-level testing identify multiple benefits for this testing approach while opponents
claim various disadvantages (Minnema, Thurlow, Bielinski, & Scott, 2000). The
situation is further complicated by the multiple questions that persist in practice
without research evidence to point to conclusive answers. For instance, when students are
tested on-level, how useful are test results if the items do not measure what a student
knows? Or, is it in the student's best interest to experience frustration or anxiety
when tested on material that is too difficult? Moreover, is it helpful to have test
results that do not guide appropriate instructional decisions? On the other hand, how
valid and reliable are the out-of-level test scores when converted back to in-level
scores? If students are not tested on-level, will their classroom instruction challenge
them to meet grade-level standards? How can out-of-level test scores be used appropriately
within a high-stakes accountability system? Further, how can out-of-level test scores be
publicly reported in a meaningful way for accountability purposes?

These are some of
the issues that surround the practice of testing students out of level. The need for
empirical research that answers these important questions is critical. An important first
step toward resolving some of these dilemmas is to better understand the current status of
testing students out of level in statewide, large-scale assessment programs. The purpose
of this report is to describe state-level perspectives that highlight the differing
opinions and perceptions about testing students out of level. We also use these
perspectives to identify key issues and make general recommendations that are important
considerations when testing students out of level.

Method

In this qualitative
study we used three data collection strategies to collect narrative data from state
educational agencies within the 12 states that use out-of-level testing in large-scale
statewide assessments (Alaska, Arizona, California, Connecticut, Delaware, Iowa,
Louisiana, North Dakota, South Carolina, Utah, Vermont, and West Virginia). The primary
source for these data was the state assessment directors. While we conducted the three
data collection strategies simultaneously, we describe each strategy separately here.

First, we collected
Internet data from the individual state Web sites that described the large-scale
assessment programs for each state where out-of-level testing is currently implemented.
The purpose of this data collection strategy was to provide a context for understanding
the implementation of out-of-level testing within each state individually. These data
included a description of the assessment(s) used (test name, type of test, subject areas
tested, grades tested, dates of administration, and high stakes impact). We also collected
policy information about test accommodations and out-of-level testing. In addition, we
collected data that described the public reporting practices of assessment results at both
the state and local level. It should be noted, however, that the Internet data collected
for Iowa differed from those for the other 11 states because Iowa is one of two states nationwide that
do not have a mandated standards-based, large-scale assessment program used statewide.

Second, we conducted
a series of telephone interviews with one person in each state educational agency who was
knowledgeable about the state's out-of-level testing policy and practice. The same
recruitment e-mail was sent to each state assessment director with a follow-up telephone
call to schedule an interview appointment. Interview questions were emailed to each
participant prior to the telephone interview. (See Appendix A for the telephone interview
protocol.) It typically took 20 to 30 minutes to conduct an interview. All interviews were
tape-recorded and then transcribed for qualitative data analysis.

In five of the
states, we interviewed either the state assessment director or the assistant director. The
state assessment directors in the remaining seven states referred us to other staff who
were more familiar with testing students out of level in their states. These participants
included a program administrator, a program director, a consultant, an educational
associate, an educational specialist, a coordinator, and a university professor. In one
state, we used input from two people. All participants received a copy of this report in return for
participating in this study.

As a final data collection strategy, we emailed each
participant follow-up questions to the telephone interview. These questions either addressed
missing interview information or clarified interview content to ensure accurate
results.

In producing the
final results of this study, we considered two levels of information. The first level of
information is a more detailed look at the states that test students out of level to
compare their policies and practices (see Out-of-Level Testing Project Report 4). A second level
of information was more global in nature, which required analysis and interpretation of
the telephone interviews as a composite set of data. Themes of results were generated
through qualitative data analysis to begin describing the practice of testing students out
of level nationwide. These themes were organized according to the five primary telephone
interview questions. To complete this analysis, we read and re-read the interview data,
question by question, to code the categories of information for all of the interviews as
an entire group. We then merged these categories of information into meaningful themes of
results. The final stage of this qualitative analysis involved verifying these results with
a second reviewer, who conducted an independent analysis of one-fifth of the data set. The final
analysis was adjusted to resolve any discrepancies between these two reviews.

State Perspectives on Out-of-Level Testing

These themes of
narrative results are organized according to the rationale for testing students out of
level, the advantages and disadvantages to out-of-level testing, and the political context
surrounding out-of-level testing policy. Each of these topics was the focus of one
telephone interview question.

Rationale for Testing Students Out of Level

The qualitative
analysis of the data set for this interview question yielded four themes of results (see
Table 1). One state's responses were not included in the analysis of the responses to
this interview question. While this state has a long history of testing students out of
level, the participant from that SEA was hesitant to fully answer all of the interview
questions because of the Title 1 peer review process, which seems to discourage out-of-level
testing:

"Two weeks ago I would have had a great answer, but since the end of June, we've had some conversations with Title 1 at the federal level. I'm not convinced that our position on out-of-level [testing] will remain the same."

The following themes of results are based on the responses from the remaining
11 states.

Table 1. Themes of Results on the Rationale for Testing Students Out of Level

Note: Several states' responses fell into more than one of the themes.

Theme 1: Some students' assessment needs are met by testing at a lower level than the assigned grade level

"We test out of level for one primary reason, and that's that we believe students need and should be tested at their instructional level. We don't think it's right to give students algebra problems when they're working on third grade math."

Those states that
test out of level because they believe that out-of-level tests are fair and
appropriate expressed concern that the regular statewide assessments were not
appropriate for all students. Testing below grade level was described as an
appropriate and fair approach for testing those students who are striving to meet grade-level
content standards, but at a slower learning pace than their same-grade peers.

One aim of
out-of-level testing is to match a student's instructional level to the test item
content. Test results are then more usable for guiding classroom teachers in making good
instructional decisions. Interviewees also reported that when test items measure the
curricular content of a student's instructional level, the testing experience for the
student is less frustrating and causes less emotional trauma:

"For our state testing program ... there was concern among constituents, parents, students, and teachers that there were some students who were going to be unable to perform at a [certain] grade level and [on-level testing] would be inappropriate and frustrating for some students who were not operating at that level."

Theme 2: Out-of-level testing is a means of including all students in an accountability system

Several states
viewed out-of-level testing as a unique accommodation for students with
disabilities:

"We are attempting to be as inclusive as possible to students who are unable to participate in the on-level testing system."

Out-of-level testing provides a test score for those students who might otherwise not perform well enough to obtain a score on a grade-level test.

"With testing options in our accountability assessments, such as out-of-level testing, every student is entered into a local and state level database."

When all students are considered for accountability purposes, states are able to ensure the integrity of the assessment system.

"Out-of-level testing ensures that we are not putting [students] into an alternate assessment who really don't need that type of assessment. But at the same time to ensure that they're not sitting down in front of a test where they can't answer any of the questions."

States viewed out-of-level testing as an assessment option for those
students whose academic skills fall in between a grade level assessment and an alternate
assessment. These students are not striving to meet a different set of content standards
as are those students for whom an alternate assessment is intended. Students for whom an
out-of-level test is intended are striving to meet grade level standards, but at a lower
performance level than the level tested by a grade-level test.

Out-of-level testing was also viewed as maintaining the integrity of the assessment system when the psychometric properties of an assessment system are considered. Few states test student performance at every grade level. Because adjacent grades are rarely tested in large-scale assessment programs, the tests are unlikely to have overlapping items, which makes vertical equating problematic. A common scale can be calculated only if the highest-level questions on one test level overlap with the lowest-level questions on the next grade tested. For instance, the highest-level test items that measure 4th grade performance would not overlap with the lowest-level test items on an 8th grade assessment; the gap between 4th grade and 8th grade is too large to allow for common performance levels. Out-of-level tests that measure performance within the gap between tested grades were thought to improve the accuracy of measurement and to yield usable data.

Theme 3: Out-of-level testing is a practical solution to a costly assessment problem

Another rationale
for administering out-of-level tests is that they compensate for the
inadequacy of the regular assessment system. States have invested extensive time and
resources in developing large-scale assessment systems, but find that the existing systems
do not adequately meet all students' assessment needs:

"However, these assessments have so many positive points that it would not be feasible or practical to build one [assessment system] where out-of-level tests weren't necessary."

Some of these respondents viewed out-of-level tests as an affordable solution that allows states to "still ... have valid, accurate accountability data by giving school districts options so that they can appropriately assess kids."

Theme 4: The policy to test students out of level is a mandated policy

Some states
indicated that out-of-level tests are administered only because the state board of
education had mandated the policy or "a command decision was made here at the [state] department to test out of level." It is interesting to note that both
respondents who spoke to a mandated out-of-level testing policy also registered
dissatisfaction with the testing policy. A third respondent indicated that there was no
state policy that defines a rationale for testing students out of grade level.
This respondent stated further that testing out of level is a local decision made by
special education teachers, and that the state policy "has encouraged districts or schools to not test out of grade level by treating the out-of-level test as a nonstandard test administration."

Advantages of Out-of-Level Testing

The distinction
between a rationale for out-of-level testing and the advantages of testing students out of
level is an artificial one. Some of the statements that explain the rationale for
out-of-level testing overlap with the advantages to out-of-level testing. Two themes did
emerge, however, from the qualitative analysis of the responses to this interview question
(see Table 2). These themes increase our understanding of a state-level perspective on
out-of-level testing.

Table 2. Themes of Results on the Advantages of Testing Students Out of Level

Note: Several states did not make any statements that reflected either of these themes.

Theme 1: Out-of-level testing provides a more accurate measure of ability, which is better for students, parents, teachers, and policymakers

Out-of-level testing
provides the flexibility necessary to measure performance at the point where students are
accessing the general education curriculum. When students are tested on things that
they should know, the test results contain very specific information that
teachers and parents can get from participation in the out-of-level test. Teachers,
parents, and students understand academic performance according to the standards that the
student is striving to achieve. Policymakers receive information about statewide academic
achievement for making within-state comparisons. One respondent summarized this claim by
saying, "I think that you have two choices if you don't test out of level. You
either have to develop a different assessment for those students or you have to test them
on grade level on inappropriate material."

Theme 2: Out-of-level testing has no advantages

Two of the 12 states
indicated no advantages to testing students out of level. When asked to speak to the
advantages of out-of-level testing, one participant responded, "I don't really
like out-of-level testing. It kind of muddies the water for us ..."

Disadvantages of Out-of-Level Testing

Numerous
disadvantages of testing students out of level emerged from this interview process
regardless of whether the participants favored testing students out of level. Only one
state saw no disadvantages to out-of-level testing. A second participant indicated that it
was too early in the state's experience of testing students out of level to identify
any disadvantages. Three themes illustrate the suggested disadvantages of out-of-level
testing according to the remaining 11 participants (see Table 3).

Table 3. Themes of Results on the Disadvantages of Testing Students Out of Level

Note: Several states' responses fell into more than one of the themes.

Theme 1: Out-of-level testing results do not necessarily add value to a large-scale assessment system

One of the aims of
out-of-level testing is to obtain more accurate and usable test results. However, these
participants indicated that out-of-level testing programs do not always achieve this
purpose:

"By adjusting the testing program to meet the needs of the student, you don't get a totally accurate picture of the student's abilities."

Students who are tested out of level receive tests that contain material that is not
age-appropriate. In addition, the test is labeled with a lower grade level than the
student's assigned grade. If students become defensive and do not want to take
that test, the resulting test scores could be inaccurate. After administering
out-of-level tests in its statewide assessment program, one state summed up its experience
by saying, "We've learned after the first year of implementation [that] the way
our program is constructed, the results are not particularly enlightening."
Similarly, another participant responded, "The main disadvantage [to testing students
out of level] is that it's hard to interpret the meaning of the test score once you
get more than one grade level out."

According to these
participants, interpreting the meaning of out-of-level test scores is problematic for two
reasons. First, the curricular constructs measured by an out-of-level test differ from
those constructs measured by an on-grade-level test. For instance, when a 4th grade student is tested on 1st grade reading material, the test results indicate that the 4th grade student is learning to read, not that he or she is able to
read for meaning, as would be expected of a 4th grade student. The out-of-level test score doesn't yield
any diagnostic information about what we can do to address the 4th grader's reading issues. In other words, testing
students out of level may not provide usable test information. The test isn't
long enough or specific enough in terms of reading competencies to help the teachers know
what to do after they get that test score.

Second, a student
who is tested out of level is not compared to a grade-level normative group. A
norm-referenced test provides a snapshot of where the student is on some continuum
when compared with age-normal peer groups. However, when a student is tested out of
level, the reference group shifts to a younger normative population. Returning to
the 4th grade student, an out-of-level test score might indicate only
that he or she scored higher than 50% of the 1st grade students who were tested. In general, an out-of-level test score
indicates only that a student is achieving at a level below his or her assigned grade, and
above some portion of the students who are enrolled in a lower age-grade level. Teachers,
parents, and students alike are confused when they do not know how a student is achieving
in comparison to his or her same-grade peer group.

Theme 2: Incorporating out-of-level test scores into system and student accountability systems is problematic

One of the major
dilemmas raised about out-of-level testing was the uncertainty about how to report
out-of-level test results for system accountability purposes: "Questions still exist.
How will all this [out-of-level test scores] be aggregated for state and federal
reporting?" One participant stated, "We don't know what to do with the
results here. ... But when you're doing state accountability, out-of-level testing
doesn't make a lot of sense to me." Other participants indicated that their
"reporting mechanisms are under development now, and one of the areas that we
will be considering is how [to do the] reporting [for] all of the students who are taking
out-of-level assessments." Yet another participant, from a state that did have
procedures in place for reporting out-of-level test scores, was concerned that the
"child [who is tested out of level] is going to be counted in level one, the lowest
level of performance. ... [A]s far as accountability, the school will get a
little bit of credit for including the child in testing regardless of what their
performance was."

Generally speaking,
most participants indicated that out-of-level testing is "the kind of thing you would
do at the local level because you need to make some curriculum decisions." Even so,
these participants were able to identify disadvantages when using out-of-level testing
scores for student accountability. For instance, some participants noted that state
assessment programs contain inequitable features. In one state, students must meet the
standard in reading and mathematics for grade promotion. As in most states, students are
selected for out-of-level testing by an IEP team. Team members can make a student's
grade promotion decision, thereby eliminating any negative consequences or high stakes
impact for the student with disabilities. However, for those students who are tested out
of level, but do not have an IEP, grade promotion is dependent on passing statewide
assessments. "Some students have very specific consequences relating to
retention and other students do not." Yet another participant asked, "What
message are you sending students if you allow out-of-level testing at grades 3, 6, and 8,
and then don't allow it at the high school level?" Testing practices appear to
be inequitable when out-of-level testing is administered as part of a statewide assessment
program.

A caveat to the
above discussion must be highlighted to fairly present the patterns of results for this
interview question. Not every state that allows out-of-level testing in statewide
assessments raised concerns about using out-of-level test scores for accountability
purposes. Four participants stated that out-of-level test scores could be reported
appropriately in both student and system accountability programs. In these cases, states
use equating procedures to convert out-of-level test scores to in-level test scores for
public reporting so that all students who are tested receive an on-grade level test score.
One participant summarized this point of view by saying, "I think that with caution
and with structure and with a lot of monitoring to make sure that procedures are being
implemented correctly, that it is in fact possible to make that balance between meeting
kids' needs and providing accurate scores for accountability." It is interesting
to note that the participants who indicated that out-of-level test scores could be
reported appropriately in accountability programs did not always advocate for testing
students out of level.

Reflecting a final
disadvantage, participants also raised an instructional concern about allowing
out-of-level testing in large-scale assessments: "We think that the main disadvantage
is that the out-of-level testing does not address the curriculum material of the grade in
which the student is enrolled." When test items are not aligned with curricular
content, "people may use out-of-level testing to not set challenging goals for
students." When "you don't know how every kid's operating on
their grade level with their same age peers," instructional delivery may focus on a
lower set of content standards than those expected for grade-level performance.

Theme 3: In allowing out-of-level testing, large-scale assessment programs are vulnerable to assessment misuses

Possibly the most
serious concern about administering out-of-level tests within large-scale assessment
programs is the temptation to exclude low-scoring students from state-level, aggregated
performance reporting: "There's always inappropriate uses ... it's a way
to keep lower kids' scores out of the mix for score accountability." In this
way, "out-of-level would be used to inflate a school's scores or to make it look
like a school is doing better than it might be doing." Excluding lower performance
test results is particularly tempting for those states that have high stakes for local
school districts. In some states, administrators and teachers suffer the consequences of
declining performance on statewide assessments. However, when out-of-level test scores are
not properly reported for low-achieving students, true group performance is masked. The
purpose of system accountability is defeated when school systems cannot be held
accountable for all students' academic performance, which includes any student who is
not mastering on-grade-level state standards at the same pace as his or her same-age
peers. Educators' jobs may be saved, but at the cost of inaccurate public reporting
and ineffective system accountability.

To complicate the
situation further, low-achieving students often have disabilities. In many states,
students must have a disability and an IEP to be selected for out-of-level testing. IEP
teams are expected to make accurate decisions about testing a student out of level. All
states disseminate out-of-level testing information to local school personnel through
statewide trainings, mailings, or Internet postings. Even so, one participant reported,
"The ability of local administrators or educators to make decisions that
actually improve the quality of the instruments being used is questionable in some
circumstances."

In addition, as is
true for all educational policy, out-of-level testing policy that is written at the state
level cannot ensure consistent or high-quality implementation at the local level: "We
know it's [policy implementation] uneven, but how to address the unevenness is a
problem when you've got essentially a moving target." One participant aptly
summed up the challenge to out-of-level testing policy by suggesting, "Without
policies and procedures and without an audit and without documentation of eligibility,
there's clearly the potential for misuse of out-of-level assessment."

Political Context Surrounding Out-of-Level Testing

Over the past few
years, anecdotal reports have surfaced from both policymakers and educators that refer to
the political nature of testing students out of level. It is widely believed that some
out-of-level testing policies were developed and implemented in contentious environments.
However, to date, there are no data to substantiate these claims. Thus, it is
important to describe both the political context of testing students out of level and the
resulting effects on assessment systems. Four themes of results are a first step toward
verifying the political climate surrounding out-of-level testing within those states that
test students out of level (see Table 4).

Table 4. Themes of Results on the Political Context of Testing Students Out of Level

Note: Several states' responses fell into more than one of the themes.

Theme 1: The discussion about out-of-level testing occurred within multiple groups of stakeholders who held diverse opinions

The participants in
this series of telephone interviews reported that a variety of special interest groups had
a stake in testing students out of level. These special interest groups included parents,
teachers, and legislators. One participant indicated that when developing an out-of-level
testing policy, "The initial reaction and the most aggressive was from parents who
were in favor of out-of-level testing." Another participant reported that "there
were a number of parents in the state who called me as we were preparing our participation
guidelines." These parents were generally concerned that their children would
"be put through a test that meant nothing ... that the children knew and the parents knew ahead of time that if they were forced to take an on-level test they were not
going to pass that test." Practitioners were also identified as voicing an opinion
about implementing an out-of-level testing policy:

"A teacher wrote a letter to the Commissioner [of Education] and said that she did what she thought she was supposed to do and she felt like she betrayed her students because kids even with accommodations were completely frustrated by tests that were completely above their level."

Regarding legislative involvement, our telephone interviews did not identify the specific
issues raised by state legislators about out-of-level testing. However, two participants
indicated that "it was talked about a lot in the legislature." In both of these
instances, concerns centered on "how all these children were being included into the
testing."

Our interviews also
suggested that these stakeholder groups represented an array of opinions about
out-of-level testing. Definite opinions were articulated as out-of-level testing policies
developed. However, there was no consistent pattern to these opinions, so that identifying
a specific group of stakeholders with a particular opinion about out-of-level testing was
not possible:
the parents were not unified for or against out-of-level. The
district personnel were not unified for or against out-of-level. It was more a mixed
bag. Further, there was
no consistent pattern across states in the settings for these conversations about
out-of-level testing. Participants, however, did identify both specific and general
locations. Specifically, out-of-level testing was discussed at town meetings, parent
meetings, and state school board meetings. More generally, some participants indicated that
out-of-level testing was discussed at either the local or state level, depending on the
urgency of the issue. For instance, there was a sense that out-of-level testing was
discussed on an ongoing basis at either the local or state level in states where the
testing policy had been implemented for a number of years. In other states, however, "the
current understanding that we have [is] that [out-of-level testing] just came across our
plates." While the conversations may begin in either state legislatures or state
educational agencies, discussions would expand to local educational agencies and the
general public as the out-of-level testing policy developed. These participants also
reported that the settings for these conversations have changed over time: "Last year it
was at the school and district level but this year predominately it's right now at the
state level."

Theme 2: Stakeholders respond emotionally to the issues pertaining to out-of-level testing

These participants
described out-of-level testing as evoking a variety of feelings within each of the
stakeholder groups. For the most part, parents of students who could participate in
out-of-level testing reacted with relief, as indicated in the following statement about
briefing a parent on out-of-level testing: "Frankly it [out-of-level testing] seems to
give them some comfort." One participant indicated that "Parents at this point seem to be
fairly grateful. There was really some concern about whether their children would be put
through a test that meant nothing." In terms of the teachers who administer out-of-level
tests, one respondent suggested that the out-of-level testing policy in his or her state
was "for the most part to pacify the teachers who believe their students cannot work on
grade level." Teachers in some states were described as an advocacy group who engaged in
some "advocating [for] the inclusiveness side because of their understanding of what they
believe it [out-of-level testing] will do for kids in the long run."

Stakeholder groups tended to respond to the out-of-level testing policy at varying levels
of intensity. One participant, referring to attendance at a workshop for special educators
and test coordinators, stated that "there's some grumbling going on but they're generally
accepting it [out-of-level testing] as a set of requirements." A more intense emotional
reaction is represented in the following response: "our solution was to have an
out-of-level program and that met the needs of some of those people but it also made
others very angry."

Theme 3: The political context surrounding out-of-level testing has systemic ramifications throughout all levels of an educational system

Eight of the
participants in this study indicated that testing students out of level had ramifications
beyond the initial testing situation. In one state where the consequences are high for
those schools that do comply with state regulations, "the pressure becomes greater and
greater to get test scores up [and] to exclude kids. . . ." Another participant reported
that students who are tested out of level "are going to be taken out of a certain
graduation track." Yet another participant suggested that "where it becomes political is
at the reporting." While out-of-level testing may appear fairer for some students, the
test results "are not necessarily giving credit where credit's due to the general
population of students and their achievement patterns. . . ." In other words, when the
effects of this testing approach are considered systemically, the true achievement of all
students may be distorted by test scores that do not represent on-grade-level progress
toward a set of state standards.

Theme 4: States that have tested students out of level for numerous years experience negligible political effects

Of the SEA personnel
interviewed for this study, three participants indicated that "out-of-level testing per se
is really not debated within the public." Currently, "it [out-of-level testing] doesn't
seem to be one of the big special ed issues." Each of these three states has tested
students out of level for at least three years.

Overall, the results of this study show wide variability in state perspectives and
policies on out-of-level testing across all states that test students out of level in
large-scale assessments. These results also point to the unresolved issues that surround
out-of-level testing policies. The following section of this report describes a series of
key issues that are currently unresolved across states that test students out of level.

Key Issues Raised by the Interviews

In this
section, we discuss five key issues that emerged from the telephone interview process (see
Table 5). These issues were raised by the participants either as concerns about testing
students out of level in general or as specific problems that states have encountered when
implementing an out-of-level testing program.

Table 5. Key Issues Raised by the Telephone Interviews

Issue 1: The practice of out-of-level testing is vulnerable to misuse

Some participants
expressed concern about the possible misuse of out-of-level testing. This issue is
difficult to explain in general terms because out-of-level testing policies differ across
states. For instance, all 12 states that test students out of level require documentation
in a student's IEP that a statewide assessment will not be administered on grade level.
However, many states indicated that there were no specific procedures in place for this
documentation. One state indicated that conversations are ongoing to develop documentation
guidelines, but the challenge lies in translating these conversations into consistent
assessment practices. Beyond documentation, only half of the states have developed
monitoring procedures to ensure that the number of students tested out of level is not
excessive. A few states have set numeric limits on the number of students who may be
tested out of level, but this practice has not been adopted in most states that currently
test students out of level. Misuse of out-of-level testing is likely if consistent
documentation and monitoring procedures are not established to guard against
inappropriate testing.

The manner in which
decisions are made about testing students out of level in most states is open to misuse as
well. In all states, the decision to test out of level is made by an IEP team. Well-meaning
educators or family members may assume that some students with disabilities cannot
participate in statewide tests. In these cases, exclusion from testing may seem to be in
the student's best interest. However, recent research has demonstrated that students with
disabilities who were once thought to be unable to participate in large-scale assessments
can not only participate but also perform better than expected (Bielinski & Ysseldyke,
2000). Further, supporting the participation and performance of these students may seem
difficult if a student needs an accommodated test. From this standpoint, out-of-level
testing may appear to be an easier approach to including some students with disabilities
in large-scale assessments.

Issue 2: The IEP team decision-making process to test a student out of level is complicated

It is assumed that
IEP teams make good decisions about testing students out of level. However, no research
studies confirm or disconfirm this assumption. At best, the decision to test a student out
of level is made subjectively. But again, no research study has described how and why these
assessment decisions are made. Just as importantly, no investigations have described
whether students or parents participate in the decision-making process or, if they do
participate, how well informed their decisions to use out-of-level tests are.

To complicate the
decision-making process further, only a portion of the states that test students out of
level have developed criteria for identifying students for these assessments. Most of these
sets of criteria lack the concrete determinants needed to separate students who should be
tested out of level from students who are more appropriately tested on level. Further,
only a few states require documentation in student files that is separate from the IEP
itself. Some states have developed out-of-level testing forms that require IEP teams to
follow specific steps in selecting students appropriately for out-of-level tests. But for
those states that do not use out-of-level testing paperwork, there are no assurances about
how teams select students for out-of-level tests.

In addition, there are
no guarantees that the team has considered the long-range implications of testing a
student out of level. There are also no guarantees that the parents of students who are
tested out of level understand the consequences of out-of-level testing. For instance,
some states do not grant a regular high school diploma to students who have taken a
statewide assessment out of level. It is essential that IEP team members, including the
student and the student's parents, understand the ensuing ramifications of taking
out-of-level tests. To avoid unintended consequences of testing students with disabilities
out of level, it is imperative that all team members fully participate in the decision to
use out-of-level tests, which includes selecting students appropriately for this testing
approach.

Issue 3: Out-of-level test scores are difficult to report at the state level

Out-of-level testing
also introduces several problems for states in reporting assessment results. The
procedures involved in accurately transferring the number of students tested out of level
from the classroom level to the state level for reporting purposes are complicated. One
state expressed concern about the difficulty of obtaining an exact number of students who
participated in out-of-level testing. Problems arise in schools on the day of testing,
such as student or teacher absenteeism, inaccurate counts of out-of-level tests, or
improper submission of the number of tests to the SEA.

Once the SEA receives the out-of-level test scores, the procedures for reporting assessment
results are also complicated. States have addressed this issue through a variety of means.
One state has developed transformation rules for entering out-of-level test scores in an
accountability index. Other states that administer norm-referenced tests as a statewide
assessment use the test company's recommended procedures for converting out-of-level test
scores to in-level test scores. However, the majority of states expressed concern about
how to report out-of-level test scores. For instance, aggregating out-of-level test scores
with in-level test scores may result in reporting biased test scores (Bielinski, Thurlow,
Minnema, & Scott, 2000). Alternatively, disaggregating out-of-level test scores and
reporting these scores separately does not provide group performance data that are
representative of all students in a given grade.

Issue 4: Using out-of-level test scores for accountability purposes is difficult

Moving a step beyond public reporting of assessment results to the use of test scores for
accountability purposes, out-of-level testing again creates problems for states. Some
participants indicated a need to better consider the purpose of large-scale assessments
when making decisions about an out-of-level testing program. One participant suggested
that decisions made to improve the validity of an instrument rarely take into account the
purpose of the test. An assessment instrument can be valid and reliable within one context
but not another. For instance, a criterion-referenced test taken out of level and used for
student accountability purposes is presumed to measure a student's ability with greater
validity. If the test items are linked to the student's curriculum, test results are
thought to be more useful for making instructional decisions than the results from an
in-level test. However, when the same test results are applied to a system accountability
program, they are thought to be less useful: it is difficult to make school improvement
decisions for certain grades when not all of the scores reported are on level. In fact,
some participants indicated that their state did not know how to use out-of-level test
scores for system accountability purposes. This concern seemed more pressing for states
that do not convert out-of-level test scores to in-level scores. For the states that equate
out-of-level test scores, using the test scores for both student and system accountability
purposes seemed to be less problematic.

Also tied to these
accountability issues is the concern that teachers lack system accountability literacy.
Some of the participants identified the need for additional resources to help teachers
understand the connection between testing an individual student at the classroom level and
initiating school improvement through system accountability. These participants further
asserted that, with system accountability literacy, teachers would make better use of
in-level testing. This may be particularly true for students with disabilities, who, with
the exception of a small percentage of students within each school district, are capable
of participating in the regular assessment (Bielinski & Ysseldyke, 2000).

Issue 5: There is wide variability across states in the practices used to test students out of level

A final issue that
is important in understanding the status of out-of-level testing is the wide variability
in the practice of testing students out of level. For example, procedures used to
administer out-of-level tests differ from state to state, as do the procedures used to
report out-of-level test results. The type of test used for out-of-level testing also
varies. States even seem to disagree as to whether out-of-level testing is an accommodation
that results in a standard test presentation or a modification that yields a nonstandard
test presentation. The ramifications of this variability in out-of-level testing practices
are extensive for educators, policymakers, and researchers. One such ramification concerns
research on specific aspects of testing students out of level. There is a critical need
for empirical research to sort out the difficult issues that surround out-of-level
testing. However, such research is hampered by the lack of consistency in the practice of
out-of-level testing nationwide.

Study Constraints

There are several
study constraints that are important to highlight when considering the implications of
these results. First, this study uses a cross-sectional design that offers a view of the
current status of testing students out of level nationally. In other words, these data
describe state practices and perspectives on testing students out of level at only one
point in time. Cross-sectional research designs are particularly problematic for policy
research, since educational policy is implemented within a context of rapidly changing
public opinions, attitudes, and values. The interview data collected for this study
describe the context of out-of-level testing for only a limited period of time. A series
of interviews with the same SEAs would more aptly capture the rapidly changing context that
surrounds testing students out of level.

A cross-sectional
design for policy research is problematic for a second reason. The development and
implementation of educational policy is a process that evolves over time. In other words,
state perspectives on out-of-level testing and the practices used to implement the policy
are not static, but change with time. From this perspective, a cross-sectional look at
out-of-level testing does not capture the element of change that is central to
understanding the policy in its entirety. Since data were collected at only one point in
time, the developmental aspect of policy is not fully captured. Again, a series of
interviews would better reveal how out-of-level testing policies change over time
nationwide.

A final constraint
on this study concerns the purposive sample of participants. These participants were
either state assessment directors or state personnel nominated by a state assessment
director. Since only one participant from each SEA was interviewed, the final data set
represented the perspectives and practices reported by only one person from each state.
Moreover, participants were recruited from either assessment or special education divisions
of SEAs. Within a single SEA, one division's opinions, perspectives, and knowledge may not
match those of another division. Interviewing additional participants from different
divisions within each SEA might have produced a more complete description of state
perspectives and practices regarding out-of-level testing.

Recommended Next Steps in Research

Additional research on out-of-level testing is crucial if appropriate decisions are to be
made about including all students in large-scale assessment programs. To better
understand the status of out-of-level testing nationwide, it is important to describe how
many students are tested out of level and who they are. In addition to the prevalence of
out-of-level testing, there is a critical need to determine whether intended or unintended
consequences are occurring when students are tested out of level. Finally, it is essential
for the field to develop well-researched guidelines and parameters to guide states in
making decisions about out-of-level testing.

Conclusions

Of the states that
test out of level, some do so because a governing body has mandated the testing policy.
Other states have a history of testing out of level, so the testing policy has become an
integral component of their statewide assessment programs. The remaining states have
elected to develop an out-of-level testing program based on either stakeholder input or
action research. If states continue to implement out-of-level testing in large-scale
assessments, it is essential to consider the four concerns that follow.

First, there are no research studies to date that demonstrate the value, or lack thereof,
of testing students out of level (Minnema, Thurlow, Bielinski, & Scott, 2000). Minnema et
al. (2000) further state that the field has yet to determine a set of guidelines that
support the appropriate use of out-of-level testing. Without a research base to guide
decisions about how best to test students out of level in large-scale assessments, it
behooves policymakers and practitioners to implement an out-of-level testing program
cautiously. At a minimum, out-of-level testing policy should be written to discourage
testing students out of level. Again, while not yet documented in the research base, a
fundamental concern about out-of-level testing is that teachers will reduce their
instructional expectations for a student who takes a test at a grade level lower than his
or her assigned grade. Testing only a negligible number of students out of level would help
ensure that more students are challenged to strive toward grade-level standards.

Second, one
rationale for testing students out of level is the desire to include all students in
statewide assessment programs. Out-of-level testing appears to be a logical solution for
assessing those students who are striving to meet grade-level standards, but at a slower
learning pace than their same-age peers. The decision to test students out of level is
further reinforced by those test companies that market out-of-level tests. However, it has
been shown that states may not receive all of the psychometric information necessary to
make informed decisions about an out-of-level testing program (Bielinski, Thurlow,
Minnema, & Scott, 2000). Compounding the situation further is the lack of focused
research studies that explicate the psychometric properties of out-of-level tests. It is
therefore essential that states proceed cautiously in testing students out of level until
the inherent validity and reliability issues are sorted out in the research base.

A third consideration for states that test students out of level is the unresolved dilemma
about how to use out-of-level test scores for system accountability purposes. To further
complicate the problem, there are no agreed-upon reporting procedures that can be used as
a recommended format for either student or system accountability purposes. Linked to these
accountability problems is the concern that out-of-level testing will become a means to
exclude lower-performing students from accountability systems. This concern is
particularly relevant for those states that have high-stakes system accountability
procedures in place. It is therefore suggested that states that test students out of
level carefully report all student test results publicly, including out-of-level test
scores. It is further recommended that systematic monitoring procedures be put in place to
ensure that the selection of students for out-of-level testing is as appropriate as
possible. Structured monitoring practices can guard against excluding high numbers of
students who are tested out of level from district and state reporting systems.

A fourth and final
consideration is the current status of the research base that supports the practice of
out-of-level testing. Because out-of-level testing is a contentious issue in policymaking
and practitioner circles, its merits could be debated indefinitely. Without data that
either support or oppose out-of-level testing, the issues cannot be definitively resolved.
It is important for all state-level assessment and special education personnel to access
current research results that inform their understanding of the issues that surround
out-of-level testing and support them
in making appropriate policy decisions about testing students out of level.

References

Bielinski, J., & Ysseldyke, J. (2000). Interpreting trends in the performance of special
education students (Technical Report 27). Minneapolis: University of Minnesota, National
Center on Educational Outcomes.

Bielinski, J., Thurlow, M., Minnema, J., & Scott, J. (2000). How out-of-level testing
affects the psychometric quality of test scores (Out-of-Level Testing Report 2).
Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Minnema, J., Thurlow, M., Bielinski, J., & Scott, J. (2000). Past and present
understandings of out-of-level testing: A research synthesis (Out-of-Level Testing Report
1). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Thurlow, M., Elliott, J., & Ysseldyke, J. (1999). Out-of-level testing: Pros and cons
(Policy Directions 9). Minneapolis: University of Minnesota, National Center on
Educational Outcomes.

Appendix A

Telephone Interview Protocol

(1) Why does your state use out-of-level testing?

(2) Probe: Are these reasons the complete rationale for testing out of level in your state?

(3) What guidelines and policies are written about out-of-level testing in your state? How
are school districts informed about these guidelines and policies?

(4) Describe the procedures used when testing out of level in your state.
    Probe: How are students selected for testing out of level?
    Probe: How many levels below or above grade level does your state test?
    Probe: What is done with the out-of-level score after testing?
    Probe: If norm-referenced testing is not done in your state, what makes up an
    out-of-level test in your state?

(5) In your opinion, what are the advantages of using out-of-level testing? What are the
disadvantages of using out-of-level testing?

(6) Is out-of-level testing used for system accountability in your state? If so, how? Or
for student accountability? And again, if so, how?

(7) Does out-of-level testing impact grade promotion in your state? If so, how? Does
out-of-level testing impact meeting graduation requirements? Again, if so, how?

(8) Are there auditing or quality control procedures in place in your state to make sure
that out-of-level testing is used appropriately? If so, what are they?

(9) What are the requirements for documenting out-of-level testing in students'
Individual Education Plans or other student records?

(10) Are the results of out-of-level testing reported to parents? If so, how? And, when?
Also, are the results of out-of-level testing reported to the public? If so, how? And,
when?

(11) As the final question, please describe the settings in which out-of-level testing is
discussed in your state.
    Probe: For instance, are policymakers discussing out-of-level testing informally?
    Probe: Is there a public reaction to testing out of level in your state?
    Probe: Would anyone else be aware of other settings in which out-of-level testing is
    discussed?