Psychometric Impact of Out-of-level Testing
This survey asks for your expert opinion about how out-of-level testing affects test score reliability and validity. An out-of-level test is defined here as any test taken by a student other than the test taken by the majority of students in her/his grade.
A context is provided for each question because the conditions within which
out-of-level testing occurs may influence your opinions.
Please note: you may change your survey responses at any time before pressing the submit button. However, once you have submitted the survey, all responses will be final and additional changes will not be possible. If you have any questions, or if you run into any problems filling out this survey, please contact John Bielinski at bieli001@umn.edu.
Thank you for taking the time to complete this survey.
From item response theory, it can be shown that measurement precision increases as the match between test difficulty and person ability increases. One benefit of out-of-level testing is that test score reliability may be increased for those students who would otherwise earn a very low or a very high score on the in-level (grade level) test. However, when the performance of the students taking an out-of-level test is to be reported on the scale of the in-level test, some form of equating of scales or linking of item parameter estimates is required. Equating may add measurement error to the scaled score. Questions 1-3 ask your opinion on the degree to which the potential gain in measurement precision from out-of-level testing is offset by the measurement error introduced in the equating process.
A state wants to assess math proficiency for all its 8th graders. An off-the-shelf norm-referenced test was chosen that was standardized on an 8th grade sample and was reasonably well aligned to that state's mathematics standards. The 8th grade test is part of a multi-level testing system in which test scores from any level of the test can be placed onto a common scale. The test publisher conducted vertical scaling/equating studies for this purpose.
Please rate the extent to which the measurement error added in the vertical equating process offsets the gain in measurement precision obtained by giving a student an out-of-level test under each condition.
A.) Assume that a brief locator test was used to assign each student to a test level.
The student was assigned to the test | None | Too little to warrant concern | Some | A lot |
One level below | ( ) | ( ) | ( ) | ( ) |
Two levels below | ( ) | ( ) | ( ) | ( ) |
More than two levels below | ( ) | ( ) | ( ) | ( ) |
B.) Assume that the classroom teacher made the decision as to which level each student should take.
The student was assigned to the test | None | Too little to warrant concern | Some | A lot |
One level below | ( ) | ( ) | ( ) | ( ) |
Two levels below | ( ) | ( ) | ( ) | ( ) |
More than two levels below | ( ) | ( ) | ( ) | ( ) |
A state wants to assess math proficiency for all its 8th graders. The state has developed a test specifically designed to measure that state's math standards. The state has also developed a math test to assess the math standards for its 5th graders, and one to assess the math standards for its 3rd graders. The state conducted an equating study so that a score from either the 3rd grade or the 5th grade test could be translated into a score on the 8th grade test. A random sample of 500 8th graders took both the 8th grade and the 5th grade test, and another random sample of 500 8th graders took both the 8th grade test and the 3rd grade test. Equipercentile equating was used to translate scores from the 3rd grade and the 5th grade test to the scale of the 8th grade test. A classroom teacher familiar with the student made the decision as to which test to give that student. All scores, regardless of the level of the test a student took, were reported on the scale of the 8th grade test.
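For readers who want a concrete picture of equipercentile equating in a single-group design like the one described above, the following is a minimal, unsmoothed sketch. It converts a score on the lower-level test to the 8th grade test score that has the same mid-percentile rank in the sample that took both tests, interpolating linearly between observed 8th grade scores. The function names and the five-student data set are hypothetical, and this is not presented as the procedure the state in the scenario actually used. Operational equating typically also involves smoothing and much larger samples.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, sorted_scores):
    """Mid-percentile rank: percent below the score plus half the percent at it."""
    below = bisect_left(sorted_scores, score)
    at = bisect_right(sorted_scores, score) - below
    return 100.0 * (below + 0.5 * at) / len(sorted_scores)

def equipercentile_equate(x, x_scores, y_scores):
    """Map score x on test X to the test Y score with the same percentile rank."""
    xs = sorted(x_scores)
    ys = sorted(y_scores)
    target = percentile_rank(x, xs)

    # Percentile ranks of each distinct observed Y score (strictly increasing).
    y_values = sorted(set(ys))
    y_ranks = [percentile_rank(y, ys) for y in y_values]

    # Clamp to the observed Y range rather than extrapolating.
    if target <= y_ranks[0]:
        return float(y_values[0])
    if target >= y_ranks[-1]:
        return float(y_values[-1])

    # Linear interpolation between the two bracketing Y scores.
    for i in range(len(y_values) - 1):
        p0, p1 = y_ranks[i], y_ranks[i + 1]
        if p0 <= target <= p1:
            y0, y1 = y_values[i], y_values[i + 1]
            return y0 + (y1 - y0) * (target - p0) / (p1 - p0)

# Hypothetical scores for five 8th graders who took both tests.
grade5_scores = [20, 25, 30, 35, 40]   # easier test, higher raw scores
grade8_scores = [10, 15, 20, 25, 30]

equated = equipercentile_equate(30, grade5_scores, grade8_scores)
print(f"A 5th grade test score of 30 maps to {equated:.1f} on the 8th grade scale")
```

Because the conversion is estimated from a sample, each equated score carries sampling error that depends on how many examinees scored near that point in both distributions.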
Please rate the extent to which the measurement error added in the vertical equating process offsets the gain in measurement precision obtained by giving a student an out-of-level test under each condition.
An 8th grader was assigned to the | None | Too little to warrant concern | Some | A lot |
5th grade test | ( ) | ( ) | ( ) | ( ) |
3rd grade test | ( ) | ( ) | ( ) | ( ) |
Participants in the vertical equating studies conducted by the large test publishing companies are selected so that they best represent the demographic characteristics and ability levels of the population of students in a particular grade. However, the students who actually take an out-of-level test are likely to differ from their grade level peers in both ability and demographic characteristics. What effect, if any, do you think this dissimilarity between the participants in vertical equating studies and the students who actually take an out-of-level test would have on test results?
Introduces random error ( )
Other possible effect ( )
Test Score Validity
Out-of-level testing may increase measurement precision for low or high scoring students. However, there is concern that it may alter the validity of the test score. Below, we ask for your expert opinion about the degree to which an out-of-level test score alters the validity of the data. Because meaningful evaluation of validity requires information about how the scores are to be reported and used, we present each set within a specific context. The contexts differ in the way in which test scores are reported and how the results are to be used.
· An off-the-shelf norm-referenced test was used. The test was part of a multi-level testing system in which vertical equating was done so that scores from each test could be placed onto a common scale.
· No student was allowed to take a level of the test more than two levels below the test intended for her/his grade level.
· The percent of students within a grade taking an out-of-level test varied across schools.
· No distinction was made between students getting an out-of-level test and those getting the in-level test.
Report Format A: At the school level, the percent of students in a grade that met the state standard
Effect on Validity
Scores will be used | Dramatically Reduce | Somewhat Reduce | No Effect | Somewhat Enhance | Dramatically Enhance |
for school-to-school comparisons. | ( ) | ( ) | ( ) | ( ) | ( ) |
to monitor adequate yearly progress. Each school must demonstrate gain in the percent of students meeting the state standard. | ( ) | ( ) | ( ) | ( ) | ( ) |
Report Format B: At the school level, the mean scaled score for each grade tested
Effect on Validity
Scores will be used | Dramatically Reduce | Somewhat Reduce | No Effect | Somewhat Enhance | Dramatically Enhance |
for school-to-school comparisons. | ( ) | ( ) | ( ) | ( ) | ( ) |
to monitor adequate yearly progress. Each school must demonstrate a specified gain in their mean scaled score. | ( ) | ( ) | ( ) | ( ) | ( ) |
Report Format C: Individual student score
Effect on Validity
Scores will be used | Dramatically Reduce | Somewhat Reduce | No Effect | Somewhat Enhance | Dramatically Enhance |
to determine eligibility for high school graduation. Each student must pass the test to graduate. | ( ) | ( ) | ( ) | ( ) | ( ) |
to guide classroom instruction. | ( ) | ( ) | ( ) | ( ) | ( ) |
· A state-developed test was used. The testing system included a 3rd grade, a 5th grade, and an 8th grade test. A score from a lower-level test (e.g., the 5th grade test) could be translated into a score on a higher-level test (e.g., the 8th grade test). The equating procedure described in scenario II of the reliability section was used.
· An 8th grader could take either the 3rd grade, the 5th grade, or the 8th grade test. A teacher familiar with the student made the determination as to which test was most appropriate.
· The percent of students within a grade taking an out-of-level test varied across schools.
· No distinction was made between students getting an out-of-level test and those getting the in-level test.
· Consider only the 8th grade test results.
Report Format A: At the school level, the percent of students in a grade that met the state standard
Effect on Validity
Scores will be used | Dramatically Reduce | Somewhat Reduce | No Effect | Somewhat Enhance | Dramatically Enhance |
for school-to-school comparisons. | ( ) | ( ) | ( ) | ( ) | ( ) |
to monitor adequate yearly progress. Each school must demonstrate gain in the percent of students meeting the state standard. | ( ) | ( ) | ( ) | ( ) | ( ) |
Report Format B: Individual student score
Effect on Validity
Scores will be used | Dramatically Reduce | Somewhat Reduce | No Effect | Somewhat Enhance | Dramatically Enhance |
to determine eligibility for high school graduation. Each student must pass the test to graduate. | ( ) | ( ) | ( ) | ( ) | ( ) |
to guide classroom instruction. | ( ) | ( ) | ( ) | ( ) | ( ) |