Massachusetts: One State's Approach to Setting Performance Levels on the Alternate Assessment

NCEO Synthesis Report 48


Published by the National Center on Educational Outcomes

Prepared by:
Dan Wiener
Assessment Coordinator for Special Populations
Massachusetts Department of Education

November 2002


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Wiener, D. (2002). Massachusetts: One state's approach to setting performance levels on the alternate assessment (Synthesis Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/Synthesis48.html


Executive Summary

In Massachusetts, about one percent of all students being assessed submit portfolios for the Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. These portfolios are based on "expanded" state standards that describe academic outcomes appropriate for students with significant disabilities. Teachers collect "evidence" of their students’ performance on the standards during targeted instructional activities or structured student observations to create portfolios that contain an array of work samples, instructional data sheets, audio- and videotapes, or other evidence organized into "portfolio strands" in each content area.

MCAS Alternate Assessments are submitted to the state for scoring and designation of a performance level that gives parents and teachers information on how well these students are learning the general curriculum relative to their past performance and the performance of other students. The process used by the Massachusetts Department of Education to assign performance levels to alternate assessments is the focus of this report. This technical phase, called standard setting, reflects several steps that typically occur between scoring and reporting. However, the process reflects theoretical debates and decisions that occurred much earlier in the development process of the alternate assessment, sometimes years before the first portfolio was compiled and submitted. Several of these earlier conversations and their consequences are also described in this report since the recommendations form the philosophical basis of much that followed.

The alternate assessment in Massachusetts is one pathway to meet the state requirement for earning a "competency determination," which is needed to receive a regular high school diploma. Therefore, it was necessary to calibrate performance levels precisely between the alternate assessment and the general assessment, especially at the Needs Improvement level, the level required to earn the competency determination. Massachusetts decided to use an analytical rubric to convert raw scores to performance levels. The combinations of scores that could be obtained across the alternate assessment scoring rubrics for Level of Complexity, Demonstration of Skills and Concepts, and Independence were discussed, and reasoned perceptions were used to assign performance levels of Awareness, Emerging, Progressing, and Needs Improvement or above (Proficient, Advanced) in each portfolio strand. The reasoning behind the Massachusetts approach, and the ways in which performance levels in each strand are combined to produce an overall performance level, are described further in this report. This approach reflects not only the Massachusetts standards, but also the state's unique culture and values.


Overview

States are finding different ways to adapt their accountability systems to include all students, because the achievement of students with disabilities has typically lagged behind that of their non-disabled peers, and because recent state and federal laws require the participation of all students. Special educators are considering how, rather than whether, students with disabilities will participate in statewide assessments, while assessment policies have become more flexible in allowing accommodations in how those tests are administered. Curriculum experts are placing increased emphasis on teaching students with disabilities the same content and skills being taught to their non-disabled peers, while regular and special educators are working together to adapt curriculum and instruction so diverse learners may participate more fully in academic activities.

A comparatively small number of students with the most complex and significant disabilities, though, have been more difficult to include in statewide assessments. Academic skills and subject matter have not always been a part of the curriculum for this population, and information has not systematically been collected on what these students have learned. The performance of these students is not easily determined using the same standardized paper-and-pencil tests used with the majority of students, but since participation in these assessments is now required, states have had to decide how best to include these students by giving them "alternate assessments." Alternate assessment methods and formats are determined by each state individually, though their common purpose is to improve instruction for these students and report their academic performance. By using alternate assessments with this population, schools can document what is being taught for purposes of system accountability, and demonstrate to parents and the public to what degree each of these students has learned state standards.

A majority of states have adopted individual academic portfolios as the most effective method of assessment for these "difficult-to-assess" students. Student portfolios accommodate a range of approaches to document learning, and afford teachers options for determining the ideal time, place, and method to assess their students. Portfolios provide teachers, students, and their parents with tangible evidence of student performance and feedback on their progress. While the contents of each portfolio are unique, a common structure allows all portfolios to be evaluated and scored using uniform criteria that can be shared with teachers beforehand.

The demands of creating and managing portfolios, and of compiling this information for submission to the state, however, require additional expertise on the part of teachers and time in which to complete this work. This fundamental change in classroom practice has required states to make a strong and continued commitment to provide professional development and technical assistance to educators who conduct alternate assessments, and to engage in an open dialogue about the efficiency, rigor, and usefulness of the process with those who are most affected by it.


Alternate Assessment in Massachusetts

In Massachusetts, about 5,000 students, or one percent of all students being assessed, submit portfolios for the Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. In creating portfolios, their teachers must first identify challenging outcomes for each student based on the standards in each content area being assessed. Many states, including Massachusetts, use an "expanded" version of their standards that describes academic outcomes that are appropriate for students with significant disabilities. Teachers then collect "evidence" of their students’ performance on those standards during targeted instructional activities or structured student observations. Portfolios may contain an array of work samples, instructional data sheets, audio- and videotapes, and other evidence organized into "portfolio strands" in each content area.

Once MCAS Alternate Assessments are submitted to the state, they are scored and a performance level is assigned in each content area so that parents and teachers have information on how well these students are learning the general curriculum relative to their past performance and the performance of other students. The process used by the Massachusetts Department of Education to assign performance levels to alternate assessments is the focus of this report. This technical phase, called standard setting, reflects several steps that typically occur between scoring and reporting (Quenemoen, Rigney, & Thurlow, 2002). However, the process reflects theoretical debates and decisions that occurred much earlier in the development process of the alternate assessment, sometimes years before the first portfolio was compiled and submitted. Several of these earlier conversations and their consequences are also described in this report since the recommendations form the philosophical basis of much that followed. First among these conceptual discussions was defining who should take an alternate assessment.

 

A Diverse Group of Advisors

Late in 1998, the Massachusetts Department of Education began convening regular task force meetings composed of DOE staff (from the Special Education and Assessment units), the contractor team (Measured Progress and the ILSSA group at the University of Kentucky), and the Massachusetts Alternate Assessment Advisory Committee (a diverse stakeholder group). Together, these groups provided recommendations to the Department on a range of assessment issues, including:

  • how to provide guidance to IEP teams about which students to consider for alternate assessments;

  • what alternate assessments should look like;

  • how alternate assessments should be scored;

  • which scores should "count" toward overall performance; and

  • how to describe and report the performance of students who take alternate assessments.

 

Guidelines for IEP Teams: Who Should Take Alternate Assessments?

It was assumed from the beginning that students who needed alternate assessments were, for the most part, those who could not take paper-and-pencil tests and whose academic performance was based on the expanded standards appropriate for students with significant disabilities. However, the task force also identified students whose disabilities were not primarily cognitive who, they felt, should also be considered for alternate assessments by their IEP and 504 teams. Generally, this smaller group of identified students had disabilities that presented them with "unique and significant challenges" to participation in standardized statewide testing, regardless of the accommodations they could use on those tests. The task force recommended, for example, that students with severe behavioral and emotional disabilities, or those with cerebral palsy, sensory impairments (deaf, blind, or deaf and blind), or fragile health and medical conditions, should also be considered for alternate assessments regardless of their levels of academic performance, since taking on-demand statewide tests could present insurmountable barriers to their participation and therefore deny them access to the assessment (Massachusetts Department of Education, 1999).

Based on guidelines provided to Massachusetts IEP Teams since 1999, students across the full spectrum of academic performance are thus eligible to take alternate assessments, even when they are able to demonstrate the same (or higher) levels of performance as a tested student; they simply require an alternate assessment format to demonstrate their knowledge and skills. Therefore, the MCAS reporting system required sufficient flexibility and integrity to provide meaningful feedback on students who demonstrate a "comparable performance" to that of a student who scores at the highest levels on the standard tests. It also became necessary to incorporate a method by which a student could meet the state’s graduation requirement through an alternate assessment. The task force strongly advised that the alternate assessment be a different, though not easier, pathway to demonstrate the same performance as a tested student.

 

Scoring Alternate Assessments

The task force next considered and selected criteria on which to base the scores of alternate assessment portfolios. They advised the Department to develop criteria based primarily on student performance, since that is what the standard assessment measures, rather than on how well the student’s program provided opportunities to learn the material. Some on the task force, however, felt that student achievement could not be separated from program effectiveness. In the end, a scoring rubric was developed in which four of the six categories are based on student performance, and two reflect the effectiveness of the student’s program:

  • Completeness of the portfolio

  • Level of Complexity: the difficulty of academic tasks and knowledge attempted by the student

  • Demonstration of Skills and Concepts: the accuracy of the student’s performance

  • Independence: cues, prompts, and other assistance required by the student to perform the tasks or activities

  • Self-evaluation: the extent to which opportunities are provided to reflect, set goals, evaluate, and monitor the student’s own performance

  • Generalized Performance: the number of contexts and instructional approaches provided to the student to perform tasks and demonstrate knowledge

Scores are determined and reported in each of the rubric areas listed above. Once numerical scores are obtained for a portfolio in these rubric areas, raw scores must somehow be combined to identify an overall performance level in the content area. Before performance levels can be determined, however, several important questions must be answered:

  • What will each performance level be called; how many performance levels will there be; and how will each be defined?

  • Which numerical scores in which rubric areas will be counted in determining the overall performance level?

  • How will numerical scores in those rubric areas be combined to yield a performance level?

  • What range or combination of scores will yield a particular performance level?

 

Defining Performance Levels

The task force recommended that performance levels be identical to performance levels on standard MCAS tests, but that the lowest performance level, called "Warning/Failing at Grade 10" for tested students, be subdivided into three distinct levels in order to provide more meaningful descriptions of performance at these lower levels. Figure 1 illustrates the performance levels and definitions used by Massachusetts to report assessment results on the standard and alternate assessments, and the relationship between the two reporting scales.

 

Figure 1. MCAS Performance Levels



Counting Scores Toward an Overall Performance Level

On several occasions, the task force revisited the question of which scores to count in calculating the overall level of performance. In reviewing the goals, methods, and purpose of the general assessment, they realized, in essence, that regular MCAS tests measure the ability of a student to respond to test items accurately, with no assistance from peers or from the adult(s) administering the test, and that test results are based solely on the correctness of the student’s responses.

In the end, their recommendation was to "parallel the goals, methods, and purpose of the general assessment, where possible" whenever no other solution was obvious. With this advice, the task force established a foundation for future decision-making, and returned to this guidance frequently.

With these assumptions about the general assessment, and the advice of the task force to parallel the general assessment where possible, the Department decided it would base alternate assessment performance levels only on the raw numerical portfolio scores given in the areas of completeness, complexity, accuracy, and independence, but not on self-evaluation or generalized performance, since scores in these last two areas depend on the opportunities provided to the student, not on the student’s direct performance of the skill being assessed. Scores in all rubric areas, however, are reported to schools and parents in order to provide those who work most closely with the student detailed information on his or her performance, as shown in Figure 2.

Separate scores are reported for each strand in Level of Complexity, Demonstration of Skills and Concepts (accuracy), and Independence, while scores in the secondary areas of Self-Evaluation and Generalized Performance are combined for the entire content area.
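As a way of picturing which scores count toward the performance level and which are reported for information only, the following is a minimal sketch. The class and field names are chosen here for illustration and are not the Department's data format; completeness, which also counts toward the level, is not modeled because it is not reported per strand in Figure 2.

```python
# Illustrative sketch of how reported scores might be organized. Level of
# Complexity, Demonstration of Skills and Concepts, and Independence are scored
# per strand and count toward the performance level; Self-Evaluation and
# Generalized Performance are reported once per content area and do not count.

from dataclasses import dataclass
from typing import Dict

@dataclass
class StrandScores:
    level_of_complexity: int
    demonstration_of_skills_and_concepts: int  # accuracy
    independence: int

@dataclass
class ContentAreaReport:
    strands: Dict[str, StrandScores]  # one entry per portfolio strand
    self_evaluation: int              # reported only; not counted toward the level
    generalized_performance: int      # reported only; not counted toward the level
```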

Figure 2. Excerpt of Sample Parent/Guardian Report


 

How Will Numerical Scores be Combined to Yield a Performance Level?

The Massachusetts Department of Education consulted with Ed Roeber of Measured Progress to assist in developing a strategy or formula for combining scores to obtain an overall performance level for each content area. Over time, Dr. Roeber recommended several options for calculating a numerical score total in each content area of a portfolio. The following were two mathematical formulas considered by the Department:

Method #1 - Calculate the sum of scores in three rubric areas:
LC + DSC + Ind = Total Score

Method #2 - Multiply LC by the sum of the other two rubric areas:
LC x (DSC + Ind) = Total Score

Key
LC = Level of Complexity
DSC = Demonstration of Skills and Concepts
Ind = Independence

Consider the following scenario using both scoring methods:

Student A Raw Scores: LC=3, DSC=3, Ind=3
Student B Raw Scores: LC=2, DSC=4, Ind=4

Student A Total Score: Method #1 = 9; Method #2 = 18
Student B Total Score: Method #1 = 10; Method #2 = 16


Using Method #1, Student A scored lower (9) than Student B (10), although Student A worked on more challenging subject matter (LC=3) than Student B (LC=2). Using Method #2, on the other hand, Student A scored higher (18) than Student B (16), thereby rewarding Student A for attempting more challenging material. For certain score combinations, Method #1 appeared to create a disincentive for students to attempt increasingly complex skills and content, and discouraged teachers from providing more challenging instruction to their students, which was certainly not the intent of the alternate assessment.

Because the LC score is used as a multiplier in Method #2, scores are also spread over a wider range (1-40), avoiding the possibility of overlapping totals. Method #1, on the other hand, spreads scores across a narrower range (1-13), since scores are simply added together. It was agreed that Method #2 would be explored further for its effectiveness, impact, and unintended consequences, if any.
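To make the comparison concrete, the following sketch applies both candidate formulas to the Student A and Student B scores above. It is illustrative only, not part of the MCAS-Alt system, and the function and variable names are chosen here for illustration.

```python
# Illustrative sketch only -- not the Department's implementation. It applies the
# two candidate score-combination formulas to the Student A / Student B scenario.

def method_1(lc: int, dsc: int, ind: int) -> int:
    """Method #1: sum of the three rubric scores."""
    return lc + dsc + ind

def method_2(lc: int, dsc: int, ind: int) -> int:
    """Method #2: Level of Complexity multiplied by the sum of the other two scores."""
    return lc * (dsc + ind)

students = {
    "Student A": {"lc": 3, "dsc": 3, "ind": 3},
    "Student B": {"lc": 2, "dsc": 4, "ind": 4},
}

for name, s in students.items():
    print(f"{name}: Method #1 = {method_1(**s)}, Method #2 = {method_2(**s)}")
# Student A: Method #1 = 9, Method #2 = 18
# Student B: Method #1 = 10, Method #2 = 16
```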

 

Combining Scores to Yield a Performance Level

Dr. Roeber suggested that MCAS-Alt project leadership meet with regular assessment psychometricians and data analysts from the Department and from Measured Progress to review and select the most effective formula for calculating a total content area score, and to identify "cut scores" for specific performance levels based on a range of calculated score totals. During ensuing discussions, however, questions were raised about the necessity of generating a single total numerical score for each strand and content area in the alternate assessment, and whether it might cause confusion to introduce another, entirely different score scale alongside the 200-280 score scale already in use for MCAS test results. Some felt this would reinforce the separateness of the alternate assessment and wondered instead whether a system could be developed that used reasoned judgment, instead of a calculation, to describe overall student performance based on different raw score combinations. After a spirited discussion, this reasoning prevailed, and the idea of calculating a total numerical portfolio score was abandoned in favor of a different approach.

Whether a mathematical equation or a reasoned approach is used to determine a student’s performance, however, some kind of scale, analytical rubric, or other consistent method must be used to convert raw scores to performance levels (Roeber, 2002). The analytical rubric developed for this purpose in Massachusetts is actually a series of grids based on a student’s score as shown in Figure 3.

Sixty-four different possible score combinations were discussed and analyzed by the group, and a performance level identified by consensus for each. Decisions were based on reasoned perceptions of what each score combination revealed about the student’s performance, and the relative position of that performance level within the hierarchy of other levels. It was easier to analyze and assign performance levels beginning with the lowest and highest levels, then working toward the middle. In the end, the group was able to define and categorize all score combinations. The model was tested using various arbitrary score combinations to check that the defined performance level made sense, given the student’s scores, and that scores were appropriately scaled relative to adjacent scores.

An analysis of several arbitrary score combinations reveals, for example, that a student who scores LC=3, DSC=2, and Ind=3 according to the MCAS-Alt scoring rubric, is a student who is working on modified (or "expanded") learning standards, who demonstrates 26-50% accuracy, and who needs assistance 51-75% of the time during standards-based activities (Massachusetts Department of Education, 2001). From this information, the student would appear to be performing above the definition of Awareness in this content area, but not yet at Progressing, in which the student would perform the skills and demonstrate the knowledge with greater independence and accuracy. Since this student is somewhere between the Awareness and Progressing performance levels, we can say with relative confidence that the student is at the Emerging level. Another student who hypothetically scored LC=3, DSC=3, Ind=4 is also working on modified standards, but performs with a sufficiently high rate of accuracy and independence to be placed in the Progressing performance level. He or she is probably ready to attempt even more challenging tasks, skills, and concepts in the coming year, since the data suggest he or she has mastered skills and content in the current portfolio. Figure 3 shows the complete analytical rubric for determining performance levels in each portfolio strand.
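Because Figure 3 is, in effect, a consensus lookup table rather than a formula, its logic can be sketched as a mapping from the three raw scores to a performance level. The sketch below is illustrative only: it contains just the two score combinations analyzed in the preceding paragraph, while the actual rubric in Figure 3 assigns a level to all 64 combinations.

```python
# Illustrative sketch of the analytical rubric as a lookup table. Only the two
# combinations discussed in the text are shown; the complete mapping of all 64
# score combinations is the one defined by consensus in Figure 3.

# Keys: (Level of Complexity, Demonstration of Skills and Concepts, Independence)
STRAND_RUBRIC = {
    (3, 2, 3): "Emerging",     # expanded standards, 26-50% accuracy, needs frequent assistance
    (3, 3, 4): "Progressing",  # expanded standards, greater accuracy and independence
    # ... remaining combinations as defined in Figure 3 ...
}

def strand_performance_level(lc: int, dsc: int, ind: int) -> str:
    """Return the performance level assigned to one portfolio strand."""
    try:
        return STRAND_RUBRIC[(lc, dsc, ind)]
    except KeyError:
        raise ValueError("Score combination not included in this partial sketch")

print(strand_performance_level(3, 2, 3))  # Emerging
```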

 

Figure 3. Analytical Rubric for Determining Performance Levels in Each Portfolio Strand


 

Calculating the Overall Performance Level

Once performance levels are determined for each of three required portfolio strands in the content area, based on the analytical rubric shown in Figure 3, these are averaged and rounded to the nearest whole number to determine the overall performance level in that subject. To calculate the average of three performance levels, consecutive numerical values are given to each performance level, as follows: Awareness = 1, Emerging = 2, Progressing = 3, Needs Improvement = 4, etc. Figure 4 shows how different combinations are averaged to yield a final performance level.

 

Figure 4. Performance Levels in Each Strand are Averaged to Determine an Overall Performance Level

Student    Portfolio Strand #1    Portfolio Strand #2    Portfolio Strand #3    Overall Performance Level
A          Aw (1)                 Aw (1)                 Em (2)                 Awareness (ave. 1.33)
B          Aw (1)                 Em (2)                 Em (2)                 Emerging (ave. 1.67)
C          Em (2)                 Pg (3)                 NI (4)                 Progressing (ave. 3.0)
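For readers who want to trace the averaging arithmetic, a brief sketch follows. It is illustrative only: the numeric values above Needs Improvement simply continue the sequence implied by "etc." in the text, and the round-half-up handling of borderline averages is an assumption, since the report does not specify how exact midpoints are treated.

```python
# Illustrative sketch of the averaging step: each strand's performance level is
# mapped to a consecutive integer, the three values are averaged, and the average
# is rounded to the nearest whole level.

LEVEL_VALUES = {"Awareness": 1, "Emerging": 2, "Progressing": 3,
                "Needs Improvement": 4, "Proficient": 5, "Advanced": 6}
VALUE_LEVELS = {v: k for k, v in LEVEL_VALUES.items()}

def overall_performance_level(strand_levels):
    """Average three strand-level results and return (overall level, average)."""
    values = [LEVEL_VALUES[level] for level in strand_levels]
    average = sum(values) / len(values)
    # "Round half up" is assumed here; Python's round() would use banker's rounding.
    return VALUE_LEVELS[int(average + 0.5)], round(average, 2)

# The three students shown in Figure 4:
print(overall_performance_level(["Awareness", "Awareness", "Emerging"]))
# ('Awareness', 1.33)
print(overall_performance_level(["Awareness", "Emerging", "Emerging"]))
# ('Emerging', 1.67)
print(overall_performance_level(["Emerging", "Progressing", "Needs Improvement"]))
# ('Progressing', 3.0)
```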

 


Meeting the State’s Graduation Requirement Through MCAS Alternate Assessment

A performance level of Needs Improvement or higher is required on grade 10 MCAS assessments in English Language Arts and Mathematics in order to earn a "competency determination" (the state’s requirement to receive a regular high school diploma). As previously stated, alternate assessment is one pathway to meet that requirement. Therefore, it is necessary to calibrate performance levels precisely between the alternate assessment and the general assessment, especially at the Needs Improvement level. What does a Needs Improvement portfolio look like, and what specifically constitutes a "comparable performance" to a student who was tested and earned this score? Although portfolio scorers can accurately determine a portfolio’s completeness, accuracy, and independence of performance, an additional level of review seemed necessary in order to assure the breadth, quality, and comparability of the student’s performance to that of other students who passed the grade 10 MCAS tests in those subjects.

To accomplish this, the Department convenes a panel of mathematics and English language arts content specialists each year to review a selection of grade 10 portfolios set aside for this purpose, and to make recommendations to the Department on whether these students have demonstrated achievement at or above the Needs Improvement level based on the evidence in their portfolios. Panelists are selected by the Department for their secondary-level teaching expertise in the content area; their experience serving on the state’s Assessment Development Committees, which develop and review general assessment test items with the state’s test contractor; and their extensive familiarity with the Massachusetts Curriculum Frameworks. Panelists are familiar with work typical of students who "passed" the grade 10 MCAS tests in ELA and Mathematics, since they teach these students on a daily basis. Panel members are asked to examine pre-scored portfolios at Levels of Complexity 4 and 5, and to verify whether, in their judgment, the contents:

  • document the full range of learning standards, covering knowledge and skills tested on grade 10 MCAS tests in the content area;

  • demonstrate a level of performance typical of students who perform at the Needs Improvement level on the MCAS test in that subject; and

  • exemplify an even higher performance level than Needs Improvement (for example, Proficient or Advanced).


Conclusion

Although the number of students each year who perform at or above the Needs Improvement level on grade 10 ELA and Mathematics alternate assessments is relatively small, this number can be expected to grow over time. As teachers gain familiarity with portfolio management techniques, submission requirements, curriculum alignment, and approaches to improving instruction, the scores of all students will also rise. It is important for states to demonstrate the effectiveness of their statewide alternate assessments in improving instruction for students with significant disabilities generally, and to show that these improvements translate into expanded opportunities for these students both in and out of school. It is also important to demonstrate the capacity of the alternate assessment to help students meet the same important scholastic requirements as other students.

Developing a statewide alternate assessment presents states with a range of difficult choices, such as how to determine participation, measure performance, and report results. The professional development and technical assistance required by such a system can be intensive, and state assessment personnel must make an ongoing commitment to remain accessible to, and in communication with, the public. In the end, each state must develop an alternate assessment that reflects not only its standards but also its unique culture and values, that is integrated with the standard assessment system, and that promotes the greatest benefit to the most students.


References

Kleinert, H., & Kearns, J. (2001). Alternate assessment: Measuring outcomes and supports for students with disabilities. Baltimore, MD: Paul H. Brookes.

Massachusetts Department of Education. (1999). Participation guidelines for MCAS Alternate Assessment. Malden, MA: Author.

Massachusetts Department of Education. (2001). Rubric for scoring portfolio strands in the 2002 Educator’s Manual for MCAS Alternate Assessment. Malden, MA: Author.

Quenemoen, R., Rigney, S., & Thurlow, M. (2002). Use of alternate assessment results in reporting and accountability systems: Conditions for use based on research and practice (Synthesis Report 43). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Roeber, E. (2002). Setting standards on alternate assessments (Synthesis Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.