
Developing and Improving Modified Achievement Level Descriptors: Rationale, Procedures, and Tools

Rachel Quenemoen • Debra Albus • Chris Rogers • Sheryl Lazarus

June 2010

Quenemoen, R., Albus, D., Rogers, C., & Lazarus, S. (2010). Developing and improving modified achievement level descriptors: Rationale, procedures, and tools. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Introduction
Methods Used to Develop the Tools
Procedures to Articulate the Educational Logic of ALDs for AA-MAS
- Four-Step Process for Use of Procedures and Tools
Evaluating Differences Between the General Assessment and AA-MAS ALDs
References
Appendix A: Side-By-Side Tables of Achievement Level Descriptors for Grade-Level and Modified Assessments
Appendix B: Achievement Level Descriptor Analysis Decision Rules
Appendix C: Procedures and Tools to Evaluate or Develop AA-MAS ALDs
Tool Template to Evaluate or Develop ALDs for AA-MAS
- Section I
- Section II: Comparisons and Rationales for Changes to General Assessment ALDs

Introduction

Some states are developing alternate assessments based on modified achievement standards (AA-MAS) to measure the academic achievement of some students with disabilities (Albus, Lazarus, Thurlow, & Cormier, 2009; Lazarus, Thurlow, Christensen, & Cormier, 2007). According to federal regulations and guidance, these assessments measure the same content as the general assessment for a given grade level, but the AA-MAS may have different expectations of content mastery than the general assessment. The United States Department of Education’s Non-regulatory Guidance (2007b) for AA-MAS states:

This assessment is based on modified academic achievement standards that cover the same grade-level content as the general assessment. The expectations of content mastery are modified, not the grade-level content standards themselves. The requirement that modified academic achievement standards be aligned with grade-level content standards is important; in order for these students to have an opportunity to achieve at grade level, they must have access to and instruction in grade-level content. (p. 9)

State policymakers have struggled to understand the educational logic underlying the distinction between the same grade-level content and different expectations of content mastery. Filbin (2008) described content alignment issues as one of the primary challenges for the first six states that submitted their AA-MAS for peer review under the 2001 Elementary and Secondary Education Act (ESEA) requirements. Based on peer review analyses, she found that it is challenging to design an assessment based on grade-level content standards that is of appropriate difficulty and complexity for this population. Since that first review, special education, curriculum, and measurement experts have posed several questions related to the nature of the distinctions between content coverage and difficulty or complexity (Perie, 2009a).

A key to understanding the relationship of content and difficulty underlying a standards-based test is in the standards themselves. In a standards-based assessment, and specifically in a test that is defined as having “modified achievement standards,” these standards should communicate what kind of performance on which content targets demonstrates acceptable achievement. A standards-based test requires clear definitions of the content being assessed—in relation to articulated content standards—as well as definitions of “how well” students need to perform on the content to be considered proficient—or performance standards. These descriptions are included in the process of standard-setting on a standards-based test.

Standards-based reform has resulted in increased attention to performance standards (Cizek, 2006; Crane & Winter, 2006; Haertel, 2008; Hambleton, 2001; Perie, 2009b; Zieky, Perie, & Livingston, 2008). In 2003, the Council of Chief State School Officers took a broad approach to the definition, defining performance standards as:

Indices of qualities that specify how adept or competent a student demonstration must be and that consist of the following four components: (1) levels that provide descriptive labels or narratives for student performance (i.e., advanced, proficient, etc.); (2) descriptions of what students at each level must demonstrate relative to the tasks; (3) examples of student work at each level illustrating the range of performance within each level; and (4) cut scores clearly separating each performance level. (p. 10)

It is the second component of performance standards—the descriptions of what students must demonstrate on the assessment—that we address here.

Although measurement experts have referred to the four components together as performance standards, and the descriptions of student performance as performance level descriptors (PLDs), ESEA 2001 and IDEA 2004 refer to them as “achievement standards.” The AA-MAS gets its name from that statutory language. Given that we are focusing on the AA-MAS, the term we use in this paper is achievement standards, and we specifically refer to the second component described in the CCSSO definition of these achievement standards as achievement level descriptors (ALDs).

Purpose and Use of This Paper

The purpose of this paper is to provide a rationale, procedures, and tools to develop and continuously improve AA-MAS ALDs. As states make decisions on whether and how to develop an AA-MAS, they will also be developing a defense of the choices they make. Filbin (2008) documented the early peer review process and outcomes, and it is clear that the choices made must be built on a complex educational logic reflecting content coverage, complexity, and the characteristics of the potential participants. In this paper, we propose a process to guide state work so that stakeholders and policymakers can articulate, from the very beginning, the educational rationale for their choices and the implications of this rationale for the specific design choices they make related to their ALDs. By building on this rationale, involving key policymakers and stakeholders through a systematic process to articulate the underlying logic, and documenting how this logic has influenced state choices using the tools and templates provided, states will have compelling evidence for peer review defense. More importantly, they will have confidence in the educational implications of the choices for students and schools in their state.

Uses of this paper in development of AA-MAS ALDs

Background information for policymakers and stakeholders involved in guiding state choices: A summary of why and how ALDs reflect policy imperatives is provided, for use as background for policymakers and to prepare and train stakeholders for participation in advisory roles. Pages 1–12.

Procedures for working with stakeholder and policymaker groups in development and improvement of ALDs: Concrete procedural steps are provided for facilitators who will guide stakeholders and policymakers as they work through the key questions and come to consensus on state choices. See Pages 13–16; Appendices A–C.

Tools and templates for clarifying and articulating the educational logic of the state choices: Key questions are posed for group discussion and reflection; templates and examples are provided for recording consensus understandings and agreements. See Pages 17–22; Appendices A and C.

Although ALDs from four states were used to develop the paper, our comparison of these states’ general assessment and AA-MAS ALDs is not meant to make judgments on the quality of each state’s work. Instead, our comparative examples from these states are used to develop and test the rationale, procedures, and tools we provide for states to use as they develop and evaluate their ALDs for AA-MAS in relation to the general assessment. These four ground-breaking states developed ALDs prior to the release of final regulations or to the policy discussion that surrounded the regulations. We recognize these states for their work and realize that they did not design their AA-MAS ALDs for this type of scrutiny. Still, we believe they have provided a great service to states that follow by demonstrating how states may consider the characteristics of modified achievement standards, and over time, the field will have a better understanding of the educational logic inherent in these tests.

It should be noted that this paper is based on considerations of best practice, and it does not attempt to present an authoritative interpretation of federal policies related to AA-MAS. The processes and tools described in this paper are not necessarily endorsed by the federal government, but they may be helpful to states in meeting federal requirements related to AA-MAS.

Background and Selected Literature for Policymakers and Stakeholders

Achievement level descriptors for a standards-based assessment reflect both the content assessed and the challenge or difficulty of the assessment. ALDs describe how different performance levels on a test reflect specific skills and knowledge in the content being assessed. They are important for that reason: they are where teachers, parents, and the public should be able to learn not only what a student should know and do to be proficient, but how well the student should do it. In addition, because the ALDs describe how one level of achievement differs from another, they show which specific content, skills, or knowledge are the next steps in learning. As such, the ALDs can be powerful policy statements and often serve as the only source where content and achievement expectations for students are specifically written down in concise terms.

The choices states make about how the achievement standards differ between the general assessment and the AA-MAS reflect an educational logic of sorts, whether or not test developers have formally articulated the logic. In theory, in a comprehensive assessment system like those developed under current ESEA requirements, states that are developing AA-MAS should determine whether the AA-MAS leads logically to other achievement standards within the assessment system, for example, to grade-level achievement standards (GLAS) or to alternate achievement standards (AAS), or whether it stands alone and is disconnected. Those discussions should then guide development of ALDs for each test. States will vary on these decisions. Perie, Hess, and Gong (2008) have suggested that in some states, the early AA-MAS ALDs and items reflected added supports and scaffolding but the content coverage was the same as the general assessment. In other states, the AA-MAS ALDs and items reflected content knowledge and skills that were different from the general assessment. As the regulatory language refined state understanding of the need for the same content coverage as the general assessment, content differences have been minimized in most states' approaches.

Based on regulatory language (USED, 2007a) and guidance (USED, 2007b), the comparative status of the AA-MAS to the general assessment as the same content but different expectations of mastery should be reflected in the language of each test’s ALDs. That is, the ALDs of the two tests should be comparable in terms of content coverage by grade but reflect less challenging attainment of the content for similar performance levels, such as proficiency on the general assessment in comparison to proficiency on the AA-MAS.

Less challenging achievement standards may be defined in one or more of several ways by varying several conditions. For example, Perie (2009b) suggests that the descriptors can vary in these ways: (1) reducing the cognitive complexity of the required skill, (2) decreasing the number of elements required, or (3) adding appropriate supports and scaffolds to the description of the knowledge and skills required. Further, she suggests that some combination of the options can be used:

In practice, those drafting the modified achievement level descriptors could choose to adopt more than one of these strategies. That is, they could choose to reduce the depth of knowledge required for proficiency on some of the skills, add scaffolds to the statements about other skills, and provide specific examples to others indicating that the student is required to perform a narrower range of tasks than what is required in the grade-level achievement standards. (pp. 244-245)

ALDs are not always developed prior to test development. Measurement experts disagree on whether they should be drafted to guide test development or determined statistically later by difficulty of items and cut scores (Perie, 2009b). For these initial states, whether they developed them first or statistically after the fact, there should be a noticeable logic underlying the content differences if the test is to achieve the apparent intent of the regulations.

Because the “proficient” level has primary importance in current standards-based accountability designs, ALDs describing the proficient level are arguably the most promising place to detect the underlying differences and assumptions between general and modified ALDs. Thus, we have limited our analysis to comparing ALDs at the “proficient” level in development of the following tools and procedures. By comparing and contrasting how states describe “proficiency” for the general assessment and the AA-MAS, we were able to identify patterns of variation between them, and assign category names to the patterns for easier analysis. We also identified procedures to make the comparisons more efficient and visible. These categories and procedures were formatted into analysis worksheets and were field-tested on the initial state examples. Practitioners, researchers, and other interested stakeholders can use these tools, namely the category names and procedures, in developing new ALDs or evaluating existing ALDs.


Methods Used to Develop the Tools

Collection of achievement level descriptors from state Web sites was completed in early 2009. The collection included only those states that had both general and AA-MAS ALDs for the proficient level available online for reading and math, at grades 4, 8, and 10. This process resulted in ALDs from four states which were then used to develop and test the tools. Appendix A provides side-by-side ALD texts taken from the full document versions of ALDs posted online for each state.

Category Names for Comparing and Contrasting ALDs

In this report, we demonstrate processes and tools to help build a defense of state choices for AA-MAS. We compare and contrast ALDs for the general assessments and the AA-MAS. We have not included a comparison of each state’s content standards, and have tried to avoid the use of terms associated with each of the most widely used alignment methodologies. Although the ALDs reflect the content standards and are often considered in alignment studies, the terms used in alignment methodologies have specific and complex meanings that are inherent to each of the approaches (Porter & Smithson, 2002; Rothman, Slattery, Vranek, & Resnick, 2002; Wakeman, Flowers, & Browder, 2007; Webb, 1999).

Instead, we used more generic terms that can be tailored to a specific setting, as appropriate, as test developers or policymakers work to improve the quality of their ALDs. For example, rather than using terms like “cognitive complexity” or “depth of knowledge,” we used categories of “content” (what), “application” (how), and “degree” (how well). Rather than using a term like “scaffolding,” we chose the general category of “context” (under what conditions). These categories and their definitions are shown in Table 1.

Researchers or practitioners who use this approach to compare and contrast ALDs on specific assessments can refine these coding categories consistent with the terminology used in test development and alignment studies in their state. For example, as state staff or facilitators tailor the tools to state use, they could add terms or clarifications to a category, such as “frequency” or “how often or consistently” in the definition of degree. This comparative analysis tool is simply a tool, and can be amended to better match existing policy and practice choices.

Table 1. Categories Used for Comparing and Contrasting ALDs in Tool Development

Content: What is to be known by the student.
Application: How the student uses the content.
Degree: How well or how much is to be known by the student.
Context: Under what conditions the student demonstrates the content.

To test our categories, two project researchers coded all achievement descriptors for each state’s general assessment and AA-MAS. After they independently coded text for the proficient levels, the results were compared and any disagreements were discussed and resolved. Remaining questions or discrepancies were brought to a third project staff person for resolution. There were relatively few areas for resolution, and in all cases the resolutions were recorded as decision rules. See Appendix B for decision rules developed during the process of applying the coding categories, along with other questions and issues identified by research staff. When the tool is used by states, similar notes on decision rules, questions, and issues should be identified to flag areas for further discussion and clarification.
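
The double-coding step described above can be sketched in code. The following is a minimal illustration in Python, not the procedure the project staff actually used. The `Category` enum mirrors Table 1; the function name `compare_codings`, the segment labels, and the sample codings are hypothetical and are not data from the study.

```python
from enum import Enum


class Category(Enum):
    """The four coding categories from Table 1."""
    CONTENT = "what is to be known"
    APPLICATION = "how the student uses the content"
    DEGREE = "how well or how much is to be known"
    CONTEXT = "under what conditions"


def compare_codings(coder_a, coder_b):
    """Split segments into those both coders coded identically and those they did not.

    Each argument maps a segment label to the set of categories one coder assigned.
    """
    agreed, disputed = {}, {}
    for segment in sorted(set(coder_a) | set(coder_b)):
        codes_a = coder_a.get(segment, set())
        codes_b = coder_b.get(segment, set())
        if codes_a == codes_b:
            agreed[segment] = codes_a
        else:
            disputed[segment] = (codes_a, codes_b)
    return agreed, disputed


# Hypothetical codings for illustration only; not data from the study.
coder_a = {
    "segment 1": {Category.CONTENT},
    "segment 2": {Category.DEGREE, Category.CONTEXT},
}
coder_b = {
    "segment 1": {Category.CONTENT},
    "segment 2": {Category.DEGREE},
}

agreed, disputed = compare_codings(coder_a, coder_b)
for segment, (codes_a, codes_b) in disputed.items():
    names_a = sorted(c.name for c in codes_a)
    names_b = sorted(c.name for c in codes_b)
    print(f"{segment}: coder A {names_a}, coder B {names_b} -> discuss")
```

Segments that land in `disputed` correspond to the disagreements that coders would discuss, refer to a third reviewer if needed, and record as decision rules.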

After the initial coding and resolution was completed, the preliminary comparisons were presented to members of a project expert panel (measurement, content, and special education experts) for validation of the process. The expert panel indicated that the categories for coding could be helpful to the field, and endorsed the procedures as useful for both researchers and for practitioners.

Coding Category Examples from State ALDs for General Assessments and AA-MAS

When coding differences in ALDs, project staff looked at the sets of ALDs side by side, as shown in Table 2. Staff members then determined whether each difference was a content difference, an application difference, a degree difference, a context difference, or multiple differences. Full texts are provided in Appendix A, first in original form and then in coded form. Appendix B provides additional information on how decisions were made for coding. Examples of each type of difference are presented in Table 2 in bold within the listed descriptors. The difference categories are more fully described in Tables 3 through 7. Only one example of each coding category is shown in Table 2; others were identified in the actual analyses.

Table 2. Examples of Difference Categories in Original Text Samples for the General

Note: Bolded words indicate a substantive difference.

An example of a content difference is presented in Table 3. Content difference is defined as “what is to be known by the student.” These texts were coded as a content difference because the general ALD mentions that the student will be able to read for meaning and detail as well as have an adequate math vocabulary and the AA-MAS ALD only mentions having an adequate math vocabulary.

Table 3. Coding Example: Content Difference in ALDs for the General Assessment and AA-MAS Grade 8 Mathematics at “Meets Standard” Level for State 1

General ALD	AA-MAS ALD
Can read for meaning and detail and have an adequate math vocabulary	Have an adequate math vocabulary
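
A comparison like the one in Table 3 can also be surfaced mechanically before a human coder assigns a category. The sketch below is an illustration rather than part of the authors' method. It uses Python's standard `difflib` module to list the words that appear in the general ALD but not in the AA-MAS ALD. The function name `words_dropped` is hypothetical, and deciding that the result is a content difference (rather than an application, degree, or context difference) remains a human judgment.

```python
import difflib


def words_dropped(general_ald, modified_ald):
    """Return word runs present in the general ALD but absent from the modified ALD."""
    general_words = general_ald.split()
    modified_words = modified_ald.split()
    # Compare case-insensitively so a capitalized first word does not count as a change.
    matcher = difflib.SequenceMatcher(
        None,
        [w.lower() for w in general_words],
        [w.lower() for w in modified_words],
        autojunk=False,
    )
    dropped = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean general-ALD words with no counterpart.
        if tag in ("delete", "replace"):
            dropped.append(" ".join(general_words[i1:i2]))
    return dropped


# The two descriptor texts from Table 3.
general = "Can read for meaning and detail and have an adequate math vocabulary"
modified = "Have an adequate math vocabulary"
print(words_dropped(general, modified))
```

For the Table 3 texts, the function isolates the reading-related phrase that the AA-MAS descriptor omits, which is the span the coders flagged as a content difference.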