NCEO Policy Directions

Published by the National Center on Educational Outcomes
Number 17 / November 2003

Rethinking Basic Assumptions of Test Development: Assessment Frameworks for Inclusive Accountability Tests

Prepared by Rachel Quenemoen and Scott Marion

Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Quenemoen, R., & Marion, S. (2003). Rethinking basic assumptions of test development: Assessment frameworks for inclusive accountability tests (Policy Directions No. 17). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://cehd.umn.edu/NCEO/OnlinePubs/Policy17.htm

Background

Under the No Child Left Behind Act of 2001 (NCLB), states are required to assess the achievement of all students on state-defined grade-level content standards. They are to use the results to hold schools, and the state, accountable for all students achieving state-defined proficiency on grade-level content by the year 2014. Schools and the state are to show "adequate yearly progress" (AYP) toward reaching that goal in the meantime. Results from each year’s assessment of student achievement can be used to redesign instructional programs so that more students are successful each year.

If educators were asked today whether all students are achieving at the proficiency level expected by 2014, virtually no educator in the U.S. could say yes. Some would say that the goal is unrealistic; others would say that fundamental changes in instructional approaches, structures, and budgets have to occur to make the goal achievable.

Some educators see a need to improve assessments so that they inform instruction on grade-level content. These educators are calling for assessments based on a limited number of clearly defined and important constructs that reflect what all students should know and be able to do at grade level, and that are constructed to yield unambiguous evidence of whether groups or classrooms of students have mastered those constructs at grade level. They are rethinking common assumptions about the entire test development process, from the definition of the constructs to be tested, through development, to implementation.

This requires a transparent discussion of how to put into practice the new assumptions about teaching and learning that have come out of standards-based reform, so that assessments can be designed to meet the accountability requirements of NCLB yet give better information about the performance of students within the prescribed content. Compatible classroom assessments can provide additional diagnostic information for all students. Clarification and articulation of content standards (that is, precisely defined constructs) are the foundation of good assessments, because they help maximize the measurement of achievement rather than the effects of disability, cultural background, or socioeconomic status.

As an example, the Commission on Instructionally Supportive Assessment (2002) identified nine requirements for assessment development (see Table 1). The Commission believed that these requirements, if implemented, would create responsible assessments for the improvement of students’ learning.

Table 1. Instructionally Supportive Assessments

Purpose, Procedure, and Challenges

Purpose: To rethink test development procedures, starting from the alignment of content standards, instruction, constructs being assessed, and assessment. Instructionally supportive assessments are based on the belief that required accountability tests must be useful to educators concerned about improving the instruction of children.

Procedure: Starting from a very comprehensive approach, the Commission on Instructionally Supportive Assessment (2002) identified nine requirements for assessment development:

1. Prioritized content standards.
2. High-priority content standards described.
3. Standard-by-standard reporting for high-priority standards.
4. Optional classroom assessment procedures for other standards.
5. Breadth of curriculum monitored by the state.
6. Well-designed assessments with accommodations and alternate methods of assessment.
7. Minimum of 3 years to produce assessments.
8. Professional development on how to optimize children’s learning.
9. Evidence of appropriateness for accountability, assessment of standards, and enhancing instruction, and lack of negative consequences.

Specific reasons to support each of these requirements are provided. The requirements can be met either by drawing on the capabilities of state agencies or by issuing competitive requests for proposals to firms or individuals capable of carrying out one or more of the required activities.

Challenges: Instructionally supportive assessment involves rethinking the entire system, including the content standards, aligned curriculum, and rich and varied instruction, to ensure that educators know what to teach and that children are taught. It is easier to focus on narrow aspects of assessments.

Opportunities for Students with Disabilities

Dr. Jim Popham, chair of the Commission on Instructionally Supportive Assessment, summarized what he considers to be the implications of the assessment requirements of the No Child Left Behind Act for children with disabilities. This summary is presented in Table 2.

Table 2. Inclusive Assessment Requirements and Instructionally Supportive Assessments

Implications Identified by Jim Popham

1. Federal law now requires that essentially all American children, including those with disabilities, must complete state assessments (sometimes with disability-specific accommodations and alternate assessments) in seven grade levels.
2. Irrespective of whether one regards such assessments as educationally desirable, it is now a federal requirement.
3. As is often the case with federal or state education-related laws, such statutes should be implemented in a manner that is most educationally beneficial to the children affected by those laws.
4. In order for state assessments to be educationally beneficial, they must possess two assessment attributes, namely, they must (a) describe with clarity the skills and/or knowledge they assess and (b) provide results in a form so that teachers can identify which parts of their test-related instruction were effective or ineffective.
5. These two assessment attributes cannot be satisfied by standardized tests that are (a) traditionally constructed to permit norm-referenced interpretations, or (b) customized to provide standards-based interpretations about students’ mastery of enormous numbers of curricular outcomes.
6. Thus, to optimize the educational benefits for all children who must complete the new federally required achievement tests, including those children with disabilities, statewide tests intended to satisfy the new federal requirements must attempt to assess only a small number of extremely significant content standards.

(See NCEO teleconference materials at http://cehd.umn.edu/NCEO/Presentations/tele5.htm)

There are opportunities created when "extremely significant" content standards are identified, as explained in point 6 in Table 2. For example, universally designed assessments, which are assessments designed from the beginning to be accessible and valid for the widest range of students, are typically portrayed as one aspect of assessments that are designed to support instruction and accountability for all students. The identification of a small number of "extremely significant" content standards to be the focus of large-scale assessments in itself can be viewed as an aspect of a universally designed assessment. Benefits of merging these ideas for students with disabilities are shown in Table 3.

Table 3. Benefits of Instructionally Supportive Assessments for Students with Disabilities

Benefits

By identifying and clearly defining a reasonable number of content standards, or clusters of content, that are of high importance and challenging, it becomes easier for educators and IEP teams to understand what they need to hold inviolate as they consider and prioritize the focus of a student’s learning time.
Clear definitions of high importance content standards will also make it easier to see the link from grade to grade, so that students will not miss content needed for higher grades simply because educators and IEP teams did not know of its importance when the student was in an earlier grade.
When we have clear descriptions of the high importance content standards, it will be easier to describe what accommodations are appropriate in instruction and in assessment.
When the high importance content is identified, it should become clearer how to define and develop the needed research base for accommodations. That is because assessments of higher-level complex learning are less likely to focus on numerous little skills measured by items that may test the very skills that a disability blocks (e.g., decoding, even at upper grade levels). We may find that students with disabilities perform at higher levels when the assessments truly focus on higher-level knowledge and skills.

Opportunities and Implications for State Policymakers

As state policymakers grapple with the challenge of truly "rethinking" entrenched approaches to content standards and assessment, there are practical and political issues to consider. These include the "yeah, but…" scenarios heard by assessment directors in discussions among policymakers and practitioners across the country. Here are a few of the scenarios and responses related to the implications for students with disabilities of the NCLB assessment requirements.

Yeah, but how can we expect essentially all students, who may be instructed at very different levels, to participate in the same tests?

Response: Federal laws require not only assessments but also identical content expectations for all students (other than the small exception proposed for students with significant cognitive disabilities). NCLB requires specifically that all children be assessed on grade-level content and achievement standards. There are multiple solutions to this "yeah, but…" and they include revisiting how we design instruction and curriculum to ensure all students learn the challenging content from the earliest grade levels.

Clarification and articulation of extremely significant content standards is essential to ensuring a coherent, aligned, and focused teaching and learning system that is accessible to all learners. Even with shifts in how we ensure all children learn, we still have to address the adequacy of current assessment approaches. With the opportunities of instructionally supportive and universally designed assessments aligned to extremely significant constructs, many more students can meet the assessment targets.

Yeah, but if we hunker down, this too shall pass.

Response: The basic assumptions of standards-based reform that underpin the Federal laws – all students, meeting high standards, and accountability on the part of schools and states – have been in place for over a decade. These assumptions have broad bipartisan support, and have shown remarkable political and popular durability. Also, these assumptions are consistent with the predominant agenda of public educators at local and state levels, that is, of meaningful education with high expectations for all students. Looking for ways that we can use the law to further our agenda of improving results for all students, and improving the quality of our educational assessments to measure the results of instruction more accurately, is consistent with that agenda.

Yeah, but how can we change our content standards? Our content experts worry that "if it is not tested, it will not get taught." And we don’t want to inadvertently narrow or "dumb down" the curriculum.

Response: This is perhaps the biggest challenge. In many states, content experts and teachers worked over a period of years to negotiate the academic content standards. In that process, they tended to include more content, in part to avoid conflict and to satisfy multiple constituencies in an inherently politicized system. In many states, this may have resulted in a vast number of content standards or benchmarks, too many to assess in the time available for state-level testing and, perhaps, too many to teach and learn in the time available for instruction.

The pressures created by NCLB provide an opportunity to revisit this approach. Content experts and teachers will agree that schools and teachers should be held accountable for reasonable learning targets, and assessments designed to measure the targets should be reasonable and fair as well. As long as the state-defined content cannot be assessed in total on an assessment, and states have not defined and communicated the content to be assessed, assessment becomes a punishing variation of Russian roulette, with teachers and schools guessing at what will be assessed. No content experts or teachers support that approach, yet it may be the current reality.

One of the foundations for instructionally supportive accountability tests is that they focus on a modest number of assessment targets. If the accountability purpose of the test is to encourage teaching of important content, then those targets must be truly significant and communicated clearly to students, teachers, and the general citizenry.

The entire system of state content standards does not have to be redesigned. Instead, states can work to define an assessment framework that functions as a companion document to already approved content standards. This requires alignment justifications demonstrating how assessed skills and bodies of knowledge are derived from the state’s existing content standards. The identified content clusters and important constructs that are assessed will incorporate many of the smaller grain size content standards or benchmarks.

There is no magic way to do this, but there are some general principles that should be followed, based on an example used in one state.

The steps that state took can be summarized more generically as:

1. Raise awareness and develop consensus among various groups of the need to prioritize or rework content standards.
2. For each content area and grade, form groups to identify:
   - non-essential knowledge and skills;
   - unnecessary duplications of content across or within grade levels;
   - content that cannot be assessed through large-scale methods; and
   - content that can be logically clustered into larger constructs incorporating small grain content.
3. Share the resulting revisions across content areas and grades as a check on progression and importance.
4. Have people who are very familiar with the content area and people less familiar review the resulting content and progression to ensure that the resulting descriptions are considered important by people other than the writing team, and by those who are not content experts.
5. Develop an assessment framework based on the resulting content descriptions.

This process does not require new content standards, and specifically avoids identifying "easier" content standards. It instead requires identifying high import, challenging content clusters that incorporate essential bits of content and knowledge across the discipline and across grades. It should result in a limited number of challenging and important constructs that will serve as the framework for the accountability test.

The process could also be modified by using a version of Norman Webb’s Balance of Representation protocol. This is another tool that would push people to prioritize the required knowledge and skills for their content areas (see N. L. Webb’s Alignment of Science and Mathematics Standards... and Criteria for Alignment of Expectations... in the Resources).

Yeah, but the NCLB accountability requirements are so challenging that many of our schools will fail to meet AYP requirements within the first few years. This seems like an undoable challenge!

Response: We need to use the best research in teaching, learning, and assessment to help states design assessment systems that can promote student learning. As we organize fewer, high import, challenging content clusters to assess, we also provide a clearer "path" to the essential skills and knowledge. This will allow all teachers and all IEP teams, for the first time in many states, to understand how all the hundreds of "tiny grain size" standards work together to result in durable and important skills and concepts along the grade levels. Although NCLB accountability provisions may not allow for a great deal of innovation and tailoring, working to make the assessment instructionally sensitive and universally designed will make a big difference in ensuring that all schools, districts, and states are successful. Most important of all, it will make a difference in ensuring that all students are successful.

Resources

Alignment of Science and Mathematics Standards and Assessments in Four States (NISE Monograph No. 18). Webb, N.L. (1999). Published jointly by the National Institute for Science Education and the Council of Chief State School Officers.

Criteria for Alignment of Expectations and Assessments in Mathematics and Science Education (Monograph No. 6). Webb, N.L. Madison: Wisconsin Center for Education Research, Council of Chief State School Officers and National Institute for Science Education.

Building Tests to Support Instruction and Accountability: A Guide for Policymakers. Commission on Instructionally Supportive Assessment. (2001). AASA, NAESP, NASSP, NEA, NMSA. Available from the American Association of School Administrators Web site at http://aasa.org/issues_and_insights/assessment/Building_Tests.pdf. Also see companion piece on Illustrative Language for an RFP to Build Tests to Support Instruction and Accountability, at http://aasa.org/issues_and_insights/assessment/Illustrative_Language_for_an_RFP.pdf.

Crafting Curricular Aims for Instructionally Supportive Assessment, with Examples of Appropriate Curricular Aims. Popham, W. J., Farr, R., & Lindquist, M. (2003). Available from the National Center on Educational Outcomes at http://cehd.umn.edu/NCEO/Presentations/CraftingCurricula.pdf.

Materials for the January 27, 2003 NCEO teleconference: Part Two of "Building Tests to Support Instruction and Accountability for All Students." Popham, W. J., Thurlow, M.L., & Marion, S., Presenters. (2003). Available from the National Center on Educational Outcomes at http://cehd.umn.edu/NCEO/Presentations/tele5.htm.

Universal Design Applied to Large-Scale Assessments (Synthesis Report 44). Thompson, S.J., Johnstone, C. J., & Thurlow, M. L. (2002). Minneapolis, MN: University of Minnesota, available from the National Center on Educational Outcomes at http://cehd.umn.edu/NCEO/OnlinePubs/Synthesis44.html.

Universally Designed Assessments: Better Tests for Everyone! (Policy Directions 14). Thompson, S., & Thurlow, M. (2002). Minneapolis, MN: University of Minnesota, available from the National Center on Educational Outcomes at http://cehd.umn.edu/NCEO/OnlinePubs/Policy14.htm.
