Accessible Reading Assessments for Students with Disabilities: The Role of Cognitive, Grammatical, Lexical, and Textual/Visual Features
All rights reserved. Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:
Abedi, J., Leon, S., Kao, J., Bayley, R., Ewers, N., Herman, J., & Mundhenk, K. (2010). Accessible reading assessments for students with disabilities: The role of cognitive, grammatical, lexical, and textual/visual features. Minneapolis, MN: University of Minnesota, Partnership for Accessible Reading Assessment.
For the full report, please refer to the PDF formatted version.
The purpose of this study is to examine the characteristics of reading test items that may differentially impede the performance of students with disabilities. By examining the relationship between selected item features and performance, the study seeks to inform strategies for increasing the accessibility of reading assessments for these students. Including students with disabilities in large-scale, statewide assessment and accountability systems, as mandated by the Individuals with Disabilities Education Act (IDEA, 2004) and the “No Child Left Behind” (NCLB) Act of 2001 (NCLB, 2002), can help identify issues and guide instruction to improve education for this population.
Research on reading complexity for students has focused primarily on the role of vocabulary and sentence length, and has also touched on issues of legibility such as format, typeface, and visuals. Although readability measures are widely used and useful for matching students’ reading levels with appropriate text, they do not identify the precise grammatical and cognitive components within sentences, paragraphs, or passages that may contribute to complexity for students with disabilities. While current research addresses the critical need to accurately assess the reading performance of students with disabilities, a void remains in operationalizing reading complexity.
With the features selected for the present study (cognitive, grammatical, lexical, and textual/visual), we build on previous research by exploring how these features may cause reading test items to function differentially for students with disabilities. Thus, the following research questions guided the analyses and reporting of this study:
To investigate our research questions, we evaluated current English language arts standardized assessments from three states to determine their cognitive, lexical, grammatical, and textual/visual complexity using differential item functioning (DIF) and discriminant analysis.
Population and Sample
The population for this study is students in grade 8 in three states. Because the states were not selected randomly, the level of generalizability of this sample to the population is limited and the results should be interpreted with caution when generalizing to the entire grade 8 student population.
The present study analyzed nine assessment forms from grade 8 reading assessments in three states, for a total of 490 reading test items. Active reading assessments and student data from the three states were obtained with permission; the states are referred to as State A, B, and C to preserve anonymity. The State A assessments (from 2006 and 2008) consisted mostly of multiple-choice items, with a few extended-response items (usually only one per passage). These extended-response items were excluded from the present study because student data were not available for them. The State A assessments also contained field-test items, which were excluded from the analyses for the same reason. The State B assessments (from 2006, with four forms) consisted of multiple-choice items only. The State C assessments (from 2006, 2007, and 2008) were combined reading and writing assessments: some sections consisted of a mixture of reading and writing items, while other sections assessed reading only, and passages contained a blend of multiple-choice and extended-response items. Items that strictly measured writing standards were excluded from the present study; however, some items with overlapping standards were retained.
Rating Guidelines Development
The development of a rubric used to evaluate test items and reading passages began with discussions about features that could interfere with the ability of students with disabilities to access content in reading assessments. A review of the literature and consultation with experts in the field resulted in our selection of five features that we used to capture accessibility for the purposes of this study:
The National Center on Educational Outcomes (NCEO) at the University of Minnesota provided “Considerations for Universally Designed Assessment Items” (Thompson, Johnstone, Anderson, & Miller, 2005). Based on literature reviewed in this report, and following consultations with experts in linguistics, we arrived at six grammatical features:
The grammatical features were reduced to six to capture grammar usage efficiently. We also included counts of lexical features, along with ratings of format and visual clarity. The lexical features included the total number of words and the total number of unique words, which were used to compute lexical density. To capture difficult vocabulary, words of seven letters or more were also counted (adapted from Shaftel et al., 2006), and a corpus of common words (Bauman & Culligan, 1995) was used to count uncommon words and words of seven or more letters (lexical A features). Two categories, lexical A and lexical B, were created to distinguish features that are more or less likely to impact the construct being measured. Based on expert opinion and the consensus of the research team, it was decided that changes to lexical A features would likely have a less serious impact on the construct tested than changes to lexical B features.
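The lexical counts described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the tokenizer, the seven-letter threshold, and the tiny common-word set are simplified stand-ins for the actual corpus (Bauman & Culligan, 1995).

```python
import re

def lexical_features(text, common_words):
    """Compute simple lexical counts of the kind described above.

    `common_words` stands in for a common-word corpus such as
    Bauman & Culligan (1995); here it is just a small set.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    unique = set(words)
    return {
        "total_words": len(words),
        "unique_words": len(unique),
        # lexical density: unique words as a share of total words
        "lexical_density": len(unique) / len(words) if words else 0.0,
        # "difficult vocabulary": words of seven or more letters
        "long_words": sum(1 for w in words if len(w) >= 7),
        # words absent from the common-word corpus
        "uncommon_words": sum(1 for w in words if w not in common_words),
    }

# Hypothetical common-word set and sentence, purely for illustration
common = {"the", "students", "with", "read", "a", "passage", "and"}
feats = lexical_features("The students read a challenging passage", common)
```

In a full analysis, these counts would be computed per item and per relevant paragraph, and lexical density would be averaged per page as described below.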
Cognitive and grammatical categories were rated by external raters. Each of thirteen raters was assigned to one of two groups: the Grammar Group (7 raters), responsible for rating the grammatical features, and the Cognitive Group (6 raters), responsible for rating the cognitive and textual/visual features. Raters from the applied linguistics department or with backgrounds teaching English as a Second Language were assigned to the Grammar Group; all other raters were assigned to the Cognitive Group. In a one-day training session, raters were presented with an overview of the features and instructions on the rubric, and then given released assessment items for practice. Items were first rated as a group, then discussed, then rated individually, and then discussed again.
All passages, paragraphs, visuals, and items were assigned unique ID numbers to facilitate data entry and analyses. All paragraphs and visuals were numbered so that raters could list the paragraphs and visuals necessary to answer an item. After the passages were numbered 1 to 71, a random number generator was used to distribute them randomly across the raters in each group.
To answer our five research questions, the reading assessments from three states were rated on 21 accessibility features in five general categories: (1) cognitive complexity, (2) textual/visual complexity, (3) lexical A complexity, (4) lexical density B complexity, and (5) grammatical complexity. The cognitive complexity category included measures of passage and item types, depth of knowledge, and scope. The textual/visual complexity category included column count, number of pages, words per page, number of typeface changes, number of point size changes, number of font changes, and number of unnecessary visuals. The lexical A complexity category included a count of the number of words greater than seven letters in items and paragraphs, the number of relevant paragraphs, and the number of words in items and relevant paragraphs. The lexical density B complexity category included the average lexical density (total unique words per page/total words per page), and the number of uncommon words in items and relevant paragraphs. The grammatical complexity category included counts of the number of subordinate clauses, complex verbs, passive voice verbs, relative clauses, entities, and noun phrases.
Two different approaches were employed for analyzing the data: (1) a Multiple Discriminant (MD) approach and (2) a Differential Item Functioning (DIF) approach. In the MD approach, we examined the impact of the accessibility features on students with and without disabilities across the entire test. In the DIF approach, Differential Bundle Functioning (DBF) and Differential Test Functioning (DTF) analyses were applied to examine the impact of the accessibility features on the entire test as well as on individual test items and on groups of test items (bundles) that share specific accessibility features.
Differential Level of Impact of Accessibility Features on Reading Assessments for Students with Disabilities: Results from a Multiple Discriminant Analysis
A multiple discriminant function provides a direct approach to comparing the impact of the 21 accessibility features on the performance of students with disabilities (SWDs) and non-SWDs. Data from the three states were used for this study. A data file was created in which a student’s incorrect response (0 score) on each test item was replaced with the rating of each corresponding feature. A student’s total score for a particular feature was therefore the sum of that feature’s ratings across the items the student answered incorrectly. As a result, 21 scores were created, one for each accessibility feature. For example, using feature #2, item type: if a student answered test item 1 incorrectly and item 1 had an item-type rating of 4, then the student’s incorrect score on item 1 would be 4. A similar procedure was used to create the other feature scores; thus the units of analysis in this study were individual students, not test items.
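This scoring scheme can be sketched in a few lines; the item responses and ratings below are made up for illustration and do not come from the study data.

```python
def feature_score(responses, ratings):
    """Sum a feature's item ratings over the items a student missed.

    responses: list of 0/1 item scores (0 = incorrect)
    ratings:   the accessibility-feature rating of each item
    """
    return sum(r for resp, r in zip(responses, ratings) if resp == 0)

# Hypothetical student: misses items 1 and 3 (0-scored), whose
# item-type ratings are 4 and 2, so the feature score is 4 + 2 = 6.
responses = [0, 1, 0, 1]
item_type_ratings = [4, 3, 2, 5]
score = feature_score(responses, item_type_ratings)
```

Repeating this over all 21 features yields one vector of feature scores per student, which then serve as inputs to the discriminant analysis.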
Results of the discriminant analyses suggest that: (1) some of the accessibility features, such as textual features, have more impact on reading than other features, and (2) some of these features have greater power than others to differentiate between students with and without disabilities.
Results from Differential Item Functioning Using the Non-Compensatory Differential Item Functioning (NCDIF) Index and Logistic Regression
Results from the discriminant analyses indicated that some of the accessibility features have more impact on student outcomes, particularly for students with disabilities. These results can be interpreted at the total test level. However, we also wanted to know whether some test items were more affected by these features than others. We therefore conducted a series of Differential Item Functioning (DIF) and Differential Test Functioning (DTF) analyses. A multiple regression approach was then applied to examine the relationship between each of the complexity features and the signed uniform DIF findings. In our first set of analyses we examined the relationship of each individual feature to the signed uniform DIF findings. Next, we constructed a more comprehensive model that included measures from each of our complexity categories. These analyses were conducted at the item level across all nine reading assessments. The data were split into three strata representing items with a low percentage range above guessing (PRAG) (0-11), moderate PRAG (12-29), and high PRAG (30 or above). As anticipated, there was a strong correlation between item PRAG and the signed uniform DIF results (r = -0.762). The majority of items indicating DIF against SWDs had PRAG values over 30.
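A common logistic-regression formulation of uniform DIF models the probability of a correct item response from a matching score (here, an ability proxy) and a group indicator; the group coefficient is the signed uniform DIF estimate. The sketch below fits such a model by plain gradient descent on synthetic data. All numbers are fabricated for illustration, and a production analysis would use an established DIF routine rather than this hand-rolled fit.

```python
import math, random

def fit_logistic(X, y, lr=0.1, iters=1500):
    """Fit logistic regression by batch gradient descent.
    X: list of feature rows (no intercept column); y: 0/1 outcomes."""
    n, p = len(X), len(X[0]) + 1          # +1 for the intercept
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            row = [1.0] + xi
            z = sum(b * x for b, x in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-z))
            for j, x in enumerate(row):
                grad[j] += (yi - mu) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic data: ability theta, group = 1 for SWDs, and a true
# group effect of -0.8 (the item is harder for SWDs at equal ability).
random.seed(0)
X, y = [], []
for _ in range(600):
    theta = random.uniform(-2, 2)
    group = random.randint(0, 1)
    p_correct = 1.0 / (1.0 + math.exp(-(1.0 * theta - 0.8 * group)))
    X.append([theta, group])
    y.append(1 if random.random() < p_correct else 0)

b0, b_theta, b_group = fit_logistic(X, y)
# b_group estimates the signed uniform DIF; negative means DIF against SWDs.
```

Item-level signed DIF estimates of this kind are what the regression analyses above relate to the 21 complexity features within each PRAG stratum.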
Of the 21 features modeled, 15 made significant contributions among the high PRAG items, while only one feature had a significant r-square change within both the low and moderate PRAG items. Each of the 15 significant features in the high PRAG items had model coefficients in the expected direction. The strongest individual cognitive feature was depth of knowledge. Among the grammatical features, complex verbs and subordinate clauses made the largest contributions. Lexical density at the passage level and words longer than seven letters appearing in items and their relevant paragraphs were also each strongly related to the DIF findings. Finally, a number of passage-level textual/visual features were also significantly related to the DIF findings; among those, the strongest were point size changes and font changes, along with the number of unnecessary visuals.
We used a multivariate approach to examine whether unique contributions to DIF were present across the five complexity categories and multiple features. Latent variables were created: GRAMMAR (a combination of complex verbs and subordinate clauses), LEXICAL (lexical density at the passage level and words longer than seven letters appearing in items and their relevant paragraphs), TEXTVIS (unnecessary visuals, point size changes, and font changes), and lexical_item (lexical density at the item-stem level), along with the individual predictors depth of knowledge and scope.
Among the six complexity variables examined, five had negative coefficients, indicating increased DIF against SWDs at higher values of the features. Thus, as the values of GRAMMAR, depth of knowledge, TEXTVIS, and lexical density (LEXICAL and lexical_item) increase, an item in the high PRAG category is more likely to exhibit DIF against SWDs. The scope variable had a positive coefficient, a result that is not consistent with the rest of our findings. The TEXTVIS and scope measures were the two features that made the largest contributions toward explaining the variation in the DIF outcome.
According to the literature cited in this report, students with disabilities perform substantially lower on standardized tests than students with no identified disabilities in both state (Abedi, Leon, & Mirocha, 2003; Altman, Thurlow, & Vang, 2009; Ysseldyke et al., 1998) and national assessments (Lee, Grigg, & Donahue, 2007). While part of this low performance may be explained by a student’s specific disability or a student’s lack of access to the general education curriculum, a major part of it may be attributed to the limitations of existing state assessments in addressing the needs of these students. That is, a part of the performance difference between students with disabilities and their peers without disabilities may be explained by accessibility issues. Current state assessments may not be sensitive enough to the needs and backgrounds of students with disabilities.
Based on a review of the existing literature and consultations with experts in the field, we identified 21 accessibility features that could have a major impact on the assessment outcomes of students with disabilities. These 21 features were grouped into the following five major categories: (1) cognitive features, (2) lexical A features, (3) lexical density B features, (4) textual and visual features, and (5) grammatical features. The grouping of the 21 features into five main categories appears to be conceptually and analytically sound: experts confirmed the categorization, and factor analyses of the features within each category yielded strong evidence of internal consistency.
Two different analytical approaches were used in this study: a differential item functioning (DIF) approach and a discriminant analysis approach. In the DIF approach, using DTF and differential bundle functioning (DBF) methods, sets of test items representing a particular accessibility feature were compared across groups formed by students’ disability status. Groups of accessibility features that behaved differentially across the two groups were identified, and the level of their impact on student reading performance by disability status was examined in a multiple regression model. In the discriminant analysis model, the latent scores of the five overall accessibility features were used as discriminating variables to identify the features that best discriminate between the two groups.
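For two groups, the discriminant direction can be illustrated with Fisher's linear discriminant, w = Sw⁻¹(m₁ − m₂), where m₁ and m₂ are the group mean vectors and Sw is the pooled within-group scatter matrix. The tiny two-feature data set below is fabricated purely for illustration; the study itself used latent scores on all five feature categories.

```python
def fisher_discriminant(group1, group2):
    """Fisher's linear discriminant for two groups of 2-D points:
    w = Sw^{-1} (m1 - m2), with Sw the pooled within-group scatter."""
    def mean(g):
        n = len(g)
        return [sum(p[0] for p in g) / n, sum(p[1] for p in g) / n]

    def scatter(g, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in g:
            dx, dy = x - m[0], y - m[1]
            s[0][0] += dx * dx; s[0][1] += dx * dy
            s[1][0] += dy * dx; s[1][1] += dy * dy
        return s

    m1, m2 = mean(group1), mean(group2)
    s1, s2 = scatter(group1, m1), scatter(group2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]   # 2x2 inverse
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]

# Fabricated feature scores (e.g., textual and grammatical) for the
# two groups; higher scores mean more missed high-complexity items.
swd     = [(4.0, 3.5), (5.0, 4.0), (4.5, 5.0), (5.5, 4.5)]
non_swd = [(2.0, 2.5), (3.0, 2.0), (2.5, 3.0), (1.5, 2.5)]
w = fisher_discriminant(swd, non_swd)
```

Projecting the feature scores onto w separates the two groups as much as possible relative to within-group spread; the weights indicate which features carry the most discriminating power.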
Results of the two analyses consistently suggested that: (1) some of the accessibility features had more impact on reading than other features, and (2) some of these features had greater power than others to differentiate between students with and without disabilities.
Identifying features with the highest level of impact on the performance of SWDs has major implications for the assessment of these students, particularly when the features could be easily altered without changing the construct to be measured. There are many factors that affect student performance on assessments, and some of these are essential components of the measures, such as the content and construct being measured. These factors cannot be altered because such changes might alter the construct being measured. However, some of these factors, such as textual/visual features, are incidental to the assessment and can be altered without having a major impact on the outcome of measurement. Another category, lexical A features, may provide an opportunity to reduce complexity without changing the construct being tested. For example, students with disabilities may find crowded test pages difficult and may experience fatigue and frustration when answering items in this format. Changing the test to include better readability for students does nothing to alter the construct, yet may significantly increase the performance of students with disabilities on such assessment items.
In summary, the results of this study can help the assessment community in two ways. First, by elaborating on some test accessibility features, this report may serve as a guideline for those who are involved in test development and the instruction and assessment of students with disabilities. Second, and more importantly, this report provides methodology for examining other features that may have a major impact on assessment outcomes for students with disabilities.