NCEO Report 414

A Review of the Literature on Computerized Speech-to-Text Accommodations

Kristin K. Liu, Martha L. Thurlow, Anastasia M. Press, and Michael J. Dosedel

August 2019

All rights reserved. Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Liu, K. K., Thurlow, M. L., Press, A. M., & Dosedel, M. J. (2019). A review of the literature on computerized speech-to-text accommodations (NCEO Report 414). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.


Executive Summary

As computerized speech-to-text (STT) technology has become more advanced over the past several years, more students with, and without, disabilities are using STT tools in the classroom and while taking assessments (Warren, Thurlow, Lazarus, & Strunk, 2018). Speech-to-text tools often are installed on school-provided computers or tablets and thus may be widely available to students for instructional use.

This literature review describes what research conducted between 2008 and 2018 tells the field about the use of STT tools by K-12 and post-secondary students with disabilities. First, it highlights what the available literature tells us about the characteristics of students who used STT for instruction and assessment, and the methodologies and outcomes variables associated with those studies. Second, it describes the implementation of STT tools (e.g., training in use of STT, student attitudes toward the tools, and comparison of different types of tools). Third, it describes the effect of the technology on academic outcomes for students with different types of disabilities.

To locate articles we searched the National Center on Educational Outcomes (NCEO) online Accommodations Bibliography (see https://nceo.info/Resources/bibliographies), as well as several other online databases. We then conducted a hand search of relevant journals. After reviewing article abstracts and applying a set of inclusion criteria, we determined that five articles met the criteria. None of the five articles examined use of STT in assessment settings, nor did any involve post-secondary students with disabilities. Thus, all studies examined use of STT in instructional settings by K-12 students. Finally, we coded the five articles for basic study information. All of the studies used a single-subject research design (e.g., alternating treatments, repeated acquisition, repeated measures), so we separated articles based on whether they measured learning outcomes related to writing or math. Four articles examined students’ writing outcomes, and one article examined students’ math outcomes. Studies addressed two types of STT technology: the use of Dragon NaturallySpeaking (DNS) software for writing instruction and the voice input, speech output (VISO) calculator for math instruction.

To aid our review, we summarized each study’s inclusion of certain defining features of single-subject research discussed in Horner, Carr, Halle, McGee, Odom, and Wolery (2005). Defining features that we selected from that article included the incorporation of dependent variables, independent variables, validity measures, reliability measures, description of the participants, and description of the study setting. We selected these features because they can be objectively measured, whereas other features would have relied on our subjective interpretations of the studies.

All five studies included the following features:

Only some of the studies included these features:

Overall study findings, although not consistent across studies, included increased student independence when using STT tools, generally favorable student impressions of the tools, opportunities for immediate feedback on performance, and the low cost of the STT tools compared to other accommodations. Results specifically related to the quantity and quality of writing when compared with writing via other methods (e.g., keyboard, handwriting, or voice recorder) were mixed. Some of the study findings showed STT use increased length and complexity of writing, but also decreased accuracy. However, these findings are preliminary at best.

Studies identified limitations related to a lack of student experience or training with STT technology, small study samples, and selection of dependent variables. A major challenge discussed in these studies was the potential for inaccurate voice recognition by the technology, which sometimes led to decreases in students’ writing fluency and accuracy, as well as to increases in student frustration.

Research reviewed typically did not provide in-depth descriptions or comparisons of STT tools to allow for a complete understanding of the tool’s features or comparison between tools. In addition, variation in the DNS software versions used across studies added to the complexity of fully understanding the STT tools used. Further, the literature did not provide information on whether the use of STT tools changed the writing or math constructs that were taught.

A primary recommendation of this literature review is to conduct more research related to STT technology as an accommodation in both classroom and assessment settings. Given that it is an accommodation that is already in use, educators need research on post-secondary use of STT, and on STT in assessment settings to make informed decisions about the use of STT in the classroom. Until there is additional research, it is incumbent upon practitioners to use their best judgment when considering STT for students. Based on our findings, we recommend that students receive training in the use of their specific technology and have ample opportunities to use it in classroom settings before considering use in a testing setting. Selection of STT technology should be an individual decision based on evaluation of student needs, technology features, supports, social validity, cost, and capacity to integrate with other technology.

At the state level, support can be provided to local districts and educational teams by maintaining information on technology that is compatible with large-scale assessments, and by sharing specific procedures for demonstrating a student’s need for STT. Further, policies and practices should be flexible enough to adjust to ever-changing technology. States may also consider practices such as: (a) describing the types and features of STT technology permitted as accommodations (and that work with testing technology), instead of listing specific STT devices or tools; (b) offering training and support materials for districts and teachers; and (c) tracking the use of STT and its effects to inform continuing policy.


Overview

As computerized speech-to-text (STT) technology has become more advanced over the past several years, more students with disabilities are using STT tools in the classroom and while taking assessments (Warren, Thurlow, Lazarus, & Strunk, 2018). Speech-to-text technology differs from other methods of transferring spoken words to text, such as scribing, in that no additional person is needed for a student’s words to be typed on a screen. Rather, students simply speak into a device with speech recognition capabilities, and an STT software program converts their speech into typed words on a computer monitor or device screen. Examples of STT software or devices include: Dragon NaturallySpeaking (DNS), IBM ViaVoice, and Windows Speech Recognition (WSR) in the Microsoft Operating System (Shadiev, Hwang, Chen, & Huang, 2014).

Speech-to-text tools often are installed on school-provided computers or tablets and thus may be widely available to students for instructional use regardless of whether the student has a disability. These types of programs and devices are capable of supporting a wide range of student needs in the classroom. For example, a student who has difficulty spelling can dictate texts that an STT program can transcribe. The technology may hold particular benefits for students with disabilities. For example, students with physical disabilities who have difficulty typing on a keyboard or moving a mouse can control a computer using their voice (Gardner, 2008; Garrett, Heller, Fowler, Alberto, Fredrick, & O’Rourke, 2011; Lee, 2011). Students who are blind or who have low vision may use STT to give commands to a computer, do math calculations, or create written texts more quickly and independently than with other methods (National Center for Technology Innovation, 2010). Speech-to-text may also help students with disabilities who have memory issues to overcome cognitive barriers posed by writing tasks. For example, the use of STT devices for writing tasks may reduce the amount of working memory a task requires by limiting the amount of mental effort spent transcribing thoughts as text (Lee, 2011). Use of STT to transcribe text may enable students to spend more time generating ideas or attending to issues of word choice and grammatical construction (Noakes, 2017).

During a national forum on speech-to-text and scribing, state education agency staff discussed issues related to students’ use of STT technology, including the importance of students with disabilities using the same STT software on assessments that they use in daily instruction (Warren et al., 2018). The majority of states (n=44) allow the use of STT technology as an acceptable accommodation for students with disabilities taking large-scale reading, writing, and math assessments (Lazarus & Strunk, 2018). State and consortia assessment policies often recommend that educators do not provide an assessment accommodation unless the student has also used the same accommodation in the classroom (e.g., Minnesota Department of Education, 2017, p. 77; Smarter Balanced Assessment Consortium, 2018, p. 23; Partnership for Assessment of Readiness for College and Careers [PARCC], 2017, p. 9). Introducing a student to an unfamiliar accommodation during an assessment can hinder the student’s performance because he or she is still learning how to use the accommodation while taking the test (Minnesota Department of Education, 2017).

Although there is no standard STT dictation or calculation software allowed by all states for all assessments, DNS is specifically designated for use as a writing test accommodation for students with disabilities taking either the Smarter Balanced tests or PARCC tests. Dragon NaturallySpeaking allows students with disabilities to transcribe their speech into text, as well as to format their writing. To use DNS, students speak into a microphone, and the software program transcribes the spoken words onto the screen, automatically adding punctuation, capitalizing the first word of each sentence, and capitalizing proper nouns (Garrett et al., 2011). The software also listens for verbal commands from students, which are used to format and interact with a document. For example, students can change the case of a title by stating, “Capitalize the next five words.” Other formatting capabilities of DNS include: (a) selecting text; (b) bolding, italicizing, striking out, and underlining text; (c) changing font size, style, and color; (d) changing line spacing; (e) changing text alignment; (f) moving text; (g) creating and deleting bullet points or list numbers; (h) adding footnotes, headers, and page numbers; and (i) setting margins. Dragon NaturallySpeaking also allows students to use commands to check spelling and grammar, search for and replace words, save documents, print documents, and move the text cursor to different lines or words (Nuance Communications, 2014).

States that allow the use of STT during assessments often require that the student have previous experience with the exact software that he or she will use on the test, and that the student knows how to correct the generated text. For example, Smarter Balanced indicates:

Speech-to-text software requires that the student go back through all generated text to correct errors in transcription, including use of writing conventions; thus, prior experience with this accommodation is essential. If students use their own assistive technology devices, all assessment content should be deleted from these devices after the test for security purposes. For many of these students, using voice recognition software is the only way to demonstrate their composition skills. Still, use of speech-to-text does require that students know writing conventions and that they have the review and editing skills required of students who enter text via the computer keyboard. It is important that students who use speech-to-text also be able to develop planning notes via speech-to-text, and to view what they produce while composing via speech-to-text. (Smarter Balanced Assessment Consortium, 2018, pp. 23-24)

We undertook this literature review on STT accommodations to meet three primary purposes. First, we wanted to identify the students, methodologies, and outcomes variables that have been used to study computerized STT accommodations over the past decade for K-12 and post-secondary students with disabilities during instruction and assessment. By reviewing these studies, we sought to highlight what the available literature tells us about the student, setting, and research study variables that are important in research on STT accommodations. Second, we intended to describe the implementation of STT tools including how students were trained to use STT accommodations, students’ attitudes toward the use of STT tools, and how STT tools compare to other tools. Finally, we wanted to describe the effect of the technology on outcomes in either writing or mathematics for students with disabilities.


Methods

The literature search involved three steps. First, we searched for research articles, reports, and dissertations that were published between January 2008 and June 2018 in the Accommodations Bibliography on the website of the National Center on Educational Outcomes (NCEO). This database is a compilation of peer-reviewed articles, dissertations, white papers, and reports written about accommodations (see https://nceo.info/Resources/bibliographies). We searched the database using the advanced search tool, entering each of the following keywords separately: dictated response, speech to text, speech recognition, voice recognition, scribing, scribe, dictation, and signed responses. In total, we identified 72 resources from NCEO’s Accommodations Bibliography.

Second, we searched several online databases to identify additional articles and dissertations published between January 2008 and June 2018: Education Source, ERIC, PsycINFO, and Linguistics and Language Behavior Abstracts (LLBA). For this second set of searches we used combinations of the following search terms: speech perception software, speech recognition, speech to text, voice recognition, assessment, special education, students with disabilities, post-secondary, and university. We chose search terms based on each database’s thesaurus of terms, and also examined the key terms listed in relevant articles. The database searches yielded 192 results, with some articles and dissertations appearing under more than one set of search terms.

Third, we verified the comprehensiveness of the two online searches by conducting a hand search of eight relevant journals: (a) Exceptional Children; (b) The Journal of Special Education; (c) Journal of Applied Testing Technology (JATT); (d) Journal of Research on Technology in Education; (e) Journal of Education for Students Placed at Risk (JESPAR); (f) Journal of Special Education Technology; (g) The Journal of Educational Research; and (h) Assistive Technology: The Official Journal of RESNA (Rehabilitation Engineering and Assistive Technology Society of North America). These journals either focused on students from special populations (e.g., students with disabilities) or on assistive technology (e.g., computerized STT). None of the hand searches of relevant journals yielded additional articles.

Screening and Coding

We reviewed and evaluated article abstracts according to the following inclusion criteria: (a) published in the U.S.; (b) addressed the use of a computerized (non-human) STT accommodation; (c) described STT technology use to convert student, rather than educator or peer, speech into text; (d) included empirical data from K-12 or post-secondary student populations; and (e) examined the effect of STT use and non-use on some type of assessment or learning outcomes.

Five articles met the inclusion criteria. None of these examined use of STT in assessment settings, nor did any involve post-secondary students with disabilities. Thus, all studies examined use of STT in instructional settings by K-12 students. We coded the five articles for basic study information such as author, title, year published, study design, population demographics, and research questions (see Appendix A for details). All of the studies used a single-subject research design, so we separated articles based on whether they measured learning outcomes related to writing or math (see Table 1). Four articles examined writing learning outcomes, and one article examined math learning outcomes.

Table 1. Focus and Content Areas in STT Studies

Study | Content Area | Research Approach
Bouck, Flanagan, Joshi, Sheikh, and Schleppenbach (2011) | Math | Single Subject
Garrett, Heller, Fowler, Alberto, Fredrick, and O’Rourke (2011) | Writing | Single Subject
Lee (2011) | Writing | Single Subject
McCollum, Nation, and Gunn (2014) | Writing | Single Subject
Noakes (2017) | Writing | Single Subject

Methodologies, Participants, and Outcomes Variables of Reviewed Studies

This section provides an overview of the methodologies we used to review the single-subject STT studies. We highlight how STT is used, as well as what the literature tells us about the student, setting, and research study variables that are included. We group studies by analysis method, then discuss details of each study’s methodology and participants. Last, we address the operationalization of the outcomes variables used in the studies.

To aid our review, we focused on certain defining features of single-subject research discussed in Horner, Carr, Halle, McGee, Odom, and Wolery (2005). Defining features that we selected from that article included dependent variables, independent variables, validity, reliability, participants, and setting. We selected these features because they can be objectively measured, whereas other features would have relied on our subjective interpretations of the studies. Appendix A provides a complete listing of each study’s features: content area; dependent variables; type of STT technology used; sample size; analysis method used; and participants’ ages, grades, and disability categories.

Study Designs and Features

Each of the five studies included in this review (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) used a single-subject research design. Horner et al. (2005) suggested that features of single-subject research contributing to the validity and reliability of the studies include: inter-observer agreement calculations, descriptions of study implementation, controls for common threats to internal validity, social validity of the intervention as perceived by STT users, demonstrations of experimental effects at different times, and replication of experimental effects. Horner et al. (2005) also suggested that single-subject studies should include a baseline phase with five or more data points. As shown in Table 2, not all studies in this review contain all of these features.

Table 2. Validity and Reliability of STT Studies Reviewed

Study Characteristics | Number of Studies
Included a baseline phase | 5
Replicated experimental effects across participants, settings, or materials to establish external validity | 5
Included at least three demonstrations of experimental effect at three different times | 4
Collected data on reliability or inter-observer agreement | 4
*Described study implementation management | 3
Controlled for common threats to internal validity | 3
*Mentioned social validity | 3

* We considered a feature to be present if the authors overtly discussed it.

As shown in Table 2, the studies were most likely to include a baseline phase and to replicate experimental effects across participants, settings, or materials. The features least often included (although still present in 60% of the studies) were descriptions of study implementation management, controls for common threats to internal validity, and mentions of social validity.

All five of the studies included a baseline phase. The number of baseline measurements ranged from one (McCollum et al., 2014) to 20 (Garrett et al., 2011). However, none of the studies stated the amount of time between measurements. In addition, all studies replicated the experimental effects across participants to establish external validity. For example, all five participants in Garrett et al.’s (2011) study demonstrated higher writing fluency when using speech-to-text tools (i.e., the experimental effect) compared to using word processing software.

The reviewed single-subject studies were less likely to address other aspects associated with validity and reliability. Four of the five studies reported calculations on inter-observer agreement for each dependent variable (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; Noakes, 2017). Inter-observer agreement ranged from 85% (Lee, 2011) to 100% (Bouck et al., 2011; Garrett et al., 2011; Noakes, 2017) across studies and across dependent variables. All four of these studies described their calculation methods for inter-observer agreement. Three of the five studies described how researchers managed study implementation using checklists (Garrett et al., 2011; Lee, 2011; Noakes, 2017). Checklists included procedural details for intervention steps (Garrett et al., 2011; Noakes, 2017), equipment set-up (Lee, 2011; Noakes, 2017), timing of intervention (Lee, 2011; Noakes, 2017), and directions or instructions given to participants (Lee, 2011; Noakes, 2017).
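This review does not reproduce the agreement formulas used in the individual studies. As a minimal sketch of one common approach, point-by-point agreement divides the number of items on which two observers agree by the total number of items scored. The function name and the sample scores below are hypothetical and are not drawn from any of the five studies.

```python
def percent_agreement(observer_a, observer_b):
    """Point-by-point agreement between two observers, as a percentage."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of items")
    if not observer_a:
        raise ValueError("At least one scored item is required")
    # Count the items on which both observers gave the same score.
    agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return 100.0 * agreements / len(observer_a)


# Hypothetical scores: 1 = word scored as correct, 0 = word scored as an error.
scores_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
scores_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(percent_agreement(scores_a, scores_b))  # 90.0
```

Other agreement indices (for example, ones that correct for chance agreement) would give different values for the same scores, so a reported percentage is only interpretable alongside the method used to compute it.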

Finally, three of the studies (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011) addressed the social validity of STT in some way. Each of these studies asked participating students about their perceptions of the STT software they were using, as well as their perceptions of any other methods they were using to complete study tasks. Student perceptions are discussed in more detail in a later section. In addition to collecting data on students’ perceptions, Lee (2011) collected data on parents’ and teachers’ perceptions of STT use.

Participants and Settings

When describing the participants and settings in a single-subject research study, Horner et al. (2005) suggested including information about the participants’ relevant identities (e.g., age, grade level, gender, and disability category), the participant selection criteria, and the setting of the testing area. As shown in Table 3, all studies included information about participants’ disability categories, but the studies varied in whether they included the other types of information. Only three studies provided information on the participants’ ages.

Table 3. Types of Information about Participants and Settings

Study Characteristics | Number of Studies
Disability | 5
Setting of the testing area | 4
Participant selection criteria | 4
Grade of participants | 4
Gender | 4
Age | 3

Four of the five studies provided information about the specific setting of the testing area (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; Noakes, 2017). Settings described included the geographic location of the school (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011), the time of day when sessions were held (Bouck et al., 2011; Garrett et al., 2011), the location of the room used for the study (Bouck et al., 2011; Garrett et al., 2011; Noakes, 2017), conditions in the room (e.g., quiet; Bouck et al., 2011; Lee, 2011; Noakes, 2017), and whether sessions were conducted individually or in groups (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; Noakes, 2017). Four of the five studies described the participant selection criteria used (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; McCollum et al., 2014). Selection criteria included:

Four studies (Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) described the grade levels of student participants. Three studies included students in high school, or grades 9-12 (Garrett et al., 2011; Noakes, 2017; McCollum et al., 2014). One study included students in middle school, or grades 6-8 (Noakes, 2017). Three studies included students in elementary school, or grades 2-5 (Lee, 2011; Noakes, 2017; McCollum et al., 2014). Four studies (Bouck et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) provided the gender of student participants. Three studies included male and female students (Bouck et al., 2011; Lee, 2011; McCollum et al., 2014), and one study included only male students (Noakes, 2017). Only three studies (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011) provided information on students’ ages. All students in Bouck et al.’s (2011) study were either 18 or 19 years old. Garrett et al.’s (2011) study included one student who was 18, two who were 17, and two who were 15. Last, all of the students in Lee’s (2011) study were 9 years old. Additional detail on study participants is provided in the study methodology section.

Study Designs

Alternating treatments. The single-subject studies reviewed most often employed an alternating treatments design. In this design, researchers conducted several sessions during which participants completed a task, alternating between using STT tools and a comparison tool. Students used only one tool per session. Three articles employed an alternating treatments design (Garrett et al., 2011; Lee, 2011; Noakes, 2017) to study the use of STT by students with disabilities in instructional settings. For example, Garrett et al. (2011) compared writing fluency and accuracy when using STT software and word processing software. Study participants were five high school students, ages 15-18, with physical disabilities (e.g., spina bifida and cerebral palsy). The study included a baseline phase, speech recognition training, and a writing intervention phase during which students participated in a timed writing activity using either STT technology or a word processor. Students used only one method of writing during each session of the writing intervention phase. The methods of writing were counterbalanced and randomized across sessions. The number of total sessions for each student ranged from 10 to 20, depending on how long it took for “a clear fractionation of the data” to emerge (Garrett et al., 2011, p. 31). Finally, students participated in a two-session replication phase using only STT software. Information on the amount of time between the intervention and replication phases was not provided.

Lee (2011) used an alternating treatments design to determine whether four students with unspecified learning disabilities in grade 4 (age 9) would produce narratives with higher fluency and higher quality when they used handwriting, STT software, or a digital voice recorder with no immediate visual feedback. Lee’s study contained a baseline phase in which students could only write by hand, a phase for students to learn how to use STT software, and an alternating treatment phase lasting an average of 15 sessions. Handwriting, STT software, and digital voice recorder conditions were randomly alternated between sessions. The number of total sessions for each student ranged from 23 to 24.

Finally, Noakes (2017) used the alternating treatments study design to compare the writing fluency and accuracy of three students with traumatic brain injury who had fine motor skills deficits. The study compared STT technology to student handwriting. Subjects were in grades 4, 8, and 9, and were 9, 14, and 15 years old. Students participated in an STT training session, followed by five sessions of narrative writing activities where they alternated use of STT technology and handwriting. The STT and handwriting conditions were counterbalanced to eliminate sequencing effects.
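The three studies above describe their conditions as randomized or counterbalanced across sessions, but this review does not report the exact procedures they used. As a minimal sketch of one possible approach, the conditions can be shuffled within blocks so that each condition appears once per block and therefore equally often overall. The function name, condition labels, block count, and seed below are all hypothetical.

```python
import random


def alternating_schedule(conditions, n_blocks, seed=None):
    """Build a session order in which each condition appears once per block.

    Order within each block is random, so conditions are balanced in
    frequency while their sequence varies from block to block.
    """
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    schedule = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


# Hypothetical example: two conditions, three blocks, one condition per session.
schedule = alternating_schedule(["STT", "handwriting"], n_blocks=3, seed=1)
print(schedule)
```

Block randomization is only one option. A fully random order, or a fixed counterbalanced sequence, would also fit the general description of an alternating treatments design.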

Repeated acquisition. One single-subject study (Bouck et al., 2011) incorporated a repeated acquisition study design. This design involves participants completing a series of researcher-developed tasks that all have the same difficulty and format, using two different interventions that are presented immediately after each other. Bouck et al. (2011) examined the use of STT in math for three high school students. They calculated the average number of attempts and the time it took students to solve the math problems with and without the VISO calculator. They developed sets of math problems with the same difficulty and format for students to solve using the VISO calculator and using the students’ typical means of math calculation. Bouck et al. focused on students’ solution processes for math problems rather than correctness to determine the usability of the VISO calculator. The study included an STT training session, followed by 10 sessions during which students used both interventions, the order of which was randomized by researchers.

Repeated measures. One single-subject study (McCollum et al., 2014) used a repeated measures design to address whether STT technology improved writing samples and reduced the cognitive demands of writing for students with disabilities. With this design, researchers collect data from participants before the intervention and during the intervention (i.e., pretest-posttest). McCollum et al. collected the first set of data while students completed a writing activity using handwriting. They collected the second set of data while students completed a writing activity using a speech-to-text tool. This study included three students of varying grade levels and disabilities: a 2nd grade student with emotional disturbance; a 3rd grade student with specific learning disabilities in the areas of reading and written language; and an 11th grade student with an intellectual disability. The researchers incorporated a software training phase and an analysis of the students’ writing samples that had been written by hand (pretest) and by using the STT software (posttest).

Variables Measured

According to Horner et al. (2005), single-subject research studies control the independent variable and include dependent variables that are quantifiable, measured repeatedly over time, and operationalized in a replicable way. As shown in Table 4, all five studies in this review had a quantifiable dependent variable or variables, measured the dependent variable repeatedly over time, and had an independent variable under control of the researcher. Studies were slightly less consistent in describing how researchers measured dependent variables. Of those five studies, the four that focused on writing (Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) described how the dependent variable was measured. The four writing studies included a variety of dependent variables, including fluency, accuracy, and complexity. The fifth study (Bouck et al., 2011), a study of mathematics, calculated the average number of attempts and the time that it took students to solve problems on a researcher-developed assessment both with and without a VISO calculator. The researchers did not explicitly explain how they defined a problem attempt.

Table 4. Variable Characteristics

Characteristic | Number of Studies
Dependent variable is quantifiable | 5
Dependent variable is measured repeatedly over time | 5
Independent variable is under the control of the researcher | 5
Method of dependent variable measurement is described | 4

Operationalization of dependent variables for writing. The four studies that measured writing outcomes as the dependent variables, shown in Table 5 (Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017), included study outcomes related to text fluency. These four studies used a measure of writing rate (for example, a count of the number of written words in an allotted time period or a count of the total number of words) as an indicator of writing fluency. McCollum et al. (2014) also incorporated a count of the number of multisyllabic words as an indicator of writing fluency, and Lee (2011) measured the number of T-units, or clauses, as an indicator of fluency.

Table 5. Fluency Variables

| Study | Operationalization |
| --- | --- |
| Garrett et al. (2011) | Word count per minute (one word = 5 characters, after incorrect words are removed), length |
| Lee (2011) | Total text produced, text production rate (words per minute), total number of T-units (“a simple or complex sentence equaled one T-unit, and a compound sentence equaled two or more T-units,” p. 51) |
| McCollum et al. (2014) | Total written words, number of multisyllabic words |
| Noakes (2017) | Total written words |
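
The rate measure listed for Garrett et al. (2011) in Table 5 can be illustrated with a short calculation. The Python sketch below is illustrative only and is not taken from any of the reviewed studies. It assumes that the five-character convention is applied to the characters of the words that remain after incorrect words are removed, and that spaces and punctuation are not counted; the studies do not specify these details. The function name and the sample data are hypothetical.

```python
def words_per_minute(correct_words, minutes, chars_per_word=5):
    """Standardized writing rate: characters in correct words / 5 / minutes.

    Assumption (not stated in the reviewed studies): only the characters of
    the retained words are counted; spaces and punctuation are ignored.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    total_chars = sum(len(word) for word in correct_words)
    standardized_words = total_chars / chars_per_word
    return standardized_words / minutes


# Hypothetical sample: the words left after a rater removed incorrect words.
sample = ["the", "dog", "chased", "a", "ball", "across", "the", "yard"]

# 30 characters / 5 = 6 standardized words, written over 2 minutes.
rate = words_per_minute(sample, minutes=2)
print(rate)
```

Standardizing on characters rather than raw word counts keeps the rate comparable across writing samples that differ in average word length.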

In addition to fluency-related dependent variables, the same four studies (Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) also included dependent variables related to writing accuracy (see Table 6). These accuracy dependent variables included word-level accuracy, mechanical accuracy, and multi-word accuracy. Three studies (Garrett et al., 2011; Lee, 2011; Noakes, 2017) measured students’ word-level accuracy, such as word errors or spelling errors. One study (Garrett et al., 2011) measured errors in mechanics, such as capitalization and punctuation. Two studies (McCollum et al., 2014; Noakes, 2017) measured the number of correct writing sequences. Noakes (2017) defined a correct writing sequence as “two adjacent, correctly spelled words which produce mechanical, semantic, and syntactically correct writing sequences” (p. 48).

Table 6. Accuracy Variables

| Study | Operationalization |
| --- | --- |
| Garrett et al. (2011) | Percentage of words that were incorrect, type of word errors (using word processor, speech recognition software, punctuation and capitalization errors) |
| Lee (2011) | Surface errors (“misspelled words,… recognition errors in the SR texts, and grammatical errors,” p. 52) |
| McCollum et al. (2014) | Number of correct writing sequences |
| Noakes (2017) | Number of words spelled correctly, number of correct word sequences |
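
Two of the accuracy measures in Table 6 reduce to simple counts once a human rater has judged each word. The Python sketch below is a simplified illustration, not the scoring procedure used in any of the reviewed studies. It assumes the rater's judgments are supplied as one true/false flag per word, and it treats a correct writing sequence as any adjacent pair of words that were both judged acceptable. Judging whether a pair is mechanically, semantically, and syntactically correct still requires a rater; the function names and sample flags are hypothetical.

```python
def percent_incorrect(word_is_correct):
    """Percentage of words that a rater marked as incorrect."""
    if not word_is_correct:
        raise ValueError("at least one word is required")
    incorrect = sum(1 for ok in word_is_correct if not ok)
    return 100.0 * incorrect / len(word_is_correct)


def correct_writing_sequences(word_is_acceptable):
    """Count adjacent word pairs in which both words were judged acceptable.

    Simplification: the rater's judgment is recorded per word, not per pair.
    """
    pairs = zip(word_is_acceptable, word_is_acceptable[1:])
    return sum(1 for first, second in pairs if first and second)


# Hypothetical rater flags for a five-word writing sample.
flags = [True, True, False, True, True]

error_rate = percent_incorrect(flags)        # 1 of 5 words incorrect
sequences = correct_writing_sequences(flags)  # pairs (1,2) and (4,5)
print(error_rate, sequences)
```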

Only one of the four writing studies (Lee, 2011) examined writing complexity as a dependent variable (see Table 7). The author did so by determining clause length, based on the assumption that longer clauses equaled more complex text. Lee also counted the number of clauses per T-unit as an indicator of text complexity.

Table 7. Complexity Variables

| Study | Operationalization |
| --- | --- |
| Garrett et al. (2011) | None |
| Lee (2011) | Clause length (“total number of words divided by the total number of clauses,” p. 51), number of clauses per T-unit (“total number of clauses divided by the total number of T-units,” p. 52) |
| McCollum et al. (2014) | None |
| Noakes (2017) | None |
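
The two complexity ratios quoted from Lee (2011) in Table 7 can be written out directly. The Python sketch below follows the quoted definitions; the function names and the counts in the example are hypothetical and are not data from the study.

```python
def clause_length(total_words, total_clauses):
    """Mean clause length: total words divided by total clauses."""
    if total_clauses <= 0:
        raise ValueError("total_clauses must be positive")
    return total_words / total_clauses


def clauses_per_t_unit(total_clauses, total_t_units):
    """Clauses per T-unit: total clauses divided by total T-units."""
    if total_t_units <= 0:
        raise ValueError("total_t_units must be positive")
    return total_clauses / total_t_units


# Hypothetical counts for one writing sample.
words, clauses, t_units = 60, 12, 8

mean_clause_length = clause_length(words, clauses)       # 60 / 12
clause_density = clauses_per_t_unit(clauses, t_units)    # 12 / 8
print(mean_clause_length, clause_density)
```

Under these definitions, a higher value on either ratio is read as more complex text: longer clauses, or more clauses packed into each T-unit.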

In addition to fluency, accuracy, and complexity, some authors also measured other types of dependent variables. Lee (2011) examined whether the story students wrote contained a number of key structural elements such as a setting statement, an initiating event, internal responses, an action, a consequence, character dialog, and an ending. Garrett et al. (2011) measured students’ ability to recall intended meanings while editing their own writing samples using DNS.


Implementation of Speech-to-Text Tools by Students with Disabilities

Research studies during the 2008-2018 time period examined two types of STT technology. First, students in Bouck et al.’s (2011) study, which addressed students’ use of STT for mathematics tasks, used a VISO calculator. Second, the four studies that addressed students’ use of STT for writing tasks (Garrett et al., 2011; Lee, 2011; McCollum et al., 2014; Noakes, 2017) used a version of DNS software. The writing studies often used different versions of DNS software, or did not specify the version. Garrett et al. (2011) used DNS 7 Preferred. Lee (2011) used DNS Premium 11.0. Two studies used an unspecified version of DNS (McCollum et al., 2014; Noakes, 2017).

Training Students How to Use Speech-to-Text Tools

All of the studies we reviewed involved a training phase to help students learn to properly use the STT technology. The content of the trainings included:

Study authors did not conduct training in a standard number of sessions. Bouck et al. (2011) and Noakes (2017) both held one training session, ranging from 15–20 minutes (Noakes, 2017) to 30 minutes per student (Bouck et al., 2011). Lee (2011) held three to four training sessions per student, each of which lasted from 15–25 minutes. McCollum et al. (2014) held five one-hour sessions. Last, Garrett et al. (2011) held multiple sessions but did not specify the number or length of the sessions. In two studies (Garrett et al., 2011; Noakes, 2017) some students required more training sessions than their peers, depending on the difficulty the software had in understanding their speech.

Perceptions of STT Tools Among Students with Disabilities

Three of the five studies we reviewed included data pertaining to the participating students’ perceptions of the STT tools used in the study (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011). Generally, these researchers reported that students felt positively about the use of the STT technology even when they experienced some difficulty using it. In Bouck et al.’s (2011) study, before using the VISO calculators, students reported that the “technology was exciting and they saw potential in it for students with visual impairments” (p. 8). Students also shared that the VISO calculators could be beneficial if graphing tools could be added to them. However, during the study, students experienced some frustrations using the VISO calculator, largely stemming from the software misunderstanding their speech (Bouck et al., 2011, p. 10). Following data collection, these students still reported feeling enthusiastic about the degree of independence the VISO calculator provided. Despite their frustrations, the students did not report any negative perceptions of the tool. Students in Garrett et al.’s (2011) study also reported that they felt positive about the STT tool, indicating that they could write more quickly. They reported that they had little difficulty using the technology, and would enjoy using it in the future. Similarly, Lee (2011) found that student, teacher, and parent responses to the STT technology were generally positive and consistent between pretests and posttests (p. 64).

Speech-to-Text Compared to Other Tools

Most studies compared the effects of STT technology to those of other means of producing text. For example, Garrett et al. (2011) compared the use of STT technology to the use of word processing software. Other means of text production that studies included were handwriting (Lee, 2011; Noakes, 2017; McCollum et al., 2014) and use of a digital voice recorder (Lee, 2011). Findings comparing STT technology to the other methods of writing showed mixed results. On the positive side, some writing studies found that students demonstrated higher writing fluency when using STT technology compared to handwriting and using word processing software. All studies found that students created longer passages while using STT. In addition, some found that students were able to write faster (Garrett et al., 2011; Lee, 2011) and produce more complex text with STT technology (Lee, 2011). In contrast, Lee (2011) found that while students demonstrated higher writing fluency when using STT compared to handwriting, a dictated response accommodation was more conducive to writing fluency than STT. Lee posited that this difference in fluency between the STT and dictated response conditions may be because students in her study were still fairly inexperienced using STT. She believed writing fluency might increase had the students become more familiar with using the STT technology.

STT did not consistently support greater writing accuracy compared to other methods of text production. One study found that students using STT had lower writing accuracy compared to word processing software (Garrett et al., 2011). In contrast, others found improved writing accuracy with STT compared to handwriting (Lee, 2011; McCollum et al., 2014; Noakes, 2017). The only study examining writing complexity variables (Lee, 2011) found that students did not produce complex writing samples when using STT, dictated response, or handwriting. Nevertheless, the researcher stated that the increased writing fluency when using STT could potentially lead to high quality story development.

Bouck et al. (2011), the only researchers to address math, compared students’ calculation input accuracy and speed of calculation using the VISO calculator and their typical means of calculation (e.g., a typical calculator, a talking calculator, use of another person to input numbers into a calculator). The researchers found that initially all students made more input errors when using the VISO calculator compared to other methods of calculation. However, the overall number of errors that students made decreased as the study continued. Similarly, students took more time to complete the assessments while using the VISO calculator during the first few sessions, but as students gained more experience with the VISO calculator, they eventually completed assessments about as quickly as with their typical means of calculation.

Researchers found that STT technology has advantages and disadvantages compared to other methods of writing and doing math calculations. Advantages included increased student independence (Bouck et al., 2011; Lee, 2011), immediate feedback for students (Lee, 2011), cost effectiveness (Noakes, 2017), and easy implementation (Noakes, 2017). Possible disadvantages included variability in the accuracy of voice recognition (Bouck et al., 2011; Garrett et al., 2011), and time required to become proficient in using the tool (Bouck et al., 2011). In general, students had favorable impressions of using STT technology (Bouck et al., 2011; Garrett et al., 2011).


Findings for Students by Disability Category

The studies included students from a range of disability categories including visual impairments (Bouck et al., 2011; Garrett et al., 2011), specific learning disabilities (Lee, 2011; McCollum et al., 2014), physical disabilities (Garrett et al., 2011), traumatic brain injury with fine motor deficits (Noakes, 2017), Asperger’s Syndrome (Garrett et al., 2011), emotional disturbance (McCollum et al., 2014), and intellectual disabilities (McCollum et al., 2014). Some of the studies only included students in one disability category (Bouck et al., 2011; Lee, 2011; Noakes, 2017), while other studies (Garrett et al., 2011; McCollum et al., 2014) included students with a variety of disability categories or with a combination of disability categories.

Students with Visual Impairments

Bouck et al.’s (2011) study included three students with visual impairments. All three of these students took longer to solve mathematics problems while using the VISO calculator than when they used their typical means of calculation. In addition, all students needed more attempts to correctly solve problems while using the VISO calculator. Still, both the number of attempts students needed and the time they took on the assessment decreased as they became more familiar with using the VISO calculator. In addition, students reported that the VISO calculator had other benefits, such as allowing them to be more independent.

Students with Specific Learning Disabilities

Two studies included students with learning disabilities (Lee, 2011; McCollum et al., 2014). McCollum et al.’s (2014) study included one student with a specific learning disability in the areas of reading and written language. Lee (2011) did not provide additional details about the four participants’ learning disabilities. Students in Lee’s (2011) study produced longer texts that were more complex and had higher numbers of T-units and lower percentages of surface errors when using STT compared to handwriting. They also were able to produce text at a faster rate using STT. However, when learning how to use the STT tool, students became distracted by speech recognition errors and tended to immediately pause their writing to correct those errors. McCollum et al. (2014) also found that the use of STT supported production of longer writing samples. When using STT, the student in their study produced 113 words per writing sample compared to 18 words per writing sample when using handwriting. The student also showed a large increase in the number of multisyllabic words and correct writing sequences in writing samples.

Students with Physical Disabilities

All of the participants in Garrett et al.’s (2011) study had physical disabilities, including spina bifida, Duchenne muscular dystrophy, cerebral palsy, and spinal muscular atrophy. The authors noted that these students with physical disabilities wrote more quickly and created longer passages using STT. The tradeoff was that writing accuracy and recall of intended meaning were more difficult for students when using STT. The decrease in writing accuracy was attributed to speech recognition errors by the software. Further, the high number of speech recognition errors in the draft made it more difficult for students to recall what they had originally intended to write when it was time to edit their writing.

Students with Traumatic Brain Injury

Noakes’s (2017) study included three students with traumatic brain injury, fine motor skill deficits, and difficulties with writing. The students showed improvement in writing when using the STT tool throughout the course of the study. While using STT, all students produced more total written words overall, more correct writing sequences, and more correctly spelled words than when producing handwritten text.

Students with Emotional Disturbance

One student in McCollum et al.’s (2014) study had an emotional disturbance only. When using STT compared to handwriting, this student was able to produce longer passages of text, slightly more multisyllabic words, and more correct writing sequences.

Students with Intellectual Disabilities

One student in McCollum et al.’s (2014) study had an intellectual disability. When using STT, this student was able to produce longer passages of text, more multisyllabic words, and more correct writing sequences compared to the handwriting condition.

Students with Multiple Disabilities

Garrett et al. (2011) included one student who had a physical disability, a visual impairment, and Asperger’s Syndrome. This student became more proficient using STT technology over the course of the study. She also had greater writing fluency and wrote longer passages when using an STT tool than when using a word processor. However, the student had less difficulty recalling her intended meaning and she wrote with higher accuracy (approximately 3.8% better) when using word processing software than when using the STT tool.


Author-stated Limitations of Included Studies

The authors of studies that included a limitations section (Bouck et al., 2011; Garrett et al., 2011; Lee, 2011; McCollum et al., 2014) often identified limitations related to student experiences with STT technology, sampling, and study design. Two studies (Lee, 2011; Bouck et al., 2011) indicated that students’ lack of familiarity with the STT technology was a limitation. Lee (2011) found that students in her study lacked previous experience with speech recognition software and that this lack of experience may have influenced their success in using the technology. Further, Lee stated that participating students did not have enough time to practice using the speech recognition software during the training phase of the study. Bouck et al. (2011) discussed how students had difficulty using the voice input on the VISO calculator due to the high number of speech recognition errors made by the calculator’s software. The students became frustrated when the software made mistakes, which resulted in an angry tone of voice that made it even more difficult for the software to recognize their speech correctly. The researchers suggested that with more practice, the participants would likely have more success using the voice input, speech output calculator.

Two studies described sampling limitations. Bouck et al. (2011), who examined mathematics outcomes, mentioned that the small sample size of three students was an intentional, but limiting, factor. Lee (2011) mentioned that her study only included native English speakers so her results were not generalizable to a linguistically diverse student population. Finally, Noakes (2017) stated that a study limitation was directly related to the study design. Specifically, Noakes identified a primary limitation to be the lack of indicators of complex writing as dependent variables.


Discussion and Recommendations

The purpose of this report was to review the existing literature on speech-to-text as an accommodation within classroom and testing settings. Due, in part, to the speed of technology changes related to this type of accommodation, literature reviewed was limited to studies published within the past 10 years. Articles were further required to be published in the United States and to use STT to convert student speech to text (not to transcribe teachers or peers). Five articles were identified and included in this review. All studies identified were conducted in an instructional setting. We were not able to identify any studies on STT in a testing setting (large scale or classroom assessment). Additionally, we found no studies that included post-secondary students. The research reviewed on STT for writing included students from elementary to high school with a wide range of disabilities. The small number of studies identified and small study sample sizes (≤5 students) made it difficult to identify differences between groups of students, such as students who have a cognitive disability compared to students with specific learning disabilities. Despite this, these studies do include a diverse group of students in terms of both ages and disabilities. Further, the results do not suggest large differences in results based on group membership. The one study addressing the use of a voice input, speech output calculator only included students with visual impairments and therefore has limited generalizability to other students with other disabilities.

Although our research base is limited, it does have the advantage of being relatively recent, with all studies conducted since 2011. This is important because STT technology continues to change. For example, several studies mentioned the accuracy of the speech recognition software as a concern or downside. If we were considering a study conducted 20 years ago, it is likely that the software being used would have had many more issues in recognizing speech, and the study therefore would likely not be very informative to current practice and research. All of the studies used a single-subject approach, and all but one looked at writing. Writing-related dependent variables differed across studies and included fluency variables, accuracy variables, and complexity variables.

Overall findings, though somewhat mixed, included benefits of increased independence and favorable student impressions, immediate feedback, and low cost compared to other accommodations. Further, the amount and nature of training was identified as a possible factor related to student success. Results related to quantity and quality of writing when compared with other methods (keyboard, handwriting, or voice recorder) were somewhat mixed. Some of the findings showed STT use increased length and complexity of writing and decreased accuracy, but these findings are preliminary at best. A major challenge discussed in these studies was the accuracy of voice recognition, which in turn can lead to decreases in writing fluency and accuracy, as well as to student frustration. Based on available primary research, our review is unable to address how specific STT tools compare to each other due to variation in the software used in studies (e.g., software versions) and limited information provided. Additionally, the literature does not provide information on whether STT changes the construct being taught or assessed. For example, the literature does not address whether the skill of writing is different when a student writes with an STT tool compared to when they write without it.

Recommendations

The increased use of STT by students in their daily lives (e.g., phone voice recognition), along with the increased sophistication of the technology, suggests the need to examine directions for future research and educational practice (Warren et al., 2018). This is particularly true given that the majority of states currently allow the use of STT technology as an accommodation for students with disabilities on large-scale assessments (Lazarus & Strunk, 2018). This is an area where the technology and use of STT has outpaced research. Several recommendations for research and for practice and policy are proposed here.

Recommendations for research. A primary recommendation of this literature review is a call for research related to STT technology as an accommodation in both classroom and assessment settings. Given that it is an accommodation that is already in use, research on post-secondary use of speech-to-text and research on STT in assessment settings (as is done for other accommodations; Kettler, 2012) are needed. Studies including more students are needed to identify potential moderators that may increase or decrease the effectiveness of STT as an intervention or accommodation; such moderators might include disability, age of students, training, or specific software utilized. Further research also is needed on the effects of STT for English learners. Finally, studies including the use of voice input, speech output calculators are needed to confirm the findings of Bouck et al. (2011) and expand the research to disability categories other than visual impairment. It is recommended, overall, that social validity and student perceptions be considered in research going forward as well.

Recommendations for practitioners and policymakers. Given the limited research related to STT, it is incumbent upon practitioners to use their best judgment when considering STT for students. Studies showed preliminary evidence of effective use by students ranging from elementary to high school. Still, as with other decisions about potential accommodation and assistive technology use, decisions about STT should follow established procedures for making accommodation and assistive technology decisions for students. Procedures may vary by locale, but will generally include an educational team-based decision, trial and training period, and evaluation of effectiveness (Lee & Templeton, 2008).

Based on the findings of this review, we recommend that students receive training in the use of their specific STT technology and have ample opportunities to use it in classroom settings before considering use in a testing setting. In the studies we reviewed, training time ranged from a single session of 15–20 minutes to five one-hour sessions. The studies noted that students’ use continued to improve through the duration of the study. Given that the majority of studies reviewed used some form of DNS software, we do not have a basis to compare the effectiveness of different DNS software or to recommend specific STT technology. We do recommend that the selection of STT technology be an individual decision based on evaluation of student needs, technology features, supports, social validity, cost, and capacity to integrate with other technology, including online assessment technology that is frequently used in schools.

At the state level, support can be provided to local districts and educational teams by maintaining information on software that is compatible with large-scale assessments, and by sharing specific procedures for demonstrating a student’s need for STT. Further, as policy and practice are developed, it should be with enough flexibility to adjust to ever-changing technology. This may include practices such as describing the types and features of software permitted as accommodations (and that work with testing technology) instead of listing specific software, offering training and support materials for districts and teachers, and tracking the use of STT and its effects to inform continuing policy.


References

Bouck, E., Flanagan, S., Joshi, G., Waseem, S., & Schleppenbach, D. (2011). Speaking math – A voice input, speech output calculator for students with visual impairments. Journal of Special Education Technology, 26(4), 1-14.

Gardner, T. J. (2008). Speech recognition for students with disabilities in writing. Physical Disabilities: Education and Related Services, 26(2), 43-53.

Garrett, J. T., Heller, K. W., Fowler, L. P., Alberto, P. A., Fredrick, L. D., & O’Rourke, C. M. (2011). Using speech recognition software to increase writing fluency for individuals with physical disabilities. Journal of Special Education Technology, 26(1), 25-41.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71(2), 165-179.

Kettler, R. J. (2012). Testing accommodations: Theory and research to inform practice. International Journal of Disability, Development & Education, 59(1), 53–66.

Lee, H., & Templeton, R. (2008). Ensuring equal access to technology: Providing assistive technology for students with disabilities. Theory into Practice, 47(3), 212–219.

Lee, I. (2011). The application of speech recognition technology for remediating the writing difficulties of students with learning disabilities. Dissertation Abstracts International Section A: Humanities and Social Sciences, 73(7-A(E)).

McCollum, D., Nation, S., & Gunn, S. (2014, January). The effects of a speech-to-text software application on written expression for students with various disabilities. National Forum of Special Education Journal, 25(1), 1-13. Retrieved from http://www.nationalforum.com/Electronic%20Journal%20Volumes/McCollum,%20Dixie%20Effects%20of%20a%20Speech-to-Text%20Software%20NFSEJ%20V25%20N1%202014.pdf

Minnesota Department of Education. (2017). Procedures Manual for the Minnesota Assessments 2017-2018. Retrieved from http://minnesota.pearsonaccessnext.com/resources/resources-training/manuals/2017-18_Procedures_Manual.pdf

National Center for Technology Innovation. (2010). Speech recognition for learning. Retrieved from http://www.ldonline.org/article/38655/

Noakes, M. (2018). Does speech-to-text assistive technology improve the written expression of students with traumatic brain injury? Dissertation Abstracts International Section A: Humanities and Social Sciences, 79(1-A(E)).

Nuance Communications, Inc. (2014). Dragon NaturallySpeaking 13 installation guide and user guide. Retrieved from https://www.nuance.com/content/dam/nuance/en_us/collateral/dragon/guide/gd-dragon-installation-and-user-en-us.pdf

Partnership for Assessment of Readiness for College and Careers (PARCC). (2017). PARCC Accessibility Features and Accommodations Manual 2017–2018 (6th Ed.). Retrieved from http://avocet.pearson.com/PARCC/Home

Shadiev, R., Hwang, W.-Y., Chen, N.-S., & Huang, Y.-M. (2014). Review of speech-to-text recognition technology for enhancing learning. Educational Technology & Society, 17(4), 65-84.

Smarter Balanced Assessment Consortium. (2018). Usability, accessibility, and accommodations guidelines. Retrieved from https://portal.smarterbalanced.org/library/en/usability-accessibility-and-accommodations-guidelines.pdf

Warren, S., Thurlow, M., Lazarus, S., & Strunk, K. (2018). Forum on speech-to-text and scribing: Getting a handle on what this means. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from https://nceo.umn.edu/docs/OnlinePubs/2018ForumReport.pdf


Appendix A

Computerized Speech-to-Text: Descriptive Information for Eligible Studies

| Study | Subject | Dependent Variable(s) | STT Technology | Disability Categories | Sample | Analysis Method |
| --- | --- | --- | --- | --- | --- | --- |
| Bouck et al. (2011) | Math | Speed of assessment completion, average number of attempts to solve problems | Voice input, speech output (VISO) calculator | Visual impairments | 3 high school students, Age 18-19 | Single-subject, repeated acquisition design |
| Garrett et al. (2011) | Writing | Word count per minute, percent correct, type of word errors, recall of intended meaning, length | Dragon NaturallySpeaking 7 Preferred | Physical disabilities (n=5), Asperger’s syndrome (n=1), visual impairment (n=1) | 5 high school students, Grades 9-12, Age 15-18 | Alternating treatments design |
| Lee (2011) | Writing | Total text produced, text production rate, total number of T-units, T-unit length, clause length, number of clauses per T-unit, surface errors, story structure | Dragon NaturallySpeaking Premium 11.0 | Unspecified learning disabilities | 4 elementary school students, Grade 4, Age 9 | Single-subject, alternating treatments design |
| McCollum et al. (2014) | Writing | Total number of words written, number of multisyllabic words, number of correct writing sequences | Dragon NaturallySpeaking (no version stated) | Specific learning disabilities related to reading/written language (n=1), emotional disturbance (n=1), intellectual disability (n=1) | 3 students, Grades 2, 3, 11 | Repeated measures |
| Noakes (2017) | Writing | Total written words, number of words spelled correctly, number of correct word sequences | Dragon NaturallySpeaking (no version stated) | Traumatic brain injury with fine motor skills deficits | 3 students, Grades 4, 8, 9, Ages 8, 14, 15 | Alternating treatments |