November 2017

VoiceThread Sequencing: an SLA-informed Approach

By Dr. Dan Nickolai, Director of the Language Resource Center, Assistant Professor of French, Saint Louis University.



Introduction & Background

VoiceThread is well known among tech-savvy educators as a reliable and intuitive platform for creating multimedia-rich learning activities. Its core functionality enables students to collaboratively annotate digital documents with voice and video comments. Yet the application's simple interface (as welcome as it is) belies the tool's true promise for language learning. Research insights from the field of Second Language Acquisition (SLA) can be methodically integrated into the design of these activities to bolster L2 listening comprehension skills. This article offers an accessible framework for building an impactful, interactive lesson by selectively sequencing and commenting on a series of VoiceThread documents.

Picture 1 – Example VoiceThread Document with student and instructor video annotations in the left margin.

It is worth noting in passing that SLA research and theory can often seem contradictory, nebulous, or simply not actionable in today’s language classroom. While decades of competing SLA theories and hypotheses about learning abound in the academic literature, it still falls largely to the language instructor to turn theory into practice through concrete strategies. Yet amid the field’s many voices, there is broad consensus on the role comprehension plays in the language acquisition process. It is now fairly uncontroversial to suggest that understanding what is being communicated is essential to acquisition (VanPatten & Williams, 2007). Rendering input “comprehensible” will thus be our prime directive in this framework.

To better illustrate the proposed sequencing of VoiceThread documents, we will take as an example an intermediate Spanish-language lesson that discusses film piracy and industry efforts to curb its prevalence in Argentina. This entirely online activity is centered on a short anti-piracy spot that dramatically condemns copyright infringement through analogy. In the clip, a father boasts to his son that his “cleverness” enabled him to buy a pirated copy of a film that has yet to be officially released. The boy, in turn, proudly proclaims that he has been busy pirating as well. The father grows concerned as his son pulls out a perfect school assignment. As the scene unfolds, we learn that the father’s frequent piracy has inspired his son to copy answers from a classmate. From the boy’s perspective, there is no difference between the pirated film and the “pirated” assignment; he expects his father to approve of his own “cleverness”. We are then quickly confronted with the judgmental voice of a narrator questioning the father’s ethics.

Picture 2 – Still from VoiceThread Video “Peliculas Piratas”

Phase 1: Media Selection

The first step in this framework is the careful selection of the audio or video clip to be featured in the assignment. A few basic criteria should be considered before choosing a clip, all of which ultimately ease the cognitive burden of second language listening comprehension. The following questions can help guide the selection process:

  1. Does the media selection tie into current class topics or lessons?
  2. Is the clip culturally authentic?
  3. Are the ideas presented relevant to the students’ lives and experiences?

It would be hard to overstate the importance of context in the proposed framework. Your media selection should be anchored in a familiar and coherent context in order to maximize learners’ listening comprehension. As will be outlined later, the ability to understand the overall idea or theme of the clip is necessary for cognitively processing the language structures students will hear.

A growing body of SLA research (as well as national and international standards and guidelines for language learning) advocates strongly for the use of culturally “authentic” documents, generally understood to be content created by native speakers of a language for native speakers of that language. The ACTFL Guiding Principles for Language Learning state, “The rich language found in authentic materials provides a source of input language learners need for acquisition.” Indeed, exposure to real, unmodified, and non-didactic language has been empirically demonstrated to drive the acquisition process, a perspective shared by both researchers and learners themselves (Chavez, 1998, p. 277). The cultural and grammatical richness of authentic documents provides exactly the kind of input that L2 acquisition requires. Gilmore (2007) warns against non-authentic language, since simplified or contrived language can inhibit acquisition (p. 10). This sentiment is echoed by the Common European Framework of Reference for Languages, which states that acquisition can be hindered by the “syntactic over-simplification of authentic texts” (p. 165).

Finally, it is paramount that students can connect the media selection to their own daily lives and experiences. In this regard, relevance is everything. Language is not acquired (or even processed) when it is not attended to, and garnering student attention requires presenting something meaningful to them. Providing a conceptual touchstone is thus essential: the clip should capture learners’ attention and interest and be highly relevant to them. Small efforts to cater to student interests can in turn pay large dividends vis-à-vis motivation, engagement, and comprehension (Córdoba et al., 2005, p. 9).

The anti-piracy commercial serving as our example was selected for all of the aforementioned reasons and more. First, the clip ties in thematically with the textbook lesson on technology and media. Second, it is culturally authentic, since it was actually used in an anti-piracy campaign in Argentina. Third, its message arouses immediate interest and prompts students to reflect and make connections. All of this prepares and motivates students to process the target language. Additionally, the audio and video quality of the clip are excellent, and the father-son exchange is limited to clearly articulated, high-frequency vocabulary.

Phase 2: Document Sequencing

Picture 3 – Five commented documents in sequence create this VoiceThread Activity

Slide 1: Just as important as proper media selection is its subsequent presentation as a series of interactive VoiceThread slides. Careful, deliberate sequencing of the lesson provides scaffolding for students encountering new language and structures. The first document in the lesson is a single PowerPoint slide that helps situate and contextualize the selected media. In our piracy example, students read a short paragraph in the L1 about the high cost of legitimate (non-pirated) goods in Argentina, lax legal enforcement of copyright infringement, and some additional cultural notions about intellectual property. Providing this background information as a first step serves multiple aims. First, new language is more easily understood and processed when anchored in a known context. The converse is perhaps even more illustrative: an unclear context risks rendering even basic language use incomprehensible. Beyond establishing the overall context, a second goal is to activate prior knowledge, or schemata, as it is sometimes referred to in the SLA literature (Carrell, 1983). Tapping into prior knowledge primes students’ ability to cognitively process the L2 (Córdoba et al., 2005, p. 9).

The final element of the first slide is a brainstorming question in the L1. In our piracy example, we solicit voice comments on the following question: “What kinds of strategies have you seen or would you expect to see in an anti-piracy commercial?” This open-ended question invites situated reflection and encourages students to use anticipation strategies. It also facilitates what is known as top-down language processing: top-down skills draw on existing knowledge about the world as a strategy for interpreting spoken language (McBride, 2009, p. 5). Facilitating top-down processing should be understood as the primary objective of the framework’s first slide.

Picture 4 – Slide #1 with background information and brainstorming question

Slide 2: After the initial slide, students are ready for the first viewing. The full media clip is the second document in the VoiceThread sequence. With the context established, prior knowledge activated, and anticipation strategies primed, many students will likely be in a position to make sense of the clip. Swaffar and Arens (2005) remind us that “topic familiarity can overcome a large quantity of unfamiliar vocabulary” (p. 105). However, language comprehension does not rely on top-down processing skills alone. The first viewing is also an opportunity to call on bottom-up processing skills. These require attention to individual sounds, words, and phrases, which are then parsed into coherent units of meaning (McBride, 2009, p. 5). On this second VoiceThread document, students can be instructed to list any keywords or phrases they heard in the clip, using moderated comments so that they cannot copy one another. It is the combination of close listening and broader contextualizing that eases the cognitive processing required to make sense of the video.

Slide 3: The third document in this activity is the second viewing of the video, either in its entirety or, if appropriate, a selected excerpt. To provide an additional pillar of support, this viewing should be preceded by a confirmation and comprehension question. In our piracy example, we ask students via text comment: “What act is pirating a film compared to, and what does the father come to realize about his gift?” This question is deliberately formulated to confirm some meaning while simultaneously requesting clarification about a key detail of the exchange. Swaffar and Arens (2005) note that “questions themselves initiate interpretation” (p. 95). Additionally, knowing what to look for, or attend to, is often enough for students to identify information that might otherwise go unnoticed. At this point in the activity, the context and storyline should be readily understood, even if some of the language itself is not.

Slide 4: Because the scaffolding so far has sought to maximize top-down processing, the fourth VoiceThread document can be leveraged to call again on complementary bottom-up comprehension skills. This can be done with a partial transcript of the “fill-in-the-blank” variety common in L2 instruction. Like the first slide, this document can be designed in PowerPoint, with blanks in the transcript in place of selected words. So that students can hear the clip and see the transcript simultaneously, the instructor can upload a media comment onto the fourth slide; the platform allows you to comment with a prepared audio or video file. In our piracy example, we uploaded an audio-only (MP3) version of the clip as an instructor comment. Students were then directed to fill in the blanks via moderated text comments. The near-complete transcript and the audio overlay work together to further ease listening comprehension and resolve any lingering doubts about meaning.

Picture 5 – Slide #4 was created in PowerPoint and features an uploaded MP3 comment for the audio.

Slide 5: Up to this point, the sequencing of documents seeks to establish optimal cognitive conditions for successful L2 listening comprehension. The fifth and last slide should conclude with one more viewing of the media selection and an invitation for students to make meaningful connections and reflect on what they just saw. In our piracy example, we solicit class feedback via voice and video comments on the following questions: “Do you feel this kind of anti-piracy campaign can be effective in the United States? Why? What are the strengths or weaknesses of this particular campaign video?” This last slide is an opportunity for students to think more globally and to form a meaningful opinion or comparison with their home culture.

Concluding Thoughts

The above framework is informed by several important observations from SLA research. First and foremost, enabling comprehension is the primary driver of language acquisition. The framework also embraces the idea that language learners have limited cognitive processing capacity at any given moment, so proper scaffolding of activities is essential to avoid overtaxing these finite resources. The cognitive burden of L2 processing can be reduced by alternating between top-down and bottom-up skills, both of which are required to construct meaning. Listening comprehension can also be facilitated by providing contextual support, activating prior knowledge, and priming anticipation strategies. Student learning can be further reinforced by encouraging learners to draw personal connections with the selected content. VoiceThread serves as a powerful (and device-agnostic) platform for sequencing these kinds of media-rich activities. While the tool itself does not inherently enhance L2 learning, successful comprehension activities can be modeled and streamlined when it is coupled with the proposed framework.

References

ACTFL Proficiency Guidelines. Retrieved from

Carrell, P. L. (1983). Some issues in studying the role of schemata, or background knowledge, in second language comprehension. Reading in a Foreign Language, 1(2), 81-92.

Chavez, M. (1998). Learners’ perspectives on authenticity. International Review of Applied Linguistics in Language Teaching, 36(4), 277-306.

Common European framework of reference for languages: Learning, teaching, assessment. (2001). Retrieved from

Córdoba Cubillo, P., Coto Keith, R., & Ramírez Salas, M. (2005). La comprensión auditiva: definición, importancia, características, procesos, materiales y actividades [Listening comprehension: Definition, importance, characteristics, processes, materials, and activities]. Revista Electrónica: Actualidades Investigativas en Educación, 5(1).

Gilmore, A. (2007). Authentic materials and authenticity in foreign language learning. Language Teaching, 40(2), 97-118.

McBride, K. (2009). Podcasts and second language learning. Language Learning & Language Teaching (LL&LT), 153.

Swaffar, J., & Arens, K. (2005). Remapping the foreign language curriculum. New York: Modern Language Association.

VanPatten, B., & Williams, J. (2011). Theories in second language acquisition: An introduction. New York: Routledge.
