
DIVE-L: Design of Immersive Virtual Environments for Language Learning – Guidelines for Thoughtful Integration for Instructors

June 28, 2025

By Ogulcan Durmaz, Informatics PhD Student at University of Illinois Urbana-Champaign

DOI: https://www.doi.org/10.69732/WXMT7186

Introduction

Immersive virtual environments such as virtual reality (VR) have great potential for language learning, but instructors considering them for their students need guidance on how to adopt them in their classrooms. DIVE-L: Design of Immersive Virtual Environments for Language Learning is a guideline I have developed that outlines key considerations for integrating immersive virtual worlds into formal and informal language education. Grounded in research on the affordances of virtual reality (Alfadil, 2024; Chen & Yuan, 2023; Kaplan-Rakowski & Gruber, 2022; Xie et al., 2019) and on game-based learning, DIVE-L aims to support second language acquisition by guiding the use of virtual worlds in language learning and teaching. The guideline draws on foundational theories and sources, including Starks’ (2014) Cognitive Behavioral Game Design (CBGD) model and Reinhardt’s (2019) Gameful Second and Foreign Language Teaching and Learning: Theory, Research, and Practice. It is also informed by core language learning theories such as Krashen’s Input Hypothesis, Swain’s Output Hypothesis, Long’s Interaction Hypothesis, and Vygotsky’s Sociocultural Theory. Drawing on both theoretical and practical perspectives, DIVE-L helps educators select and use immersive environments that serve diverse learner profiles and foster effective language teaching and learning in virtual reality through six principles: Affordability, Autonomy, Multi-modality, Technical Interactions, Pedagogical Interactions, and Situated Meaning (see Picture 1). Instructors will want to reflect on each of these principles as they plan learning activities that include immersive virtual environments.

For those who design their own virtual environments, these principles can guide the development process. For those who use resources created by others, the principles offer a way to evaluate these experiences and prepare for their implementation.

Picture 1 – DIVE-L Principles: Affordability, Autonomy, Multi-modality, Technical Interactions, Pedagogical Interactions, Situated Meaning

Principle 1: Affordability

Affordability is an important consideration for language teachers who incorporate immersive virtual environments into their instruction. Both hardware and platform costs can affect whether these tools are feasible in educational settings. Among available headsets, the Meta Quest series, particularly the Meta Quest 2, offers an accessible entry point due to its relatively low cost and wide availability. The Meta Quest 3S, released in October 2024, is a more affordable option among mixed reality headsets, which combine real and digital worlds. In terms of platforms, options such as Rec Room, FrameVR, Roblox, EngageXR, Spatial.io, and VRChat allow teachers to create or access immersive experiences, and most offer free versions with optional premium plans. Typically, if a paid subscription is needed at all, only the teacher requires one, for expanded functionality such as creating virtual experiences. When selecting a platform, teachers should consider the age and profile of their learners, the size of the group, and their own familiarity with the platform’s interface and features. By carefully evaluating available technologies, teachers can make cost-effective choices that support meaningful language learning in immersive environments.

Principle 2: Autonomy

Autonomy plays a pivotal role in game-based second language learning, enabling learners to take greater control of their educational experiences. It promotes self-regulation and independence, which are valuable in immersive virtual environments. From a theoretical standpoint, autonomy in virtual environments aligns with Vygotsky’s (1978) sociocultural theory, which emphasizes the importance of active participation in meaningful contexts. Similarly, Piaget’s theory of cognitive development stresses learning through exploration and discovery. As McLeod (2024) explains, active engagement with contextualized tasks facilitates the internalization of knowledge. Immersive virtual environments offer learners opportunities to interact with objects in the virtual world, direct their own progress, and learn through exploration. Especially in higher-immersion contexts, these environments simulate real-world scenarios in which learners can use language meaningfully. Supporting autonomy in such settings allows teachers to create conditions in which learners can actively shape their own language learning journeys.

Reinhardt (2019) highlights that learner autonomy can be supported through access to resources such as dictionaries, guides, and notepads, which allow students to solve problems independently and build confidence in navigating the virtual space. When using virtual environments, teachers can make sure that such resources are available to their learners. They can also encourage autonomy by offering structured support that enables independent engagement in the virtual environment. Brief tutorials that introduce learners to the mechanics, navigation, and features of the platform can help establish this foundational support. Clear and achievable goals also give learners a sense of direction and help them approach tasks with greater confidence. According to the Cognitive Behavioral Game Design Model (Starks, 2014), organizing instructional elements with consistent patterns or visual cues, such as systematic color schemes, can support informed decision-making. This kind of structure makes navigation easier and strengthens the learner’s sense of agency in the virtual environment. Therefore, when using immersive environments, teachers may look for features of the chosen platform that support this kind of structure.

Principle 3: Multi-modality

Multi-modality refers to “…the coordination of multiple different systems of signification to communicate a single, or at least a unified, message or meaning” (Dressman, 2019, p. 39). Immersive virtual environments are well suited to this principle because they combine visual, auditory, and textual input in ways that enhance learning. The aesthetics of these digital spaces, including high-resolution visuals and 360-degree settings, promote a sense of presence that mimics real-world experiences. In addition to visuals, sound effects play a key role in shaping how learners perceive and interact with the environment. When using virtual environments, teachers can take advantage of these multimodal affordances by offering input through multiple channels. For example, a non-user character (NUC) might deliver spoken language accompanied by on-screen transcriptions, enabling learners to process information through both listening and reading. Written elements such as signs, instructions, and menu items, when embedded directly in the virtual environment, reinforce learning without disrupting the immersive experience.

The Input Hypothesis (Krashen, 1982) underscores the need for comprehensible input slightly beyond the learner’s current proficiency level. While multimodality does not directly change the level of difficulty, it can significantly improve comprehensibility by presenting information in multiple formats. From a developmental perspective, Piaget’s theory of cognitive development suggests that young learners in particular benefit from sensory-rich experiences as they construct knowledge. Similarly, Lemetyinen (2023) notes that multi-sensory engagement helps learners progress through cognitive stages by actively interpreting diverse forms of input. Incorporating multi-modal elements in immersive environments therefore facilitates both language comprehension and deeper knowledge construction, allowing teachers to create or offer engaging, context-rich scenes that support language learning.

Principle 4: Technical Interactions

The Technical Interactions principle in immersive virtual environments involves engagement across multiple dimensions and typically encompasses three main types: user–interface (UI) interaction, user–environment interaction, and user–user interaction. When incorporating these environments into language teaching, educators may consider how each type of interaction contributes to learner engagement and language use.

Firstly, user–UI interaction focuses on how learners navigate menus, access tools, and manipulate interface elements. To support effective interaction, teachers should look for platforms with accessible, intuitive user interfaces that offer consistent visual design, clear icons, and logical navigation. Features such as tooltips, hover explanations, and context-sensitive help (ideally presented in the target language if students’ proficiency is sufficient) can turn routine navigation into opportunities for incidental learning. Familiarizing students with the UI at the outset, through short demonstrations or guided exploration, can enhance learner autonomy and reduce cognitive load during the activity itself.

Secondly, user–environment interaction refers to the ways learners engage with the virtual world itself. Platforms that offer dynamic and responsive environments, with interactive 3D objects, ambient sounds, and environmental cues, can enhance immersion and support language use. The ability to manipulate virtual objects (e.g., grabbing, moving, or activating them) is especially important to consider. Teachers may seek out or create tasks that encourage learners to interact with their surroundings to solve problems, complete objectives, or explore meaning in the target language, keeping learners actively involved in the language learning process.

Thirdly, user–user interaction is a vital component of language learning in virtual environments. Peer interaction enables learners to negotiate meaning and practice communication in authentic, goal-directed contexts (Prensky, 2001). Some platforms support social features such as friend systems, private or group chat, and avatar gestures or emotes, which can facilitate communication between peers and foster a sense of community. These tools can be leveraged to strengthen social presence and increase motivation when learning a language in an immersive setting. Interaction with NUCs can further enrich the learning experience. Well-designed NUCs can guide learners through tasks, provide contextual language input, and respond using multimodal communication. With recent advances in AI, some NUCs can now recognize speech and engage learners in responsive dialogue, offering authentic input that supports negotiation of meaning in line with the Interaction Hypothesis (Long, 1981).

By intentionally selecting and using virtual platforms that support these three forms of interaction, language teachers can offer immersive and engaging environments that promote meaningful language use. Effective interaction design helps ensure that learners not only navigate the virtual world but also actively engage with content, peers, and language in dynamic ways.

Principle 5: Pedagogical Interactions: Tasks, Collaboration, and Feedback

In immersive virtual language learning environments, interaction extends beyond technical usability to include pedagogically driven dimensions that support language acquisition. Key among these are Tasks, Collaboration, and Feedback. These elements foster more meaningful, goal-directed engagement in the target language. Teachers using virtual platforms may consider how these forms of pedagogical interaction can enhance the learning experience.

Firstly, to support language learning goals, teachers might implement a task system that aligns with instructional objectives while drawing on the affordances of the virtual platform. A task, defined as an action or goal the learner is expected to complete, can be embedded in the virtual world to create authentic learning opportunities. In game-like environments, tasks often involve achieving specific objectives, where progress may be rewarded or shaped by consequences. When tasks are sequenced, they can foster incremental learning and skill development (Prensky, 2001; Reinhardt, 2019). Sequenced tasks can help teachers reinforce what students already know in the second language and build on that existing knowledge in virtual settings.

A quest system, for example, offers a structured way to guide learners through a series of tasks while they practice language in context. Rather than relying on external instruction, teachers might aim to present directions and objectives through features of the virtual world itself, such as signs, sound cues, or interactive prompts, thereby maintaining immersion. When needed, visual guidance such as spotlights or environmental changes can help orient learners toward relevant content. Picture 2 is taken from Treasure Island, a virtual reality game on Spatial.io. In this game, an overarching quest made up of a series of tasks guides users as they try to escape from the island. The picture shows the first task, in which players talk to a NUC to learn the story, get directions, and unlock the next task. A similar system can be adapted to formal or informal language learning in virtual spaces to provide a guided, incremental learning experience.

Picture 2 – Sample Quest System from Treasure Island Game in Spatial.io (a scene with a ship and a prompt that says "Talk to the captain")

Theoretically, tasks that encourage language production, whether written or spoken, contribute to acquisition as proposed in Swain’s (2005) Output Hypothesis, which highlights the role of output in deepening language processing. Teachers may aim to include tasks that prompt meaningful language use, in line with Reinhardt’s (2019) emphasis on productive outcomes. In parallel, tasks that provide meaning-focused input through reading, listening, or story-based progression can support comprehension and learner engagement (Nation, 1996). Drawing on the Cognitive Behavioral Game Design Model (Starks, 2014), educators can create or adapt task structures that are interactive and engaging and that support mastery of the intended content.

Secondly, collaboration, in which learners work together toward shared goals through interaction and mutual support, gives learners a powerful opportunity to co-construct meaning, develop communicative skills, and engage in socially meaningful interaction with each other. Based on Reinhardt’s (2019) work, teachers might aim to incorporate tasks that invite shared participation. Collaboration can be fostered through shared goals, where learners depend on each other to complete a task or explore a challenge. Virtual environments often include features that support cooperative gameplay, or platformer-style mini-games that rely on teamwork. Incorporating such elements may encourage learners to use the target language naturally and purposefully.

This approach resonates with two foundational theories. First, Vygotsky’s (1978) Zone of Proximal Development (ZPD) emphasizes that learners reach higher levels of development when supported by peers or tools within their environment. Although collaboration is usually understood as human-to-human peer interaction, in the virtual world it can also take place with interactive non-user characters (NUCs): if the chosen platform allows, learners can carry out activities in collaboration with NUCs. Second, Long’s (1981) Interaction Hypothesis reinforces the importance of communicative interaction in second language development, and collaboration naturally creates occasions for such communication. By incorporating collaborative activities in virtual worlds, teachers can provide learners with rich opportunities for negotiation of meaning and mutual support. Ultimately, such collaboration may lead to more personalized, guided, and communicative learning experiences within the virtual context.

Thirdly, feedback refers to the information learners receive about their performance or understanding in response to their actions in an immersive virtual environment. It helps learners understand how their actions relate to outcomes, thereby supporting reflection and improvement. As Reinhardt (2019) suggests, learners benefit when they are told, or given the option to learn, why a particular choice led to a specific result. Prensky (2001) likewise underscores that meaningful feedback enhances engagement by clarifying performance. While explicit correction may not always suit informal or immersive settings, teachers can incorporate immediate and implicit feedback mechanisms, such as adaptive dialogue with NUCs, interactive responses, environmental changes, or auditory cues. These forms of feedback can help learners adjust their language use without interrupting immersion.

From a theoretical standpoint, feedback supports both interactionist and behaviorist models of language learning. Long’s (1981) Interaction Hypothesis posits that feedback facilitates acquisition through responsive communication. Additionally, Skinnerian principles of reinforcement and repetition (Lemetyinen, 2023) can be reflected in game elements where accurate actions are rewarded or where learners encounter “fail states” that encourage retrying and refining their language. In sum, incorporating responsive and context-aware feedback mechanisms can support both learner progress and instructional refinement, making virtual environments more effective for language development.

Principle 6: Situated Meaning

Situated meaning refers to the way learners engage with language that is embedded in meaningful, context-rich virtual experiences. In immersive environments, language learning is not isolated from language use; it is shaped by the learner’s role, the surrounding narrative, and the cultural and spatial design of the virtual setting. Three interrelated components contribute to this principle: identity, narrative, and context.

Firstly, identity offers learners a powerful way to engage more deeply with the target language. Game-based environments allow learners to adopt and customize avatars, offering opportunities to experiment with different personas and potential future selves (Reinhardt, 2019). By interacting through these virtual identities, learners can take on roles that encourage them to communicate from new perspectives. This is particularly valuable in role-playing tasks, where identifying with a character can increase motivation, reduce anxiety, and enhance language production (Essoe et al., 2022). When learners feel connected to their avatars, they are more likely to speak and act in ways that align with their virtual roles. This connection can lower the affective filter (Krashen, 1982), encouraging risk-taking and meaningful language use. When choosing virtual environments, teachers may prioritize those that allow detailed avatar customization, which lets learners construct identities that reflect their preferences, goals, or aspirations. Picture 3 shows an avatar customization system in which users create 3D avatars that can be used in virtual environments. A platform with such a system can help teachers realize the benefits described above.

Picture 3 – Avatar Customization System Example, Ready Player Me Avatars

From a theoretical perspective, Sociocultural Theory (Penuel & Wertsch, 1995) positions identity development as a product of interaction within social and cultural contexts. Virtual worlds, by enabling learners to perform new linguistic identities through avatars, create rich opportunities for such development. These environments also align with Starks’ (2014) Cognitive Behavioral Game Design Model, which highlights the importance of role models, personal reflection, and relational engagement. By adopting their desired identities through virtual avatars, learners can engage with the target language more freely.

Secondly, narrative plays a vital role in serious game design (designing games with educational goals), including language learning games, as it facilitates contextual learning and engages learner-users (Prensky, 2001; Reinhardt, 2019; Starks, 2014). It can also function as an evolving interaction pattern in which learners’ choices influence the story’s progression (Prensky, 2001). In immersive virtual environments, teachers may consider developing a narrative and structuring other components around it to enhance contextualized language learning. Inspired by massively multiplayer online role-playing games (MMORPGs) such as World of Warcraft and Guild Wars 2, virtual environments can feature storylines that require learners to use the target language in context to progress.

Theoretically, a narrative may support Nation’s (1996) Four Strands by providing meaning-focused input (listening and reading in context) and meaning-focused output (speaking and writing through tasks and interactions). While writing is often less practical in virtual settings, collaborative story-driven environments can allow learners to practice all four skills: listening, speaking, reading, and writing. Moreover, narratives may lower the affective filter (Krashen, 1982) because they model language use and keep it within a defined context.

Lastly, context is a critical principle in using immersive worlds for language learning, as it directly influences learner engagement and recall (Essoe et al., 2022). Teachers can begin by clarifying the instructional purpose of the virtual environment they plan to use (Reinhardt, 2019) and identifying a thematic setting that best supports their language teaching goals. This theme can guide the selection of environments with appropriate 3D objects (assets) and artistic elements that align with the learning objectives while avoiding unnecessary technical complications. The chosen virtual spaces may not only serve educational goals but also reflect the culture of the target language or situational contexts relevant to communication. Artistic coherence, such as consistent stylistic choices, can enhance immersion by creating visually unified learning environments. Teachers may opt to use a single setting or move between interconnected virtual spaces, depending on the sequence of their lessons and the cultural or linguistic goals they wish to emphasize.

A well-crafted and purpose-driven environment can lead to deeper learner engagement, as students interact with meaningful elements tied to language use. This principle aligns with Starks’ (2014) principles of nature, goals, and outcome expectations. By choosing environments that incorporate cultural and narrative elements, teachers can provide immersive, meaningful learning experiences that support both second language acquisition and intercultural understanding in the target language.

Conclusion

The DIVE-L guideline provides a comprehensive and research-informed foundation for using immersive virtual environments in language learning. It presents six interconnected principles: Affordability, Autonomy, Multi-modality, Technical Interactions, Pedagogical Interactions, and Situated Meaning. These principles draw from established theories in second language acquisition and game-based learning, offering support for the design of immersive, engaging, and pedagogically grounded experiences. Since it is often difficult to address every principle in every context, the guideline allows for flexible interpretation based on instructional goals, learner needs, and the capabilities of the chosen platform. Rather than prescribing a fixed approach, DIVE-L promotes thoughtful and purposeful integration of virtual environments that encourage meaningful language development and active learner engagement.

Guiding Questions

Based on the discussion above, teachers who would like to use virtual worlds can consider the following questions when planning an immersive language learning experience.

Affordability:

Is the device (like a VR headset) affordable?
Is the platform used affordable?

Autonomy:

Can students use the headset easily?
Can students use the target platform or app without the teacher’s help after a training period?
Are students able to guide themselves in the virtual environment?

Multi-modality:

Does the target platform offer different tools like the ability to upload PDFs, import 3D objects, add sounds, voice chat, and text chat?
How can multiple modes (e.g., visual, auditory, kinesthetic) be integrated in the virtual world?
What sensory inputs (sight, sound, touch, movement) are most relevant to the instructional goals?

Technical Interactions:

Is the interface intuitive and easy for learners to navigate?
Can learners access menus, tools, and settings without assistance?
Are the controls responsive and accessible?
Are most parts of the virtual world, including the objects in it, interactive?
Is the interface consistent across different sections of the virtual world?
Does the virtual environment have different modes of communication, such as text-based chat and voice chat, promoting interaction between users?

Pedagogical Interactions:

Is there a task system that allows learners to progress incrementally?
Are there opportunities for collaborative tasks in the virtual environment?
Can learners receive feedback in multiple modes, such as haptic, audio, and visual?

Situated Meaning:

Can learners create and customize their avatars in the virtual world?
Is there a story/theme embedded in the virtual environment?

References

Alfadil, M. (2024). Immersive virtual reality: A novel approach to second language vocabulary acquisition in K-12 education. Sensors, 24(22), 7185. https://doi.org/10.3390/s24227185

Chen, C., & Yuan, Y. (2023). Effectiveness of virtual reality on Chinese as a second language vocabulary learning: Perceptions from international students. Computer Assisted Language Learning, 1–29. https://doi.org/10.1080/09588221.2023.2192770

Dressman, M. (2019). Multimodality and language learning. In M. Dressman & R. W. Sadler (Eds.), The handbook of informal language learning (Chapter 3). Wiley. https://doi.org/10.1002/9781119472384.ch3

Essoe, J. K., Reggente, N., Ohno, A. A., Baek, Y. H., Dell’Italia, J., & Rissman, J. (2022). Enhancing learning and retention with distinctive virtual reality environments and mental context reinstatement. npj Science of Learning, 7(1), 31. https://doi.org/10.1038/s41539-022-00147-6

Kaplan-Rakowski, R., & Gruber, A. (2022). Motivation and reading in high-immersion virtual reality. In B. Arnbjörnsdóttir, B. Bédi, L. Bradley, K. Friðriksdóttir, H. Garðarsdóttir, S. Thouësny, & M. J. Whelpton (Eds.), Intelligent CALL, granular systems, and learner data: Short papers from EUROCALL 2022 (pp. 208–213). Research-publishing.net. https://doi.org/10.14705/rpnet.2022.61.1460

Krashen, S. (1982). Principles and practice in second language acquisition. Prentice-Hall International.

Lemetyinen, H. (2023). Language acquisition theory. Simply Psychology. https://www.simplypsychology.org/language.html

Long, M. H. (1981). Input, interaction, and second language acquisition. Annals of the New York Academy of Sciences, 379, 259–278. https://doi.org/10.1111/j.1749-6632.1981.tb42014.x

McLeod, S. (2024). Piaget’s theory and stages of cognitive development. Simply Psychology. https://www.simplypsychology.org/piaget.html

Nation, I. S. P. (1996). The four strands. Victoria University of Wellington. https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/paul-nations-publications/publications/documents/1996-Four-strands.pdf

Penuel, W. R., & Wertsch, J. V. (1995). Vygotsky and identity formation: A sociocultural approach. Educational Psychologist, 30(2), 83–92. https://doi.org/10.1207/s15326985ep3002

Prensky, M. (2001). Digital game-based learning. McGraw Hill.

Reinhardt, J. (2019). Gameful second and foreign language teaching and learning: Theory, research, and practice. Palgrave Macmillan.

Starks, K. (2014). Cognitive behavioral game design: A unified model for designing serious games. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00028

Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 471–483). Lawrence Erlbaum.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press. https://doi.org/10.2307/j.ctvjf9vz4

Xie, Y., Chen, Y., & Ryder, L. H. (2019). Effects of using mobile-based virtual reality on Chinese L2 students’ oral proficiency. Computer Assisted Language Learning, 34(3), 225–245.

The FLTMAG