Audio-Lingua: Searchable Audio Database of Native Speaker Recordings for Your Language Class

By Adam Gacs, Instructor & Technology Specialist, Michigan State University

The name Audio-Lingua might immediately bring to many language teachers’ minds the once-popular but today largely shunned teaching method of the late 1950s and 1960s called audiolingualism, which favored dialogue drills and accuracy through memorization. The website Audio-Lingua, however, does not bear any explicit connection to this approach. It is a resource that I recommend wholeheartedly.

Audio-Lingua, available at https://www.audio-lingua.eu, is a collaborative repository and sound bank of native speaker audio files that users can listen to on the website, download, and use for personal or pedagogical purposes according to the terms and conditions specified. The goal of the service is to improve spoken language comprehension by providing audio assets for teaching purposes.

The website originated with the Academic Delegation of Digital Education in the Académie of Versailles and was first published in 2007. It includes almost 6500 audio files in 13 languages (at the time of writing), and new files are added on a continuous basis (about 500 new mp3s have been added in the last year, for example). English has the most audio files (1500), followed by roughly equal numbers for Spanish, German, and French (around 1000 each). Also included are Russian (600), Italian (450), Portuguese (400), Chinese (200), Occitan (160), Catalan (100), Corsican (100), Arabic (60), and Guadeloupean Creole (50). The site language can be changed to any of these 13 languages. I used the English interface to explore this resource.

Exploring the Database: Topics or Languages as Starting Points

Picture 1 – Screenshot of the website homepage

The homepage gives direct access to mp3 audio files in all 13 languages, quick and advanced search options, the top 10 best-rated audio recordings from across all languages, and a box with the latest recordings. To explore the database, the user may start by selecting one of the 13 languages. The page displaying all search results in a language can seem a bit overwhelming at first, with its multiple boxes of extra information and its large number of search results (see Picture 2 for English). The audio recordings in the database are mostly spontaneous monologues (though some may be scripted), and a few are interviews. Users commonly upload multiple audio assets. Each audio asset is displayed in its own box with rich metadata (see Picture 2, left column). All audio assets are tagged with the following information, which also appears in the advanced search box on the right: language, level (Common European Framework of Reference, A1-C2), gender, age (child, teenager, adult, senior citizen), and length (from 30 seconds to more than 180 seconds). It is also possible to rate the audio assets (five-star system, no comments allowed) and read short descriptions provided by the uploader. One can furthermore download the audio files (mp3, 128 kbit/s bit rate), embed them, copy their URL, or scan a QR code to quickly play the audio resource on mobile devices. Search results pages also have randomly generated additional boxes, such as the Top Ten box, which lists the pages and assets rated highest by users for that language. All of this encourages continued engagement and random discovery of audio assets.

Picture 2 – All results page for English audio resources

Another way to explore the database is a targeted search using the quick search box on the right. One can search for any expression that may be part of the audio asset description (there are no full transcripts). I performed a quick search on the keyword “Brexit” for English level B1 and found 19 results, the most audio results on this topic for any level. A third way of engaging with the database is browsing the topic tags. All topics, found on the top navigation bar, are tags and keywords in the selected display language that bring up audio files in any language matching that topic tag (see Picture 3). The results can then be filtered further by language. Most audio assets across all languages seem to be tagged under hobbies (495), food (468), introduce yourself (467), school (457), and towns (424)/my town (351), followed by sport (330), history (326), and my tastes (279).

Picture 3 – All topics page containing tags in English

Browsing German Audio Assets According to Levels

I performed a search of German audio assets according to language level to demonstrate the variety of topics for this language and the distribution of assets for each level.  

For CEFR level A1 (about 15% of available German audio, 150 files) one can find self-introductions with physical descriptions, eating habits, favorite hobbies, daily schedules, family descriptions, and vacation reports. These files tend to be under one minute. I found some recipe descriptions that perhaps should have been tagged for higher levels. But we should also remember that A1 would probably correspond to Novice High on the ACTFL scale (such correspondence has not been officially established, but see “Assigning CEFR ratings to ACTFL assessments” and Goertler, Kraemer, and Schenker (2016) in the references for possible equivalencies).

For CEFR level A2 (possibly ACTFL Level Intermediate Low/Mid; about 25% of available German audio, 245 files) we can find more introductions, foods and recipes, some city descriptions, means of transportation, school schedules, holidays, future plans, soccer teams, social network app descriptions, but also personal reminiscences about a few historical events (e.g. the fall of the Berlin Wall). The audio files can be up to two minutes long. Some more advanced topics appear, but these recordings are short, e.g. nuclear energy, smoking, film and book reviews.

CEFR Level B1 (possibly ACTFL Level Intermediate Mid/High; about 36% of available German audio, 355 files) is the most varied level in terms of topics. It has longer city descriptions, personal memories of holidays, hypothetical situations, more historical topics (GDR and reunification), more detailed recipes, recordings on language and dialects, bilingualism, Austria and Switzerland, and also some film and TV show reviews and information about music festivals. Overall, the past tense is featured much more prominently.

Very few files seem to be outdated, such as predictions specific to the 2010 World Cup or information recorded in 2010 on compulsory military service (which was changed in 2011) and listening to music via an iPod Touch. Somewhat less relevant for a global audience may be specific audio assets about German-French cooperative projects and school exchanges. I found some topics to be a bit challenging for this level, e.g. the Arab Spring, renewable energies, and waste sorting. One unique kind of audio resource is actual museum personnel highlighting events in their parks or collections (2-3 files).

CEFR Level B2 (possibly ACTFL Level Advanced Low/Mid; about 19% of available German audio, 189 files) contains some more advanced topics, such as election systems and political parties (the Pirate Party), jobs and internships, national identity, energy-efficient buildings (e.g. passive houses), fair trade, some more detailed book and film reviews, and historical figures and events (e.g. the Berlin airlift). There seem to be more texts at this level that are scripted and read aloud, and thus resemble written texts with no spontaneous formulations.

CEFR Level C1 (possibly ACTFL Level Advanced High) contains just 24 audio assets (2%) and C2 just 9 audio files as of this writing. There is a bit of history about the Fraunhofer Society, which originally invented and patented the mp3 compression for audio files. Many of these recordings are spoken in a specific regional dialect, but just one is in Swiss German. (Interestingly, there are no C-level audio resources for English; the majority of English-language files are at either A2 or B1 level.)
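As a quick check on the level distribution described above, the per-level counts can be tallied in a few lines of Python. This is only a sketch: the counts are the ones reported in this section at the time of writing, and the variable and function names are illustrative.

```python
# Counts of German audio files per CEFR level, as reported in the text above.
german_counts = {"A1": 150, "A2": 245, "B1": 355, "B2": 189, "C1": 24, "C2": 9}


def level_shares(counts):
    """Return each level's share of the total number of files, in percent."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


shares = level_shares(german_counts)
for level, pct in shares.items():
    print(f"{level}: {german_counts[level]:>3} files, {pct:.1f}%")
```

The six counts add up to 972 files, in line with the "around 1000" German files mentioned earlier, and B1 comes out as the largest level.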

Ideas for Classroom Use and Beyond: Listening Comprehension and Varied Input Sources

Many language instructors have moved towards using audiovisual clips for developing, training, and assessing listening comprehension. Textbook listening activities may eventually become outdated or may not mirror real-life spontaneous language use as desired. Thus, video clips hosted on popular video platforms have begun to take the place of audio-only clips (a curated example of native speaker videos for Russian may be found in the previous issue of the FLTMAG). However, several proficiency tests still have audio-only components, and students still need to practice the audio-only listening mode. While the popularity of podcasts and audiobooks has somewhat brought back on-demand audio-only listening, their length is often a problem for language teaching purposes. The clips on Audio-Lingua offer flexibility and a selection unmatched by any other platform for finding audio-based resources around which one can create tasks right away without much modification. Another advantage is that users can find several speakers talking about the same or closely related topics, so students can listen for similarities and differences and use the recordings as models for their own expressions. Students can also perform searches based on predefined topics and bring to class summaries or perhaps even some listening comprehension activities or expansion tasks (e.g. speculate about the speaker, reflect on what you would like to know further, etc.). The database covers topics rarely represented in many traditional textbooks and is a welcome addition to any curriculum hoping to offer a more up-to-date picture of (in the case of my analysis) modern German-speaking countries, e.g. being vegan, sustainability, social media, smartphones, the voluntary social/ecological year, and equal pay day.

Students could also transcribe all or parts of the audio assets. It would also be advisable to make full transcripts of these audio files available to students who might benefit from such an accommodation or who would like to verify their understanding after several tries. Audio-Lingua is already a fantastic resource for language teachers, but a few improvements could be suggested: full audio transcripts with searchable keywords, and more context on the audio assets, tasks, and speakers. Some suggested ideas or shared lesson plans involving the audio materials would also be welcome. Finally, the rating system could be improved by letting users indicate through comments why they assigned low ratings. The reason usually seems to be audio quality, but in other cases the user may be left wondering why the ratings were low.

Expanding the Database

Check Newest Submissions

On each search page, visitors can find podcast feeds to subscribe to the newest mp3 files via iTunes or RSS. By registering for an account, users can organize their favorite mp3 resources as lists (podcast flux) and even publish them as RSS feeds, either all together or by level (to accompany a language course, for example). Using this approach, users can download multiple favorite mp3 files at once with the podcast aggregator of their choice.
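For readers who prefer a script to a podcast aggregator, the feed-based approach described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions: it assumes the feed is a standard RSS 2.0 document in which each item points to its mp3 file through an `<enclosure>` element, which I have not verified against the actual Audio-Lingua feeds. The sample feed, its titles, and the example.org URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed for illustration only; not copied from Audio-Lingua.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example favorites list</title>
    <item>
      <title>Anna stellt sich vor</title>
      <enclosure url="https://example.org/audio/anna.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>News item without audio</title>
    </item>
    <item>
      <title>Mein Lieblingsrezept</title>
      <enclosure url="https://example.org/audio/rezept.mp3" type="audio/mpeg" length="0"/>
    </item>
  </channel>
</rss>
"""


def mp3_enclosures(rss_text):
    """Return (title, url) pairs for every feed item with an mp3 enclosure."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item carries no attached file
        url = enclosure.get("url", "")
        if url.lower().endswith(".mp3"):
            title = item.findtext("title", default="").strip()
            results.append((title, url))
    return results


for title, url in mp3_enclosures(SAMPLE_FEED):
    print(f"{title}: {url}")
```

The function only lists the files; fetching them is left out on purpose, since any reuse should follow the site's terms and conditions discussed below.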

Submitting your own recordings does not require an account. There is a detailed legal notice and practical information for publishing documents, e.g. no copyrighted materials may be submitted. The editorial team checks all submissions. The language level of a recording is assigned by the team, but the uploader may suggest one.

Terms and Conditions

The website thankfully carries a Creative Commons license for its audio assets, indicated under each resource, but the site’s privacy policy does not make any mention of CC licenses and emphasizes that the actual recordings submitted by users are copyrighted and can only be used in personal, professional, or educational contexts and not for advertising or commercial purposes. The sentence “The publication of Audio-Lingua resources on other websites is forbidden” stands in contrast with the Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license of the individual assets, which lets users copy and redistribute the material in any medium or format and also remix, transform, and build upon the material, as long as the original source is credited and the new non-commercial materials carry the same license as the original files. The legal notice and practical information for publishing documents, which users need to read before submitting their own resources, does mention the Creative Commons license. I hope the database editors will also display the Creative Commons license in the privacy policy in the future, both for consistency and for the peace of mind of users (who might not read the legal notice). This could lead to wider global usage and awareness of this otherwise excellent resource.

References

Assigning CEFR ratings to ACTFL assessments. Retrieved March 9, 2020.
Audio-Lingua: https://www.audio-lingua.eu
Goertler, S., Kraemer, A., & Schenker, T. (2016). Setting Evidence-Based Language Goals. Foreign Language Annals, 49(3), 434-454.
The CEFR Levels
The Russian Voices Project: Native-Speaker Interviews in the Foreign Language Classroom, FLTMAG, November 2019
