July 2018

Google Speech Technology: Gboard & Voice Typing

By Joan Palmiter Bajorek, Doctoral Candidate, University of Arizona, Second Language Acquisition and Teaching (SLAT) Program.

 

With tools that are lightning fast, increasingly accurate, and equipped with sophisticated interfaces, it has never been easier for instructors to get free and fantastic speech technology into the hands of language learners.

In this article, we explore how Google Voice Typing and Google Gboard can be used today to facilitate multilingual language use and learning. Reports indicate that voice typing (dictation) is three times faster than manual keyboard typing (Ong, 2017). It may not be quite as fast for language learners, but it certainly could help them build their proficiency in speaking the new language.

Is speech recognition perfect today? No, of course not. But Google’s speech technology is considered one of the best in the world (Kudryavtsev, 2016; Novet, 2015; Ong, 2017; Tatman, 2017). Google reported 92% accuracy (an 8% word error rate) for its speech recognition technology in 2015 for native speakers (Novet, 2015). As the recent demonstration of Google Duplex for making automated calls to get haircut appointments and the like suggests (Malcolm, 2018), this form of speech recognition is superior to its peers, Amazon’s Alexa and Apple’s Siri. These advances in speech technology overall are thanks to artificial intelligence through machine learning, enhanced processing power, and big, big, big data sets.

Important for those of us who are multilingual, Google supports 119 languages. It covers not only commonly available languages such as English, Spanish, French, and German, but also Arabic, Turkish, and Chinese. Google speech recognition works best for people who have supplied it with lots of previous data, which favors early adopters and affluent populations. The best accuracy rates for speech recognition today are often for white, male, Californian speakers (Hannun, 2017; McMillian, 2011; Nicol et al., 2002; Tatman, 2017).
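
For readers curious about how such accuracy figures are calculated: the 92% accuracy reported by Novet (2015) is the complement of an 8% word error rate. The sketch below shows the standard way a word error rate is computed, by counting the word substitutions, deletions, and insertions needed to turn the recognizer's output into what was actually said. The function name and the sample sentences are illustrative assumptions, not taken from Google's tools or from the cited reports.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: what the speaker said versus what the recognizer wrote.
said = "i would like to book a table for two tonight"
recognized = "i would like to book a cable for two tonight"
print(f"Word error rate: {word_error_rate(said, recognized):.0%}")
```

In this made-up example one word out of ten is wrong, so the word error rate is 10% and the accuracy is 90%.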

Many jokes have been made about exactly how badly these systems perform with native speakers of Scottish English, such as the Scottish comedy sketch about a voice-activated elevator that is unable to understand its passengers (Burnistoun, 2011). But these biases and frustrations aside, if the tech tools are used for learning purposes and not for assessment, they can have immense benefits.

Below are some of the benefits of using Google Voice Typing and Gboard.

TOP BENEFITS FROM SPEECH TECHNOLOGY

  • Individualized Practice and Agency: Learners can work on material in personalized ways and learn more about their own speech habits and tendencies. Beyond common pronunciation difficulties, users can also see how many filler words, such as “um” and “uh,” they produce.
  • Time Immersed: By speaking the language, learners can spend more time thinking and being in an immersed environment. Spending time speaking and interacting with the second language is crucial for acquisition (Ellis & Bogart, 2007).
  • Immediate Feedback: With almost instantaneous processing speed, students get direct feedback about what the speech recognizer did or did not understand.
  • Focus on Speaking: Learners can speak for longer durations than many typical classrooms may support (Bajorek, 2017).
  • Skill Building: Students can see direct connections between their speech and writing skills.
  • Speed: Manually typing can be fatiguing and slow. Speech can be significantly faster because you communicate with your voice rather than your hands.

VOICE AND DICTATION

Let’s be clear that “voice typing” really means dictation. One reason for the rebranding of the term may be that the experience feels different from previous, clunkier versions of dictation tools. Gboard and Voice Typing are sophisticated enough to offer options for what they believe you said, underlining the words and phrases they are less sure about having recognized correctly. You can often specify your dialect, so that the recognizer best matches your voice.

For your smartphone, Google Gboard has its speech technology integrated into a keyboard and search bar (see Picture 1). Gboard is available on iPhone and on Android devices, including Pixel and Samsung smartphones. Users can enable up to three languages of their choosing to toggle between (see Picture 2).

Screenshot of Gboard - the Google Keyboard in device store.
Picture 1: Gboard - the Google Keyboard (Google Play Store, 2018).

With this keyboard, you can voice type, use Google Translate, use emojis, and in general use a keyboard in multimodal ways that currently aren’t supported on many other keyboards. In many ways, Google wants to make its products more accessible in the mode that people are currently using the most, the keyboard on the mobile phone. The individual keys may differ in size from those on the regular keyboard. You might notice some glitches, but you can always go to another keyboard and return to Gboard when you need it.

Gboard is faster and more accurate than Siri: users can easily enable the keyboard, speak what they need to say, and check for any modifications that are needed (see Picture 2). Instead of requiring you to open a new browser to find certain things, Google has made it possible to search from within the keyboard (see Picture 3). Best of all, you can have this keyboard enabled in up to three languages. If part of your life is in one language and part in another, all you need to do is toggle on the keyboard. There are still glitches, such as repeating parts of sentences or accidentally toggling to the wrong language, but the ability to use this to practice your second language is powerful.

Screenshots of Gboard that demonstrates the use of speech recognition, search bar features, and the use of multiple language keyboards.
Pictures 2-
Gboard Speech Recognition on iPhone (left); Gboard Search Bar on iPhone (middle) and Installed Languages on Gboard on iPhone (right).

GOOGLE VOICE TYPING: LAPTOP AND DESKTOP

When you are not on your mobile and potentially need longer sentences and paragraphs, Google Voice Typing is available in Google Docs. It can be found in the “Tools” menu under “Voice typing.” Next, you’ll have to specify the language and give permission for the microphone to listen to you speak. When you enable Voice Typing, you’ll see a large red microphone working on your speech by listening to words in context (see Picture 5). Voice Typing corrects itself as it recognizes words and works out whether they are possessives, adverbs, etc. Most of the time, the words that Voice Typing gets right are regular words like verbs and adjectives, not more difficult content words like the names of places or people. For example, it is not yet sophisticated enough to learn my last name, but it may get more common words and more recognizable names such as Sotomayor, Mother Teresa, and Beyoncé. Another limitation is that this tool requires a significant amount of your computer’s working memory. If it pauses frequently or stops, simply refresh the page or restart the speech recognizer through the red icon that appears on the page (see Picture 5).

There are limitations to what this software can recognize, but when the voice typing fails, there is always the option to type manually. Using it in hybrid ways is clearly the most effective use of the tool.

Screenshot that demonstrates Google Voice Typing in Google Docs.
Picture 3: Google Voice Typing Example.

What seems almost magical is the speed, the accuracy, and the direct link between your voice and the written words on the screen. Only a few years ago, this type of technology cost hundreds of dollars per user. Now, it’s free with a Google account to those with access to technology.

EDUCATIONAL PURPOSES: HOW MIGHT THIS BE USED IN THE CLASSROOM

Speech technology in language learning settings is a relatively new concept, and it isn’t always straightforward how best to integrate it into homework and in-class time. In my experience, the best ways are task-based assignments out of class that inform the discussions and content we are covering. Learning about modals and polite dialogues? I send students in pairs to write, speak, practice, and deliver short dialogues with technology in hand.

In my research about student experiences with speech technology, language learners talk about how different it is to copy and paste answers versus using speech technology tools. For example, here is an excerpt from one of my participants comparing the use of speech technology to traditional homework:

“I feel like that’s a really good way to do it because I feel like a lot of homework, you just end up going to Google Translate and being like “okay and there’s my answer” and or just like copying from the book or whatever. And [with speech technology] you keep doing it until you get it right. I feel like that’s a lot more helpful.”

Instead of a relatively passive experience of copy-paste, speech technology can be a way to engage students through more interactive, active, vocal means. For the student above, the experience with speech technology helped them practice through repetition and meaningful language usage.

LESSON PLAN IDEA WITH GOOGLE VOICE TYPING AND GBOARD

The possibilities for using this technology for language learning and usage purposes are almost limitless. Here are some concrete, task-based examples to spark ideas for integrating speech technology into your classes and the language experiences of your students:

  • Texting and Messaging: Ask students to enable Gboard on their phones. For the span of a week, create exercises where they must text or message each other by using voice-activated messaging in the language being learned. Ask them to notice and identify when the speech recognition writes out different words from what they thought they said, and what modifications to their speech they need to make to be understood by the software.
  • Minimal Pairs: Minimal pairs, a favorite of pronunciation practice material, are words that differ by only a small contrast in a vowel or consonant, and some students appreciate getting lists of them to practice those contrasts. For English, some examples are “Beat” versus “Bit” and “Sheet” versus “Cheat.” When this is made into a game, the speech recognition can give students immediate feedback on their pronunciation.
  • Speeches: Ask students to create speeches and longer spoken material that they can practice by using the Google voice recognition.
  • Register and Genre: Ask students to speak material from different genres such as poems, plays, song lyrics, emails, etc. while using Voice Typing. Ask them to reflect about their experience and how the speech recognition matched or did not match the spoken material they were reading.
  • Tongue Twisters: Give students tongue twisters in different languages and ask them to practice by using Gboard or Google Voice Typing.
  • Essays and Paragraphs: Get students to write a paragraph by using Google Voice Typing. Pair this with a lesson about formulating paragraphs and essay writing. Ask students how dictating using voice typing is different from regular manual typing.
  • Dialogues: Pair students in small groups and ask them to create dialogues by using Google Voice Typing.
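
For instructors or students comfortable with a little scripting, the "notice what the recognizer wrote" step in the activities above can be partly automated. The sketch below is one possible approach, not a feature of Gboard or Voice Typing: it assumes the student pastes in the sentence they intended to say and the text the recognizer produced, and it uses Python's standard difflib module to list the words that differ. The function name and the sample sentences are hypothetical.

```python
import difflib


def word_differences(target: str, transcript: str) -> list[tuple[str, str]]:
    """Return (expected, heard) pairs where the transcript departs from the target."""
    target_words = target.lower().split()
    heard_words = transcript.lower().split()
    matcher = difflib.SequenceMatcher(a=target_words, b=heard_words, autojunk=False)

    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means words were dropped or added.
            expected = " ".join(target_words[i1:i2])
            heard = " ".join(heard_words[j1:j2])
            differences.append((expected, heard))
    return differences


# Hypothetical minimal-pair practice sentence and recognizer output.
intended = "the sheet is on the seat"
recognized = "the cheat is on the seat"
for expected, heard in word_differences(intended, recognized):
    print(f"expected '{expected}' but the recognizer wrote '{heard}'")
```

A mismatch does not necessarily mean the student mispronounced the word, since the recognizer itself makes mistakes, so the output is best treated as a prompt for reflection rather than a grade.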

Hopefully, this list of ideas gives you food for thought about how you could incorporate these tools into your lives and those of your students. I know when I was learning languages more seriously I would have loved these resources. Many times we do not get enough practice in the language classroom speaking and getting instantaneous feedback.

OUT OF YOUR COMFORT ZONE? DIGITAL LITERACIES AND LANGUAGE SKILLS FOR TODAY

Feeling some hesitation? Does using these types of tools feel out of your comfort zone? For those who might feel anxiety and trepidation about the usage of speech technology in education settings, let’s consider some of these potential concerns. Would using speech technology detract from writing and literacy skills? If students are using speech technology while they write, are they still practicing writing skills?

Let’s talk about digital literacies and the goal of language education. Today in 2018, I want to empower my learners to be active participants in their language communities. They need to be able to speak, listen, read, write, type, and convey their thoughts effectively in the new language. This includes being able to type and use technology. This is part of their overall literacy. Digital literacies include the ability to read and create emails, blogs, text messages, social media posts, memes, etc. Your students might learn new language strategies by using voice typing in conjunction with their writing assignments. Is this skill set identical to the one used when you write with pen and paper? Nope. But it might be more in line with how students are interacting with language communities in the real world today.

Here’s an example of the importance of digital literacies. I moved to France in 2013 and had just accepted a new job. I had some questions and needed to send an email to my new boss. I had been learning French for 11 years. I had a French BA. I could read Proust and Madame de la Fayette and write persuasive essays. I could debate fluently in French.

But I could not write an email. Never in my life had I written a professional email in French, and I had no idea where to begin. What was the correct greeting? Would I need to use “vous” or “tu”? Should I make sure to use the subjunctive or conditional to express the level of formality and deference required between an employee and a boss? How should I sign off?

It was embarrassing that I had absolutely no clue! I did not want to be mistaken for being rude. A faux pas could have been detrimental to my career. It eventually worked out, but it was a nerve-wracking time.

I wish my instructors and professors during those 11 years of study had been more open to new approaches for learning, and to a wider definition of literacy that included digital platforms, genres, and registers. Knowledge of a language is the ability to use it in context with others in the language community.

Now, not all language students might need to learn how to write a professional email. However, this does not make skills of digital literacy any less valuable. The ability to convey thoughts digitally is exceptionally important as learners participate in a broadening digital landscape.

FINAL NOTES

Lastly, I will leave you with the fact that most of this article was “written” using Google Voice Typing. This takes the capacities of Google Docs to another level. Consider how this may change your writing style, how it may broaden your knowledge of digital genres, and how future technology may increasingly blur the lines between typing, dictating, and speaking. Some theorize that these technologies will make manual typing obsolete. I know many people who can’t wait for this change. Although there are still some features that need improvement, the tools are at a great first stage.

It is a wonderful time for all of us to practice speaking and to understand how technology can better help us communicate efficiently and effectively in many languages.

Note: For more research and expanded literature review of contemporary speech technology, look for an upcoming Cambridge University Press, Online Language Learning Research Network (OLLReN) publication, Bajorek, 2018: “Speech Technology for Language Learning: Research & Today’s Tools.”

 

REFERENCES

Bajorek, J. P. (2017). L2 Pronunciation Tools: The Unrealized Potential of Prominent Computer-assisted Language Learning Software. Issues and Trends in Educational Technology, 5(2), 60-87. https://journals.uair.arizona.edu/index.php/itet/article/view/20140/21378

Burnistoun (2011). Scottish Elevator With Voice Recognition (with subtitles). https://www.youtube.com/watch?v=BOUTfUmI8vs

Ellis, N., & Bogart, P. (2007). Speech and Language Technology in Education: the perspective from SLA research and practice. Paper presented at the Proceedings ISCA ITRW SLaTE, Farmington PA.

Google Play Store (2018). Gboard. https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin&hl=en

Hannun, A. (2017). Speech Recognition Is Not Solved. https://awni.github.io/speech-recognition/#fn:data_details

Kudryavtsev, A. (2016). Automatic Speech Recognition Services Comparison. Retrieved from http://blog-archive.griddynamics.com/2016/01/automatic-speech-recognition-services.html

Malcolm, R. (2018). Google Duplex proves human language is the only API that matters. AI. Retrieved from https://venturebeat.com/2018/06/13/google-duplex-proves-human-language-is-the-only-api-that-matters/

McMillian, G. (2011). It’s Not You, It’s It: Voice Recognition Doesn’t Recognize Women. Tech Land. Retrieved from http://techland.time.com/2011/06/01/its-not-you-its-it-voicerecognition-doesnt-recognize-women/

Novet, J. (2015). Google says its speech recognition technology now has only an 8% word error rate. Big Data. Retrieved from https://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/

Ong, T. (2017). Google now recognizes 119 languages for voice-to-text dictation. Tech. Retrieved from https://www.theverge.com/2017/8/14/16142786/google-recognises-119-languages-dictation-voice-typing

Tatman, R. (2017). Gender and Dialect Bias in YouTube’s Automatic Captions. EACL 2017, 53. http://www.ethicsinnlp.org/workshop/pdf/EthNLP06.pdf

2 thoughts on “Google Speech Technology: Gboard & Voice Typing”

  • Are you certain that Google’s Gboard, when used on an iPhone, is actually using Google voice recognition and not Siri? When I tried Gboard a couple of years ago on my iPhone, I didn’t think that the voice recognition was any better than Siri’s. It certainly didn’t seem as good as voice recognition on my Google iPhone app.
