Wednesday, August 2, 2017

Using Corpus Linguistics

Tamara Jones is an ESL Instructor at Howard Community College, Columbia, Maryland

If you’ve been reading this blog for a while, you might know that I am a conference junkie. While there are plenty of great local and national conferences to choose from, my favorite is the TESOL International Convention. I love flipping through the program, attending the sessions, wandering through the publishers’ exhibits, and seeing colleagues from all over the world. Every year, I try to attend at least one session on a topic I know absolutely nothing about. Sometimes I don’t even know the keywords in the description.

Corpus What?

That was the case many years ago at my first-ever TESOL Conference, when I attended a session on Corpus Linguistics. The speaker was Victoria Clark, and, at the risk of being overly dramatic, it was life-changing. Or at least it was work-changing. She talked about how textbooks (back in those days, anyway) rarely contained language that reflected how people really speak. She gave the example of one of the most basic and common exchanges: “Thank you.” / “You’re welcome.” Nothing too controversial, right? Except that when we use Corpus Linguistics research to analyze what we actually say in response to “Thank you,” we learn that we are more likely to say things like “No problem,” “Have a good day,” or “Sure.” In fact, “You’re welcome” is really low on the list, even below no response at all! As I walked out of the session, I resolved to think more critically about language and about whether what I think I say is actually what I say.

Resources for Teachers and Students

Put simply, Corpus Linguistics is the study of language through the analysis of large databases (corpora) of naturally occurring language. The good news is that the data collection has already been done for us: there are wonderful existing collections of language. The Michigan Corpus of Academic Spoken English (MICASE), for example, is an excellent collection of transcripts of academic speech events recorded at the University of Michigan. The Corpus of Contemporary American English (COCA) is a 520-million-word corpus of written and spoken language drawn from TV, radio, fiction, popular magazines, newspapers, and academic texts. The Academic Corpus, developed in New Zealand, contains approximately 3.5 million words from four subject areas.

Linguists and teachers can use concordancers to sift through these collections and highlight patterns in language. In other words, a concordancer can show you which words most commonly come before and after a given word. This comes in handy when students ask those really tough questions to which we are inclined to respond, “We just say it that way.” Words like “become” and “turn,” for example, are sometimes virtually synonymous; the real difference lies in the words that come before or after them.

So if a student wants to know when to use “become” and when to use “turn,” you can go to the online corpus of your choice, say COCA, type in the two words you want to compare, hit Enter, and watch the magic happen.
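COCA does all of this for you in the browser, but if you’re curious about what a concordancer is doing behind the scenes, here is a minimal key-word-in-context (KWIC) sketch in Python. It is only an illustration under simple assumptions: the function names `tokenize` and `kwic` are my own, and the “corpus” is four invented sentences rather than real corpus data. Real tools work with millions of words and offer many more search options.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, window=3):
    """Return (left context, node word, right context) for every occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented sample sentences, for illustration only (not taken from COCA).
sample = ("The leaves turn red in the fall. She wants to become a nurse. "
          "The milk will turn sour. It can become difficult to decide.")

for left, node, right in kwic(tokenize(sample), "turn"):
    # Right-align the left context so the search word lines up in a column.
    print(f"{left:>25}  [{node}]  {right}")
```

Each line shows the search word in brackets with up to three words of context on either side. This simple version lets the context window run across sentence boundaries, which is one of many details a real concordancer handles more carefully.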


Practical Suggestions

You might be thinking, “Well, this is all well and good, but aside from seeing words in about a billion different contexts, how useful is this for me?” My supervisor addressed this very question at an in-house PD session a few years ago (Woo, 2012), and she had several good suggestions.

Comparing Similar Words – Students are justifiably confused by the many near-synonyms in English. For instance, what’s the difference between “request” and “ask”? What about “hope” and “wish”? They have pretty much the same meaning, really, but the words that precede and follow them are different. You can ask a favor, but we don’t really request a favor. You can make a request, but you can’t make an ask. Teachers can use a concordancer to give students lists of the words commonly associated with each synonym. Now, I probably wouldn’t send students off with an internet link and expect them to discern the patterns themselves, but I might consult a corpus to create lists of the most common phrases containing each word so that students can compare them.
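To make the “compare the neighbors” idea concrete, here is a small Python sketch that counts the words directly before and after two near-synonyms. It assumes a tiny set of invented sentences in place of a real corpus, and the helper names `tokenize` and `neighbors` are hypothetical; with real corpus data the lists would be much longer and more telling.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def neighbors(tokens, node, offset):
    """Count the words found `offset` positions away from each occurrence of node
    (offset=1 is the next word, offset=-1 is the previous word)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == node and 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    return counts

# Invented sample sentences, for illustration only (not real corpus data).
sample = ("Can I ask a favor? I want to ask a question. "
          "Please make a request in writing. We received a request for help. "
          "You may request a refund.")
tokens = tokenize(sample)

# Print the most frequent left and right neighbors of each word side by side.
for word in ("ask", "request"):
    print(word, "| before:", neighbors(tokens, word, -1).most_common(3),
          "| after:", neighbors(tokens, word, 1).most_common(3))
```

Even in this toy sample, “request” is often preceded by “a” (as in “make a request”), while “ask” is preceded by words like “to” and “I”, which is exactly the kind of contrast a side-by-side list can show students.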

Mastering Collocations – This is a biggie for intermediate students. As I detailed in a previous blog post, “The Flat Bits in the Middle,” Richards (2008) argues that one of the major hurdles for students who want to become fluent speakers of English is mastering collocations. Some words just go together in English. An example of a collocation is “winding road.” We don’t usually say “twisty road” or “curly road,” though our students might. To help students learn these challenging collocations and set phrases, teachers can use concordancer results to compile lists of common collocations for students to learn along with new vocabulary words. A few years after my eye-opening experience at Clark’s (2001) TESOL presentation, I attended a great session at an IATEFL conference in which the speaker, Ken Lackman, described some fun games to help students learn collocations (Lackman, 2010). Rather than describe them here, I suggest that if you are interested in helping students learn collocations, you check out his wonderful resource, Classroom Games from Corpora.
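One wrinkle when compiling collocation lists from raw counts is that very common words like “the” show up next to almost everything. Corpus tools often rank collocates with an association score instead; the Python sketch below uses one such score, pointwise mutual information (PMI), chosen here for illustration rather than taken from Woo (2012) or Lackman (2010). The function name `left_collocates_pmi`, the `min_count` cutoff, and the invented sample sentences are all my own assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def left_collocates_pmi(tokens, node, min_count=2):
    """Rank words that appear directly before `node` by pointwise mutual information:
    PMI(w, node) = log2(P(w, node) / (P(w) * P(node))), with all probabilities
    estimated as counts divided by the total number of tokens N."""
    n = len(tokens)
    word_counts = Counter(tokens)
    # How often each word occurs immediately before the node word.
    pair_counts = Counter(tokens[i - 1] for i in range(1, n) if tokens[i] == node)
    scores = {}
    for w, pair in pair_counts.items():
        if pair >= min_count:  # skip one-off pairings, which PMI tends to overrate
            scores[w] = math.log2(pair * n / (word_counts[w] * word_counts[node]))
    # Highest association first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented sample sentences, for illustration only (not real corpus data).
sample = ("We drove down a winding road. The winding road led to the coast. "
          "The road was long. The map showed the road and the river. "
          "A winding path led to the house.")
tokens = tokenize(sample)

for word, score in left_collocates_pmi(tokens, "road"):
    print(f"{word:<10} {score:.2f}")
```

In this toy text “the” and “winding” each appear before “road” twice, but “winding” ranks higher because “the” is so frequent everywhere else. That is the intuition behind using an association score rather than raw frequency when building a collocation list.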

In spite of these wonderful resources and teaching ideas, I’ll admit that I am a bit intimidated by the prospect of using Corpus Linguistics in my classroom. It just seems too labor-intensive to create all these materials myself. That’s why I am very excited that ESL authors are increasingly including corpus-based vocabulary lists and activities in their books.

Clark, V. (2001). Corpus Linguistics for Teachers and Material Writers. Paper presented at the 35th TESOL International Convention and English Language Expo, St. Louis, MO.
Lackman, K. (2010). Classroom Games from Corpora. Paper presented at the 44th Annual International IATEFL Conference and Exhibition, Harrogate, UK.
Richards, J. (2008). Moving Beyond the Plateau: From Intermediate to Advanced Levels in Language Learning. Cambridge, UK: Cambridge University Press.
Woo, M. (2012). Using Corpus Linguistics in the Classroom [PowerPoint presentation].
