Corpus Linguistics: Opening the Doors to Creative
Language Teaching
People learn new languages for a variety of reasons. Maria moved to Paris to start a new life and needed
to learn French. Mark wanted to get a better job in his hometown of Houston and decided to become bilingual in order to do so. A foreign language was
required for Jennifer's degree, so she took Spanish classes. Whatever
the reason, people who learn languages need to be able to read, write, speak, and understand the target language. To achieve this, they must be familiar with the vocabulary prevalent in the environments in which they will
be engaged. Multimedia in the form of corpora, and specifically TalkBank, may
be able to help learners accomplish their language learning goals.
The contexts in which language is used are wide-ranging and variable. A
learner may need communicative vocabulary in order to interact with a doctor, or ESP (English for Specific Purposes) for academic
or career goals. Within the realm of ESP, the set of vocabulary needed by a chemist
will differ from that of a mechanic. Obviously these two learners will have different
motives for learning the target language. In addition to the difference in field
terminology, a mechanic may need to be more comfortable with spoken than written language since he works closely with others, whereas a chemist, who works primarily alone and with written data, needs
a comfortable working knowledge of written words and phrases related to his field. To
complicate things further, vocabulary is also affected by dialect. Therefore,
the region in which a learner will be living or working is another important factor to keep in mind when selecting vocabulary
words.
How
and where do teachers and learners find modern lists of vocabulary words that are used by real people in their target fields
and regions? Much research is being done in corpus linguistics, the study of corpora: bodies of
text grouped together according to topic or source. There are many aspects of
corpora that make them useful for language learners and teachers.
The
fact that they are online makes them available to anyone who has Internet access. Concordancers
are useful tools associated with corpora. A concordancer is a computer program that searches the texts in a corpus for a given word; its
output, a concordance, shows every occurrence of that word along with the words that surround it, which makes the word's collocations easy to spot. Collocations are words that frequently co-occur (Schmitt). For example, the word "her" in the nursery rhyme "Little Miss Muffet"
appears in "eating her curds and whey" and "sat down beside her." If this search were run
through a concordancer, the word "her" would be listed down the center of the page, with the words that precede and follow it in
the selected text(s) shown on each line.
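The concordance layout described above can be sketched in a few lines of Python. This is only a minimal illustration, not the program used by any particular corpus site: the function names `concordance` and `format_concordance`, the three-word context window, and the wording of the rhyme are all assumptions made for the example.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words kept on each side (an assumed default).
    """
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

def format_concordance(lines):
    """Right-align the left context so the keyword forms a center column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return "\n".join(
        f"{left:>{pad}}  {word}  {right}" for left, word, right in lines
    )

# One common wording of the rhyme, used here as a one-text "corpus".
RHYME = (
    "Little Miss Muffet sat on a tuffet, eating her curds and whey. "
    "Along came a spider, who sat down beside her "
    "and frightened Miss Muffet away."
)

hits = concordance(RHYME, "her")
print(format_concordance(hits))
```

Each printed line keeps "her" in the same column, so a learner can scan down the page and compare what comes before and after the word in each occurrence.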
There
are biological, psychological, and pedagogical reasons for learning vocabulary in the collocational form as opposed to learning
single-word items. First, the brain stores items most effectively in chunks,
or bits of information stored together. For example, numbers are
remembered best in three- and four-digit chunks, hence the structure of American
telephone numbers. Words can be learned in a similar manner. We remember lexical chunks such as "red balloon," "happily ever after," and "wit's end."
A large portion of our conversational speech is organized in chunks. We often use clichés such as "Make yourself comfortable." On
a daily basis we say things like "Have a nice day," "See you later," and "Send her my best."
Although there is grammatical structure present in these phrases, it is apparent that learning them as multiple-word
units that contain meaning as a whole can lead to effective communication.
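One rough way to see such chunks in data is to count word sequences that recur across utterances. The sketch below is an assumption-laden illustration rather than an established method from the literature cited here: the function name `recurring_chunks`, the three-word chunk length, the threshold of two occurrences, and the sample utterances are all invented for the example.

```python
from collections import Counter
import re

def recurring_chunks(utterances, n=3, min_count=2):
    """Count n-word sequences and keep those seen at least `min_count` times."""
    counts = Counter()
    for utterance in utterances:
        words = re.findall(r"[a-z']+", utterance.lower())
        # Slide an n-word window across the utterance.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {chunk: c for chunk, c in counts.items() if c >= min_count}

# Invented utterances built around the everyday phrases mentioned above.
SAMPLE = [
    "Have a nice day.",
    "See you later.",
    "Thanks, have a nice day!",
    "OK, see you later then.",
    "Send her my best.",
]

chunks = recurring_chunks(SAMPLE)
print(chunks)
```

With a real corpus of conversations in place of the invented sample, the same count would surface the multiple-word units that speakers actually reuse.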
According
to Paul Nation, learning whole chunks lightens the learning burden, that is, the amount of effort needed to learn something completely. Since the brain stores things most efficiently in chunks, it makes sense to learn
a language in the chunks in which it is used. This is somewhat controversial,
because if teachers teach chunks, it is unclear whether learners will develop the ability to manipulate
the language's grammar freely. In addition, there is debate about whether non-native speakers,
who lack the same cultural background, can make the same associations through chunks as native speakers. However, as we'll see later, a good deal of research supports chunking as a successful approach
to language learning. First, let us take a closer look at corpora.
In
the past, texts were entered into corpora by hand, which was time-consuming and limited the length and variety of texts
electronically available. Now, the use of modern technology makes it possible
to create corpora from a limitless variety of texts. Researchers and writers
in the Linguistics and TESOL (Teachers of English to Speakers of Other Languages) fields agree that research on and use of
electronic data and online corpora are making multimedia in language learning a hot topic, and that resources
for teachers and learners are more attainable than ever before.
In his article, "Computer-Assisted Language Learning: Current Programs and
Projects," Chris Higgins reviews other types of CALL (Computer-Assisted Language Learning) that are being used in language
classrooms. There are software programs that simulate real-life experiences,
such as "Where in the World is Carmen Sandiego?" Teachers can also use
authoring programs to create their own interactive language materials for students.
Also available are LANs (Local Area Networks) that make intra-classroom communication possible. The use of corpora is relatively new compared to the aforementioned options.
For the purpose of this paper, I will examine TalkBank, a new online research project
funded by the National Science Foundation and founded by Brian MacWhinney of Carnegie Mellon
University and the University of Pennsylvania. The site is still very new but growing rapidly and increasing in diversity of
content literally every day. The TalkBank website covers a wide range of linguistically
related resources. For language learners, however, I am concerned predominantly
with two sections in the site. They are titled "Classroom Discourse" and "Conversations."
"Classroom Discourse" is a corpus of written texts for academic purposes. Users can read texts written by native speakers on a variety of topics. For example, if a zoology student who is a non-native speaker of English wants to study the vocabulary commonly used
in her field, she can peruse a lesson on camels in this section of TalkBank. She
can read the transcripts and even watch movies.
It is beneficial for learners to listen to and read discourse produced by native
speakers, according to Alejandro Curado Fuentes. He suggests activities such
as having learners study the concordances and construct phrases similar to those they find in corpora. They can also analyze the parts of speech used and break down the grammatical structures of phrases. Nation agrees that this type of exercise can help learners manipulate lexical
data and incorporate the vocabulary at hand into their existing framework of the target language.
The "Conversations" section of TalkBank contains a sub-section called "CallFriend"
which is a collection of spoken material. The TalkBank administrators received
permission from individuals all over the world to record their telephone conversations. The conversations are stored in this section of the TalkBank website and are easy to access. The speakers are related in a variety of ways, and the conversations cover a wide range of topics. For example, in a file in the Southern
U.S. directory you can find a conversation between a father
and son discussing deer hunting. In Pennsylvania, two
old friends catch up on each other's lives. The conversations are in English, Japanese,
and Spanish and are organized by dialect and region. A user can listen to English
conversations spoken by native speakers from every region of the U.S. This type of corpus can be very useful for a learner who is preparing to move to
a specific area.
Multimedia resources such as those made available by TalkBank are practical
for a language learner's phonological development of vocabulary. Non-native speakers
can listen to conversations to improve their listening skills and pronunciation. To
supplement a listening experience, the learner can open the transcript file that corresponds to the conversation to which
she is listening. This will allow her to match the appearance of a word to its
sound.
There
are many other facets to the "Conversations" section of TalkBank, and the number is growing rapidly, so I will mention only a few. In the
"Nixon" section, you can listen to Nixon and Haldeman discuss details of the Watergate Scandal. "Free Lunch" contains video-recorded interaction between students who were offered free pizza in exchange
for their contribution to linguistic research. "Class Projects" is a substantial compilation of discussions by international
students on a variety of topics in their native languages.
TalkBank includes videos filmed from an array of contexts that can be easily
downloaded and viewed. There are films of student interaction in the classroom
as well as of indigenous speakers from all over the world. I would assume
that more videos will continue to be added as contributions are made to the TalkBank library.
Videos are helpful to learners because they can simulate real-life listening experiences while giving the user control
of the medium. In other words, learners can stop and replay as needed to double-check
the pronunciation of a word.
It is widely agreed by experts in the TESOL community that there is room for
much more research on the use and effectiveness of corpora such as TalkBank. Laura
Gavioli and Guy Aston believe that corpora can indeed be a useful tool for learners if the activities are selected and presented
carefully by the teacher. Just as there are graded readers, there can be graded
corpora, according to Gavioli. This would make it easier for teachers and students
to choose corpora appropriate for the learners' proficiency level and allow them to move up as they improve.
Based on an ESP study performed on undergraduate law students, Jean-Jacques
Weber suggests an activity to strengthen academic writing. This is recommended
for advanced speakers. Students extract from a corpus a variety of content-specific
texts written by native speakers. Their agenda while reading should be to identify
the most important elements of a well-written essay. Once they've done so, they
meet in groups to share and compare their ideas. They then come up with a collaborative
list of essentials that represent the group's findings. Students then
re-read the same articles and pick out the previously identified qualities. Weber
notes that after the assignment was completed, the students' increased self-confidence led to noticeable improvements in
their own essays.
Fuentes emphasizes the usefulness of corpora and concordancing in distinguishing
between technical and academic vocabulary. It is important for
university-level learners to be able to tell the two apart. According
to Nation, technical vocabulary refers to words, or senses of words, that are limited to one particular field. Academic vocabulary refers to words such as "essential" and "variable" that are common to a variety of
academic topics. Fuentes suggests assignments that encourage students to compare
phrases that contain technical vocabulary with phrases that contain academic vocabulary. Selected
words and phrases can also be entered into a concordancer. The collocations will
help students put the words' meanings into perspective. Analyzing the collocations
of technical and academic vocabulary side by side will make the gap between the two more visible.
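The distinction can also be approximated by checking how many fields a word turns up in. The sketch below is a rough heuristic under stated assumptions, not the procedure used by Nation or Fuentes: a word found in only one field's texts is treated as technical, and a word found in more than one as academic. The function names and the two one-sentence "field corpora" are invented for the example.

```python
import re

def vocabulary_range(field_texts):
    """Map each word to the set of fields whose sample text contains it."""
    fields_by_word = {}
    for field, text in field_texts.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            fields_by_word.setdefault(word, set()).add(field)
    return fields_by_word

def split_by_range(field_texts, candidates):
    """Split candidate words into (technical, academic) by number of fields."""
    ranges = vocabulary_range(field_texts)
    technical = sorted(w for w in candidates if len(ranges.get(w, ())) == 1)
    academic = sorted(w for w in candidates if len(ranges.get(w, ())) > 1)
    return technical, academic

# Invented one-sentence samples standing in for two field-specific corpora.
SAMPLES = {
    "chemistry": "The titration result depends on one essential variable.",
    "mechanics": "Torque is an essential variable when tightening a gasket.",
}

technical, academic = split_by_range(
    SAMPLES, ["essential", "variable", "titration", "gasket"]
)
print("technical:", technical)
print("academic:", academic)
```

A count like this only narrows down candidates. Students would still need the concordance lines to see whether a shared word carries the same sense in each field.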
Some linguists and writers question the use of corpora in language teaching
and learning. John Rosenthal brings into view the terms prescriptive and descriptive. Prescriptivists are people who abide firmly by the laws of grammar in all situations. A prescriptivist might respond to the question, "How are you?" with "I'm well, thanks. And you?" Descriptivists, on the other
hand, would be comfortable with the reply, "I'm good, thanks. And you?" while
a prescriptivist would probably cringe if within earshot.
This brings an interesting spin to the use of corpora in language teaching. Should teachers produce students who speak and write using grammar that is accurate
and by-the-book, or should students have access to the comfortable and casual discourse that native speakers throw around? This question challenges us to take a closer look at language and what we consider
acceptable. Traditionally, a word doesn't exist if it's not in "the dictionary." Corpora are essentially challenging the role that the dictionary has played in our
academic history.
Let's look at the sentence, "I'm gonna ride my bike today." Were it spoken instead of read from this page, this mutation of "going to" would probably not even be noticed
because it is so commonly used and understood by speakers and listeners. Should
"gonna" therefore be taught in English classes? On the other hand,
if a word is in a corpus just because it was said or written by one person, is that enough to make it qualify as an appropriate
component of a language system?
Guy Cook says, "Computer corpora can never be more than a contribution to our
understanding of effective language teaching." He believes that corpora simply
record what was said, not what is being said in the moment. Cook also says that
no matter how large a corpus is, it can never represent the vast cultural history behind most lexical chunks that gives them
meaning for native speakers. According to this way of thinking, the use
of collocations is not practical for non-native speakers.
However, Cook reminds us that not all learners need to acquire native-like
proficiency of the language. Sometimes learners just need to know enough to get
by in certain situations. In such a case, the study of corpora-based collocational
chunks may provide a decent working knowledge of English. In this context, corpora
may help learners use the language in ways that sound more like native speakers. For example, a learner can become familiar with "I'm gonna go to the store" without having to become too
involved in the underlying grammar.
Susan Conrad argues that there are strong lexicogrammatical connections commonly
made in our everyday use of language. She claims that the repeated use of certain
linking adverbials proves this. Corpora provide hard data that confirms the use
of lexical and grammatical associations. Such progress made through corpora research
can change the way we view grammar, and thus, the way it is taught in the classroom.
An interesting point Conrad makes is that corpora can be a useful tool in the
teaching of register. Register refers to the subtle differences that make a word
appropriate or inappropriate depending upon the context in which it is used. For
example, a fat hamburger is usually a good thing, but telling someone that they
look fat is generally inappropriate in American society. Register can be a tricky
detail for non-native speakers to grasp. Learners can study collocations to see
how certain words are used in a variety of situations and thereby learn which uses of a word are apt in which contexts.
It seems evident that the use of corpora in language teaching is here to stay. As mentioned earlier, much research continues to be done on its application to pedagogy. In addition, the abundance of online corpora available, not to mention the rapid growth
of TalkBank, suggests that corpus linguistics is on its way to revolutionizing traditional classroom teaching
methods.
We are at the beginning of the corpora era, and as of now, there are many corpora
that are commonly known and used in the world of linguistics and language teaching.
Resources in the form of corpora can be found in virtually any field one wishes to study. For example, the Rutgers Optimality Archive is an archive of research papers on Optimality Theory. Or an interested party can visit a site on Animal Communication Data and listen to
banks of sound files containing the songs of zebra finches. The opportunities
are limitless.
The most commonly known and used online corpora, however, are broader in spectrum. The Bank of English is an example of a well-established corpus. It contains several hundred million words drawn from hundreds of different sources.
Users can look up any word from "banana" to "zealous." Some other popular
corpora are the American National Corpus, the Michigan Corpus of Academic Spoken English, and the International Corpus of
English, to name a few. Corpora such as these and the many others available,
including TalkBank, provide a strong backbone for the future of corpora in the classroom.
What will it take to transform corpora research into classroom application? According to Conrad, the findings first need to be presented in a way
in which teachers can practically apply them. Online resources must be easily
navigable. Teachers will need to be able to consult them for grammatical and
lexical trends. We should be able to easily use the texts available on corpora
sites as well as concordancers.
Of equal importance, new teachers need to be educated about corpora. We need to know not only how to use them but also how to turn what we learn from them into knowledge of trends
in grammar and vocabulary. There is more to corpora than frequency findings and
concordances. Teachers should be challenged to experiment with the data available
and extend it to serve the needs of learners. We also ought to be able to expertly
determine what should and what shouldn't be transmitted to our learners. Next,
we need to know how to convert it all into activities and materials that can be brought into the classroom.
A
major battle will be getting existing teachers to change the way they view vocabulary and grammar teaching and to toss out
their old grammar books, or at least changing the way they use them. Perhaps
the inundation of new textbooks that incorporate and encourage the use of corpora in the classroom will help make an impact.
It
is an exciting time in the world of language teaching. Teachers are being urged
to free themselves from the chains of previously conceived pedagogical laws. There
seems to be a subtle push for teachers to become more creative in an informed way. Corpora
will open new doors to the manner in which teachers not only approach but also perceive language. As long as we continue to challenge the way we think about language, we will encourage the same in our
learners.