Aubree Evans' MATL Portfolio

Knowledge of Human Development and Learning Artifact 1

 Corpus Linguistics: Opening the Doors to Creative Language Teaching

 

            People learn new languages for a variety of reasons.  Maria moved to Paris to start a new life and needed to learn French.  Mark wanted a better job in his hometown of Houston and decided to become bilingual in order to get one.  A foreign language was required for Jennifer's degree, so she took Spanish classes.  Whatever the reason, people who learn languages need to be able to read, write, speak, and understand the target language.  To achieve this, they must be familiar with the vocabulary prevalent in the environments in which they will be engaged.  Multimedia in the form of corpora, and specifically TalkBank, may be able to help learners accomplish their language learning goals.

 

            The contexts in which language is used are wide-ranging and variable.  A learner may need communicative vocabulary in order to interact with a doctor, or ESP (English for Specific Purposes) for academic or career goals.  Within the realm of ESP, the vocabulary needed by a chemist will differ from that needed by a mechanic, and the two learners will have different motives for learning the target language.  In addition to the difference in field terminology, a mechanic may need to be more comfortable with spoken than written language since he works closely with others, whereas a chemist, who works primarily alone and with written data, needs a comfortable working knowledge of the written words and phrases related to his field.  To complicate things further, vocabulary is also affected by dialect.  The region in which a learner will be living or working is therefore another important factor to keep in mind when selecting vocabulary words.

 

            How and where do teachers and learners find up-to-date lists of vocabulary words that are used by real people in their target fields and regions?  Much research is being done in corpus linguistics, the study of language through corpora: bodies of text grouped together according to topic or source.  There are many aspects of corpora that make them useful for language learners and teachers. 

 

            Because they are online, corpora are available to anyone who has Internet access.  Concordancers are useful tools associated with corpora.  A concordancer is a computer program that searches the texts in a corpus and produces concordances: listings of every occurrence of a given word together with the words around it, which reveal the collocations associated with that word.  Collocations are words that commonly co-occur (Schmitt).  For example, the word "her" appears in the nursery rhyme "Little Miss Muffet" in the phrases "eating her curds and whey" and "sat down beside her."  If this search were run through a concordancer, the word "her" would be listed down the center of the page, with the words that precede and follow it in the selected text(s) shown on each line.
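To make the idea of a concordance concrete, the search described above can be sketched in a few lines of Python.  This is only a minimal illustration, not the tool used by TalkBank or any other corpus site: the function names (concordance, format_concordance) are hypothetical, the rhyme is typed in using one common wording, and the number of context words shown on each side is an arbitrary choice.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words kept on each side of the keyword.
    """
    # Split the text into words, ignoring punctuation.
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines


def format_concordance(lines):
    """Line up the keyword in a center column, as a concordancer display does."""
    if not lines:
        return ""
    pad = max(len(left) for left, _, _ in lines)
    return "\n".join(f"{left:>{pad}}  {word}  {right}" for left, word, right in lines)


# One common wording of the rhyme (an assumption; versions vary slightly).
RHYME = (
    "Little Miss Muffet sat on a tuffet, "
    "eating her curds and whey. "
    "Along came a spider, who sat down beside her, "
    "and frightened Miss Muffet away."
)

hits = concordance(RHYME, "her")
print(format_concordance(hits))
```

Run on the rhyme, the sketch finds two occurrences of "her" and prints them with the keyword aligned in the middle of each line.  A real concordancer works the same way in principle but searches thousands of texts at once.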

 

            There are biological, psychological, and pedagogical reasons for learning vocabulary in collocational form as opposed to learning single-word items.  First, the brain stores items most effectively in chunks, or bits of information stored together.  For example, we know that numbers are remembered best in three- and four-digit chunks, hence the structure of American telephone numbers.  Words can be learned in a similar manner.  We remember lexical chunks such as "red balloon," "happily ever after," and "wit's end." 

 

A large portion of our conversational speech is organized in chunks.  We often use clichés such as "Make yourself comfortable."  On a daily basis we say things like "Have a nice day," "See you later," and "Send her my best."  Although there is grammatical structure present in these phrases, learning them as multiple-word units that carry meaning as a whole can lead to effective communication.    

 

            According to Paul Nation, learning whole chunks lightens the learning burden, or the level of effort needed to completely learn something.  Since the brain stores things most efficiently in chunks, it makes sense to learn a language in the chunks in which it is used.  This is somewhat controversial: if teachers teach chunks, there is some question as to whether learners will develop the ability to manipulate the language grammatically and freely.  In addition, there is debate about whether non-native speakers, who lack the same cultural background, can make the same associations through chunks as native speakers.  However, as we'll see later, much research supports chunking as a successful approach to language learning.  First, let us take a closer look at corpora.

 

            In the past, texts were entered into corpora by hand, which was time-consuming and limited the length and variety of texts available electronically.  Now, modern technology makes it possible to create corpora from a nearly limitless variety of texts.  Researchers and writers in the Linguistics and TESOL (Teachers of English to Speakers of Other Languages) fields agree that research on electronic data and online corpora is making multimedia in language learning a hot topic, and that resources for teachers and learners are more attainable than ever before. 

 

In his article, "Computer-Assisted Language Learning: Current Programs and Projects," Chris Higgins reviews other types of CALL (Computer-Assisted Language Learning) that are being used in language classrooms.  There are software programs that simulate real-life experiences, such as "Where in the World is Carmen Sandiego?"   Teachers can also use authoring programs to create their own interactive language materials for students.  Also available are LANs (Local Area Networks) that make intra-classroom communication possible.  The use of corpora is relatively new compared to the aforementioned options.

 

For the purposes of this paper, I will examine TalkBank, a new online research project funded by the National Science Foundation and founded by Brian MacWhinney of Carnegie Mellon University and the University of Pennsylvania.   The site is still very new but is growing rapidly and increasing in diversity of content every day.  The TalkBank website covers a wide range of linguistically related resources.  For language learners, however, I am concerned predominantly with two sections of the site, titled "Classroom Discourse" and "Conversations."

 

"Classroom Discourse" is a corpus of written texts for academic purposes.  Users can read texts written by native speakers on a variety of topics.  For example, if a zoology student who is a non-native speaker of English wants to study the vocabulary commonly used in her field, she can peruse a lesson on camels in this section of TalkBank.  She can read the transcripts and even watch movies. 

 

According to Alejandro Curado Fuentes, it is beneficial for learners to listen to and read discourse produced by native speakers.  He suggests activities such as having learners study concordances and construct phrases similar to those they find in corpora.  Learners can also analyze the parts of speech used and break down the grammatical structures of phrases.  Nation agrees that this type of exercise can help learners manipulate lexical data and incorporate the vocabulary at hand into their existing framework of the target language. 

 

The "Conversations" section of TalkBank contains a sub-section called "CallFriend," which is a collection of spoken material.  The TalkBank administrators received permission from individuals all over the world to record their telephone conversations.  The conversations are stored in this section of the TalkBank website and are easy to access.  The speakers are related in a variety of ways and cover a wide range of topics.  For example, in a file in the Southern U.S. directory you can find a conversation between a father and son discussing deer hunting.  In Pennsylvania, two old friends catch up on each other's lives.  The conversations are in English, Japanese, and Spanish and are divided by dialect and region.  A user can listen to English conversations spoken by native speakers from every region of the U.S.  This type of corpus can be very useful for a learner who is preparing to move to a specific area.  

 

Multimedia resources such as those made available by TalkBank are practical for the phonological side of a language learner's vocabulary development.  Non-native speakers can listen to conversations to improve their listening skills and pronunciation.  To supplement a listening experience, the learner can open the transcript file that corresponds to the conversation to which she is listening.  This allows her to match the appearance of a word to its sound.  

 

            There are many other facets to the "Conversations" section of TalkBank.  They are too numerous to list and are rapidly increasing, so I will mention just a few.  In the "Nixon" section, you can listen to Nixon and Haldeman discuss details of the Watergate scandal.  "Free Lunch" contains video-recorded interaction between students who were offered free pizza in exchange for their contribution to linguistic research.  "Class Projects" is a substantial compilation of discussions by international students on a variety of topics in their native languages.  

 

TalkBank includes videos filmed in an array of contexts that can be easily downloaded and viewed.  There are films of student interaction in the classroom as well as of indigenous speakers from all over the world.  I would assume that more videos will continue to be added as contributions are made to the TalkBank library.  Videos are helpful to learners because they can simulate real-life listening experiences while giving the user control of the medium.  In other words, learners can stop and replay a video as needed to double-check the pronunciation of a word.

 

Experts in the TESOL community widely agree that there is room for much more research on the use and effectiveness of corpora such as TalkBank.  Laura Gavioli and Guy Aston believe that corpora can indeed be a useful tool for learners if the activities are selected and presented carefully by the teacher.  Just as there are graded readers, there can be graded corpora, according to Gavioli.  Grading will make it easier for teachers and students to choose corpora appropriate for the learners' level and allow them to move up as they improve.

 

Based on an ESP study conducted with undergraduate law students, Jean-Jacques Weber suggests an activity to strengthen academic writing.  It is recommended for advanced speakers.  Students extract from a corpus a variety of content-specific texts written by native speakers.  Their goal while reading is to identify the most important elements of a well-written essay.  Once they have done so, they meet in groups to share and compare their ideas.  They then come up with a collaborative list of essentials that represents the group's findings.  Students then re-read the same articles and pick out the previously identified qualities.  Weber notes that after completing the assignment, students gained self-confidence and showed noticeable improvements in their own essays.

 

Fuentes emphasizes the usefulness of corpora and concordancing in distinguishing between technical and academic vocabulary, a distinction that is important for university-level learners.  According to Nation, technical vocabulary refers to a word, or a definition of a word, that is limited to one particular field.  Academic vocabulary refers to words such as "essential" and "variable" that are common to a variety of academic topics.  Fuentes suggests assignments that encourage students to compare phrases that contain technical vocabulary with phrases that contain academic vocabulary.  Selected words and phrases can also be entered into a concordancer.  The collocations will help students put the meanings of these words into perspective.  Analyzing the collocations of technical and academic vocabulary side by side will make the gap between the two more visible. 
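A side-by-side comparison of this kind can also be sketched in Python.  The example below is a rough illustration under stated assumptions rather than the procedure Fuentes describes: the function name (collocates) is hypothetical, the three sample sentences are invented for the demonstration and do not come from any corpus, "titration" stands in for a technical word, and the window simply counts neighboring words without regard to sentence boundaries.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words found within `window` words of each keyword occurrence."""
    # Lowercase the text and split it into words, ignoring punctuation.
    # Simplification: the window may reach across a sentence boundary.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts


# Invented sample text (hypothetical, for illustration only).
SAMPLE = (
    "The titration curve shows one variable. "
    "Repeat the titration and record each variable. "
    "A variable is essential in any study."
)

# Compare the neighbors of a technical word and an academic word.
for term in ("titration", "variable"):
    print(term, collocates(SAMPLE, term, window=1).most_common())
```

With a real corpus in place of the sample sentences, students could run the same comparison on words from their own field and discuss why the technical word keeps narrower company than the academic one.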

 

Some linguists and writers question the use of corpora in language teaching and learning.  John Rosenthal brings into view the terms prescriptive and descriptive.  Prescriptivists are people who abide firmly by the rules of grammar in all situations.  A prescriptivist might respond to the question, "How are you?" with "I'm well, thanks.  And you?"  Descriptivists, on the other hand, would be comfortable with the reply, "I'm good, thanks.  And you?"  A prescriptivist within earshot would probably cringe.

 

This puts an interesting spin on the use of corpora in language teaching.  Should teachers produce students who speak and write using accurate, by-the-book grammar, or should students have access to the comfortable and casual discourse that native speakers throw around?  This question challenges us to take a closer look at language and what we consider acceptable.  Traditionally, a word doesn't exist if it's not in "the dictionary."  Corpora are essentially challenging the role that the dictionary has played in our academic history. 

 

Let's look at the sentence, "I'm gonna ride my bike today."  Were it spoken instead of read from this page, this mutation of "going to" would probably not even be noticed because it is so commonly used and understood by speakers and listeners.  Should "gonna" therefore be taught in English classes?    On the other hand, if a word is in a corpus just because it was said or written by one person, is that enough to make it qualify as an appropriate component of a language system? 

 

Guy Cook says, "Computer corpora can never be more than a contribution to our understanding of effective language teaching."  He believes that corpora simply record what was said, not what is being said in the moment.  Cook also says that no matter how large a corpus is, it can never represent the vast cultural history behind most lexical chunks that gives them meaning for native speakers.   According to this way of thinking, the use of collocations is not practical for non-native speakers.

 

However, Cook reminds us that not all learners need to acquire native-like proficiency in the language.  Sometimes learners just need to know enough to get by in certain situations.  In such a case, the study of corpus-based collocational chunks may provide a decent working knowledge of English.  In this context, corpora may help to mainstream learners' use of the language so that they sound more like native speakers.  For example, a learner can become familiar with "I'm gonna go to the store" without having to become too involved in the grammatical implications. 

 

Susan Conrad argues that there are strong lexicogrammatical connections commonly made in our everyday use of language.  She claims that the repeated use of certain linking adverbials proves this.  Corpora provide hard data that confirms the use of lexical and grammatical associations.  Such progress made through corpora research can change the way we view grammar, and thus, the way it is taught in the classroom.

 

An interesting point Conrad makes is that corpora can be a useful tool in the teaching of register.  Register refers to the subtle differences that make a word appropriate or inappropriate depending upon the context in which it is used.  For example, a fat hamburger is usually a good thing.  But to tell someone that they look fat is generally inappropriate in American society.  Register can be a tricky detail for non-native speakers to grasp.  Learners can study collocations to see how certain words are used in a variety of situations, thus teaching them aptness of word use in context. 

 

It seems evident that the use of corpora in language teaching is here to stay.  As mentioned earlier, much research continues to be done on its application to pedagogy.  In addition, the abundance of online corpora available, not to mention the rapid growth of TalkBank, supports the notion that corpus linguistics is on its way to revolutionizing traditional classroom teaching methods.

 

We are at the beginning of the corpora era, and already there are many corpora that are commonly known and used in the world of linguistics and language teaching.  Resources in the form of corpora can be found in virtually any field one wishes to study.  For example, the Rutgers Optimality Archive is an archive of research papers on Optimality Theory.  An interested party can also visit a site on Animal Communication Data and listen to banks of sound files containing the songs of zebra finches.  The opportunities are limitless.   

 

The most commonly known and used online corpora, however, are broader in scope.  The Bank of English is an example of a well-established corpus.  It contains hundreds of millions of words from hundreds of different sources.  Users can look up any word from "banana" to "zealous."  Other popular corpora include the American National Corpus, the Michigan Corpus of Academic Spoken English, and the International Corpus of English.  Corpora such as these and the many others available, including TalkBank, provide a strong backbone for the future of corpora in the classroom.

 

What will it take to transform corpora research into classroom application?    According to Conrad, the findings first need to be presented in a way in which teachers can practically apply them.  Online resources must be easily navigable.  Teachers will need to be able to consult them for grammatical and lexical trends.  We should be able to easily use the texts available on corpora sites as well as concordancers.

 

Of equal importance, new teachers need to be educated about corpora.  We need to know not only how to use them but also how to turn what we learn from them into knowledge of trends in grammar and vocabulary.  There is more to corpora than frequency findings and concordances.  Teachers should be challenged to experiment with the data available and extend it to serve the needs of learners.  We also ought to be able to expertly determine what should and what shouldn't be transmitted to our learners.  Finally, we need to know how to convert it all into activities and materials that can be brought into the classroom. 

 

            A major battle will be getting existing teachers to change the way they view vocabulary and grammar teaching and to toss out their old grammar books, or at least change the way they use them.  Perhaps the influx of new textbooks that incorporate and encourage the use of corpora in the classroom will help make an impact.  

 

            It is an exciting time in the world of language teaching.  Teachers are being urged to free themselves from the chains of previously conceived pedagogical laws.  There seems to be a subtle push for teachers to become more creative in an informed way.  Corpora will open new doors to the manner in which teachers not only approach but also perceive language.  As long as we continue to challenge the way we think about language, we will encourage the same in our learners.   

Sources

  

Conrad, S.  (2000).  Will Corpus Linguistics Revolutionize Grammar Teaching in the 21st Century?  TESOL Quarterly. (Vol. 34).  548-558.

 

Cook, G.  (2001).  The Uses of Computerized Language Corpora:  A Reply to Ronald Carter.  Innovation in English Language Teaching:  A Reader.  65-69.

 

Fuentes, A.  (2002, September).  Lexical Behaviour in Academic and Technical Corpora: Implications for ESP Development.  Language Learning & Technology.  (Vol. 5). 

 

Gavioli, L., & Aston, G.  (2001, July).  Enriching Reality: Language Corpora in Language Pedagogy.  ELT Journal.  (Vol. 55).  238-245. 

 

Higgins, C.  (1993).  Computer-Assisted Language Learning: Current Programs and Projects.  ERIC Clearinghouse on Languages and Linguistics, Washington, DC.  (ERIC identifier ED 355835). 

 

Nation, P.  (2001).  Learning Vocabulary in Another Language.  New York:  Cambridge University Press.  287-216, 317-343.

 

Rosenthal, J.  (2002, August).  The Way we Live Now: 8-18-02: ON LANGUAGE; Corpus Linguistics.  New York Times. 

 

Schmitt, N.  (2000).  Vocabulary in Language Teaching.  New York: Cambridge University Press.  68-95.

 

TalkBank is an online resource for linguistic research.  (www.talkbank.org).

 

Weber, J.  (2001, January).  A concordance and genre-informed approach to ESP essay writing.  ELT Journal.  (Vol. 55).  14-20.