The Effects of Homography on Computer-generated High Frequency Word Lists
Ming-Tzu and Nation (2004) addressed some criticisms of word-frequency lists by evaluating the extent of homography throughout the Academic Word List (AWL). However, the words in the AWL are often not among the highest-frequency word forms in English.
The present study focuses on high-frequency words. It evaluates a random sample of 46 lemmas, each of which occurs at least 1,500 times in the British National Corpus (BNC).
For each lemma, a further random sample of 200 examples in context was semantically analyzed and tallied: 100 from the written portion of the BNC and 100 from the spoken portion. The list of meanings for each word was compiled from conflated WordNet senses, supplemented by some additional senses. Each context was rated at least twice and sometimes three times. The results indicate that the difference between semantic (sense-based) frequency and form-based frequency is considerable. The study suggests that homography is extensive in many high-frequency word forms, across the major registers of the language, and within each of the four major parts of speech. It further suggests that basing frequency on semantics would considerably alter the content of a high-frequency word list.
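To illustrate why sense-based counting can reorder a word list, the following is a minimal sketch, not the study's actual procedure. It assumes that a lemma's corpus frequency can be split across its senses in proportion to their share of a 200-example sample. The words (`word_a` and so on), their counts, and the helper names `estimate_sense_freq` and `rank` are all hypothetical and are not data from the thesis.

```python
from collections import Counter

# Hypothetical form-based corpus frequencies (illustrative only, NOT study data).
form_freq = {
    "word_a": 3000,
    "word_b": 2000,
    "word_c": 1600,
}

# Hypothetical sense tallies from a 200-example sample per lemma
# (e.g. 100 written + 100 spoken contexts); also illustrative only.
sense_tallies = {
    "word_a": Counter({"sense_1": 100, "sense_2": 100}),
    "word_b": Counter({"sense_1": 200}),
    "word_c": Counter({"sense_1": 150, "sense_2": 50}),
}


def estimate_sense_freq(form_freq, sense_tallies):
    """Assumption: scale each lemma's form frequency by each sense's sample proportion."""
    estimates = {}
    for lemma, tally in sense_tallies.items():
        sample_size = sum(tally.values())
        for sense, count in tally.items():
            estimates[(lemma, sense)] = form_freq[lemma] * count / sample_size
    return estimates


def rank(freqs):
    """Return keys ordered from most to least frequent; ties are broken by key."""
    return sorted(freqs, key=lambda k: (-freqs[k], k))


sense_freq = estimate_sense_freq(form_freq, sense_tallies)
print(rank(form_freq))
print(rank(sense_freq))
```

With these invented counts, `word_a` tops the form-based list (3,000 occurrences), but once its frequency is divided evenly between two senses, each sense is estimated at 1,500 occurrences, and the single-sense `word_b` (2,000) moves to first place. Proportional scaling is only one simple way to project sample tallies onto a corpus; it ignores sampling error and any weighting between written and spoken registers.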
School: Brigham Young University
School Location: USA - Utah
Source Type: Master's Thesis
Keywords: homography, word lists, high frequency vocabulary, ESL, text coverage, written vs. spoken
Date of Publication: 11/13/2008