Tuesday, December 18, 2018
A Corpus-Based Analysis of Mixed Code in Hong Kong Speech
2012 International Conference on Asian Language Processing
978-0-7695-4886-9/12 $26.00 © 2012 IEEE, DOI 10.1109/IALP.2012.10

John Lee
Halliday Centre for Intelligent Applications of Language Studies
Department of Chinese, Translation and Linguistics
City University of Hong Kong

Abstract: We present a corpus-based analysis of the use of mixed code in Hong Kong speech. From transcriptions of Cantonese television programs, we identify English words embedded within Cantonese utterances, and investigate the motivations for such code-switching. Among the many motivations observed in previous research, we found that four alone account for more than 95% of the use of English words in our speech data across genres, genders, and age groups. We performed analyses on more than 60 hours of transcribed speech, resulting in one of the largest empirical studies to date on this linguistic phenomenon.

Keywords: code-mixing; English; corpus linguistics; code-switching; Cantonese

I. INTRODUCTION

While Cantonese is the mother tongue of the vast majority of the people in Hong Kong, English is also spoken by 43% of the population [1], reflecting the city's heritage as a British colony. A well-known feature of speech in Hong Kong is code-switching, i.e., "the juxtaposition of passages of speech belonging to two different grammatical systems or sub-systems, within the same exchange" [2].

Specifically, in the case of Hong Kong, the two grammatical systems are Cantonese and English. The former serves as the 'matrix language', and the latter as the 'embedded language', resulting in Cantonese sentences with English segments such as the following (example taken from [3]):

  heoi3 canteen jam2 caa4  'Let's go to the canteen for lunch'

Here, the English segment contains only one word ('canteen'), but in general it can be a whole clause. We will use the general term 'code-switching' rather than the more specific term 'code-mixing', which refers to switching below the clause level, even though most English segments in our corpus do contain only one or two words (see Table 3).

There is already a large body of literature devoted to the study of Cantonese-English code-switching from the point of view of theoretical linguistics [3,4,5]. This paper investigates the motivations behind the use of mixed code, on the basis of a large dataset of speech transcribed from television programs. In Section II, we outline previous research on the motivations of code-switching, and discuss how our investigation complements it. In Section III, we describe our methodology for corpus construction, including the design of the taxonomy of code-switching motivations. In Section IV, we present an analysis of these motivations according to genre, gender and age.

II. PREVIOUS RESEARCH

The first major framework for classifying code-switching motivations in Hong Kong consists of two categories: 'expedient' and 'orientational' [6]. Central to this framework is the distinction between words in 'high Cantonese' and 'low Cantonese'. In everyday conversation, a speaker sometimes cannot find any word in 'low Cantonese' to describe an object, institution or idea (e.g., 'application form'). Using a word from 'high Cantonese' (e.g., biu2 gaak3), however, would sound too formal and therefore stylistically inappropriate. In expedient mixing, the speaker therefore resorts to an English word; the mixing is pragmatically motivated.

In contrast, orientational mixing is socially motivated. The speaker chooses to use English (e.g., 'barbecue') despite the availability of equivalent words in both 'low Cantonese' (e.g., siu2 je5 sik6) and 'high Cantonese' (e.g., siu1 haau1), because he perceives the subject matter to be inherently more 'western'.

This dichotomy has been criticized as overly simplistic, because of the ambiguity in defining lexical and stylistic equivalents among 'low Cantonese', 'high Cantonese', and English. Instead, a four-way taxonomy has been proposed: euphemism, specificity, bilingual punning, and the principle of economy [7]. This taxonomy was then further extended, in a study of code-switching in text media [8], to include quotation, doubling, identity marking, and interjection. These categories will be explained in detail in Section III.

While these classification systems are comprehensive and well grounded, they do not per se convey any sense of the relative importance or distribution of the various motivations. Our goal is, first, to empirically verify the coverage of these classification systems on a large dataset of transcribed speech; and, second, to give quantitative answers to questions such as: Which kinds of motivations are the most prominent? Does the distribution of motivations differ according to the speech genre, or to the speaker's gender or age? We now turn our attention to the methodology for constructing and annotating a speech corpus for these research purposes.

III. DATA

A. Source Material

Our corpus is constructed from television programs broadcast in Hong Kong within the last four years by Television Broadcasts Limited (TVB). The programs belong to a variety of genres, including two drama series, three current-affairs shows, a news program, and a talk show. The news program, TVB News at Six-Thirty, has the most formal register, containing mostly pre-planned speech by the anchor. The current-affairs shows, Tuesday Report, Sunday Report and Hong Kong Connection, are serious in tone but contain spontaneous discussions. The talk show, My Sweets, is about food and drink; it also contains spontaneous discussions, but the topics tend to be lighter. Although pre-planned, the speech in both drama series, Lunar Resonance and Yes Sir, Sorry Sir, is arguably the least formal in register, since it is designed to reflect natural speech in everyday life. Details of these TV programs are presented in Table 1.

Table 1: TV programs that serve as the source material of our corpus.

Genre            Program(s)                                            Length
Current affairs  Tuesday Report; Sunday Report; Hong Kong Connection   135 episodes × 20 minutes
Talk show        My Sweets                                             24 episodes × 30 minutes
News             TVB News at Six-Thirty                                5 episodes × 20 minutes
Drama            Lunar Resonance; Yes Sir, Sorry Sir                   4 episodes × 45 minutes

B. Data Processing

From the television programs listed in Table 1, all code-mixed utterances were transcribed, preserving the original languages, either Cantonese or English. Following standard practice, loan words are not considered to be mixed code; in our context, all English words (e.g., 'taxi') that have been adapted into Cantonese phonology (e.g., dik1 si2) were excluded. The TV captions corresponding to each of these utterances were also recorded as part of the corpus. These captions are in Standard Chinese, rather than Cantonese. Furthermore, alignments between the Chinese word(s) in the caption and the English word(s) in the utterance were annotated; this information is used in the classification of motivations. Finally, two kinds of metadata about the speaker were recorded: gender (male or female) and age group (teenager or adult).

C. Taxonomy of Code-Switching Motivations

Our goal is to quantitatively characterize the motivations behind code-switching; to this end, each English segment in the Cantonese sentences in our corpus is labeled with a motivation. Due to time constraints, this classification was performed only on the current-affairs and talk shows.

The 'expedient' vs. 'orientational' classification system is too coarse for our purpose. Instead, we adopted the taxonomy in [7,8] as our starting point, then introduced some new categories to accommodate our data. The categories in [7] are listed below. (A fourth category in [7], 'bilingual punning', is excluded from our taxonomy: as may be expected, punning is rarer in speech, and is indeed not found in our corpus.)

Euphemism: When a Cantonese word explicitly mentions something that the speaker finds embarrassing, s/he might opt for an English word that contains no such mention. For example, to avoid the female body part hung1 'breast' in the word hung1 wai4 'bra', the speaker might prefer the English 'bra' (all examples are taken from [7]):

  tau3 bra gaak3 gaak3  'A princess whose bra is see-through'

Specificity: "Sometimes an English expression is preferred because its meaning is more general or specific compared with its near-synonymous counterparts" [7] in either low or high Cantonese. For example, the verb 'to book' means 'to make a reservation for which no money or deposit is required', which is more specific than its closest equivalent in Cantonese, deng6 'to make a reservation'. It is often used in sentences such as:

  ngo5 soeng2 book saam1 dim2  'I want to book 3 o'clock'

Principle of Economy: "An English expression may also be preferred because it is shorter and thus requires less linguistic effort compared with its Chinese/Cantonese equivalent" [7]. While the word 'check-in' has two syllables, its Cantonese equivalent baan6 lei5 dang1 gei1 sau2 zuk6 'check-in [on a plane]' has six. The principle of economy thus appears to be the reason behind mixed code such as:

  nei5 check-in zo2 mei6 aa3  'Have you checked in already?'

The taxonomy in [8] builds on the one in [7], further enriching it with the categories below. (Among the categories in [8] is also 'identity marking', for mixed code that marks "social characteristics such as social status, education status, occupation, as well as regional affiliation" [8]. We found it difficult to identify this motivation objectively, and excluded it from our taxonomy.)

Quotation: When citing text or someone else's speech, one often prefers to use the original code to avoid having to perform translation. An example is direct speech:

  jau5 go3 pang4 jau5 man6 ngo5 what do you think  'A friend asked me, "What do you think?"'

Doubling: Originally named 'Emphasis or avoidance of repetition' [8], this category will be referred to as 'Doubling' [9] here to make it explicit, as it covers English words that are embedded alongside Cantonese words with the same or nearly the same meaning. The purpose is to emphasize the idea or to avoid repetition. In the following sentence, doubling serves as emphasis:

  very good m4 co3 aa1  'Very good, very good!'

Interjection: English interjections may be inserted into the Cantonese sentence. For example:

  anyway nei5 hou2 sai1 lei6 ak1  'Anyway, you are awesome!'

A significant amount of mixed code in our corpus, however, still does not fit into any of the above categories. Most of it can be attributed to one of two reasons, 'personal name' and 'register'. We therefore added them to our taxonomy:

Register: This is roughly equivalent to the 'expedient' category in [6], but will be referred to as 'Register' in this paper to make the motivation explicit. Sometimes the speaker cannot find any equivalent 'low Cantonese' word, but feels awkward using a more formal 'high Cantonese' word (e.g., paai1 deoi3 'party'). As a result, s/he resorts to an English equivalent instead. For example:

  hoi1 ci2 laa1 ngo5 dei6 go3 party  'Our party is starting'

Personal Name: It is common practice among Hong Kong people to adopt an English name. Although this phenomenon may be considered 'orientational' code-mixing in terms of the 'western' perception [6], it is given its own category, because it is very specific and accounts for a significant amount of our data.
A typical example is:

  Teresa ngo5 dei6 zing2 dak1 leng3 m4 leng3  'Teresa, did we make it nicely?'

D. Annotation Procedure

We thus have a total of eight categories in our taxonomy of code-switching motivations. Five of these categories, namely 'euphemism', 'quotation', 'doubling', 'interjection', and 'personal name', can usually be discerned unambiguously. The annotator, however, often found it difficult to distinguish between 'specificity', 'register', and 'principle of economy'. To maintain consistency, we adopted the following procedure. When an English segment does not fit into any of the five "easy" categories, the annotator decides whether it has the same meaning as the Chinese word in the caption to which it is aligned. If it is deemed not to have the same meaning, it is assigned 'specificity'. If it is equivalent in meaning, and the annotator cannot think of any equivalent in 'low Cantonese', it is labeled 'register'. Lastly, if there is a 'low Cantonese' equivalent, but its number of syllables is larger than that of the English segment, the motivation is 'principle of economy'.

IV. ANALYSIS

This section presents some preliminary analyses on this corpus. We first consider the frequency and length of English segments in Cantonese speech (Section A), then discuss the distribution of the categories of motivations, both overall and with respect to genres, genders, and age groups (Section B).

A. Density and Length of English Segments

It is well known that English words are sprinkled rather liberally in Cantonese speech in Hong Kong. We first measure how the frequency of English segments varies across different genres. As shown in Table 2, the frequency correlates with the register of the genre (see Section III.A). In the drama series, the most informal genre, nearly one and a half English words (1.4) are uttered per minute on average. The talk show occupies second place, and the current-affairs shows have slightly less frequent English words. In the news program, where the speech is pre-planned, the anchor did not utter any English word.

Table 2: The total number of Cantonese sentences containing English segments, and the total number of English words transcribed. The last column shows how often an English word is uttered.

Program genre     Sentences with English   English words   Frequency (words/min)
Drama             219                      259             1.4
Talk show         487                      625             0.87
Current affairs   1495                     1995            0.74
News              0                        0               0

Second, we measure the length of the English segments. Table 3 shows that the vast majority of English segments contain no more than two words. Across all genres, more than 80% of the English segments consist of only one English word. This figure is comparable to the 81.4% for text data reported in [8].

Table 3: Proportion of English segments with only one word (e.g., 'canteen') or two words (e.g., 'thank you').

Program genre     One-word   Two-word
Drama             85%        11%
Current affairs   85%        11%
Talk show         81%        17%

B. Motivations for the Use of Mixed Code

A plethora of motivations have been posited for the use of mixed code in Hong Kong (see Section II). Applying our proposed classification system (see Section III.C) to our corpus of transcribed speech, we now aim to discern the relative preponderance of the various kinds of code-switching motivations. Table 4 shows the distribution of these motivations in the current-affairs and the talk shows. Four dominant motivations, chiefly 'register' but also 'personal name', 'principle of economy', and 'specificity', account for more than 95% of the English segments. This result is the same across genres (current-affairs and talk shows), genders (see Table 6), and age groups (see Table 5).
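The "more than 95%" figure can be checked directly against the percentages reported in Tables 4, 5 and 6. The short Python sketch below is illustrative only: the names CATEGORIES, DISTRIBUTIONS, dominant_share and column_total are hypothetical, and the numbers are simply copied from the three tables rather than recomputed from the corpus.

```python
# Share of the four dominant code-switching motivations per column of
# Tables 4-6. All percentages are copied from those tables.

CATEGORIES = [
    "register", "personal name", "principle of economy", "specificity",
    "quotation", "doubling", "interjection", "euphemism",
]

# One list per table column, in the order of CATEGORIES.
DISTRIBUTIONS = {
    "current affairs": [36.4, 26.8, 19.0, 13.2, 2.1, 1.4, 0.9, 0.3],  # Table 4
    "talk show":       [47.4, 24.5, 17.6, 8.2, 1.0, 0.4, 1.0, 0.0],   # Table 4
    "adults":          [35.1, 28.8, 18.6, 13.1, 1.9, 1.3, 0.9, 0.3],  # Table 5
    "teenagers":       [52.4, 2.4, 23.8, 14.3, 4.0, 2.4, 0.0, 0.8],   # Table 5
    "female":          [37.5, 32.9, 14.8, 10.9, 1.9, 1.1, 0.7, 0.3],  # Table 6
    "male":            [40.7, 18.9, 22.9, 13.2, 1.7, 1.3, 1.1, 0.2],  # Table 6
}

DOMINANT = {"register", "personal name", "principle of economy", "specificity"}


def dominant_share(percentages):
    """Return the summed percentage of the four dominant motivations, rounded to 0.1."""
    total = sum(p for cat, p in zip(CATEGORIES, percentages) if cat in DOMINANT)
    return round(total, 1)


def column_total(percentages):
    """Return the column total; a value within rounding of 100 is a basic consistency check."""
    return round(sum(percentages), 1)


if __name__ == "__main__":
    for group, dist in DISTRIBUTIONS.items():
        print(f"{group:<16} top four: {dominant_share(dist):5.1f}%  "
              f"column total: {column_total(dist):5.1f}%")
```

Run as a script, it reports a top-four share of 95.4% (current affairs), 97.7% (talk show), 95.6% (adults), 92.9% (teenagers), 96.1% (female) and 95.7% (male), with every column totalling 100.0% or 100.1%. The four motivations thus exceed 95% in every column except the teenager column of Table 5, where they reach 92.9%; across age groups, what stays constant is mainly which four motivations dominate.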
All other categories, including quotation, euphemism, doubling, and interjection, are relatively infrequent.

Genres. Among the four dominant motivations, 'register', the use of appropriately informal words, is the most frequent motivation in both the current-affairs and talk shows. Its proportion, however, is considerably higher in the talk show (47.4%) than in current affairs (36.4%), reflecting the more informal nature of the former.

Table 4: Distribution of code-switching motivations, contrasted between genres.

Motivation             Current affairs   Talk show
Register               36.4%             47.4%
Personal name          26.8%             24.5%
Principle of economy   19.0%             17.6%
Specificity            13.2%             8.2%
Quotation              2.1%              1.0%
Doubling               1.4%              0.4%
Interjection           0.9%              1.0%
Euphemism              0.3%              0%

Age groups. Table 5 contrasts the distributions of code-switching motivations between adults and teenagers in the current-affairs shows (the speakers in the talk show are predominantly adults). As mentioned above, the four major motivations remain constant. However, teenagers are much more likely than adults to use English words to achieve a more informal register (52.4% vs. 35.1%). They also tend more to opt for English to save effort (23.8% vs. 18.6%). Somewhat surprisingly at first glance, teenagers address others by English names far less often than adults (2.4% vs. 28.8%); it turns out that in the conversations in our corpus, teenagers often prefer to address adults with the more formal Chinese names, likely out of respect.

Table 5: Distribution of code-switching motivations, contrasted between age groups.

Motivation             Adults   Teenagers
Register               35.1%    52.4%
Personal name          28.8%    2.4%
Principle of economy   18.6%    23.8%
Specificity            13.1%    14.3%
Quotation              1.9%     4.0%
Doubling               1.3%     2.4%
Interjection           0.9%     0%
Euphemism              0.3%     0.8%

Genders. Finally, we investigate whether code-switching motivations are biased according to gender. Aggregating statistics from both the current-affairs and talk shows, Table 6 compares the motivations of males and those of females. Females are shown to be more likely to use English names to address others (32.9% vs. 18.9%); males, on the other hand, more frequently use English words to reduce effort (22.9% vs. 14.8%).

Table 6: Distribution of code-switching motivations, contrasted between genders.

Motivation             Female   Male
Register               37.5%    40.7%
Personal name          32.9%    18.9%
Principle of economy   14.8%    22.9%
Specificity            10.9%    13.2%
Quotation              1.9%     1.7%
Doubling               1.1%     1.3%
Interjection           0.7%     1.1%
Euphemism              0.3%     0.2%

V. CONCLUSIONS

We have described the construction of a corpus of Cantonese-English mixed code, based on speech transcribed from television programs in Hong Kong. Drawn from more than 60 hours of speech, this corpus is among the largest of its type. A novel feature of the corpus is the annotation of the motivation behind each code-mixed utterance. Having proposed a classification system for these motivations, we applied it to our corpus, and reported differences in the use of mixed code between genres, genders and age groups. A key finding is that four main motivations, 'register', 'personal name', 'principle of economy', and 'specificity', account for more than 95% of the embedded English segments.

ACKNOWLEDGMENT

This project was partially funded by a Small Research Grant from the Department of Chinese, Translation and Linguistics at City University of Hong Kong. We thank Man Chong Mak and Hiu Yan Wong for compiling the corpus and performing annotation.

REFERENCES

[1] K. H. Y. Chen, "The Social Distinctiveness of Two Code-mixing Styles in Hong Kong," in Proceedings of the 4th International Symposium on Bilingualism, MA: Cascadilla Press, 2005, pp. 527-541.
[2] J. Gumperz, "The sociolinguistic significance of conversational code-switching," RELC Journal, 8(2), 1977, pp. 1-34.
[3] J. Gibbons, "Code-mixing and koineizing in the speech of students at the University of Hong Kong," Anthropological Linguistics, 21(3), 1979, pp. 113-123.
[4] B. H.-S. Chan, "How does Cantonese-English code-mixing work?," in Language in Hong Kong at Century's End, M. C. Pennington (ed.), Hong Kong: Hong Kong University Press, 1998, pp. 191-216.
[5] D. C. S. Li, "Linguistic convergence: Impact of English on Hong Kong Cantonese," Asian Englishes, 2(1), 1999, pp. 5-36.
[6] K. K. Luke, "Why two languages might be better than one: motivations of language mixing in Hong Kong," in Language in Hong Kong at Century's End, M. C. Pennington (ed.), Hong Kong: Hong Kong University Press, 1998, pp. 145-159.
[7] D. C. S. Li, "Cantonese-English code-switching research in Hong Kong: a Y2K review," World Englishes, 19(3), 2000, pp. 305-322.
[8] H. Cao, "Development of a Cantonese-English code-mixing speech recognition system," PhD dissertation, Chinese University of Hong Kong, 2011.
[9] R. Appel and P. Muysken, Language Contact and Bilingualism. London: Arnold, 1987.