ez000441@deneb.ucdavis.edu (R.Goldthwaite) (01/08/90)
Hi,

Are there transcripts of conversations - ordinary dialogues between adults - in computer-readable formats anywhere? Any topics, although topics which would suggest story-telling narratives especially, are welcome. Send email; I'll summarize.

Thank you,
Ron Goldthwaite
rogoldthwaite@ucdavis.edu
Psychology & Animal Behavior
U. Calif., Davis, CA 95616
(916) 752 5655/1880

Ron Goldthwaite, PhD / UC Davis, Psychology and Animal Behavior
rogoldthwaite@ucdavis.edu
sp299-ad@violet.berkeley.edu (Celso Alvarez) (01/18/90)
In article <6385@ucdavis.ucdavis.edu> ez000441@deneb.ucdavis.edu (R.Goldthwaite) writes:
>Are there transcripts of conversations - ordinary dialogues between adults -
>in computer-readable formats anywhere? Any topics, although topics which
>would suggest story-telling narratives especially, are welcome.
>Send email; I'll summarize. Thank you,

Please do summarize your findings. I tried to send you email twice, but it bounced both times.

I suggest you look into the ComServe fileserver (a Bitnet node which includes a number of discussion groups or `hotlines' on communication). The database includes some transcripts of conversation. For general information on ComServe, write to SUPPORT@RPIECS.BITNET.

To subscribe to the Ethnomethodology hotline, where you can address your queries, send the following 1-line command message to COMSERVE@RPIECS.BITNET (notice this address is ComServe, not Support):

Join Ethno Your_Name

If you can, please get in contact with me, and send me a working path. I can give you additional information.

Celso Alvarez
UC Berkeley
sp299-ad@violet.berkeley.edu
jwk@lanl.gov (John W. Keller) (01/19/90)
I too tried to e-mail you some information and got bounced.

I use a forum program with participants from across the country. It is very easy to make copies of all the conversations that occur. These are not, however, transcripts of face to face conversations. They are synchronous text conversations and may not be what you are looking for.

If this type of thing will suit your purpose, please let me know. As I said, they are very easy to record as unix files that can be mailed. Or I can get you a copy of the code and you can join in and get your own.

Hope this helps,
John Keller

******************************************************************
John Keller
Staff Research Assistant
LANL, MS M997
Los Alamos National Laboratory
PO Box 1663
Los Alamos, NM 87544
jwk@beta.lanl.gov
******************************************************************
As usual, my opinions are my own.
***********************************
acm@grendal.Sun.COM (Andrew MacRae) (02/06/90)
>These are not, however, transcripts of face to face conversations.
>They are synchronous text conversations and may not be what you
>are looking for.

Anyone looking for transcripts of face to face conversations might consider using play scripts. Granted, depending on the playwright, the conversations may be less than realistic, but many plays do mimic normal conversation fairly well.

Andrew MacRae
edwards@cogsci.berkeley.edu (Jane Edwards) (02/06/90)
In a Jan. 7 article <6385@ucdavis.ucdavis.edu> ez000441@deneb.ucdavis.edu (Ron Goldthwaite) asked about the availability online of transcripts of conversations and narratives. I summarize below the ones I know of. If you know of others, I would very much like to hear from you, as I am trying to prepare a reasonably complete list of the major ones for publication in a book on related topics later this spring.

So far as I know, the biggest archive project is the Oxford Text Archive, with about 450 separate collections of written texts and transcripts of spoken language ("corpora"). Most are from written sources (e.g., literary classics), but it also has some well-known spoken language corpora, such as the Lancaster-Oslo-Bergen (LOB) and London-Lund corpora. Most of the holdings are in English, but a wide range of other languages are also represented: Dutch, French, Hebrew, Latvian, German, Icelandic, Gaelic, Coptic, Malayan, etc. The Oxford Text Archive also distributes information concerning the holdings of 4 other archives: U. of Cambridge, U. of Pisa, U. of Pennsylvania, and Brigham Young U.

Oxford Text Archive address: archive@uk.ac.ox.vax (JANET), archive%vax.ox.ac.uk@ucl.cs.edu (EDU), archive%vax.ox.ac.uk@ukacrl.earn (BITNET).

One of their written holdings is the BROWN CORPUS (asked about in a recent nl-kr digest), which is composed of 500 written language samples of 2000 words each, drawn from a range of written styles of English printed in 1961 (described in Kucera and Francis, 1967, _Computational analysis of present-day American English_). This corpus is not used widely in linguistic research (though perhaps in Literature, or Humanities) because the data are: (a) from written rather than spoken language sources, and (b) 30 years old.

The large "Australian Corpus Project" (described in Kyto, et al. (eds.), 1988, _Corpus linguistics: hard and soft_, and in the book review in _Language_, 1989, 65(4), 843-848) may provide a needed updated sampling of a wide range of written (Australian/British) English, and some spoken English as well.

Another big archive project is the CHILDES project, at Carnegie-Mellon (Brian MacWhinney, brian@andrew.cmu.edu). While most of their data are children speaking to adults, they also distribute adult written and adult spoken language corpora from the CORNELL project. The spoken samples range from abortion debates to the Patty Hearst trial to TV sitcoms. There are a fair number of typographical errors, unfortunately, including some which most spell-checkers would overlook (e.g., "feint" for "faint"). But it is a diverse, highly useful and recent collection.

For SPOKEN spontaneous adult English, the best and biggest is probably the London-Lund corpus (described in Svartvik & Quirk, 1980, _A corpus of spoken English_, and Svartvik, et al., 1982, _Survey of Spoken English_), available through the Oxford Text Archive. These data include conversations by people of various ages, occupations, etc., recorded under various circumstances. They have rich prosodic marking, and have been of enormous benefit to a wide range of linguistic investigations. A drawback for Americans, for some purposes, is that the data are British English.

Another big archive of spoken (British) English is the Lancaster-Oslo-Bergen (LOB) archive (52,000 words, prosodic marking, as close to RP as possible), also available through the Oxford Text Archive.

For SPOKEN ADULT AMERICAN English, there is, to my knowledge, no publicly accessible archive as large as those just mentioned. At Berkeley, we have a collection of various types of spoken interaction (from conversations, to the Oliver North trial, to lectures), collected and contributed mostly by professors here, and intended mainly for local use at this time. The ethnomethodological corpora mentioned in article <1990Jan18.074947.28456@agate.berkeley.edu> by sp299-ad@violet.berkeley.edu (Celso Alvarez) also warrant looking into.

An enormous archive of spoken American English is presently in the planning stages at UC Santa Barbara, to fill the need for a large-scale archive sampling a wide range of types of adult spoken American English discourse.

The 1987 Linguistic Society of America questionnaire turned up many private data sets, but relatively few of them on computer. The trend toward putting data on computer is increasing very rapidly, and with it, discussion of standards, normalization, etc.; as that happens, more of these data sets may come into common domain.

In Germany, two archives warrant mention. One is in Mannheim (for which I have no email address or contact person) and contains various types of data in the German language. The other is at Univ. of Ulm (designed and coordinated by Erhard Mergenthaler, LU07@DMARUM8.bitnet, author of _Textbank systems: Computer science applied in the field of psychoanalysis_, 1985), and contains a large number of psychotherapy sessions and interviews (most in monolingual German, some in monolingual English).

In the Netherlands (Max-Planck-Institut fuer Psycholinguistik, Nijmegen, helmut@hnympi51.bitnet), there is the European Science Foundation Second Language Data Bank, containing transcripts of 10 groups of adult migrant workers learning the language of their "host" country (e.g., Turks learning German or Dutch, Punjabis learning English, Moroccans learning French, Spaniards and Finns learning Swedish, etc.).

So, these are all of the ones that I know about. If you know of others, or have email addresses for those above which I don't, I would very much appreciate hearing from you, and will summarize and post responses received.

Thanks,
Jane Edwards (edwards@cogsci.berkeley.edu)
Cognitive Science Program, UC Berkeley