[comp.ai] transcripts of conversations

ez000441@deneb.ucdavis.edu (R.Goldthwaite) (01/08/90)

Hi,
Are there transcripts of conversations - ordinary dialogues between adults -
in computer-readable formats anywhere?  Any topics, although topics which
would suggest story-telling narratives especially, are welcome.  

Send email; I'll summarize.  Thank you,

Ron Goldthwaite
rogoldthwaite@ucdavis.edu

Psychology & Animal Behavior
U. Calif., Davis, CA 95616
(916) 752 5655/1880


sp299-ad@violet.berkeley.edu (Celso Alvarez) (01/18/90)

In article <6385@ucdavis.ucdavis.edu> ez000441@deneb.ucdavis.edu
(R.Goldthwaite) writes:
>Are there transcripts of conversations - ordinary dialogues between adults -
>in computer-readable formats anywhere?  Any topics, although topics which
>would suggest story-telling narratives especially, are welcome.  

>Send email; I'll summarize.  Thank you,

Please do summarize your findings.  I tried to send you email twice,
but it bounced both times.  I suggest you look into the ComServe fileserver
(a Bitnet node which includes a number of discussion groups or
`hotlines' on communication).  The database includes some transcripts
of conversation.  Their address:  SUPPORT@RPIECS.BITNET (to get general
information on ComServe).  To subscribe to the Ethnomethodology
hotline, where you can address your queries, send the following
1-line command message to  COMSERVE@RPIECS.BITNET  (notice this address
is ComServe, not Support).  The command is:
		Join Ethno Your_Name

If you can, please get in contact with me, and send me a working
path.  I can give you additional information.

Celso Alvarez
UC Berkeley
sp299-ad@violet.berkeley.edu

jwk@lanl.gov (John W. Keller) (01/19/90)

I too tried to e-mail you some information and got bounced.
I use a forum program with participants from across the country.
It is very easy to make copies of all the conversations that
occur. 

These are not, however, transcripts of face to face conversations.
They are synchronous text conversations and may not be what you
are looking for.

If this type of thing will suit your purpose, please let me know. As
I said, they are very easy to record as Unix files that can be mailed. 
Or I can get you a copy of the code and you can join in and get your
own.

Hope this helps

John Keller


******************************************************************
John Keller			Staff Research Assistant
LANL, MS M997			Los Alamos National Laboratory
PO Box 1663
Los Alamos, NM 87544		jwk@beta.lanl.gov
******************************************************************
As usual, my opinions are my own.
***********************************

acm@grendal.Sun.COM (Andrew MacRae) (02/06/90)

>These are not, however, transcripts of face to face conversations.
>They are synchronous text conversations and may not be what you
>are looking for.

If someone is looking for transcripts of face to face conversations,
you might consider using play scripts.  Granted, depending on the playwright
the conversations may be less than realistic, but many plays do mimic
normal conversation fairly well.

					Andrew MacRae

sp299-ad@violet.berkeley.edu (Celso Alvarez) (02/06/90)

In article <814@jethro.Corp.Sun.COM> acm@grendal.EBay.Sun.COM
(Andrew MacRae) writes:

>If someone is looking for transcripts of face to face conversations,
>you might consider using play scripts.  Granted, depending on the playwright
>the conversations may be less than realistic, but many plays do mimic
>normal conversation fairly well.

I don't know what use play scripts could have for serious conversational
analysis.  Constructed discourse relies on perceptions and generalizations
about talk which may not be the real thing.

Celso Alvarez
sp299-ad@violet.berkeley.edu

edwards@cogsci.berkeley.edu (Jane Edwards) (02/06/90)

In a Jan. 7 article <6385@ucdavis.ucdavis.edu> ez000441@deneb.ucdavis.edu
(Ron Goldthwaite) asked about the availability online of transcripts of 
conversations and narratives.  I summarize below the ones I know of.  
If you know of others, I would very much like to hear from you, as I am 
trying to prepare a reasonably complete list of the major ones for 
publication in a book on related topics later this Spring. 

So far as I know, the biggest archive project is the Oxford Text Archive, 
with about 450 separate collections of written texts and transcripts of 
spoken language ("corpora"). Most are from written sources (e.g., literary 
classics), but it also has some well-known spoken language corpora, such 
as the Lancaster-Oslo-Bergen (LOB) and London-Lund corpora.  Most of the 
holdings are in English, but a wide range of other languages is also 
represented: Dutch, French, Hebrew, Latvian, German, Icelandic, Gaelic, 
Coptic, Malayan, etc.  The Oxford Text Archive also distributes information 
concerning the holdings of 4 other archives: U. of Cambridge, U. of Pisa, 
U. of Pennsylvania, and Brigham Young U.  Oxford Text Archive address: 
archive@uk.ac.ox.vax (JANET), archive%vax.ox.ac.uk@ucl.cs.edu (EDU), 
archive%vax.ox.ac.uk@ukacrl.earn (BITNET).  One of their written holdings 
is the BROWN CORPUS (asked about in a recent nl-kr digest), which is 
composed of 500 written-language samples of 2000 words each, drawn from a 
range of styles of English printed in 1961 (described in Kucera and 
Francis, 1967, _Computational analysis of present-day American English_).
This corpus is not used widely in linguistic research (though perhaps it is 
in literature or the humanities) because the data are: (a) from written 
rather than spoken language sources, and (b) 30 years old.  The large "Australian
Corpus Project" (described in Kyto, et al. (eds.), 1988, _Corpus linguistics: 
hard and soft_, and in the book review in _Language_, 1989, 65(4), 843-848), 
may provide a needed updated sampling of a wide range of written 
(Australian/British) English, and some spoken English as well.

Another big archive project is the CHILDES project, at Carnegie-Mellon
(Brian MacWhinney, brian@andrew.cmu.edu).  While most of their data are 
children speaking to adults, they also distribute adult written and adult
spoken language corpora from the CORNELL project.  The spoken samples
range from abortion debates to the Patty Hearst trial to TV sitcoms.
There are a fair number of typographical errors, unfortunately, including 
some which most spell-checkers would overlook (e.g., "feint" for "faint").  
But it is a diverse, highly useful and recent collection.  

For SPOKEN spontaneous adult English, the best and biggest is probably the
London-Lund corpus (described in Svartvik & Quirk, 1980, _A corpus of 
spoken English_, and Svartvik, et al., 1982, _Survey of Spoken English_), 
available through the Oxford Text Archive.  These data include conversations 
by people of various ages, occupations, etc., recorded under various
circumstances.  They have rich prosodic marking, and have been of enormous 
benefit to a wide range of linguistic investigations.  A drawback for 
Americans, for some purposes, is that the data are British English.  Another 
big archive of spoken (British) English is the Lancaster-Oslo-Bergen (LOB) 
archive (52,000 words, prosodic marking, as close to RP as possible), 
also available through the Oxford Text Archive.

For SPOKEN ADULT AMERICAN English, there is, to my knowledge, no publicly 
accessible archive as large as those just mentioned.  At Berkeley, we have 
a collection of various types of spoken interaction (from conversations, 
to the Oliver North trial, to lectures), collected and contributed mostly 
by professors here, and intended mainly for local use at this time.  The 
ethnomethodological corpora mentioned in article 
<1990Jan18.074947.28456@agate.berkeley.edu> by sp299-ad@violet.berkeley.edu 
(Celso Alvarez) also warrant looking into.  An enormous archive of spoken 
American English is presently in the planning stages at UC Santa Barbara 
to fill the need for a large-scale archive sampling a wide range of types
of adult spoken American English discourse.

The 1987 Linguistic Society of America questionnaire turned up
many private data sets, but relatively few of them on computer.
The trend toward putting such data on computer is increasing very rapidly, 
and with it discussion of standards, normalization, etc.; as that happens,
more of these data sets may become generally available.

In Germany, two archives warrant mention.  One is in Mannheim (for which
I have no email address or contact person) and contains various types
of data in the German language.  The other is at Univ. of Ulm (designed 
and coordinated by Erhard Mergenthaler, LU07@DMARUM8.bitnet, author of 
_Textbank systems: Computer science applied in the field of psychoanalysis_ 
1985), and contains a large number of psychotherapy sessions and interviews 
(most in monolingual German, some in monolingual English).  

In the Netherlands (Max-Planck-Institut fuer Psycholinguistik, Nijmegen,
helmut@hnympi51.bitnet), there is the European Science Foundation
Second Language Data Bank, containing transcripts of 10 groups of adult 
migrant workers learning the language of their "host" country (e.g., Turks
learning German or Dutch, Punjabis learning English, Moroccans learning 
French, and Spaniards and Finns learning Swedish).

So, these are all of the ones that I know about.  If you know of others, 
or have email addresses to those above which I don't, I would very much 
appreciate hearing from you, and will summarize and post responses received.  
Thanks,

Jane Edwards  (edwards@cogsci.berkeley.edu)
Cognitive Science Program, UC Berkeley

dave@rnms1.paradyne.com (Dave Cameron (Consultant)) (02/07/90)

In article <814@jethro.Corp.Sun.COM> acm@grendal.EBay.Sun.COM (Andrew MacRae) writes:
>If someone is looking for transcripts of face to face conversations,
>you might consider using play scripts.  Granted, depending on the playwright
>the conversations may be less than realistic, but many plays do mimic
>normal conversation fairly well.

This is far from correct. If you are looking for strings of phonetic groups
it may do, but for word choice, content analysis, and pattern expectation