U35395@UICVM.BITNET (Michael Sperberg-McQueen 312 996-2477 -2981) (08/10/90)
Original-posting-by: U35395@UICVM.BITNET (Michael Sperberg-McQueen 312 996-2477 -2981)
Original-subject: Text Encoding Initiative Description
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

[Reposted to comp.text from the newsgroup(s) bit.listserv.pacs-l.]

----------------------------Original message----------------------------

What is the Text Encoding Initiative?

1. Introduction

The Text Encoding Initiative is a cooperative undertaking of the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC) to formulate and disseminate guidelines for the encoding and interchange of machine-readable texts intended for literary, linguistic, historical, or other textual research.

Representatives from approximately twenty learned societies and professional associations concerned with the encoding of machine-readable literary and linguistic material serve on an Advisory Board, which will bring a wide range of expertise to bear on the problems.

2. Rationale

At present, the chaotic diversity of encoding schemes used for such texts makes it difficult to move texts from one software program to another, and researchers who exchange texts lose valuable time deciphering them and converting them into their local encoding schemes.

The primary goal of the Text Encoding Initiative is to provide explicit guidelines that define a text format suitable for data interchange and data analysis. The format should be hardware- and software-independent, rigorous in its definition of textual objects, easy to use, and compatible with existing standards. The Standard Generalized Markup Language (SGML) is being used as the basic notation for the encoding recommended by the guidelines.

3. Schedule

The project began with a planning conference in November, 1987.
A first cycle of development began in June, 1988, and will run through May, 1990. During the first cycle, the TEI has produced draft recommendations on a variety of topics; the complete draft will be made publicly available sometime during the summer of 1990. During a second development cycle, which is to run from 1990 to 1992, the guidelines will be tested by various projects affiliated with the TEI and revised in accordance with public comment and with the experience of those who use the draft. Interim public drafts will be made available in summer, 1991, and possibly at other times.

4. Contents

The guidelines will cover both the "how" and the "what" of text encoding. The overall table of contents for the draft of summer, 1990, is:

   1. About These Guidelines
   2. SGML Markup
   3. Characters and Character Sets
   4. Documentation and Bibliographic Control
   5. Features Common to Many Text Types (including: text structure, basic non-structural features, physical appearance, figures, tables, diagrams, bibliographic references, editorial comment and emendation, resolution of ambiguous punctuation, reference systems, critical apparatus and parallel texts)
   6. Features for Specific Text Types (including: literary texts, office documents, language corpora and other collections, dictionaries)
   7. Analytic and Interpretive Features (including: common constructs for linguistic analysis, with examples for syntax, morphology, and word-level tagging)
   8. Extending and Modifying the Guidelines

   Appendices:
   A. Examples of TEI-Encoded Texts
   B. SGML Declarations for TEI Documents
   C. TEI Document Type Declarations
   D. Character Sets

During the second cycle, extensions to other text types and to other specialized types of analysis and interpretation are foreseen.

5. Organization

The TEI is led by a steering committee comprising representatives of the three sponsoring organizations; an advisory board of representatives of the other participating organizations assists in disseminating results and gathering comments.

Recommendations for the guidelines have been developed by four working committees: Text Documentation (chair: Dominik Wujastyk, Wellcome Institute for the History of Medicine), Text Representation (chair: Stig Johansson, University of Oslo), Analysis and Interpretation (chair: D. Terrence Langendoen, University of Arizona), and Metalanguage and Syntax Issues (chair: David Barnard, Queen's University, Kingston, Ont.). During the second cycle, a variety of affiliated projects will apply the guidelines and assist in their revision and extension. Day-to-day work is coordinated by the editors, C. M. Sperberg-McQueen (University of Illinois at Chicago) and Lou D. Burnard (Oxford University).

6. Funding

The project is funded in part by the U.S. National Endowment for the Humanities, the Commission of the European Communities, and the Andrew W. Mellon Foundation.

7. For Further Information

For further information, contact the editors:

   C. M. Sperberg-McQueen, University of Illinois at Chicago Computer Center (U35395 @ UICVM on Bitnet, U35395 @ uicvm.cc.uic.edu on the Internet)
   Lou D. Burnard, Oxford University Computing Service (LOU @ UK.AC.OX.VAX on Janet, LOU @ VAX.OX.AC.UK from Bitnet or Earn)

or any member of the steering committee:

   Robert Amsler, Bellcore (amsler@bellcore.com)
   Susan Hockey, Oxford University (SUSAN@VAX.OX.AC.UK)
   Nancy Ide, Vassar College (IDE@VASSAR)
   Donald Walker (walker@bellcore.com)
   Antonio Zampolli (GLOTTOLO@ICNUCEVM)