[comp.text] [bit.listserv.pacs-l] Text Encoding Initiative Description

U35395@UICVM.BITNET (Michael Sperberg-McQueen 312 996-2477 -2981) (08/10/90)
Original-posting-by: U35395@UICVM.BITNET (Michael Sperberg-McQueen 312 996-2477 -2981)
Original-subject: Text Encoding Initiative Description
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

[Reposted to comp.text from the newsgroup(s) bit.listserv.pacs-l.]


----------------------------Original message----------------------------


What is the Text Encoding Initiative?

1.  Introduction

The Text Encoding Initiative is a cooperative undertaking of the
Association for Computers and the Humanities (ACH), the Association for
Computational Linguistics (ACL), and the Association for Literary and
Linguistic Computing (ALLC), to formulate and disseminate guidelines for
the encoding and interchange of machine-readable texts intended for
literary, linguistic, historical, or other textual research.
Representatives from approximately twenty learned societies and
professional associations whose concerns include the encoding of
machine-readable literary and linguistic material serve on an Advisory
Board, which ensures that a wide range of expertise is brought to bear
on the problems.

2.  Rationale

At present, the chaotic diversity of encoding schemes used for such
texts makes it difficult to move texts from one software program to
another, and researchers who exchange texts with others lose valuable
time deciphering the texts and converting them into their local encoding
scheme.  The primary goal of the Text Encoding Initiative is to provide
explicit guidelines which define a text format suitable for data
interchange and data analysis; the format should be hardware- and
software-independent, rigorous in its definition of textual objects,
easy to use, and compatible with existing standards.  The Standard
Generalized Markup Language (SGML) is being used as the basic notation
for the encoding recommended by the guidelines.

3.  Schedule

The project began with a planning conference in November, 1987.  A
first cycle of development began in June, 1988, and will run through
May, 1990.  During the first cycle, the TEI has produced draft
recommendations on a variety of topics; the complete draft will be
available publicly sometime during summer, 1990.  During a second
development cycle, which is to run from 1990 to 1992, the guidelines
will be tested by various projects affiliated with the TEI and revised
in accordance with public comment and with the experience of those
who use the draft.  Interim public drafts will be made available in
summer, 1991, and possibly at other times.

4.  Contents

The guidelines will cover both the "how" and the "what" of text
encoding.  The overall table of contents for the draft of summer, 1990,
is:

    1.  About These Guidelines
    2.  SGML Markup
    3.  Characters and Character Sets
    4.  Documentation and Bibliographic Control
    5.  Features Common to Many Text Types
         (including:  text structure, basic non-structural features,
         physical appearance, figures, tables, diagrams, bibliographic
         references, editorial comment and emendation, resolution of
         ambiguous punctuation, reference systems, critical apparatus
         and parallel texts)
    6.  Features for Specific Text Types
         (including:  literary texts, office documents, language
         corpora and other collections, dictionaries)
    7.  Analytic and Interpretive Features
         (including:  common constructs for linguistic analysis,
         with examples for syntax, morphology, and word-level tagging)
    8.  Extending and Modifying the Guidelines

    Appendices:
    A.  Examples of TEI-Encoded Texts
    B.  SGML Declarations for TEI Documents
    C.  TEI Document Type Declarations
    D.  Character Sets

During the second cycle, extensions to other text types and to other
specialized types of analysis and interpretation are foreseen.

5.  Organization

The TEI is led by a steering committee comprising representatives of
the three sponsoring organizations; an advisory board of representatives
of the other participating organizations assists in disseminating
results and gathering comments.  Recommendations for the guidelines have
been developed by four working committees on Text Documentation (chair:
Dominik Wujastyk, Wellcome Institute for the History of Medicine), Text
Representation (chair:  Stig Johansson, University of Oslo), Analysis
and Interpretation (chair:  D. Terrence Langendoen, University of
Arizona), and Metalanguage and Syntax Issues (chair:  David Barnard,
Queen's University, Kingston, Ont.).

During the second cycle, a variety of affiliated projects will apply
the guidelines and assist in their revision and extension.

Day-to-day work is coordinated by the editors, C. M. Sperberg-McQueen
(University of Illinois at Chicago) and Lou D. Burnard (Oxford
University).

6.  Funding

The project is funded in part by the U.S. National Endowment for the
Humanities, the Commission of the European Communities, and the Andrew
W. Mellon Foundation.

7.  For Further Information

For further information, contact the editors:  C. M. Sperberg-McQueen,
University of Illinois at Chicago Computer Center (U35395 @ UICVM on
Bitnet, U35395 @ uicvm.cc.uic.edu on the Internet) and Lou D. Burnard,
Oxford University Computing Service (LOU @ UK.AC.OX.VAX on Janet, LOU @
VAX.OX.AC.UK from Bitnet or Earn); or any member of the steering
committee:  Robert Amsler, Bellcore (amsler@bellcore.com); Susan Hockey,
Oxford University (SUSAN@VAX.OX.AC.UK); Nancy Ide, Vassar College
(IDE@VASSAR); Donald Walker (walker@bellcore.com); Antonio Zampolli
(GLOTTOLO@ICNUCEVM).