[comp.text] SGML

tut%cairo@Sun.COM (Bill "Bill" Tuthill) (07/23/88)

I'm moving a discussion of SGML started in comp.text.desktop into this
newsgroup, because I think the issues are larger than a desktop.

I feel SGML would be helpful if it were a de facto (rather than merely
a de jure) standard.  However, since most document publishing systems
cannot at present exchange SGML text with each other, SGML is pointless,
kind of like Esperanto.  Furthermore, even if SGML were more widespread,
incompatible tag sets would still pose interchange problems.

Here is an excerpt from a memo I wrote a while back.  Goldfarb (an IBM
employee) is the principal perpetrator of SGML.  References are to an
article he published in "SIGPLAN Notices," June 1981.
-----

SGML is a solution to a non-problem.  Goldfarb believes that descriptive
markup languages (such as SGML) are superior to procedural ones (such as
IBM SCRIPT).  Even though this may be true, it is a specious comparison
because SCRIPT really stinks.  Instead, SGML should be compared to decent
procedural languages such as troff and TeX.  There are good reasons why
troff and TeX macro packages were invented: well-designed macros provide
writers with a descriptive layer over a procedural language.  When the
descriptive layer isn't powerful enough, troff and TeX already have escape
hatches so writers can achieve special effects.  SGML apparently provides
no escape hatches.

SGML is no panacea for portability.  Being a metalanguage, SGML does not
provide one syntax, but only a method for describing different syntaxes.
On p. 68 Goldfarb states, "SGML allows variant concrete syntaxes."  This
is tantamount to saying it isn't really standard.  It would probably be
as difficult to translate between variant syntaxes as to translate between
troff and Interleaf or Frame.

SGML was born obsolete.  Graphics are missing from the specification, as
are provisions for tables and equations.  On p. 100 Goldfarb talks about
WYSIWYG, but what he apparently means is typewriter input: something like
-ms's .DS/.DE macros.  Furthermore, every SGML document I've ever seen is
extremely ugly.  It doesn't say much for a documentation standard when it
can't even produce handsome documents.

SGML represents no great advance.  I was a consultant at UC Berkeley when
IBM SCRIPT/GML arrived, and most users said "so what?  We already have
troff." The specially-hired SCRIPT/GML consultant had no clients-- none.
There was no evidence that SGML was superior to other batch systems.  A
few comparisons in Goldfarb's article make SGML seem inferior:

----- SGML -----
<p>
Text processing and word processing systems
typically require additional information to be interspersed
among the natural text of the document being processed.
This added information, called <q>markup</q>, serves two purposes:
<ol>
<li>Separating the logical elements of the document; and
<li>Specifying the processing functions to be performed on those elements.
</ol>
This figure represents divine document intervention.
</p>
<fig id=angelfig>
<figbody>
<artwork depth=24p>
</artwork>
</figbody>
<figcap>Three Angels Dancing
</figcap>
</fig>
----- troff -----
.LP
Text processing and word processing systems
typically require additional information to be interspersed
among the natural text of the document being processed.
This added information, called \*Qmarkup\*U, serves two purposes:
.NP
Separating the logical elements of the document; and
.NP
Specifying the processing functions to be performed on those elements.
.LP
This figure represents divine document intervention.
.FN "Three Angels Dancing"
.CP angelfig 24P
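
To make the comparison above concrete, here is a rough sketch (not taken from
Goldfarb's article or from the memo) of how the paragraph, quote and list
tags in the SGML sample might be mapped mechanically onto the macros in the
troff sample. The function name sgml_to_troff and the mapping tables are
hypothetical; the macro names are simply copied from the sample, and the
figure tags are deliberately left out.

```python
import re

# Hypothetical tag-to-macro tables covering only the paragraph, quote and
# list tags from the sample; figure tags are not handled.
BLOCK_MACROS = {"<p>": ".LP", "<li>": ".NP", "</ol>": ".LP"}
DROPPED = {"</p>", "<ol>"}          # tags with no troff counterpart here
INLINE = {"<q>": r"\*Q", "</q>": r"\*U"}


def sgml_to_troff(text):
    """Translate a small subset of the sample's tags into troff macros."""
    out = []
    for line in text.splitlines():
        # Inline tags are replaced where they stand.
        for tag, macro in INLINE.items():
            line = line.replace(tag, macro)
        # A block-level tag at the start of a line becomes its own macro line.
        m = re.match(r"</?[a-z]+>", line)
        if m:
            tag, rest = m.group(0), line[m.end():]
            if tag in BLOCK_MACROS:
                out.append(BLOCK_MACROS[tag])
            elif tag not in DROPPED:
                out.append(tag)     # unknown tag: pass through untouched
            if rest:
                out.append(rest)
        else:
            out.append(line)
    return "\n".join(out)
```

Feeding it the first twelve lines of the SGML sample should give something
close to the troff version, which suggests that, for this fragment at least,
the two notations carry much the same information.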

SGML's jargon shows intellectual bankruptcy.  As far as I can tell, the
term "generic identifier" means tag, and the term "entity reference"
means file.  Why does Goldfarb have to resort to obfuscatory terminology,
if not to hide intellectual deficiencies in the design?

SGML embraces pointless data structures.  Documents seem to be stored
(or conceptualized) in a hierarchical tree, right down to individual
words and letters.  There is no compelling reason why words and letters
cannot be strings and characters.  In the concrete syntax described, the
ASCII characters < > & % ; appear to be reserved symbols, but Goldfarb
offers no method for printing these characters literally.  In troff at
least only the \ is reserved.  Note that < > & % are heavily used in
UNIX documentation.
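
One conventional workaround, sketched here as an assumption rather than
anything found in Goldfarb's article, is to write the reserved characters as
entity references. The entity names lt, gt and amp and both function names
below are assumed for illustration; a given SGML application would have to
declare such entities itself. The characters % and ; are left alone in this
sketch.

```python
# Assumed entity names; the article itself is said to give no such method.
ENTITY_FOR = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape_reserved(text):
    """Replace each reserved character with its (assumed) entity reference."""
    # Working one character at a time avoids escaping the "&" that an
    # earlier substitution has just introduced.
    return "".join(ENTITY_FOR.get(ch, ch) for ch in text)


def unescape_reserved(text):
    """Inverse of escape_reserved; '&amp;' is handled last on purpose."""
    return (text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"))
```

A shell line such as `sort < in > out &` would then be stored as
`sort &lt; in &gt; out &amp;`, which is legal but arguably proves the point
about UNIX documentation.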

SGML requires a guru.  SGML documents are supposed to be rigorous, but
rigorous means inflexible.  If writers want to change the least thing,
they will have to consult an SGML guru.  It seems that SGML gurus will
have to be just as knowledgeable as a TeX or troff macro guru, or a
Scribe database administrator.

sylvia@cs.vu.nl (Sylvia van Egmond) (08/02/88)

There has been some discussion on SGML lately. Some misconceptions
on SGML have been corrected by a number of people, so I won't go into
that. At the Free University, we have implemented a (basic) SGML parser.
A technical report, written by Sylvia van Egmond and Jos Warmer, is
available for anyone interested. This report gives an introduction
to what SGML is, and describes the parser. If you are interested in
obtaining the report and/or the parser, please write to Sylvia or Jos at
the address below.

			Free University Amsterdam
			Dept. of Mathematics and Computer Science
			De Boelelaan 1081
			1081 HV  Amsterdam
			The Netherlands

news@encore.UUCP (Newsboy) (11/09/88)

Desktop SGML would be made in Boston, Nov. 16-18. Can someone tell me
the details, like where it is being held, who is sponsoring it, etc? 
 -lar
From: kaufman@maxzilla.Encore.COM (Lar Kaufman)
Path: maxzilla!kaufman



 Lar Kaufman   <= my opinions          Fidonet: 1:322/470@508-534-1842 
 kaufman@multimax.arpa    {bu-cs,decvax,necntc,talcott}!encore!kaufman

yuri@sq.uucp (Yuri Rubinsky) (11/10/88)

In article <4137@encore.UUCP> kaufman@maxzilla.UUCP (Lar Kaufman) writes:
>Desktop SGML would be made in Boston, Nov. 16-18. Can someone tell me
>the details, like where it is being held, who is sponsoring it, etc? 


STANDARDS & THE DESKTOP is co-sponsored by the
National Association of Desktop Publishers and
the Graphic Communications Association (which also
sponsors the TechDoc and MarkUp conferences).

It is being held at the Hotel Meridien.
For further information telephone Marion Ellidge or Patti Hill
at the GCA: 703 841-8160.

Registration Fees:

GCA Member	$495.00
NADTP Member	$590.00
Nonmember	$685.00


Here's the text of the brochure that was published:

----------
STANDARDS & the Desktop
Wednesday November 16

Registration 7:30 am


9:00 am

No Desk Is an Island

Yuri Rubinsky, President, SOFTQUAD INC.

In a world made confusing with competing proprietary formats, we
can take comfort in the work of standards-makers. The individual at
the desktop, the networked individual in the working group, the
individual whose microcomputer speaks to a mainframe - all
need to retrieve and share old and new data easily. The
Standard Generalized Markup Language offers a high-level approach
that says `What follows meets a standard'; and then defines
the structures needed to understand it.



9:30 am

Sometimes You Want to See
What You Get, Sometimes You Don't

Sharon Adler, Product Development, IBM CORPORATION

The answers to several questions - Who creates the document? How is the
information going to be used? What types of documents are produced? - will
determine whether a What You See is What You Get or code-driven, automated
publishing system is most appropriate for certain tasks. Either approach
benefits from skillful application of text and graphic standards.


10:15 am

SGML for Text and Graphics

Pamela Gennusa, Director, Product Marketing, DATALOGICS, INC.

As background to SGML's role in the conference, a compact tutorial
will explain the fundamentals of this ISO standard, with particular
emphasis on its flexibility in defining structures, in expressing
different system notations and in passing complex and detailed
information to software processes in SGML attributes.


11:30 am

Apple's Knowledge Navigator
Video Explained: SGML in the Future

Yuri Rubinsky

A popular conference item for the last year, Apple's elegant videotape
and informational master of ceremonies gathering various kinds of data
from a variety of other computing systems. If such a dream is to be realized,
it will be built on standards which take into account the idiosyncrasies
of many systems, many applications and many storage protocols.


1:30 pm

The Intelligence Embedded in Pages

Haviland Wright, President, AVALANCHE DEVELOPMENT

Our collective history is still written in pages; much of our 
current challenge comes from a need to convert existing text into
a more usable - read, electronic - form. To do that most 
effectively, we must extract from pages not just the words, but
also the information about the information. Text tool information
is conveyed through structure as well as content. Some text 
structures, like tables and lists, are local. Others, like cross
references, numbered sections and paragraphs, are
navigational markers for documents or collections of documents.
Conversion of page-oriented text to electronic form will be fully
successful when an electronic text can be scanned by readers as 
easily as a book.


2:00 pm

The Status of the AAP
Electronic Manuscript Project

Betsy Kiser, EPSIG Manager, OCLC

OCLC, the Online Computer
development and promotion of the `grand-daddy of SGML applications', the
standard created by the Association of American
Publishers collaborating
with some thirty other organizations.


2:20 pm

SGML and PageMaker:
The Individual at the
Desktop

Michael Tabor, Publishing Consultant, INTERACCESS INFORMATION DESIGN

What are the advantages of standards for someone working in virtual
isolation? Implementing a smaller version of the ISO standard in
currently available desktop publishing software will provide a
major focus for this session with particular discussion of archiving,
file control and version control.



2:50 pm

SGML on the Campus: The
Publishing of
Theses, Journals and Books

Czeslaw Jan Grycz, Design & Production Manager, UNIVERSITY OF CALIFORNIA PRESS

SGML and related coding systems have a great effect on the publishers
who accept scholarly work. This session will show examples of the
preparation of documentary editing projects either using SGML or ignoring
its advantages, discussing implications for publishers of receiving
each.


3:45 pm

The Need for Interchange
in Corporate Publishing

Robert Marcum, Engineering Specialist, GENERAL DYNAMICS

Desktop publishing has provided a proliferation of hardware and software
solutions for problems we didn't know we had. The corporate solution:
obtain as many of each type of hardware platform and software package as
can be justified; evaluate everything on line; minimize training; and pray.
The major constraint? The traditional support groups (Technical
Publications, Graphic Services, Art and Editorial, etc.) can no longer be
used for programs with desktop computing capabilities. This presentation
addresses how this environment can be made into a productive team
production process for the generation of compound electronic documents via
networking (file transfer), translation (file conversion to standard
formats), data element filing (DBMS or file control system) and
documentation integration (merging of text and graphic elements); all via
a desktop automated batch processor (without human intervention).



Thursday November 17
8:30 am

Graphics and the Future

Lee Silverman, Senior Graphics Manager, Engineering Division,
COMPUGRAPHIC CORPORATION

As the early cave drawings did, certainly, the future of communication
holds, at its essence, graphic representation. Today, though, we are
forced by new media not only to communicate locally or even nationally but
rather internationally and globally. Without a single world-wide spoken or
written language, graphics will be the key for transmitting many
technical, political, social, scientific and even aesthetic concepts and
articles of information. These graphics must be universally displayed,
read, and understood. Digital standards will make this possible.


9:00 am

Everything You Need to Know
about Graphics and Graphic Standards

Lee Silverman

Hu Hohn, Director, COMPUTER ARTS LEARNING CENTER, MASSACHUSETTS COLLEGE OF ART

Macroscopically, in a document of 50,000 words, only the graphic
elements stand out. Taking the microscopic view, each
character of each word is composed of curves
and lines filled with color. The page, the graphic and the
character all represent components that we would like to
exchange, interpret, image electronically or print photomechanically using
a set of standards which describe them accurately, and, most importantly,
repeatably.


10:30 am

What Do Electronic Graphic Systems
Do? What Should They Do?

David Mayer, Marketing Manager, Electronic Publishing Systems,
AUTO-TROL TECHNOLOGIES

How we look at pages changes dramatically depending on whether we
look at all pages as textual with some illustration thrown in, or
at all pages as graphics where it's important to know specialized
information (the SGML information, for instance) about all the text
components within the image. Current standards offer a foundation
for the latter approach.


11:15 am

The Argument for Having
SGML Everywhere

Frank Gilbane, President, PUBLISHING TECHNOLOGY MANAGEMENT

The stockpile of examples is growing of instances where the structural
information contained in SGML files and databases provides valuable (and
sometimes unexpected) insights. Access to structure at a more fundamental
level - at a layer just between the operating system and the application,
for example - would revolutionize current utilities.


1:30 pm

Structuring Text for Research

Michael Sperberg-McQueen, Editor-in-Chief,
Text Encoding Initiative, UNIVERSITY OF ILLINOIS AT CHICAGO

Literary texts intended for research need to be tagged more intensively
than in most electronic publishing or office automation.
This session discusses the origins and goals of a cooperative effort by
a number of professional organizations to formulate, disseminate and
promote guidelines for this encoding, with particular emphasis on the
appropriateness of SGML as a structuring system.


2:00 pm

Structuring Literary Databases

Elli Mylonas, Managing Editor, PERSEUS PROJECT, HARVARD UNIVERSITY

A great body of diverse literary texts exists electronically
but with little structural encoding
and few appropriate SGML Document Type Definitions for them.
The Perseus Project faces this issue in building an SGML hypertext
database of fifth century Greek texts with linked images.


2:30 pm

Electronic Dictionary Interchange

Robert Amsler, Dictionary Encoding Initiative, BELLCORE

While several ad hoc schemes for encoding dictionary entries exist,
and even larger numbers of idiosyncratic typesetting formats, the
development of a text standard for the interchange of machine-readable
dictionaries is seen as essential for future generations of
scholars. One group has drafted a preliminary interchange standard in
SGML and recently sponsored a workshop to present it and receive
comments. Dr. Amsler will report on that work.


3:15 pm

An SGML Application for Music
Performance, Analysis and Publishing

Alan D. Talbot, Project Manager, Music Engraving Project, NEW ENGLAND DIGITAL;  Secretary, ANSI MUSIC INFORMATION PROCESSING STANDARDS COMMITTEE

An application written in SGML is generally used for
markup of text. The core of SGML, however, is really a pure
design tool. The ANSI X3V1.8M MIPS project, using SGML in this fashion,
has developed a study that deals with interchange of timing
information, with parallel tracks in a document (or performance), and
with the interrelationship of complex and occasionally overlapping
hierarchical structures.


3:45 pm

Structuring and Scripting
the Multi-Media Document

Richard Moore, Applications Technology Group, APPLE COMPUTER

A generic and
extensible interchange medium for multimedia
data is critically
needed by the personal computer industry 
world - but as a unifying
mechanism for the industry's anarchistic
software base. In the interactive
world of desktop
computers, however, interchange is not enough: we
also need to deal with the on-line
management of compound documents. We need
grammar-based, transaction-oriented,
data-management tools which can be used as
utilities by applications and
the rapid development of
`entity' standards.


4:15 pm

The State of the Art in Information
Retrieval: Background for Hypertext

Edward Fox, Associate Professor, Department of Computer Science,
VIRGINIA POLYTECHNIC INSTITUTE AND STATE UNIVERSITY;
Editor, ACM PRESS DATABASE & ELECTRONIC PRODUCTS

Using computers to help people
find useful items involves applying a variety of methods
(artificial intelligence, indexing and text processing are examples) and
technologies (human-computer interfaces, magnetic
and optical storage devices, networks) to large collections of
multimedia objects.
Research has shown the
viability of natural language interest
statements, benefits of interactive feedback, and the
the added information about context and structure that
comes from SGML, we can now build more effective systems
integrating information retrieval and hypertext.

Friday November 18
8:30 am

The Shape of Hypertext Today

Janet H. Walker, Member of Research Staff, DIGITAL EQUIPMENT CORP.,
CAMBRIDGE RESEARCH LAB

What kinds of protocols are necessary for hypertext systems to communicate
effectively? What functionality must be represented in a standard that
attempts to link dissimilar functions among innovative, proprietary
applications? This session begins with a historical look and
outlines the internal architectures that make up the diverse
hyper-realm.




9:00 am

SGML and Hypertext

Steve DeRose, Computational Linguist, SUMMER INSTITUTE OF LINGUISTICS

SGML can provide a basis for representing hypertext although
the structures found in hypertext do not intuitively
map into SGML as we use it today. Current projects at the Dallas
Theological Seminary,
the Summer Institute, and others,
begin to address these concerns.


9:30 am

Applying Hypertext Technology
to Standards Development,
Dissemination and Implementation

Sandy Ressler, Computer Scientist, NATIONAL INSTITUTE OF STANDARDS & TECHNOLOGY (formerly NBS)

Hypertext methods are applicable to the various facets of
developing and delivering information processing standards
which may, for the future, be thought of as
collections of `interconnected writings'
incorporating graphics, audio and computer programs. Combined
with the increasingly available CD-ROM optical storage technology,
a new medium for delivering entire databases of standards, their
related documents with software to aid in their implementations
is a realistic goal.


10:00 am

Proactive Hypertext

Philip Lehman, Vice President, SCRIBE SYSTEMS

Many hypertext users are interested in
applications that can be built on top of such technology,
including communication with and control of non-hypertext systems.
This interaction, which might be termed proactive hypertext, is
taking a growing role as user requirements become more refined. This
session will draw examples from technical publishing within the
aerospace industries.


10:45 am

Browsing and Interchange

Louis Gomez, District Manager, Information Technology, BELLCORE

`Superbook', a prototype Document Browser, was developed to make accessible
the volumes of electronic data already marked up. The session will take
lessons learned in Superbook to larger problems of information interchange,
with emphasis on the telecommunications
industry.


11:15 am

Production Hypertext

Ian Williams, Product Manager, Corporate Solutions, OFFICE WORKSTATIONS LTD.

With both commercial software products and an active system integration
business built around hypertext databases, OWL's perspective is
grounded in the reality of very large, successful installations. Accordingly,
the company's viewpoint on interchange is realistic, practical and oriented
to production levels of capability.


1:30 pm

Functional Requirements for
Hypertext Interchange Standards

Jim Norton, AUGMENTation Systems Consultant

Twenty years working with AUGMENT, the hypertext collaborative
work system developed by human interface pioneer Doug Engelbart,
has made Jim Norton one of the world's more experienced hypertext
users. His address will specify the features and mechanisms that must be
present in interchange standards for those standards to adequately
represent hypertext and collaborative work functionality.


2:00 pm

The Long View

Robert Akscyn, President, KNOWLEDGE SYSTEMS INC.

With a decade of experience in hypertext systems, and
from the vantage point
of a commercial software developer, Robert Akscyn will complement Jim
Norton's perspective on the issues involved in hypertext interchange
standards.


2:45 pm

Balancing the Present and the Future:
Standards and New Media

Tim Oren, Applications Technology Group, APPLE COMPUTER

Setting the stage for the working session to follow, this talk will
describe specific cases where there is an urgent requirement for
interchange between dissimilar new media systems, the markets which
will need such interchange soon, and, in counterbalance, concerns
that such a standard not compromise future innovations
not yet imagined.


3:00 pm

Towards Exchange between Hypertext
Systems: A Collective Working Session

Co-Chairs: Tim Oren &
William W. Davis, Jr., Electronic Publishing Technical Advisor, 
US INTERNAL REVENUE SERVICE


_________________________________________________________________________

Yuri Rubinsky
SoftQuad Inc.
720 Spadina Avenue
Toronto M5S 2T9
Canada

(416) 963-8337
uucp: sq!yuri
internet: yuri@sq.com

jbd@dasteel.uucp (Jonathan Dasteel) (03/14/91)

Does anyone know of a context free grammar (or yacc spec) for SGML?

PS:
Did there used to be a comp.text.sgml? If so, what happened to it?
--
                   -- JBD

-------------------------------------------------------------------------------
Jonathan Dasteel                   Dasteel Software
213-394-1229                       1148 Fourth Street, Suite 100
uunet!dasteel!jbd                  Santa Monica, CA 90403
                                   Typesetting and graphics software
-------------------------------------------------------------------------------