yuri@sq.sq.com (Yuri Rubinsky) (08/02/90)
In message <281@txsil.lonestar.org> robin@txsil.lonestar.org (Robin Cover)
writes:

> If some SGML experts from among the "major players" are to be attracted to
> the group, the distinctive name "sgml" and focused attention on SGML is
> a clear desideratum. It will be hard enough to get support from SGML
> gurus anyway -- they will have neither time nor patience to muck through
> dozens of postings on unrelated topics....
>
> For a healthy SGML discussion, I feel it is imperative to have a couple
> SGML experts listening in. Those who have actually read the standard,
> or write DTD's, or build parsers will know what I mean. There is still
> a lot of confusion about what SGML actually *IS* (and is not), and it's
> easy for an unmoderated forum to generate unfortunate "mis-information."
> I would even suggest that several companies or SGML-supporting agencies
> be contacted (e.g., Software Exoterica; SoftQuad; Datalogics) to see if
> they would designate persons to help referee the discussion -- at least
> at moments when mis-information goes unchecked or when technical
> questions cannot be answered by the forum's regular readers.

On behalf of SoftQuad, yes. We will do our best (within our time
constraints) to respond to appropriate information and questions.

Here are some now:

------------------------------------------------------------------------

In message <2152@tnoibbc.UUCP> anita@tnoibbc.UUCP (Anita Eijs) writes/asks:

> 1) Are WYSISWYG-wordprocessors available which can read and write
>    SGML ?

Yes and no.

SGML encoding is generally considered to be at its purest when it is
free of formatting information. Its job is to interchange structural
data and content in such a way that any number of required "formats"
can be derived. This makes possible work such as that mentioned in
message <45238@brunix.UUCP> wherein var@iris.brown.edu (Victor A.
Riley) describes the work of:

> X3V1.8M MUSIC IN INFORMATION PROCESSING STANDARDS (MIPS) COMMITTEE
> operating under the rules and procedures of the
> American National Standards Institute

which is using the syntax of SGML to build a representation language
for hypermedia and time-based documents (music and multimedia events
are two examples). (I mention this because it has relevance later with
respect to the CGM question below.)

SGML is widely used for the storage of text in databases and is being
slowly but surely embraced by the CD-ROM community. [In his keynote
address at the 1988 CD-ROM Conference, Bill Gates announced that he
thought it was pretty clear that SGML was the storage format of choice
for CD-ROM publishing.]

All in all, then: The creators of SGML-encoded files will not normally
know or be able to imagine all the uses to which their contents will
one day be put. "What You See is What You Get" is, accordingly, not a
phrase that has much meaning when longevity and multi-purposeness are
the goal.

Nonetheless, WYSIWYG has a place in the SGML world.

In article <cah492S00VsA4tzYNl@andrew.cmu.edu> mss+@andrew.cmu.edu
(Mark Sherman) writes:

> One can define imaging semantics to be associated with SGML. The program
> AuthorEditor from SoftQuad is quite nice in that regard. But its
> conventions are parochial -- an "SGML" system knows nothing about AE's
> semantics, unless the exchanging parties agree to information outside of
> the standard.

Mark is being a little bit mischievous here. Certainly my favourite
dictionary defines parochial as "confined to a narrow area", but the
"but" in his sentence doesn't recognize that very often this local
functionality is indeed a Good Thing.

For example: Just because the footnotes in my final document may be
printed in 6 or 8 point type is no reason why I should have to look at
them in that size on the screen.
I'm perfectly comfortable knowing that a pair of simple SGML tags will
allow a text-for-paper formatter to ensure that the footnotes will
appear at the bottom of the page or chapter end in a small point size,
while a text-for-screen formatter may place them in-line or at the
bottom of a screenful of text, or in a thin column to the left of the
text body. Having computer screens imitate a piece of paper (of all
ancient technologies!) hardly does justice to their capabilities.

Yes, in Author/Editor we DO associate screen formatting with SGML
elements. So too does IBM with its TextWrite product. Both of us do
this for a very good reason: Users take advantage of the screen
formatting to build a working environment in which they are comfortable
and where the formatting helps their tagging intuition. With a simple
command in such an editor, you insert "list item" or "table" tags (for
example); screen feedback assures you this is the element you wanted.

[A word of explanation for those who don't recognize these names: Both
SoftQuad Author/Editor and IBM's TextWrite are conforming SGML editors,
context-sensitive, structured and so forth, with good assistance for
the user encoding an SGML document, and "QUASI-WYG" in the way
described above. There are other SGML editors, Exoterica's Checkmark,
Sobemap's Write-It and Datalogics' WriterStation, which don't do this.]

So, What You See Accurately Represents What You Want, in the model that
suggests that writers are best left to writing, editors to editing, and
designers, later in the process (generally), to designing.

------------------------------------------------------------------------

Here's a much shorter answer to the WYSIWYG question and,
simultaneously, perhaps to:

> 2) Are any translators available to convert SGML to troff, TeX,
>    MSWord, etc., and vice versa ?

Microsoft has announced (in Government Computer News and the EPSIG
Newsletter, among other places) that it will announce a form of SGML
support by the end of 1990 for delivery in 1991. According to the EPSIG
Newsletter (the journal of the Association of American Publishers'
Electronic Publishing Special Interest Group operated by OCLC in
Columbus Ohio), Microsoft is currently evaluating SGML parsers.

WordPerfect Corporation released a Statement of Direction in June 1989
saying "We are in the process of developing a strategy to assist people
in creating WordPerfect documents that can be converted to and from
SGML and ODA". To the best of my knowledge, that company has made no
other public statements on this subject since.

Agfa Compugraphic CAPS, Xyvision, Frame, Intergraph, Interleaf, Context,
Datalogics, Arbortext, SoftQuad and perhaps others (apologies to anyone
I've forgotten in this list) have demonstrated the ability to take SGML
files encoded using specific tagsets (generally CALS 28001) and show
them on the screen matching line-for-line what will be output to a
printer.

Translation from SGML to formatter input is properly the task of an
SGML Parser, a utility which can understand enough about the context of
an SGML element [read "object" such as paragraph, or list item, or
table cell, or figure] to be able to produce an output stream which is
meaningful to a processor which may not understand "context
sensitivity". This is not (except when the SGML elements and their
inter-relations are particularly unsophisticated) a job for sed or awk,
or even yacc or lex.

On the subject of parsers, Mark Sherman writes:

> I believe SoftQuad sells them. Quality, functionality and price unknown
> to me. There are probably more around, although I recall an article by
> Larry Welsch from NIST (ACM document processing conference) claiming
> that some parts of SGML were exceedingly difficult to implement, so you
> should watch out for how much is implemented when someone makes a claim.

A Conformance Testing Initiative led by the Graphic Communications
Association in North America and by the National Computing Centre in
the UK (with the cooperation of the European Community) will, within a
year or so, eliminate this issue.

Today, the most popular parsers, which are generally conceded to also
be the most conformant, are those of Software Exoterica (of Ottawa
Canada), licensed by Frame, Arbortext and Intergraph; and of Sobemap
(of Brussels Belgium, marketed by Yard Software of Chippenham Wiltshire
UK), licensed by Agfa Compugraphic CAPS, Interleaf, Context and
Xyvision.

We have made available to our consulting clients the parser from
Author/Editor, which is optimized to work with our SoftQuad Publishing
Software sqtroff component.

In Holland, Elsevier Scientific Publishers, as a matter of course, I
believe, use the SGML Parser of the Vrije Amsterdam University to
convert SGML files to TeX. A number of other sites in Europe perform
the same conversion as did the creators of the terrific SGML/Structured
Text Bibliography compiled by Robin Cover, Nicholas Duncan and David
Barnard [Queen's University at Kingston Ontario Canada, Technical
Report 90-281 still in draft form and available later this year].

------------------------------------------------------------------------

Back to Anita's questions:

> 3) Is an SGML to PostScript converter available ?

Well, yes, though we think of that process not so much as a conversion
as traditional document processing. One could describe any software
product which makes up pages from SGML-to-parser input as performing
SGML to PostScript conversion.
Neither SGML nor PostScript alone has the smarts to know when to break
a line or a page, and so on.

------------------------------------------------------------------------

> 4) Does SGML support drawings (illustrations) ? How about tables,
>    mathematical expressions ?

Yes, certainly, but these two questions have quite different answers.

a) Drawings/Illustrations:

Think of SGML, at one level, as process control. [Stop! SGML is not a
procedural language, but nonetheless, I believe this is the most
straightforward way to explain the functionality ...]

The standard formalizes a set of declarations which associate certain
entities with "data content notations". SGML's job is not to attempt to
predict all the ways that any number of hardware and software systems
will store graphic images, video, sound, smell, voice annotation, and
so on. Rather, an SGML document will contain, in easily recognized
constructs, all the information that a system needs to recognize where
parseable text starts and stops, and where control must be passed to an
application that can deal with the strictly delimited content which is
non-SGML data. [The hypermedia/multimedia work going on in the ANSI
committee mentioned above uses these capabilities very elegantly, even
building in SGML constructs to point to "the interiors" of non-SGML
contents.]

Mark Sherman writes:

> Now, you and I can make a
> side agreement that whenever we use the tag "my-CGM-byte", the marked
> bytes will be in CGM-compliant format. However, that is an agreement
> outside of the standard and only usable by our local cabal. Ditto for
> tables, mathematical expressions.

This is not true. The standard defines a document as (more or less) a
Document Type Definition -- the set of elements, other constructs, and
their relationships -- followed by an "instance" of that DTD, content
marked up using the semantics rigidly prescribed by the DTD. An ability
to read the DTD is a vital function within any SGML system.
Accordingly, there is a completely standardized, interchangeable
method, within the standard, to pass along the data content notations,
such as CGM, or TIFF, or RIFF, or IGES, or IFF, or anything. It is not
the job of SGML (nor should it be) to dictate how applications software
will respond to the content being passed.

"Our local cabal" has nothing to do with the story. Anyone with an SGML
parser can read any SGML file and be passed a meaningful output stream.

b) As for tables and mathematics:

Both areas are covered in a "must-read" Technical Report (TR 9573)
published by ISO/IEC and edited by Anders Berglund (now of ISO, ex of
CERN), entitled "SGML Support Facilities: Techniques for Using SGML".

The DTDs created by the Association of American Publishers (which are
now an ANSI standard) and by the US Defense Department under the CALS
initiative, also contain "content models" for tables of varying
complexity. It is now up to software developers to find mechanisms for
presenting these content models to users in as straight-forward a way
as is possible, but there is nothing wrong with the underlying SGML
data representation. [Certainly the content models are complex. And so
they should be: tables can be extremely complicated.]

As far as math goes, for now, the CALS DTDs use the "data content
notation" construct described above, choosing to standardize on TeX,
EQN and IBM's Scientific and Mathematical Formula Format, with tags to
delimit nested math, and expecting the formatter to handle the
formatting.

------------------------------------------------------------------------

> 5) Is it possible to use SGML and CGM in combination ? How about the
>    availability of CGM-translators ?

See above. A variety of graphics and CAD packages exist which claim CGM
translation ability -- but to other graphics formats, not to SGML.

The afore-mentioned "Techniques for Using SGML" extends an example
given in Annex E of SGML itself.
The CGM clear text encoding in the example is nested within the SGML
document, but attributes associated with the SGML elements dictate
scaling and cropping.

> 6) Are parsers available to check an SGML-document on syntax ?

Yes. Software Exoterica's XGML, Sobemap's Mark-It, NIST's
not-yet-complete public domain utility, the Amsterdam Parser (which
I've not seen, however), and, to SGML sites using sqtroff, SoftQuad's.
Datalogics bundles in its own (built on top of the NIST parser, I
believe) with its WriterStation and Pager products; IBM includes one
with TextWrite.

> 7) Are the software tools public domain ? What are the prices of the
>    software tools ? What kind of software tools are available ?

There is an extraordinary variety of software tools available, from all
the vendors mentioned above, plus a few more:

Avalanche Development Company (Boulder Colorado) sells FastTag, an
"auto-tagger" which uses a proprietary visual recognition engine to
mark up documents from a variety of wordprocessors and scanner/OCRs.

PraXis Inc (Providence Rhode Island) will soon be showing its
Electronic Book Browser, a system which builds and displays hypertexts
compiled from SGML texts.

OWL (Office Workstations Limited of Edinburgh Scotland and Bellevue
Washington) uses SGML as an input source for its IDEX
hypertext/document database.

Other products (along with addresses and phone numbers for all the
companies mentioned throughout this article) are listed in the SGML
Source Guide, a publication of the

    Graphic Communications Association
    1730 North Lynn Street, Suite 604
    Arlington, Virginia 22209-2085 USA
    Telephone: 703 841-8160
    Fax: 703 841-8144
    attn: Marion Ellidge

GCA also publishes <TAG>, the SGML Newsletter, which, along with the
newsletters mentioned below, is a good source of product descriptions
and new product announcements.
GCA also hosts several SGML tutorials each year, as well as the
twice-annual TechDoc Conference [next one: August 20 to 24 in
Washington DC] and, co-sponsored with the International Users' Group,
the annual Mark-up conference each May or June.

The EPSIG Newsletter, mentioned above, is available from

    OCLC Inc
    6565 Frantz Road
    Dublin, Ohio 43017-0702 USA
    Telephone: 800 848-5878
    attn: Betsy Kaiser

The newsletter and bulletin of the International SGML Users' Group, as
well as a number of other publications, are available from

    International SGML Users' Group,
    c/o SoftQuad Inc
    720 Spadina Avenue
    Toronto, Ontario M5S 2T9
    Canada
    Telephone: 416 963-8337
    attn: Steven Downie

A recent posting to this newsgroup described the work of an SGML
Consortium proposed by Ohio State University, which intends to make
available a variety of public domain SGML tools.

> 8) Will the newsgroup 'comp.text.sgml' be created ?

I suspect that if there was any doubt before, then the outrageous
length of this posting will tip the balance as crowds of comp.text
subscribers say "Get this stuff out of here!"

Nonetheless, it seems to me that there is another point of view on the
subject: Until SGML is taken for granted as a useful and normal part of
the working lives of all who toil with documents, a national and
international standard of this level of capability might well be
usefully discussed in comp.text rather than in a separate newsgroup.

I think that people generally interested in text issues would do well
to follow these discussions, rather than create a distinct SGML ghetto.
With the support of so many governments, associations, research groups,
hardware and software vendors, as well as electronic and paper
publishers of all sorts, it's not going to go away. Anyone involved
with comp.text may be served by keeping on top of these developments.

------------------------------------------------------------------------

Yuri Rubinsky                       (416) 963-8337
President                           (800) 387-2777 (from U.S. only)
SoftQuad Inc.                       uucp: {uunet,utzoo}!sq!yuri
720 Spadina Ave.                    Internet: yuri@sq.com
Toronto, Ontario, Canada M5S 2T9    Fax: (416) 963-9575