[comp.text.sgml] SGML translators

blarsen@spider.uio.no (Bjorn Larsen) (09/11/90)

Does anybody have a description of the SGML-based translators available?
I'm interested in translators such as

  WP5.0    <-> SGML
  RTF      <-> SGML
  LaTeX    <-> SGML
  troff    <-> SGML
  whatever <-> SGML

Commercial or Public Domain. Complete or partial. Finished or under
construction. Everything of interest.

--
Bjorn Larsen                University of Oslo,  Norway
                               Bjorn.Larsen@usit.uio.no

funk@SRC.Honeywell.COM (Harry Funk) (09/11/90)

In article <BLARSEN.90Sep11085038@spider.uio.no> Bjorn.Larsen@usit.uio.no writes:
>
>Does anybody have a description of the SGML-based translators available?
[stuff deleted]
>Bjorn Larsen                University of Oslo,  Norway
>                               Bjorn.Larsen@usit.uio.no

I haven't been following it recently, but a couple of years back, Sandra
Mamrak (mamrak@tut.cs.ohio-state.edu) was working with a group on a system
called "Chameleon", which was (is?) a data translation system capable of
generating up- and down-translators for a standard representation (SGML).
You might contact her for more information.

H.
Harry A. Funk                             Voice: (612)-782-7396
Honeywell Systems and Research Center     Inet:  funk@src.honeywell.com
3660 Technology Dr.   MS:MN65-2500        UUCP:  funk@srcsip
Minneapolis, MN 55418 			  Bang:  {umn-cs,ems,mmm}!srcsip!funk

verber@pacific.mps.ohio-state.edu (Mark Verber) (09/11/90)

Indeed, Sandy Mamrak (mamrak@cis.ohio-state.edu) has been constructing
tools that translate to and from SGML.  She has formed a consortium similar
to the X Consortium, where sponsors get early releases, but the code eventually
makes its way to the public.  I believe their first release will be in the
next year.

Cheers,
Mark Verber

bernhold@qtp.ufl.edu (David E. Bernholdt) (09/11/90)

In article <BLARSEN.90Sep11085038@spider.uio.no> Bjorn.Larsen@usit.uio.no writes:
>Does anybody have a description of the SGML-based translators available?
>I'm interested in translators such as
 ...
>  LaTeX    <-> SGML


Well, this doesn't entirely fit the bill, but...

There is a commercial product from Arbortext, Inc. called The
Publisher which is a DTP (desktop publishing) package which uses SGML
as its internal document format and TeX as the formatter.
Consequently, SGML --> TeX is done "all the time", but this TeX makes
heavy use of their own macros.  Along with this, they provide a couple
of utilities which can convert LaTeX <--> SGML.

I have no idea if they'd provide the translators separately.
-- 
David Bernholdt			bernhold@qtp.ufl.edu
Quantum Theory Project		bernhold@ufpine.bitnet
University of Florida
Gainesville, FL  32611		904/392 6365

paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) (09/12/90)

blarsen@spider.uio.no (Bjorn Larsen) writes:

>Does anybody have a description of the SGML-based translators available?
>I'm interested in translators such as
>
>  WP5.0    <-> SGML
>  RTF      <-> SGML
>  LaTeX    <-> SGML
>  troff    <-> SGML
>  whatever <-> SGML
>
>Commercial or Public Domain. Complete or partial. Finished or under
>construction. Everything of interest.

As part of my ox2 front-end to the OED2 in SGML format, I wrote two routines
that provide formatted output in plain, troff, and VT2XX escape sequences.

The code is GNU licensed and available from uxc.cso.uiuc.edu in 
pub/oed2-2.03.tar.Z .

The OED2 data is licensed from Oxford University Press and must be obtained
from them.

/pbp
--
         Paul Pomes

UUCP: {att,iuvax,uunet}!uiucuxc!paul   Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910

emv@math.lsa.umich.edu (Edward Vielmetti) (09/12/90)

In article <1139@orange19.qtp.ufl.edu> bernhold@qtp.ufl.edu (David E. Bernholdt) writes:

   There is a commercial product from Arbortext, Inc. called The
   Publisher which is a DTP (desktop publishing) package which uses SGML
   as its internal document format and TeX as the formatter.
   Consequently, SGML --> TeX is done "all the time", but this TeX makes
   heavy use of their own macros.  Along with this, they provide a couple
   of utilities which can convert LaTeX <--> SGML.

ArborText Customer Support can be reached as "help@arbortext.com".

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
moderator, comp.archives

disclaimer: I live about 4 blocks from Arbortext & know several
people there, but I have no financial stake in their stuff.

ath@prosys.se (Anders Thulin) (09/12/90)

In article <BLARSEN.90Sep11085038@spider.uio.no> Bjorn.Larsen@usit.uio.no writes:
>
>Does anybody have a description of the SGML-based translators available?

This is from a posting to TEI-L, the public discussion list of the Text
Encoding Initiative. 

	[ ... ]

	4. Here (from memory) is a brief list of SGML parsing/validating
	 software:

	a. From Sema Group
		MarkIt
		WriteIt
	b. From Software Exoterica
		XGML Validator
		XGML Normaliser
		XGML CheckMark
		XGML Translator
	c. From SoftQuad:
		Author/Editor
	d. From IBM:
		DCF/SGML edition
		LEXX
	e. From Open Text Systems
		PAT
	f. Academic projects
		TOLK (Turing Inst)
		DAPHNE (DFN)

	[ ... ]

	Lou Burnard


I have also seen some rumours (?) that Microsoft is planning to incorporate
some type of SGML support in a future version of Word.

Hope this is of some help,


-- 
Anders Thulin       ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

flack@sc2a.unige.ch (09/12/90)

In article <BLARSEN.90Sep11085038@spider.uio.no>, blarsen@spider.uio.no (Bjorn Larsen) writes:
> Does anybody have a description of the SGML-based translators available?
> I'm interested in translators such as
> 
For your general information, there is a book on SGML which has just been 
written by Eric van Herwijnen, of the DD Division of CERN in Geneva. 
(CERN is the European High-Energy Physics Laboratory).  The reference is
Practical SGML by Eric van Herwijnen. 200 pp Paperback US$ 40.0 - 0.05 
( = 39.95$). ISBN 0-7923-0635-X. Kluwer Academic publishers. You can contact
him on e-mail: eric@cernvm.bitnet  telephone: [+[41] 22] 767 50 87

The group at CERN already has some software running with SGML, and distributes
a diskette of macros and stylesheets, plus an SGML-producing program for
Microsoft Word IV (and for version V in the near future). You can get more
concrete information from van Herwijnen.


From:
   H. D. Flack
   Laboratoire de Cristallographie
   University of Geneva
   24 quai Ernest-Ansermet
   CH-1211 Geneve 4
   Switzerland

   Telephone     [+[41] 22] 702 62 49
   Telefax       [+[41] 22] 781 21 92
   Telex            ch-42 11 59 siad
   e-mail        flack@sc2a.unige.ch

dns@sq.sq.com (David Slocombe) (09/15/90)

In article <BLARSEN.90Sep11085038@spider.uio.no> Bjorn.Larsen@usit.uio.no writes:
 
>Does anybody have a description of the SGML-based translators available?
>I'm interested in translators such as
>
>  WP5.0    <-> SGML
>  RTF      <-> SGML
>  LaTeX    <-> SGML
>  troff    <-> SGML

What is requested here is in general not possible without the user 
supplying substantial additional information -- in fact this touches
upon the key motivation for SGML.

<background-info>

	SGML == Standard Generalized Markup Language, IS 8879-1986.

	See my parallel posting "How to obtain info on SGML"

	DTD == Document Type Definition. SGML tells how to create one.
	       This specifies the "grammar" of a class of documents.
	       A particular document is an "instance" of the language
	       specified by the "grammar", just like in your 
	       programming-languages course. 

</background-info>

The need for SGML is demonstrated by examining the problem of translating
a troff document (for example) into an SGML document:

If a computer program looks at a file containing a troff document, it
will see things like...

	.sp .5v
	.ti 2m
	text text text ....
	...text text.
	.sp .5v

Now we may decide, in context, that these formatting codes are
formatting a paragraph (we *visualize* the effect of the codes!), but
they *might* be formatting a "note" or a cell of a table or whatever.
And this must be about the simplest case.  In general it is a kind of
AI problem to deduce the logical structure of a document from its
formatting codes, and a task that requires considerable "training"
before it can be done algorithmically with any accuracy!

(Of course most troff documents use macro-calls, but this only hides the
problem a little:  someone still has to map the macro-calls to the SGML
elements, and this may be one-to-many unless the designer of the macro
package was already thinking in an SGML way.  If he was, then SGML
gives him a rigor and a level of software support that he has
never had before.)
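
To make the ambiguity concrete, here is a minimal sketch in Python.  The
rule table and the element names in it are invented for illustration (they
come from no real product or DTD), and ".PP" is assumed to be a paragraph
macro as in the ms package.  The point is only that one sequence of
formatting requests or macro-calls maps to several candidate elements:

```python
# Hypothetical rule table: the troff requests (or macro-calls) that open a
# block of text, mapped to the logical elements they *might* be formatting.
RULES = {
    (".sp", ".ti"): ["para", "note", "table-cell"],
    (".PP",): ["para", "item-para"],
}

def requests_before_text(lines):
    """Return the names of the request lines that precede the first text line."""
    names = []
    for line in lines:
        if not line.startswith("."):
            break  # first line of running text ends the prefix
        names.append(line.split()[0])
    return tuple(names)

def candidate_elements(lines):
    """Look up every logical element the leading requests could stand for."""
    return RULES.get(requests_before_text(lines), [])

sample = [".sp .5v", ".ti 2m", "text text text ....", "...text text.", ".sp .5v"]
candidates = candidate_elements(sample)
print(candidates)
```

More than one candidate comes back for the sample above, so a person (or
some heavily trained heuristic) still has to choose one per occurrence.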

In fact there is a company that specializes in software to do
exactly this.  They are:
	
	Avalanche Development Company
	947 Walnut Street
	Boulder, Colorado 80302
	(303) 449-5032
	FAX (303) 449-3246

Their FastTAG product accepts input from WP4.2 and WP5.0, OCR formats,
DCA/RFT files, Microsoft Word, print-image files, Calera PDA files
and Shaftstall Media Conversion files.  I think they are expanding
the list all the time.

BUT... you have to do considerable work to coach FastTAG, because by
itself it cannot be expected to know just what logical elements
make up your documents (i.e. it cannot intuit the DTD), *and* it cannot
guess at the format of each element in that DTD.  So you have to
tell it these things.  This is usually practical only if you are going
to convert a body of documents from a particular formatted form
to SGML.

Naturally!  That's why SGML is so important:  it is a way for document
creators to supply this valuable information about their work that
hitherto has been visible to the human reader (hopefully) but not
available to computer programs.

Instead of coding up your documents with formatting codes which result
in a visible image that your brain interprets to mean a certain logical
structure, you code your documents with the logical structure, and then
map that structure to formatting instructions in a separate operation.
The documents themselves then are much more "computable" as
data-structures, *and* you can take the same document/data-structure
and map it to different visual representations at different times for
different purposes.  Or even map it to different formatting languages
(e.g. troff at one site, TeX at another site). Or load it into a
database (mapping the logical structure into database-update language).

But again note that going from SGML to troff, for example, requires
that you specify just what troff codes or macros you want used for
each SGML logical structure.  There is nothing in the SGML form of
the document that binds to a particular visual representation.
So SGML->formatter-language cannot be automatic unless you supply
additional information.  At least this *can* be done with great
reliability, which is often *not* the case for formatter-language->SGML.

The mapping from SGML to a formatter-language is usually done using an
SGML parser/translator, i.e., a program that parses the SGML documents
(using a supplied Document Type Definition) and writes to its output
suitable formatting codes (or the macro-calls that represent them) to
typeset the document in a specific format.  The user must either supply
a mapping to formatting codes to produce the particular "look" desired,
or supply a mapping to macro-calls and then write a macro-package that
has the same effect.  In either case, the SGML parser has to be told
what to put out.
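
As a rough sketch of such a parser/translator, the Python below uses the
standard library's HTMLParser purely as a stand-in tokenizer for SGML-style
tags; it is not a real SGML parser and reads no DTD.  The element names and
both mapping tables are assumptions made up for the example (the troff
table assumes ms-style macros, the other assumes LaTeX commands):

```python
from html.parser import HTMLParser

# Hypothetical user-supplied mappings:
# element name -> (text emitted at the start tag, text emitted at the end tag)
TROFF = {
    "heading": (".SH\n", "\n"),
    "para": (".PP\n", "\n"),
}
LATEX = {
    "heading": ("\\section{", "}\n"),
    "para": ("", "\n\n"),
}

class Translator(HTMLParser):
    """Emit formatter codes from a mapping table as tags are recognized."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.out = []

    def handle_starttag(self, tag, attrs):
        # A KeyError here means the user supplied no mapping for this element.
        self.out.append(self.mapping[tag][0])

    def handle_endtag(self, tag):
        self.out.append(self.mapping[tag][1])

    def handle_data(self, data):
        # Skip whitespace-only text between elements.
        if data.strip():
            self.out.append(data.strip())

def translate(document, mapping):
    """Run one document through the translator with the given mapping."""
    translator = Translator(mapping)
    translator.feed(document)
    translator.close()
    return "".join(translator.out)

doc = "<heading>Introduction</heading><para>Some text.</para>"
troff_out = translate(doc, TROFF)
latex_out = translate(doc, LATEX)
print(troff_out)
print(latex_out)
```

The document itself never changes; only the table handed to translate()
does, which is the "additional information" the user has to supply.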

The parser has the advantage that a document that does not conform in
detail to the DTD simply won't be translated, just as is the case with
a C compiler.  This greatly eases the burden on the writer of the macro
package, who doesn't have to make his macros robust in the face of
incorrect input!
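
A rough sketch of that gatekeeping in Python, assuming a toy content model
written as regular expressions over child element names.  This is only a
stand-in for a real DTD, which is far richer, and every name below is
hypothetical:

```python
import re

# Toy stand-in for a DTD: for each element, a regular expression over the
# space-terminated names of its children.  "doc" must contain one heading
# followed by one or more paras; heading and para have no child elements.
CONTENT_MODEL = {
    "doc": r"(heading )(para )+",
    "heading": r"",
    "para": r"",
}

def conforms(element, children):
    """True if the sequence of child element names matches the content model."""
    model = CONTENT_MODEL.get(element)
    if model is None:
        return False  # element was never declared
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model, sequence) is not None

def translate_or_reject(element, children):
    """Refuse to translate anything that fails validation, as a compiler would."""
    if not conforms(element, children):
        raise ValueError(f"<{element}> does not conform to its content model")
    return "translated"

print(conforms("doc", ["heading", "para", "para"]))
print(conforms("doc", ["para", "heading"]))
```

Because nonconforming input is rejected before any output is written, the
macros downstream only ever see the structures the content model allows.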

As to available parsers, I quote from a comp.text posting by my
colleague Yuri Rubinsky only a short time ago:

   Today, the most popular parsers, which are generally conceded to also
   be the most conformant [to the Standard], are those of Software
   Exoterica (of Ottawa Canada), licensed by Frame, Arbortext and
   Intergraph; and of Sobemap (of Brussels Belgium, marketed by Yard
   Software of Chippenham Wiltshire UK), licensed by Agfa Compugraphic
   CAPS, Interleaf, Context and Xyvision.  We have made available to our
   consulting clients the parser from Author/Editor, which is optimized to
   work with our SoftQuad Publishing Software sqtroff component.


Hope all this helps someone...

David.

----------------------------------------------------------------
David Slocombe				(416) 963-8337
Vice-President, Research & Development  (800) 387-2777 (from U.S. only)
SoftQuad Inc.				uucp: {uunet,utzoo}!sq!dns
720 Spadina Ave.			Internet: dns@sq.com
Toronto, Ontario, Canada M5S 2T9	Fax: (416) 963-9575

kevinc@cs.athabascau.ca (Kevin Crocker) (09/15/90)

In article <BLARSEN.90Sep11085038@spider.uio.no> Bjorn.Larsen@usit.uio.no writes:
>
>Does anybody have a description of the SGML-based translators available?
>I'm interested in translators such as
>
>  WP5.0    <-> SGML
>  RTF      <-> SGML

If these things exist I'd love to know about them.  We use PCs a lot,
but I can convert all of our PC word processing and document layout
into WP5.0 or RFT (or both), and it would be great to go to SGML
from there.

Kevin
-- 
Kevin "auric" Crocker Athabasca University 
UUCP: ...!{alberta,ncc}!atha!kevinc
Inet: kevinc@cs.AthabascaU.CA