[comp.text.sgml] SGML translators to/from LaTeX

Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (09/17/90)

>>>>> On 14 Sep 90 18:17:19 GMT, dns@sq.sq.com (David Slocombe) said:
David> The need for SGML is demonstrated by examining the problem of translating
David> a troff document (for example) into an SGML document:

David> If a computer program looks at a file containing a troff document, it
David> will see things like...

David> 	.sp .5v
David> 	.ti 2m
David> 	text text text ....
David> 	...text text.
David> 	.sp .5v

David> Now we may decide, in context, that these formatting codes are
David> formatting a paragraph (we *visualize* the effect of the codes!), but
David> they *might* be formatting a "note" or a cell of a table or whatever.

Which is why I gave up on `roff!  There are more abstract markup languages.
LaTeX and Scribe come to mind.  (I'm sure there are others.)

In LaTeX, you have something more like:

\begin{document}
\abstract{text text text...}
\section
text text text...
\subsection
text text\footnote{text text text...} text...
\subsubsection
text text text...
...
\end{document}

Note that explicit formatting commands are omitted (except by the
occasional novice attempting to defeat the purpose of LaTeX).  The format
of a "\section", for example, is determined in a separate "style" file.
There are also "environments" for tables, lists, etc. that are similarly
abstract (compared to explicit, ambiguous formatting commands like "indent
an extra 1/2 inch, place a bullet, boldface first word, etc.").

Bearing all this in mind, would it be possible to construct "intelligent"
converters from/to LaTeX to/from SGML that preserve the document's overall
structure?  ...if _explicit_ formatting commands were simply ignored?

Could this be done in a general way, or would a translator _necessarily_
have to target a _specific_ DTD?  Suggested approaches are welcome.

(What I have in mind are yacc/lex based translators.)

	Thanks in advance,
--
Chuck Phillips  MS440
NCR Microelectronics 			Chuck.Phillips%FtCollins.NCR.com
2001 Danfield Ct.
Ft. Collins, CO.  80525   		uunet!ncrlnk!ncr-mpd!bach!chuckp

bzs@world.std.com (Barry Shein) (09/18/90)

From: Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips)
>In LaTeX, you have something more like:
>
>\begin{document}
>\abstract{text text text...}
>\section
>text text text...
>\subsection
>text text\footnote{text text text...} text...
>\subsubsection
>text text text...
>...
>\end{document}

>Bearing all this in mind, would it be possible to construct "intelligent"
>converters from/to LaTeX to/from SGML that preserve the document's overall
>structure?  ...if _explicit_ formatting commands were simply ignored?

Yes, absolutely, you could probably write most of it as a little Perl
or similar high-level script.

Heck, you could probably write it in TeX: just write replacement
macros that expand themselves to the corresponding SGML string, and TeX
will just think that's the output you wanted.  (Well, I suppose it wants
to put out a dvi file, so this would be easier in nroff, but you could
probably put all the ASCII/SGML out to the error file and throw away
the dvi file, something like that...)

There wouldn't be a heckuva lot more going on than just picking up the
backslashes and curly braces and changing them to angle-brackets
unless you were trying to conform to a particularly bad fit of a DTD:

	\abstract{text text text}	->	<abstract>text text text</abstract>
	\section			->	<section>
	\subsection			->	<subsection>

and so on.  I suppose someone will now point out that you could
probably define the SGML delimiters in the header to be backslash and
none, and be done with it.
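
The substitution described above might be sketched roughly as follows.
This uses Python where the post suggests Perl, and it is only an
illustration under stated assumptions: the names KNOWN, COMMAND and
latex_to_sgml are made up here, element names are assumed to match the
LaTeX command names one-for-one, nested braces are not handled, and
wrapping a braced argument in a start tag plus end tag is a choice made
for this sketch rather than anything a particular DTD dictates.

```python
import re

# Hypothetical set of LaTeX commands assumed to map one-for-one
# onto SGML element names.
KNOWN = {"abstract", "section", "subsection", "subsubsection", "footnote"}

# \name, optionally followed by one {argument} with no nested braces.
COMMAND = re.compile(r"\\([A-Za-z]+)(?:\{([^{}]*)\})?")

def latex_to_sgml(text):
    """Rewrite known \\name and \\name{arg} commands as SGML tags."""
    def replace(match):
        name, arg = match.group(1), match.group(2)
        if name not in KNOWN:
            return match.group(0)          # leave unknown commands untouched
        if arg is None:
            return "<%s>" % name           # bare command: start tag only
        return "<%s>%s</%s>" % (name, arg, name)
    return COMMAND.sub(replace, text)
```

Called on the example document, this leaves \begin{document} and
\end{document} alone, turns a bare \section into <section>, and turns
\footnote{...} into a <footnote> element with both tags.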

The only possible problem might be that most DTDs have a notion of
ending an object, e.g., </section>.  But I'd imagine you could just build
a list of LaTeX objects which are presumed to end any preceding
objects of a class and just use that.  Note that SGML does not
*require* these end markers, but it's common.
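
One way to read the "list of objects which end preceding objects" idea
is as a table of nesting levels plus a stack of open elements.  The
sketch below is an assumption about how that could work, not a
description of any existing tool: LEVEL and add_end_tags are
hypothetical names, only the three sectioning commands are covered, and
the input is assumed to be already tokenized into ("start", name) and
("text", string) pairs.

```python
# Hypothetical nesting levels: a new start tag closes every open
# element at the same or a deeper level.
LEVEL = {"section": 1, "subsection": 2, "subsubsection": 3}

def add_end_tags(events):
    """Return SGML text with implied end tags made explicit.

    events is a list of ("start", name) or ("text", string) pairs.
    """
    out = []
    stack = []  # names of currently open elements, outermost first
    for kind, value in events:
        if kind == "start":
            # Close anything this new element is presumed to end.
            while stack and LEVEL[stack[-1]] >= LEVEL[value]:
                out.append("</%s>" % stack.pop())
            stack.append(value)
            out.append("<%s>" % value)
        else:
            out.append(value)
    # End of document: close whatever is still open.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

For a section containing a subsection and followed by another section,
the subsection and the first section are both closed just before the
second <section> is written.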
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD