[comp.text.sgml] looking for more information

dms@aix03.aix.rpi.edu (david m schwartz) (11/18/90)

Hello -- I became aware of SGML this fall.  The folks who market SGML had an
exhibition booth at EP '90 (Electronic Publishing Conference held this year at
the National Institute of Standards and Technology in Maryland).  I am not
working with SGML yet, but I may be in the future.  I would appreciate it if
someone could take the time to respond to the following questions:

1.  SGML has been described as a context-free markup.  In a recent posting I
think I read someone referring to TeX as a context-specific markup.  Could
someone enlarge on this thread for me?

2.  LaTeX (I believe I also read in a recent article) is a more suitable
SGML environment than is TeX.  Is this because LaTeX is a macro-based
markup while TeX is more a control-word type markup?  In other words, a markup
that describes what the text looks like vs. what document element the text is
being assigned to?  Would this analogy also hold for, say, SCRIPT vs. GML?
GML is a fore-runner of Bookmaster.  An intermediate step was ISIL (which never
did make it past internal use at IBM) but there are some very significant
bloodlines here and if someone can frame their answers to my questions around
these IBM products, it will greatly add to my understanding.  In fact, I
read someone's comment that Bookmaster is a suitable SGML environment.  
Unfortunately, I have never seen or gotten my hands on Bookmaster.  Does 
Bookmaster have something that GML or ISIL does not??

3.  I have read references to some word-processing environments being more
suitable for SGML than others.  Could someone enlarge on this?

4.  Is it likely that we will see SGML front-ends on future word-processors? 
Or, will document authors need to directly apply the markup themselves?

5.  As I currently understand it, SGML is a two-step process:

	a) define a structure for the document
	b) write the document
	c) run the document through the SGML software (?) and see if the 
	  document adheres to the structure.

three, three steps, as I understand it, SGML is a three step process
no one expects the Spanish Inquisition, shades of Monty Python :)

6.  I assume it is OK for SGML documents to be broken into chapter files?

Thank you very much in advance for any and all kind persons who respond to this
post.  I hope that these questions make some sense.

FYI, my interest in all this comes from the fact that I perform technical
writing services for a company that develops a text-retrieval program.  The
growing popularity of SGML is making the developers scurry to permit not only
the input of an SGML document, but the retrieval of said document by 
permitting the user to specify search parameters that are unique to the 
hierarchical structure of an SGML document.

lark@tivoli.UUCP (Lar Kaufman) (11/20/90)

In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
>Hello -- I became aware of SGML this fall.  The folks who market SGML had an
>exhibition booth at EP '90 (Electronic Publishing Conference held this year at
> ...
SGML is not a proprietary product.  It is an ISO standard.
>
>1.  SGML has been described as a context-free markup.  In a recent posting I
>think I read someone referring to TeX as a context-specific markup.  Could
>someone enlarge on this thread for me?
That would be a big task.  Continue.
>
>2.  LaTeX (I believe I also read in a recent article) is a more suitable
>SGML environment than is TeX.  Is this because LaTeX is a macro-based
>markup while TeX is more a control-word type markup?  In other words, a markup
>that describes what the text looks like vs. what document element the text is
>being assigned to?  Would this analogy also hold for, say, SCRIPT vs. GML?
>GML is a fore-runner of Bookmaster.  An intermediate step was ISIL (which never
>did make it past internal use at IBM) but there are some very significant
>bloodlines here and if someone can frame their answers to my questions around
>these IBM products, it will greatly add to my understanding.  In fact, I
>read someone's comment that Bookmaster is a suitable SGML environment.  

I can oblige you here.  Script is conceptually similar to troff, TeX,
Scribe, and the like.  GML is a structured language that is interpreted
and converted to Script, which in turn is converted to device driver
instructions for printing.  ISIL is just an extended set of GML
(Doesn't ISIL mean IBM Structured Information Language?) and BookMaster is
a more commercialized packaging of ISIL.  BookMaster is complemented by a
package called BookManager, which is intended to provide formatting and
presentation of online documentation.  GML was written by Charles Goldfarb.
ISIL and BookMaster were extensions made for IBM's not always apparent
motivations (but you can be sure that money is a factor).  Unfortunately,
ISIL and BookMaster/Manager took directions that led them astray from the
"purity" of GML and away from the direction pointed to by SGML.  (SGML was
also a Charles Goldfarb conception.)  However, IBM (or elements within IBM)
has come to recognize that SGML is A Good Thing.  Therefore, IBM's latest
version of BookMaster/Manager can parse SGML as well as its ISIL-flavored
code.  That is, if I understand correctly, BookMaster 3.0 can parse SGML.

My disclaimer:  I have contracted to IBM in the past and used these products, 
but I am not privy to their plans or reasoning.  I have communicated with 
Charles Goldfarb, but only seeking information about SGML and not about his 
work for and with IBM.
>
>4.  Is it likely that we will see SGML front-ends on future word-processors? 
But of course.  Sort of.  Word Processor and Typesetting packages will 
happily convert SGML into their proprietary formats.  A few farsighted 
companies will provide the reverse conversion (a more difficult task).  A 
few clever companies will use SGML as their native format.
>
>5.  As I currently understand it, SGML is a two-step process:
>
>	a) define a structure for the document
>	b) write the document
>	c) run the document through the SGML software (?) and see if the 
>	  document adheres to the structure.
>
>three, three steps, as I understand it, SGML is a three step process
>no one expects the Spanish Inquisition, shades of Monty Python :)

Actually, most users will not mess with structure definitions.  They will
use predeveloped Document Type Definitions (DTDs).  Formatting gurus will do
the document definition stuff.  The point is, the average writer shouldn't 
have to (be allowed to) mess with the document format.
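
To make the "define a structure, write the document, check the document" 
cycle concrete, here is a toy sketch in Python.  It is not an SGML parser 
and not how any particular product works: the element names and the 
ALLOWED_CHILDREN table are invented for illustration, and a real document 
type definition also expresses ordering, repetition, and tag omission, 
which this ignores.  It only checks which elements may appear directly 
inside which others.

```python
# Hypothetical, greatly simplified "document type": each element name maps
# to the set of element names allowed directly inside it.
ALLOWED_CHILDREN = {
    "manual": {"chapter"},
    "chapter": {"title", "para", "list"},
    "list": {"item"},
    "title": set(),
    "para": set(),
    "item": set(),
}


def check(element, path=""):
    """Return a list of structure errors for a (name, children) tree."""
    name, children = element
    here = f"{path}/{name}"
    if name not in ALLOWED_CHILDREN:
        return [f"{here}: undeclared element"]
    errors = []
    for child in children:
        if child[0] not in ALLOWED_CHILDREN[name]:
            errors.append(f"{here}: <{child[0]}> not allowed here")
        else:
            errors.extend(check(child, here))
    return errors


good = ("manual", [("chapter", [("title", []), ("para", [])])])
bad = ("manual", [("chapter", [("item", [])])])
print(check(good))  # no errors
print(check(bad))   # one error: an item outside a list
```

The writer only ever produces the second kind of thing (the document); the 
table at the top is what the "formatting gurus" would maintain.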

>6.  I assume it is OK for SGML documents to be broken into chapter files?

Yeah.

>...  I hope that these questions make some sense.
They do, but they suggest that you might want to "read up" on SGML.
>
>FYI, my interest in all this comes from the fact that I perform technical
>writing services for a company that develops a text-retrieval program.  The
>growing popularity of SGML is making the developers scurry to permit not only
>the input of an SGML document, but the retrieval of said document by 
>permitting the user to specify search parameters that are unique to the 
>hierarchical structure of an SGML document.

Yessss, any company doing text-retrieval must be very interested in SGML.
I don't know that I think of an SGML document as necessarily being 
hierarchical...  

I would suggest further readings.  Unfortunately, the publishing software 
companies have been very slow in moving towards SGML, in spite of early 
endorsement by the Association of American Publishers.  The real impetus is 
the U.S. Government's CALS initiative, and the potential for amazing online
documentation capabilities (including text retrieval).  

I have "SGML: An Author's Guide" by Martin Bryan (Addison-Wesley).  It is 
an OK starting place, albeit of narrow focus.  I am waiting for "The SGML
Handbook" by Charles Goldfarb, which I already expected to be out by now...
maybe I'd better start hounding my favorite bookstore...  The SGML Handbook 
should incorporate the ISO 8879 specification, so you can safely skip 
getting that document.  If you are involved with IBM products, you can 
examine "Solutions for CALS Technical Publishing" (IBM document GC34-5153).

I cannot help further, because I am still in early stages of research 
myself.  (Need to spend a few days in a good library, going through the 
periodical literature and stuff).

Sorry for chopping up your posting and ignoring part of it, but time 
flies...

-lar

-- 
---------                             TIVOLI Systems, Inc.
Lar Kaufman      512-454-3301         (voice) 512-329-2455
4503 Sinclair Avenue                  (fax)   512-329-2755
Austin, Texas 78756  USA              (e)  lark@tivoli.com

garyp@csg.uwaterloo.ca (Gary Pianosi) (11/23/90)

In article <200@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:
>In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
...
>>6.  I assume it is OK for SGML documents to be broken into chapter files?
>
>Yeah.
>
Could someone please elaborate on this ... The only way I know of doing
this is to define a SYSTEM entity in my main document, say:

        <!ENTITY chap1 SYSTEM "mychap1.sgml">

and then include the entity reference, "&chap1;", where I want the 
chapter inserted. Is there another way?

My problem with this method is that if I want to break up my chapters
into sections and sections into sub-sections, I must define all of
the appropriate system entities in my main document. Hence my main
document must 'know' about all of the pieces that comprise it. I feel
that I somewhat lose the modularity of my document by doing this.
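
The arrangement described above can be sketched with a toy expander in
Python.  This is only an illustration under stated assumptions, not an SGML
parser: the regular expressions cover just the simple declaration form
shown above, the `files` dictionary stands in for the file system, and the
file name "mysec1.sgml" is made up.  It shows the point at issue: every
entity, however deeply nested, is declared once in the main document.

```python
import re

# Matches only the simple form: <!ENTITY name SYSTEM "file">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]+)"\s*>')
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(main_text, files):
    """Expand &name; references using declarations found in main_text only."""
    declared = dict(ENTITY_DECL.findall(main_text))

    def substitute(match):
        name = match.group(1)
        if name not in declared:
            return match.group(0)  # leave unknown references untouched
        return resolve(files[declared[name]])

    def resolve(text):
        # Included pieces may themselves contain references, so recurse.
        # (No protection against an entity that includes itself.)
        return ENTITY_REF.sub(substitute, text)

    return resolve(ENTITY_DECL.sub("", main_text))


files = {
    "mychap1.sgml": "<chapter>&sec1;</chapter>",
    "mysec1.sgml": "<section>text</section>",
}
main = (
    '<!ENTITY chap1 SYSTEM "mychap1.sgml">\n'
    '<!ENTITY sec1 SYSTEM "mysec1.sgml">\n'
    "<book>&chap1;</book>"
)
print(expand(main, files).strip())
```

Note that "mychap1.sgml" refers to &sec1; without declaring it, which is
exactly the loss of modularity in question.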
--
Internet:  garyp@csg.UWaterloo.CA            Bitnet:  garyp@watcsg.BITNET
Computer Systems Group, University of Waterloo, Waterloo, Ontario, Canada

inc@tc.fluke.COM (Gary Benson) (11/28/90)

In article <200@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:
>In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
>>Hello -- I became aware of SGML this fall.  The folks at an exhibition booth 
>> at EP '90 (Electronic Publishing Conference held this year at .......

Then Lar spends a great deal of time and energy in a terrific posting that
really lays out many of the basics of SGML. I nominate his posting for
inclusion in a Frequently Asked Questions file for this newsgroup. Many
other newsgroups are doing that, and I think this is one that could benefit
greatly from it. In the groups using FAQ files, one person volunteers to
maintain the file, others submit questions and/or answers for the file, and
in whatever state it is in, the file is posted once per month in the
newsgroup. I think it is a great idea, particularly for groups like this one
that seem to have a large number of people looking for the basic
information needed to pursue their interests further. I would volunteer for
FAQ maintainer except that my knowledge of SGML is also pretty rudimentary.
Any takers?

My second topic concerns the concept of "implied markup". In my work at
Fluke's technical publication department, we have used this concept for
years, and I wonder if there are others who use it to the extent we do. Is
there much interest in the field? Any good books?

For those to whom implied markup is unfamiliar: the idea is that rather than
ANY kind of "coding", the writer submits a manuscript in such a form that
its format is implied...for example, when the word "Figure", followed by a
number in the form "n-n", appears on a line by itself, indented from the
margin, the implication is that this is a figure title...a change of font
size and face is called for, and some amount of room is required now to hold
the figure implied by the figure title. I have been fascinated to learn how
MUCH of this sort of information is available, even in text generated by
those who do not know the implied markup "rules". For example, if
you see a group of consecutive paragraphs at the same indent, preceded by
numbers, you can be pretty sure you are looking at a numerical list. If the
amount of indent gets larger and the paragraphs are now marked in consecutive
alphabetical order, then you are looking at an alphabetical list nested in
the numerical one...lesser amounts of indent imply the end of nesting.

I suppose I should make clear that we are dealing with so-called "structured
documents" here - not free-form ads, novels, magazines, what have you.
The world of publishing has LOTS of things that I am sure would not fit into
the scheme I have outlined, but for the kinds of technical documents we
publish, it seems to provide solutions to many of our nagging publication
problems.

Sometimes I wonder if we are the only people actively pursuing this
technique. We now use a computer program written in perl to glean this
structural information and convert it into generic codes. This is how we
separate form from content, and I wonder how others are doing the same. Are
there other methods to keep the writing and formatting functions separate
without requiring every writer to learn the local dialect of SGML or other
generic coding?
-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

The first rule of intelligent tinkering is to save all the parts.  -Paul Ehrlich

lark@tivoli.UUCP (Lar Kaufman) (11/29/90)

In article <1990Nov28.105230.10365@tc.fluke.COM> inc@tc.fluke.COM (Gary Benson) writes:
>
>Then Lar spends a great deal of time and energy in a terrific posting that
>really lays out many of the basics of SGML. I nominate his posting for
>inclusion in a Frequently Asked Questions file for this newsgroup...
>... I would volunteer for
>FAQ maintainer except that my knowledge of SGML is also pretty rudimentary.
>Any takers?

In case I have misled anyone, I hasten to admit that I am very much a 
student of SGML, not a master.  I have _never_actually_used_ SGML in a 
product, so my knowledge is only theoretical.  I am only now in a position 
to begin working with SGML concepts and proto-SGML software.  I agree with 
Gary's proposal, and I hope someone with a practical knowledge will accept 
the task of maintaining a FAQ file.

Gary also mentions tools written in perl.  I would love to see people 
volunteering code and techniques for implementing SGML solutions.  I know 
that others have written programs for converting structural information 
to/from SGML using various languages (such as Icon).  Where are they?  
Has anyone considered setting up an FTP site for SGML tools?

A final comment:  we should remember to distinguish between SGML, the 
standard, and various software products that implement it.  It can be 
confusing to mix these.  For example, when I say that you can imbed 
chapters in an SGML document, I do not imply any knowledge of how a 
specific SGML product does it (or doesn't do it).

-lar

-- 
---------                             TIVOLI Systems, Inc.
Lar Kaufman                           (voice) 512-329-2455
                                      (fax)   512-329-2755
Austin, Texas        USA              (e)  lark@tivoli.com

inc@tc.fluke.COM (Gary Benson) (12/12/90)

In article <215@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:

>In case I have misled anyone, I hasten to admit that I am very much a 
>student of SGML, not a master.  I have _never_actually_used_ SGML in a 
>product, so my knowledge is only theoretical.  I am only now in a position 
>to begin working with SGML concepts and proto-SGML software.  I agree with 
>Gary's proposal, and I hope someone with a practical knowledge will accept 
>the task of maintaining a FAQ file.
>
>Gary also mentions tools written in perl.  I would love to see people 
>volunteering code and techniques for implementing SGML solutions.  I know 
>that others have written programs for converting structural information 
>to/from SGML using various languages (such as Icon).  Where are they?  
>Has anyone considered setting up an FTP site for SGML tools?
>
>A final comment:  we should remember to distinguish between SGML, the 
>standard, and various software products that implement it.  It can be 
>confusing to mix these.  For example, when I say that you can imbed 
>chapters in an SGML document, I do not imply any knowledge of how a 
>specific SGML product does it (or doesn't do it).

Woops, I didn't mean to set you up for guru-hood, Lar! It's just that your
posting was well-written and informative without being esoteric to the point
of meaninglessness. I hope this newsgroup can be a place for a wide-spectrum
discussion of SGML, but so far, it has seemed weighted toward theory, and I
found your posting to be a refreshing breath of reality.

As to your idea about people posting code and techniques, I can say this --
we have several man-years of programming in our quasi-SGML autocoding
programs, and I'm sure I'd be in big trouble if I disseminated those
programs. However, our techniques are rather interesting (to us, at least),
and I was surprised to see no response to my query about whether others are
using similar techniques.

Long ago, back when we typeset all of Fluke's technical manuals, a decision
was made in the Publications Department to attempt to keep the writing
function as separate as possible from the production function. We defined
production as encompassing page design, preparation of files for
typesetting, typesetting itself, layout, and of course printing, binding,
and so on.

There have been two very interesting results from that decision:

    1. While the industry as a whole has moved to "desk-top publishing", we
       find ourselves without many peers to discuss methods. We still have
       our staff typing in raw text, having rejected the "Mac on every desk"
       approach.

    2. We are in an excellent position to take advantage of new software
       tools because we have a lot of experience with implied markup
       techniques.

In our approach, the writer's file has an absolute minimum of explicit
instructions or codes. We have long used the string ---n at the end of lines
to indicate heading levels. This is basically the only "coding" our writers
do in files. Everything else is recognized by context or through regular
expression pattern matching, something that perl is extremely adept at.
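
For readers who want to see the idea rather than our code, here is a rough
sketch of the heading convention in Python (not perl, and not our program).
The function name and the (level, title) return value are invented for
illustration; only the trailing ---n marker and the limit of four levels
come from the description above.

```python
import re

# A heading line is any text ending in ---n, where n is the heading level.
HEADING = re.compile(r"^(?P<title>.*?)\s*---(?P<level>[1-4])\s*$")


def parse_heading(line):
    """Return (level, title) for a heading line, or None for anything else."""
    match = HEADING.match(line)
    if match is None:
        return None
    return int(match.group("level")), match.group("title").strip()


print(parse_heading("Calibration Procedure ---2"))
print(parse_heading("Plain running text"))
```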

We use a perl program to scan the file and determine what objects are
present. Figure titles are identified by the following string, appearing on
a line by itself:

			Figure n-n. arbitrary text title

When our coding program comes across that string, there is only one possible
generic code to send to the output file: <figure>. We are toying with the
idea of having the title end with a "higher level generic code" like the
heading level indicators. This would serve as a cue from writer to
gencoding program indicating the desired size of the illustration. For
example, "Figure 3-3. Arbitrary Text Title/1" might indicate a full-page
illustration, while changing the number to 2, 3, or 4 would indicate half,
third and quarter pages respectively.
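
A sketch of how such a figure-title line might be recognized, again in
Python and only as an illustration.  The pattern, the function name, and
the returned tuple are assumptions; the trailing /n size cue is the idea we
are toying with, not something in production.

```python
import re

# "Figure n-n. title" alone on a line, with an optional trailing /1../4.
FIGURE = re.compile(r"^\s*Figure (\d+-\d+)\. (.+?)(?:/([1-4]))?\s*$")

# Proposed meaning of the size cue (from the description above).
SIZES = {1: "full page", 2: "half page", 3: "third page", 4: "quarter page"}


def parse_figure(line):
    """Return (number, title, size) for a figure title line, else None."""
    match = FIGURE.match(line)
    if match is None:
        return None
    number, title, size = match.groups()
    return number, title, SIZES[int(size)] if size else None


print(parse_figure("\t\t\tFigure 3-3. Arbitrary Text Title/1"))
print(parse_figure("\t\t\tFigure 3-4. Another Title"))
print(parse_figure("See Figure 3-3. for details"))
```

Because the pattern is anchored at the start of the line, a mention of a
figure in running text is not mistaken for a title.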

Lists are indented objects beginning with a number or letter, followed by a
dot. When the program is confronted with a list environment, it compares the
current indent to the former one and the result determines when to send the
<end> tag for proper nesting. For bullet lists, we use the letter o with no
following dot.
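
The indent comparison can be sketched as a small stack of open lists.  This
is a simplified Python illustration, not our perl: the <list> and <item>
codes are hypothetical (only <end> is mentioned above), and it assumes
every list item fits on one line, so wrapped item text would need extra
handling.

```python
import re

# Indented line starting with "1.", "a.", or a bare "o" bullet.
ITEM = re.compile(r"^(?P<indent>\s+)(?P<marker>\d+\.|[A-Za-z]\.|o)\s+(?P<text>.*)$")


def code_lists(lines):
    """Turn indented list lines into generic codes, nesting by indent."""
    out = []
    stack = []  # indent widths of the lists currently open
    for line in lines:
        match = ITEM.match(line)
        indent = len(match.group("indent")) if match else 0
        # A lesser indent (or a non-list line) closes the deeper lists.
        while stack and (match is None or indent < stack[-1]):
            stack.pop()
            out.append("<end>")
        if match is None:
            out.append(line)
            continue
        # A greater indent than the current list opens a nested list.
        if not stack or indent > stack[-1]:
            stack.append(indent)
            out.append("<list>")
        out.append("<item>" + match.group("text"))
    out.extend("<end>" for _ in stack)
    return out


sample = ["  1. First", "  2. Second", "     a. Nested", "  3. Third", "Running text"]
for coded in code_lists(sample):
    print(coded)
```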

As each line is processed, a subroutine scans it for any "special
characters" and sends the required string to the gencode file. We like +/-
to appear as a plus sign above a minus. Regular expressions look for degrees
symbols and Greek letters like mu and omega among others. For example, the
string 9oF means 9 degrees F, while 13 uF means 13 microFarads.
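
As an illustration of that subroutine, here is a Python sketch using a
table of patterns.  The patterns are guesses at rules of this kind, and the
replacement strings are Unicode characters standing in for whatever codes
the typesetter actually needs; neither is taken from our program.

```python
import re

# (pattern, replacement) pairs, applied in order to every line.
SUBSTITUTIONS = [
    (re.compile(r"\+/-"), "\u00b1"),                  # +/-   -> plus-minus sign
    (re.compile(r"(?<=\d)o(?=[FC]\b)"), "\u00b0"),    # 9oF   -> degree sign
    (re.compile(r"(?<=\d )u(?=[A-Z]\b)"), "\u00b5"),  # 13 uF -> micro sign
]


def code_specials(line):
    """Replace typed stand-ins for special characters in one line."""
    for pattern, replacement in SUBSTITUTIONS:
        line = pattern.sub(replacement, line)
    return line


print(code_specials("Accuracy is +/-2% at 9oF with a 13 uF capacitor"))
```

The lookbehind and lookahead keep an ordinary "o" or "u" inside a word from
being touched; only a letter sitting between a number and a unit qualifies.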

A major concern has been that reviewers should not be asked to try to make
their way through a text loaded with coding. We've found that we get higher
quality review remarks when the review copy looks similar to the expected
final page. Which is why we have pre-printout filters that convert lines
ending ---n to boldface, and if we do incorporate the "Figure Title/n" idea
we will probably not print the code even in review copies, instead
converting the number to line or form feeds.

Our perl program currently recognizes and generates generic codes for:

    * Section headings

    * Notes, Cautions, and Warnings

    * Textual headings up to 4th order (we tell writers if they need to go
      any higher than 4th order headings, they are probably writing funny).

    * Alpha, numeric, and two types of bullet lists at 4 indent levels

    * Figure and Table Titles

...and of course, everything else is just running text :-)

Many of our manuals need special treatment for a variety of things --
special fonts, in-text keycap art, special formats, so we by no means have
technical publication figured out down to a non-event, but we are getting there!
Generic coding and implied markup are powerful approaches to the traditional
problems in publishing (especially publishing of structured documents as
opposed to books, magazines, and so on).

As I asked before, I'd be very interested in hearing from others who are
using similar methods. Or other perl users! We had our first program written
for us about 2 1/2 years ago, and it is still cranking along, even through
two dozen patch levels.

Gary Benson
Supervisor, Publication Services
John Fluke Mfg. Co. Inc.

-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Go jump in a goddam volcano, you fucking cave newt!   -greg Nowak