dms@aix03.aix.rpi.edu (david m schwartz) (11/18/90)
Hello -- I became aware of SGML this fall. The folks who market SGML had an exhibition booth at EP '90 (the Electronic Publishing Conference, held this year at the National Institute of Standards and Technology in Maryland). I am not working with SGML yet, but I may be in the future. I would appreciate it if someone could take the time to respond to the following questions:

1. SGML has been described as a context-free markup. In a recent posting I think I read someone referring to TeX as a context-specific markup. Could someone enlarge on this thread for me?

2. LaTeX (I believe I also read in a recent article) is a more suitable SGML environment than TeX. Is this because LaTeX is a macro-based markup while TeX is more a control-word type of markup? In other words, is it the difference between a markup that describes what the text looks like and one that describes which document element the text is assigned to? Would this analogy also hold for, say, SCRIPT vs. GML? GML is a forerunner of BookMaster. An intermediate step was ISIL (which never made it past internal use at IBM), but there are some very significant bloodlines here, and if someone can frame their answers to my questions around these IBM products, it will greatly add to my understanding. In fact, I read someone's comment that BookMaster is a suitable SGML environment. Unfortunately, I have never seen or gotten my hands on BookMaster. Does BookMaster have something that GML or ISIL does not?

3. I have read references to some word-processing environments being more suitable for SGML than others. Could someone enlarge on this?

4. Is it likely that we will see SGML front-ends on future word processors? Or will document authors need to apply the markup directly themselves?

5. As I currently understand it, SGML is a two-step process:

   a) define a structure for the document
   b) write the document
   c) run the document through the SGML software (?) and see if the document adheres to the structure.
Three, three steps! As I understand it, SGML is a three-step process. No one expects the Spanish Inquisition! Shades of Monty Python :)

6. I assume it is OK for SGML documents to be broken into chapter files?

Thank you very much in advance to any and all kind persons who respond to this post. I hope that these questions make some sense.

FYI, my interest in all this comes from the fact that I perform technical writing services for a company that develops a text-retrieval program. The growing popularity of SGML is making the developers scurry to permit not only the input of an SGML document, but also its retrieval, by letting the user specify search parameters that are unique to the hierarchical structure of an SGML document.
lark@tivoli.UUCP (Lar Kaufman) (11/20/90)
In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
>Hello -- I became aware of SGML this fall. The folks who market SGML had an
>exhibition booth at EP '90 (Electronic Publishing Conference held this year at
> ...

SGML is not a proprietary product. It is an ISO standard.

>
>1. SGML has been described as a context-free markup. In a recent posting I
>think I read someone refering to TeX as a context-specific markup. Could
>someone enlarge on this thread for me?

That would be a big task. Continue.

>
>2. Latex (I believe I also read in a recent article) is a more suitable
>SGML environment than is TeX. Is this because Latex is a macro-based
>markup while Tex is more a control-word type markup. In other words, a markup
>that describes what the text looks like vs. what document element the text is
>being assigned to? Would this analagy hold also, for say, SCRIPT vs. GML?
>GML is a fore-runner of Bookmaster. An intermediate step was ISIL (which never
>did make it past internal use at IBM) but there are some very significant
>bloodlines here and if someone can frame their answers to my questions around
>these IBM products, it will greatly add to my understanding. In fact, I
>read someone's comment that Bookmaster is a suitable SGML environment.

I can oblige you here. SCRIPT is conceptually similar to troff, TeX, Scribe, and the like. GML is a structured language that is interpreted and converted to SCRIPT, which in turn is converted to device-driver instructions for printing. ISIL is just an extended set of GML (doesn't ISIL mean IBM Structured Information Language?), and BookMaster is a more commercialized packaging of ISIL. BookMaster is complemented by a package called BookManager, which is intended to provide formatting and presentation of online documentation.

GML was written by Charles Goldfarb. ISIL and BookMaster were extensions created for IBM's own, not always apparent, reasons (but you can be sure that money is a factor).
Unfortunately, ISIL and BookMaster/Manager took directions that led them astray from the "purity" of GML and away from the direction pointed to by SGML. (SGML was also a Charles Goldfarb conception.) However, IBM (or elements within IBM) have come to recognize that SGML is A Good Thing. Therefore, IBM's latest version of BookMaster/Manager can parse SGML as well as its ISIL-flavored code. That is, if I understand correctly, BookMaster 3.0 can parse SGML.

My disclaimer: I have contracted to IBM in the past and used these products, but I am not privy to their plans or reasoning. I have communicated with Charles Goldfarb, but only to seek information about SGML, not about his work for and with IBM.

>
>4. Is it likely that we will see SGML front-ends on future word-processors?

But of course. Sort of. Word-processing and typesetting packages will happily convert SGML into their proprietary formats. A few farsighted companies will provide the reverse conversion (a more difficult task). A few clever companies will use SGML as their native format.

>
>5. As I currently understand it, SGML is a two-step process:
>
>   a) define a structure for the document
>   b) write the document
>   c) run the document through the SGML software (?) and see if the
>      document adheres to the structure.
>
>three, three steps, as I understand it, SGML is a three step process
>no one expects the Spanish Inquisition, shades of Monty Python :)

Actually, most users will not mess with structure definitions. They will use predeveloped Document Type Definitions (DTDs); formatting gurus will do the document-definition work. The point is, the average writer shouldn't have to (or be allowed to) mess with the document format.

>6. I assume it is OK for SGML documents to be broken into chapter files?

Yeah.

>... I hope that these questions make some sense.

They do, but they suggest that you might want to "read up" on SGML.
>
>FYI, my interest in all this comes from the fact that I perform technical
>writing services for a company that develops a text-retrieval program. The
>growing popularity of SGML is making the developers scurry to permit not only
>the input of an SGML document, but the retrieval of said document by
>permitting the user to specify search parameters that are unique to the
>hierarchical structure of an SGML document.

Yessss, any company doing text retrieval must be very interested in SGML. I'm not sure I think of an SGML document as necessarily being hierarchical, though... I would suggest further reading.

Unfortunately, the publishing software companies have been very slow to move toward SGML, in spite of early endorsement by the Association of American Publishers. The real impetus is the U.S. Government's CALS initiative, and the potential for amazing online documentation capabilities (including text retrieval).

I have "SGML: An Author's Guide" by Martin Bryan (Addison-Wesley). It is an OK starting place, albeit of narrow focus. I am waiting for "The SGML Handbook" by Charles Goldfarb, which I had expected to be out by now... maybe I'd better start hounding my favorite bookstore. The SGML Handbook should incorporate the ISO 8879 specification, so you can safely skip getting that document. If you are involved with IBM products, you can examine "Solutions for CALS Technical Publishing" (IBM document GC34-5153).

I cannot help further, because I am still in the early stages of research myself. (I need to spend a few days in a good library, going through the periodical literature and so on.) Sorry for chopping up your posting and ignoring part of it, but time flies...

-lar
--
---------
TIVOLI Systems, Inc.        Lar Kaufman
512-454-3301                (voice) 512-329-2455
4503 Sinclair Avenue        (fax)   512-329-2755
Austin, Texas 78756 USA     (e)     lark@tivoli.com
garyp@csg.uwaterloo.ca (Gary Pianosi) (11/23/90)
In article <200@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:
>In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
...
>>6. I assume it is OK for SGML documents to be broken into chapter files?
>
>Yeah.
>

Could someone please elaborate on this? The only way I know of doing this is to define a SYSTEM entity in my main document, say:

    <!ENTITY chap1 SYSTEM "mychap1.sgml">

and then include the entity reference, "&chap1;", where I want the chapter inserted. Is there another way?

My problem with this method is that if I want to break my chapters into sections and my sections into subsections, I must define all of the appropriate system entities in my main document. Hence my main document must "know" about all of the pieces that comprise it. I feel that I lose some of the modularity of my document by doing this.

--
-- Internet: garyp@csg.UWaterloo.CA   Bitnet: garyp@watcsg.BITNET
Computer Systems Group, University of Waterloo, Waterloo, Ontario, Canada
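Gary's complaint is easier to see with a toy model of what a parser does with such declarations. The Python sketch below is not an SGML parser (it ignores the DOCTYPE declaration, parameter entities, and nearly everything else in ISO 8879); it only replaces each declared SYSTEM entity reference with the text of its file, recursively. The function name `expand_system_entities` and the `read_file` callback are hypothetical. Because the entity table is built from the main document alone, a reference such as &sec1a; inside a chapter file resolves only if the main document declares it, which is exactly the loss of modularity Gary describes.

```python
import re

# <!ENTITY name SYSTEM "file"> declarations (other declaration forms ignored).
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"([^"]*)"\s*>')
# General entity references such as &chap1;
ENTITY_REF = re.compile(r"&(\w+);")


def expand_system_entities(main_text, read_file):
    """Replace each &name; with the text of the file declared for it.

    read_file maps a system identifier (e.g. "mychap1.sgml") to its text.
    Only declarations found in main_text are used, so every piece, however
    deeply nested, has to be declared in the main document.
    """
    table = {}
    for name, system_id in ENTITY_DECL.findall(main_text):
        table.setdefault(name, system_id)  # the first declaration is binding
    body = ENTITY_DECL.sub("", main_text)

    def expand(text, active):
        def replace(match):
            name = match.group(1)
            if name not in table:
                return match.group(0)  # e.g. &lt; is left untouched
            if name in active:
                raise ValueError(f"entity {name!r} refers to itself")
            return expand(read_file(table[name]), active | {name})

        return ENTITY_REF.sub(replace, text)

    return expand(body, frozenset())


# Usage: an in-memory dict stands in for the file system.
files = {
    "mychap1.sgml": "<chapter>Intro &sec1a;</chapter>",
    "mysec1a.sgml": "<section>Details</section>",
}
main = (
    '<!ENTITY chap1 SYSTEM "mychap1.sgml">\n'
    '<!ENTITY sec1a SYSTEM "mysec1a.sgml">\n'
    "<book>&chap1;</book>"
)
print(expand_system_entities(main, files.__getitem__).strip())
# prints <book><chapter>Intro <section>Details</section></chapter></book>
```

Removing the sec1a declaration from the main document leaves &sec1a; unexpanded in the chapter, even though the chapter file itself is unchanged.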
inc@tc.fluke.COM (Gary Benson) (11/28/90)
In article <200@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:
>In article <P!+^?5^@rpi.edu> dms@aix03.aix.rpi.edu (david m schwartz) writes:
>>Hello -- I became aware of SGML this fall. The folks at an exhibition booth
>> at EP '90 (Electronic Publishing Conference held this year at .......

Then Lar spends a great deal of time and energy in a terrific posting that really lays out many of the basics of SGML. I nominate his posting for inclusion in a Frequently Asked Questions file for this newsgroup. Many other newsgroups are doing that, and I think this is one that could benefit greatly from it. In the groups using FAQ files, one person volunteers to maintain the file, others submit questions and/or answers for it, and the file, in whatever state it is in, is posted to the newsgroup once a month. I think it is a great idea, particularly for groups like this one that seem to have a large number of people looking for the basic information needed to pursue their interests further. I would volunteer as FAQ maintainer, except that my knowledge of SGML is also pretty rudimentary. Any takers?

My second topic concerns the concept of "implied markup". In my work in Fluke's technical publications department, we have used this concept for years, and I wonder if there are others who use it to the extent we do. Is there much interest in the field? Any good books?

For those to whom implied markup is unfamiliar: the idea is that rather than using ANY kind of "coding", the writer submits a manuscript in a form that implies its format. For example, when the word "Figure", followed by a number in the form "n-n", appears on a line by itself, indented from the margin, the implication is that this is a figure title: a change of font size and face is called for, and some amount of space must now be reserved to hold the figure that the title implies.
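Gary doesn't post his perl code, so the following is only a minimal Python sketch of the figure-title rule as he states it. It assumes the period after the number that appears in his later "Figure n-n. arbitrary text title" example; the `num` attribute and the function name are invented for illustration.

```python
import re

# Assumed rule: an indented line holding "Figure n-n." plus an optional
# title is a figure title. Requiring the period keeps ordinary sentences
# such as "Figure 3-3 shows..." from being mistaken for titles.
FIGURE_TITLE = re.compile(r"^[ \t]+Figure[ \t]+(\d+-\d+)\.[ \t]*(.*?)[ \t]*$")


def tag_figure_titles(lines):
    """Turn implied figure titles into a generic <figure> code.

    The num attribute and output layout are hypothetical; any line that
    does not fit the pattern passes through unchanged.
    """
    out = []
    for line in lines:
        match = FIGURE_TITLE.match(line)
        if match:
            number, title = match.groups()
            out.append(f'<figure num="{number}">{title}')
        else:
            out.append(line)
    return out


manuscript = [
    "Connect the probe as shown.",
    "    Figure 3-3. Probe Connections",
    "    Figure 3-3 shows the probe connections.",
]
for line in tag_figure_titles(manuscript):
    print(line)
```

Only the second line is recognized: the first is running text, and the third lacks the period after the figure number.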
I have been fascinated to learn how MUCH of this sort of information is available, even in text produced by people who do not know the implied-markup "rules". For example, if you see a group of consecutive paragraphs at the same indent, each preceded by a number, you can be pretty sure you are looking at a numbered list. If the indent gets larger and the paragraphs are now marked in consecutive alphabetical order, then you are looking at an alphabetical list nested in the numbered one; a return to a smaller indent implies the end of the nesting.

I suppose I should make clear that we are dealing with so-called "structured documents" here, not free-form ads, novels, magazines, what have you. The world of publishing has LOTS of things that I am sure would not fit into the scheme I have outlined, but for the kinds of technical documents we publish, it seems to solve many of our nagging publication problems. Sometimes I wonder if we are the only people actively pursuing this technique.

We now use a computer program written in perl to glean this structural information and convert it into generic codes. This is how we separate form from content, and I wonder how others are doing the same. Are there other methods of keeping the writing and formatting functions separate without requiring every writer to learn the local dialect of SGML or some other generic coding?

--
Gary Benson -=[ S M I L E R ]=- -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-
The first rule of intelligent tinkering is to save all the parts. -Paul Ehrlich
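The nesting inference Gary describes (same indent means same list, a larger indent with letters means a nested list, a smaller indent ends the nesting) can be sketched with a stack of open lists. This Python version is an illustration under stated assumptions, not Fluke's program: the <numlist>, <alphalist>, and <item> tag names are hypothetical, only a single lowercase letter counts as a letter label, and any non-list line closes all open lists.

```python
import re

# A list item: optional indent, a number or one lowercase letter, a dot,
# at least one space or tab, then the item text.
ITEM = re.compile(r"^([ \t]*)(\d+|[a-z])\.[ \t]+(.*)$")

# Hypothetical start tags; the single <end> tag follows Gary's description.
START_TAG = {"numbered": "<numlist>", "alpha": "<alphalist>"}


def tag_lists(lines):
    """Infer list nesting from indentation and label type."""
    out = []
    stack = []  # open lists as (indent, kind), innermost last
    for line in lines:
        if not line.strip():
            out.append(line)  # blank lines leave the list state alone
            continue
        match = ITEM.match(line)
        if match is None:
            # Running text: close every open list before emitting it.
            out.extend("<end>" for _ in stack)
            stack.clear()
            out.append(line)
            continue
        indent = len(match.group(1).expandtabs(8))
        kind = "numbered" if match.group(2).isdigit() else "alpha"
        # A smaller indent than the innermost list ends that nesting level.
        while stack and stack[-1][0] > indent:
            out.append("<end>")
            stack.pop()
        # Same indent but a different label type: close it, start a new list.
        if stack and stack[-1][0] == indent and stack[-1][1] != kind:
            out.append("<end>")
            stack.pop()
        # No open list, or a larger indent: start a (nested) list.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, kind))
            out.append(START_TAG[kind])
        out.append(f"<item>{match.group(3)}")
    out.extend("<end>" for _ in stack)  # close whatever is still open
    return out


manual = [
    "1. Remove the cover.",
    "2. Check the fuses:",
    "   a. F1 (line fuse)",
    "   b. F2 (current input)",
    "3. Replace the cover.",
    "The instrument is now ready for use.",
]
print("\n".join(tag_lists(manual)))
```

Keeping the open lists on a stack means a single smaller indent can close several levels at once, emitting one <end> per level.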
lark@tivoli.UUCP (Lar Kaufman) (11/29/90)
In article <1990Nov28.105230.10365@tc.fluke.COM> inc@tc.fluke.COM (Gary Benson) writes:
>
>Then Lar spends a great deal of time and energy in a terrific posting that
>really lays out many of the basics of SGML. I nominate his posting for
>inclusion in a Frequently Asked Questions file for this newsgroup...
>... I would volunteer for
>FAQ maintainer except that my knowledge of SGML is also pretty rudimentary.
>Any takers?

In case I have misled anyone, I hasten to admit that I am very much a student of SGML, not a master. I have _never_actually_used_ SGML in a product, so my knowledge is only theoretical. I am only now in a position to begin working with SGML concepts and proto-SGML software. I agree with Gary's proposal, and I hope someone with practical knowledge will accept the task of maintaining a FAQ file.

Gary also mentions tools written in perl. I would love to see people volunteering code and techniques for implementing SGML solutions. I know that others have written programs for converting structural information to and from SGML using various languages (such as Icon). Where are they? Has anyone considered setting up an FTP site for SGML tools?

A final comment: we should remember to distinguish between SGML, the standard, and the various software products that implement it. It can be confusing to mix these. For example, when I say that you can embed chapters in an SGML document, I do not imply any knowledge of how a specific SGML product does it (or doesn't do it).

-lar
--
---------
TIVOLI Systems, Inc.        Lar Kaufman
                            (voice) 512-329-2455
                            (fax)   512-329-2755
Austin, Texas USA           (e)     lark@tivoli.com
inc@tc.fluke.COM (Gary Benson) (12/12/90)
In article <215@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:
>In case I have misled anyone, I hasten to admit that I am very much a
>student of SGML, not a master. I have _never_actually_used_ SGML in a
>product, so my knowledge is only theoretical. I am only now in a position
>to begin working with SGML concepts and proto-SGML software. I agree with
>Gary's proposal, and I hope someone with a practical knowledge will accept
>the task of maintaining a FAQ file.
>
>Gary also mentions tools written in perl. I would love to see people
>volunteering code and techniques for implementing SGML solutions. I know
>that others have written programs for converting structural information
>to/from SGML using various languages (such as Icon). Where are they?
>Has anyone considered setting up an FTP site for SGML tools?
>
>A final comment: we should remember to distinguish between SGML, the
>standard, and various software products that implement it. It can be
>confusing to mix these. For example, when I say that you can imbed
>chapters in an SGML document, I do not imply any knowledge of how a
>specific SGML product does it (or doesn't do it).

Whoops, I didn't mean to set you up for guru-hood, Lar! It's just that your posting was well written and informative without being esoteric to the point of meaninglessness. I hope this newsgroup can be a place for a wide-spectrum discussion of SGML, but so far it has seemed weighted toward theory, and I found your posting to be a refreshing breath of reality.

As to your idea about people posting code and techniques, I can say this: we have several man-years of programming in our quasi-SGML autocoding programs, and I'm sure I'd be in big trouble if I disseminated those programs. However, our techniques are rather interesting (to us, at least), and I was surprised to see no response to my query about whether others are using our techniques.
Long ago, back when we typeset all of Fluke's technical manuals, the Publications Department decided to keep the writing function as separate as possible from the production function. We defined production as encompassing page design, preparation of files for typesetting, typesetting itself, layout, and of course printing, binding, and so on. That decision has had two very interesting results:

1. While the industry as a whole has moved to "desktop publishing", we find ourselves without many peers with whom to discuss methods. We still have our staff typing in raw text, having rejected the "Mac on every desk" approach.

2. We are in an excellent position to take advantage of new software tools, because we have a lot of experience with implied-markup techniques.

In our approach, the writer's file has an absolute minimum of explicit instructions or codes. We have long used the string ---n at the end of a line to mark it as a heading, with n giving the heading level. This is basically the only "coding" our writers do in their files. Everything else is recognized by context or through regular-expression pattern matching, something that perl is extremely adept at. We use a perl program to scan the file and determine what objects are present.

Figure titles are identified by the following string, appearing on a line by itself:

    Figure n-n. arbitrary text title

When our coding program comes across that string, there is only one possible generic code to send to the output file: <figure>. We are toying with the idea of having the title end with a "higher-level generic code", like the heading-level indicators. This would serve as a cue from the writer to the gencoding program indicating the desired size of the illustration. For example, "Figure 3-3. Arbitrary Text Title/1" might indicate a full-page illustration, while changing the number to 2, 3, or 4 would indicate a half, third, or quarter page, respectively.

Lists are indented objects beginning with a number or letter, followed by a dot.
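The two writer-visible cues above, the trailing ---n heading marker and the proposed /n figure-size suffix, are simple enough to sketch with regular expressions. This Python sketch assumes n is the heading level from 1 to 4 and that the size digit comes at the very end of the title line; the <hN> tags and both function names are hypothetical.

```python
import re
from fractions import Fraction

# Assumption: "---n" at the end of a line marks a heading of level n,
# with n from 1 to 4 (writers are told to stop at 4th-order headings).
HEADING = re.compile(r"^(.*?)[ \t]*---([1-4])[ \t]*$")

# The proposed size cue: a figure title line ending in "/1" through "/4".
SIZED_FIGURE = re.compile(r"^[ \t]*(Figure[ \t]+\d+-\d+\..*?)[ \t]*/([1-4])[ \t]*$")


def tag_heading(line):
    """Return a hypothetical <hN> code for a marked heading, or None."""
    match = HEADING.match(line)
    if match is None:
        return None
    text, level = match.groups()
    return f"<h{level}>{text.strip()}"


def figure_space(line):
    """Split a sized figure title into (title, fraction of a page), or None.

    /1 is a full page and /2, /3, /4 are half, third, and quarter pages,
    so the space to reserve is 1/n of a page.
    """
    match = SIZED_FIGURE.match(line)
    if match is None:
        return None
    title, size = match.groups()
    return title, Fraction(1, int(size))


print(tag_heading("Theory of Operation---2"))
print(figure_space("Figure 3-3. Arbitrary Text Title/2"))
```

The non-greedy title match means a slash inside the title ("Input/Output") is kept; only a final /1 to /4 is taken as the size cue.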
When the program is confronted with a list environment, it compares the current indent with the previous one, and the result determines when to send the <end> tag for proper nesting. For bullet lists, we use the letter o with no following dot.

As each line is processed, a subroutine scans it for any "special characters" and sends the required string to the gencode file. We like +/- to appear as a plus sign above a minus. Regular expressions look for degree symbols and Greek letters such as mu and omega, among others. For example, the string 9oF means 9 degrees F, while 13 uF means 13 microfarads.

A major concern has been that reviewers should not be asked to make their way through text loaded with coding. We've found that we get higher-quality review remarks when the review copy looks similar to the expected final page. That is why we have pre-printout filters that convert lines ending in ---n to boldface, and if we do adopt the "Figure Title/n" idea, we will probably not print the code even in review copies, converting the number to line or form feeds instead.

Our perl program currently recognizes and generates generic codes for:

  * Section headings
  * Notes, Cautions, and Warnings
  * Textual headings up to 4th order (we tell writers that if they need to go any higher than 4th-order headings, they are probably writing funny)
  * Alpha, numeric, and two types of bullet lists at 4 indent levels
  * Figure and Table Titles

...and of course, everything else is just running text :-)

Many of our manuals need special treatment for a variety of things (special fonts, in-text keycap art, special formats), so we by no means have technical publication figured out down to a non-event, but we are getting there! Generic coding and implied markup are powerful approaches to the traditional problems in publishing (especially publishing of structured documents, as opposed to books, magazines, and so on).
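The special-character pass can be sketched as an ordered list of regular-expression substitutions applied to each line. In this Python sketch the replacement strings (&plusmn;, &deg;, &mu;) are hypothetical placeholders for whatever the gencode file actually receives, and the unit letters accepted after u are an assumption; omega is left out because the thread doesn't say how writers type it.

```python
import re

# Ordered (pattern, replacement) rules. The replacement strings are
# hypothetical entity-style placeholders, not Fluke's actual output.
SPECIAL_CHARACTERS = [
    # "+/-" becomes a single plus-over-minus character.
    (re.compile(r"\+/-"), "&plusmn;"),
    # A digit, then "o", then F or C: "9oF" means 9 degrees F.
    # (Accepting C as well as F is an assumption.)
    (re.compile(r"(?<=\d)o(?=[FC]\b)"), "&deg;"),
    # A digit, an optional space, then "u" before a unit letter:
    # "13 uF" means 13 microfarads. The unit letters are assumptions.
    (re.compile(r"(?<=\d)( ?)u(?=[FAVHs]\b)"), r"\1&mu;"),
]


def replace_special_characters(line):
    """Apply every special-character rule, in order, to one line."""
    for pattern, replacement in SPECIAL_CHARACTERS:
        line = pattern.sub(replacement, line)
    return line


print(replace_special_characters("Hold at 70oF +/-2oF and add a 13 uF capacitor."))
```

The lookbehind on a digit keeps ordinary words such as "out" or "four" from being rewritten.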
As I asked before, I'd be very interested in hearing from others who are using similar methods. Or other perl users! We had our first program written for us about 2 1/2 years ago, and it is still cranking along, even through two dozen patch levels.

Gary Benson
Supervisor, Publication Services
John Fluke Mfg. Co. Inc.
--
Gary Benson -=[ S M I L E R ]=- -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-
Go jump in a goddam volcano, you fucking cave newt! -Greg Nowak