chenevoy@loria.crin.fr (CHENEVOY) (07/07/90)
In article <EMV.90Jul7000337@urania.math.lsa.umich.edu> emv@math.lsa.umich.edu (Edward Vielmetti) writes:
>
>this is a call for discussion for the newsgroup 'comp.text.sgml', to discuss
>the ISO 'Standard Generalized Markup Language' and systems which use it.
>

I am also interested in document structures. We are developing here a
knowledge-based system for structured document recognition. The system
should deal with models of documents represented according to standards.

The problems of document recognition are not quite the same as those of
document composition, since the physical structure is more important
(because we have the layout as an input). Therefore, SGML is not
necessarily the best standard for us. We are also interested in ODA, ODL
and all aspects of document structure.

> [...]
>the name:
> comp.text.sgml. seems reasonable.

Why not comp.text.struct, or better comp.doc.struct?

Yannick Chenevoy - email: chenevoy@loria.crin.fr
emv@math.lsa.umich.edu (Edward Vielmetti) (07/11/90)
In article <2498@loria.crin.fr> chenevoy@loria.crin.fr (CHENEVOY) writes:

> I am also interested in document structures. We are developing here a
> knowledge-based system for structured document recognition. The system
> should deal with models of documents represented according to standards.

Can you give a concrete example? The picture that I am envisioning is
the problem of recapturing the information in, say, a railway timetable,
or marks on a printed form, or distinguishing amongst a set of similar
printed forms.

> The problems of document recognition are not quite the same as those of
> document composition, since the physical structure is more important
> (because we have the layout as an input). Therefore, SGML is not
> necessarily the best standard for us. We are also interested in ODA, ODL
> and all aspects of document structure.

I would hope that whatever group comes about would have a wide enough
charter that no one would be obligated to assert that SGML is the best
standard and should be applied to every problem. For whatever reason,
it appears to be the underpinning, or at least the style, in which
several packages I am familiar with are organized.

Are ODA and ODL available from standards organizations? What is special
about them that makes them more suited for your task?

> > comp.text.sgml. seems reasonable.
>
> Why not comp.text.struct, or better comp.doc.struct?

I suppose I do show a bias here -- the class of problems that interest
me, or that I see myself facing, includes texts which on the surface do
not appear to have much structure at all. Part of the challenge will be
marking them so that some information is recovered, or so that a human
going over them later with a browser can make sense of them.

SGML is a convenient enough tag-word to latch on to a proper set of
interested people; other ways of describing the group, though perhaps
more appropriate for a given task, don't seem likely to draw the proper
set of people together.
--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
comp.archives moderator
enag@ifi.uio.no (Erik Naggum) (07/14/90)
Edward Vielmetti asks:

> Are ODA and ODL available from standards organizations?

ODA is available from your favorite ISO outlet as ISO 8613 (8 parts),
and costs an incredible amount of money. That's where I foolishly
bought it before I discovered that...

ODA is also available from your favorite CCITT outlet as Blue Book
Volume VII, fascicle VII.6, Terminal equipment and protocols for
telematic services, T.400-T.418, and costs a nominal CHF 47 (or CHF 57
if you order it from CCITT yourself).

ISO 8613 and the T.400-series are supposedly identical, as per ISO
8613-1, Annex B, Relationships with other standards, B.2 Other
standards, section B.2.1, second paragraph:

| The text of ISO 8613-1 to ISO 8613-8 are identical to the texts in
| the correspondingly numbered CCITT Recommendations T.411 to T.418
| except for mandated stylistic differences and provisions of ISO 8613
| that are outside the scope of these Recommendations.

I've been walking around in the CCITT (Red, Blue) Books, and it was
quite disturbing to find that ODA is so expertly hidden as a "telematic
service". You can save a good number of dollars by not buying from
ISO. I don't know what the "provisions ... outside the scope of these
Recommendations" refers to.

ODL is defined in ISO 8613-1, 3. Definitions:

| 3.124 Office Document Language; ODL (abbreviation): A Standard
| Generalized Markup Language (SGML, ISO 8879) application for
| representing documents conforming to ISO 8613.

A German "expert group" who contributed to the ISO 9069 Technical
Report noted that SGML covers much wider applications than ODA does,
and that the entire ODA document structures can be expressed with ease
in SGML (hence ODL), while SGML documents would easily overflow the
limits of ODA. The extremely useful Marked Sections, for instance, have
no counterpart in ODA, and multiple concurrent documents don't exist in
ODA, although ODA has two concurrent document structures.

It should be noted that ODL is an SGML _application_.

(I don't have the German paper anymore, but can probably find a
reference to the book it was going to be, or is, part of, if somebody
is interested.)

> What is special about them that makes them more suited for your
> task?

I can't answer for the person you ask, but ODA has defined a set of
fixed layouts and such that makes it easier to accomplish "interworking
between conforming systems". (I love that phrase!)

Also, ODA is encoded in the ASN.1 "Basic Encoding Rules", and I'm told
that it's easier to parse than SGML, but some of those who work with
ASN.1 around here vehemently deny that ASN.1 is easy to parse, so I
don't know -- I haven't looked into it, really, but to end on a quote
of mine: "I've always liked data representations that I can read, and
oscilloscopes and hex dumps don't count."

--
[Erik Naggum]
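
As a rough illustration of the parsing question raised above, here is a
minimal Python sketch of walking the tag-length-value triples that the
ASN.1 Basic Encoding Rules produce. It is not an ODA decoder: the names
read_tlv, walk and sample are invented for this example, the sample
bytes are made up, and only single-octet tags and definite lengths are
handled.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Decode one BER tag-length-value triple starting at offset.

    Sketch only: single-octet tags and definite lengths.
    Returns (tag, value, next_offset).
    """
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-octet tags are not handled in this sketch")
    first = data[offset + 1]
    offset += 2
    if first < 0x80:
        # Short form: the octet itself is the length.
        length = first
    elif first == 0x80:
        raise ValueError("indefinite lengths are not handled in this sketch")
    else:
        # Long form: low 7 bits give the number of length octets that follow.
        count = first & 0x7F
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return tag, value, offset + length


def walk(data: bytes, depth: int = 0):
    """List (depth, tag, length) for every element, recursing into
    constructed elements (those with bit 0x20 set in the tag)."""
    found = []
    offset = 0
    while offset < len(data):
        tag, value, offset = read_tlv(data, offset)
        found.append((depth, tag, len(value)))
        if tag & 0x20:
            found.extend(walk(value, depth + 1))
    return found


# Hypothetical sample: SEQUENCE { INTEGER 5, IA5String "ODA" }
sample = bytes([0x30, 0x08, 0x02, 0x01, 0x05, 0x16, 0x03]) + b"ODA"

for depth, tag, length in walk(sample):
    print("  " * depth + f"tag=0x{tag:02x} length={length}")
```

The framing is regular enough that a short loop can walk it, which is
presumably what the "easier to parse" claim refers to. The input is
still a run of octets that nobody reads without a tool, and a real
decoder also has to cope with multi-octet tags, indefinite lengths and
the ASN.1 type definitions themselves.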
enag@ifi.uio.no (Erik Naggum) (07/14/90)
In article <ENAG.90Jul13203710@slembe.uio.no> I wrote:
> A German "expert group" who contributed to the ISO 9069 Technical
> Report noted that SGML covers much wider applications than ODA does,
That was supposed to be ISO TR 9573. Number numbness set in again.
Now, let's see what this new C-news software does with this article.
--
[Erik Naggum]