[news.groups] call for discussion: comp.text.sgml

chenevoy@loria.crin.fr (CHENEVOY) (07/07/90)

In article <EMV.90Jul7000337@urania.math.lsa.umich.edu> emv@math.lsa.umich.edu (Edward Vielmetti) writes:
>
>this is a call for discussion for the newsgroup 'comp.text.sgml', to discuss
>the ISO 'Standard Generalized Markup Language' and systems which use it.
>
I am also interested in document structures. We are developing here a
knowledge-based system for structured document recognition. The system should
deal with models of documents represented according to standards. The problems
of document recognition are not quite the same as those of document
composition: the physical structure is more important, because we have the
layout as input. SGML is therefore not necessarily the best standard for
us. We are also interested in ODA, ODL, and all aspects of document
structure.

> [...]
>the name: 
>	comp.text.sgml.  seems reasonable.

Why not comp.text.struct, or better, comp.doc.struct?

Yannick Chenevoy -
email: chenevoy@loria.crin.fr

emv@math.lsa.umich.edu (Edward Vielmetti) (07/11/90)

In article <2498@loria.crin.fr> chenevoy@loria.crin.fr (CHENEVOY) writes:

   I am also interested in document structures. We are developing here a
   knowledge-based system for structured document recognition. The system should
   deal with models of documents represented according to standards.

Can you give a concrete example?  The picture that I am envisioning
is the problem of recapturing the information in, say, a railway
timetable, or marks on a printed form, or distinguishing amongst a set
of similar printed forms.  

   The problems
   of document recognition are not quite the same as those of document
   composition: the physical structure is more important, because we have the
   layout as input. SGML is therefore not necessarily the best standard for
   us. We are also interested in ODA, ODL, and all aspects of document
   structure.

I would hope that whatever group would come about would have a wide
enough charter that no one would be obligated to assert that SGML was
the best standard and should be applied to every problem.  For
whatever reason, it appears to be the underpinning or at least the
style in which several packages I am familiar with are organized.

Are ODA and ODL available from standards organizations?  What is
special about them that makes them more suited for your task?

   >	comp.text.sgml.  seems reasonable.

   Why not comp.text.struct, or better, comp.doc.struct?

I suppose I do show a bias here -- the class of problems that interest
me, or that I see myself facing, include texts which on the surface do
not appear to have much structure at all.  Part of the challenge will
be marking them so that there is some information recovered, or so
that a human going over them later with a browser can make sense of
them.  SGML is a convenient enough tag-word to latch on to a proper
set of interested people; other ways of describing the group, though
perhaps more appropriate for a given task, don't seem to be likely to
draw the proper set of people together.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
comp.archives moderator

enag@ifi.uio.no (Erik Naggum) (07/14/90)

Edward Vielmetti asks:
> Are ODA and ODL available from standards organizations?

ODA is available from your favorite ISO outlet as ISO 8613 (8 parts),
and costs an incredible amount of money.  That's where I foolishly
bought it before I discovered that...

ODA is also available from your favorite CCITT outlet as Blue Book
Volume VII, fascicle VII.6, Terminal equipment and protocols for
telematic services, T.400-T.418, and costs a nominal CHF 47 (or CHF 57
if you order it from CCITT yourself).  ISO 8613 and the T.400-series
are supposedly identical, as per ISO 8613-1, Annex B, Relationships
with other standards, B.2 Other standards, section B.2.1, second
paragraph:

| The text of ISO 8613-1 to ISO 8613-8 are identical to the texts in
| the correspondingly numbered CCITT Recommendations T.411 to T.418
| except for mandated stylistic differences and provisions of ISO 8613
| that are outside the scope of these Recommendations.

I've been walking around in the CCITT (Red, Blue) Books, and it was
quite disturbing to find that ODA's so expertly hidden as a "telematic
service".  You can save a good number of dollars by not buying from
ISO.  I don't know anything about what the "provisions ... outside the
scope of these Recommendations" refer to.

ODL is defined in ISO 8613-1, 3. Definitions:
| 3.124 Office Document Language; ODL (abbreviation): A Standard
| Generalized Markup Language (SGML, ISO 8879) application for
| representing documents conforming to ISO 8613.

A German "expert group" who contributed to the ISO 9069 Technical
Report noted that SGML covers much wider applications than ODA does,
and that the entire ODA document structure can be expressed with ease
in SGML (hence ODL), while SGML documents would easily overflow the
limits of ODA.  The extremely useful Marked Sections, for instance,
have no counterpart in ODA, and multiple concurrent documents don't
exist in ODA, although ODA has two concurrent document structures.
It should be noted that ODL is an SGML _application_.  (I don't have
the German paper anymore, but can probably find a reference to the
book it was going to be, or is, part of, if somebody is interested.)

> What is special about them that makes them more suited for your
> task?

I can't answer for the person you ask, but ODA has defined a set of
fixed layouts and such that makes it easier to accomplish "interworking
between conforming systems".  (I love that phrase!)  Also, ODA
is encoded in the ASN.1 "Basic Encoding Rules", and I'm told that it's
easier to parse than SGML, but some of those who work with ASN.1
around here vehemently deny that ASN.1 is easy to parse, so I don't
know -- I haven't looked into it, really, but to end on a quote of
mine:

"I've always liked data representations that I can read, and
oscilloscopes and hex dumps don't count."
--
[Erik Naggum]

enag@ifi.uio.no (Erik Naggum) (07/14/90)

In article <ENAG.90Jul13203710@slembe.uio.no> I wrote:
   A German "expert group" who contributed to the ISO 9069 Technical
   Report noted that SGML covers much wider applications than ODA does,

That was supposed to be ISO TR 9573.  Number numbness set in again.

Now, let's see what this new C-news software does with this article.
--
[Erik Naggum]