[comp.text.sgml] SGML question

emv@math.lsa.umich.edu (Edward Vielmetti) (09/07/90)

In article <141829@sun.Eng.Sun.COM> tut@cairo.Sun.COM (Bill "Bill" Tuthill) writes:
   Is there a newsgroup comp.text.sgml somewhere?  Not where I work.

comp.text.sgml was created just a few minutes ago; you should see
it soon.  This should be the first message in it.

I hope to come up with a more reasonable introduction to the group in
the next few weeks.  In the interim, if you're on the Internet, the
site "sgml.math.lsa.umich.edu" has a directory /pub/sgml with some
stuff in it.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>

allan@rind.cs.cornell.edu (James Allan) (09/11/90)

Well, as long as this newsgroup exists, I suppose I might as well post a
question.  I'm doing a little bit of work with an encyclopedia which was
given to us for research work in SGML format.  There are a couple of
things in the text which I don't have any documentation for and am
not totally sure of.

Some articles in this encyclopedia contain cross references to other articles
and they use some SGML syntax to do that.  Are those commands part of
SGML or extensions?  How can I find out what the "base" SGML is (if
it exists)?

Thanks for any help.

			-- james allan
			   allan@cs.cornell.edu

ath@prosys.se (Anders Thulin) (09/11/90)

In article <45617@cornell.UUCP> allan@rind.cs.cornell.edu (James Allan) writes:

>Some articles in this encyclopedia contain cross references to other articles
>and they use some SGML syntax to do that.  Are those commands part of
>SGML or extensions?  How can I find out what the "base" SGML is (if
>it exists)?

'Base' SGML is defined by ISO 8879, and isn't by itself very
useful for text markup. It doesn't contain any markup commands - those
are provided by separate document type definitions (DTDs).

The DTD used by your encyclopedia should be declared by a <!DOCTYPE
name ...> clause close to the beginning of the file. The file
specified by this clause is where you're likely to find the definition
of those cross reference commands.
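
As a rough illustration of locating that declaration, here is a minimal
Python sketch, assuming the declaration uses the SGML keyword DOCTYPE
with a quoted SYSTEM or PUBLIC identifier. The sample text and the names
"encyc" and "encyc.dtd" are made up for the example; the real
encyclopedia may declare its DTD differently (for instance with an
internal subset in [...], which this sketch does not handle).

```python
import re

# Matches <!DOCTYPE name SYSTEM "file"> or <!DOCTYPE name PUBLIC "pubid" "file">.
# Assumption: identifiers are double-quoted and there is no internal subset.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>\[]+)'
    r'(?:\s+SYSTEM\s+"(?P<system>[^"]*)"'
    r'|\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<pubsys>[^"]*)")?)?',
    re.IGNORECASE,
)


def find_doctype(text):
    """Return (document type name, external identifier or None), or None."""
    m = DOCTYPE_RE.search(text)
    if m is None:
        return None
    # Prefer a system identifier (a file name) over a public identifier.
    ident = m.group("system") or m.group("pubsys") or m.group("public")
    return m.group("name"), ident


# Hypothetical sample; "encyc" and "encyc.dtd" are invented names.
sample = '<!DOCTYPE encyc SYSTEM "encyc.dtd">\n<fw.art id=AARE>...</fw.art>\n'
print(find_doctype(sample))
```

If the function returns None, the file probably has no declaration at
the top and the DTD would have to come from the publisher.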

Hope this is of some help,

-- 
Anders Thulin       ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

scjones@thor.UUCP (Larry Jones) (09/11/90)

In article <45617@cornell.UUCP>, allan@rind.cs.cornell.edu (James Allan) writes:
> 
> Well, as long as this newsgroup exists, I suppose I might as well post a
> question.  I'm doing a little bit of work with an encyclopedia which was
> given to us for research work in SGML format.  There are a couple of
> things in the text which I don't have any documentation for and am
> not totally sure of.
> 
> Some articles in this encyclopedia contain cross references to other articles
> and they use some SGML syntax to do that.  Are those commands part of
> SGML or extensions?  How can I find out what the "base" SGML is (if
> it exists)?

SGML is a syntactic standard, not a semantic standard.  That is, it
describes what the markup should look like, but not what it means.
If you want to know about SGML, it is ISO standard 8879 which is
available from ANSI, 1430 Broadway, New York, NY  10018
212-354-3300.  If you want to know about the markup in your
encyclopedia, you need to ask the author or publisher.
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
Ever notice how tense grown-ups get when they're recreating? -- Calvin

emv@math.lsa.umich.edu (Edward Vielmetti) (09/12/90)

   In article <45617@cornell.UUCP> allan@rind.cs.cornell.edu (James Allan) writes:

   >Some articles in this encyclopedia contain cross references to
   >other articles and they use some SGML syntax to do that.  Are
   >those commands part of SGML or extensions?  How can I find out
   >what the "base" SGML is (if it exists)?

James, 

It would be interesting to see an excerpt from this encyclopedia,
to show the format of what they are using for cross references.

I can't imagine anyone putting byte offsets into the file as cross
references; perhaps they are (page no, line no) offsets, but that
would be problematic for electronic text.  Better would be some sort
of unambiguous reference to where the reference is (however you would
say "paragraph 3 of the 'Publishing' reference in the Macropedia").

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
moderator, comp.archives

bzs@world.std.com (Barry Shein) (09/12/90)

From: emv@math.lsa.umich.edu (Edward Vielmetti)
>I can't imagine anyone putting byte offsets into the file as cross
>references; perhaps they are (page no, line no) offsets, but that
>would be problematic for electronic text.  Better would be some sort
>of unambiguous reference to where the reference is (however you would
>say "paragraph 3 of the 'Publishing' reference in the Macropedia").

Actually, it's less unimaginable than you might think. Remember that
texts are a lot less changeable than other types of documents you
might be accustomed to. I doubt anyone is editing the Bible, though (and
if they do something like that then they had better make a *copy* and call
it something else!)

Better would probably be to insert unique-reference tags into the text
and then compile a look-aside table of byte offsets for quick
reference which can be rebuilt easily if needed.

Meaning, you'd insert things like this: <refid 123347 "fnords"> into
the text (the number just being an increasing integer to ensure
uniqueness) and then compile a table (with a program) that might have
entries like:

	ref       as appears in text         filename     byte-offset

	fnords    <refid 123347 "fnords">    foo.txt.1    2366
	
Something like that.
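
A minimal Python sketch of the table-building program, assuming the tag
shape from the example above (<refid NUMBER "LABEL">). The function name
build_offset_table and the in-memory sample are hypothetical; a real
version would read the files from disk in binary mode so the offsets
stay true byte offsets.

```python
import re

# Tag shape assumed from the example: <refid NUMBER "LABEL">
REFID_RE = re.compile(rb'<refid\s+(\d+)\s+"([^"]*)">')


def build_offset_table(files):
    """Map each refid number to (label, filename, byte offset of the tag)."""
    table = {}
    for filename, data in files.items():
        for m in REFID_RE.finditer(data):
            number = int(m.group(1))
            label = m.group(2).decode("ascii", "replace")
            # m.start() is the byte offset because we search bytes, not str.
            table[number] = (label, filename, m.start())
    return table


# Hypothetical file contents keyed by file name.
files = {"foo.txt.1": b'Some text. <refid 123347 "fnords"> More text.\n'}
table = build_offset_table(files)
print(table)
```

Since the table is derived entirely from the tags, it can be thrown away
and rebuilt whenever the text changes.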
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

allan@rind.cs.cornell.edu (James Allan) (09/12/90)

emv@math.lsa.umich.edu (Edward Vielmetti) writes:
>It would be interesting to see an excerpt from this encyclopedia,
>to show the format of what they are using for cross references.

Each article starts with "<fw.art id=XXX>" where XXX is a unique identifier
for the article (the article ends with "</fw.art>").  The cross references
then look like 

	<xr type=see syntax=sep><x id=AARE></x>Aare</xr>

which, I believe, means "see also Aare, which is a separate document
with id AARE".

So the format isn't as restrictive as explicit byte offsets.
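
For what it's worth, here is a rough Python sketch of pulling those
cross references out with regular expressions rather than a real SGML
parser. It assumes the attributes always appear unquoted and in the
order shown above, which may not hold across the whole encyclopedia.
The RHINE article in the sample is invented for illustration.

```python
import re

# Assumption: attributes are unquoted and appear in exactly this order.
ART_RE = re.compile(
    r'<fw\.art\s+id=(?P<id>[\w.]+)>(?P<body>.*?)</fw\.art>', re.DOTALL
)
XR_RE = re.compile(
    r'<xr\s+type=(?P<type>\w+)\s+syntax=(?P<syntax>\w+)>'
    r'<x\s+id=(?P<id>[\w.]+)></x>(?P<text>.*?)</xr>',
    re.DOTALL,
)


def cross_references(text):
    """Yield (source article id, target id, reference type, display text)."""
    for art in ART_RE.finditer(text):
        for xr in XR_RE.finditer(art.group("body")):
            yield (art.group("id"), xr.group("id"),
                   xr.group("type"), xr.group("text"))


# Hypothetical article; only the <xr> element is copied from the real data.
sample = (
    '<fw.art id=RHINE>The Rhine is fed by the '
    '<xr type=see syntax=sep><x id=AARE></x>Aare</xr>.</fw.art>'
)
print(list(cross_references(sample)))
```

Collecting the target ids this way also gives a quick check that every
reference points at an article id that actually exists.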

Thanks for the information about DTD's; now I just have to track one
down!

			-- james allan
			   allan@cs.cornell.edu