[comp.sys.atari.st] SIGNUM internal format

dcw@doc.ic.ac.uk (Duncan C White) (07/02/87)

Hello everyone,

	A friend of mine has a small problem: he has entered a 40
	page paper about Theoretical Physics [chock full of silly
	little Greek symbols, mathematical equations and such like]
	into SIGNUM (a document processing system on the Atari ST).

	Now, he has decided that Signum is not good enough for his
	purposes, and prefers TeX [on a VMS VAX].

	So, we want to extract as much of the data as possible out
	of the Signum files [ideally, text, Greek and equations, but
	just having the ASCII text would be better than nothing].

	The manual doesn't appear to say anything about the internal
	format that Signum uses.  It contains a section on importing
	plain ASCII files into Signum, but no corresponding section
	on exporting. [yes, I know that's more difficult]

	We have started by looking at a hex dump of a fairly short
	section of the paper.  After some initial blurb, which appears
	to include the names of the fonts used in the text, the rest
	of the text appears to consist of a sequence of character
	pairs: the first is some form of 'tag' character, and
	invariably has the top bit set, and the second is the actual
	character.
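
	As a rough sketch of the pair structure described above, the
	following Python splits a file body into (tag, character) pairs
	and counts how often each tag value occurs.  It assumes the
	header has already been stripped and that the pairs are aligned
	on two-byte boundaries; neither is confirmed.  The function names
	are made up for illustration, and the 80 tag in the sample data
	is invented.

```python
from collections import Counter


def split_pairs(body: bytes):
    """Split the document body into (tag, char) byte pairs.

    Assumption: the header has been removed and pairs start at offset 0.
    A trailing odd byte, if any, is ignored.
    """
    return [(body[i], body[i + 1]) for i in range(0, len(body) - 1, 2)]


def tag_histogram(pairs):
    """Count how often each tag value occurs, to help guess its meaning."""
    return Counter(tag for tag, _ in pairs)


if __name__ == "__main__":
    # Invented sample: 0xA0 and 0xA4 are tags seen in the real files,
    # 0x80 is a placeholder for "some other tag with the top bit set".
    sample = bytes([0xA0, ord("T"), 0x80, ord("h"), 0x80, ord("e"), 0xA4, ord("a")])
    for tag, count in sorted(tag_histogram(split_pairs(sample)).items()):
        print(f"{tag:02X}: {count}")
```

	A histogram like this, taken over a longer section, might show
	which tag values are common enough to be worth investigating.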

	The major problem is that spaces are not stored in the document:
	it seems that a word start is signalled by a tag value of one
	of several values: A0, A4 and 9A are three such values.
	However, we have not managed to deduce when it uses which
	particular values, or indeed what the full set of 'word start'
	tags is.
	Worse, there are some tag values [98, for example] which do
	not appear to be UNIVERSALLY 'word start' tags: some words
	are marked by this tag, but the same tag value is also present
	in the middle of other words!
	Also, for some reason, y and z are swapped, and the word 'We'
	gets rendered as "W e".
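
	The word-start heuristic above can be sketched as follows.  This
	is only a guess at the decoding: the tag set contains just the
	three values we have seen (A0, A4, 9A), the ambiguous 98 is left
	out, and the y/z swap is undone for both cases although we have
	only noticed it in lower case.  The "W e" problem is not handled.
	All names here are hypothetical.

```python
# Tags observed to start a word; the full set is unknown.
WORD_START_TAGS = frozenset({0xA0, 0xA4, 0x9A})

# Undo the observed y/z swap (upper case is an assumption).
SWAP_YZ = str.maketrans("yzYZ", "zyZY")


def decode_text(pairs, word_start_tags=WORD_START_TAGS):
    """Rebuild text from (tag, char) pairs, inserting a space before
    every character whose tag is in word_start_tags."""
    out = []
    for tag, ch in pairs:
        if tag in word_start_tags and out:
            out.append(" ")  # spaces are not stored, so synthesise one
        out.append(chr(ch))
    return "".join(out).translate(SWAP_YZ)


if __name__ == "__main__":
    # Invented sample; 0x80 stands in for a non-word-start tag.
    sample = [(0xA0, ord("m")), (0x80, ord("z")),
              (0xA4, ord("k")), (0x80, ord("e")), (0x80, ord("z"))]
    print(decode_text(sample))
```

	Passing a different word_start_tags set makes it easy to try
	adding 98, or any other candidate, and compare the output.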

	We have had no real success with the equations or Greek letters:
	however, we can probably mark their positions [or at least, the
	positions of total garbage].
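
	One possible way to mark those positions, sketched in Python:
	any run of character bytes outside printable ASCII collapses
	into a single marker.  This assumes that "garbage" shows up as
	non-printable character bytes, which may well be wrong if Greek
	letters reuse ordinary ASCII codes in a different font.  The
	marker string and function name are made up.

```python
def mark_unknown(pairs, marker="[?]"):
    """Return the text of (tag, char) pairs, replacing each run of
    non-printable character bytes with a single marker."""
    out = []
    in_run = False  # True while inside a run of unrecognised bytes
    for _tag, ch in pairs:
        if 0x20 <= ch <= 0x7E:
            out.append(chr(ch))
            in_run = False
        elif not in_run:
            out.append(marker)
            in_run = True
    return "".join(out)


if __name__ == "__main__":
    # Invented sample: two unrecognised bytes between 'a' and 'b'.
    sample = [(0x80, ord("a")), (0x80, 0x01), (0x80, 0xF3), (0x80, ord("b"))]
    print(mark_unknown(sample))
```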

	What we already have is much better than nothing, but we would
	appreciate any hints, pointers, suggestions, full-blown 'C' or
	PASCAL source code (well, it's worth a try :-) on the format
	Signum uses.   If we get enough information, we could write
	a conversion utility which will convert Signum files into ASCII
	files, probably with special 'dot commands' to represent the
	equations and Greek squiggles.  [like troff/eqn on Unix]

	The ultimate tool for this job would obviously be a Signum -> TeX
	translator.  Unfortunately, I do not know very much about TeX
	either, so I couldn't really write such a beast. [anyone else
	want a fun project, and know TeX and Signum ???]

	Please mail any suggestions etc etc to me, and I'll summarize what
	I get...

		ecnavda ni xnahT

			nacnuD

(well, everyone says I'm backward :-)

-----------------------------------------------------------------------------
JANET address : dcw@uk.ac.ic.doc| Snail Mail :  Duncan White,
--------------------------------|               Dept of Computing,
  This space intentionally      |               Imperial College,
  left blank......              |               180 Queen's Gate,
  (paradoxical excerpt from     |               South Kensington,
  IBM manuals)                  |               London SW7
----------------------------------------------------------------------------
Tel: UK 01-589-5111 x 4982/4991
----------------------------------------------------------------------------