dcw@doc.ic.ac.uk (Duncan C White) (07/02/87)
Hello everyone,

A friend of mine has a small problem: he has entered a 40 page paper about Theoretical Physics [chock full of silly little Greek symbols, mathematical equations and such like] into SIGNUM (a document processing system on the Atari ST). Now he has decided that Signum is not good enough for his purposes, and prefers TeX [on a VMS VAX].

So, we want to extract as much of the data as possible out of the Signum files [ideally text, Greek and equations, but just having the ASCII text would be better than nothing].

The manual doesn't appear to say anything about the internal format that Signum uses. It contains a section on importing plain ASCII files into Signum, but no corresponding section on exporting. [yes, I know that's more difficult]

We have started by looking at a hex dump of a fairly short section of the paper. After some initial blurb, which appears to include the names of the fonts used in the text, the rest of the file appears to consist of a sequence of character pairs: the first is some form of 'tag' character, and invariably has the top bit set, and the second is the actual character.

The major problem is that spaces are not stored in the document: it seems that a word start is signalled by a tag with one of several values: A0, A4 and 9A are three such values. However, we have not managed to deduce when it uses which particular value, or indeed what the full set of 'word start' tags is. Worse, there are some tag values [98, for example] which do not appear to be UNIVERSALLY 'word start' tags: some words are marked by this tag, but the same tag value is also present in the middle of other words!

Also, for some reason, y and z are swapped, and the word 'We' gets rendered as "W e".
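To make the guesswork above concrete, here is a minimal Python sketch of the decoding as we currently understand it. It is only an illustration under stated assumptions, not a description of Signum's real format: the header has already been stripped off by hand, the body really is a run of (tag, character) byte pairs, the only 'word start' tags are the three observed so far (A0, A4 and 9A), and y and z simply need swapping back. The names `decode_pairs`, `WORD_START_TAGS` and `SWAP_YZ` are made up for this example.

```python
# Tag values observed to start a word. This set is known to be incomplete,
# and the ambiguous tag 98 is deliberately left out.
WORD_START_TAGS = {0xA0, 0xA4, 0x9A}

# y and z come out swapped in the dump, so swap them back after decoding.
SWAP_YZ = str.maketrans("yzYZ", "zyZY")


def decode_pairs(body: bytes, word_start_tags=WORD_START_TAGS) -> str:
    """Decode a run of (tag, character) byte pairs into plain text.

    Assumes `body` starts on a pair boundary, i.e. the header with the
    font names has already been removed.
    """
    out = []
    # Walk the data two bytes at a time; a trailing odd byte is ignored.
    for i in range(0, len(body) - 1, 2):
        tag, ch = body[i], body[i + 1]
        if not tag & 0x80:
            # Every tag seen so far has the top bit set, so anything else
            # is marked as garbage rather than guessed at.
            out.append("?")
            continue
        if tag in word_start_tags and out:
            out.append(" ")  # spaces are not stored, so re-insert one
        # Non-printable bytes (Greek, equations, ...) are marked with '?'.
        out.append(chr(ch) if 0x20 <= ch < 0x7F else "?")
    return "".join(out).translate(SWAP_YZ)


# Made-up sample data; 0x80 stands in for some 'mid-word' tag value.
sample = bytes([0xA0, ord("W"), 0x80, ord("e"),
                0xA4, ord("s"), 0x80, ord("a"), 0x80, ord("z")])
print(decode_pairs(sample))  # We say
```

Passing a different `word_start_tags` set makes it easy to try out other candidate values against the hex dump and see which choice produces sensible word breaks.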
We have had no real success with the equations or Greek letters: however, we can probably mark their positions [or at least, the positions of total garbage].

What we already have is much better than nothing, but we would appreciate any hints, pointers, suggestions, full-blown 'C' or PASCAL source code (well, it's worth a try :-) on the format Signum uses.

If we get enough information, we could write a conversion utility which will convert Signum files into ASCII files, probably with special 'dot commands' to represent the equations and Greek squiggles. [like troff/eqn on Unix]

The ultimate tool for this job would obviously be a Signum -> TeX translator. Unfortunately, I do not know very much about TeX either, so I couldn't really write such a beast. [anyone else want a fun project, and know TeX and Signum ???]

Please mail any suggestions etc etc to me, and I'll summarize what I get...

        ecnavda ni xnahT
                nacnuD          (well, everyone says I'm backward :-)

----------------------------------------------------------------------------
JANET address : dcw@uk.ac.ic.doc| Snail Mail :  Duncan White,
--------------------------------|               Dept of Computing,
    This space intentionally    |               Imperial College,
        left blank......        |               180 Queen's Gate,
    (paradoxical excerpt from   |               South Kensington,
        IBM manuals)            |               London SW7
----------------------------------------------------------------------------
Tel: UK 01-589-5111 x 4982/4991
----------------------------------------------------------------------------