TeXhax@CS.WASHINGTON.EDU (TeXhax Digest) (12/27/90)
TeXhax Digest Wednesday, December 26, 1990 Volume 90 : Issue 076
Moderators: Tiina Modisett and Pierre MacKay
%%% The TeXhax digest is brought to you as a service of the TeX Users Group %%%
%%% in cooperation with the UnixTeX distribution service at the %%%
%%% University of Washington %%%
Today's Topics:
MS-DOS version of Makeindex
Electronic submission to Vancouver group of biomedical journals?
Reasons for having a new 7-bit encoding scheme
-------------------------------------------------------------------------------
Date: Sat, 8 Dec 90 11:33:18 EST
From: Hal_Varian@um.cc.umich.edu
Subject: MS-DOS version of Makeindex
Keywords: MakeIndex, MS-DOS
Does anyone have a patch to Makeindex for MS-DOS that will allow it
to handle more than 1000 or so entries? I have heard that such a
patch exists, but haven't been able to locate it. Please reply
to TeXHax, or directly to me and I'll summarize for the group.
Hal_Varian@um.cc.umich.edu
------------------------------------------------------------------------------
Date: Sun, 9 Dec 90 17:49:38 GMT
From: David_Rhead@vme.ccc.nottingham.ac.uk
Subject: Electronic submission to Vancouver group of biomedical journals?
Keywords: electronic submissions
I thought I'd better file a report on correspondence that I've had about
possible electronic submission of manuscripts to the Vancouver group of
biomedical journals. (The Vancouver group consists of about 300 journals
that have a uniform set of "instructions for authors". See either Annals
of Internal Medicine 1988;108:258-65 or British Medical Journal
1988;296:401-5.)
Stephen Lock of the British Medical Journal is one of the 2 people who
handle comments on behalf of the Vancouver group. (The other is Edward
Huth of Annals of Internal Medicine.) I sent Stephen Lock a letter that
contained the following:
When the "uniform requirements" are next revised, might it be worth giving
some consideration to how authors and publishers can take advantage of
post-typewriter technology? The essential requirement would be to separate
the author's job (specification of the text and its structure) from the
document-designer's job (typographic representation of the structure).
Software options include LaTeX and SGML. With such systems, the author
and referee could read near-typeset drafts, and the manuscript could be
transmitted electronically (without re-keying) to be typeset in the house-style
of the particular journal. The main change from the author's point-of-view
might be a different way of signifying section headings, etc. I enclose some
examples of the "instructions for authors" given by some journals that are
already moving in this direction.
His reply contained the following:
... it is quite clear that sooner or later the Vancouver style will
have to be updated to take desktop publishing into account. ... I will
put your letter and its enclosures on the agenda for our next meeting
- which, conveniently, is due to take place in San Francisco in February
- and to assure you that it will be considered, although obviously
modifications may take rather longer than a few months.
Perhaps some readers of TeXhax are also authors of papers that get submitted to
the Vancouver group. If any such people have views about how the "uniform
requirements" might be updated to enable both authors and publishers to
(painlessly) take advantage of electronic publishing technology, this might
be a good time to make comments to Edward Huth or Stephen Lock. (Such people
will already have copies of the article describing the "uniform requirements",
and will be able to get the relevant addresses from the article!)
David Rhead
P.S. Our VME system will be affected by work on our air-conditioning system
from 17th to 21st December. If anyone wants to mail me during that
period, I'd suggest cczdgr@uk.ac.nottingham.ccc.vax
--------------------------------------------------------------------------
Date: Wed, 5 DEC 90 13:14:33 GMT
From: TEX@rmcs.cranfield.ac.uk
Subject: Reasons for having a new 7-bit encoding scheme
Keywords: encoding scheme, 7-bit
A week or two ago, Dominik Wujastyk entered a plea against re-inventing
wheels because he'd heard rumours about a new encoding scheme, which is
shortly to enter service as the default encoding method at the UK TeX
Archive at Aston University.
Since then, Graham Toal and Pierre MacKay have supported Dominik, so I
think the time has come to publish my reply to Graham and hopefully
convince everybody as to why present encoding schemes are inadequate for
use at archives such as Aston's, where files are collected by users who
have many different architectures and operating systems --- you will see
from the end of the message that I've managed to convince Dominik!
You'll also see, from the attached specification, that it meets Pierre
MacKay's requirements for an encoding scheme.
%%%************************************
I appreciate your concerns about inventing yet another file encoding
scheme (I nearly called the program YAFES).
One of the major problems that we've had at Aston is incompatibility of
binary files (e.g. PK) between stream file systems (as on Unix, DOS) and
record oriented file systems (e.g. VMS, VM/CMS, MVS). I have tried to
find a coding scheme that meets our needs, not just those of the
Unix/MS-DOS community, but without success.
> I throw my rather substantial weight and less substantial influence
> behind Dominik :-) ... to introduce a new 7-bit encoding format would
> be shooting ourselves in the foot. I have exchanged binary files with
> many sites abroad - often through bitnet - and the standard 'xxencode'
> works beautifully (not the misnamed new program which was called
> xxencode for a short time I might add).
Have you exchanged binary files with computers that use record oriented
file systems? Is the "misnamed new program" Nelson Beebe's version of
XXcode?
> Phil Taylor explained to me why a new program is wanted - it is
> because a 7-bit encoded binary file cannot be properly reconstituted
> on VMS without some extra information. Well, I can think of two
> solutions:
>
> 1) Add *extra* vms information *BEFORE* a normal kosher xxencode
> file (or after it of course, but not *in* it)
A nice idea and this is in fact the case for stream files. If fixed
or variable length record files have to be sent then you need to include
information in the file to indicate where the record boundaries are.
> 2) Since we only have a small fixed number of file types in the
> archive where this is a problem (tfm, pk, gf, pxl?) we could
> write a 'fixup' command which converted a stream_lf or ra binary
> file to the appropriate record format.
The font files are generally held as fixed length record files on
VAX/VMS whereas object files are held as variable length binary files.
I'm not sure that the conversion between variable length binary files
and stream binary files is a reversible process. (Since first writing
the above, I have confirmed that this is impossible; under VMS a stream
file has implicit record boundaries --- three `flavours' of stream use
LF, CR or CRLF as the marker. Binary files don't necessarily contain
any of these characters, which would make the entire file into one
record --- VMS file reading is always record oriented because the RMS
services perform reads of rather larger entities than a character.
Files that didn't contain any such marks would be limited to
a total size of 32kB by the blocking conducted by RMS. Furthermore,
keeping all files in the Aston Archive in a stream format would prevent
the retrieval of any file in which more than 2kB appeared without an
intervening end-of-line mark in the stream, because of the Coloured
Books software.)
> Apart from those file types mentioned, I recommend that all other
> files in the archive are line-based text files which should get through
> most ftp implementations with their line-structure preserved.
I agree, but others want to be able to fetch packages in .tar.Z format.
I've attached part of the very preliminary draft documentation for
VVCODE which I hope will explain why I have been forced to produce yet
another coding scheme.
Any comments on the attached draft would be appreciated.
I'll leave the last words to Dominik:
> Yes, Neil, I see. It really boils down to structured file support, I
> guess. I have never had a VMS account: all I know is DOS and Unix, and
> I am a bit myopic because of that.
>
> Doesn't VMS now support some kind of stream file format?
>
> Anyway, now I see the reason for VVencode, I shall swap to it. It would
> be good if it could be spread very widely, even outside the TeX world.
>
> Dominik
Niel Kempson
[Attached file: VVCODE.DOC]
%------------------------------------------------------------------------------
VVCODE PRELIMINARY DOCUMENTATION
Version 0.0 of 26 October 1990
1. INTRODUCTION
Encoding schemes have been introduced to transmit binary files over
text mail systems. Primary examples are:
**TODO**
a. Hexadecimal
b. BOO
c. UUcode
d. XXcode
The known implementations of these schemes have been designed
primarily for operating systems with stream file systems. They are
unsuitable for exchanging data between operating systems with
record/block oriented file systems (e.g. VAX/VMS, VM/CMS) where
different file formats are used for different types of files. Some
encoding systems can be used to exchange data between operating
systems with record oriented file systems, but tend to be specific
to a particular operating system (e.g. TELCODE, MFTU for VAX/VMS).
2. THE IDEAL CODING SYSTEM
After a review of the known encoding systems (shortly after XXCODE
was released last year), an outline specification of the "ideal"
coding scheme was drawn up. The key points of the specification are:
2.1 CODING SCHEME
It should be possible to specify the coding table to be used
to encode the data. The coding table used shall be recorded
with each part of the encoded data.
If a recorded coding table is found while decoding the encoded
data file, it should be used to construct an appropriate
decoding table. Simple one-to-one character corruptions should
be corrected as long as only one of the input characters is
mapped to any one output character.
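The table-driven correction described above can be sketched as follows.
This is only an illustration under stated assumptions, not the VVCODE
implementation: the function names are hypothetical, the table is assumed
to hold 64 characters (one per 6-bit value), and the gateway is assumed to
corrupt the recorded table and the data in the same way.

```python
# Assumed 64-character table in the conventional XXcode order.
ORIGINAL_TABLE = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def build_decoding_table(received_table: str) -> dict:
    """Map each character of the table *as received* to its 6-bit value.

    Fails if two values arrive as the same character, because such a
    many-to-one corruption cannot be undone.
    """
    if len(received_table) != 64:
        raise ValueError("expected 64 table characters")
    decoding = {}
    for value, ch in enumerate(received_table):
        if ch in decoding:
            raise ValueError(f"{ch!r} stands for two values; cannot correct")
        decoding[ch] = value
    return decoding


def to_sixbit_values(received_data: str, decoding: dict) -> list:
    """Translate received characters back into 6-bit values."""
    return [decoding[ch] for ch in received_data]
```

If a gateway turns every "+" into "{", the recorded table arrives with "{"
in position 0, so the data still decodes correctly. If "+" and "-" both
arrive as "?", the table contains a duplicate and the sketch refuses to
guess.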
The default encoding/decoding table should avoid the
corruptions commonly encountered when passing mail through
badly-behaved gateways such as the UK.AC.EARN-RELAY EARN/JANET
gateway. The recommended table is the default XXcode table
using only the characters:
+-0123456789
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Encoded lines should be prefixed by an appropriate character
string to distinguish them from unwanted lines such as mail
headers and trailers. Lines should not end with whitespace
characters as some mailers and operating systems strip off
trailing whitespace.
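As a rough illustration of how such a table is used, the sketch below packs
each group of three bytes into four characters drawn from the 64-character
set. The ordering of the table (digits, then upper case, then lower case)
is the conventional XXcode ordering and is an assumption here, since the
text above lists only the characters. The function names are hypothetical
and the per-line layout of a real encoded file is not shown.

```python
# Assumed ordering of the 64 characters (conventional XXcode order).
XX_TABLE = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_sixbit(data: bytes, table: str = XX_TABLE) -> str:
    """Encode bytes as table characters, 3 input bytes per 4 output characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        padded = chunk + b"\x00" * (3 - len(chunk))  # zero-pad the final group
        n = int.from_bytes(padded, "big")            # 24 bits
        out.append("".join(table[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)))
    return "".join(out)


def decode_sixbit(text: str, length: int, table: str = XX_TABLE) -> bytes:
    """Reverse encode_sixbit; 'length' is the original byte count."""
    index = {ch: value for value, ch in enumerate(table)}
    out = bytearray()
    for i in range(0, len(text), 4):
        n = 0
        for ch in text[i:i + 4]:
            n = (n << 6) | index[ch]
        out += n.to_bytes(3, "big")
    return bytes(out[:length])                       # drop the padding bytes
```

Every output character is a letter, a digit, "+" or "-", so the encoded
text contains none of the punctuation that gateways tend to alter.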
2.2 FILE SPLITTING
The encoding program should be able to split the encoded output
into parts, each no larger than a maximum specified size.
Splitting the output into smaller parts is useful if the
encoded data is to be transmitted using electronic mail or over
unreliable network links that do not stay up long enough to
transmit a large file. The recommended default maximum part
size is 30kB.
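One possible way to honour a maximum part size is sketched below. It is a
minimal illustration, not the VVENCODE algorithm: the function name is
hypothetical, "30kB" is taken to mean 30 * 1024 bytes, each line is assumed
to cost one extra byte for its line terminator, and parts are split only at
line boundaries.

```python
def split_into_parts(lines, max_part_size=30 * 1024):
    """Group encoded lines into parts whose total size stays within the limit."""
    parts, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("ascii")) + 1    # +1 for the line terminator
        if line_size > max_part_size:
            raise ValueError("a single line exceeds the maximum part size")
        if current and size + line_size > max_part_size:
            parts.append(current)                    # close the full part
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        parts.append(current)
    return parts
```

Any per-part header or trailer lines would also have to be counted against
the limit; they are left out here for brevity.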
The decoding program should be able to decode a multi-part
encoded file very flexibly. It should not be necessary to
a. strip out mail headers and trailers.
b. combine all of the parts into one file in the
correct order.
c. process each part of the encoded data as a
separate file.
2.3 VERIFICATION
The encoding program should calculate parameters of the input
file such as the number of bytes and CRC and record them at the
end of the encoded data.
The decoding program should calculate the same parameters from
the decoded data and compare the values obtained with those
recorded at the end of the encoded data.
2.4 FILE ORGANIZATION
The encoding program should be able to read different types of
input file and record the organization of the file at the start
of the encoded data. This is not too important for operating
systems with stream type file systems (e.g. Unix, MS-DOS) where
files are simply written as streams of bytes, but is very
important for operating systems with record oriented file
systems (e.g. VAX/VMS, VM/CMS) where different types of file
are organized in different ways.
The decoding program should be able to use this information to
create the output file using the organization appropriate to
the operating system in use.
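To see why the organization has to travel with the data, consider a file
made of variable length records. Once the records are concatenated into a
plain byte stream the boundaries are lost, unless they are written into the
stream explicitly. The sketch below uses a 2-byte big-endian length before
each record. This framing is purely an assumption for illustration and is
not the format that VVCODE records.

```python
import struct


def records_to_stream(records):
    """Serialize records as <2-byte length><payload> so boundaries survive."""
    out = bytearray()
    for record in records:
        if len(record) > 0xFFFF:
            raise ValueError("record too long for a 2-byte length prefix")
        out += struct.pack(">H", len(record)) + record
    return bytes(out)


def stream_to_records(stream):
    """Recover the original records from a length-prefixed stream."""
    records, pos = [], 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        if pos + length > len(stream):
            raise ValueError("truncated record")
        records.append(stream[pos:pos + length])
        pos += length
    return records
```

Records that are empty, or that happen to contain LF or CR bytes, are
recovered exactly, which a split on end-of-line markers could not
guarantee.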
2.5 COMPATIBILITY
The encoding and decoding schemes should be able to read and
write files compatible with one or more other coding schemes.
2.6 AVAILABILITY
The source code for the programs should be freely available.
It should also be portable and usable with as many computers,
operating systems and compilers as possible.
3. VVCODE
After scouring unsuccessfully around the networks and mailing lists
for such a coding system, we decided to implement yet another file
encoding scheme called VVCODE. VVCODE is an extension to the
standard Unix UUcode utilities used for the transmission of (binary)
files over a medium capable of passing only text files.
The VVCODE encoding and decoding programs implement most of the
specification detailed above. The features of VVENCODE and VVDECODE
are summarized below, keyed to the specification.
3.1 CODING SCHEME
The default coding table for both VVENCODE and VVDECODE is the
standard XXcode table using the characters:
+-0123456789
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
The encoding table used by VVENCODE is recorded in the encoded
data file.
If VVDECODE encounters an encoding table in the encoded data,
it is used to construct a decoding table. Simple one-to-one
character corruptions can be corrected as long as only one of
the input characters is mapped to any one output character.
A command line qualifier can be used to override the coding
table used by VVENCODE and VVDECODE.
Each line of the VVENCODEd data has a unique prefix ("Vv") and
suffix ("V"). VVDECODE ignores any lines in the input file
that do not begin with this prefix such as mail headers and
trailers. The suffix is not used - it is present to avoid
trailing whitespace on any line of the encoded data.
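The prefix and suffix handling can be sketched as below. This is a guess at
the behaviour from the description above, with hypothetical function names:
a line is kept only if it starts with "Vv", and the final "V" is dropped
after any whitespace added in transit has been removed.

```python
PREFIX = "Vv"
SUFFIX = "V"


def wrap_line(payload: str) -> str:
    """Surround an encoded payload with the line prefix and suffix."""
    return PREFIX + payload + SUFFIX


def extract_payloads(lines):
    """Yield the payload of every wrapped line, skipping everything else."""
    for line in lines:
        line = line.rstrip()          # whitespace appended after the suffix
        if (len(line) >= len(PREFIX) + len(SUFFIX)
                and line.startswith(PREFIX)
                and line.endswith(SUFFIX)):
            yield line[len(PREFIX):-len(SUFFIX)]
```

The default table contains no space, so the suffix matters mainly when a
different table is selected. A UUcode-style table, for example, uses the
space character, and a payload ending in spaces would otherwise be
shortened by a mailer that trims lines.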
3.2 FILE SPLITTING
VVENCODE can split the encoded output into parts, each no
larger than a maximum specified size. The default maximum part
size is 30kB.
VVDECODE can decode a multi-part encoded file in a very
flexible way. The parts may be presented to VVDECODE in the
following ways:
a. as one file containing all of the parts in any
order.
b. each part is in a separate file. Ideally each
part number has the file extension ".v##", where
## is the part number, but if VVDECODE cannot find
a file with this extension it will prompt the user
to supply the file specification for the part.
c. a combination of a. and b., i.e. a number of files,
each containing one or more parts in any order
Again the parts can be presented to VVDECODE as received; it
is not necessary to remove mail headers or trailers.
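A decoder that accepts parts in any order might work along the following
lines. The part marker used here ("Vvpart <n>V") is invented for the
example; the real header and trailer lines are those listed in section 4,
whose exact syntax is not given in this draft. The function name is also
hypothetical.

```python
import re

# Hypothetical marker introducing part number <n>; not the real VVCODE syntax.
PART_MARKER = re.compile(r"^Vvpart (\d+)V$")


def collect_parts(texts):
    """Gather encoded lines from any number of texts and return them in part order."""
    parts = {}
    for text in texts:
        current = None                               # no part open at start of a file
        for raw in text.splitlines():
            line = raw.rstrip()
            marker = PART_MARKER.match(line)
            if marker:
                current = parts.setdefault(int(marker.group(1)), [])
            elif line.startswith("Vv") and current is not None:
                current.append(line)                 # data line of the open part
            # anything else (mail headers, trailers) is ignored
    missing = [n for n in range(1, max(parts, default=0) + 1) if n not in parts]
    if missing:
        raise ValueError(f"missing parts: {missing}")
    return [line for number in sorted(parts) for line in parts[number]]
```

Because unrecognised lines are simply skipped, the texts can be mail
messages saved exactly as they arrived.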
3.3 VERIFICATION
VVENCODE calculates the number of bytes and the 16 bit CRC of
the input file and records these parameters at the end of the
encoded data.
Whilst decoding, VVDECODE calculates the number of bytes and
the 16 bit CRC of the decoded data. If these parameters are
recorded in the encoded data file the two versions are compared
to verify the fidelity of the encoding/transmission/decoding
process.
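The check can be sketched as follows. The draft does not say which 16 bit
CRC polynomial is used, so this example assumes the CCITT polynomial with a
zero initial value, as computed by Python's binascii.crc_hqx. The function
names are hypothetical.

```python
import binascii


def summarize(data: bytes) -> tuple:
    """Return (byte count, 16-bit CRC) for the given data.

    Assumption: CRC-CCITT (polynomial 0x1021, initial value 0).
    """
    return len(data), binascii.crc_hqx(data, 0)


def verify(decoded: bytes, recorded_bytecount: int, recorded_crc: int) -> bool:
    """Compare values recomputed from decoded data with the recorded ones."""
    return summarize(decoded) == (recorded_bytecount, recorded_crc)
```

The encoder would write the two values after the encoded data, and the
decoder would call verify() once the last part has been decoded.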
3.4 FILE ORGANIZATION
**TODO**
modes: binary, text
file formats: stream (Unix, MS-DOS, TOPS)
fixed length records (VAX/VMS, VM/CMS)
variable length records (VAX/VMS, VM/CMS ?)
record length: specified and recorded in the VVCODE file
3.5 COMPATIBILITY
VVCODE cannot yet read or write encoded files compatible with other
systems such as UUcode and XXcode. Soon, VVCODE will be able to
read UUcode and XXcode files, but not write them.
3.6 AVAILABILITY
The source code for VVCODE will be freely available (see section 6
for conditions).
VVCODE has been ported to most of the commonly used environments.
For a full list of the environments supported, see section 10.
4. FORMAT OF A VVENCODED FILE
4.1 PREFIXES AND SUFFIXES
Vv prefix to help distinguish VVENCODEd lines from other lines
such as mail headers and trailers
V suffix to prevent lines ending in spaces which may be
trimmed by certain mailers and file systems
4.2 HEADER INFORMATION
a. mode
b. format
c. table
d. begin
e. skipfrom
4.3 ENCODED DATA
4.4 TRAILER INFORMATION
a. end
b. skipto
c. bytecount
d. crc16
5. USING VVCODE
**TODO**
6. AVAILABILITY OF VVCODE
The VVCODE programs may be freely copied and circulated to others,
provided that no fee (beyond reasonable media copying charges) is
levied. The authors welcome bug reports and encourage suggestions
for porting to other environments and operating systems, by mail
(paper or electronic) or by telephone.
If you port this program to a previously unsupported environment or
operating system, please feed your changes back to the authors so
that others may benefit. Contributions received will be gratefully
acknowledged.
7. PORTING VVCODE TO A NEW ENVIRONMENT
**TODO**
8. THE AUTHORS
Chief Architect:
Niel Kempson,
25 Whitethorn Drive,
CHELTENHAM
GL52 5LL
England
Tel: +44 242 579105 (home)
E-mail: TeX @ Uk.AC.Cranfield.RMCS
RMCS_TEX @ Uk.Ac.TeX
Advice and encouragement:
Brian {Hamilton Kelly},
School of Elec. Eng. & Science,
Royal Military College of Science,
Shrivenham,
SWINDON
SN6 8LA
England
Tel: +44 793 785252 (office)
E-mail: TeX @ Uk.AC.Cranfield.RMCS
RMCS_TEX @ Uk.Ac.TeX
9. ACKNOWLEDGEMENTS
16 bit CRC function and other general ideas:
Nelson H. F. Beebe,
Center for Scientific Computing,
Department of Mathematics,
220 South Physics Building,
University of Utah,
Salt Lake City,
UT 84112
E-mail: beebe @ science.utah.edu
10. ENVIRONMENTS SUPPORTED
**TODO**
11. MODIFICATION HISTORY
**TODO**
-----------------------------------------------------------------------
%%% Further information about the TeXhax Digest, the TeX
%%% Users Group, and the latest software versions is available
%%% in every tenth issue of the TeXhax Digest.
%%%
%%% Concerning subscriptions, address changes, unsubscribing:
%%%
%%% BITNET: send a one-line mail message to LISTSERV@xxx
%%% SUBSCRIBE TEX-L <your name> % to subscribe
%%% or UNSUBSCRIBE TEX-L
%%%
%%% Internet: send a similar one line mail message to
%%% TeXhax-request@cs.washington.edu
%%% JANET users may choose to use
%%% texhax-request@uk.ac.nsf
%%% All submissions to: TeXhax@cs.washington.edu
%%%
%%% Back issues available for FTPing as:
%%% machine: directory: filename:
%%% JUNE.CS.WASHINGTON.EDU TeXhax/TeXhaxyy.nnn
%%% yy = last two digits of current year
%%% nnn = issue number
%%%
%%%\bye
%%%
End of TeXhax Digest
**************************
-------