[comp.text.tex] TeXhax Digest V90 #076

TeXhax@CS.WASHINGTON.EDU (TeXhax Digest) (12/27/90)

TeXhax Digest    Wednesday,  December 26, 1990  Volume 90 : Issue 076

Moderators: Tiina Modisett and Pierre MacKay

%%% The TeXhax digest is brought to you as a service of the TeX Users Group %%%
%%%       in cooperation with the UnixTeX distribution service at the       %%%
%%%                      University of Washington                           %%%

Today's Topics:         

                      MS-DOS version of Makeindex
      Electronic submission to Vancouver group of biomedical journals?
              Reasons for having a new 7-bit encoding scheme

-------------------------------------------------------------------------------

Date: Sat, 8 Dec 90 11:33:18 EST
From: Hal_Varian@um.cc.umich.edu
Subject: MS-DOS version of Makeindex
Keywords: MakeIndex, MS-DOS

Does anyone have a patch to Makeindex for MS-DOS that will allow it
to handle more than 1000 or so entries?  I have heard that such a
patch exists, but haven't been able to locate it.  Please reply
to TeXHax, or directly to me and I'll summarize for the group.
                                   Hal_Varian@um.cc.umich.edu

------------------------------------------------------------------------------

Date: Sun, 9 Dec 90 17:49:38 GMT
From: David_Rhead@vme.ccc.nottingham.ac.uk
Subject: Electronic submission to Vancouver group of biomedical journals?
Keywords: electronic submissions

I thought I'd better file a report on correspondence that I've had about
possible electronic submission of manuscripts to the Vancouver group of
biomedical journals.  (The Vancouver group consists of about 300 journals
that have a uniform set of "instructions for authors".  See either Annals
of Internal Medicine 1988;108:258-65 or British Medical Journal
1988;296:401-5.)
 
Stephen Lock of the British Medical Journal is one of the 2 people who
handle comments on behalf of the Vancouver group.  (The other is Edward
Huth of Annals of Internal Medicine.)  I sent Stephen Lock a letter that
contained the following:

When the "uniform requirements" are next revised, might it be worth giving
some consideration to how authors and publishers can take advantage of
post-typewriter technology?  The essential requirement would be to separate
the author's job (specification of the text and its structure) from the
document-designer's job (typographic representation of the structure).
Software options include LaTeX and SGML.  With such systems, the author
and referee could read near-typeset drafts, and the manuscript could be
transmitted electronically (without re-keying) to be typeset in the house-style
of the particular journal.  The main change from the author's point-of-view
might be a different way of signifying section headings, etc.  I enclose some
examples of the "instructions for authors" given by some journals that are
already moving in this direction.

 
His reply contained the following:

      ... it is quite clear that sooner or later the Vancouver style will
have to be updated to take desktop publishing into account.  ... I will
put your letter and its enclosures on the agenda for our next meeting
- which, conveniently, is due to take place in San Francisco in February
- and to assure you that it will be considered, although obviously
modifications may take rather longer than a few months.

 
Perhaps some readers of TeXhax are also authors of papers that get submitted to
the Vancouver group.  If any such people have views about how the "uniform
requirements" might be updated to enable both authors and publishers to
(painlessly) take advantage of electronic publishing technology, this might
be a good time to make comments to Edward Huth or Stephen Lock.  (Such people
will already have copies of the article describing the "uniform requirements",
and will be able to get the relevant addresses from the article!)
 
                                                                  David Rhead
 
P.S. Our VME system will be affected by work on our air-conditioning system
     from 17th to 21st December.  If anyone wants to mail me during that
     period, I'd suggest  cczdgr@uk.ac.nottingham.ccc.vax

--------------------------------------------------------------------------

Date: Wed, 5 DEC 90 13:14:33 GMT
From: TEX@rmcs.cranfield.ac.uk
Subject: Reasons for having a new 7-bit encoding scheme
Keywords: encoding scheme, 7-bit

A week or two ago, Dominik Wujastyk entered a plea against re-inventing
wheels because he'd heard rumours about a new encoding scheme, which is
shortly to enter service as the default encoding method at the UK TeX
Archive at Aston University.

Since then, Graham Toal and Pierre MacKay have supported Dominik, so I
think the time has come to publish my reply to Graham and hopefully
convince everybody as to why present encoding schemes are inadequate for
use at archives such as Aston's, where files are collected by users who
have many different architectures and operating systems --- you will see
from the end of the message that I've managed to convince Dominik!
You'll also see, from the attached specification, that it meets Pierre
MacKay's requirements for an encoding scheme.

%%%************************************

I appreciate your concerns about inventing yet another file encoding
scheme (I nearly called the program YAFES).  

One of the major problems that we've had at Aston is incompatibility of
binary files (e.g. PK) between stream file systems (as on Unix, DOS) and
record oriented file systems (e.g. VMS, VM/CMS, MVS).  I have tried to
find a coding scheme that meets our needs, not just those of the
Unix/MS-DOS community, but without success.

> I throw my rather substantial weight and less substantial influence
> behind Dominik :-) ... to introduce a new 7-bit encoding format would
> be shooting ourselves in the foot.  I have exchanged binary files with
> many sites abroad - often through bitnet - and the standard 'xxencode'
> works beautifully (not the misnamed new program which was called
> xxencode for a short time I might add).  

Have you exchanged binary files with computers that use record oriented
file systems?  Is the "misnamed new program" Nelson Beebe's version of
XXcode?

>    Phil Taylor explained to me why a new program is wanted - it is
> because a 7-bit encoded binary file cannot be properly reconstituted
> on VMS without some extra information.  Well, I can think of two
> solutions:  
> 
>    1) Add *extra* vms information *BEFORE* a normal kosher xxencode
>       file (or after it of course, but not *in* it)

A nice idea and this is in fact the case for stream files.  If fixed
or variable length record files have to be sent then you need to include
information in the file to indicate where the record boundaries are.
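
To illustrate why the boundaries have to travel inside the data, here is
a minimal sketch in Python.  It assumes a framing invented purely for
this example (each record preceded by a 2-byte big-endian length); it is
not a description of how VVCODE itself records boundaries.

```python
import struct

def frame_records(records):
    """Flatten a list of records into one byte stream.

    Each record is preceded by its length as a 2-byte big-endian
    integer (a hypothetical framing chosen for this sketch).
    """
    out = bytearray()
    for rec in records:
        if len(rec) > 0xFFFF:
            raise ValueError("record too long for a 2-byte length prefix")
        out += struct.pack(">H", len(rec))
        out += rec
    return bytes(out)

def unframe_records(stream):
    """Recover the original list of records from a framed byte stream."""
    records = []
    pos = 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        records.append(stream[pos:pos + length])
        pos += length
    return records
```

Because the lengths are explicit, records that contain LF or CR bytes,
or none at all, come back exactly as they went in.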

>    2) Since we only have a small fixed number of file types in the
>       archive where this is a problem (tfm, pk, gf, pxl?) we could
>       write a 'fixup' command which converted a stream_lf or ra binary
>       file to the appropriate record format.

The font files are generally held as fixed length record files on
VAX/VMS whereas object files are held as variable length binary files.
I'm not sure that the conversion between variable length binary files
and stream binary files is a reversible process. (Since first writing
the above, I have confirmed that this is impossible; under VMS a stream
file has implicit record boundaries --- three `flavours' of stream use
LF, CR or CRLF as the marker.  Binary files don't necessarily contain
any of these characters, which would make the entire file into one
record --- VMS file reading is always record oriented because the RMS
services perform reads of rather larger entities than a character. 
Files that didn't contain any such marks would be limited to
a total size of 32kB by the blocking conducted by RMS.  Furthermore,
keeping all files in the Aston Archive in a stream format would prevent
the retrieval of any file in which more than 2kB appeared without an
intervening end-of-line mark in the stream, because of the Coloured
Books software.)

> Apart from those file types mentioned, I recommend that all other
> files in the archive are line-based text files which should get through
> most ftp implementations with their line-structure preserved.

I agree, but others want to be able to fetch packages in .tar.Z format.

I've attached part of the very preliminary draft documentation for
VVCODE which I hope will explain why I have been forced to produce yet
another coding scheme.  

Any comments on the attached draft would be appreciated.

I'll leave the last words to Dominik:

> Yes, Neil, I see.  It really boils down to structured file support, I
> guess.  I have never had a VMS account: all I know is DOS and Unix, and
> I am a bit myopic because of that.
> 
> Doesn't VMS now support some kind of stream file format?  
> 
> Anyway, now I see the reason for VVencode, I shall swap to it.  It would
> be good if it could be spread very widely, even outside the TeX world.
> 
> Dominik


Niel Kempson



[Attached file: VVCODE.DOC]
%------------------------------------------------------------------------------

VVCODE PRELIMINARY DOCUMENTATION

Version 0.0 of 26 October 1990



1.    INTRODUCTION

Encoding schemes have been introduced to transmit binary files over
text mail systems.  Primary examples are:

      **TODO**

      a.    Hexadecimal
      b.    BOO
      c.    UUcode
      d.    XXcode

The known implementations of these schemes have been designed
primarily for operating systems with stream file systems.  They are
unsuitable for exchanging data between operating systems with
record/block oriented file systems (e.g. VAX/VMS, VM/CMS) where
different file formats are used for different types of files.  Some
encoding systems can be used to exchange data between operating
systems with record oriented file systems, but tend to be specific
to a particular operating system (e.g. TELCODE, MFTU for VAX/VMS).



2.    THE IDEAL CODING SYSTEM

After a review of the known encoding systems (shortly after XXCODE
was released last year), an outline specification of the "ideal"
coding scheme was drawn up.  The key points of the specification are:


      2.1   CODING SCHEME

      It should be possible to specify the coding table to be used
      to encode the data.  The coding table used shall be recorded
      with each part of the encoded data.

      If a recorded coding table is found while decoding the encoded
      data file, it should be used to construct an appropriate
      decoding table.  Simple one-to-one character corruptions should
      be corrected as long as only one of the input characters is
      mapped to any one output character.

      The default encoding/decoding table should avoid the
      corruptions commonly encountered when passing mail through
      badly-behaved gateways such as the UK.AC.EARN-RELAY EARN/JANET
      gateway.  The recommended table is the default XXcode table
      using only the characters:

            +-0123456789
            abcdefghijklmnopqrstuvwxyz
            ABCDEFGHIJKLMNOPQRSTUVWXYZ

      Encoded lines should be prefixed by an appropriate character
      string to distinguish them from unwanted lines such as mail
      headers and trailers.  Lines should not end with whitespace
      characters as some mailers and operating systems strip off
      trailing whitespace.
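
      As a rough sketch of how a 64-character table is applied, the
      Python below packs each group of 3 input bytes into four 6-bit
      values and looks each one up in the table.  The ordering of the
      characters in XX_TABLE is an assumption (the list above names the
      characters, not their order), and the function names are
      hypothetical.

```python
# Assumed ordering of the XXcode table; the text lists only the
# characters that are used, not their positions.
XX_TABLE = ("+-0123456789"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz")

def encode_6bit(data, table=XX_TABLE):
    """Encode bytes as table characters, 3 input bytes -> 4 characters.

    The last group is zero-padded, so the true length must be recorded
    separately for the decoder.
    """
    if len(table) != 64 or len(set(table)) != 64:
        raise ValueError("table must hold 64 distinct characters")
    chars = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3].ljust(3, b"\0")
        n = int.from_bytes(group, "big")        # 24-bit value
        for shift in (18, 12, 6, 0):
            chars.append(table[(n >> shift) & 0x3F])
    return "".join(chars)

def decode_6bit(text, length, table=XX_TABLE):
    """Invert encode_6bit; `length` is the original byte count."""
    index = {ch: i for i, ch in enumerate(table)}
    out = bytearray()
    for i in range(0, len(text), 4):
        n = 0
        for ch in text[i:i + 4]:
            n = (n << 6) | index[ch]
        out += n.to_bytes(3, "big")
    return bytes(out[:length])
```

      Every output character comes from the table, so none of the
      punctuation that badly-behaved gateways tend to alter appears in
      the encoded data.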


      2.2   FILE SPLITTING

      The encoding program should be able to split the encoded output
      into parts, each no larger than a maximum specified size. 
      Splitting the output into smaller parts is useful if the
      encoded data is to be transmitted using electronic mail or over
      unreliable network links that do not stay up long enough to
      transmit a large file.  The recommended default maximum part
      size is 30kB.

      The decoding program should be able to decode a multi-part
      encoded file very flexibly.  It should not be necessary to 

            a.    strip out mail headers and trailers.
            b.    combine all of the parts into one file in the
                  correct order.
            c.    process each part of the encoded data as a
                  separate file.


      2.3   VERIFICATION

      The encoding program should calculate parameters of the input
      file such as the number of bytes and CRC and record them at the
      end of the encoded data.

      The decoding program should calculate the same parameters from
      the decoded data and compare the values obtained with those
      recorded at the end of the encoded data.

      
      2.4   FILE ORGANIZATION

      The encoding program should be able to read different types of
      input file and record the organization of the file at the start
      of the encoded data.  This is not too important for operating
      systems with stream type file systems (e.g. Unix, MS-DOS) where
      files are simply written as streams of bytes, but is very
      important for operating systems with record oriented file
      systems (e.g. VAX/VMS, VM/CMS) where different types of file
      are organized in different ways.

      The decoding program should be able to use this information to
      create the output file using the organization appropriate to
      the operating system in use.

      2.5   COMPATIBILITY

      The encoding and decoding schemes should be able to read and
      write files compatible with one or more other coding schemes.


      2.6   AVAILABILITY

      The source code for the programs should be freely available. 
      It should also be portable and usable with as many computers,
      operating systems and compilers as possible.



3.    VVCODE

After scouring unsuccessfully around the networks and mailing lists
for such a coding system, we decided to implement yet another file
encoding scheme called VVCODE.  VVCODE is an extension to the
standard Unix UUcode utilities used for the transmission of (binary)
files over a medium capable of passing only text files.

The VVCODE encoding and decoding programs implement most of the
specification detailed above.  The features of VVENCODE and VVDECODE
are summarized below, keyed to the specification.


      3.1   CODING SCHEME

      The default coding table for both VVENCODE and VVDECODE is the
      standard XXcode table using the characters:

            +-0123456789
            abcdefghijklmnopqrstuvwxyz
            ABCDEFGHIJKLMNOPQRSTUVWXYZ

      The encoding table used by VVENCODE is recorded in the encoded
      data file.  

      If VVDECODE encounters an encoding table in the encoded data,
      it is used to construct a decoding table.   Simple one-to-one
      character corruptions can be corrected as long as only one of
      the input characters is mapped to any one output character.
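
      The idea can be sketched as follows (in Python, with a
      hypothetical function name): because the table is transmitted
      with the data, it is corrupted in the same way as the data, so
      the position of each character in the table as received still
      gives its 6-bit value.  The gateway substitution in the example
      is invented for illustration.

```python
def build_decode_table(received_table):
    """Map each character of the table as received to its 6-bit value.

    Raises ValueError if two table positions now hold the same
    character, since that corruption cannot be undone.
    """
    if len(received_table) != 64:
        raise ValueError("coding table must have exactly 64 characters")
    decode = {}
    for value, ch in enumerate(received_table):
        if ch in decode:
            raise ValueError(f"character {ch!r} appears twice in the table")
        decode[ch] = value
    return decode

# Example: an (invented) gateway that turns every "+" into "!".
sent_table = ("+-0123456789"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz")
received_table = sent_table.replace("+", "!")
received_data = "+-Az".replace("+", "!")

decode = build_decode_table(received_table)
values = [decode[ch] for ch in received_data]
```

      If the gateway had instead mapped "+" onto "-", the received
      table would contain "-" twice and the sketch would refuse it.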

      A command line qualifier can be used to override the coding
      table used by VVENCODE and VVDECODE.

      Each line of the VVENCODEd data has a unique prefix ("Vv") and
      suffix ("V").  VVDECODE ignores any lines in the input file
      that do not begin with this prefix such as mail headers and
      trailers.  The suffix is not used - it is present to avoid
      trailing whitespace on any line of the encoded data.
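
      A minimal sketch of this filtering, in Python with hypothetical
      function names, might look like the following.  It assumes the
      payload is simply whatever lies between the prefix and the
      suffix.

```python
PREFIX = "Vv"   # line prefix described in the text
SUFFIX = "V"    # line suffix described in the text

def wrap_line(payload):
    """Add the prefix and suffix to one line of encoded data."""
    return PREFIX + payload + SUFFIX

def extract_payloads(lines):
    """Return the payload of every line that starts with the prefix.

    Lines without the prefix (mail headers, trailers, blank lines)
    are skipped.
    """
    payloads = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith(PREFIX):
            continue
        body = line[len(PREFIX):]
        if body.endswith(SUFFIX):
            body = body[:-len(SUFFIX)]
        payloads.append(body)
    return payloads
```

      A header or signature line that happens to begin with "Vv" would
      be mistaken for data by this sketch; a real decoder would need a
      further check.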

      3.2   FILE SPLITTING

      VVENCODE can split the encoded output into parts, each no
      larger than a maximum specified size.  The default maximum part
      size is 30kB.
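
      One way to do the splitting is sketched below in Python.  The
      function name is hypothetical, 30kB is taken here to mean
      30 * 1024 bytes, and each line is counted with one newline byte;
      all three are assumptions of the sketch.

```python
def split_into_parts(lines, max_bytes=30 * 1024):
    """Group encoded lines into parts of at most max_bytes each.

    Lines are never broken across parts.
    """
    parts, current, size = [], [], 0
    for line in lines:
        cost = len(line.encode("ascii")) + 1     # line plus newline
        if cost > max_bytes:
            raise ValueError("a single line exceeds the maximum part size")
        if current and size + cost > max_bytes:
            parts.append(current)                # close the full part
            current, size = [], 0
        current.append(line)
        size += cost
    if current:
        parts.append(current)
    return parts
```

      Keeping whole lines together means each part can be decoded line
      by line without reference to its neighbours.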

      VVDECODE can decode a multi-part encoded file in a very
      flexible way.  The parts may be presented to VVDECODE in the
      following ways:

            a.    as one file containing all of the parts in any
                  order.
            b.    each part is in a separate file.  Ideally each
                  part number has the file extension ".v##", where
                  ## is the part number, but if VVDECODE cannot find
                  a file with this extension it will prompt the user
                  to supply the file specification for the part.
            c.    a combination of a. and b., i.e. a number of files,
                  each containing one or more parts in any order.

      Again the parts can be presented to VVDECODE as received; it
      is not necessary to remove mail headers or trailers.
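
      For case b., locating the part files could be sketched as below
      (Python, hypothetical function names).  The sketch assumes that
      "##" means exactly two decimal digits and that parts are numbered
      from 1; neither is stated above.

```python
import re

# ".v##" at the end of the name, ## taken to be two decimal digits.
PART_EXT = re.compile(r"\.v(\d\d)$", re.IGNORECASE)

def find_part_files(filenames):
    """Map part number -> file name for names ending in ".v##"."""
    parts = {}
    for name in filenames:
        m = PART_EXT.search(name)
        if m:
            parts[int(m.group(1))] = name
    return parts

def missing_parts(parts, total):
    """Part numbers in 1..total for which no file was found."""
    return [n for n in range(1, total + 1) if n not in parts]
```

      The numbers returned by missing_parts are the ones for which the
      user would have to be prompted for a file specification.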


      3.3   VERIFICATION

      VVENCODE calculates the number of bytes and the 16 bit CRC of
      the input file and records these parameters at the end of the
      encoded data.

      Whilst decoding, VVDECODE calculates the number of bytes and
      the 16 bit CRC of the decoded data.  If these parameters are
      recorded in the encoded data file the two versions are compared
      to verify the fidelity of the encoding/transmission/decoding
      process.
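
      The check can be sketched in Python as follows.  The text does
      not say which 16 bit CRC polynomial VVCODE uses; the sketch uses
      binascii.crc_hqx (the CRC-CCITT polynomial, initial value 0) only
      because it is in the standard library.  The function names are
      hypothetical.

```python
import binascii

def file_parameters(data):
    """Return (byte count, 16-bit CRC) for a block of data."""
    return len(data), binascii.crc_hqx(data, 0)

def verify(decoded, recorded_count, recorded_crc):
    """Compare the parameters of the decoded data with the recorded ones."""
    count, crc = file_parameters(decoded)
    return count == recorded_count and crc == recorded_crc
```

      The encoder would call file_parameters on the input file and
      record the result; the decoder would pass the recorded values to
      verify together with the data it has reconstructed.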


      3.4   FILE ORGANIZATION

**TODO**

      modes:          binary, text
      file formats:   stream (Unix, MS-DOS, TOPS)
                      fixed length records (VAX/VMS, VM/CMS)
                      variable length records (VAX/VMS, VM/CMS ?)
      record length:  specified and recorded in the VVCODE file


      3.5   COMPATIBILITY

      VVCODE cannot yet read or write encoded files compatible with other
      systems such as UUcode and XXcode.  Soon, VVCODE will be able to
      read UUcode and XXcode files, but not write them.


      3.6   AVAILABILITY

      The source code for VVCODE will be freely available (see section 6
      for conditions).

      VVCODE has been ported to most of the commonly used environments. 
      For a full list of the environments supported, see section 10.



4.    FORMAT OF A VVENCODED FILE

      4.1   PREFIXES AND SUFFIXES

      Each line begins with the prefix "Vv", which helps distinguish
      VVENCODEd lines from other lines such as mail headers and trailers.

      Each line ends with the suffix "V", which prevents lines ending in
      spaces that may be trimmed by certain mailers and file systems.

      
      4.2   HEADER INFORMATION

            a.    mode
            b.    format
            c.    table
            d.    begin
            e.    skipfrom


      4.3   ENCODED DATA


      4.4   TRAILER INFORMATION

            a.    end
            b.    skipto
            c.    bytecount
            d.    crc16


5.    USING VVCODE

**TODO**

6.    AVAILABILITY OF VVCODE

The VVCODE programs may be freely copied and circulated to others,
provided that no fee (beyond reasonable media copying charges) is
levied.  The authors welcome bug reports and encourage suggestions
for porting to other environments and operating systems, by mail
(paper or electronic) or by telephone.  

If you port this program to a previously unsupported environment or
operating system, please feed your changes back to the authors so
that others may benefit.  Contributions received will be gratefully
acknowledged.



7.    PORTING VVCODE TO A NEW ENVIRONMENT

**TODO**

8.    THE AUTHORS

Chief Architect:

      Niel Kempson,
      25 Whitethorn Drive,
      CHELTENHAM
      GL52 5LL
      England
      
      Tel: +44 242 579105 (home)

      E-mail:     TeX @ Uk.AC.Cranfield.RMCS
                  RMCS_TEX @ Uk.Ac.TeX

      

Advice and encouragement:

      Brian {Hamilton Kelly},
      School of Elec. Eng. & Science,
      Royal Military College of Science,
      Shrivenham, 
      SWINDON
      SN6 8LA
      England
      
      Tel: +44 793 785252 (office)

      E-mail:     TeX @ Uk.AC.Cranfield.RMCS
                  RMCS_TEX @ Uk.Ac.TeX
      

9.    ACKNOWLEDGEMENTS

16 bit CRC function and other general ideas:

      Nelson H. F. Beebe,
      Center for Scientific Computing,
      Department of Mathematics,
      220 South Physics Building,
      University of Utah,
      Salt Lake City,
      UT 84112

      E-mail:     beebe @ science.utah.edu



10.   ENVIRONMENTS SUPPORTED

**TODO**



11.   MODIFICATION HISTORY

**TODO**

-----------------------------------------------------------------------

%%% Further information about the TeXhax Digest, the TeX
%%% Users Group, and the latest software versions is available
%%% in every tenth issue of the TeXhax Digest.
%%%
%%% Concerning subscriptions, address changes, unsubscribing:
%%%
%%%  BITNET: send a one-line mail message to LISTSERV@xxx
%%%         SUBSCRIBE TEX-L <your name>    % to subscribe
%%%      or UNSUBSCRIBE TEX-L
%%%
%%% Internet: send a similar one line mail message to
%%%           TeXhax-request@cs.washington.edu
%%% JANET users may choose to use
%%%           texhax-request@uk.ac.nsf
%%% All submissions to: TeXhax@cs.washington.edu
%%%
%%% Back issues available for FTPing as:
%%%          machine:              directory:  filename:
%%%   JUNE.CS.WASHINGTON.EDU          TeXhax/TeXhaxyy.nnn
%%%              yy = last two digits of current year
%%%                       nnn = issue number
%%%
%%%\bye
%%%

End of TeXhax Digest
**************************
-------