[comp.lang.c] Data compression algorithms...

btrue@emdeng.Dayton.NCR.COM (Barry.True) (07/10/89)

Does anyone know of a data compression/decompression algorithm that can be
used to compress an eleven byte MS-DOS file mask (i.e., filename/ext.) so
that two of the bytes can be freed up for use? Sources would be helpful.

btrue@emdeng.Dayton.NCR.COM (Barry.True) (07/10/89)

Does anyone know of a data compression/decompression algorithm that we can
use to compress an eleven-byte MS-DOS file mask (i.e., filename/ext.) so
that two of the bytes can be freed up for use? Sources would be helpful.
Please respond via E-Mail.

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (07/13/89)

In an article of <10 Jul 89 14:19:28 GMT>, Barry.True writes:

 >Does anyone know of a data compression/decompression algorithm that can be
 >used to compress an eleven byte MS-DOS file mask (i.e., filename/ext.) so
 >that two of the bytes can be freed up for use? 

The fact that MS-DOS file names are always upper case and may only consist of  
alphanumeric characters plus 15 additional characters (total of 51 characters  
out of 256 possible codes) might suggest something. Even assuming 64 valid
characters, 6 bits per character would compress 11 bytes into 66 bits (8.25
bytes, so 9 whole bytes), thereby freeing up the two bytes you need.
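
The 6-bits-per-character idea above can be sketched in C roughly as follows.
This is only an illustration: the ALPHABET table, the function names
pack_name/unpack_name, and the choice of space as the padding character are
assumptions made for the example, not anything taken from DOS documentation.
Any table of at most 64 characters would work the same way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 6-bit alphabet (an assumption for this sketch).
   Index 0 is the space used to pad short names; '?' is included
   so that wildcard masks can be packed too. */
static const char ALPHABET[] =
    " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'()-@^_`{}~?";

#define NAME_LEN   11  /* 8-character name + 3-character extension */
#define PACKED_LEN 9   /* 11 * 6 = 66 bits, rounded up to 72 bits  */

/* Pack 11 characters into 9 bytes. Returns 0 on success, or -1 if a
   character is not in the assumed alphabet. */
int pack_name(const char name[NAME_LEN], unsigned char out[PACKED_LEN])
{
    unsigned long acc = 0;   /* bit accumulator */
    int nbits = 0, i, o = 0;

    memset(out, 0, PACKED_LEN);
    for (i = 0; i < NAME_LEN; i++) {
        const char *p = name[i] ? strchr(ALPHABET, name[i]) : NULL;
        if (p == NULL)
            return -1;
        acc = (acc << 6) | (unsigned long)(p - ALPHABET);
        nbits += 6;
        while (nbits >= 8) {             /* emit every complete byte */
            nbits -= 8;
            out[o++] = (unsigned char)((acc >> nbits) & 0xFF);
            acc &= (1UL << nbits) - 1;   /* keep only the leftover bits */
        }
    }
    if (nbits > 0)                       /* last 2 bits, left-aligned */
        out[o++] = (unsigned char)((acc << (8 - nbits)) & 0xFF);
    return 0;
}

/* Reverse of pack_name. Returns 0 on success, or -1 if a 6-bit code
   falls outside the assumed alphabet. */
int unpack_name(const unsigned char in[PACKED_LEN], char name[NAME_LEN])
{
    const int alphabet_size = (int)(sizeof ALPHABET - 1);
    unsigned long acc = 0;
    int nbits = 0, i, n = 0;

    for (i = 0; i < PACKED_LEN && n < NAME_LEN; i++) {
        acc = (acc << 8) | in[i];
        nbits += 8;
        while (nbits >= 6 && n < NAME_LEN) {
            int code;
            nbits -= 6;
            code = (int)((acc >> nbits) & 0x3F);
            acc &= (1UL << nbits) - 1;
            if (code >= alphabet_size)
                return -1;
            name[n++] = ALPHABET[code];
        }
    }
    return n == NAME_LEN ? 0 : -1;
}
```

Note that pack_name refuses any character outside the table, so a caller
needs a fallback for names the table cannot represent.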

richard@pantor.UUCP (Richard Sargent) (07/14/89)

> From: Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout)
> Message-ID: <16942.24BD4537@urchin.fidonet.org>
> Date: 13 Jul 89 07:18:57 GMT

[ stuff deleted ]

> The fact that MS-DOS file names are always upper case and may only consist of
> alphanumeric characters plus 15 additional characters (total of 51 characters
> out of 256 possible codes) might suggest something. 

Really? Try the following at your DOS prompt:

c> edlin test<ALT-1-2-8>.dat
...
c> dir .dat

I think you'll be surprised. All codes from 128 through 255 are acceptable
to DOS (actually, I only sampled the range, including boundary conditions,
so maybe ALT-1-6-0 won't work, but it is unlikely).

No codes between 0 and 31 seem to be acceptable.

My testing was done under Compaq DOS 3.31.


Richard Sargent                   Internet: richard@pantor.UUCP
Systems Analyst                   UUCP:     uunet!pantor!richard

tneff@bfmny0.UUCP (Tom Neff) (07/16/89)

One should not confuse the set of filenames MS-DOS will create and
access (via its INT 21H system calls) with the subset thereof that the
default shell COMMAND.COM is prepared to parse from your command line.
Using MS-DOS calls you can create a file using almost any characters
except '.' and NUL - the former delimits the name/extension boundary and
the latter denotes end of string.  (DOS 2.0+ routines assumed here.)
Several vendors take advantage of this to put strange filenames in their
directories, to keep Joe Dumb User from fooling with them or to avoid
innocent name space collisions.  COMMAND.COM, on the other hand, when it
parses your ASCII command line to extract things like filenames, has far
stricter rules.
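
How many bytes a fixed-length encoding needs depends entirely on how many
distinct characters must be representable. The sketch below works that out
with plain integer arithmetic. The function name bytes_needed is made up for
this example, and the figure of 254 is only an assumed upper bound (every
byte value except '.' and NUL); 51 and 64 are the counts quoted earlier in
the thread.

```c
#include <stddef.h>

/* Bytes needed to give every string of `len` symbols, drawn from an
   alphabet of `n` symbols (2 <= n <= 256), its own code number.
   Works by computing n^len - 1 in a little-endian byte array and
   counting the bytes in use. The caller must keep n^len below 2^256. */
size_t bytes_needed(unsigned n, unsigned len)
{
    unsigned char big[32] = {1};   /* the number 1, low byte first */
    size_t i, used;
    unsigned k;

    for (k = 0; k < len; k++) {    /* big = big * n */
        unsigned carry = 0;
        for (i = 0; i < sizeof big; i++) {
            unsigned t = big[i] * n + carry;
            big[i] = (unsigned char)(t & 0xFF);
            carry = t >> 8;
        }
    }

    for (i = 0; i < sizeof big; i++) {   /* big = big - 1 */
        if (big[i] != 0) {
            big[i]--;
            break;
        }
        big[i] = 0xFF;                   /* borrow from the next byte */
    }

    used = sizeof big;                   /* drop leading zero bytes */
    while (used > 1 && big[used - 1] == 0)
        used--;
    return used;
}
```

For 11 characters this gives 9 bytes with 64 symbols and 8 bytes with 51
symbols (treating the name as one base-51 number), but 11 bytes with 254
symbols, i.e. no saving at all once nearly every byte value is allowed.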

This isn't really comp.lang.c but what the heck.
-- 
"My God, Thiokol, when do you     \\	Tom Neff
want me to launch -- next April?"  \\	uunet!bfmny0!tneff

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (07/17/89)

In an article of <14 Jul 89 14:05:55 GMT>, (Richard Sargent) writes:

 >> From: Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout)
 >> The fact that MS-DOS file names are always upper case and may only consist 
 >> of alphanumeric characters plus 15 additional characters (total of 51 
 >> characters out of 256 possible codes) might suggest something. 
 >
 >Really? Try the following at your DOS prompt:
 >
 >c> edlin test<ALT-1-2-8>.dat
 >...
 >c> dir .dat
 >
 >I think you'll be surprised. All codes from 128 through 255 are acceptable

OK, you got me - DOS internally couldn't care less, but the docs for most  
versions of DOS will tell you that only A-Z, 0-9, and most special characters  
other than '*', '?', '.', and ' ' are legal. If you follow the rules (i.e.  
guaranteed to work for all versions of DOS including those yet to come) laid  
down by Microsoft and IBM, 64 characters should suffice. 

diamond@csl.sony.JUNET (Norman Diamond) (07/20/89)

In an article of <14 Jul 89 14:05:55 GMT>, (Richard Sargent) writes:

>>I think you'll be surprised. All codes from 128 through 255 are acceptable
[to MS-DOS].

In article <17193.24C32414@urchin.fidonet.org> Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) writes:

>OK, you got me - DOS internally couldn't care less, but the docs for most  
>versions of DOS will tell you that only A-Z, 0-9, and most special characters  
>other than '*', '?', '.', and ' ' are legal. If you follow the rules (i.e.  
>guaranteed to work for all versions of DOS including those yet to come) laid  
>down by Microsoft and IBM, 64 characters should suffice. 

Some versions of MS-DOS are sold to the other 80% of the industrialized
world.  So some versions of MS-DOS permit some of those other characters
to be used.  The rules are not guaranteed and it is not a good idea to
"know" that 64 characters suffice.

-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.

spierk@turing.cs.rpi.edu (Kevin Spier) (08/16/90)

I am looking for a book which describes a variety of compression
schemes and algorithms along with sample C implementations. Please
e-mail your suggestions to me, and I will post a summary if there
is interest.

Kevin L. Spier
spierk@turing.cs.rpi.edu


