[sci.crypt] Data Compression Algorithms

smvorkoetter@watmum.UUCP (04/06/87)
In article <775@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>In article <383@newton.praxis.co.uk> mct%praxis.uucp@ukc.ac.uk writes:
> >Once apon a time, about five years ago, a UK company called DJAI, (the
> >authors of The Last One - a 4GL program generator) said that they had an
> >algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
> >They even had a demonstration system which claimed to show the algorithm in
> >operation.  
> >I expect the algorithm has been suppressed by the UK MoD :-)
>
>  This is a facinating story...  Anyone else ever hear about this?
>  Anyone know -anything- (or have any guesses) about the algorithm?

This is not possible.  Here is a proof.
(I am using ^ as a symbol for exponentiation)

	Consider all possible strings of 80 bytes.
	There are 80 * 8 = 640 bits in such a string.
	Thus there are 2^640 such strings.
	Similarily there are 2^639 strings of 639 bits.
	So, there are 2^640 + 2^639 + 2^638 + ... + 1 strings of
	less than or equal to 80 bytes.
	This adds up to 2^641 such strings.

	No consider the set of all text files.  Assume we limit ourselves
	to just the uppercase and lowercase alphabet, and the space 
	character.  So we have a character set of 53 characters.  There
	are a total of

	   1 + 53 + 53^2 + 53^3 + ... 53^MAX_FILE_SIZE

	text files.  Since the algorithm is reversible, each possible
	text file must have a unique representation when compressed.
	Therefore, there must be enough unique representations.  Thus,

	   2^641 <= 1 + 53 + 53^2 + ... 53^MAX_FILE_SIZE

	which is clearly not possible unless MAX_FILE_SIZE is quite
	small (much less than 641 characters).

	Q.E.D.

	Stefan Vorkoetter
	University of Waterloo