billc@mirror.TMC.COM (Bill Callahan) (04/21/88)
-There is a tantalizing article in the science section of the NYTimes -today. Does anyone know anything about it? The work was done by -Dr Micali of MIT along with "a loose group of mathematicians --- also -including Andrew Yao, Manuel Blum, Lenore Blum, Shafi Goldwasser, and -Michael Shub". The article in the Times presents the result as a way -to generate random numbers, but I wonder if that is the real -significance. A brief quote: - - "The new technique involves taking some starting number, - multiplying it by itself, dividing by the product of two - primes, taking the remainder and using it to repeat the - process over and over again. The mathematicians have proved - that there is a peculiar connection between the randomness of - the outcome and the difficulty of factoring large numbers. - Specifically, they have proved that if factoring large numbers - is truly hard --- as most believe likely --- the resulting - sequence will be indistinguishable from true randomness." - - "Conversely, if some structure or pattern could be found - in the sequence, that pattern would be, in effect, the - long-sought solution to the factoring problem." Of course, this means that we could use the a very simple encryption technique where we XOR each new random number with the next 'word' of our plaintext. If what the article talks about is true, as long as we pick big primes and don't allow our message to get longer than the product of them (to avoid cyclicity problems) the code should be hard to crack. Bill Callahan voice: 617-661-0777, ext. 149 Mirror Systems 2067 Massachusetts Avenue Cambridge, MA, 02140 billc@mirror.TMC.COM UUCP : {mit-eddie, pyramid, wjh12, cca, datacube}!mirror!billc
sham@arizona.edu (Shamim Pogner Mohamed) (04/22/88)
In article <12968@mirror.TMC.COM>, billc@mirror.TMC.COM (Bill Callahan) writes: > -There is a tantalizing article in the science section of the NYTimes > -today. ... [ text removed ] > - The mathematicians have proved - that there is a peculiar connection between the randomness of - the outcome and the difficulty of factoring large numbers. - Specifically, they have proved that if factoring large numbers - is truly hard --- as most believe likely --- the resulting > - sequence will be indistinguishable from true randomness." "Indistinguishable" by a polynomial-time probabilistic algorithm, I think. Most results have narrowed the restrictions further - if "squaring modulo a Blum-integer (where a Blum-integer is the product of two distinct primes both congruent to 3 mod 4) is a weak one-to-one one-way function" comes to mind (memory a little hazy on this). The generator has been referred to as a "Cryptographically Secure Pseudo-Random Bit Generator" (CSPRB generator) I'm not sure of the converse. Do the gurus out there have any opinions? References: Goldreich, Goldwasser & Micali - "How to construct random functions" (FOCS 1984) (A sort of measure of randomness) Blum & Micali - "How to generate cryptographically strong sequences of pseudo-random bits" (FOCS 1982) (CSPRB generators) Blum, Blum & Shub - "A simple secure pseudo-random number generator" (CRYPTO-82) Yao - "Theory and applications of trapdoor functions" (FOCS 82) (on the existence of CSPRB generators given a "weak one-way" function) -- Shamim Mohamed, Dept. of Computer Science, U of Arizona, Tucson AZ 85721 (602) 621-4891 {ihnp4,allegra,cmcl2...}!arizona!sham | sham@arizona.edu
alanf@bruce.OZ (Alan Grant Finlay) (03/06/90)
I have written an encription system for personal files on an IBM-PC compatible. The algorithm is designed to be fast enough for small files and totally secure if a key is chosen properly. I mean it should be secure against known plain text attack but not chosen plain text attack where large numbers of attempts are permitted. Included in this article are a source program for the system in C, a user manual, and a formal description of the algorithm. This is my first non trivial C program so I welcome any constructive criticism. I would also like to know if the formal description is implemented correctly in C. I am an amateur at cyphers so I am not certain that my claims about security are correct. I would appreciate any comments by those more knowledgeable about the security of this system. An archive with all relevant files including executables will be posted soon in comp.binaries.ibm.pc, included is a challenge file which is an encription of the source file using a key known only to me. If anyone would like to attempt a chosen plaintext attack I am willing to use this same key to encrypt a limited number of files. I am not sure if I need to supply a secret file encrypted using this key to make the challenge more meaningful. The program is currently called CRYPT 2.0, I apologise for using a name that has probably already been used but it's a bit late to change now. ---------------------------------------------------------------------------- User manual for Crypt --------------------- The source code "crypt.c" can be used to produce two executable files which are assumed to be called "encrypt" and "decrypt" by the program itself. The source is in Turbo C (Trademark) and can be compiled with the tiny memory model. Since use is made of system dependent functions for obtaining the input file size for encrypt, the program is not portable as is. To use encrypt and decrypt simply type the command name followed by the source and destination file names. You will be prompted for a password and status information is produced. The password is echoed since for the length of password intended it would be too easy to mess it up. The password should be a long sequence of characters such as a sentence with some strange words or characters in it. The maximum length is the same as the maximum block array width (100 as supplied). The password is scrolled off the screen after the line is terminated. The algorithm used was inspired by the Rubik's cube. The core algorithm works as follows: First a fixed block size is determined which is the product of two numbers "wide" and "high". The data is processed in a two dimensional array with these dimensions. If the last block cannot be entirely filled it is padded with junk characters. The password is used to determine a key on a column per character basis. The key value for each column (modulo the height) is the amount this column is shifted down. The part which emerges from the bottom is inserted into the top (i.e. a cyclic shift). The rows are similarly shifted but using the values 1,2,3,...,etc. This two stage process is repeated a fixed number of times (10+10) times in the original system). The result is a permutation of the array determined by the password alone. There is one major problem with this simple scheme. The use of a pure permutation is compromised by the need to pad the last block. Especially if the last block is mostly padding. It is very difficult to choose padding that cannot be distinguished from the intentional contents without revealing too much. The upshot of all this is that the scheme is modified slightly. For the first one half of the passes, after the usual column and row shifts, every second row is replaced by the exclusive-or of itself and the previous row character by character. This means the shuffling process is now more than just a pure permutation but alters values according to the history of the shuffle. As a precaution the rest of the passes are pure permutations. A crude analogy would be shuffling cards with the printing still wet. Fortunately the process is reversible when done digitally. In addition to this precaution the block size is usually chosen so that the last block is at most a quarter padding (the rare exceptions are reported when they occur). Finally the padding is chosen by randomly selecting bytes from the old contents of the array (usually bytes from this or a previous pass). If the input file is smaller than 200 bytes it is rejected. This may seem to be over cautious but remember that if a password is discovered by cracking an easy case, every file encoded with that same password is available. It is intended that users will not use many distinct passwords since they should never be written down and need to be long. The security of the system is due more to the choice of password and particularly its length. Passwords which are too short are rejected. Naturally the password in not recorded permanently anywhere by the encryption program. The easiest way to choose a good password is to use a whole sentence (spaces and punctuation are allowed) with some strange characters in it. It is relatively easy to try out all short sentences with words from a standard dictionary. Brute force cracking of an encrypted file with n blocks of block size b, a password of p units chosen from a set of q values and with s shuffles per block requires O(s * b * (q^p)) operations. Obviously increasing q and/or p pays great dividends. For a pure permutation it would also be possible to try all b! rearrangements of a block. This is ruled out by the minimum block size being 100. It is clear that increasing s is not an effective way of improving security. The time taken by the algorithm is O(s * b * n). Since (b*n) cannot be changed s should be chosen as small as is considered safe. The chosen value is as small as seems secure but is not chosen for any real theoretically determined reason. Note that s is a compile time constant. When choosing block dimensions adequate width is more important than adequate height. If the width is shorter than the password then the tail of the password is not used. The minimum block dimensions are approximately 24 * 8 so that the algorithm has enough height to minimise the chance of short cycles. Every shuffle is identical and could contain a small number of short cycles (i.e. a small set of positions whose contents always remain within the set). Experiments seem to indicate this is not likely with reasonably sized blocks (i.e. >200 bytes). No theoretical analysis of this issue has been attempted. Note however that the row shifts are chosen as 1..n for a column or row of n+1 positions hence there is never a zero row shift. So far the algorithm's strength depends critically upon the variety of the input. As described so for if you input a file with just space characters, that is mostly what you will get back. This applys on a block by block basis. To avoid this problem the algorithm is given one final twist, a simple substitution is performed before the first shuffle. This substitution should suffice to eliminate most simple patterns. Some final operational advice. Confidential documents should be created using a ram disk or a floppy which will be destroyed. Note that deleting and even overwriting magnetic media does not prevent the data's recovery. Watch out for editors which keep temporary copies of files which they are working upon in places of their own choice. A deleted temporary file is often easily recovered and you may not even be aware of its existence. ------------------------------------------------------------------------- /***********************************************************************/ /* (c) Copyright 1989, 1990, Alan Finlay. Beta version 2.0 . */ /* This program is intended for Public Domain, and may not be sold or */ /* marketed in any form without the permission and written consent of */ /* the author. I retain all copyrights to this program, in either the */ /* original or modified forms, and no violation, deletion, or change of */ /* the copyright notice is allowed. Every effort has been made to ensure */ /* this product provides a secure encryption system. Mistakes can be */ /* made however both in the theoretical basis and implementation of such */ /* a system. See the associated documentation for a discussion of known */ /* limitations. The source code has been provided so you can inspect it */ /* and use it at your own risk. I will accept no responsibility for any */ /* loss, damage, or compromised data caused directly or indirectly by */ /* this program. It is released on an "as is" basis with no warranty */ /* as to its being fit or suitable for encrypting any form of data. */ /***********************************************************************/ #include <stdio.h> #include <stdlib.h> #include <math.h> #include <dos.h> #include <time.h> #include <sys\stat.h> #define ENCRYPT 1 /* Choose encrypt (1) or decrypt (0) */ #define DEBUG 0 /* Show page after each shuffle if non zero */ #define TRUE -1 #define FALSE 0 #define LIMIT 100 #define SECURE 10 typedef unsigned char line[LIMIT]; char copyright[40] = "(c) copyright 1989,1990, Alan Finlay"; unpat(page,wide,high) /* Simple substitution to avoid patterns */ line page[LIMIT]; /* [width,height] */ int wide,high; { int i,j,k; k = 0; for (i=0;i<wide;i++) for (j=0;j<high;j++) { k = (k+7)%128; page[i][j] = page[i][j] ^ k; } } #if ENCRYPT shuffle(page1,code,wide,high) line page1[LIMIT]; /* [width,height] */ line code; int wide,high; { int i,j,k,key,shift; line *mix1,*mix2; line *oldline,page2[LIMIT]; /* [height,width] */ for (k=0;k<(SECURE*2);k++) { /* See mixup below (2 security phases) */ #if DEBUG show(page1,wide,high); #endif /* Shift columns */ for (i=0;i<wide;i++) { oldline = page1[i]; key = (int) code[i]; for (j=0;j<high;j++) page2[j][i] =(*oldline)[(j+key)%high]; } /* Mixup */ if (k<SECURE) { for (j=1;j<high;j+=2) { mix1 = page2[j-1]; mix2 = page2[j]; for (i=0;i<wide;i++) (*mix2)[i] = (*mix2)[i]^(*mix1)[i]; } } /* Shift rows */ for (j=0;j<high;j++) { oldline = page2[j]; shift = (j%(wide-1))+1; for (i=0;i<wide;i++) page1[i][j] = (*oldline)[(i+shift)%wide]; } } show(page1,wide,high); /* For text, user can check for blunders */ } #else unshuffle(page1,code,wide,high) line page1[LIMIT]; /* [width,height] */ line code; int wide,high; { int i,j,k,key,shift; line *mix1,*mix2; line *newline,page2[LIMIT]; /* [height,width] */ for (k=0;k<(SECURE*2);k++) { /* Check this times 2 as in shuffle */ #if DEBUG show(page1,wide,high); #endif /* Shift rows back */ for (j=0;j<high;j++) { newline = page2[j]; shift = wide-(j%(wide-1))-1; for (i=0;i<wide;i++) (*newline)[i] = page1[(i+shift)%wide][j]; } /* Reverse mixup */ if (k>=SECURE) { /* Note matching code in shuffle */ for (j=1;j<high;j+=2) { mix1 = page2[j-1]; mix2 = page2[j]; for (i=0;i<wide;i++) (*mix2)[i] = (*mix2)[i]^(*mix1)[i]; } } /* Shift columns back */ for (i=0;i<wide;i++) { newline = page1[i]; key = (int) code[i]; for (j=0;j<high;j++) (*newline)[(j+key)%high] = page2[j][i]; } } } #endif show(page,wide,high) line page[LIMIT]; int wide,high; { int i,j; puts("\n"); for (j=0;j<high;j++) { putc('\n',stdout); for (i=0;i<wide;i++) { if (page[i][j]<30) putc('*',stdout); else putc(page[i][j],stdout); } } } main (argc,argv) int argc; char *argv[]; { FILE *infile,*outfile; int wide,high,i,j,k; /* Block width and height, loop counters */ int blkn = 1; /* Block counter */ int clen; /* Password code length */ int ch = 0; int invers; /* Version of input file for decrypt */ int vers = 2; /* Version of this program */ line page[LIMIT],code; #if ENCRYPT int chrcnt; /* Character counter */ long fsize; /* Input file size */ int blocksize,nblocks; struct time t; /* For system time */ struct stat st; /* For input file stats */ /* Randomise the rand() function */ gettime(&t); srand(t.ti_min*400+t.ti_sec*100+t.ti_hund); /* random seed <30000 */ /* Check the input arguments */ if (argc!=3) {puts("\nUsage is: encrypt src dst\n"); exit(1);} #else int blkcnt; /* Check the input arguments */ if (argc!=3) {puts("\nUsage is: decrypt src dst\n"); exit(1);} #endif if ((infile = fopen(argv[1],"rb")) == NULL) { printf("\n%s",argv[1]); perror(" "); exit(1);} if ((outfile = fopen(argv[2],"wb")) == NULL) { printf("\n%s",argv[2]); perror(" "); exit(1);} #if ENCRYPT /* Get input file size */ if (stat(argv[1],&st)!=0) {perror(" "); exit(1);} fsize = st.st_size; printf("The input file size is %ld\n",fsize); /* Choose block size accordingly */ if (fsize<200) {puts("Input file is too small"); exit(1);} if (fsize<(LIMIT*LIMIT)) blocksize = (int) fsize; else { nblocks = (int) (fsize/(LIMIT*LIMIT)+1); blocksize = (int) (fsize/nblocks+1); } wide = ((int) sqrt((double)blocksize))+10; if (wide>LIMIT) wide = LIMIT; high = blocksize/wide+1; if (high>LIMIT) high = LIMIT; while (1) { blocksize = wide*high; if (fsize<(long) blocksize) break; else { /* Multiple blocks, check for last block too small */ if (((fsize-1)%blocksize)>(blocksize*3/4)) break; /* (fsize-1) is used above so perfect fit is accepted! */ high--; wide--; /* Try a smaller block */ } if (wide<50) { puts("It was necessary to use a short last block"); puts("To correct this change the input file size"); puts("Otherwise there is a minor security compromise"); break; } } printf("The width and height are (%d,%d)\n",wide,high); printf("The last block is %ld bytes\n",((fsize-1)%blocksize)+1); fprintf(outfile,"%d,%d,%d,",vers,wide,high); #else fscanf(infile,"%d,%d,%d,",&invers,&wide,&high); if (invers!=vers) {puts("Input file is version incompatible");exit(1);} #endif /* Get password */ while(1) { puts("\nPlease enter your password"); gets(code); clen = strlen(code); if (clen>LIMIT) {puts("Password too long - abort"); exit(1);} if (clen>9) break; puts("Insecure password, try a longer one."); puts("For security do not use a name or word in any dictionary."); puts("For example use something like \"Dazed and Konfuzed\""); } for (i=0;i<25;i++) puts(" "); /* Clear the screen */ if (clen>wide) puts("Warning: tail of password ignored"); /* Extend password to possible limit */ for (i=clen;i<LIMIT;i++) code[i] = code[i%clen] ^ '\152'; do { /* crypt a page */ #if ENCRYPT chrcnt = 0; #else if (fscanf(infile,"%d,",&blkcnt)==EOF) goto NOBLOCK; #endif for (j=0;j<high;j++) { for (i=0;i<wide;i++) { if ((ch = getc(infile)) != EOF) { page[i][j] = ch; #if ENCRYPT chrcnt++;} else if (i==0 && j==0) goto NOBLOCK; /* EOF at start of block! */ /* Pad the last block with existing junk */ else page[i][j] = page[rand()%wide][rand()%high]; #else ;} else {puts("Error: unexpected end of file"); goto NOBLOCK;} #endif } } #if ENCRYPT fprintf(outfile,"%d,",chrcnt); unpat(page,wide,high); shuffle(page,code,wide,high); for (j=0;j<high;j++) for (i=0;i<wide;i++) putc(page[i][j],outfile); #else unshuffle(page,code,wide,high); unpat(page,wide,high); for (j=0;j<high;j++) for (i=0;i<wide;i++) if ((j*wide+i)<blkcnt) putc(page[i][j],outfile); #endif printf("\nFinished block number %d\n",blkn++); } while (ch != EOF); NOBLOCK: /* Jump here to avoid writing an empty block */ fclose(infile); fclose(outfile); } ---------------------------------------------------------------------------- Formal description of the algorithm ----------------------------------- A block size is determined according to the size of the input file. For small files there will be only one block. Each block has two dimensions - wide and high. The last block usually needs some padding which is chosen from whatever happens to be in the array already. The junk is selected randomly though the random number generator has only about 30,000 possible seeds. This is probably much ado about nothing, the padding could probably be just null characters without any loss in security. The password or key is a string supplied by the user, this is extended to the maximum length (100) by repetition however the repeated strings are modified by exclusive-or '\152' to help extract the maximum information from the supplied string (since later only the remainder mod high will be utilised). For each block, let us call it page[x,y] where 0<=x<wide and 0<=y<high. ----------------------------------------------------------------------- 1) Perform a simple substitution: page[x,y] := page[x,y] XOR ((x*high+y)*7 mod 128) 2) Do 10 special shuffles: (each line is for all x and y concurrently) page[x,y] := page[x,(y+key[x]) mod high] if y is odd then page[x,y] := page[x,y-1] XOR page[x,y] page[x,y] := page[(x+(y mod (wide-1))+1) mod wide,y] 3) Do 10 normal shuffles: (each line is for all x and y concurrently) page[x,y] := page[x,(y+key[x]) mod high] page[x,y] := page[(x+(y mod (wide-1))+1) mod wide,y] Each step in the process is easily reversible. ------------------------------------------------------------------------- The encrypted file format is: 1) three 16 bit integers: program version number (= 2), wide, high. 3) for each block: one 16 bit integer = number of chars in block (non padding). the block of bytes: for j:=0..high-1 for i:=0..wide-1 page[i,j] ----------------------------<end>-------------------------------------
hal@gateway.mitre.org (Hal Feinstein) (03/07/90)
This is kind of a silly question. I recently sat through a college level course on telecommunications security that had a lot of elementary code breaking stuff in it. Of all the subjects to get hung up on, someone in the class asked if in counting 2-grams (bigrams) you shoud count every pair of letters or to tally each pair by sliding a pointer down one character at a time and taking the character pointed to and the one to its immediate right as a pair. Well, the class divided. The heck with traffic analysis and clandestine intercept, this was really important! Since then, blackboards have been filled with math and the college halls filled with people shouting themselves hoarse over this, the great bigram affair. Really! Now, I put it to you in networld: which is the "correct" way to count bigrams?
cme@zircon.sw.stratus.com (Carl Ellison) (03/08/90)
In article <100295@linus.UUCP> hal@gateway.mitre.org (Hal Feinstein) writes: >Of all the subjects to get hung up on, someone in the class >asked if in counting 2-grams (bigrams) you shoud count every pair >of letters or to tally each pair by sliding a pointer down one >character at a time and taking the character pointed to and the one >to its immediate right as a pair. [...] > >Now, I put it to you in networld: which is the "correct" way to >count bigrams? > The answer is: BOTH If the resulting statistics are not identical (by standard statistical tests), then you've discovered that your source text is naot stationary -- a very interesting result which should be investigated. --Carl
michael@xanadu.com (Michael McClary) (03/10/90)
In article <1877@bruce.OZ> alanf@bruce.OZ (Alan Grant Finlay) writes: >I have written an encription system for personal files on an IBM-PC compatible. >[] Included in this article are a source program for the system in C, ... >The source code "crypt.c" You should be aware that "crypt" is used on unix systems to refer to a particular utility (a one-rotor Enigma variant) and a particular subroutine (the DES-variant used for password encryption). Naming your source file "crypt.c" invites the same sort of confusion as naming a home-brew editor "vi", "ed", or "emacs", and may tend to bother unix hax. You might want to find another name.
michael@xanadu.com (Michael McClary) (03/10/90)
In article <100295@linus.UUCP> hal@gateway.mitre.org (Hal Feinstein) asks
the "correct" way to count bigrams:
- Each odd character, followed by its successor.
- Each character, followed by its successor.
If what you're looking for is bigram frequency, to check for caeser
cypers and the like, the second is the way to go. Consider that the
first approach will miss every instance of "th" that starts on an
even-numbered position.
In cyphers where each character is treated separately and similarly,
you can expect the two methods to produce similar results, but the
first to require twice as much cyphertext to give you an equivalent
amount of information.
If the cypher is chunking the data (with an even number of characters
per chunk), or otherwise treating odd and even cyphertext characters
differently, the first method might give you more useful information.
hal@gateway.mitre.org (Hal Feinstein) (03/13/90)
I'd like to thank all those who responded with the "correct" way to count bigrams. Interestingly, the standard reference work on this subject, Kullback's Statistical Methods in Cryptanalysis is silent on the subject. Here's a summary of what people said on the subject: The debate in our telecommunications security class was over how to do the 2-gram correlation technique for solving monoalphabet ciphers suggested in Konheim p. 79 (Cryptography, A Primer). At that point we had not considered more complex ciphers but got stuck (for a while) trying to figure this out. At least one reply said it was obvious. Well, some of us are still learning. I can't say it obvious right now. To paraphrase what most people said. (1)If the symbols in the underlying alphabet are single letters (simple transposition for example) then sliding one character at a time is OK. (2) If the symbols in the underlying alphabet are blocks of characters, for example two characters per symbol, then slide by the block length. This makes sense since forming bigrams by sliding one character a time across a bigram substitution cipher fractures the bigrams, probabily wrecking the accuracy of the tally. The problem is what to do when confronted with a unknown cryptogram? How should you do the tally? One reply stated that if a null had been inserted as the first character and you counted on even boundaries, you would get junk. So best to take both types of counts: One starting on the odd boundary and the other on an even boundary. Another said: just discover how your bigram table was counted and make sure you count in the same way. Hmmm... One respondent advised using Markov analysis to discern hidden processes as a way to discover the underlying cipher. OK, seems reasonable. Another reply suggested that the ciphertext is really generated by a non-stationary statistical source which explains why decimated bigram averages would differ.
cooper@cadsys.enet.dec.com (03/14/90)
This is in regards to whether to count discrete or overlapping digrams. I'm surprised that no one seems to have pointed out the "answer" to this, since it is part of standard statistics. It depends on the nature of the tests which you are performing on them. Overlapping digrams give you twice as much data, but the frequencies obtained are not independent. For example, it is not valid to use a chi-square test on overlapping digrams to determine if the digrams deviate significantly from chance distribution, since the chi-square test assumes that the frequencies are independent. On the other hand it is perfectly valid to estimate the probability that an X will be followed by a Y on the basis of overlapping digrams. In practice, except for a few hard-and-fast yes/no automatic procedures, in cryptographic work the increase in information from using overlapping digrams (and trigrams, etc.) will generally overshadow the slight loss in pure validity, since the information collected is likely to be leavened with a large dose of intuitive, trial-and-error adjustment by the cryptoanalyist in later phases of the work. Topher Cooper