djones@megatest.UUCP (Dave Jones) (12/18/87)
in article <1070@kulcs.UUCP>, luc@kulcs.UUCP (Luc Van Braekel) says: > > In article <1400@tulum.swatsun.UUCP>, hirai@swatsun (Eiji "A.G." Hirai) writes: >> In our recent ACM programming contest (regionals), one of the >> problems was to write a self-replicating program. That is, we had to >> write a program whose output was itself, the source code. No alterations >> of the original code during execution was allowed (I think). >> Does anyone have any code for this problem? We have one but >> it looks inelegant. I've also see bery bery short Prolog code for this. >> Help, we are looking for good codes to study! And yes, the contest is >> over (we ain't cheating). > > Here is a self-replicating Pascal program I wrote a few years ago. > The program looks dirty but it works ! > > program self (output); > var i,j: integer; > a: array[1..8] of packed array[1..59] of char; begin > a[1] := 'program self (output); '; > a[2] := 'var i,j: integer; '; > a[3] := ' a: array[1..8] of packed array[1..59] of char; begin '; > a[4] := 'for i := 1 to 3 do writeln(a[i]); '; > a[5] := 'for i := 1 to 8 do begin write('' a['',i:0,''] := '',chr(39));'; > a[6] := 'for j := 1 to 59 do begin write(a[i][j]);if a[i][j]=chr(39)'; > a[7] := 'then write(a[i][j]) end; writeln(chr(39),'';'') end; '; > a[8] := 'for i := 4 to 8 do writeln(a[i]) end. '; > for i := 1 to 3 do writeln(a[i]); > for i := 1 to 8 do begin write(' a[',i:0,'] := ',chr(39)); > for j := 1 to 59 do begin write(a[i][j]);if a[i][j]=chr(39) > then write(a[i][j]) end; writeln(chr(39),';') end; > for i := 4 to 8 do writeln(a[i]) end. > ... It's real easy to do in C. It's a bit more of a challenge to make it character-set independent. In the program you give, for example, the chr(39) is the ascii code for a single quote mark. The program will not work on an ebcdic machine. Here's a character-set independant one I did a couple of years ago. I wrote a program to write it. It is fully commented, and replicates its comments. A shell-archive, including the program which builds the self-replicating program follows, for all you unix owners. It is quite instructive. /* This program prints its source. */ main(argc, argv) char** argv; { char * dna = "/* This program prints its source. */\n\nmain(argc, argv)\n\ char** argv;\n{\n char * dna =\n\nZ;\n\n\n express(dna)\ ;\n exit(0);\n}\n\n\n/* Express the string, substituting a\ quotation of the string \n** for the character 'Z'. Break\ s the literal into lines of no\n** more than 60 chars.\n*/\n\ express(str)\n char* str;\n{\n char* ptr = str;\n char c\ h;\n int is_quoted = 0;\n\n while(ch = *ptr++)\n {\n\n\ if(ch == 'Z' && !is_quoted)\n\t{\n\t int count = 1;\n\ \t char* ptr = str;\n\t char ch;\n\t putchar('\"');\n\t \ while(ch = *ptr++)\n\t {\n\t switch(ch)\n\t {\ \n\t\tcase '\\n': printf(\"\\\\n\"); count +=2; break;\n\t\ \tcase '\\t': printf(\"\\\\t\"); count +=2; break;\n\t\tca\ se '\\\\': printf(\"\\\\\\\\\"); count +=2; break;\n\t\tcas\ e '\"': printf(\"\\\\\\\"\"); count +=2; break;\n\t\tdefau\ lt: putchar(ch); count +=1; break;\n\t }\n\t \ if(count >= 59)\n\t\t{ printf(\"\\\\\\n\");\n\t\t count =\ 0;\n\t\t}\n\t }\n\t putchar('\"');\n\t}\n\n else \ putchar(ch);\n is_quoted = ( ch == '\\'');\n }\n}\n\ "; express(dna); exit(0); } /* Express the string, substituting a quotation of the string ** for the character 'Z'. Breaks the literal into lines of no ** more than 60 chars. */ express(str) char* str; { char* ptr = str; char ch; int is_quoted = 0; while(ch = *ptr++) { if(ch == 'Z' && !is_quoted) { int count = 1; char* ptr = str; char ch; putchar('"'); while(ch = *ptr++) { switch(ch) { case '\n': printf("\\n"); count +=2; break; case '\t': printf("\\t"); count +=2; break; case '\\': printf("\\\\"); count +=2; break; case '"': printf("\\\""); count +=2; break; default: putchar(ch); count +=1; break; } if(count >= 59) { printf("\\\n"); count = 0; } } putchar('"'); } else putchar(ch); is_quoted = ( ch == '\''); } } # CUT HERE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ # # This is a shell-achive. # # UNPACKING INSTRUCTIONS # # 1. Remove everything above the "CUT HERE" line. # 2. Move the resulting text (this archive) to an empty directory. # 3. Change (cd) to that directory. # 4. Type "/bin/sh" followed by the archive name. Example: # # /bin/sh archive # # The archived files are: # # selfrep.c # Makefile # buffered_file.c # express.c # conprog.c # soup.Z # #****************************************************************** if /bin/test -f selfrep.c then echo File \"selfrep.c\" already exists. Not overwritten. else # # echo x - selfrep.c sed 's/^XX//' > selfrep.c <<'@//E*O*F selfrep.c//' XX/* This program prints its source. */ XX XXmain(argc, argv) XX char** argv; XX{ XX char * dna = XX XX"/* This program prints its source. */\n\nmain(argc, argv)\n\ XX char** argv;\n{\n char * dna =\n\nZ;\n\n\n express(dna)\ XX;\n exit(0);\n}\n\n\n/* Express the string, substituting a\ XX quotation of the string \n** for the character 'Z'. Break\ XXs the literal into lines of no\n** more than 60 chars.\n*/\n\ XXexpress(str)\n char* str;\n{\n char* ptr = str;\n char c\ XXh;\n int is_quoted = 0;\n\n while(ch = *ptr++)\n {\n\n\ XX if(ch == 'Z' && !is_quoted)\n\t{\n\t int count = 1;\n\ XX\t char* ptr = str;\n\t char ch;\n\t putchar('\"');\n\t \ XX while(ch = *ptr++)\n\t {\n\t switch(ch)\n\t {\ XX\n\t case '\\n': printf(\"\\\\n\"); count +=2; break\ XX;\n\t\tcase '\\t': printf(\"\\\\t\"); count +=2; break;\n\t\ XX case '\\\\': printf(\"\\\\\\\\\"); count +=2; break\ XX;\n\t case '\"': printf(\"\\\\\\\"\"); count +=2; br\ XXeak;\n\t default: putchar(ch); count += 1; break;\n\t\ XX }\n\t if(count >= 59)\n\t\t{ printf(\"\\\\\\n\")\ XX;\n\t\t count = 0;\n\t\t}\n\t }\n\t putchar('\"');\n\t\ XX}\n\n else putchar(ch);\n is_quoted = ( ch == '\\\ XX'');\n }\n}\n"; XX XX XX express(dna); XX exit(0); XX} XX XX XX/* Express the string, substituting a quotation of the string XX** for the character 'Z'. Breaks the literal into lines of no XX** more than 60 chars. XX*/ XXexpress(str) XX char* str; XX{ XX char* ptr = str; XX char ch; XX int is_quoted = 0; XX XX while(ch = *ptr++) XX { XX XX if(ch == 'Z' && !is_quoted) XX { XX int count = 1; XX char* ptr = str; XX char ch; XX putchar('"'); XX while(ch = *ptr++) XX { XX switch(ch) XX { XX case '\n': printf("\\n"); count +=2; break; XX case '\t': printf("\\t"); count +=2; break; XX case '\\': printf("\\\\"); count +=2; break; XX case '"': printf("\\\""); count +=2; break; XX default: putchar(ch); count += 1; break; XX } XX if(count >= 59) XX { printf("\\\n"); XX count = 0; XX } XX } XX putchar('"'); XX } XX XX else putchar(ch); XX is_quoted = ( ch == '\''); XX } XX} @//E*O*F selfrep.c// fi if /bin/test -f Makefile then echo File \"Makefile\" already exists. Not overwritten. else # # echo x - Makefile sed 's/^XX//' > Makefile <<'@//E*O*F Makefile//' XX XXproof: selfrep XX selfrep > temp.c; diff temp.c selfrep.c; rm -f temp.c XX XXselfrep: selfrep.c XX cc selfrep.c -o selfrep XX XXselfrep.c: soup.Z conprog express.c XX /lib/cpp -P -C soup.Z > tempfile; conprog tempfile > selfrep.c; \ XX rm -f tempfile XX XXconprog: conprog.o express.o buffered_file.o XX cc conprog.o express.o buffered_file.o -o conprog XX XXclean: XX -rm -f selfrep selfrep.c conprog *.o @//E*O*F Makefile// fi if /bin/test -f buffered_file.c then echo File \"buffered_file.c\" already exists. Not overwritten. else # # echo x - buffered_file.c sed 's/^XX//' > buffered_file.c <<'@//E*O*F buffered_file.c//' XX#include <sys/param.h> XX#include <sys/file.h> XX#include <sys/dir.h> XX#include <sys/stat.h> XX#include <errno.h> XX XX/************************************************************************* XX** This procedure buffers up a named file, and returns a pointer to the XX** buffer. If something goes wrong, it returns NULL, and errno will have XX** been set. XX*/ XX XXchar* XXbuffered_file(file_name) XX char* file_name; XX XX{ XX XX char* file_buffer; XX struct stat stat_buf; XX int fd = open(file_name, O_RDONLY, 0); XX XX if (fd < 0 || fstat( fd, &stat_buf) == -1) XX return 0; XX XX file_buffer = (char*)malloc(stat_buf.st_size + 1); XX XX if(file_buffer == 0) XX return 0; XX XX if( read( fd, file_buffer, stat_buf.st_size ) != stat_buf.st_size ) XX { XX int error = errno; XX close(fd); XX free(file_buffer); XX errno = error; XX return 0; XX } XX XX file_buffer[stat_buf.st_size] = '\0'; XX XX close(fd); XX XX return file_buffer; XX XX} /* end.. buffered_file */ XX/****************************************************************************/ XX XX XX @//E*O*F buffered_file.c// fi if /bin/test -f express.c then echo File \"express.c\" already exists. Not overwritten. else # # echo x - express.c sed 's/^XX//' > express.c <<'@//E*O*F express.c//' XX/* Express the string, substituting a quotation of the string XX** for the character 'Z'. Breaks the literal into lines of no XX** more than 60 chars. XX*/ XXexpress(str) XX char* str; XX{ XX char* ptr = str; XX char ch; XX int is_quoted = 0; XX XX while(ch = *ptr++) XX { XX XX if(ch == 'Z' && !is_quoted) XX { XX int count = 1; XX char* ptr = str; XX char ch; XX putchar('"'); XX while(ch = *ptr++) XX { XX switch(ch) XX { XX case '\n': printf("\\n"); count +=2; break; XX case '\t': printf("\\t"); count +=2; break; XX case '\\': printf("\\\\"); count +=2; break; XX case '"': printf("\\\""); count +=2; break; XX default: putchar(ch); count += 1; break; XX } XX if(count >= 59) XX { printf("\\\n"); XX count = 0; XX } XX } XX putchar('"'); XX } XX XX else putchar(ch); XX is_quoted = ( ch == '\''); XX } XX} @//E*O*F express.c// fi if /bin/test -f conprog.c then echo File \"conprog.c\" already exists. Not overwritten. else # # echo x - conprog.c sed 's/^XX//' > conprog.c <<'@//E*O*F conprog.c//' XX XXmain(argc, argv) XX char** argv; XX{ XX express(buffered_file(argv[1])); XX exit(0); XX} @//E*O*F conprog.c// fi if /bin/test -f soup.Z then echo File \"soup.Z\" already exists. Not overwritten. else # # echo x - soup.Z sed 's/^XX//' > soup.Z <<'@//E*O*F soup.Z//' XX/* This program prints its source. */ XX XXmain(argc, argv) XX char** argv; XX{ XX char * dna = XX XXZ; XX XX XX express(dna); XX exit(0); XX} XX XX#include "express.c" @//E*O*F soup.Z// fi echo finished # # END OF ARCHIVE
ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU (12/19/87)
I enjoyed the recent posting of programs written in high level languages whiich replicate their own source code. But upon reflection, it occurred to me that what I would really love to see would be a program which replicates the source code of other programs. How often has one obtained a copy of executable code with no source and no documentation ? It would be great to have a program which could throw together source code which could plausibly have given rise to the given executable, preferably in a choice of higher level languages. Has anyone ever encountered such a facility ? ADLER1@BRANDEIS.BITNET
kurt@hi.UUCP (Kurt Zeilenga) (12/20/87)
In article <10921@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU writes: >I enjoyed the recent posting of programs written in high level languages >whiich replicate their own source code. But upon reflection, it occurred >to me that what I would really love to see would be a program which >replicates the source code of other programs. How often has one obtained >a copy of executable code with no source and no documentation ? It would >be great to have a program which could throw together source code which >could plausibly have given rise to the given executable, preferably in >a choice of higher level languages. > >Has anyone ever encountered such a facility ? > >ADLER1@BRANDEIS.BITNET A lot of systems actually do provide such a program. It's called a disassembler. -- Kurt (zeilenga@hc.dspo.gov)
lampman@heurikon.UUCP (Ray Lampman) (12/23/87)
> In article <10921@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU writes: > It occurred to me that what I would really love to see would be a program > which replicates the source code of other programs. It would be great to have > a program which could throw together source code which could plausibly have > given rise to the given executable, preferably in a choice of higher level > languages. Has anyone ever encountered such a facility? > > ADLER1@BRANDEIS.BITNET It sounds like you're talking about translating object code into the source language of your choice. Some time ago, I investigated doing this for object code created by a UCSD pascal system. I concluded that it was possible. Many compiled languages retain enough information in their object code to identify stack and global variables, but the symbolic names are usually lost. Comments are almost always lost. Function or procedure names are lost too. A de-compiler (as I call it), would have to generate the symbolic information that the compiler removed. Some symbol names can be guessed, loop indices are pretty easy, just generate "index", or something similar. Globals are almost impossible, though. Now if you happen to have a symbol table, I can just imagine ... (generic-prompt) decc doit.o ... working ... external symbol table intact ... generating internal symbols ... condensing if-goto structures into while loops ... doit.c complete (generic-prompt) vi doit.c Has anyone else ever toyed with or know of efforts to write a decompiler? - Ray (lampman@heurikon.UUCP)
pardo@uw-june.UUCP (David Keppel) (12/23/87)
[Does anybody know about efforts to write a decompiler?] Somebody at UCBerkeley some (more than 3, less than 10) years ago wrote a C decompiler. He managed to get it working well enough to decompile rogue. Might have even been his master's thesis. If anybody is seriously interested, send me mail & I will try bombarding some of friends there and see if anybody can remember his name. There is another problem in decompiling C: macros. If you get fancy you can recognize getchar() and putchar(c), but user-defined macros get lost. ;-D on (Inline Contraction) Pardo