[comp.lang.c] self-replicating program

djones@megatest.UUCP (Dave Jones) (12/18/87)

in article <1070@kulcs.UUCP>, luc@kulcs.UUCP (Luc Van Braekel) says:
>
> In article <1400@tulum.swatsun.UUCP>, hirai@swatsun (Eiji "A.G." Hirai)
 writes:
>>     In our recent ACM programming contest (regionals), one of the
>> problems was to write a self-replicating program.  That is, we had to
>> write a program whose output was itself, the source code.  No alterations
>> of the original code during execution was allowed (I think).
>>     Does anyone have any code for this problem?  We have one but
>> it looks inelegant.  I've also see bery bery short Prolog code for this.
>> Help, we are looking for good codes to study!  And yes, the contest is
>> over (we ain't cheating).
>
> Here is a self-replicating Pascal program I wrote a few years ago.
> The program looks dirty but it works !
>
> program self (output);
> var i,j: integer;
>     a: array[1..8] of packed array[1..59] of char; begin
>   a[1] := 'program self (output);                                     ';
>   a[2] := 'var i,j: integer;                                          ';
>   a[3] := '    a: array[1..8] of packed array[1..59] of char; begin   ';
>   a[4] := 'for i := 1 to 3 do writeln(a[i]);                          ';
>   a[5] := 'for i := 1 to 8 do begin write(''  a['',i:0,''] := '',chr(39));';
>   a[6] := 'for j := 1 to 59 do begin write(a[i][j]);if a[i][j]=chr(39)';
>   a[7] := 'then write(a[i][j]) end; writeln(chr(39),'';'') end;         ';
>   a[8] := 'for i := 4 to 8 do writeln(a[i]) end.                      ';
> for i := 1 to 3 do writeln(a[i]);
> for i := 1 to 8 do begin write('  a[',i:0,'] := ',chr(39));
> for j := 1 to 59 do begin write(a[i][j]);if a[i][j]=chr(39)
> then write(a[i][j]) end; writeln(chr(39),';') end;
> for i := 4 to 8 do writeln(a[i]) end.
>
...


It's real easy to do in C.  It's a bit more of a challenge to make it
character-set independent.  In the program you give, for example, the
chr(39) is the ascii code for a single quote mark.  The program will
not work on an ebcdic machine.

Here's a character-set independant one I did a couple of years ago.
I wrote a program to write it.

It is fully commented, and replicates its comments.
A shell-archive, including the program which builds the self-replicating
program follows, for all you unix owners.  It is quite instructive.

/* This program prints its source. */

main(argc, argv)
  char** argv;
{
  char * dna =

"/* This program prints its source. */\n\nmain(argc, argv)\n\
  char** argv;\n{\n  char * dna =\n\nZ;\n\n\n  express(dna)\
;\n  exit(0);\n}\n\n\n/* Express the string, substituting a\
 quotation of the string \n** for the character 'Z'.  Break\
s the literal into lines of no\n** more than 60 chars.\n*/\n\
express(str)\n  char* str;\n{\n  char* ptr = str;\n  char c\
h;\n  int is_quoted = 0;\n\n  while(ch = *ptr++)\n    {\n\n\
      if(ch == 'Z' && !is_quoted)\n\t{\n\t  int count = 1;\n\
\t  char* ptr = str;\n\t  char ch;\n\t  putchar('\"');\n\t \
 while(ch = *ptr++)\n\t    {\n\t      switch(ch)\n\t      {\
\n\t\tcase '\\n': printf(\"\\\\n\");  count +=2; break;\n\t\
\tcase '\\t': printf(\"\\\\t\");  count +=2; break;\n\t\tca\
se '\\\\': printf(\"\\\\\\\\\"); count +=2; break;\n\t\tcas\
e '\"':  printf(\"\\\\\\\"\"); count +=2; break;\n\t\tdefau\
lt:   putchar(ch);    count +=1; break;\n\t      }\n\t     \
 if(count >= 59)\n\t\t{ printf(\"\\\\\\n\");\n\t\t  count =\
 0;\n\t\t}\n\t    }\n\t  putchar('\"');\n\t}\n\n      else \
putchar(ch);\n      is_quoted = ( ch == '\\'');\n    }\n}\n\
";


  express(dna);
  exit(0);
}


/* Express the string, substituting a quotation of the string
** for the character 'Z'.  Breaks the literal into lines of no
** more than 60 chars.
*/
express(str)
  char* str;
{
  char* ptr = str;
  char ch;
  int is_quoted = 0;

  while(ch = *ptr++)
    {

      if(ch == 'Z' && !is_quoted)
    {
      int count = 1;
      char* ptr = str;
      char ch;
      putchar('"');
      while(ch = *ptr++)
        {
          switch(ch)
          {
        case '\n': printf("\\n");  count +=2; break;
        case '\t': printf("\\t");  count +=2; break;
        case '\\': printf("\\\\"); count +=2; break;
        case '"':  printf("\\\""); count +=2; break;
        default:   putchar(ch);    count +=1; break;
          }
          if(count >= 59)
        { printf("\\\n");
          count = 0;
        }
        }
      putchar('"');
    }

      else putchar(ch);
      is_quoted = ( ch == '\'');
    }
}



# CUT HERE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# This is a shell-achive.
#
#                 UNPACKING INSTRUCTIONS
#
# 1. Remove everything above the "CUT HERE" line.
# 2. Move the resulting text (this archive) to an empty directory.
# 3. Change (cd) to that directory.
# 4. Type "/bin/sh" followed by the archive name.  Example:
#
#        /bin/sh archive
#
#    The archived files are:
#
#     selfrep.c
#     Makefile
#     buffered_file.c
#     express.c
#     conprog.c
#     soup.Z
#
#******************************************************************
if /bin/test -f selfrep.c
then
    echo File \"selfrep.c\" already exists. Not overwritten.
else
#
#
echo x - selfrep.c
sed 's/^XX//' > selfrep.c <<'@//E*O*F selfrep.c//'
XX/* This program prints its source. */
XX
XXmain(argc, argv)
XX  char** argv;
XX{
XX  char * dna =
XX
XX"/* This program prints its source. */\n\nmain(argc, argv)\n\
XX  char** argv;\n{\n  char * dna =\n\nZ;\n\n\n  express(dna)\
XX;\n  exit(0);\n}\n\n\n/* Express the string, substituting a\
XX quotation of the string \n** for the character 'Z'.  Break\
XXs the literal into lines of no\n** more than 60 chars.\n*/\n\
XXexpress(str)\n  char* str;\n{\n  char* ptr = str;\n  char c\
XXh;\n  int is_quoted = 0;\n\n  while(ch = *ptr++)\n    {\n\n\
XX      if(ch == 'Z' && !is_quoted)\n\t{\n\t  int count = 1;\n\
XX\t  char* ptr = str;\n\t  char ch;\n\t  putchar('\"');\n\t \
XX while(ch = *ptr++)\n\t    {\n\t      switch(ch)\n\t      {\
XX\n\t        case '\\n': printf(\"\\\\n\"); count +=2; break\
XX;\n\t\tcase '\\t': printf(\"\\\\t\"); count +=2; break;\n\t\
XX        case '\\\\': printf(\"\\\\\\\\\"); count +=2; break\
XX;\n\t        case '\"': printf(\"\\\\\\\"\"); count +=2; br\
XXeak;\n\t        default: putchar(ch); count += 1; break;\n\t\
XX      }\n\t      if(count >= 59)\n\t\t{ printf(\"\\\\\\n\")\
XX;\n\t\t  count = 0;\n\t\t}\n\t    }\n\t  putchar('\"');\n\t\
XX}\n\n      else putchar(ch);\n      is_quoted = ( ch == '\\\
XX'');\n    }\n}\n";
XX
XX
XX  express(dna);
XX  exit(0);
XX}
XX
XX
XX/* Express the string, substituting a quotation of the string
XX** for the character 'Z'.  Breaks the literal into lines of no
XX** more than 60 chars.
XX*/
XXexpress(str)
XX  char* str;
XX{
XX  char* ptr = str;
XX  char ch;
XX  int is_quoted = 0;
XX
XX  while(ch = *ptr++)
XX    {
XX
XX      if(ch == 'Z' && !is_quoted)
XX    {
XX      int count = 1;
XX      char* ptr = str;
XX      char ch;
XX      putchar('"');
XX      while(ch = *ptr++)
XX        {
XX          switch(ch)
XX          {
XX            case '\n': printf("\\n"); count +=2; break;
XX        case '\t': printf("\\t"); count +=2; break;
XX            case '\\': printf("\\\\"); count +=2; break;
XX            case '"': printf("\\\""); count +=2; break;
XX            default: putchar(ch); count += 1; break;
XX          }
XX          if(count >= 59)
XX        { printf("\\\n");
XX          count = 0;
XX        }
XX        }
XX      putchar('"');
XX    }
XX
XX      else putchar(ch);
XX      is_quoted = ( ch == '\'');
XX    }
XX}
@//E*O*F selfrep.c//
fi
if /bin/test -f Makefile
then
    echo File \"Makefile\" already exists. Not overwritten.
else
#
#
echo x - Makefile
sed 's/^XX//' > Makefile <<'@//E*O*F Makefile//'
XX
XXproof:  selfrep
XX    selfrep > temp.c; diff temp.c selfrep.c; rm -f temp.c
XX
XXselfrep: selfrep.c
XX    cc selfrep.c -o selfrep
XX
XXselfrep.c: soup.Z conprog express.c
XX    /lib/cpp -P -C soup.Z > tempfile; conprog tempfile > selfrep.c; \
XX    rm -f tempfile
XX
XXconprog: conprog.o express.o buffered_file.o
XX    cc  conprog.o express.o buffered_file.o -o conprog
XX
XXclean:
XX    -rm -f selfrep selfrep.c conprog *.o
@//E*O*F Makefile//
fi
if /bin/test -f buffered_file.c
then
    echo File \"buffered_file.c\" already exists. Not overwritten.
else
#
#
echo x - buffered_file.c
sed 's/^XX//' > buffered_file.c <<'@//E*O*F buffered_file.c//'
XX#include <sys/param.h>
XX#include <sys/file.h>
XX#include <sys/dir.h>
XX#include <sys/stat.h>
XX#include <errno.h>
XX
XX/*************************************************************************
XX** This procedure buffers up a named file, and returns a pointer to the
XX** buffer.  If something goes wrong, it returns NULL, and errno will have
XX** been set.
XX*/
XX
XXchar*
XXbuffered_file(file_name)
XX     char* file_name;
XX
XX{
XX
XX  char* file_buffer;
XX  struct stat stat_buf;
XX  int fd = open(file_name, O_RDONLY, 0);
XX
XX  if (fd < 0 || fstat( fd, &stat_buf) == -1)
XX    return 0;
XX
XX  file_buffer = (char*)malloc(stat_buf.st_size + 1);
XX
XX  if(file_buffer == 0)
XX    return 0;
XX
XX  if( read( fd, file_buffer, stat_buf.st_size ) != stat_buf.st_size )
XX    {
XX      int error = errno;
XX      close(fd);
XX      free(file_buffer);
XX      errno = error;
XX      return 0;
XX    }
XX
XX  file_buffer[stat_buf.st_size] = '\0';
XX
XX  close(fd);
XX
XX  return file_buffer;
XX
XX} /* end.. buffered_file */
XX/****************************************************************************/
XX
XX
XX
@//E*O*F buffered_file.c//
fi
if /bin/test -f express.c
then
    echo File \"express.c\" already exists. Not overwritten.
else
#
#
echo x - express.c
sed 's/^XX//' > express.c <<'@//E*O*F express.c//'
XX/* Express the string, substituting a quotation of the string
XX** for the character 'Z'.  Breaks the literal into lines of no
XX** more than 60 chars.
XX*/
XXexpress(str)
XX  char* str;
XX{
XX  char* ptr = str;
XX  char ch;
XX  int is_quoted = 0;
XX
XX  while(ch = *ptr++)
XX    {
XX
XX      if(ch == 'Z' && !is_quoted)
XX    {
XX      int count = 1;
XX      char* ptr = str;
XX      char ch;
XX      putchar('"');
XX      while(ch = *ptr++)
XX        {
XX          switch(ch)
XX          {
XX            case '\n': printf("\\n"); count +=2; break;
XX        case '\t': printf("\\t"); count +=2; break;
XX            case '\\': printf("\\\\"); count +=2; break;
XX            case '"': printf("\\\""); count +=2; break;
XX            default: putchar(ch); count += 1; break;
XX          }
XX          if(count >= 59)
XX        { printf("\\\n");
XX          count = 0;
XX        }
XX        }
XX      putchar('"');
XX    }
XX
XX      else putchar(ch);
XX      is_quoted = ( ch == '\'');
XX    }
XX}
@//E*O*F express.c//
fi
if /bin/test -f conprog.c
then
    echo File \"conprog.c\" already exists. Not overwritten.
else
#
#
echo x - conprog.c
sed 's/^XX//' > conprog.c <<'@//E*O*F conprog.c//'
XX
XXmain(argc, argv)
XX  char** argv;
XX{
XX  express(buffered_file(argv[1]));
XX  exit(0);
XX}
@//E*O*F conprog.c//
fi
if /bin/test -f soup.Z
then
    echo File \"soup.Z\" already exists. Not overwritten.
else
#
#
echo x - soup.Z
sed 's/^XX//' > soup.Z <<'@//E*O*F soup.Z//'
XX/* This program prints its source. */
XX
XXmain(argc, argv)
XX  char** argv;
XX{
XX  char * dna =
XX
XXZ;
XX
XX
XX  express(dna);
XX  exit(0);
XX}
XX
XX#include "express.c"
@//E*O*F soup.Z//
fi
echo finished
#
#  END OF ARCHIVE

ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU (12/19/87)

I enjoyed the recent posting of programs written in high level languages
whiich replicate their own source code. But upon reflection, it occurred
to me that what I would really love to see would be a program which
replicates the source code of other programs. How often has one obtained
a copy of executable code with no source and no documentation ? It would
be great to have a program which could throw together source code which
could plausibly have given rise to the given executable, preferably in
a choice of higher level languages.

Has anyone ever encountered such a facility ?

ADLER1@BRANDEIS.BITNET

kurt@hi.UUCP (Kurt Zeilenga) (12/20/87)

In article <10921@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU writes:
>I enjoyed the recent posting of programs written in high level languages
>whiich replicate their own source code. But upon reflection, it occurred
>to me that what I would really love to see would be a program which
>replicates the source code of other programs. How often has one obtained
>a copy of executable code with no source and no documentation ? It would
>be great to have a program which could throw together source code which
>could plausibly have given rise to the given executable, preferably in
>a choice of higher level languages.
>
>Has anyone ever encountered such a facility ?
>
>ADLER1@BRANDEIS.BITNET

A lot of systems actually do provide such a program.  It's called a
disassembler.

-- 
	Kurt (zeilenga@hc.dspo.gov)

lampman@heurikon.UUCP (Ray Lampman) (12/23/87)

> In article <10921@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@CUNYVM.CUNY.EDU writes:
> It occurred to me that what I would really love to see would be a program
> which replicates the source code of other programs. It would be great to have
> a program which could throw together source code which could plausibly have
> given rise to the given executable, preferably in a choice of higher level
> languages. Has anyone ever encountered such a facility?
>
> ADLER1@BRANDEIS.BITNET

It sounds like you're talking about translating object code into the source
language of your choice. Some time ago, I investigated doing this for object
code created by a UCSD pascal system. I concluded that it was possible.

Many compiled languages retain enough information in their object code to
identify stack and global variables, but the symbolic names are usually lost.
Comments are almost always lost. Function or procedure names are lost too.
A de-compiler (as I call it), would have to generate the symbolic information
that the compiler removed. Some symbol names can be guessed, loop indices
are pretty easy, just generate "index", or something similar. Globals are
almost impossible, though. Now if you happen to have a symbol table, I can
just imagine ...

(generic-prompt) decc doit.o
... working
... external symbol table intact
... generating internal symbols
... condensing if-goto structures into while loops
... doit.c complete
(generic-prompt) vi doit.c

Has anyone else ever toyed with or know of efforts to write a decompiler?

                                        - Ray (lampman@heurikon.UUCP)

pardo@uw-june.UUCP (David Keppel) (12/23/87)

[Does anybody know about efforts to write a decompiler?]

Somebody at UCBerkeley some (more than 3, less than 10) years ago wrote
a C decompiler.  He managed to get it working well enough to decompile
rogue.  Might have even been his master's thesis.  If anybody is seriously
interested, send me mail & I will try bombarding some of friends there and
see if anybody can remember his name.

There is another problem in decompiling C: macros.  If you get fancy you
can recognize getchar() and putchar(c), but user-defined macros get lost.

	;-D on   (Inline Contraction)  Pardo