[net.lang.c] self-printing C programs

lew (11/03/82)

Steve Wagar's self-printing program:

main(){char*a="main(){char*a=%c%s%c;printf(a,34,a,34);}";printf(a,34,a,34);}

is far from working. Here is my best effort at rescuing it:

main(){char*a="main(){char*a= %s%c%s %s %s%c%s%c ;a[14]=0;a[33]=0;printf(a+15,a,34,a,a+15,a+34,34,a+34,10);}";a[14]=0;a[33]=0;printf(a+15,a,34,a,a+15,a+34,34,a+34,10);}

(That's a 168 character line, I hope it goes through the net OK.)
Can anybody do better? I think the self-printing problem is very deep.
I have seen in print the statement that a self-printing program can
be written in any general programming language, but I don't think
it's true. What if C didn't allow decimal specification of characters?
I think you could easily have a language in which you could write
a program to produce any given output, but not a self-printing program.
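
To make the role of the decimal constant concrete, here is a minimal sketch of what the %c%s%c substitution does, assuming an ASCII machine where 34 is the double quote. The helper names expand_template and reproduces are invented for illustration and appear in neither posted program. The sketch only shows what printf's expansion yields for a given template; it does not address formatting or compiler differences.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASCII code for the double-quote character (an assumption about the
 * character set; this is the "decimal specification" in question). */
#define ASCII_DQUOTE 34

/*
 * Expand a template holding one %c%s%c group: the two %c slots receive
 * the quote character and the %s slot receives the template itself.
 * Returns the expanded length, or -1 if `out` is too small.
 */
static int expand_template(const char *tmpl, char *out, size_t n)
{
    int len;

    assert(tmpl != NULL && out != NULL);
    len = snprintf(out, n, tmpl, ASCII_DQUOTE, tmpl, ASCII_DQUOTE);
    if (len < 0 || (size_t)len >= n)
        return -1;
    return len;
}

/* Returns 1 if expanding `tmpl` yields exactly `source`, else 0. */
static int reproduces(const char *tmpl, const char *source)
{
    char buf[512];

    if (expand_template(tmpl, buf, sizeof buf) < 0)
        return 0;
    return strcmp(buf, source) == 0;
}
```

Calling reproduces() with the string held in a and the full program text tells you whether the output would match the source character for character. Without a numeric way to name the quote character, the template would have to contain an escaped quote, which in turn would need escaping in the output.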

I think the challenge to write a self-printing program in a canonical
C style is non-trivial. I propose a contest for the shortest self-printing C
program which is unaltered by cb.

Lew Mammel, Jr. ihuxr!lew

mcbryan (11/04/82)

As I understand it, C has no I/O.   Thus writing a self-reproducing C
program is meaningless.   Given a particular library of C I/O routines
one could of course do it (e.g. with STDIO).   However, someone else's
I/O library might allow shorter self-replications.   For example, the
OMCBIO library has a routine barf() which prints the source file
containing a reference to it.   Thus the program:

#include <omcbio.h>

main()
{
	barf();
}

is self-reproducing.  


NOTE:
The above program actually compiles and runs correctly on my system (4.1 bsd).
It must be run in the directory where it is compiled.

The file  omcbio.h  contains the single line:

#define  barf()   execlp("/bin/cat","cat",__FILE__,0)

This has the nice feature of self-annihilation (overlay) immediately after the
self-replication.
An obvious extension using fork() provides a tree of self-replicating
programs.
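
One guess at what the fork() extension might look like, as a sketch for a POSIX system rather than the actual omcbio code: each copy is printed by a child that overlays itself with cat, while the parent survives to start the next one. The names cat_in_child and replicate are hypothetical, and this version forks children one after another instead of building a real tree.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork one child that overlays itself with cat(1) on `path`, then wait
 * for it. Returns the child's exit status, or -1 on fork/wait failure.
 */
static int cat_in_child(const char *path)
{
    pid_t pid;
    int status;

    assert(path != NULL);
    fflush(stdout);                 /* avoid duplicating buffered output */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The list must end with a null pointer of type char *. */
        execlp("cat", "cat", path, (char *)0);
        _exit(127);                 /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Print `copies` copies of `path`; returns how many children succeeded. */
static int replicate(const char *path, int copies)
{
    int i, ok = 0;

    for (i = 0; i < copies; i++)
        if (cat_in_child(path) == 0)
            ok++;
    return ok;
}
```

Calling replicate(__FILE__, 2) from main() would print the source twice, subject to the same restriction that it be run in the directory where it was compiled.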

swatt (11/04/82)

Regarding ihuxr!lew's challenge to write a self-printing program
in canonical C style:


:::::::::::::::
introspect.c:
:::::::::::::::
	char *lines[] ={
		"char *lines[] ={",
		"	0",
		"};",
		"",
		"#define BSLASH	'\\'",
		"#define NL	'\\n'",
		"#define DQUOTE	'\"'",
		"#define HT	'\\t'",
		"#define SQUOTE	'''",
		"#define EOS	'\\0'",
		"main()",
		"{",
		"	register char **lp;",
		"",
		"	puts (lines[0]);",
		"	for (lp = lines; *lp; )",
		"		putq (*lp++);",
		"	puts (lines[1]);",
		"	puts (lines[2]);",
		"	for (lp = &lines[3]; *lp; )",
		"		puts (*lp++);",
		"}",
		"",
		"putq (s)",
		"char	*s;",
		"{",
		"	putchar (HT);",
		"	putchar (DQUOTE);",
		"	for (; *s != EOS; s++) {",
		"		if (*s == BSLASH || *s == DQUOTE)",
		"			putchar (BSLASH);",
		"		putchar (*s);",
		"	}",
		"	putchar (DQUOTE);",
		"	putchar (',');",
		"	putchar (NL);",
		"}",
		"",
		"puts (s)",
		"char	*s;",
		"{",
		"	for (; *s != EOS; s++) {",
		"		if ((*s == BSLASH || *s == SQUOTE)",
		"		    && (s[-1] == SQUOTE && s[1] == SQUOTE))",
		"			putchar (BSLASH);",
		"		putchar (*s);",
		"	}",
		"	putchar (NL);",
		"}",
		0
	};

	#define BSLASH	'\\'
	#define NL	'\n'
	#define DQUOTE	'"'
	#define HT	'\t'
	#define SQUOTE	'\''
	#define EOS	'\0'
	main()
	{
		register char **lp;

		puts (lines[0]);
		for (lp = lines; *lp; )
			putq (*lp++);
		puts (lines[1]);
		puts (lines[2]);
		for (lp = &lines[3]; *lp; )
			puts (*lp++);
	}

	putq (s)
	char	*s;
	{
		putchar (HT);
		putchar (DQUOTE);
		for (; *s != EOS; s++) {
			if (*s == BSLASH || *s == DQUOTE)
				putchar (BSLASH);
			putchar (*s);
		}
		putchar (DQUOTE);
		putchar (',');
		putchar (NL);
	}

	puts (s)
	char	*s;
	{
		for (; *s != EOS; s++) {
			if ((*s == BSLASH || *s == SQUOTE)
			    && (s[-1] == SQUOTE && s[1] == SQUOTE))
				putchar (BSLASH);
			putchar (*s);
		}
		putchar (NL);
	}
:::::::::::::::

This one uses nothing but "putchar" to do output -- no "printf"
nonsense.  It also uses approved C definitions for character
constants.  The only parts of the "lines" table that require any
special editing are in the first 9 lines; all the rest can be produced
by running "sed" commands on the program proper.

When compiled, "a.out | diff - introspect.c" produces silence, as does
"cb <introspect.c | diff - introspect.c".

It uses no special knowledge of ASCII, so it should work on any machine
which supports C, and has an equivalent of "putchar".  It is 98 lines,
using a reasonable C source formatting convention (please no flames on
that!).

The interesting thing about this approach is that for every line of code you
have, there will be one line of data, plus 6 (three for the table
pro/epi-logue, and three more for the string versions of the same).
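
As a quick check of that count, here is the arithmetic as a tiny sketch. The name total_lines is made up for illustration, and the figure of 46 code lines comes from counting the listing with the blank line after the table included.

```c
#include <assert.h>

/* Lines that frame the table: the opening declaration, the
 * terminating 0 entry, and the closing brace. */
#define TABLE_FRAME_LINES 3

/*
 * Total program length when every code line has one matching data
 * line, plus the frame lines and their own string copies.
 */
static int total_lines(int code_lines)
{
    assert(code_lines >= 0);
    return 2 * code_lines + 2 * TABLE_FRAME_LINES;
}
```

With 46 code lines this gives 2 * 46 + 6 = 98, which agrees with the stated length of introspect.c.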

Given typical C formatting conventions, you have to be able to produce
the special characters '\n' and '\t'.  Given C string conventions, you
also have to be able to produce '"', '\\', and '\''.  Absent simply
using integer constants, I can't think of a way to do this without a
routine to reconstruct a string as the compiler would have seen it.
The minimum size gambit therefore boils down to how compact a function
you can write to reconstruct C strings for these 6 characters.
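
For illustration, a minimal sketch of such a reconstruction routine, written against a buffer rather than putchar. It is a generalization and not the putq from introspect.c, which only needs to escape backslash and double quote because its tabs sit literally in the table. The names put and quote_c_string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, always leaving room for the final NUL. */
static int put(char *out, size_t n, size_t *k, char c)
{
    if (*k + 1 >= n)
        return -1;
    out[(*k)++] = c;
    return 0;
}

/*
 * Write `s` into `out` the way it would be spelled as a C string
 * literal, including the surrounding double quotes. Returns the number
 * of characters written (excluding the NUL), or -1 if `out` is too small.
 */
static int quote_c_string(const char *s, char *out, size_t n)
{
    size_t k = 0;

    assert(s != NULL && out != NULL);
    if (put(out, n, &k, '"') < 0)
        return -1;
    for (; *s != '\0'; s++) {
        char esc = 0;

        switch (*s) {
        case '\\': esc = '\\'; break;
        case '"':  esc = '"';  break;
        case '\n': esc = 'n';  break;
        case '\t': esc = 't';  break;
        }
        if (esc != 0) {
            if (put(out, n, &k, '\\') < 0 || put(out, n, &k, esc) < 0)
                return -1;
        } else if (put(out, n, &k, *s) < 0) {
            return -1;
        }
    }
    if (put(out, n, &k, '"') < 0)
        return -1;
    out[k] = '\0';
    return (int)k;
}
```

A single quote needs no escape inside a string literal, so it passes through unchanged; it only needs one inside a character constant, which is the case the posted puts handles.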

	Not ashamed to admit this took me 4 hours,

		- Alan S. Watt

stevenm@sri-unix (11/05/82)

My program (previously posted) is 99 chars long. I also have
a program which uses no numeric constants, and is well-formatted,
but I can't find it at the moment. Unlike another author, I
don't think this problem is very deep, just an exercise in
knowing your language's subtleties.

S. McGeady
Tektronix, Inc.