[comp.lang.c] Here is CDECL

rsalz@bbn.com (Rich Salz) (10/06/87)

[  Okay, okay.  This is a "once-in-a-lifetime" opportunity -- a posting
   of something from the archives by the moderator.  Info on the
   comp.sources.unix (nee mod.sources) archives is posted semi-regularly
   in that newsgroup.  Look for an unexpired old posting in your Usenet
   spool directory; I'll be posting a new update with the start of the
   new volume in a day or two.  I also gotta admit, at 15K for the
   whole shell archive, this gem is well worth its weightin gold!
   --r$  ]

Subject: v06i048:  English<->C translator for C declarations (cdecl)
Newsgroups: mod.sources
Approved: rs@mirror.UUCP

Submitted by: Chris Torek <mimsy.umd.edu!gymble!chris>
Mod.sources: Volume 6, Issue 48
Archive-name: cdecl

Cdecl has been on the net for a while, but never in mod.sources, or the
archives.  It was written by Graham Ross, once at tektronix!tekmdp!grahamr.
USG sites should add "-DINDEX=strchr" to the CFLAGS entry in the Makefile.

--------------------cut--------------------
: Run this shell script with "sh" not "csh"
PATH=/bin:/usr/bin:/usr/ucb:/etc:$PATH
export PATH
echo Extracting Makefile
sed 's/^X//' <<'//go.sysin dd *' >Makefile
# Makefile for cdecl
DESTDIR=
CFLAGS=	-O
LIBS=
MAKE=	make
WHERE=	/usr/local/bin

all:	cdecl

cdecl:	cdgram.o cdsupp.o
	${CC} ${CFLAGS} -o $@ cdgram.o cdsupp.o

clean:
	rm -f core *.o cdecl cdlex.c cdgram.c

install: cdecl
	install -s cdecl ${DESTDIR}${WHERE}/cdecl

depend: cdgram.c cdlex.c cdsupp.c
	for i in cdgram.c cdlex.c cdsupp.c; do\
	    cc -M ${INCPATH} $$i | sed -e 's, \./, ,' | \
	    awk '{ if ($$1 != prev) { if (rec != "") print rec; \
		rec = $$0; prev = $$1; } \
		else { if (length(rec $$2) > 78) { print rec; rec = $$0; } \
		else rec = rec " " $$2 } } \
		END { print rec }'; done >makedep
	echo '/^# DO NOT DELETE THIS LINE/+2,$$d' >eddep
	echo '$$r makedep' >> eddep
	echo 'w' >>eddep
	cp Makefile Makefile.bak
	ed - Makefile < eddep
	rm eddep makedep
	echo '# DEPENDENCIES MUST END AT END OF FILE' >> Makefile
	echo '# IF YOU PUT STUFF HERE IT WILL GO AWAY' >> Makefile
	echo '# see make depend above' >> Makefile

# DO NOT DELETE THIS LINE -- make depend uses it

cdgram.o: cdgram.c /usr/include/stdio.h cdlex.c /usr/include/stdio.h
cdgram.o: /usr/include/ctype.h
cdlex.o: cdlex.c /usr/include/stdio.h /usr/include/ctype.h
cdsupp.o: cdsupp.c /usr/include/stdio.h
# DEPENDENCIES MUST END AT END OF FILE
# IF YOU PUT STUFF HERE IT WILL GO AWAY
# see make depend above
//go.sysin dd *
if [ `wc -c < Makefile` != 1290 ]; then
	made=FALSE
	echo error transmitting Makefile --
	echo length should be 1290, not `wc -c < Makefile`
else
	made=TRUE
fi
if [ $made = TRUE ]; then
	chmod 644 Makefile
	echo -n '	'; ls -ld Makefile
fi
echo Extracting cdecl.1
sed 's/^X//' <<'//go.sysin dd *' >cdecl.1
X.TH CDECL 1
X.SH NAME
cdecl \- Compose C declarations
X.SH SYNOPSIS
X.B cdecl
X.SH DESCRIPTION
X.I Cdecl
is a program for encoding and decoding C type-declarations.
It reads standard input for statements in the language described below.
The results are written on standard output.
X.PP
X.I Cdecl's
scope is intentionally small.
It doesn't help you figure out storage classes or initializations.
X.SH "COMMAND LANGUAGE"
There are four statements in the language.
The "declare" statement composes a C type-declaration
from a verbose description.
The "cast" statement composes a C type-cast
as might appear in an expression.
The "explain" statement decodes a C type-declaration, producing a
verbose description.
The "help" statement describes the others.
X.PP
The following grammar describes the language.
In the grammar, words in "<>" are non-terminals,
bare lower-case words are terminals that stand for themselves.
Bare upper-case words are other lexical tokens:
NOTHING means the empty string;
NAME means a C identifier;
NUMBER means a string of decimal digits; and
NL means the new-line character.
X.LP
X.nf
<program>    ::= NOTHING
               | <program> <stat> NL
<stat>       ::= NOTHING
               | declare NAME as <decl>
               | cast NAME into <decl>
               | explain <cdecl>
               | help
<decl>       ::= array of <decl>
               | array NUMBER of <decl>
               | function returning <decl>
               | function ( NAME ) returning <decl>
               | pointer to <decl>
               | <type>
<cdecl>      ::= <cdecl1>
               | * <cdecl>
<cdecl1>     ::= <cdecl1> ( )
               | <cdecl1> [ ]
               | <cdecl1> [ NUMBER ]
               | ( <cdecl> )
               | NAME
<type>       ::= <typename> | <modlist>
               | <modlist> <typename>
               | struct NAME | union NAME | enum NAME
<typename>   ::= int | char | double | float
<modlist>    ::= <modifier> | <modlist> <modifier>
<modifier>   ::= short | long | unsigned
X.fi
X.SH EXAMPLES
To declare an array of pointers to functions like malloc(3), do
X.Ex
declare fptab as array of pointer to function returning pointer to char
X.Ee
The result of this command is
X.Ex
char *(*fptab[])()
X.Ee
When you see this declaration in someone else's code, you
can make sense out of it by doing
X.Ex
explain char *(*fptab[])()
X.Ee
The proper declaration for signal(2) cannot be described in
X.IR cdecl 's
language (it can't be described in C either).
An adequate declaration for most purposes is given by
X.Ex
declare signal as function returning pointer to function returning int
X.Ee
The function declaration that results has two sets of empty parentheses.
The author of such a function might wonder where the parameters go.
X.Ex
declare signal as function (args) returning pointer to function returning int
X.Ee
provides the solution:
X.Ex
int (*signal(args))()
X.Ee
X.SH DIAGNOSTICS
The declare statement tries to point out constructions
that are not supported in C.
Also, certain non-portable constructs are flagged.
X.PP
Syntax errors cause the parser to play dead until a newline is read.
X.SH "SEE ALSO"
Section 8.4 of the C Reference Manual.
X.SH BUGS
The pseudo-English syntax is excessively verbose.
//go.sysin dd *
if [ `wc -c < cdecl.1` != 3231 ]; then
	made=FALSE
	echo error transmitting cdecl.1 --
	echo length should be 3231, not `wc -c < cdecl.1`
else
	made=TRUE
fi
if [ $made = TRUE ]; then
	chmod 444 cdecl.1
	echo -n '	'; ls -ld cdecl.1
fi
echo Extracting cdgram.y
sed 's/^X//' <<'//go.sysin dd *' >cdgram.y
%{
#include <stdio.h>

#define	MB_SHORT	0001
#define	MB_LONG		0002
#define	MB_UNSIGNED	0004
#define MB_INT		0010
#define MB_CHAR		0020
#define MB_FLOAT	0040
#define MB_DOUBLE	0100

int modbits = 0;
int arbdims = 1;
char *savedtype;
char *savedname;
char *ds(), *cat();
char *index(), *malloc();
char prev;
%}

%union {
	char *dynstr;
	struct {
		char *left;
		char *right;
	} halves;
}

%token DECLARE CAST INTO AS HELP EXPLAIN
%token FUNCTION RETURNING POINTER TO ARRAY OF
%token <dynstr> NAME NUMBER STRUCTUNION UNSIGNED LONG SHORT
%token <dynstr> INT CHAR FLOAT DOUBLE
%type <dynstr> mod_list tname type modifier
%type <dynstr> cdecl cdecl1 cdims adims c_type
%type <halves> adecl

%start prog

%%
prog	: /* empty */
		| prog stat
		;

stat	: HELP '\n'
			{
			help();
			}
		| DECLARE NAME AS adecl '\n'
			{
			printf("%s %s%s%s",savedtype,$4.left,$2,$4.right);
#ifdef MKPROGRAM
			if (prev == 'f')
				printf("\n{\n}\n");
			else
				printf(";\n");
#else
			printf("\n");
#endif
			free($4.left);
			free($4.right);
			free($2);
			}
		| CAST NAME INTO adecl '\n'
			{
			if (prev == 'f')
				unsupp("Cast into function");
			else if (prev=='A' || prev=='a')
				unsupp("Cast into array");
			printf("(%s",savedtype);
			if (strlen($4.left)+strlen($4.right))
				printf(" %s%s",$4.left,$4.right);
			printf(")%s\n",$2);
			free($4.left);
			free($4.right);
			free($2);
			}
		| EXPLAIN type cdecl '\n'
			{ printf("declare %s as %s%s\n",savedname,$3,$2); }
		| '\n'
		| error '\n'
			{
			yyerrok;
			}
		;

cdecl	: cdecl1
		| '*' cdecl
			{ $$ = cat($2,ds("pointer to "),NULL); }
		;

cdecl1	: cdecl1 '(' ')'
			{ $$ = cat($1,ds("function returning "),NULL); }
		| cdecl1 cdims
			{ $$ = cat($1,ds("array "),$2); }
		| '(' cdecl ')'
			{ $$ = $2; }
		| NAME
			{
				savename($1);
				$$ = ds("");
			}
		;

cdims	: '[' ']'
			{ $$ = ds("of "); }
		| '[' NUMBER ']'
			{ $$ = cat($2,ds(" of "),NULL); }
		;

adecl	: FUNCTION RETURNING adecl
			{
			if (prev == 'f')
				unsupp("Function returning function");
			else if (prev=='A' || prev=='a')
				unsupp("Function returning array");
			$$.left = $3.left;
			$$.right = cat(ds("()"),$3.right,NULL);
			prev = 'f';
			}
		| FUNCTION '(' NAME ')' RETURNING adecl
			{
			if (prev == 'f')
				unsupp("Function returning function");
			else if (prev=='A' || prev=='a')
				unsupp("Function returning array");
			$$.left = $6.left;
			$$.right = cat(ds("("),$3,ds(")"));
			$$.right = cat($$.right,$6.right,NULL);
			prev = 'f';
			}
		| ARRAY adims OF adecl
			{
			if (prev == 'f')
				unsupp("Array of function");
			else if (prev == 'a')
				unsupp("Inner array of unspecified size");
			if (arbdims)
				prev = 'a';
			else
				prev = 'A';
			$$.left = $4.left;
			$$.right = cat($2,$4.right,NULL);
			}
		| POINTER TO adecl
			{
			if (prev == 'a')
				unsupp("Pointer to array of unspecified dimension");
			if (prev=='a' || prev=='A' || prev=='f') {
				$$.left = cat($3.left,ds("(*"),NULL);
				$$.right = cat(ds(")"),$3.right,NULL);
			} else {
				$$.left = cat($3.left,ds("*"),NULL);
				$$.right = $3.right;
			}
			prev = 'p';
			}
		| type
			{
			savetype($1);
			$$.left = ds("");
			$$.right = ds("");
			prev = 't';
			}
		;

adims	: /* empty */
			{
			arbdims = 1;
			$$ = ds("[]");
			}
		| NUMBER
			{
			arbdims = 0;
			$$ = cat(ds("["),$1,ds("]"));
			}
		;

type	: tinit c_type
			{ mbcheck(); $$ = $2; }
		;

tinit	: /* empty */
			{ modbits = 0; }
		;

c_type	: mod_list
			{ $$ = $1; }
		| tname
			{ $$ = $1; }
		| mod_list tname
			{ $$ = cat($1,ds(" "),$2); }
		| STRUCTUNION NAME
			{ $$ = cat($1,ds(" "),$2); }
		;

tname	: INT
			{ modbits |= MB_INT; $$ = $1; }
		| CHAR
			{ modbits |= MB_CHAR; $$ = $1; }
		| FLOAT
			{ modbits |= MB_FLOAT; $$ = $1; }
		| DOUBLE
			{ modbits |= MB_DOUBLE; $$ = $1; }
		;

mod_list: modifier
			{ $$ = $1; }
		| mod_list modifier
			{ $$ = cat($1,ds(" "),$2); }
		;

modifier: UNSIGNED
			{ modbits |= MB_UNSIGNED; $$ = $1; }
		| LONG
			{ modbits |= MB_LONG; $$ = $1; }
		| SHORT
			{ modbits |= MB_SHORT; $$ = $1; }
		;
%%
#include "cdlex.c"

#define LORS	(MB_LONG|MB_SHORT)
#define UORL	(MB_UNSIGNED|MB_LONG)
#define UORS	(MB_UNSIGNED|MB_SHORT)
#define CORL	(MB_CHAR|MB_LONG)
#define CORS	(MB_CHAR|MB_SHORT)
#define CORU	(MB_CHAR|MB_UNSIGNED)

mbcheck()
{
	if ((modbits&LORS) == LORS)
		unsupp("conflicting 'short' and 'long'");
	if ((modbits&UORL) == UORL)
		unport("unsigned with long");
	if ((modbits&UORS) == UORS)
		unport("unsigned with short");
	if ((modbits&CORL) == CORL)
		unsupp("long char");
	if ((modbits&CORS) == CORS)
		unsupp("short char");
	if ((modbits&CORU) == CORU)
		unport("unsigned char");
}

savetype(s)
char *s;
{
	savedtype = s;
}

savename(s)
char *s;
{
	savedname = s;
}
//go.sysin dd *
if [ `wc -c < cdgram.y` != 4726 ]; then
	made=FALSE
	echo error transmitting cdgram.y --
	echo length should be 4726, not `wc -c < cdgram.y`
else
	made=TRUE
fi
if [ $made = TRUE ]; then
	chmod 644 cdgram.y
	echo -n '	'; ls -ld cdgram.y
fi
echo Extracting cdlex.l
sed 's/^X//' <<'//go.sysin dd *' >cdlex.l
%{
#include <ctype.h>

char *visible();
%}
N	[0-9]
A	[A-Z_a-z]
AN	[0-9A-Z_a-z]
%%
array		return ARRAY;
as			return AS;
cast		return CAST;
declare		return DECLARE;
explain		return EXPLAIN;
function	return FUNCTION;
help		return HELP;
into		return INTO;
of			return OF;
pointer		return POINTER;
returning	return RETURNING;
to			return TO;

char		{ yylval.dynstr = ds(yytext); return CHAR; }
double		{ yylval.dynstr = ds(yytext); return DOUBLE; }
enum		{ yylval.dynstr = ds(yytext); return STRUCTUNION; }
float		{ yylval.dynstr = ds(yytext); return FLOAT; }
int			{ yylval.dynstr = ds(yytext); return INT; }
long		{ yylval.dynstr = ds(yytext); return LONG; }
short		{ yylval.dynstr = ds(yytext); return SHORT; }
struct		{ yylval.dynstr = ds(yytext); return STRUCTUNION; }
union		{ yylval.dynstr = ds(yytext); return STRUCTUNION; }
unsigned	{ yylval.dynstr = ds(yytext); return UNSIGNED; }

{A}{AN}*	{ yylval.dynstr = ds(yytext); return NAME; }
{N}+		{ yylval.dynstr = ds(yytext); return NUMBER; }

[\t ]		;
[*[\]()\n]		return *yytext;
X.			{
				printf("bad character '%s'\n",visible(*yytext));
				return *yytext;
			}
%%
char *
visible(c)
{
	static char buf[5];

	c &= 0377;
	if (isprint(c)) {
		buf[0] = c;
		buf[1] = '\0';
	} else
		sprintf(buf,"\\%02o",c);
	return buf;
}
//go.sysin dd *
if [ `wc -c < cdlex.l` != 1273 ]; then
	made=FALSE
	echo error transmitting cdlex.l --
	echo length should be 1273, not `wc -c < cdlex.l`
else
	made=TRUE
fi
if [ $made = TRUE ]; then
	chmod 644 cdlex.l
	echo -n '	'; ls -ld cdlex.l
fi
echo Extracting cdsupp.c
sed 's/^X//' <<'//go.sysin dd *' >cdsupp.c
#include <stdio.h>
char *malloc();

main()
{
	yyparse();
}

unsupp(s)
char *s;
{
	printf("Warning: Unsupported in C -- %s\n",s);
}

unport(s)
char *s;
{
	printf("Warning: Non-portable construction -- %s\n",s);
}

yyerror(s)
char *s;
{
	printf("%s\n",s);
}

yywrap()
{
	return 1;
}

X/*
 * Support for dynamic strings:
 * cat creates a string from three input strings.
 * The input strings are freed by cat (so they better have been malloced).
 * ds makes a malloced string from one that's not.
 */

char *
cat(s1,s2,s3)
char *s1,*s2,*s3;
{
	register char *newstr;
	register unsigned len = 0;

	if (s1 != NULL) len = strlen(s1) + 1;
	if (s2 != NULL) len += strlen(s2);
	if (s3 != NULL) len += strlen(s3);
	newstr = malloc(len);
	if (s1 != NULL) {
		strcpy(newstr,s1);
		free(s1);
	}
	if (s2 != NULL) {
		strcat(newstr,s2);
		free(s2);
	}
	if (s3 != NULL) {
		strcat(newstr,s3);
		free(s3);
	}
	return newstr;
}

char *
ds(s)
char *s;
{
	register char *p;

	p = malloc((unsigned)(strlen(s)+1));
	strcpy(p,s);
	return p;
}

static char *helptext[] = {
	"[] means optional; {} means 1 or more; <> means defined elsewhere\n",
	"command:\n",
	"  declare <name> as <english>\n",
	"  cast <name> into <english>\n",
	"  explain <gibberish>\n",
	"english:\n",
	"  function [( <name> )] returning <english>\n",
	"  array [<number>] of <english>\n",
	"  pointer to <english>\n",
	"  <type>\n",
	"type:\n",
	"  [{<modifier>}] <C-type>\n",
	"  {<modifier>} [<C-type>]\n",
	"  <sue> <name>\n",
	"name is a C identifier\n",
	"gibberish is a C declaration\n",
	"C-type is int, char, double or float\n",
	"modifier is short, long or unsigned\n",
	"sue is struct, union or enum\n",
	NULL
};

help()
{
	register char **p;

	for (p=helptext; *p!=NULL; p++)
			printf("\t%s",*p);
}
//go.sysin dd *
if [ `wc -c < cdsupp.c` != 1759 ]; then
	made=FALSE
	echo error transmitting cdsupp.c --
	echo length should be 1759, not `wc -c < cdsupp.c`
else
	made=TRUE
fi
if [ $made = TRUE ]; then
	chmod 644 cdsupp.c
	echo -n '	'; ls -ld cdsupp.c
fi
-- 
For comp.sources.unix stuff, mail to sources@uunet.uu.net.
And if C is the assembly language of the 80's, why is ACP written in MIX?