[comp.std.c] restrictive linkers

scs@adam.pika.mit.edu (Steve Summit) (07/16/89)

Two minor points:

In article <547@cybaswan.UUCP> iiit-sh@cybaswan.UUCP (Steve Hosgood) writes:
>In article <2619@yunexus.UUCP>, davecb@yunexus.UUCP (David Collier-Brown) writes:
>> 0) 6 characters is a lower limit of significance. You're allowed to use more.
>Yeah, but though you may be able to use more, a 'strictly conforming' program
>can't, otherwise it won't port to sites with 6-character linkers.

Presumably the posters and most readers of this group understand
it, but there are apparently many who interpret statements like
the above to mean that external identifiers may not have more
than six characters at all.  I'm always seeing subroutine
packages that have incredibly strained, abbreviated identifier
names, to keep them six characters long.  You're allowed to use
extra characters for readability; the only problem is that if you
say

	int identifier1, identifier2;

you may get

	"identi: multiply defined."

It should be perfectly legal to say

	int identifier, anotheridentifier;
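
As a minimal sketch of the rule, assuming a linker that keeps only the
first six characters and does not fold case (some do; that is not
modeled here), a collision test might look like the following.  The
helper name names_collide is hypothetical, not anything from the Standard.

```c
#include <string.h>

/* Hypothetical helper: returns nonzero if two external names would be
 * indistinguishable to a linker that keeps only the first `sig`
 * characters.  strncmp() stops at a terminating NUL, so names shorter
 * than `sig` are compared in full. */
int names_collide(const char *a, const char *b, size_t sig)
{
    return strncmp(a, b, sig) == 0;
}
```

With sig set to 6, names_collide("identifier1", "identifier2", 6) is
nonzero, while names_collide("identifier", "anotheridentifier", 6) is
zero, since "identi" and "anothe" differ.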

>If Ada becomes important due to the
>military backing, can't 'C' ride its wake (so to speak) and insist on a
>linker with sensible namewidth?

I believe X3J11 is on record as strongly encouraging better
linkers, and indicating that the six-character significance limit
is likely to disappear in future revisions.  (Happily, this would
be a backwards-compatible change, except for programs with
typos.)  No one (not even the maintainers of the systems that
have them) likes the situation with respect to old-fashioned
linkers.  However, the compromise had to be made to ensure a
standard that could and would be used.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

P.S. I think that Ada places much more complicated demands on a
linker than simply that it have more than six characters' worth
of significance.  (That is, Ada tends to require a new linker
anyway, not just for longer names.)  Every Ada implementation
I've seen (not many) comes with its own linker, at least
partially superseding the operating system's default one, even on
systems such as VMS which already have remarkably powerful
standard linkers and object file formats.  A language that
supports object/package/cluster concepts typically requires
complicated name resolution and binding in the link phase which
many (most) existing linkers simply aren't equipped to do.
(C++ has the same problem.)

ps@celerity.UUCP (Patricia Shanahan) (07/26/89)

In article <12711@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>Two minor points:
>
>In article <547@cybaswan.UUCP> iiit-sh@cybaswan.UUCP (Steve Hosgood) writes:
>>In article <2619@yunexus.UUCP>, davecb@yunexus.UUCP (David Collier-Brown) writes:
>>> 0) 6 characters is a lower limit of significance. You're allowed to use more.
>>Yeah, but though you may be able to use more, a 'strictly conforming' program
>>can't, otherwise it won't port to sites with 6-character linkers.
>
>Presumably the posters and most readers of this group understand
>it, but there are apparently many who interpret statements like
>the above to mean that external identifiers may not have more
>than six characters at all.  I'm always seeing subroutine
>packages that have incredibly strained, abbreviated identifier
>names, to keep them six characters long.  You're allowed to use
>extra characters for readability; the only problem is that if you
>say
>
>	int identifier1, identifier2;
>
>you may get
>
>	"identi: multiply defined."
>
>It should be perfectly legal to say
>
>	int identifier, anotheridentifier;

I strongly dislike systems that accept extra characters in
identifiers and ignore them. This feature creates a difference
between the program as the programmer reads it, and the program as
the compiler reads it. It can in fact move porting problems from
compile time detection to run time errors.

Suppose I have a large and complicated program that uses two
libraries, libA and libB. Suppose also that identifier1 is the name
of a function in libA and identifier2 is the name of a function in
libB. 

A short identifier system that treats ignored characters in
identifiers as an error will flag the use of more than six
characters in the function names during the library make, and will
also flag the use of the long names for external references during
the compile. The problem will be found and easily fixed before the first
successful make.
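
A rough sketch of what such a strict check amounts to, assuming a limit
of six characters (SIG_LEN and check_names are hypothetical names used
only for illustration; a real compiler or linker would do this while
building its symbol table):

```c
#include <stdio.h>
#include <string.h>

#define SIG_LEN 6   /* assumed significance limit */

/* Report every name longer than SIG_LEN on stderr and return how many
 * such names were found, so a make can fail on a nonzero count. */
int check_names(const char *const *names, size_t n)
{
    int errors = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (strlen(names[i]) > SIG_LEN) {
            fprintf(stderr,
                    "%s: identifier exceeds %d significant characters\n",
                    names[i], SIG_LEN);
            errors++;
        }
    }
    return errors;
}
```

Given the names identifier1, getsym and identifier2, this returns 2 and
names both offenders, which is exactly the early diagnosis described
above.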

A short identifier system that simply ignores excess characters will
silently resolve references to both identifier1 and identifier2 to
whichever appears in the first library examined. If you are
lucky and have good tests, it will be found during program test.
Even after the existence of the bug is known because of a crash or
wrong result, it may still be difficult to fix. A programmer
reading the code will see the correct function being called. If the
system is sufficiently large, the programmer who is trying to fix a
failure in a piece of code using identifier1 may not even be
consciously aware of the existence of identifier2. If they do a grep
for identifier1 it will not find identifier2.
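
The silent case can be sketched as a toy symbol lookup.  This is only an
illustration under the assumption that the linker compares the first
SIG_LEN characters and takes the first match in library search order;
struct symbol and resolve are hypothetical names, not any real linker's
interface.

```c
#include <string.h>

#define SIG_LEN 6   /* assumed significance limit */

struct symbol {
    const char *name;     /* name as written in the source */
    const char *library;  /* library that defines it */
};

/* Return the first table entry whose leading SIG_LEN characters match
 * those of `ref`, or NULL if there is none.  Table order stands in for
 * the order in which libraries are searched. */
const struct symbol *resolve(const struct symbol *table, size_t n,
                             const char *ref)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strncmp(table[i].name, ref, SIG_LEN) == 0)
            return &table[i];
    return NULL;
}
```

With a table holding identifier1 from libA followed by identifier2 from
libB, resolve(table, 2, "identifier2") returns the libA entry, with no
diagnostic at all.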

This is especially serious if the significant identifier length is
scope dependent. The same type of problem can happen because the
scope of a function has been changed from static to external and it
has been moved to a library.
	Patricia Shanahan
	uucp : ucsd!celerity!ps
	arpa : ucsd!celerity!ps@nosc
	phone: (619) 271-9940

david@psitech.UUCP (david Fridley) (07/28/89)

The following is free code which anybody may use. It demonstrates how to
build symbols without having to place any restrictions on their length.
I hope that everybody who writes a compiler, linker, assembler, etc. will
look at this. Believe me, there don't need to be arbitrary limits, and I
prefer products that do not have arbitrary limits.

david.
DISCLAIMER: If it's important have a backup.  If it ain't broke don't fix it.
Proceed at your own risk.  My oponions are MY own.  Spelling does not count.
My fondest dream is to leave this planet.

----cut here---
/*****************************************************************************
* getsym.c
*
* This module implements a get sym(bol) function which reads the next
* symbol in from the input stream, and returns a pointer to it.
*
* if TEST is defined at compile time, the following two routines are provided
* in order to test, and demonstrate getsym()
*
* put_sym_in_table() add symbols to a static symbol table.
*
* main() is a simple program to read words from the standard input, build
* a symbol table, print out the symbol table, free the symbols created, 
* and then free the symbol table.
*
* DISCLAIMER: This code worked the first time I tried it, so obviously there
* is something wrong with it.  Assume it is defective until proven otherwise.
*
* if BULLET_PROOF is defined at compile time, additional bullet proofing is
* added, that is not required for normal operation.
*
* BASIC_SYMBOL_SIZE can be defined on the command line to override the basic
* symbol size assumed by this module.
*
* HINT: tab 4,9
*
* First Created: 25 July 1989 by david 
* Last Modified: 25 July 1989 by david 
*****************************************************************************/

#include <stdio.h>

extern char *malloc(),*realloc();
				 					 
/*
* these values, BASIC_SYMBOL_SIZE and BASIC_TABLE_SIZE are given small
* values for debugging purposes.
*/
#ifndef BASIC_SYMBOL_SIZE
#define BASIC_SYMBOL_SIZE	4	/* this value defines the first guess*/
										/* as to the size of the symbol.  If that */
										/* guess is wrong, it is guessed that that */
										/* much more space will be required. The */
										/* actual value affects the speed of */
										/* the routine, that's all */
#endif
										
#ifndef BASIC_TABLE_SIZE
#define BASIC_TABLE_SIZE	4	/* this value defines the first guess */
										/* as to the number of table entries. */
										/* if this guess is wrong, it is guessed */
										/* that that many more entries will be */
										/* required */
#endif
/*****************************************************************************
* char *getsym(f)
*
* INPUT:
* f is the stdio stream (FILE *) to read the next symbol from.
*
* OUTPUT:
* a pointer to the next symbol on the input is returned.  This buffer has
* been malloc()ed and should be free()ed when it is no longer needed. If
* NULL is returned there was an error getting the next symbol; if (-1) is
* returned there were no more symbols.
*
* First Created: 25 July 1989 by david 
* Last Modified: 25 July 1989 by david 
*****************************************************************************/
char *getsym(f)
FILE *f;
{	char *tmp;
	int actual_symbol_size;
	int symbol_size;
	int c;	/* int, not char, so that EOF can be told apart from data */
	actual_symbol_size=BASIC_SYMBOL_SIZE;	
	symbol_size=0;
	/* get the initial buffer for the symbol */
	if((tmp=(char *)malloc(actual_symbol_size))==NULL)
	{
#ifdef BULLET_PROOF	
		fprintf(stderr,"getsym: malloc() returned NULL\n");
		exit(1);
#endif
		return NULL;
	}
	while((c=getc(f))!=EOF)
	{	if( (c>='A' && c <='Z') || (c>='a' && c <='z') )
		{	if(symbol_size>=actual_symbol_size)
			/* if we need more buffer area */
			{	actual_symbol_size+=BASIC_SYMBOL_SIZE;
				if((tmp=(char *)realloc(tmp,actual_symbol_size))==NULL)
				{
#ifdef BULLET_PROOF	
					fprintf(stderr,"getsym: realloc() returned NULL\n");
					exit(1);
#endif
					return NULL;
				}
			}
			tmp[symbol_size++]=c;
		}else
		{	if(symbol_size)
			/* if there is at least one character in the symbol */
			{ 	if((tmp=(char *)realloc(tmp,symbol_size+1))==NULL)
				{	
#ifdef BULLET_PROOF	
					fprintf(stderr,"getsym: realloc() returned NULL\n");
					exit(1);
#endif
					return NULL;
				}
				tmp[symbol_size]='\0'; /* terminate the symbol */
				return(tmp); /* return the pointer to the symbol */
			}else
			/* eat up separators until there is at least one non separator */
			{	continue;
			}
		}
	}
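	/* we got an EOF */
	if(symbol_size)
	/* if we read in a symbol, trim the buffer, terminate it, and return it */
	{	if((tmp=(char *)realloc(tmp,symbol_size+1))==NULL)
		{	return NULL;
		}
		tmp[symbol_size]='\0'; /* terminate the symbol */
		return(tmp);
	}else
	/* if we did not find a symbol, release the buffer and return (-1) */
	{	free(tmp);
		return((char *)(-1));
	}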
}
#ifdef TEST
/*****************************************************************************
* put_sym_in_table(sym)
*
* put the symbol in the symbol table.  If there is a problem, print a useful
* message and exit();
*
* First Created: 25 July 1989 by david 
* Last Modified: 25 July 1989 by david 
*****************************************************************************/
static char **symbol_table;
static unsigned int table_size=0;
static unsigned int actual_table_size=0;

put_sym_in_table(sym)
char *sym;
{	if(actual_table_size==0)
	{	actual_table_size=BASIC_TABLE_SIZE;	/* never ask malloc() for 0 bytes */
		if((symbol_table=(char **)
			malloc(actual_table_size*sizeof(char *)))==NULL)
		{	fprintf(stderr,"put_sym_in_table: malloc() returned NULL\n");
			exit(1);	
		}
	}
	if(table_size >= actual_table_size)
	{	actual_table_size+=BASIC_TABLE_SIZE;
		if((symbol_table=	(char **)
					realloc(symbol_table,actual_table_size*sizeof(char *)))==NULL)
		{	fprintf(stderr,"put_sym_in_table: realloc() returned NULL\n");
			exit(1);	
		}
	}
	symbol_table[table_size++]=sym;
}
/*****************************************************************************
* main()
*
* This is a simple demonstration program for getsym and put_sym_in_table.
*
* First Created: 25 July 1989 by david 
* Last Modified: 25 July 1989 by david 
*****************************************************************************/
main()
{	char *tmp;
	unsigned int i;
	while((tmp=getsym(stdin))!=((char *)(-1)))
	{	if(tmp==NULL)
		{	fprintf(stderr,"main: getsym returned NULL\n");
			exit(1);
		}
		put_sym_in_table(tmp);
	}
	for(i=0;i<table_size;i++)
	{	fprintf(stdout,"symbol_table[%u]=%s\n",i,symbol_table[i]);
		free(symbol_table[i]);
	}
	free(symbol_table);
}
#endif


-- 
david.