[net.lang.c] Identifier significance CHALLENGE

davidson@sdcsvax.UUCP (12/05/83)

It has been possible for most of us to ignore the C compilers which
used only one case, and fewer than 7 characters for identifier
significance, simply because they only cause trouble when porting into
such an environment, and few of us have had to do that.  However, now
that Berkeley has removed the identifier length restriction,
non-Berkeley UNIX programmers are faced with the ever more frequent
onerous task of squeezing Berkeley programs into using reduced
identifier significance.  Hence this CHALLENGE:

CHALLENGE:  To write a program (for the public domain) which converts
a program designed for a given identifier significance of N1 characters,
and with C1 cases (either one or two), into an equivalent program
designed for identifier significance N2 and cases C2.  These would be
given as program arguments.

Note that when relaxing restrictions, it may be necessary to truncate
identifiers and regularize case, e.g. with N1 = 6, N2 = 100,
foobarhere, foobarthere -> foobar, foobar.  When tightening
restrictions, it may be necessary to change the identifiers, e.g.,
this_one_here, this_one_there -> this1, this2.  The latter case is more
difficult, and requires avoiding collisions with other identifiers.
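
[Editor's sketch, not part of the original post.  The collision test that
both directions of the conversion depend on might look like the following
in C.  The function name ids_collide and its parameters are hypothetical;
n stands for the significance (N1 or N2) and one_case for a one-case
compiler.]

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if identifiers a and b are indistinguishable to a compiler that
 * keeps only the first n characters and, when one_case is nonzero, ignores
 * letter case.  Return 0 otherwise.  (Hypothetical helper for illustration.) */
int ids_collide(const char *a, const char *b, size_t n, int one_case)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];

        if (one_case) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb)
            return 0;       /* differ inside the significant part */
        if (ca == '\0')
            return 1;       /* both ended together before n characters */
    }
    return 1;               /* all n significant characters matched */
}
```

[With n = 6, foobarhere and foobarthere collide; with n = 7 they do not.
A converter that tightens restrictions would have to run a check like this
against every other identifier in scope before accepting a renamed one.]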

I believe it very important that this program be written, and ideally by
someone with C compiler experience, although anyone is welcome to try.
We are forced to clean up someone else's mess here, as Berkeley should
never have released their compiler without providing such a compatibility
filter.

-Greg

guy@rlgvax.UUCP (Guy Harris) (12/07/83)

Rumor has it that System VI C may implement long variable names.

If you don't like the C or UNIX environment, wait a minute, it'll change... :-)

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

mem@sii.UUCP (Mark Mallett) (12/09/83)

Once upon a time, in the middle ages, I worked with a compiler on a
TOPS-10 system which supported arbitrarily long names and yet had to
reduce these names to 6 characters apiece for the TOPS-10 linker.  It
made its reductions (as I recall) by making use of the fact that
people tend to punctuate long names; it selected particular characters
from each syllable and strung them together in some predictable way.
Unfortunately I don't remember what its rules were, but if anybody
wants to they could try to look it up (see below).  A symbol such as

	MARKS_MAGIC_NUMBER

might turn into

	MSMCNR

I never saw it get confused; I don't know what it would do if it did.

Second part:  I wonder if anybody has ever heard of the above-mentioned
compiler.  It compiled a subset of PL/I, was written in LISP sometime
around the late 1960s to early 1970s, and was called PL/E (PL/I for
Eastman's Exec).  It was written, I think, at Applied Logic Corp which
used to be in NJ.  I rather liked it; I wish that the people I 
resurrected it for hadn't lost it.

Mark E. Mallett
decvax!sii!mem

jim@ism780.UUCP (12/12/83)

I have posted to net.sources a simple program (shortc.c) which produces a
mapping from arbitrary-length identifiers to identifiers which are
unambiguous in the first N (default 7) characters.  It produces them as
#defines; all it takes is a flexnames version of cpp to compile the original
source files with a non-flexname compiler, with no modification beyond
including the output of shortc into the sources.  If such a cpp is not
available or creatable (not hard given any cpp source), the shortc output can
be turned into a sed script and the sources can be compiled after being
modified.  Such modification does not fill the sources with
identifiers like X12345 or MuWdIdr (for MultiWordIdentifier); rather it
converts

MultiWordIdentifier             MultiWordIdentifier
MultiWordThingy         into    AMultiWordThingy
MultiWordyProgrammer            BMultiWordyProgrammer

Not so tough.
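
[Editor's sketch, not Jim Balter's shortc.c, which is not reproduced in this
thread.  One way to get the mapping shown in the table above is to leave a
name alone when its first N characters are still free, and otherwise try the
prefixes A, B, C, ... until the first N characters are unique.  The names
shorten, emit_define, MAX_IDS and MAX_LEN are all made up for illustration,
and the prefixing rule is an assumption inferred from the three example
lines.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 64      /* arbitrary limits for this sketch */
#define MAX_LEN 128

/* Names handed out so far; only their first `sig` characters matter. */
static char taken[MAX_IDS][MAX_LEN];
static int ntaken = 0;

static int is_taken(const char *name, size_t sig)
{
    int i;

    for (i = 0; i < ntaken; i++)
        if (strncmp(taken[i], name, sig) == 0)
            return 1;
    return 0;
}

/* Store in out a replacement for id that is unique in its first sig
 * characters among all names returned so far.  Callers pass each distinct
 * identifier once.  Returns 0 on success, -1 if no name could be made. */
int shorten(const char *id, size_t sig, char *out, size_t outsz)
{
    char prefix;

    assert(sig > 0);
    if (ntaken >= MAX_IDS || strlen(id) + 2 > outsz || strlen(id) + 2 > MAX_LEN)
        return -1;

    strcpy(out, id);                    /* first choice: unchanged */
    for (prefix = 'A'; is_taken(out, sig) && prefix <= 'Z'; prefix++) {
        out[0] = prefix;                /* next choice: A<id>, B<id>, ... */
        strcpy(out + 1, id);
    }
    if (is_taken(out, sig))
        return -1;                      /* ran out of prefixes */
    strcpy(taken[ntaken++], out);
    return 0;
}

/* Print the #define line only when the name actually changed. */
void emit_define(FILE *fp, const char *id, const char *name)
{
    if (strcmp(id, name) != 0)
        fprintf(fp, "#define %s %s\n", id, name);
}
```

[Calling shorten() with sig = 7 on the three names in the table, in order,
gives MultiWordIdentifier, AMultiWordThingy and BMultiWordyProgrammer.]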

-- Jim Balter (decvax!yale-co!ima!jim) Interactive Systems Corp

--------

ldl@genix.UUCP (12/13/83)

  In light of the fact that we (programmers) cannot depend on having
identifiers of more than 'n' characters, I have taken to writing code
somewhat differently.  I 'beat up' on cpp's features.  I'm not too sure
if this plan will really work, but it does on our V7 environment.

In a header:

  #define foobarthere   snm00		/* routine description */
  #define foobarhere    snm01		/* routine description */

In code:

  foobarthere(...)
  {
      ...
    foobarhere(...);
      ...
  }

  foobarhere(...)
  {
  }

Notice that in the source code, the 'long ids' are used. The macro
processor 'remaps' the names into something that the compiler can
handle, and the user doesn't have to think about it.
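
[Editor's sketch: a complete, compilable version of the fragment above.  The
function bodies and the REAL_NAME macro are invented for illustration; only
the two #define lines come from the post.  REAL_NAME shows the name the
compiler, linker and debugger actually see.]

```c
#include <assert.h>
#include <string.h>

/* Would normally live in a header: long source name -> short real name. */
#define foobarthere   snm00
#define foobarhere    snm01

/* Two-level stringizing, so the name is macro-expanded before quoting. */
#define STR_(x)       #x
#define REAL_NAME(x)  STR_(x)

int foobarhere(const char *s);      /* really declares snm01 */

int foobarthere(const char *s)      /* really defines snm00 */
{
    return foobarhere(s) + 1;
}

int foobarhere(const char *s)
{
    assert(s != NULL);
    return (int)strlen(s);
}
```

[After preprocessing, the object file contains snm00 and snm01 and no trace
of the long names: REAL_NAME(foobarthere) expands to the string "snm00",
and a call written as snm00("abc") reaches the same function.]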

A further use of this (needed in my case) is that there are several
'support' routines (our stuff is divided into libraries) that are
called by the 'main' routines in the library.  Using this technique,
there is no need to be concerned about having routines in one library
uniquely named from all other routines.  The 'real' name is controlled
by the 'remapped' name that is handled by cpp.

At first, I was rather concerned about the 'limits' of C external
tags, etc, but using the above technique, no problems have been 
encountered to date (and over 80000 lines of lex, yacc, and C).

Use cpp! 

P.S.  I have hit the limit of 'too much defining' in one area. I took
      care of this by building a simple interface that uses m4 (yuck! 
      for C).  I still work in 'pure' C, and let a couple of scripts
      and make work out the details.

-- 
Spoken: Larry Landis
USnail: 5201 Sooner Trail  NW
        Albuquerque, NM 87120
MaBell: (505)-898-9666
  UUCP: {ucbvax,gatech,parsec}!unmvax!genix!ldl

eric@whuxle.UUCP (12/14/83)

#R:sdcsvax:-6000:whuxle:23200003:000:659
whuxle!eric    Dec 13 17:22:00 1983

In reference to the suggestion of using

#define massively_long_name short

and then using

massively_long_name();, or
massively_long_name = 3;,

etc....

THIS IS A BITCH TO DEBUG, ESPECIALLY IF SOMEONE DOESN'T READ
THE HEADER FILES.......

i.e., I print out the code above, have the hard copy next to me,
and call "adb" (how arcane!!) to figure out why massively_long_name();
causes "Memory fault -- core dumped".

To my surprise, massively_long_name() is NEVER called in the program.
Instead, all I see are short(), or worse xx3ef();...... ARGHHH!!!
If someone ever did that to me and I wasted time chasing it I would
KILL!!!

				From the world of adb,
					eric

lee@unmvax.UUCP (12/14/83)

Larry,
when was the last time you used ADB (you don't have SDB or its kin)?
If I was attempting to debug using ADB I would sure find
it inconvenient to have to go look up "snm00" to find
it was really "foobarandbedamned" in my source. Think
I would just try to be imaginative in the standard number
of characters.

				Ready to get my buns torched,
					--Lee

			{ucbvax,gatech,parsec}!unmvax!lee

rpw3@fortune.UUCP (12/14/83)

#R:sdcsvax:-6000:fortune:16200011:000:804
fortune!rpw3    Dec 14 04:01:00 1983

I don't know which PDP-10 compiler was meant, but there was an abbreviation
standard for squeezing 6-character program names into three characters,
as needed for file extensions and per-job tempfiles. As I recall, it was
derived from some work done at Bell Labs on place-name abbreviations,
and went like this:

	Take the first letter, the next consonant, and the last
	consonant, duplicating as necessary,

	EXCEPT, if the word is already 3 chars, leave it alone
	(so PIP => PIP, not PPP)

Examples:	LOGIN => LGN
		ALGOL => ALL
		FORTRAN => FRN
		BASIC => BSC
		FREE => FRR

There was something about "Y" as a consonant, but I forget.
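
[Editor's sketch of the rule as stated above, in C.  The function name
abbrev3 is made up.  Two points are assumptions because the post leaves them
open: 'Y' is treated as a consonant, and when no consonant follows the first
letter, the first letter is duplicated.  Names shorter than three characters
are also left alone.]

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumption: every letter other than A, E, I, O, U is a consonant,
 * including Y (the post does not remember the rule for Y). */
static int is_consonant(int c)
{
    return isalpha(c) && strchr("AEIOUaeiou", c) == NULL;
}

/* Squeeze name into a three-character abbreviation: first letter, the next
 * consonant, and the last consonant.  out needs room for 3 chars plus NUL. */
void abbrev3(const char *name, char out[4])
{
    size_t len = strlen(name);
    size_t i;

    assert(len >= 1);
    if (len <= 3) {                 /* already short: PIP stays PIP */
        strcpy(out, name);
        return;
    }

    out[0] = name[0];
    out[1] = out[2] = name[0];      /* assumed fallback: duplicate first letter */
    out[3] = '\0';

    for (i = 1; i < len; i++)       /* next consonant after the first letter */
        if (is_consonant((unsigned char)name[i])) {
            out[1] = name[i];
            break;
        }
    for (i = len - 1; i >= 1; i--)  /* last consonant, scanning from the end */
        if (is_consonant((unsigned char)name[i])) {
            out[2] = name[i];
            break;
        }
}
```

[This reproduces all six examples in the post, including the duplicated
letters in ALL and FRR, which fall out naturally when the "next" and "last"
consonant are the same character.]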

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065

lee@haddock.UUCP (12/15/83)

#R:sdcsvax:-6000:haddock:12400001:000:111
haddock!lee    Dec 12 22:01:00 1983

In the recently announced System V.2, flexnames are supported.  The
past is, finally, behind us on this issue.

fair@dual.UUCP (Erik E. Fair) (12/15/83)

For the gentleman who was `beating on cpp' for flexnames:

	god help you if you have to use adb on a core file!

	None of the compiled variables will make any sense w/o
	the header file...

	Erik E. Fair	{ucbvax,amd70,zehntel,unisoft,onyx,its}!dual!fair
			Dual Systems Corporation, Berkeley, California

ado@elsie.UUCP (12/19/83)

In particular, regarding--
	In the recently announced System V.2, flexnames are supported.  The
	past is, finally, behind us on this issue.

Well, maybe.  As I recall, though, the UNIX (Bell Labs trademark) gurus have a
history of abandoning support for machines they no longer feel like supporting--
like the PDP 11/40 when Version 7 was originally released.  This has typically
been done to "enhance portability"; how an operating system that can't run on
systems it used to run on is "more portable" than its predecessor is beyond me.

And of course there are folks using C compilers other than ones supplied by
Berkeley or Bell; given that "flexible names" aren't "required" in "The C
Programming Language," such compilers may lack support for them.

I think the challenges of "historic" dialects of C are still with us.
-- 
UUCP:	decvax!harpo!seismo!rlgvax!cvl!elsie!ado
Phone:	(301) 496-5688

msc@qubix.UUCP (Mark Callow) (12/21/83)

Regarding --
	In a header:

	  #define foobarthere   snm00		/* routine description */
	  #define foobarhere    snm01		/* routine description */

	In code:

	  foobarthere(...)
	  {
	      ...
	    foobarhere(...);
	      ...
	  }

	  foobarhere(...)
	  {
	  }
----------

This would make symbolic debugging awfully hard...
Of course I don't know many v7 systems with symbolic debuggers.
-- 
From the Tardis of Mark Callow
msc@qubix.UUCP,  decwrl!qubix!msc@Berkeley.ARPA
...{decvax,ucbvax,ihnp4}!decwrl!qubix!msc, ...{ittvax,amd70}!qubix!msc

ldl@genix.UUCP (02/29/84)

I realize that this is a bit late, and that this discussion has died down
to some extent, but our news source was in the process of upgrading to 4.2
and we were out of contact for a while.  Please forgive such a late
response.  This will clarify how it can be done without too much hassle.

First of all, the names (short ones) MUST be chosen with care.  If you 
divide the routines up into libraries, then all short names should indicate
where the name originates.  For example, I have a library named 'runtime'.
In this library, there are both user calls, and support routines.  The names
of all 'user calls' are tagged 'rtgnn' and 'supports' are tagged 'rtlnn', for
'RunTime Global' and 'RunTime Local', respectively. Similarly, any variables
used (common) by these routines are named 'rtgxnn' and 'rtlxnn' for 
'RunTime Global eXternal' and 'RunTime Local eXternal'. (The 'nn' is a number
sequence like '01', '02', etc).  In general, there are fewer than 20 user calls
and 30 supports, and no more than 10 variables (of each, global and local).
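
[Editor's sketch: the tagging scheme above is regular enough to generate
mechanically.  The helper below is hypothetical and is not the preprocessor
mentioned later in this post; make_short_name and its parameters are made up
for illustration, with sig standing for the compiler's identifier
significance.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a short name such as "rtg01" (RunTime Global call 01) or "rtlx07"
 * (RunTime Local eXternal 07) into out.
 *   lib          library prefix, e.g. "rt"
 *   is_global    nonzero for user calls, zero for support routines
 *   is_variable  nonzero adds the 'x' used for shared variables
 *   seq          sequence number, 0..99
 * Returns 0 on success, -1 if the name would exceed sig characters or
 * would not fit in out. */
int make_short_name(char *out, size_t outsz, size_t sig, const char *lib,
                    int is_global, int is_variable, int seq)
{
    size_t need;

    assert(seq >= 0 && seq <= 99);
    need = strlen(lib) + 1 + (is_variable ? 1 : 0) + 2;
    if (need > sig || need + 1 > outsz)
        return -1;
    sprintf(out, "%s%c%s%02d", lib, is_global ? 'g' : 'l',
            is_variable ? "x" : "", seq);
    return 0;
}
```

[For the 'runtime' library with prefix "rt", this yields rtg01 for the first
user call and rtlx12 for the twelfth local variable; a prefix too long for
the given significance is rejected rather than silently truncated.]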

Secondly, I have rarely had to resort to going to adb.  I guess that it comes
from having worked on systems that did not allow highly interactive debugging
(i.e. communications front-ends, etc), but a long session of desk checking
is worth 10 times that amount of time in debugging.  (Typically, the 'debug'
time, running the new code, is less than 10% of the total coding time, for me).
Those few times that I have resorted to using adb, I was able to quickly track
down what was happening and correct the fault.

I guess that I should mention that I have put a preprocessor in front of the
actual 'cc' that does the real remapping, thus the mapped names are easily
accessible mechanically (have not interfaced to adb, thus the objections are
valid from the standpoint of adb or even sdb).  As for name conflicts between
library support routines (i.e. routines that are never accessed, nor
accessible, outside of the library proper), the funky names have helped
rather than hurt: they prevent the conflicts that could otherwise result
(like the wrong routine being linked into the right spot because the
name was the same).  There have been many good suggestions, but apart from
the language being changed (universally, on all Unix environments), there
are disadvantages to all shortening approaches.  The solution (apart from
changing the definition of the language) appears to be deciding which poison
tastes least bad.  :-)

-- 
Spoken: Larry Landis
USnail: 5201 Sooner Trail  NW
        Albuquerque, NM 87120
MaBell: (505)-898-9666
  UUCP: {ucbvax,gatech,parsec}!unmvax!genix!ldl