[comp.lang.misc] Request For Comment About Handling Of Globals

rh@smds.UUCP (Richard Harter) (11/15/90)

This is really a request for comment (and a digression from your
regularly scheduled pointer wars.)

Background:

I am working on a language called Lakota.  This is an interpreted
language with procedures and functions.  It is feasible and natural
to write moderately large programs in this language.  In so far as
is feasible the features of the language are very simple; i.e. the
idea is that you don't have to be a language guru to use it.

The Problem:

The issue at hand is globals.  In C there are three levels of scope --
program global, source file global, and block.  Fortran also has
three, blank common, labelled common, and subroutine/function. In
many languages with block structure inner blocks inherit variables
from outer blocks.  And so on.  The common (you should excuse the
expression) thread is that one wants to share access to data across
procedures.  In the course of doing this one wants to avoid nasty
things like name space pollution and ensure nice things like data
hiding and restricted access.

The Request:

What I am looking for is ideas.  What are some of the approaches that
can be used, and what are the pros and cons of these approaches.
Actually what I am fishing for is an approach that combines elegance
and simplicity with the constraint that it should not be confusing
or daunting to a naive user.  It seems to me that this is a worthy
topic for this group.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

chl@cs.man.ac.uk (Charles Lindsey) (11/16/90)

In <242@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:


>The issue at hand is globals.  In C there are three levels of scope --
>program global, source file global, and block.  Fortran also has
>three, blank common, labelled common, and subroutine/function. In
>many languages with block structure inner blocks inherit variables
>from outer blocks.  And so on.  The common (you should excuse the
>expression) thread is that one wants to share access to data across
>procedures.  In the course of doing this one wants to avoid nasty
>things like name space pollution and ensure nice things like data
>hiding and restricted access.

What you want is variables whose "extent" (i.e. lifetime) is forever, but
whose "scope" is restricted to the bodies of the procedures which need to
share them.

What you want, therefore, is a modules facility, as in Modula-2 or Ada (where
they are called packages) - not that those particular languages have
necessarily made a perfect job of modules.

My feeling is that, with a decent modules system, you can do away with
classical block structure altogether.
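
As a rough illustration (a sketch only, and in C, which has no modules,
so file-scope statics have to stand in for a module body):

	/* counter.c -- the variable's extent is the whole run of the
	 * program, but its scope is just the procedures in this file */

	static long count = 0;		/* permanent extent, restricted scope */

	void count_event(void)		/* one of the procedures that share it */
	{
		count++;
	}

	long count_total(void)
	{
		return count;
	}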

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/17/90)

On 15 Nov 90 08:43:27 GMT, rh@smds.UUCP (Richard Harter) said:

	[ ... on the issue of scoping rules ... ]

rh> What I am looking for is ideas.  What are some of the approaches that
rh> can be used, and what are the pros and cons of these approaches.
rh> Actually what I am fishing for is an approach that combines elegance
rh> and simplicity with the constraint that it should not be confusing
rh> or daunting to a naive user.  It seems to me that this is a worthy
rh> topic for this group.

Oh yes. And I have decided to come out of the closet with a profound and
utterly revolutionary secret that I have been holding in my conscience
for so many years. Alas, no more. I have to pull this weight off my
chest (I am also pulling your leg a bit here :->).

Traditional Algol-inspired scope rules are completely wrong. Most of
the evils and difficulties in modularization and reuse stem from this
catastrophic mistake: lower level modules can be made to depend on the
details of higher level ones (e.g. global variables), which is crazy.

In other words, the mistake is that currently scopes nest in the wrong
direction; the right idea is that a higher level entity should be able
to see the names in a lower level one, not vice versa. The right model
is not contours and visibility from inside to outside, but a tree and
visibility from top to bottom.

It must also be possible for a higher level module to manipulate a
lower level one's name table. For example, argument passing should be
defined in these terms; but this must also include the ability to append
naming subtrees under a lower level entity. In other words a module
should be able to manipulate instances of lower level modules both as to
the values that names assume in that module and as to the shape of the
naming tree beneath it. Modules should be able to be parametric with
respect to not only the values of their names, but also those beneath
them.

Also, it must be possible to create several instances of a module, and
bind names and subtrees differently in each instantiation (and this
subsumes generators, closures, generics, overloading, polymorphism, ...).

There is one possible objection to top bottom (instead of inside
outside) scoping, and it is that a module can then access uninitialized
variables of lower level modules. Well, if it is correct, it will not --
it is an error to do so.

	Please note that top bottom is *not* exactly the reverse of
	inside outside! A top module can have many bottom modules at the
	same level of nesting, but an inside module can have only one
	outside module at the same level of nesting (multiple
	inheritance is a poor attempt to get around this).

I do believe that this view of scoping is as "powerful" as the other one
and is the most elegant and flexible one -- and it has obvious and
large modularity advantages. In particular it subsumes inheritance
etc...

Similar research? That I know of, only BETA, which has a notion of
'pattern' with very similar properties, even if I don't think that the
notion that their scopes nest top to bottom is explicit.

Disclaimer: the above is an atrocious presentation of a very unusual
idea -- please make allowance for the extremely elliptical style of
presentation...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

peter@ficc.ferranti.com (Peter da Silva) (11/17/90)

I have been facing a similar problem in a series of extension packages
for TCL. The technique used in Modula, where every "global" object actually
has module scope and must be explicitly imported into another module, seems
attractive, but I've not had occasion to use Modula much to see how well
it handles in practice (basically, different Modula compilers all seem to
have unique and contradictory runtime libraries... so portable Modula code
becomes pretty damn hard to write).

I've been thinking of doing something like this (in TCL):

	module modulename {
		import othermodule symbol...
		...
		export symbol...
	}

Because of compatibility considerations, all existing symbols are assumed
to be in a root module "tcl" which is imported implicitly. But this brings
up the question of what this means:

	module modulename {
		module newmodule {
			...
		}
	}

My first reaction is that in this submodule all symbols in the outer module
are defined. This means that the root module is just another example of a
module. The downside is that symbol table lookups could become quite hairy,
and because this is an interpreted language (though I've got ideas for
compiled TCL) this would negatively impact runtime efficiency.
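
For what it's worth, the lookup I have in mind is nothing fancier than a
per-module hash table plus a pointer to the enclosing module. Roughly, in
C (the names and table size here are invented for the sketch):

	#include <string.h>

	struct symbol {
		struct symbol *next;		/* hash chain within a bucket */
		char          *name;
		char          *value;		/* TCL values are strings */
	};

	struct module {
		struct module *parent;		/* enclosing module; NULL for the root "tcl" */
		struct symbol *bucket[64];
	};

	static unsigned hash(const char *s)
	{
		unsigned h = 0;
		while (*s)
			h = h * 31 + (unsigned char)*s++;
		return h & 63;
	}

	/* search this module, then each enclosing module in turn */
	struct symbol *lookup(struct module *m, const char *name)
	{
		unsigned h = hash(name);
		struct symbol *s;

		for (; m != NULL; m = m->parent)
			for (s = m->bucket[h]; s != NULL; s = s->next)
				if (strcmp(s->name, name) == 0)
					return s;
		return NULL;
	}

The cost on a miss is one extra pointer chase per level of nesting, which
doesn't look fatal even for an interpreter.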

Input?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

forsyth@minster.york.ac.uk (11/17/90)

You might find the following paper interesting.  I don't know
whether it has been published elsewhere.  It supports
the view that modules and packages are redundant: use praiseworthy
(first-class) procedures instead.

%T In Praise of Procedures
%A I. F. Currie
%I Royal Signals and Radar Establishment
%C Malvern
%M 3499
%D 1982
%K RSRE,flex

rh@smds.UUCP (Richard Harter) (11/23/90)

In article <Y+=69_G@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> I have been facing a similar problem in a series of extension packages
> for TCL...  I've been thinking of doing something like this (in TCL):

> 	module modulename {
> 		import othermodule symbol...
> 		...
> 		export symbol...
> 	}

So far, so good.  However I don't know how you are using the word "module".
Module is one of those words which is used in a number of different senses
in different contexts.  I like "procedure" and "subroutine" because there
is no doubt about what is meant.

In any case the principle here seems to be that names within a procedure
are strictly local unless explicitly imported or exported.  Upon reflection
I think that this is a good idea in a typeless language.  If one doesn't
do this there is a horrible vagueness about what a procedure is doing.

> Because of compatibility considerations, all existing symbols are assumed
> to be in a root module "tcl" which is imported implicitly. 

Ouch.  Does this mean that there can only be one instance of a symbol?
Or does it mean that there can only be one external instance?

> But this brings up the question of what this means:

> 	module modulename {
> 		module newmodule {
> 			...
> 		}
> 	}

> My first reaction is that in this submodule all symbols in the outer module
> are defined. This means that the root module is just another example of a
> module. The downside is that symbol table lookups could become quite hairy,
> and because this is an interpreted language (though I've got ideas for
> compiled TCL) this would negatively impact runtime efficiency.
> Input?

If I understand this correctly you are working with a single namespace.
Each module can add symbols to the name space dynamically; the names
vanish when the module exits.  This is sort of like the context idea.
Does your export verb mean that symbols are passed down or up?

Symbol table lookup may not be all that bad.  Here is what I do in Lakota.
Symbols are mapped into integers which are indices in a lookup table.
When a raw symbol is processed it is hashed into an index into an array
of list pointers which, in turn, point into the lookup table.  The lookup
table structure is something like this:

	struct symtab {
		struct symtab *hash_link;	/* next entry on this hash chain */
		int            hash_index;	/* index of the hash bucket used */
		char          *symtext;		/* text of the symbol */
		int            length;		/* length of symtext */
		int            refcount;	/* count of resident references */
	};

Procedures are memory resident in the form of arrays of integer lists.
This means that the symbols in a procedure stay resident for as long as
the procedure itself is available, so their reference counts stay
positive.  The result is that symbol table lookup is fast
most of the time.
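
For concreteness, the intern operation looks roughly like this (a sketch
using the struct above; the table sizes and names are not the real ones):

	#include <stdlib.h>
	#include <string.h>

	#define HASH_SIZE  1024
	#define TABLE_SIZE 4096

	static struct symtab  table[TABLE_SIZE];	/* the lookup table proper */
	static int            table_used = 0;		/* next free slot */
	static struct symtab *hash_head[HASH_SIZE];	/* array of list pointers */

	/* map a raw symbol onto its integer index, adding it if it is new;
	 * no overflow checks -- this is only a sketch */
	int intern(const char *text, int length)
	{
		unsigned h = 0;
		int i;
		struct symtab *s;

		for (i = 0; i < length; i++)
			h = h * 31 + (unsigned char)text[i];
		h %= HASH_SIZE;

		for (s = hash_head[h]; s != NULL; s = s->hash_link)
			if (s->length == length && memcmp(s->symtext, text, length) == 0)
				return (int)(s - table);

		s = &table[table_used];
		s->hash_link  = hash_head[h];
		s->hash_index = (int)h;
		s->symtext    = malloc(length + 1);
		memcpy(s->symtext, text, length);
		s->symtext[length] = '\0';
		s->length     = length;
		s->refcount   = 1;
		hash_head[h]  = s;
		return table_used++;
	}
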
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

rh@smds.UUCP (Richard Harter) (11/23/90)

In article <PCG.90Nov16161830@odin.cs.aber.ac.uk>, pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

 	[ ... on the issue of scoping rules ... ]

> Oh yes. And I have decided to come out of the closet with a profound and
> utterly revolutionary secret that I have been holding in my conscience
> for so many years. Alas, no more. I have to pull this weight off my
> chest (I am also pulling your leg a bit here :->).

	Pull on the right one, please.  The left one is already
	47 feet long.  

Piercarlo introduces the interesting notion of inverting the operation
of scoping.  The idea is, of course, utterly unsettling to those of us
for whom the traditional rules are engrained.  I am going to have to
think about this one.  I would be enchanted to see some further development
of this idea.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

rh@smds.UUCP (Richard Harter) (11/23/90)

In article <chl.658750215@m1>, chl@cs.man.ac.uk (Charles Lindsey) writes:

> What you want is variables whose "extent" (i.e. lifetime) is forever, but
> whose "scope" is restricted to the bodies of the procedures which need to
> share them.

	Agreed.  [Lifetime need not be forever.]

> What you want, therefore, is a modules facility, as in Modula-2 or Ada (where
> they are called packages) - not that those particular languages have
> necessarily made a perfect job of modules.

Well, yes.  However I can't say I'm all that happy with what either
language does (I am more familiar with Ada than Modula-2).  For example,
suppose that in Fortran 2001++ we have the code

	package foobar
		public x y z proc1 proc2
		private a b c proc3
		from another_package d proc4
		....

This sort of thing seems plausible.  Now come some questions.  Can we
nest packages?  If so, can nested packages at different locations in
the nesting tree refer to each other?  Can packages contain code which
is not in procedures?  If so, when is this code executed?  What about
this situation?

	package foobar
	program xyzzy
		....
		use foobar
		....
	program plugh
		....
		use foobar

Are these two separate name spaces or do the two programs share the
name space?  Suppose we have two different libraries, each with a
foobar package?

One of the complications is that, for my purposes, I want to avoid
additional syntax.  Solutions that use qualified names with special
characters to separate fields lose.
		....
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

peter@ficc.ferranti.com (Peter da Silva) (11/24/90)

In article <251@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> In article <Y+=69_G@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> > I have been facing a similar problem in a series of extension packages
> > for TCL...  I've been thinking of doing something like this (in TCL):

> > 	module modulename {
> > 		import othermodule symbol...
> > 		...
> > 		export symbol...
> > 	}

> So far, so good.  However I don't know how you are using the word "module".

Good point. "Module" here refers to a collection of symbols with a common
scope. No control flow commonality is implied. Perhaps I should say "package"?
Or borrow from Forth and call it a "vocabulary".

> > Because of compatibility considerations, all existing symbols are assumed
> > to be in a root module "tcl" which is imported implicitly. 

> Ouch.  Does this mean that there can only be one instance of a symbol?

No, it means that symbols already defined in existing TCL code (such as proc
or ErrorInfo) are in the root vocabulary.

> If I understand this correctly you are working with a single namespace.

Existing TCL has a single namespace. I'm extending this to a group of
namespaces. Control flow doesn't come into it.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/27/90)

On 23 Nov 90 07:45:51 GMT, rh@smds.UUCP (Richard Harter) said:

rh> In article <PCG.90Nov16161830@odin.cs.aber.ac.uk>, pcg@cs.aber.ac.uk
rh> (Piercarlo Grandi) writes:

  	[ ... on the issue of scoping rules ... ]

pcg> Oh yes. And I have decided to come out of the closet with a profound and
pcg> utterly revolutionary secret that I have been holding in my conscience
pcg> for so many years. Alas, no more. I have to pull this weight off my
pcg> chest (I am also pulling your leg a bit here :->).

rh> Pull on the right one, please.  The left one is already 47 feet
rh> long.

Ohhhh. I can imagine how badly you must limp (unless you keep it coiled
like a flamingo :->). Hazards of reading News...

rh> Piercarlo introduces the interesting notion of inverting the
rh> operation of scoping.  [ ... ] I am going to have to think about
rh> this one.  I would be enchanted to see some further development of
rh> this idea.

Ah, a customer! Welcome sir, here we have a full range of interesting
reasons for which top down scope rules are better than inside outside
ones.

Which one would you like to see first? Would "reuse" do to start?

If you use top down scope rules you have a tree (picture it as in
traditional CS fashion with the root at the top, like a genealogical
tree) of closures. Each piece of code sees only the closure in whose
context it executes and its descendants.

This means that modules are perfectly reusable; the same subtree can be
linked in transparently in many places in the overall tree (which
becomes a DAG, or even a general directed graph, actually).

Consider the alternative with inside outside; modules that use globals
cannot be reused as easily. In particular, code that uses globals to
fake generators (subroutines that need to remember state between
invocations) is obnoxiously bad to reuse.

With top down scoping, two subtrees that need to communicate, instead of
being nested in the same scope, can just share subtrees of their
environment.  In a top down scoping language a library is just a subtree
which contains the bindings for all the library entities; it is easy to
reuse the library everywhere without bothering about possible name
clashes at the global level (try to do that in C++ with three libraries
which each define an Object class!).

There is *no* global level. This does not mean that the library or any
functions in it need to be stateless, because its naming subtree can
well contain bindings for persistent entities -- indeed all the
subroutines in the library will be lumps of code statically bound to
their identifiers.
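
If a sketch helps, picture each module as a node that owns a table of
bindings to what lies *beneath* it; the upper module grafts subtrees into
the lower one before handing it control, and name resolution only ever
goes downwards. In rough C (all the names are invented, of course):

	#include <string.h>

	struct node {				/* one entity in the environment tree */
		char        *name;
		struct node *sibling;		/* next binding at the same level */
		struct node *children;		/* what this node can see beneath it */
		void        *value;		/* code, data, or a whole library subtree */
	};

	/* the upper module shapes the lower module's environment */
	void graft(struct node *below, struct node *subtree)
	{
		subtree->sibling = below->children;
		below->children  = subtree;
	}

	/* a module resolves names in its own node and downwards only;
	 * there is no enclosing scope to search */
	struct node *resolve(struct node *here, const char *name)
	{
		struct node *c;

		for (c = here->children; c != NULL; c = c->sibling)
			if (strcmp(c->name, name) == 0)
				return c;
		return NULL;
	}

The same library subtree can be grafted under two unrelated modules, and
because nothing ever searches upwards there is no global level for their
names to clash in.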

rh> The idea is, of course, utterly unsettling to those of us
rh> for whom the traditional rules are engrained.

Actually, part of the top down scoping idea is not entirely new. Consider Ada
or many other modular languages; modules nest BOTH inside out and top
down; you can refer to the names of the enclosing module, but also the
enclosing module can refer to the entities in the enclosed ones (dot
notation). I maintain that only the latter should be allowed. If the
enclosing module wants to "show" some names to an enclosed module, this
should be done by the enclosing module appending the relevant naming
subtree under the enclosed one, not the latter importing those names
explicitly (and thus making itself dependent on where it is linked in
the environment tree).

This is the really different aspect of top-down naming. It also implies
that the modules need not be textually nested, and this means that each
module can be compiled on its own, and that shaping the required naming
subtrees for a module is the responsibility of the upper (but not
really enclosing) modules. In other words, environment tree shaping is
totally decoupled from module implementations, allowing mix and match
between module interfaces and implementations, in a transparent way.

If you want another example in which things may be familiar, consider a
UNIX like filesystem tree in which each directory contains a single
module, which is linked against modules in lower directories, and where
you can use (symbolic) links to share subtrees (or even, and this may
make sense at times, loops in the naming structure).

Unfortunately current linkers enforce a linear name resolution system, so
that if you use a current linker you must produce a preorder traversal
of the tree of module directories, and this may greatly reduce the value
of the tree, because it makes for conflicts and ambiguities.

The above arrangement is often used in actual C programming practice,
because C, mercifully, in essence disallows nested (inside outside)
scoping (in a C source file the static entities are the module's
closure, the extern references are the roots of lower subtrees and the
extern definitions are the leaves of upper subtrees). What I am saying
is that this (incomplete) arrangement is the one that one wants to use,
not just at the intermodule level, but also throughout the language.
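
Concretely, a C source file already reads this way (hypothetical names):

	/* queue.c -- one module's worth of code */

	static int items[100];		/* the module's closure: invisible from above */
	static int head = 0, tail = 0;

	extern void log_error(const char *msg);	/* root of a lower subtree we use */

	void enqueue(int x)		/* a leaf offered to the modules above us */
	{
		if ((tail + 1) % 100 == head) {
			log_error("queue full");
			return;
		}
		items[tail] = x;
		tail = (tail + 1) % 100;
	}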

I can easily see possible implementations; they become easy to design
once one gets into the habit of visualizing environment trees (graphs,
actually), whether they are linearized on a stack or not, e.g. see
Baker's landmark paper on shallow binding in Lisp 1.5 (CACM, 1978).

The fact that most algorithmic languages linearize both the environment
tree and the control tree on a stack has made most people regrettably
assume that it is the only possible way, blinding them to things like
closures, partial application, generators, i.e. fully general (and
orthogonal!) environment and control graph shapes.

As soon as you picture the environment tree you start to realize that
searching for names upwards is madness, because it makes subtrees
context dependent, and kills reuse.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

Chris.Holt@newcastle.ac.uk (Chris Holt) (12/01/90)

In article <PCG.90Nov27153408@odin.cs.aber.ac.uk>
 pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
[lots of interesting reasons why scoping should be changed from
the standard way]

Purists have long maintained that anything in one part of a
program should be explicitly imported or exported if it is to
be visible in another (remember Dijkstra's glovar declarations?).
This has never taken off because in real life the number of things
that need to be redeclared over and over again is enormous; and
really important things can be remembered anyway (at least when
writing the code :-).

Well, I tend to view objects as 3D structures, e.g. spheres, that
have windows in them.  Two spheres can be linked together by
connecting their windows; then they can see into each other, to
a greater or lesser extent (since windows only allow certain things
to be seen; they show restricted views).  A window may be one-way,
rather than two, so you can only see out, or only see in.  That is,
the user should be able to specify which way scoping works; there
may be a default, but it should be under the programmer's control.

Furthermore, windows don't have to be just on the surfaces of
objects, linking them with their direct (3D) environments.  If a
module A is linked with B, and B is linked with C, then B should
be able to introduce A and C so they each have a window directly
to the other, without going through B (this is useful if B
decides to terminate before A and C are done).  So, once a module
has been introduced to a given library object, they can talk
without going through the entire hierarchy; they go through
the "hyperwindow" connecting them.

This approach encourages small interfaces; no object wants
another object looking directly at its variables (unless they're
in love :-), and each object only sees those windows of
other objects that it has been introduced to (what's a nice
object like you doing in an environment like this?).  The
scope of a method becomes a graph, with each method having
a different scope.  Oh well; just a thought.
-----------------------------------------------------------------------------
 Chris.Holt@newcastle.ac.uk      Computing Lab, U of Newcastle upon Tyne, UK
-----------------------------------------------------------------------------
 "He either fears his fate too much, or his programming tools are small..."

artg@arnor.uucp (12/08/90)

In article <1990Nov30.191454.29030@newcastle.ac.uk> Chris.Holt@newcastle.ac.uk (Chris Holt) writes:
>In article <PCG.90Nov27153408@odin.cs.aber.ac.uk>
> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>[lots of interesting reasons why scoping should be changed from
>the standard way]
>
>Purists have long maintained that anything in one part of a
>program should be explicitly imported or exported if it is to
>be visible in another (remember Dijkstra's glovar declarations?).
>This has never taken off because in real life the number of things
>that need to be redeclared over and over again is enormous; and
>really important things can be remembered anyway (at least when
>writing the code :-).
>
>Well, I tend to view objects as 3D structures, e.g. spheres, that
>have windows in them.  

Many good, practical reasons argue against global variables.
Modules (objects or processes) sharing variables are constrained to
run on the same machine unless the distributed system running
under them implements shared memory.    So sharing variables
can prevent a distributed program from being reconfigured, and
can make it impossible to do process migration.  Sharing, and unconstrained
module interface design, explain why computer software functionality
and construction have progressed so slowly.

My favorite analogy that suggests how software should be decomposed
is electronic hardware.   Electronics could not have made great strides 
without the clearly defined INTERFACES between components.  Anyone
who builds a memory chip that satisfies the interface specs -- physical
dimensions (ie pinout positions), voltage levels, and logical
behavior -- and improves on PERFORMANCE characteristics of the chip --
power consumption, failure rate, cost and response time -- will get 
others to use the chip.  This lets loads of people and organizations
specialize in memory chips and improve their quality.  Digital
logic's full of widespread and sensible interface requirements, from
binary logic levels to addressing that uses a power of two bits.

Software, meanwhile, seems to be ruled by the law "First interface
to get market share wins", independent of quality.  See MS-DOS, 
OS/360, kermit, Lotus-123, dBase, Unixes, etc.  (If I haven't picked
on you, please don't feel left out.  :-) )

My belief is that if programmers and system designers spent more
time defining and PROGRAMMING the interfaces between separate modules,
(and only shared data specified in the interfaces), then programs would
plug together much more easily than they do today and programmers
would benefit much more from each other's work.

Languages therefore should be able to specify, independently of any
program, the data that modules pass between themselves - its type,
its degree of initialization, and other representation independent
characteristics.  Then modules should be composed, much in the way
that chips are combined into ALUs, ALUs into
boards, and boards into machines, etc., so that each composition can
be defined by its set of interfaces with the outside, and the outside
doesn't care about the implementation of a given composition.

Arthur

artg@ibm.com
IBM Research
Yorktown Heights

turner@lance.tis.llnl.gov (Michael Turner) (12/11/90)

In article <1990Dec7.195140.3022@arnor.uucp> artg@arnor.uucp writes:
>
>Many good, practical reasons argue against global variables.
>Modules (objects or processes) sharing variables are constrained to
>run on the same machine unless the distributed system running
>under them implements shared memory.    So sharing variables
>can prevent a distributed program from being reconfigured, and
>can make it impossible to do process migration.  Sharing, and unconstrained
>module interface design, explains why computer software functionality
>and construction has progressed so slowly.

I've explained to many programmers my view that global variables are to
data structures what the GOTO is to control structures (worse, in many
ways, if you ask me--I'll use the occasional GOTO, but my references to
global variables tend to be to someone else's, not to any that I invented.)
A lot of people give me this response: that they only use them when
"necessary".  When I go look at their code, there are always tons of
gratuitous global variables.

When is a global necessary?  I think the answer is: almost never.
I have used them when I needed to speed up code, but in almost every
case, I was speeding up code that I hadn't written, and that used
algorithms and data structures that I wouldn't have chosen.  I never
liked what I was doing to the code in the process.

To me, the worst part of unconstrained use of global variables is the
uncertainty: when reading the code, you find yourself looking at something
that has no readily-available context or meaning; you don't know what is
going to change it and when.

Not only does variable-sharing "prevent a distributed program from being
reconfigured", I've found that it can prevent ANY significant program from
being reconfigured!  It has probably prevented a great many programs from
being understood by anyone except the original programmer.  Forget about
GOTO-phobia, how about "`extern' variables considered harmful"?
----
Michael Turner
turner@tis.llnl.gov

rang@cs.wisc.edu (Anton Rang) (12/11/90)

In article <1189@ncis.tis.llnl.gov> turner@lance.tis.llnl.gov (Michael Turner) writes:
>I've explained to many programmers my view that global variables are to
>data structures what the GOTO is to control structures (worse, in many
>ways, if you ask me--I'll use the occasional GOTO, but my references to
>global variables tend to be to someone else's, not to any that I invented.)

  Global variables : data structures :: LongJmp : control structures

The problem with global variables is that they can affect anything,
anywhere in the program.  Thus, it's difficult when you see an
assignment 'FLAG = 1' to figure out what in the world this might
affect.  (Comments help, too, but....)

>When is a global necessary?  I think the answer is: almost never.

  They're almost never necessary, but they can be convenient when
working with languages which don't have nesting.  If I read in N, M,
and two N-by-M arrays, I'm likely to use a global variable to hold
this, instead of passing an extra four parameters to 90% of my
procedures.  It's sloppy, but....

  I prefer working with languages which have nested scope; in that
case, I can pass parameters in once, and use them in subprocedures
without needing to explicitly pass them.  This can be abused, just as
globals can, but often it's more clear, IMHO, than explicitly passing
parameters--it may be more clear to call a function 'edge(X,Y)' in a
test than 'edge(G,G2,N,M,X,Y).'

>Forget about GOTO-phobia, how about "`extern' variables considered
>harmful?

  And static variables!  And global error statuses!  <grin>

	Anton
   
+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

rhys@batserver.cs.uq.oz.au (Rhys Weatherley) (12/11/90)

In <RANG.90Dec10164012@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:

>>When is a global necessary?  I think the answer is: almost never.
>  They're almost never necessary, but they can be convenient when [...]

Also, in hardware and interrupt programming (pretty vertical area,
but necessary) you can almost guarantee that the data you want to
operate on will NOT be passed as a parameter to the interrupt
handling function's entry point, but you have to get it from
somewhere!  Globals are convenient here.
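
Something like the following, say (a sketch; read_uart_data() stands in
for whatever the real hardware access is):

	#define BUF_SIZE 256

	extern unsigned char read_uart_data(void);	/* hypothetical hardware access */

	/* the handler is entered with no parameters, so the buffer is global */
	volatile unsigned char rx_buf[BUF_SIZE];
	volatile int rx_head = 0;		/* written only by the handler */
	int rx_tail = 0;			/* written only by the main loop */

	void rx_interrupt(void)			/* named in the interrupt vector table */
	{
		rx_buf[rx_head] = read_uart_data();
		rx_head = (rx_head + 1) % BUF_SIZE;
	}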

>>Forget about GOTO-phobia, how about "`extern' variables considered
>>harmful?

How about "current programming methodologies and programmer thought
processes considered harmful" :-) .

>  And static variables!  And global error statuses!  <grin>

And procedures and functions since they are also globally declared!

Rhys.

+===============================+==================================+
||  Rhys Weatherley             |  The University of Queensland,  ||
||  rhys@batserver.cs.uq.oz.au  |  Australia.  G'day!!            ||
+===============================+==================================+

new@ee.udel.edu (Darren New) (12/13/90)

In article <1189@ncis.tis.llnl.gov> turner@lance.tis.llnl.gov (Michael Turner) writes:
>When is a global necessary?  I think the answer is: almost never.
>To me, the worst part of unconstrained use of global variables is the
>uncertainty: when reading the code, you find yourself looking at something
>that has no readily-available context or meaning; you don't know what is
>going to change it and when.

Actually, when you have no readily available context or meaning, a global
can be most helpful when properly done.  I was managing a few-person 
project in which globals were almost vital.  One person was writing
the "main" function and a few levels of nesting, another was writing
the lowest levels of nesting, and others were writing various parts
of the intermediate levels.  It was done this way because the intermediate
levels needed to be rewritten (to some extent) for each customer and 
software product, but the top-level menus would be the same for each
type of software product and the bottom layers would be the same for
all products. For example, the top level displayed the "print report"
option on the main menu, the intermediate level calculated what was
to go in the report and which columns went where, and the lowest
level got the characters out to the printer.  (There were actually
four levels, but that's beyond the scope of my point.)

Anyway, globals were needed because we could not afford to change the
middle layers of every product when something needed to be passed from
the top level to the bottom level. For example, if customer 17 needed
two different printers supported, and customers 1 thru 16 only needed
one, then getting programmers for 1 thru 16 to add the "which printer"
parameter to their intermediate levels becomes a maintenance
nightmare. 

My solution was to have all global variables actually be global; i.e.,
every programmer would know about them, and they would maintain only
information that was truly global to all routines.  To accomplish
this, every global variable had to be in the "globals" header file of the
appropriate layer (i.e., no "hidden" globals between only two modules).
Also, and most importantly, each global had to have a comment describing
what the variable "meant" independent of context. For example:

BOOL file_is_open;
  /* client file is currently open */
BOOL file_is_ro;
  /* client file is open and readonly. false if not file_is_open */

Note that you don't *need* context to understand these globals (given,
of course, that you understand our program model).  Whenever you open
the client file, you must set file_is_open to true and must set
file_is_ro correctly also.  Whenever you close the client file, you
must set file_is_open to false and file_is_ro to false.
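
Spelled out at the call sites, the rule was simply this (a simplified
sketch; open_client_file and close_client_file stand in for the real
lower-layer calls, and BOOL/TRUE/FALSE come from our headers):

	extern BOOL open_client_file(const char *name, BOOL readonly);	/* assumed */
	extern void close_client_file(void);				/* assumed */
	extern BOOL file_is_open, file_is_ro;

	void example_open(const char *name, BOOL readonly)
	{
		if (open_client_file(name, readonly)) {
			file_is_open = TRUE;	/* the rule from the comments above */
			file_is_ro   = readonly;
		}
	}

	void example_close(void)
	{
		close_client_file();
		file_is_open = FALSE;
		file_is_ro   = FALSE;		/* false whenever not file_is_open */
	}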

Whenever a new global was proposed, the global and its comments were
written up and distributed to all programmers.  The comments were modified
until all programmers thought them unambiguous. For example, the first
comment for file_is_ro was /* client file is read only */ and one
of the programmers said "What if it's closed?"   

In conclusion, global variables are just as useful and dangerous as
global goto labels.  Properly commented and maintained, global variables
are helpful in reusability, not harmful. The biggest restriction is that
global variables should be *GLOBAL* and not just shared invisibly between
some subset of modules.  I rarely notice people complaining about the
global nature of some truly global variables like file handles,
process IDs, file names, userIDs, and so on; I feel that this supports
the idea that GLOBAL variables are safer than "invisible" parameters.

Global gotos are helpful too, *IF* they are actually global, that is,
if you can actually goto them at any time and have the desired effect.
Witness, for example, interrupt vectors, the "main()" entry point, and
software libraries. Each of these global goto-like mechanisms can be
useful but only if calling "sin()" always gives you the expected
documented answer. There are problems with "errno", but not because it
is global, but rather because the data structure is too simple.

             -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
              =+=+=+ Let GROPE be an N-tuple where ... +=+=+=

rh@smds.UUCP (Richard Harter) (12/15/90)

In article <1189@ncis.tis.llnl.gov>, turner@lance.tis.llnl.gov (Michael Turner) writes:

	[... reiteration of artg's arguments against globals ...]

> I've explained to many programmers my view that global variables are to
> data structures what the GOTO is to control structures (worse, in many
> ways, if you ask me--I'll use the occasional GOTO, but my references to
> global variables tend to be to someone else's, not to any that I invented.)
> A lot of people give me this response: that they only use them when
> "necessary".  When I go look at their code, there are always tons of
> gratuitous global variables...

> To me, the worst part of unconstrained use of global variables is the
> uncertainty: when reading the code, you find yourself looking at something
> that has no readily-available context or meaning; you don't know what is
> going to change it and when.

I respectfully disagree with these views.  Procedures (which are also
externals) have the same demerits.  Consider the following code:

	procedure foo
		local x
		global FLAG
		......
		x = FLAG
		......
		x = bar()

Do we know what FLAG is?  No.  Do we know what bar does?  No.  Both
are externals supplied from outside the routine.  In fact we might
reasonably say that getting the value of the global, FLAG, at least
does not have any side effects; no such guarantee is given for procedure
calls in most languages.

A blanket condemnation of "globals" is overly simplistic and fails,
in my view, to grasp the nature of the problem.  If we return to the
widely condemned "goto" for an analogy we (should) ask:  What is wrong
with using "goto"s?  The practical answer is not that the goto is sinful
but rather that, in most cases, it is too primitive -- it does not
reflect the actual flow logic being used.

If one is going to discuss the merits and demerits of globals one should
survey the uses that are made of them.  Right off hand I can list some:

(a)	Environment (state) descriptors
(b)	Data transfer between subsystems
(c)	Private data within a subsystem

I'm sure that others can come up with a much more extensive list.

> Not only does variable-sharing "prevent a distributed program from being
> reconfigured", I've found that it can prevent ANY significant program from
> being reconfigured!  It has probably prevented a great many programs from
> being understood by anyone except the original programmer.  Forget about
> GOTO-phobia, how about "`extern' variables considered harmful?

Color me a skeptic.  The thing that keeps programs from being reconfigurable
(to the extent that such a thing is desirable) is that they aren't designed
with reconfigurability in mind.  If that is one of the design objectives
a set of globals which define the configuration can be a very useful thing
indeed.

Having said this, let me make a couple of points against globals.  The 
first is that they disproportionately expand the name space of a program
(as compared to procedures) since one can introduce more names with globals
than with procedures for the same number of lines of code.

The second point against globals is the lack of natural structuring.  Let
me use C as an example.  In C all globals are in the same single flat name
space.  It is conventional to break this name space up using include files.
However this convention is easily defeated -- one can always go around the
back door by using an "extern" declaration.
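
For example (a made-up case):

	/* the subsystem's include file is supposed to be the only way in,
	 * but any source file can simply declare the name itself: */

	extern int current_transaction;		/* hypothetical global owned elsewhere */

	void back_door(void)
	{
		current_transaction = 0;	/* quietly resets another subsystem's state */
	}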

-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.