[comp.software-eng] Functions without side effects

jcm@mstr.hgc.edu (James McKim) (06/03/91)

I've broadened the Newsgroups line, because while this thread began
with an Eiffel specific question, it has broadened into a topic
of more general interest. I have a hunch people will let me know
if they feel this is not appropriate. :-)

Let me try to summarize the discussion so far. The question before
the groups is "Should there exist a dichotomy between functions and procedures
within a class? In particular, should functions return information
about the state of an object _without_ changing that state (at least
as far as a user of the object can tell) while procedures are used
to change the state of the object _without_ returning a value.

So far the two most extreme views are held by Craig Chambers and myself.

Craig feels that procedures that return values are often extremely
useful and so should be designed into the class as a matter of course.
He cites Pop and Top in a stack class as a prime example. Top returns
the obvious item from the stack with no side effects. Pop removes the
top item _and_ returns it. His argument being, I believe, that the
vast majority of the time the user will want to use the item she just
popped. (Is that a fair summary Craig?)

I feel that the separation of concerns that the proposed dichotomization
between functions and procedures is too important to give up lightly.
The designer of a class should expect to design procedures that don't
return values and functions without side effects, and only vary from
this heuristic when there is a _compelling_ reason to do so. Thus my
Pop would just remove the top element, not return it.

Now here's the latest salvo, which is a more middle of the road
point of view.


In article <327@alfrat.uucp> roy@alfrat.UUCP (Roy Phillips) writes:
>An importent criteria for me is that code be readable - that it represents
>the solution to the problem or the model of a system with very little
>`superfluous' text.  This criteria is, for me, one of the most importent when
>designing a class or class cluster, and a major reason for using Eiffel over
>other languages.  I refuse to pay homage to the mathematical god of
>`correctness' by sacrificing the understandability of the application - as in
>all things, rules are not the answer, but judgement should be used.  Although
>agreeing in general to the principle of seperation of state-changing and
>state-querying features, I offer the two examples below in support of my
>argument that this should not of necessity be the rule.
Well, I'm not sure how much we disagree. I'm not arguing for an iron-clad
rule, either.

>  If one were to model a CPU, perhaps to create a simulator such as ITSIAC, it
>  would be essential, from the point of view of understandability of the model,
>  that the POP operation act just like the processor's POP - changing the state
>  of the processor and returning the value from the stack top into the specified
>  register.
>  Having to perform two operations would be a unnecessary complication which
>  could lead to misunderstanding and to errors.
So for you this is a compelling reason to combine the two concerns in this
particular application. 
  
>  As a second case for functions with side effects, consider the interface of
>  a mutual exclusion class, loosely based upon Djistra's P&V algorithm:
Well, Dijkstra's P operation has been criticized precisely because it
combines the two concerns we're discussing. I'll provide a reference
or two if there's interest.
>  
>  --
>  -- MONITOR  --  Resource Allocation Monitor providing mutual
>  --              exclusion facilities
>  --
>  -- %Z%%Y%:%M%   %I%
>  
>  indexing
>  
>     date: "$Date: %D% %T% $";
>     revision: "$Revision: %R%.%L% $";
>  
>     authors: roy;
>     names: monitor;
>     access: exclusive;
>  
>  deferred class MONITOR export
>  
>     free,
>     test_and_set,
>     request,
>     release,
>     got_resource
>  
>  feature
>  
>     free: BOOLEAN is
>        -- is the resource we are managing available?
>     deferred
>     end; -- free
>  
>     test_and_set: BOOLEAN is
>        -- is the resource we are managing available to us?
>     deferred
>     ensure
>        Result = True implies got_resource;
>        Result = False implies not free and not got_resource
>     end; -- test_and_set
Why not
      ensure
        old free = got_resource ;
        got_resource implies not free;

This allows test_and_set to be a pure procedure. It's every bit as
atomic as it was before since you were setting got_resource anyway.
Do you really feel that the extra return value makes your code more
readable? Or am I missing something basic?
>  
>     request: BOOLEAN is
>        -- return True when we have control of the resource
>     require
>        -- not free implies "wait until resource free"
>     deferred
>     ensure
>        got_resource
>     end; -- request
Lost you here. Does this feature ever return False? If so, when? If not
what good is the return value? Also ensure not free?
>  
>     release is
>        -- make resource available again
>     require
>        free = False and then got_resource
>     deferred
>     ensure
>        not got_resource
>     end; -- release
Also ensure free? Otherwise when is free ever True?
>  
>     got_resource: BOOLEAN
>        -- return True if we have successfully obtained 
>        -- exclusive accesss to resource
>     deferred
>     ensure
>        Result = True implies not free
>     end; -- got_resource
This ensure clause seems unnecessary and out of place. Surely I
don't have to call got_resource in order to get free set to False! 
>  
>  end -- class MONITOR
>  
>  Seperate `request' and `release' operations are an obvious
>  requirement (implemented as a function and a procedure respectively)
>  but it is essential to provide a `test_and_set' routine, that returns
>  a boolean `success/fail' status and grabs the resource if it is free
>  - hysteresis between calling `free' and `request' could cause the application
>  to perform an undesired wait inside of `request'.
Well I don't see how but it's been a long time since I worked with monitors.

>  Care should be taken
>  to provide comments, or better still, assertions that clearly outline
>  any state changes that a routine may perform, be it a function or a
>  procedure.
Certainly if a function has a side effect it should be included in the
postconditions.

>  Observing pre- and post-conditions is a lot more helpful than
>  to rely on the rule:
>  
>     "it's a function so it won't change the object's state"
My argument is that most classes will be more readable, simpler, and
better specified if _heuristics_ such as "functions should not have
side-effects unless there's a compelling reason" are followed.

>  
>Roy
>--
>Roy Phillips
>A+F SystemEntwicklung
>D-4030 Ratingen
>West(ern) Germany
>roy@alfrat.UUCP

This is fun!
-- Jim



*------------------------------------------------------------------------------*
Jim McKim  (203)-548-2458   _Give_ people fish and they eat for a day.  
Internet:  jcm@mstr.hgc.edu _Teach_ people to fish and they eat for a lifetime.

mbenveni@irisa.fr (Marc Benveniste) (06/04/91)

From article <1991Jun3.145924.28406@mstr.hgc.edu>, by jcm@mstr.hgc.edu (James McKim):
> Let me try to summarize the discussion so far. The question before
> the groups is "Should there exist a dichotomy between functions and procedures
> within a class? In particular, should functions return information
> about the state of an object _without_ changing that state (at least
> as far as a user of the object can tell) while procedures are used
> to change the state of the object _without_ returning a value.

Recalling the link between Algebraic Abstract Data Types and O.O.P, 
I would say that the notions of "generator" and "observer" operations
should have corresponding entities in classes. Generators are used 
to structurally build terms and observers are used to relate them.
Objects can be seen as representations of terms although the 
structure is coded into a state and some information is lost. 
We can identify methods that change the state and methods that do not.
I don't think a dichotomy should exist between procedures and
functions, but between state preservers and state changers. 

 In a parallel framework, this distinction at the language level
together with the interface-body separation allows a separate
compilation scheme that ensures that observers don't change
the "global state" (values and other objects through references;
note that we don't consider acquaintances as part of the state) 
and legitimates the generation of code suitable for exploiting
a safe parallelism within an object. The parallel class-based
language ARCHE we are currently implementing adopts this view.
We provide a symetric concurrent read exclusive write synchronisation
policy at the object interface level. A conditional synchronisation
is also provided at the interface level, but I don't show it here.
  The stack example looks as follows:

TYPE StackT = VIEW 
               Push : (item : INT) ();
               Pop  : () (item : INT)
              OBSERVER
               Top  : () (item : INT);
               Empty: () (maybe: BOOLEAN)
              END;

CLASS ArrayStack IMPLEMENTS Lib.StackT =
 VAR State : SEQ OF INT := <>;
     TopPtr: INT := 0;
 
 PROCEDURE Push  = ... END Push;
 PROCEDURE Pop   = ... END Pop;
 PROCEDURE Top   = ... END Top;
 PROCEDURE Empty = ... END Empty;
END ArrayStack;


--Marc Benveniste                mbenveni@irisa.irisa.fr
IRISA
Campus de Beaulieu
35042 Rennes Cedex
FRANCE

ogden@seal.cis.ohio-state.edu (William F Ogden) (06/05/91)

In article <1991Jun3.145924.28406@mstr.hgc.edu> jcm@mstr.hgc.edu (James McKim) writes:
   ...
>Let me try to summarize the discussion so far. The question before
>the groups is "Should there exist a dichotomy between functions and procedures
>within a class? In particular, should functions return information
>about the state of an object _without_ changing that state (at least
>as far as a user of the object can tell) while procedures are used
>to change the state of the object _without_ returning a value.
>
>So far the two most extreme views are held by Craig Chambers and myself.
>
>Craig feels that procedures that return values are often extremely
>useful and so should be designed into the class as a matter of course.
>He cites Pop and Top in a stack class as a prime example. Top returns
>the obvious item from the stack with no side effects. Pop removes the
>top item _and_ returns it. His argument being, I believe, that the
>vast majority of the time the user will want to use the item she just
>popped. (Is that a fair summary Craig?)
>
>I feel that the separation of concerns that the proposed dichotomization
>between functions and procedures is too important to give up lightly.
>The designer of a class should expect to design procedures that don't
>return values and functions without side effects, and only vary from
>this heuristic when there is a _compelling_ reason to do so. Thus my
>Pop would just remove the top element, not return it.

There are a number of considerations that bear upon this issue, but
comprehensibility and efficiency are probably the most important.

Functions enjoy a prominent position in twentieth century mathematics, and
thus in the educational background of most software professionals. It
makes sense to exploit this shared knowledge when designing software.
Now the principle feature of functions is that they can be composed
into arbitrarily large expressions. If you don't anticipate clients
forming expressions, then functions offer no particular advantage
over procedures as class operations.

The answer to the question of whether functions should have side-effects
which are visible at the client level is clear when you observe that
side-effects will make the value of experessions dependent upon the
order of evaluation of subexpressions, and hence quite unapparent.

On the other hand, side-effects which only affect the internal representationsof objects (such as changing the shape of a tree being used to realize a set)should be permitted for efficiency reasons.

The pop/top discussion raises an important efficiency issue in class
design. While we now give lip service to object-oriented design, we
often fail to recognize some of the consequences of what we're advocating.
If the object based software thesis is correct, then our software will be
organized around increasingly large and sophisticated objects, yet when
we design things like stacks, we persist in thinking of the objects
being stored in stacks as being small objects like integers. Instead,
we should recognize that the really serious uses of stacks will involve
storing large objects ( sets, polygons, other stacks, etc. ), and that
to achieve real reusability, our stack operations should be chosen with
an eye to making them efficient for storing large objects.

Probably the most significant feature of large objects, as opposed to
small objects like integers, is that they are usually quite expensive (if
not impossible) to replicate. Consequently operations like Top make poor
choices as primary operations, since their implementation inevitably
involves making a copy of a data structure entry. The proper choice
of primary operation for the stack then is the procedure (not function
-- see above) Pop which does return the top value, since this does not
entail replicating an entry. Top can be provided as a secondary operation,
but clients should be warned that it is potentially an expensive operation.

--

/Bill