[comp.lang.ada] aliases/function return values

jcallen@Encore.COM (Jerry Callen) (08/28/90)

I've run across a curious anomaly with the value returned by a function;
if the function value is returned by reference, an alias can be created.
The following program illustrates the problem. NOTE: I will not attempt
to justify the following code; it's boiled down from some code supplied
by a user. Surely _I_ would never write such bizarre code. :-)

-----------------------------------------------------------------
with report; use report;

procedure alias1 is
    package p1 is
        ELEMENTS: constant := 100;
        subtype range_t is integer range 1..ELEMENTS;
        type array_t is array(range_t) of boolean;

        function make_set (which: range_t) return array_t;
        function transform (a, b: array_t) return array_t;
    end p1;

    use p1;

    v1, v2, v3: array_t;

    package body p1 is
        global: array_t;
        null_array: constant array_t := (others => FALSE);

        function make_set (which: range_t) return array_t is
        begin
            global := null_array;
            global(which) := TRUE;
            return global; -- may create an alias for "global"
        end make_set;

        function transform (a, b: array_t) return array_t is
        begin
            -- if the value returned by make_set is used as a
            -- parameter to this function, AND the compiler
            -- created an alias, the next statement will tromp
            -- on the parameter.
            global := null_array;
            global := (a xor b) and a;
            return global;
        end transform;
    end p1;
begin
    test ("alias1", "see if an alias is created for function return");
    v1 := (others => TRUE);
    v2 := make_set (20);
    v3 := transform (v1, v2);

    if v3(20) then
        failed ("base case failed - compiler is TOTALLY broke");
    end if;

    v3 := transform (v1, make_set (20)); -- interesting call

    if v3(20) then
        failed ("compiler used 'return by value' and created an alias");
    end if;

    result;
end alias1;

-----------------------------------------------------------------------

This is obviously a pretty contrived test case, but it should get the
point across. Is a compiler permitted to return objects "by reference"
(with the possibility of creating aliases) or must it create a copy
of the return value? I can't find anything in the LRM on this point,
although there is some gobble-de-gook in 6.2.13 that might be relevent.

Aside: this program is also dependent upon the mechanism used to
pass parameters; to fail, "return by reference" and "call by reference"
must BOTH be used.

I would LIKE to think that objects may be returned by reference; 
otherwise potentially large objects would have to be copied.
I would call the program above erroneous, since it is dependent upon
the mechanism used for parameter passing and value return.

-- Jerry Callen
   jcallen@encore.com

davidr@inmet.inmet.com (08/29/90)

It's a compiler bug, but a useful one, and there are no ACVC's which
test for it; so it's probably best to assume that the program
is erroneous.

I couldn't find an obvious language rule that proves
it's a compiler bug, but 5.8:5 states that a "value" is returned
by a return statement, and the note in 5.2.1 implies that
values are always independent of each other.  

David Rosenfeld
Intermetrics Inc.
-- davidr@inmet.com

NCOHEN@IBM.COM ("Norman H. Cohen") (08/30/90)

Jerry Callen presents a program that fails if the statement

   return Global;

is implemented by returning a REFERENCE to Global, rather than a COPY of
Global, to the caller.  He then writes:

 >I would LIKE to think that objects may be returned by reference;
 >otherwise potentially large objects would have to be copied.
 >I would call the program above erroneous, since it is dependent upon
 >the mechanism used for parameter passing and value return.

David Rosenfeld, referring to a compiler that "returns by reference,"
responds:

 >It's a compiler bug, but a useful one, and there are no ACVC's which
 >test for it; so it's probably best to assume that the program
 >is erroneous.

Hold on there!  It is not up to the programmer or the implementer to
decide what will be considered erroneous.  Dependence on the PARAMETER-
passing mechanism is erroneous because the reference manual explicitly
states (6.2(7)):  "For a parameter whose type is an array, record, or
task type, an implementation may likewise achieve the above effects by
copy, as for scalar types....  Alternatively, an implementation may
achieve these effects by reference....  The execution of a program is
erroneous if its effect depends on which mechanism is selected by the
implementation."

No such choice is given for passing function results back to the caller.
Indeed, RM 5.8(6) specifically states, "For the execution of a return
statement, the expression (if any) is first evaluated," and 5.8(5)
states, "The value of the expression determines the result returned by
the function."

The offenses constituting erroneous execution are specifically
enumerated in the RM (with a few more added by AIs).  Dependence on
the result-passing mechanism is not such an offense.

Switching my perspective from language legalities to implementation
practicalities, the implementation approach suggested by Callen will
very rarely be useful.  Most of the time, the expression in a return
statement does not name a global variable, but a variable local to the
function body, or some expression such as a catenation or aggregate.
A reference to a local variable is a pointer into the stack frame that is
about to disappear.  Indeed, in a context like

   if Array_Func_1(X) = Array_Func_2(X) then ...

the stack frame will likely be overwritten by the second function call
before the value of the first function call is dereferenced.

When the function result subtype is CONSTRAINED, the caller can determine
an appropriate destination for the function result and pass a pointer to
that destination into the function body.  The return statement is then
implemented by copying the result value into the specified destination.
In a context like

   A := Array_Func(X);

the caller can pass the address of A, so that both the return and the
assignment statement are achieved with one copy of the array value.  In
a context like

   Proc(Array_Func(X));

the caller can pass an address in the stack frame it is setting up for
the call on Proc, or the address of a temporary whose address will be
placed in that stack frame.

Usually, the only useful context for a call on a function with an
UNCONSTRAINED array result is as a parameter to some outer subprogram
call--as in

   Text_IO.Put(Text_IO.Name(Input_File));

--or as the initial value of an allocator--as in

   String_Pointer := new String'(Text_IO.Name(Input_File));

--or as PART of the initial value for an allocator for some record type
with discriminants--as in

   String_List :=
      new List_Node_Type'
         (Size => Name_Length,
          Text => Text_IO.Name(Input_File),
          Link => String_List);

In the first case, the implementation can allocate temporary space at the
other end of storage, construct the result value there (entailing a copy
of each array component value), and return a pointer into that space for
the caller to use in passing the array by reference to the outer
subprogram call.  (It is essential that the caller free the allocated
space at the end of the outer call, to prevent storage leaks.)  In the
second case, the implementation can proceed as above, but allocating
space for the function result directly in the collection for the
corresponding access type, so that a pointer to the allocated space can
be used directly as the value of the allocator.  The third case is more
complicated.  The most reasonable approach is probably for the function
to yield a pointer as in the first case and for evaluation
of the record aggregate to perform the check that the array pointed to
satisfies the index constraint for the corresponding array component,
then copy the array yet again, then free the temporary copy constructed
by the function.  The extra copy can be avoided by passing the address
and index constraint of the record component, which is allocated on the
basis of discriminant values before the function is called, and
proceeding as in the constrained case; however, potential benefits may be
outweighed by the cost and complexity of adding an extra implicit
parameter to functions with unconstrained array-type results, indicating
which return method to use for a given call.

Callen bemoans the fact that potentially large objects have to be
copied, but returning an array as a function result is inherently a
potentially expensive operation, especially if the result subtype is
unconstrained.  Fortunately, clever compilation can keep the cost
reasonable in most cases, while preserving the semantics required by the
reference manual.

Norman H. Cohen

sampson@cod.NOSC.MIL (Charles H. Sampson) (08/31/90)

In article <12594@encore.Encore.COM> jcallen@encore.com (Jerry Callen) writes:
>I've run across a curious anomaly with the value returned by a function;
>if the function value is returned by reference, an alias can be created.
>
> [Example showing one of the usual aliasing problems deleted.]
>
>          ... Is a compiler permitted to return objects "by reference"
>(with the possibility of creating aliases) or must it create a copy
>of the return value? I can't find anything in the LRM on this point,
>although there is some gobble-de-gook in 6.2.13 that might be relevent.

     I don't remember this problem being explicitly covered in the LRM,
I can't think of any place to look, the ARG appears never to have dis-
cussed it, and 6.2(13) doesn't apply because the subject there is sub-
program parameters and a function's value is not one of its parameters.

     However, there's no need for an explicit statement in the LRM.  The
implementation is wrong because it violates the semantics of a function
invocation.  Each invocation of a function determines an associated value
and that value cannot mysteriously change after the invocation has com-
pleted.

     Another respondent has said this might be a useful compiler error,
presumably on the basis of efficiency of the implementation.  I don't
want my function values to change after they've been calculated, no
matter how efficient the code is.  It should just be accepted that func-
tions that return large structures are not going to do so with the same
kind of code as a Boolean-value function, but this "inefficiency" might
well be offset by other considerations, for example the avoidance of un-
pleasant side-effects.

(Does anybody else remember the quote from The Psychology of Computer
Programming: "If you're not interested in correct answers, I can write
a very fast program for you.", or words to that effect?)

                            Charlie Sampson

eachus@linus.mitre.org (Robert I. Eachus) (09/08/90)

     I'll bring this up informally at the URG adn ARG meetings next
week, but it looks like a confirmation...the return statement returns
the value of an expression, evaluated before the return.

				
--

					Robert I. Eachus

with STANDARD_DISCLAIMER;
use  STANDARD_DISCLAIMER;
function MESSAGE (TEXT: in CLEVER_IDEAS) return BETTER_IDEAS is...

mfeldman@seas.gwu.edu (Michael Feldman) (09/08/90)

In article <EACHUS.90Sep7150018@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:
>
>     I'll bring this up informally at the URG adn ARG meetings next
>week, but it looks like a confirmation...the return statement returns
>the value of an expression, evaluated before the return.
>
Let's talk about abstraction vs. implementation here. Does this mean that
a value that happens to be large - say a large object of an unconstrained
array type - MUST be copied twice, i.e. once to return it to the caller and
again to store it in its final resting place? Or can an implementer store it
in heap space, for example, returning its address (or a dope vector only) to
the caller? Since some implementers build all unconstrained array types 
this way, by allocating only a dope vector and acquiring object space
dynamically, it would seem that returning function values in the same way
makes sense, right? Then the "value" of the expression is the dope vector
which points to the actual data in the heap, instead of the data itself.

Since the LRM generally leaves storage management issues to the implementer,
especially for arrays, there seems to be no violation here. Am I right?
---------------------------------------------------------------------------
Prof. Michael Feldman
Department of Electrical Engineering and Computer Science
The George Washington University
Washington, DC 20052
202-994-5253
mfeldman@seas.gwu.edu
---------------------------------------------------------------------------