kelvin@arizona.UUCP (Kelvin Nilsen) (01/07/86)
i am designing a language for use in programming data communications applications such as file transfer, network prototyping, logon scripts, more sophisticated scripts, ... although its forefathers would probably disavow any association with this beast, it has strong similarities to both SNOBOL and ICON.

one feature which is particularly dangerous is the idea of "assignment-by-reference." any assignment of the form (a <- b) simply gives 'a' a pointer to the same data that 'b' is pointing to. so in the following sequence of code:

    b <- 3;
    a <- b;
    b <- 5;
    write(a);

the number 5 is written to stdout. of course, :-) this is often not desirable behavior. more traditional behavior could be obtained by rewriting the code:

    b <- 3;
    a <- `b;    /* the grave accent causes a copy of 'b' to be made */
    b <- 5;
    write(a);

the proposed semantics for this weirdness is such that each time a new item joins a group of aliased variables, the entire group takes on the value represented by the right-hand side of the assignment. so if we append to the first version of the above code the statements:

    c <- 7;
    a <- c;

now 'a', 'b', and 'c' all represent the number 7, and changing any one of them will change all of them. to back variables out of this shared relationship, nullify them one by one.

implementation of this relies on several levels of indirection existing between a variable and the data that it represents. this is more costly than what is done for standard compiled languages, but typical(?) of high level languages such as ICON and SNOBOL.

motivation for this is several fold. in ICON, there are inconsistencies in the treatment of structured objects and scalars. for example, in the code:

    t := s
    s[5] := 'c'

the variable 't' will feel the effect of the second line's assignment if 's' represents a list, but will not if 's' is a string. my intention is to unify such behavior. also, as CommSpeak was designed with real-time programming on microcomputers in mind, it is strongly typed. assignment-by-reference was proposed in hopes of buying back some of the flexibility of late binding and deferred evaluation which are available to varying degrees in both ICON and SNOBOL.

my quandary: have any other languages been plagued by this same blunder? any papers describing the experiences, feelings of the language designers in retrospect? does anyone have a suggestion for revised semantics of assignment that might satisfy the same needs with less "astonishing results" to the unsuspecting programmer?

thanks for your thoughts,
kelvin nilsen
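
The alias-group behavior described in the post above can be sketched in Python. This is only an illustration under stated assumptions, not CommSpeak's actual implementation: the names `Cell`, `Env`, `set_value`, `alias`, and `nullify` are hypothetical, a single shared cell stands in for the "several levels of indirection," and nullifying is assumed to give the variable a fresh empty cell.

```python
class Cell:
    """One level of indirection: a mutable box shared by a group of aliases."""
    def __init__(self, value=None):
        self.value = value


class Env:
    """Hypothetical variable table: each name maps to a (possibly shared) Cell."""
    def __init__(self):
        self.cells = {}

    def get(self, name):
        return self.cells[name].value

    def set_value(self, name, value):
        """Models `name <- literal`: write into the cell, so all aliases see it."""
        if name in self.cells:
            self.cells[name].value = value
        else:
            self.cells[name] = Cell(value)

    def alias(self, target, source):
        """Models `target <- source`: target's whole group joins source's group."""
        new_cell = self.cells[source]
        old_cell = self.cells.get(target)
        if old_cell is None:
            self.cells[target] = new_cell
        elif old_cell is not new_cell:
            # Repoint every member of target's old group at the source's cell.
            for name in [n for n, c in self.cells.items() if c is old_cell]:
                self.cells[name] = new_cell

    def nullify(self, name):
        """Assumed meaning of 'nullify': detach the name onto a fresh empty cell."""
        self.cells[name] = Cell(None)


env = Env()
env.set_value("b", 3)      # b <- 3
env.alias("a", "b")        # a <- b
env.set_value("b", 5)      # b <- 5
print(env.get("a"))        # prints 5

env.set_value("c", 7)      # c <- 7
env.alias("a", "c")        # a <- c
print(env.get("a"), env.get("b"), env.get("c"))  # prints 7 7 7
```

Under the same assumptions, the grave-accent copy would correspond to `env.set_value("a", env.get("b"))`, which writes the current value of 'b' without joining its group.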
kurt@fluke.UUCP (Kurt Guntheroth) (01/14/86)
The CLU language (Barbara Liskov et al.) at least describes this all formally and gives a name to the objects that are assigned by copying vs. those that are assigned by copying a reference. You can, or could once, obtain the CLU Reference Manual from Springer-Verlag, in the Lecture Notes in Computer Science series.
ka@hropus.UUCP (Kenneth Almquist) (01/19/86)
> Any assignment of the form (a <- b) simply gives 'a' a pointer to the
> same data that 'b' is pointing to. So in the following sequence of
> code:
>     b <- 3;
>     a <- b;
>     b <- 5;
>     write(a);
>
> the number 5 is written to stdout. Of course, :-) this is often not
> desirable behavior.

The problem is that assignment does not have consistent semantics. Let us assume that, in the first assignment, the reference to 3 creates a new object which is set to the value of 3. After the assignment, b points to this object. After the second assignment, a also points to this object. Now we come to the third assignment. I gather that the language starts to create a new object which it plans to set to the value of 5, but then notices that b already points to an integer object and decides to simply modify this object rather than going to the trouble of creating a new object. This is not what the programmer was expecting the language to do!

The solution is simple. You already require that an assignment of the form (a <- b) simply give 'a' a pointer to the object pointed to by b. There is no justification for making the semantics of assignment change when the right hand side is a more complex expression than a simple variable. Require that any assignment of the form (a <- expression) simply give 'a' a pointer to the object computed by the expression.

As an aside, the traditional way to handle numbers in an "assignment-by-reference" language is to have each number refer to a fixed object. Thus the assignment "a <- 5" would set 'a' to point to the object '5' rather than creating a new object. Naturally there is no reason for the implementation to actually set aside storage to hold the object '5'. This approach gives different results than the one given above if you permit pointer comparison. It also gives different results if you allow numbers to be modified, but I would recommend not providing such a feature.

> Motivation for this is several fold. In ICON, there are inconsistencies
> in the treatment of structured objects and scalars. for example, in the
> code:
>     t := s
>     s[5] := 'c'
>
> the variable 't' will feel the effect of the second line's assignment if
> 's' represents a list, but will not if 's' is a string.

The previous example used integers, and it is pretty clear that there is no reason for users to modify integers. In the case of structured objects, the situation is more complex. You could insist that users copy structured objects every time they want to modify them, but if you do this circular structures cannot be constructed, and as structures get more complex they get more difficult to copy. Therefore you have to provide means by which structured objects may be modified. Character strings are structured objects, but their structure is simple and unchanging; thus it is *not* clear that you need to provide functions for modifying character strings.

In the example given, the confusion is caused by the fact that s[5] refers to a substring of s if s is a string, but it does not refer to a sublist of s if s is a list. Thus the semantics of s[5] vary depending upon whether s is a string or a list, *regardless* of which side of the ":=" the s[5] appears. This could be fixed by making a string be a list of characters; I expect that this was not done in ICON for reasons of efficiency.

It may still be reasonably claimed that the assignment s[5] := "c" should assign the string "c" to the result of the expression s[5]. Since the expression s[5] generates a new string, which is not bound to any variable, this assignment should generate a run time error. But then ICON has never claimed to be an assignment-by-reference language. You can convert it to one by disallowing the above statement and providing a similar capability in a subroutine which generates a new character string:

    s := replace_character(s, 5, "c")

Kenneth Almquist
ihnp4!houxm!hropus!ka
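
Python happens to follow the scheme recommended in this reply (assignment always rebinds a name to an object, and numbers and strings cannot be modified in place), so it can serve as a rough illustration. The sketch below is not ICON or CommSpeak code; `replace_character` is written here as a hypothetical helper, and its 1-based position is an assumption chosen to match the s[5] of the example.

```python
def replace_character(s: str, position: int, ch: str) -> str:
    """Return a new string: s with the character at 1-based position replaced."""
    if len(ch) != 1:
        raise ValueError("ch must be a single character")
    if not 1 <= position <= len(s):
        raise IndexError("position out of range")
    return s[:position - 1] + ch + s[position:]


# Numbers are fixed objects: rebinding b leaves a pointing at 3.
b = 3
a = b
b = 5
print(a)            # prints 3

# Lists are modifiable structured objects: both names see the change.
s_list = list("abcdefg")
t_list = s_list
s_list[4] = "c"     # Python indexes from 0, so this is the fifth element
print(t_list[4])    # prints c

# Strings are never modified; a new string is built and s is rebound.
s = "abcdefg"
t = s
s = replace_character(s, 5, "c")
print(s, t)         # prints abcdcfg abcdefg
```

With this design the only objects whose sharing can surprise the programmer are the ones that offer explicit modification operations, such as lists.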