[comp.lang.smalltalk] literals

bmc@argus.UUCP (Bob Czech) (07/03/87)

	I've been working on a variation of the smalltalk model and in studying
the VM I've found that if you have a string constant in a method and assign it
to an instance variable and somewhere down the line you make a modification to
that string, that the constant in the method would hence change.  Is this
correct?  And if so, I see it as a major flaw that you would be able to modify
a constant!

rentsch@unc.cs.unc.edu (Tim Rentsch) (07/04/87)

In article <938@argus.UUCP> bmc@argus.UUCP (Bob Czech) writes:
> I've been working on a variation of the smalltalk model and in
> studying the VM I've found that if you have a string constant in a
> method and assign it to an instance variable and somewhere down the
> line you make a modification to that string, that the constant in the
> method would hence change.  Is this correct?  And if so, I see it as
> a major flaw that you would be able to modify a constant!  

That is correct, even though somewhat surprising when first
encountered.  The easiest way to correct the undesired behavior is
to use
			iv  <-  'foo' copy.
rather than
			iv  <-  'foo'.
to do the assignment.
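
To make the difference concrete, here is a minimal sketch.  It assumes a
classic Smalltalk-80 image in which string literals are ordinary mutable
Strings.  The class Holder, its instance variable iv, and both selectors
are hypothetical, and `_` is the assignment arrow written `<-` above.

```smalltalk
"Hypothetical class Holder with a single instance variable, iv."

!Holder methodsFor: 'demonstration'!

shared
	"Store the literal object itself; every call answers the same String."
	iv _ 'foo'.
	^iv!

private
	"Store a fresh copy; the literal in this method is never handed out."
	iv _ 'foo' copy.
	^iv! !

"In a workspace (assuming mutable literals):"
| h |
h _ Holder new.
h shared at: 1 put: $g.		"modifies the literal held by Holder>>shared"
h shared.			"now answers 'goo'"
h private at: 1 put: $g.	"modifies only the copy"
h private.			"still answers 'foo'"
```

The copy costs one allocation per call, which is the price of keeping the
method's literal out of reach of later at:put: messages.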


This feature may indeed be a flaw from the programming language
design point of view, but it is not wrong from the programming
language definition point of view.  The token between single
quotes is a string *literal*, not a string *constant*.  The
distinction is shown clearly as far back as System/360 assembly
language.  Consider the code fragment

	LA	1,5                put the number 5 in register 1
	ST	1,=F'123456'       store the result over the literal
	L	2,=F'123456'       load the value in the literal cell
	...

Pretty obviously this code will result in the value 5 ending up
in register 2.  (No argument about whether this is good or bad
coding style, the programmer obviously should be shot.)

As long as you remember that objects are actually out there for
literals, this all makes perfect sense, and applies to other literals
such as Symbols and Arrays.  

All of the problems mentioned so far are basically with at:put:,
either on the literal itself or its contained objects in the case of
Arrays.  But this is not the only problem -- use of become: on a
literal will also proceed as usual, and almost certainly wreak havoc
of some sort, if not on the program then on the sanity of the person
working on the code.

My conclusion:  not an altogether satisfactory mechanism, but literals
are still very useful, and I can't immediately suggest anything
better.  Any ideas?

cheers,

Tim

steele@unc.cs.unc.edu (Oliver Steele) (07/04/87)

From steele Sat Jul  4 10:29:54 EDT 1987

In article <938@argus.UUCP> bmc@argus.UUCP (Bob Czech) writes:
>
>	I've been working on a variation of the smalltalk model and in studying
>the VM I've found that if you have a string constant in a method and assign it
>to an instance variable and somewhere down the line you make a modification to
>that string, that the constant in the method would hence change.  Is this
>correct?  And if so, I see it as a major flaw that you would be able to modify
>a constant!

Except for the term "string _constant_", it is correct.  Strings and other
arrays (you will get the same behavior if you use #(this is a test) in a
method and later 'at: 1 put: #that' it) are not constants in Smalltalk.

This behavior is true of most languages that I can think of.  Try:

TRS-80 Disk BASIC:
	10 A$ = "Hi"
	20 LSET A$ = "Ho"
	RUN
	LIST

C on any machine without an MMU:
	char a[] = "Hi";
	main()
	{
	  a[1] = 'o';
	  puts(a);
	}

PDP-11 FORTRASH:
	    CHARACTER	A(2)
	    DATA	A/'H','I'/
	    A(2) = 'O'
	    TYPE 10, A(1), A(2)
    	10  FORMAT(2A1)

CSI FORTH:
	: t " Hi" ;
	121 t 2+ !
	t COUNT PRINT

Franz Lisp:
	(defun a
	       ()
	       '(this is a test))
	(rplaca (a) 'that)
	(pp a)


It may be confusing, but it seems very much the norm among languages.
What you want is 'instance _ 'Hello' copy', or 'stream _ WriteStream on:
String new'.

Many BASICs do this copy automatically for you in the case of an
assignment unless the lvalue is an LSET, RSET, or INSTR, so strings really
do act as constants for most purposes in that language except when you use an
LSET.

------------------------------------------------------------------------------
Oliver Steele				  ...!{decvax,ihnp4}!mcnc!unc!steele
							steele%unc@mcnc.org

	"They're directly beneath us, Moriarty.  Release the piano!"

stevev@tekchips.UUCP (07/06/87)

In discussing the problem of a user modifying an object that is used
as a string literal ...

In article <740@unc.cs.unc.edu>, rentsch@unc.cs.unc.edu (Tim Rentsch) writes:
> 
> literals
> are still very useful, and I can't immediately suggest anything
> better.  Any ideas?

I see two possibilities:
  * Have the compiler generate code to make a copy of any string literal
    whenever it is accessed.  This obviously slows things down.
  * Define both mutable and immutable versions of String (and Array) and
    have the compiler produce immutable ones to represent literals.

The latter does not address the problem of a programmer using
"instVarAt:put:" or "become:", but I see that as a different issue:
the language does not have the facility to hide "dangerous" operations
from the programmer.

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon

johnson@uiucdcsp.cs.uiuc.edu (07/07/87)

*	I've been working on a variation of the smalltalk model and in studying
*the VM I've found that if you have a string constant in a method and assign it
*to an instance variable and somewhere down the line you make a modification to
*that string, that the constant in the method would hence change.  Is this
*correct?  And if so, I see it as a major flaw that you would be able to modify
*a constant!

That is correct.  It is one of several flaws in the language design.  It is
no real consolation, but this particular problem can be avoided by using
symbols instead of strings, since symbols are not modifiable.  However, the
problem resurfaces with "constant" arrays.  There probably need to be more
unchangeable classes, and all constants need to be one of them.  If you
want to use a constant as the initial state of an object then you should
make a copy of it.  Note that constants like SmallIntegers cannot be
modified.

jans@tekchips.UUCP (07/09/87)

>*	I've been working on a variation of the smalltalk model and in studying
>* the VM I've found that if you have a string constant in a method and assign
>* it...  I see it as a major flaw that you would be able to modify a constant!
>
>That is correct.  It is one of several flaws in the language design...  Note
>that constants like SmallIntegers cannot be modified.

*That* is the flaw in the language!  SmallIntegers are not bona fide objects!

>...this particular problem can be avoided by using symbols instead of strings,
>since symbols are not modifiable.

Wrongo.  Try

	x _ #flubber.
	x basicAt: 3 put: $i.

Symbols are objects, and like all other objects, they may entertain requests
to change their contents.  The flaw in this case is not with the language, but
with the understanding of what is happening.  As someone else pointed out,
there is no such thing as a constant object in Smalltalk.  (SmallIntegers aren't
really objects.)

One could easily "demonstrate" that English is "flawed" with respect to
certain Eskimo tongues, since English lacks 30 odd words for describing
frozen water.  However, excepting Alaska, English has little use for such
a facility, having instead thousands of words for technical terms, words
that are borrowed by other languages around the globe.

Smalltalk suffers not from its lack of constants, but rather from biased
notions of what a language should be.  You want a constant?  Subclass and
override all the accessing protocol!

johnson@uiucdcsp.cs.uiuc.edu (07/11/87)

I claimed that it was a flaw in Smalltalk that "constant arrays" were not
constants, and offered symbols as an example of a real constant. jan@tekchips
claimed that I was wrong, since one could change a symbol using basicAt:put:.
This is entirely beside the point.  If one mistakenly hands a symbol to an
object that thinks it is a string, the object will send at:put: to it and
find out that it is a "constant", since it doesn't understand at:put:.
basicAt:put: is really only for debuggers and the like.  If someone comes
complaining to me that an object using basicAt:put: changed some other
object (like a dictionary or symbol) in unforeseen ways, I will NOT be
sympathetic.  

jan@tekchips says that if I want constants, I should make them by subclasses.
That is exactly my complaint.  The flaw in Smalltalk is that I cannot.
So-called constant arrays are by default of class Array, which is not
constant.  They should instead be in class ConstantArray, which has no
at:put: message.  It will probably have a basicAt:put: message, but that
won't bother me.

"Symbols are objects, and like all other objects, they may entertain requests
to change their contents."  However, symbols are usually very reluctant to
change their contents.  Other "constants" should be, too. 

jans@tekchips.TEK.COM (Jan Steinman) (07/14/87)

johnson@uiucdcsp:
>jan@tekchips says that if I want constants, I should make them by subclasses.
>That is exactly my complaint.  The flaw in Smalltalk is that I cannot.
>So-called constant arrays are by default of class Array, which is not
>constant.  They should instead be in class ConstantArray, which has no
>at:put: message.  It will probably have a basicAt:put: message, but that
>won't bother me.

Again, the flaw is not in Smalltalk.  Adding this code to your image will add a 
new class that exhibits array accessing behavior similar to that of Symbols.  I 
don't know what "so-called constant arrays" are (literals?), but now you have 
"real" ConstantArrays!  (I still maintain that there is no such thing as a 
constant in Smalltalk!)  The compiler could be hacked to cause the "#()" 
notation to generate ConstantArrays, but that is left as an exercise for the 
reader.  This took less than one minute to write in Smalltalk.  After adding 
this code, try evaluating:

	c _ #('test' #foo 456 1.01) asConstantArray.
	c at: 2 put: #bar.

--------------  ConstantArrayHack.st --------------------
Array variableSubclass: #ConstantArray
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Collections-Hacked'!

ConstantArray comment: 'This class removes Array storage protocol for those who 
feel that Smalltalk constants are needed.'!

!ConstantArray methodsFor: 'accessing'!

at: index put: anObject
	"ConstantArrays do not allow modification of their contents."

	self error: self class name, 's cannot be modified!!'! !

!Array methodsFor: 'converting'!

asConstantArray
	"Return an unmodifiable copy of the receiver."

	| constant |
	constant _ ConstantArray new: self size.
	1 to: self size do:
		[:i| constant basicAt: i put: (self at: i)].
	^constant! !

steele@unc.cs.unc.edu (Oliver Steele) (07/15/87)

In article <80500010@uiucdcsp> johnson@uiucdcsp.cs.uiuc.edu writes:
>
>I claimed that it was a flaw in Smalltalk that "constant arrays" were not
>constants, and offered symbols as an example of a real constant.

I agree.  In an earlier article I tried to show that Smalltalk acted the
same as many other languages in this respect, but I think that this just
shows that many other languages are flawed too.

An example of why it is a flaw is that code such as

    squares

	| stream between |
	stream _ WriteStream on: ''.
	between _ '('.
	1 to: 10 do:
	    [:i |
		stream nextPutAll: between.
		i*i printOn: stream.
		between _ ', '.
		].
	stream nextPutAll: ')'.
	^stream contents

will only work correctly once, and that after it has been run once the
source no longer reflects the compiled method.  It's obvious to an
experienced Smalltalk programmer what is going on, but a run-time error
would be preferable to a subtle piece of self-modifying code.
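
One way to sidestep the problem, following the earlier advice to write
'WriteStream on: String new', is sketched here.  It is the same method with
a fresh String in place of the '' literal, so the stream never writes into
an object owned by the compiled method.  This assumes a classic
Smalltalk-80 image and has not been tried on version 0.3.

```smalltalk
squares
	"Answer the squares of 1 through 10 as a parenthesized, comma-separated
	 String.  The stream is built on a new String rather than on a literal."

	| stream between |
	stream _ WriteStream on: String new.
	between _ '('.		"only read, never modified, so a literal is safe here"
	1 to: 10 do:
	    [:i |
		stream nextPutAll: between.
		(i * i) printOn: stream.
		between _ ', '].
	stream nextPutAll: ')'.
	^stream contents
```

The literals '(' and ', ' can stay as they are because nothing sends them a
modifying message; only the collection handed to the stream needs to be
fresh.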

>jan@tekchips says that if I want constants, I should make them by subclasses.
>That is exactly my complaint.  The flaw in Smalltalk is that I cannot.
>So-called constant arrays are by default of class Array, which is not
>constant.  They should instead be in class ConstantArray, which has no
>at:put: message.  It will probably have a basicAt:put: message, but that
>won't bother me.

This isn't the problem.  I just created ConstantString and ConstantArray,
which work as you would expect, and added conversion messages to go
between Constant<x> and <x>.  Then I changed Scanner|xStringLit and
Scanner|scanVector (I've got the names wrong, but they're pretty easy to
find) to return their contents asConstantString and asConstantArray, and
everything worked fine, up to a point.  Two points.

The point was that some Smalltalk code on my system (MacPlus with version
0.3) assumed that it would be able to modify strings passed to it.
Evaluating strings such as
			FileStream fileNamed: 'foo'
would give an error when a message tried to modify 'foo'.  This is just
sloppy programming, and could probably be eliminated without too much
trouble (if it's even present in other systems).

The other point is that code that does
		fee _ #(fi fo fum) copy.
will expect fee to be mutable.  I suspect I would have run into this if I
had recompiled much of the system after changing Scanner.  The only
workaround I can see is to let Constant<x>|copy be the same as
Constant<x>|as<x>, but this is very misleading and changes the semantics
of copy (copy no longer always returns an object of the same class).
Comments?

------------------------------------------------------------------------------
Oliver Steele				  ...!{decvax,ihnp4}!mcnc!unc!steele
							steele%unc@mcnc.org

	"They're directly beneath us, Moriarty.  Release the piano!"

allenw@tekchips.TEK.COM (Brock) (07/17/87)

In article <805@unc.cs.unc.edu>, steele@unc.cs.unc.edu (Oliver Steele) writes:
> The other point is that code that does
> 		fee _ #(fi fo fum) copy.
> will expect fee to be mutable.  I suspect I would have run into this if I
> had recompiled much of the system after changing Scanner.  The only
> workaround I can see is to let Constant<x>|copy be the same as
> Constant<x>|as<x>, but this is very misleading and changes the semantics
> of copy (copy no longer always returns an object of the same class).
> Comments?

The use of the "species" message to determine what the class of a copy of a
ConstantArray should be might be appropriate.  Alternatively,
copy for a constant could be defined to return self (this
is what "immutable" objects (Symbol and Character) in
Smalltalk-80 currently do).  The latter would result in
the "fee _ #(fi ..." expression above not doing what was intended.

The simplest answer to the issue raised above may be that Smalltalk
already has a syntax for accomplishing what was intended by the "fee"
expression.  It is:

	fee _ Array with: #fi with: #fo with: #fum.
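
For comparison, the "species" idea might be sketched as follows.  This is a
rough sketch, not a tested change: it builds on the ConstantArray class and
asConstantArray message posted earlier in this thread, and it assumes an
image in which the default copy does not consult species, so copy is
overridden as well.

```smalltalk
!ConstantArray methodsFor: 'copying'!

species
	"Answer the class to use for copies and derived collections:
	 a plain, mutable Array (assumption of this sketch)."
	^Array!

copy
	"Answer a mutable Array holding the same elements as the receiver."
	| mutable |
	mutable _ self species new: self size.
	1 to: self size do:
		[:i | mutable at: i put: (self at: i)].
	^mutable! !

"In a workspace:"
| fee |
fee _ #(fi fo fum) asConstantArray copy.
fee at: 1 put: #fee.		"accepted, since fee is an ordinary Array"
```

With this definition copy no longer answers an object of the receiver's
class, which is exactly the change in semantics raised in the quoted
article.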


	Allen Wirfs-Brock
	Software Productivity Technologies
	Tektronix, Inc
	allenw@spt.TEK.COM

franka@mmintl.UUCP (Frank Adams) (07/24/87)

In article <805@unc.cs.unc.edu> steele@unc.UUCP (Oliver Steele) writes:
>In article <80500010@uiucdcsp> johnson@uiucdcsp.cs.uiuc.edu writes:
>>jan@tekchips says that if I want constants, I should make them by subclasses.
>>That is exactly my complaint.  The flaw in Smalltalk is that I cannot.
>
>This isn't the problem.  I just created ConstantString and ConstantArray,
>....  Then I changed Scanner|xStringLit and Scanner|scanVector ...
>to return their contents asConstantString and asConstantArray ...

This is fine for Smalltalk/80[TM].  Unfortunately, the folks at Digitalk saw
fit to hide their compiler.  This is the only serious complaint I have about
Smalltalk/V[TM].

Not so incidentally, I have reverse-engineered their compiler, and now have
source code for everything on my system.  At some point, I will try to
separate this from all the other changes I have made, and maybe even post it
(if there is sufficient demand).  Don't expect it any time real soon, though.

Some hints if you wish to try this yourself:

The compiler is in a group of classes whose names are all blank (e.g.,
"'  ' asSymbol").  There are various places in the system which skip over
all classes with such names -- notably the Class Hierarchy Browser; also a
method in SystemDictionary called (if I remember right) "getSourceClasses".

The first 3 bytes of a CompiledMethod are header information.  The first
byte is the primitive number.  The second byte represents the number of
local variables, including block arguments.  This byte is ones-complemented
if the method contains blocks which are not optimized away.  The third byte
is the number of arguments to the method.

The last 3 bytes of a compiled method are the pointer to the source code.
They will be zero for those methods for which the source code is unavailable.
If the 4th byte from the end is less than 134, it is the size of the literal
table immediately preceding; otherwise there is no literal table.  The
literal table is used only by the memory manager; literals used by the
method are encoded directly in the byte code stream as necessary.
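
The layout just described could be decoded along these lines.  This is only
a sketch of the description above, not Digitalk's code.  It assumes that
`bytes` already holds the raw bytes of one compiled method as an indexable
collection of integers (how to obtain it is not covered), that a
complemented local count can be recognized by its high bit, and that `_` is
replaced by `:=` in dialects that spell assignment that way.  All variable
names are hypothetical.

```smalltalk
| primitive localsByte hasBlocks locals args n hasSource marker literalTableSize |

"Header: the first three bytes."
primitive _ bytes at: 1.
localsByte _ bytes at: 2.
hasBlocks _ localsByte >= 128.	"assumption: fewer than 128 locals, so a set high bit means complemented"
locals _ hasBlocks
	ifTrue: [255 - localsByte]	"undo the ones-complement of an 8-bit value"
	ifFalse: [localsByte].
args _ bytes at: 3.

"Trailer: the last three bytes are the source pointer, all zero if no source."
n _ bytes size.
hasSource _ ((bytes at: n - 2) + (bytes at: n - 1) + (bytes at: n)) > 0.

"The 4th byte from the end gives the literal table size when it is below 134."
marker _ bytes at: n - 3.
literalTableSize _ marker < 134
	ifTrue: [marker]
	ifFalse: [0].
```

The byte order of the source pointer is not stated above, so the sketch
only tests it for zero rather than assembling it into an integer.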

The Smalltalk/V[TM] virtual machine is similar to that of Smalltalk/80[TM],
but uses entirely different byte codes.  The byte code set is sparse.  Note
that different byte codes are used for accessing local variables and for
returning in methods which have blocks.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108