[comp.lang.scheme] Bug in MIT-Scheme

muenx@heike.informatik.uni-dortmund.de (Holger Muenx) (03/21/90)

I think I found a small bug in MIT-Scheme V7.0. Look at the following
transcript:

	1 ]=> (string->symbol "loadConst")

	;Value: loadConst

The resulting symbol consists of uppercase and lowercase letters. It's a bit
disturbing if you look at these lines:

	1 ]=> (eq? (string->symbol "loadConst") 'loadConst)

	;Value: ()

	1 ]=> (eq? (string->symbol "loadConst") 'loadconst)

	;Value: ()

Any idea?

                                                   -Holger

=============================================================================

     Holger Muenx                 muenx@heike.informatik.uni-dortmund.de
     IRB, UniDo
     4600 Dortmund                  "My opinions are shareware. Send $10
     West-Germany                    if you like them."

net@tub.UUCP (Oliver Laumann) (03/22/90)

In article <2063@laura.UUCP> muenx@heike.informatik.uni-dortmund.de (Holger Muenx) writes:
> I think I found a small bug in MIT-Scheme V7.0. Look at the following
> transcript:
> 
> 	1 ]=> (eq? (string->symbol "loadConst") 'loadConst)
> 
> 	;Value: ()

It ain't no bug.  The standard says (in the section explaining
symbol->string):

   "If the symbol was part of an object returned as the value of a literal
   expression [..] then the string returned will contain characters
   in the implementations preferred standard case [..].

   If the symbol was returned by string->symbol, the case of the
   characters in the string returned will be the same as the case in
   the string that was passed to string->symbol."

Thus, since 'loadConst is a literal, the characters are transformed into
the "preferred case" by the reader, which I think is upcase in C-Scheme,
i.e. the name of the symbol actually is LOADCONST.

On the other hand, the case of the characters in the name of the symbol
created by the above call to string->symbol is preserved, which is the
reason why eq? returns () (well, I wish it would return #f, but that's
another story...).

Now let me turn your attention to a real bug in C-Scheme:

The C-Scheme reader refuses to parse numeric constants like #o-10
(i.e. a radix followed by a sign followed by digits).  On the other
hand, it happily accepts something like -#o10.  This must be a bug,
since the standard clearly says that the radix, like the exactness
specification, is a prefix (see page 37 of the P1178/D3 or page 32
of the R^3.99RS), so #o-10 is the correct form, while -#o10 should
be parsed into two objects, the symbol "-" followed by #o10.

Regards,
--
    Oliver Laumann, Technical University of Berlin, Germany.
    pyramid!tub!net   net@TUB.BITNET   net@tub.cs.tu-berlin.de

jinx@ZURICH.AI.MIT.EDU ("Guillermo J. Rozas") (03/22/90)

   Date: 21 Mar 90 13:46:55 GMT
   From: Holger Muenx <mcsun!unido!laura!heike.informatik.uni-dortmund.de!muenx@uunet.uu.net>

   I think I found a small bug in MIT-Scheme V7.0. Look at the following
   transcript:

	   1 ]=> (string->symbol "loadConst")

	   ;Value: loadConst

   The resulting symbol consists of uppercase and lowercase letters. It's a bit
   disturbing if you look at these lines:

	   1 ]=> (eq? (string->symbol "loadConst") 'loadConst)

	   ;Value: ()

	   1 ]=> (eq? (string->symbol "loadConst") 'loadconst)

	   ;Value: ()

   Any idea?

						      -Holger

The R3RS report, in the section on symbols (section 6.4), reads:

"The string->symbol procedure, however, can create symbols for which
this write/read invariance may not hold because their names contain
special characters or letters in the non-standard case."

Given that MIT Scheme uses lower case as the standard case, the print
name for symbol 'loadConst is "loadconst", and (string->symbol
"loadConst") returns a symbol whose print name is "loadConst", so they
are clearly distinguishable.

In other words, there is no way to type the symbol whose print name
is "loadConst", since READ canonicalizes 'loadConst to 'loadconst and
(eq? 'loadConst 'loadconst) -> #t
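
The distinction above can be checked directly. The following is a minimal
sketch, not taken from any of the posts; the names from-reader and
from-string are made up for illustration. It assumes nothing beyond
string->symbol, symbol->string and eq?, but the results depend on the
reader: the comments describe a case-folding reader (as in the MIT Scheme
discussed here) and, for comparison, a case-sensitive one.

```scheme
;; Sketch only: results depend on whether the reader folds case.
(define from-reader 'loadConst)                    ; symbol produced by READ
(define from-string (string->symbol "loadConst"))  ; case of the string is kept

(display (symbol->string from-reader)) (newline)
;; case-folding reader with lower-case standard: loadconst
;; case-sensitive reader: loadConst

(display (symbol->string from-string)) (newline)
;; loadConst in either kind of implementation

(display (eq? from-reader from-string)) (newline)
;; false on a case-folding reader, true on a case-sensitive one

(display (eq? from-string (string->symbol "loadConst"))) (newline)
;; true: the same string always yields the same symbol
```

On a case-folding reader the only way to get hold of the mixed-case symbol
again is another call to string->symbol with the same string.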

It would be nice if the reports included a procedure called INTERN (or
something like that) which when given a string, would canonicalize it
(transform it to the standard case) and then use string->symbol on it.
The report does not have such functionality, although it can be
defined portably (using some inessential procedures), albeit somewhat
awkwardly:

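(define intern
  ;; Determine once whether the reader folds symbols to lower case.
  (let ((standard-is-lower?
	 (string=? "a" (symbol->string 'A))))
    (lambda (string)
      ;; Fold the string to the standard case, then make the symbol.
      (string->symbol
       (list->string
	(map (if standard-is-lower?
		 char-downcase
		 char-upcase)
	     (string->list string)))))))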

MIT Scheme (>= 7.0) has INTERN pre-defined.

cmaeda@A.GP.CS.CMU.EDU (Christopher Maeda) (03/24/90)

   Date: Thu, 22 Mar 90 07:01:42 est
   From: "Guillermo J. Rozas" <jinx@zurich.ai.mit.edu>
   Reply-To: jinx@zurich.ai.mit.edu

   It would be nice if the reports included a procedure called INTERN (or
   something like that) which when given a string, would canonicalize it
   (transform it to the standard case) and then use string->symbol on it.
   The report does not have such functionality, although it can be
   defined portably (using some inessential procedures), albeit somewhat
   awkwardly:

It should be mentioned that Common Lisp doesn't do this either.
Calls to intern (in Common Lisp) need to upcase the symbol names.

i.e.
(intern "ZiPpY") ==> |ZiPpY|
(intern (string-upcase "ZiPpY")) ==> ZIPPY ;; depends on print case variable

jinx@ZURICH.AI.MIT.EDU ("Guillermo J. Rozas") (03/26/90)

    Now let me turn your attention to a real bug in C-Scheme:

    The C-Scheme reader refuses to parse numeric constants like #o-10
    (i.e. a radix followed by a sign followed by digits).  On the other
    hand, it happily accepts something like -#o10.  This must be a bug,
    since the standard clearly says that the radix, like the exactness
    specification, is a prefix (see page 37 of the P1178/D3 or page 32
    of the R^3.99RS), so #o-10 is the correct form, while -#o10 should
    be parsed into two objects, the symbol "-" followed by #o10.


This is not a bug.  The syntax of numbers has changed between R^3RS
and R^3.99RS.  In R^3RS, the sign preceded the radix specifier.  Look
at the production for <number> on page 30.  In R^3.99RS (and
presumably in the final version, R^4RS), the radix specifier precedes
the sign.  Since R^4RS hasn't been released, nor has the IEEE standard
been approved, you can hardly expect a 1-year-old release of MIT
Scheme to match them.
C-Scheme beta release 7.1 (to be released sometime in the not-too-far
future) matches R^3.99RS in the syntax and semantics of numbers.
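
The two spellings can be compared without involving the reader by going
through string->number. This is a minimal sketch that assumes an
implementation following the R^3.99RS/R^4RS number syntax and the
one-argument form of string->number, which returns #f when the string is
not valid number syntax. An R^3RS-era system such as MIT Scheme 7.0 accepts
the other spelling, and its string->number takes different arguments.

```scheme
;; Assumes R^4RS-style number syntax: radix prefix first, then the sign.
(display (string->number "#o-10")) (newline)  ; -8: octal 10 is 8, negated
(display (string->number "-#o10")) (newline)  ; #f: sign before the radix
                                              ; prefix is not a number
```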

cph@ZURICH.AI.MIT.EDU (Chris Hanson) (03/27/90)

   Date: 22 Mar 90 09:50:24 GMT
   From: Oliver Laumann <mcsun!unido!tub!net@uunet.uu.net>

   Now let me turn your attention to a real bug in C-Scheme:

   The C-Scheme reader refuses to parse numeric constants like #o-10
   (i.e. a radix followed by a sign followed by digits).  On the other
   hand, it happily accepts something like -#o10.  This must be a bug,
   since the standard clearly says that the radix, like the exactness
   specification, is a prefix (see page 37 of the P1178/D3 or page 32
   of the R^3.99RS), so #o-10 is the correct form, while -#o10 should
   be parsed into two objects, the symbol "-" followed by #o10.

If you had read a little more carefully you would have found that the
syntax changed between R3RS and R3.99RS.  MIT Scheme 6.1.2 and 7.0
conform to R3RS, accepting `-#o10' and refusing `#o-10'.  The current
version of MIT Scheme, to be released sometime as version 7.1,
conforms to R3.99RS and the standard.

Given that neither the standard nor R4RS is yet finished, I think it's
fair to say that the current MIT Scheme conforms to current standards.