[comp.lang.eiffel] class STRING model needs fixing

shelley@atc.sps.mot.com (Norman K. Shelley) (02/15/90)

New Desired Upgrade for Eiffel 3.0 (14Feb90)

Currently Using; Eiffel 2.2B 

Class upgrade: STRING


Problem Statement: 
1.) The current Eiffel model of the class "STRING" is based on the C model of
strings found in the C subroutine library. The model requires an allocation of
one more byte than the number of characters in a string and this byte is the
last byte in a string and it is NULL signifing the end of the string. String
lengths are dynamically computed and there is no idea of saving/updating this
state.

2.) Eiffel needs a better model that reflects the unique features Eiffel offers
without sacrificing execution speed. Assertions heavily rely on checking the
length of the string. This is a function and not an attribute thereby demanding
much more CPU resources then should be required.

Givens:
1.) The feature "count" of class "STRING" is an Eiffel function that
calls the C
function "strlen". When "strlen" is called it takes the input pointer
and starts
a count of the number of characters pointed to by the input pointer. It stops
the count when it finds a NULL character. Therefore this function is expensive
in time. This can become compounded when it is called repeatedly or for long
strings.

2.) The class "STRING" uses the function "count" in MANY of its assertions
therefore increasing the number of calls to this expensive function.

3.) Not counting uses of the function "count" in assertions it is called in
seven different places (mainly from append, prepend, extend, and precede)

Desired Features:
1.) Until a more suitable model is found and implemented the function "count"
should be implemented as an attribute.


Problems in modifing class STRING:
1.)  Only an I.S.E. employee can change I.S.E.'s compiler and the compiler
treats class "STRING"  specially e.g. a constant STRING never calls the normal
STRING create therefore the count field can not be set for STRING constants.

2.) Other C calls like "rm_all" change the string with no concept of updating a
dynamic count field.

Norman Shelley
Motorola - ATC
2200 W. Broadway AZ09/M350
Mesa, AZ 85202
...!uunet!dover!atc!shelley
shelley@atc.sps.mot.com
(602) 962-2473

alonzo@microsoft.UUCP (Alonzo GARIEPY) (02/20/90)

In article <48a5572e.12c9a@digital.sps.mot.com> shelley@atc.sps.mot.com (Norman K. Shelley) writes:
> Problem Statement: 
> 
> 2.) Eiffel needs a better model that reflects the unique features Eiffel 
> offers without sacrificing execution speed. Assertions heavily rely on 
> checking the length of the string. This is a function and not an attribute 
> thereby demanding much more CPU resources then should be required.

I'm not convinced that the speed performance of assertions is of primary 
concern.  Assertions will be turned off in high-performance versions of 
a program.  The speed and memory performance of production programs on 
production machines is more important.

Zero-terminated strings are convenient because they save memory and 
interface well with runtime libraries intended for use with C.  Many 
algorithms execute faster with zero-terminated strings because a separate 
index need not be maintained.  If your style of programming uses lengths 
and indexes more than pointers or iterators, counted strings are probably 
better.  

I tend to agree that Eiffel is such a lengths-and-indexes language, and that 
the use of zero-terminated strings isn't buying much performance except to 
avoid continual conversions when dealing with the operating system.  But 
conversion can require a very annoying amount of memory management.  The 
ideal would be a string type that has both a length attribute and zero- 
termination.  You can keep the length and termination updated all the time 
or calculate and cache one of them as necessary.  We have been using strings 
of this kind at Microsoft for years.

Zero terminated strings are a more primitive type, so it doesn't surprise
me that languages rely on the programmer to optimize the length calculation
themselves.  A little more support for compile time calculation would make
subclassing a viable solution to this kind of problem: constant objects could 
be fully constructed at compile time, making a programmer's classes equal in
performance to the built-in ones.

Alonzo Gariepy				// The opinions expressed above are not
alonzo@microsoft			// necessarily those of Microsoft Corp.

bbadger@x102c.harris-atd.com (Badger BA 64810) (02/24/90)

In article <51017@microsoft.UUCP> alonzo@microsoft.UUCP (Alonzo GARIEPY) writes:
[...]
>I tend to agree that Eiffel is such a lengths-and-indexes language, and that 
>the use of zero-terminated strings isn't buying much performance except to 
>avoid continual conversions when dealing with the operating system.  But 
>conversion can require a very annoying amount of memory management.  The 
>ideal would be a string type that has both a length attribute and zero- 
>termination.  You can keep the length and termination updated all the time 
>or calculate and cache one of them as necessary.  We have been using strings 
>of this kind at Microsoft for years.
[...]
>Alonzo Gariepy				// The opinions expressed above are not
>alonzo@microsoft			// necessarily those of Microsoft Corp.

In a recent issue of ACM SIGPlan notices, a hybrid technique was
described which puts a string length indicator at the end of the
string buffer.  That is, when a fixed-length buffer is allocated, the
*last* byte tells how many bytes to subtract to find the
zero-terminator byte.  By design, this is exactly 0 when the last byte
itself must serve as the zero terminating byte.  

I don't recall if this could be extended to buffers of size > 256.  

Something like buf[last] = number of bytes before buf[last] which 
contain the count.  Might do it:
"ABCDEFGHIJ" =  buf[10] = 0
"ABCDEFGHI." =  buf[10] = 1, buf[9]= 0
"ABCDEFGH.." =  buf[10] = 1, buf[9]= 1, buf[8] = 0

Length of 300 - 259 = 

"0123 ... BCDEFGH.." =  buf[300] = 2, buf[299]= 1(*256), buf[298] = 3
 
----
Bernard A. Badger Jr.	407/984-6385 |"Get a LIFE!" --- J.H. Conway
bbadger@x102c.ess.harris.com         |Buddy, can you paradigm?
bbadger%x102c@trantor.harris-atd.com |'s/./&&/g' Tom sed expansively.