[comp.lang.smalltalk] Smalltalk scaleability & IdentityDictionaries

CWatts@BNR.CA (Carl Watts) (06/27/91)

Gray Huggins of Texas Instruments recently sent me mail asking
questions and expressing concerns about Smalltalk scalability.  The
example he gave was the limit on the number of elements in an
IdentityDictionary.  I think my response (which follows) will be of
interest to other members of comp.lang.smalltalk and comp.object:

Gray,

Smalltalk scales better than any other language I know of.  The
reason is simple: you can model everything from the Space Shuttle to an
Integer the same way, as an object (that perhaps uses other objects)
that provides an encapsulated interface to some behavior.  A Space
Shuttle object may use thousands of other objects to provide its
behavior, and an Integer may use none, but the ways in which you treat
the two objects are pretty much the same.  They are both Objects, and
they both have a message interface that lets you interact with them.

Smalltalk was designed from the start to not have any problems
scaling.  What other language do you know of that can as easily
handle 2 + 3 as it can evaluate:

958685745856745747465784756948574847584737384877584 +
9293458823458723745734875847574874875748874374373646

Smalltalk tells you the answer to the first is 5 and the answer to the
second is 10252144569315469493200660604523449723333611759251230.
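For comparison, Python's built-in `int` is also arbitrary precision, so both sums can be checked there with the same `+` operator and no special classes:

```python
# Python ints grow as needed, so small and huge operands use the same +.
small = 2 + 3
big = (958685745856745747465784756948574847584737384877584
       + 9293458823458723745734875847574874875748874374373646)

print(small)  # 5
print(big)    # 10252144569315469493200660604523449723333611759251230
```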

Smalltalk was the first general-purpose language I know of that allowed
arbitrary-precision Integer arithmetic.  It's all thanks to the classes
Integer, LargePositiveInteger, and LargeNegativeInteger.
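The technique behind classes like LargePositiveInteger can be sketched as follows. The magnitude is stored as an array of digits, and addition works column by column, carrying into the next digit. This is a minimal illustration, not Smalltalk's actual code. Base 256 with the least significant digit first is an assumption. The helper names `to_digits`, `from_digits`, and `add_digits` are hypothetical. Signs (the LargeNegativeInteger side) are left out, and Python's own ints are used only to convert values in and out.

```python
from typing import List

BASE = 256  # assumed digit size: one byte per digit


def to_digits(n: int) -> List[int]:
    """Split a non-negative int into base-256 digits, least significant first."""
    if n < 0:
        raise ValueError("this sketch handles magnitudes only")
    digits: List[int] = []
    while n:
        n, d = divmod(n, BASE)
        digits.append(d)
    return digits or [0]


def from_digits(digits: List[int]) -> int:
    """Rebuild an int from least-significant-first digits."""
    n = 0
    for d in reversed(digits):
        n = n * BASE + d
    return n


def add_digits(a: List[int], b: List[int]) -> List[int]:
    """Schoolbook addition: add column by column, carrying into the next digit."""
    result: List[int] = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        carry, digit = divmod(da + db + carry, BASE)
        result.append(digit)
    if carry:
        result.append(carry)
    return result


x = 958685745856745747465784756948574847584737384877584
y = 9293458823458723745734875847574874875748874374373646
total = from_digits(add_digits(to_digits(x), to_digits(y)))
print(total)  # 10252144569315469493200660604523449723333611759251230
```

The result grows by one digit only when the final carry is non-zero, which is why the representation never needs a fixed size.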

The IdentityDictionary problem is extremely easy to solve.  Just get
out of the mindset that the Classes that come with Smalltalk are
somehow holier-than-holy and are by definition perfect and complete.

They aren't.  Every Class in Smalltalk is implemented to fulfil a
certain purpose, and certain implementation compromises and design
choices are always made.  When IdentityDictionary was implemented, some
of the compromises and design choices made were:

1)  Speed over Space.  Hashing was used for speed at the cost of
additional space.

2)  A single object with indexable instance variables rather than a
tree-based representation, which costs more objects and whose benefits
only become apparent when a large number of items are in the
Dictionary.
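Design choices 1 and 2 can be pictured with a small identity-keyed table. It is stored in flat arrays and uses open addressing (linear probing). This is a hypothetical sketch, not ParcPlace's or Digitalk's implementation. Python's `id()` stands in for an object's identity hash, and keys are compared with `is`. Capacity is fixed, so a full table raises an error instead of growing. Deletion, which would need tombstones, is omitted. The class name `FlatIdentityTable` is invented for illustration.

```python
from typing import Any, List

_EMPTY = object()  # sentinel marking an unused slot


class FlatIdentityTable:
    """Identity-keyed map kept in two flat arrays (open addressing)."""

    def __init__(self, capacity: int = 8) -> None:
        self._keys: List[Any] = [_EMPTY] * capacity
        self._values: List[Any] = [None] * capacity
        self._size = 0

    def _find_slot(self, key: Any) -> int:
        """Index of key's slot, or of the empty slot where it belongs; -1 if full."""
        capacity = len(self._keys)
        # id() stands in for an identity hash; the low bits of CPython ids
        # are usually zero because of alignment, so they are dropped.
        index = (id(key) >> 4) % capacity
        for _ in range(capacity):
            occupant = self._keys[index]
            if occupant is _EMPTY or occupant is key:  # identity, not equality
                return index
            index = (index + 1) % capacity  # linear probing
        return -1

    def __setitem__(self, key: Any, value: Any) -> None:
        index = self._find_slot(key)
        if index < 0:
            raise OverflowError("table is full; a real class would grow and rehash")
        if self._keys[index] is _EMPTY:
            self._keys[index] = key
            self._size += 1
        self._values[index] = value

    def __getitem__(self, key: Any) -> Any:
        index = self._find_slot(key)
        if index < 0 or self._keys[index] is _EMPTY:
            raise KeyError(key)
        return self._values[index]

    def __len__(self) -> int:
        return self._size
```

While the table is sparse, a lookup touches only a few adjacent slots. That is the speed the extra (mostly empty) space buys.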

The class was not necessarily meant to be the perfect implementation of
IdentityDictionary for all possible uses.

The answer to your problem should be obvious now.  Just change
IdentityDictionary or make a new kind of IdentityDictionary.  Simple.

Of course, the question then comes up: what new design compromises and
design choices do you want to make for this new class?

a)  Do you want to change the implementation of IdentityDictionary
itself to use, say, a balanced tree representation?

b)  Do you want IdentityDictionaries to turn into
LargeIdentityDictionaries when they reach a certain number of elements
(much like a SmallInteger turns into a LargePositiveInteger when it
gets big enough)?  Then the LargeIdentityDictionary can use a different
storage representation that is more appropriate when you have large
numbers of elements.

Regardless of which of these you choose, what new representation do
you want?  Multiple Arrays?  Trees of IdentityDictionaries?
Self-balancing splay trees of node elements?

Each of these is a new design decision for your new kind of
IdentityDictionary.  Your new class will serve some particular need.
IdentityDictionary was constructed to serve the most common need for a
class like this: a fast, relatively efficient IdentityDictionary.  It
didn't promise to be all things to all people.

We chose to write a new kind of IdentityDictionary called
LargeIdentityDictionary.  IdentityDictionaries turn themselves into
LargeIdentityDictionaries when they have more than 16000 elements in
them.  The converse also happens: if a LargeIdentityDictionary drops
below 8000 elements, it turns itself back into a normal
IdentityDictionary.
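Below is a sketch of the promote/demote behaviour with two thresholds. The gap between them acts as hysteresis, so a dictionary hovering near one size doesn't flip representations on every add and remove. The post doesn't say how LargeIdentityDictionary stores its elements, so the large form here is an assumption: parallel arrays kept sorted by identity and searched with binary search, giving O(log n) lookup. Inserting into those arrays still shifts elements, so a balanced tree would be needed to make updates O(log n) as well. The small form is simply a Python dict keyed by `id(key)`. Python's `id()` stands in for object identity, and the class names are hypothetical.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

# Thresholds as described in the post; both representations are assumptions.
PROMOTE_ABOVE = 16000  # switch to the large form when size exceeds this
DEMOTE_BELOW = 8000    # switch back when size drops below this


class SortedIdentityStore:
    # Large form: parallel arrays kept sorted by id(key); binary-search lookup.
    def __init__(self) -> None:
        self.ids: List[int] = []
        self.keys: List[Any] = []
        self.values: List[Any] = []

    def __len__(self) -> int:
        return len(self.ids)

    def _locate(self, key: Any) -> Tuple[int, bool]:
        i = bisect.bisect_left(self.ids, id(key))
        return i, i < len(self.ids) and self.ids[i] == id(key)

    def get(self, key: Any) -> Any:
        i, found = self._locate(key)
        if not found:
            raise KeyError(key)
        return self.values[i]

    def put(self, key: Any, value: Any) -> None:
        i, found = self._locate(key)
        if found:
            self.values[i] = value
        else:
            self.ids.insert(i, id(key))
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def remove(self, key: Any) -> None:
        i, found = self._locate(key)
        if not found:
            raise KeyError(key)
        del self.ids[i], self.keys[i], self.values[i]


class HybridIdentityDict:
    """Identity-keyed map that changes representation with its size."""

    def __init__(self) -> None:
        self._small: Optional[Dict[int, Tuple[Any, Any]]] = {}  # id -> (key, value)
        self._large: Optional[SortedIdentityStore] = None

    @property
    def is_large(self) -> bool:
        return self._large is not None

    def __len__(self) -> int:
        return len(self._large) if self._large is not None else len(self._small)

    def __getitem__(self, key: Any) -> Any:
        if self._large is not None:
            return self._large.get(key)
        entry = self._small.get(id(key))
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def __setitem__(self, key: Any, value: Any) -> None:
        if self._large is not None:
            self._large.put(key, value)
            return
        self._small[id(key)] = (key, value)
        if len(self._small) > PROMOTE_ABOVE:
            self._promote()

    def __delitem__(self, key: Any) -> None:
        if self._large is not None:
            self._large.remove(key)
            if len(self._large) < DEMOTE_BELOW:
                self._demote()
            return
        if id(key) not in self._small:
            raise KeyError(key)
        del self._small[id(key)]

    def _promote(self) -> None:
        large = SortedIdentityStore()
        for ident, (key, value) in sorted(self._small.items()):  # ids are unique
            large.ids.append(ident)
            large.keys.append(key)
            large.values.append(value)
        self._small, self._large = None, large

    def _demote(self) -> None:
        s = self._large
        self._small = {i: (k, v) for i, k, v in zip(s.ids, s.keys, s.values)}
        self._large = None
```

Because the conversion happens inside the object, callers keep using the same `d[key]` interface whichever form is active.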

A LargeIdentityDictionary uses a different method of storing
elements, one that has no limit on the number of elements.  What you
pay for this is slightly slower access: access time goes from O(1) to
O(log n).  Still very fast, but slower than the hashing scheme used by
normal IdentityDictionaries.

Hope this sheds some light.

scott@coyote.trw.com (Scott Simpson) (06/28/91)

In article <1991Jun26.193441.28581@bqnes74.bnr.ca> CWatts@BNR.CA (Carl Watts) writes:
>Smalltalk scales better than any other language I know of.  The
>reason is simple: you can model everything from the Space Shuttle to an
>Integer the same way, as an object (that perhaps uses other objects)
>[ More comments about how integers and dictionaries are unbounded in
>  Smalltalk. ]

In that sense of scalability, Smalltalk fares well. But when I hear
the term scalability, I think of bigger issues, such as how the
Smalltalk *environment* scales up to large projects of hundreds of
thousands or millions of lines. In this case, Smalltalk fares very
poorly. Large projects require multiple-user concurrency, distributed
data, schema evolution, versioning, support for persistent storage and
manageability of non-code artifacts, support for testing and
maintenance, and support for process. In all of these areas the
Smalltalk environment performs poorly or not at all.
-- 
Scott Simpson			TRW			scott@coyote.trw.com

dlw@odi.com (Dan Weinreb) (06/28/91)

In article <1991Jun26.193441.28581@bqnes74.bnr.ca> CWatts@BNR.CA (Carl Watts) writes:

   What other language do you know of that can as easily handle
   2 + 3 as it can evaluate:

   958685745856745747465784756948574847584737384877584 +
   9293458823458723745734875847574874875748874374373646

Since you ask, Lisp.  Whether Lisp or Smalltalk was doing arbitrary
precision integer arithmetic first would require some careful
research, but I don't think it's worth worrying about.

None of which has anything to do with the point you were making.

objtch@extro.ucc.su.OZ.AU (Peter Goodall) (06/28/91)

scott@coyote.trw.com (Scott Simpson) writes:

>In article <1991Jun26.193441.28581@bqnes74.bnr.ca> CWatts@BNR.CA (Carl Watts) writes:
>>Smalltalk scales better than any other language I know of.  The
>>reason is simple: you can model everything from the Space Shuttle to an
>>Integer the same way, as an object (that perhaps uses other objects)
>>[ More comments about how integers and dictionaries are unbounded in
>>  Smalltalk. ]

>In that sense of scalability, Smalltalk fares well. But when I hear
>the term scalability, I think of bigger issues, such as how the
>Smalltalk *environment* scales up to large projects of hundreds of
>thousands or millions of lines. In this case, Smalltalk fares very
>poorly. Large projects require multiple-user concurrency, distributed
>data, schema evolution, versioning, support for persistent storage and
>manageability of non-code artifacts, support for testing and
>maintenance, and support for process. In all of these areas the
>Smalltalk environment performs poorly or not at all.
 
As distributed, the Digitalk and ParcPlace Smalltalks certainly don't
have the tools for large projects.  There is, however, no intrinsic
problem with multi-user development: extend the environment to provide
the tools.  Instantiations, OTI, and SoftPert Systems all have Smalltalk
environment extensions for team configuration management.


-- 
Peter Goodall - Smalltalk Systems Consultant - objtch@extro.ucc.su.oz.au
      ObjecTech Pty. Ltd. - Software Tools, Training, and Advice
162 Burns Bay Rd, LANE COVE, NSW, AUSTRALIA. - Phone/Fax: +61 2 418-7433

Will@cup.portal.com (Will E Estes) (06/28/91)

<In that sense of scalability, Smalltalk fares well. But when I hear
<the term scalability, I think of bigger issues, such as how the
<Smalltalk *environment* scales up to large projects of hundreds of
<thousands or millions of lines. In this case, Smalltalk fares very
<poorly. Large projects require multiple-user concurrency, distributed
<data, schema evolution, versioning, support for persistent storage and
<manageability of non-code artifacts, support for testing and
<maintenance, and support for process. In all of these areas the
<Smalltalk environment performs poorly or not at all.

I think these are all extremely good points.  I have yet to meet a
programmer who has worked on a project of any size who would not agree
with the statement that the image is far too large a granularity at
which to save changes to a system.

When developing C programs for MS-DOS or UNIX, would we accept any
scheme that required us to save changes to code modules in a monolithic
library with core operating system routines and required us to save
the entire OS to effect a change?  Smalltalk is much like an
operating system, and if we would not accept such a large granularity
of change for another OS, why do we accept it for Smalltalk?

I for one am very concerned about this issue, and I feel it is the one
and only impediment to wide-scale commercial use of the Smalltalk language.
What concerns me most is that Digitalk and ParcPlace seem to be doing very
little to address this issue.  On the CompuServe forum, when I and others
have raised this issue it either goes without any response at all from
Digitalk, or it gets the response "we can't fix these problems without
changing the Smalltalk definition."

I have two problems with that response: 1) it tells me that Digitalk is
more concerned with strict adherence to a standard than it is with solving
its customers' business problems; 2) it tells me that Digitalk doesn't have
a proper sense of proportion about just how bad this problem really is, and
just how much it cuts into their potential sales.  I think if someone could
prove to them that their sales might increase by factors of five or more 
if they addressed this issue effectively, then they might start to sing a 
different tune.

I know that there are third-party products to address version control
and multi-user, networked use of an image by programmers making changes
to that image, but frankly that just isn't good enough.  This whole area
points to problems with the language definition, and the solution
needs to come from the language vendor and be integrated thoroughly into
the core product.  I do not know any responsible MIS manager who is going
to risk a large-scale, long-term project on a third-party utility for a 
language which is itself considered leading-edge and somewhat risky.

What can we do to convince the Smalltalk vendors that this problem
merits significant, and immediate, attention?  I think the primary
reason that the vendors do not see the magnitude of the problem is that
most of the early adopters of the Smalltalk technology are single-person
shops or small prototyping groups in larger companies that only require
a tool that one person can use effectively.  But to define the customer
as one person working on his own is to miss out on the real mass-market
opportunity for Smalltalk: the hordes of large-team Cobol programmers
who weigh down most MIS shops.  Smalltalk as it stands today is not
a substitute for Cobol on very large projects, and that is a damn shame,
because it could and should be.


Will Estes          Internet: Will@cup.portal.com
                    UUCP: apple!cup.portal.com!Will

marti@mint.inf.ethz.ch (Robert Marti) (06/28/91)

In article <1991Jun26.193441.28581@bqnes74.bnr.ca> CWatts@BNR.CA
(Carl Watts) writes:
>Smalltalk was designed from the start to not have any problems
>scaling.  What other language do you know of that can as easily
>handle 2 + 3 as it can evaluate:
>958685745856745747465784756948574847584737384877584 +
>9293458823458723745734875847574874875748874374373646

How about
- Lisp
- Scheme (OK, so Scheme is a Lisp dialect)
- some implementations of Prolog
- Mathematica
- Maple?

If you have a C++ library which includes something like class arbint
in Tony L. Hansen's "The C++ Answer Book" (pp. 276-308), you could
add the following converting constructor to class arbint:

 arbint(const char *string);  // convert a decimal string into a newly created arbint

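so that you could at least write something like

// at least one operand must already be an arbint; two bare string
// literals would be pointer addition, which does not compile
arbint("958685745856745747465784756948574847584737384877584") +
arbint("9293458823458723745734875847574874875748874374373646")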

Robert Marti                      |  Phone:    +41 1 254 72 60
Institut fur Informationssysteme  |  FAX:      +41 1 262 39 73
ETH-Zentrum                       |  E-Mail:   marti@inf.ethz.ch
CH-8092 Zurich, Switzerland       |

CWatts@BNR.CA (Carl Watts) (06/28/91)

"... the image is far too large a granularity at which to save changes
to a system..."

"... one and only impediment to wide-scale commercial use of the
Smalltalk language..."

"This whole area points to problems with the language definition ..."

My, my, my, Will, what an expert you've become on Smalltalk, the
problems with it, and what needs to be done considering that only two
days ago I had to explain to you what the difference between a String
and a Symbol was and how a Dictionary worked.

But the concerns you mentioned are very common among neophyte
Smalltalk users.  I had many of the same problems when I first started
using Smalltalk 4 years ago.

One has to remember that Smalltalk is a fundamentally different kind
of beast from languages like Fortran, COBOL, Pascal, C, C++, etc.
Smalltalk has a different point.  Comparing the two is like the
proverbial comparing of Apples and Oranges, and lamenting the faults of
Oranges: they have a thick outer covering you have to remove before you
can eat them, and you can't make applesauce with them.

Smalltalk has much more in common with Lisp and APL.  Both languages
have virtual machines and the idea of an image (called a Workspace in
APL).

The problems you are lamenting in Smalltalk are not things that any
other language I can think of solves within the language itself.
C is perhaps the classic example.  It has no integrated version control
or multi-user development environment support inherent in the
language.  It defines no granularity of change.  The language itself
has no inherent development support at all.  All of these must be
provided by outside applications (typically under Unix, if at all).

Smalltalk was meant to be its own operating system, language,
development environment, everything!  That's why there is a "Virtual
Machine".  An image is not an inherent concept of the Smalltalk
language.  It is what is needed when operating under a foreign
operating system.  When Smalltalk was developed, no machine could
provide the support needed, so a virtual machine was built to emulate
the real machine.  The virtual image represents a snapshot of what the
memory of this virtual machine looked like when it was last running.
This is common in other languages like Lisp, APL, etc.

It almost makes me want to cry when someone says they want to turn
Smalltalk into something like C or Pascal, where you write canned
applications that run under a foreign operating system like Unix or
OS/2.  That was not the point of Smalltalk, though it is the point of
something like C.  If that's what you want to do, you can do it, but
you will be pounding Smalltalk's round peg into a square hole.  You can
do it if you pound hard enough.  And considering the beauty of
Smalltalk compared to C, it's still usually better than using C's square
peg.

This from someone whose department supported 50 developers working on
the same application, all in Smalltalk, with fewer
multiple-developer conflicts than I would have thought possible
using any development environment.  That was thanks to the encapsulation
that Smalltalk classes support so well, and to some small development
environment tools (based on ChangeSets) that we developed in Smalltalk
to manage many concurrent developers.  As Peter Goodall pointed out,
there are several excellent commercial products available that provide
extremely sophisticated support for large multi-developer groups, and
these are far more sophisticated than Unix and all the standard Unix
development tools combined.
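The post doesn't describe the ChangeSet-based tools. Below is a hypothetical sketch of one job such a tool might do. Each developer's change set is treated as a set of (class name, selector) pairs. The function reports every method touched by more than one change set, since that is where concurrent edits would collide. The data model and the name `find_conflicts` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical model: a change is identified by (class name, selector).
Change = Tuple[str, str]


def find_conflicts(change_sets: Dict[str, Set[Change]]) -> Dict[Change, List[str]]:
    """Map each method touched by two or more change sets to their names."""
    owners: Dict[Change, List[str]] = defaultdict(list)
    for name in sorted(change_sets):  # sorted for a stable report order
        for change in change_sets[name]:
            owners[change].append(name)
    return {change: names for change, names in owners.items() if len(names) > 1}


sets = {
    "alice": {("Account", "deposit:"), ("Account", "balance")},
    "bob": {("Account", "deposit:"), ("Ledger", "post:")},
}
print(find_conflicts(sets))  # {('Account', 'deposit:'): ['alice', 'bob']}
```

Keeping classes well encapsulated keeps these overlaps rare, which fits the experience described above.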