[comp.lang.misc] flexible arrays

mart@euteal.UUCP (Mart van Stiphout) (01/31/89)

Some time ago I initiated a discussion on the Turing Language syntax,
mainly the loops and list parts.
In a reply to one of my critics, I suggested some more support in
high level programming languages for list processing.
Some other replyer suggested me to use Lisp.
Well, I don't want to use Lisp but I have some ideas on
list and I would like to know what other people think of them.

My main point is: I REALLY HATE LISTS.
There are several reasons why I hate lists (and then I mean
lists I can use in languages like Pascal, Modula or C; not Lisp):

1. the use of lists requires me to create structures with pointers
   to the next structure (record) of the same type and I think
   this is a rather trivial action that I don't wish to repeate
   for every list I create.
2. I have to allocate memory (new, malloc) for every new list element.
   How tedious.
3. Adding, deleting or doing anything else than processing the entire
   list leads to error prone pointer manipulations that I don't want to
   use.
4. accessing elements in the list is costly because I have to scan
   the entire list until I find the element I want.

What I really like are arrays. They have numerous advantages:
1. they can be processed easily.
2. elements can be accessed randomly in a cheap manner.
3. you don't have to mess around with pointers.
4. you don't have to allocate memory.

The main disadvantage I can think of is the often unknown array
length.  Oh yes, and some dumm languages
(Pascal) don't allow me to dynamically allocate arrays.
The C programming language provides me with realloc to lengthen
arrays if necessary.

I hereby propose to do away with linear linked lists.
Instead we should use flexible arrays or sparse arrays.
A sparse array can be thought of as a linked list but not implemented
as one.
The only thing I have to do is to declare a flexible array.
The system does:
1. allocation of new array elements as soon as I refer to a non-existing
   one.
2. efficient storage of the used array elements: for example by some kind
   of dynamic hashing technique.

The result could be something like the arrays you can use in the
AWK programming language. These are very flexible and easy to use.
The advantages are: no messing around with pointers, no memory
trouble, easy acessing, easier to read programs
and all this at the cost of (I think) a reasonable amount op cpu time.

Please post yout comments,

Mart van Stiphout
Email: mart@euteal.eutrc3.hp4nl.mcvax.uucp

gary@milo.mcs.clarkson.edu (Gary Levin) (02/01/89)

In article <21@euteal.UUCP> mart@euteal.UUCP (Mart van Stiphout) writes:
   Some time ago I initiated a discussion on the Turing Language syntax,
   mainly the loops and list parts.
   ...
   What I really like are arrays. They have numerous advantages:
   1. they can be processed easily.
   2. elements can be accessed randomly in a cheap manner.
   3. you don't have to mess around with pointers.
   4. you don't have to allocate memory.
   ...
   I hereby propose to do away with linear linked lists.
   Instead we should use flexible arrays or sparse arrays.
   A sparse array can be thought of as a linked list but not implemented
   as one.
   The only thing I have to do is to declare a flexible array.
   The system does:
   1. allocation of new array elements as soon as I refer to a non-existing
      one.
   2. efficient storage of the used array elements: for example by some kind
      of dynamic hashing technique.

   The result could be something like the arrays you can use in the
   AWK programming language. These are very flexible and easy to use.
   The advantages are: no messing around with pointers, no memory
   trouble, easy acessing, easier to read programs
   and all this at the cost of (I think) a reasonable amount op cpu time.

There are existing, readily available languages that provide what you
describe.  SNOBOL tables can be indexed by objects of ANY type, and
grow dynamically at need.  Icon provides both tables and lists in just
the sense that you described, even permiting you to insert or delete
elements, adjusting the indices accordingly.

The language SETL also provided this power in terms of tuples (origin
1 arrays, adjustable length) and smaps (sets of ordered pairs to
represent arbitrary mappings).  I have implemented much of SETL as
ISETL, which is an interactive version that runs under Unix, VMS,
MS-DOS, and MacIntosh systems.  Those interested in ISETL can FTP
source from clutx.clarkson.edu in directory pub/ISETL, or you can
write to me.  If you FTP the source, just drop me a note listing your
school and the type of machines that you intend to run it on.  I am
trying to keep track of such details.
--
Gary Levin/Dept of Math & CS/Clarkson Univ/Potsdam, NY 13676/(315) 268-2384
BitNet: gary@clutx   Internet: gary@clutx.clarkson.edu

gateley@m2.csc.ti.com (John Gateley) (02/01/89)

In article <21@euteal.UUCP> mart@euteal.UUCP (Mart van Stiphout) writes:
>[...]
>In a reply to one of my critics, I suggested some more support in
>high level programming languages for list processing.
>Some other replyer suggested me to use Lisp.
>Well, I don't want to use Lisp but I have some ideas on
>list and I would like to know what other people think of them.

I do not know why you don't want to use Lisp, but it has the data
abstractions you want to use (the flexible arrays you described below).

>My main point is: I REALLY HATE LISTS.
>[...]
>1. [Lists use pointers]
>2. [Lists require allocation of memory]
>3. [Pointer manipulations are error prone]
>4. [Accessing the nth element takes n units of time]

The only thing I can say is: use a nicer language. Lisp, for example,
hides the use of pointers, so that unless you are Joe Lisp Programmer,
you are not going to be worried about pointers or pointer manipulations.
Allocation is easy and convenient. Accessing the nth element is not
something normally done with lists. 

>
>What I really like are arrays. They have numerous advantages:
>1. they can be processed easily.
>2. elements can be accessed randomly in a cheap manner.
>3. you don't have to mess around with pointers.
>4. you don't have to allocate memory.

You don't mention the primary disadvantage of arrays: they are
second class data structures: you cannot return them as the result
of a function in most languages. You cannot write a function in
Pascal which allocates and returns an array for use (without using
Pascal's lists). A lot of the things you hate about lists are caused
by the fact that they are first class citizens (you can return a
pointer to a list).

>
>The main disadvantage I can think of is the often unknown array
>length. [...]
>I hereby propose to do away with linear linked lists.
>[...]
>Mart van Stiphout
>Email: mart@euteal.eutrc3.hp4nl.mcvax.uucp

A list is a data structure which is easy to change the size of. An array
is a data structure which it is easy to access the nth element. Both of
these data structures are easy to implement and easy to use (especially
if you use Lisp's lists). What you are suggesting has the 'best' features
of both. It also has the costs of both.

Where a normal array reference takes a couple of instructions: an index
computation and a load, what you are suggesting takes much more. First you
must determine if the array index is in bounds. Second, if it isn't you must
reallocate, which requires copying or rehashing (or slower lookups). Then
lookup may be slower, depending on your method of storing arrays. Even
hashing is many times slower than the index computation and load.

List performance is also degraded if a list is implemented as a `dynamic'
array. Lists are (almost) always processed sequentially, this is even
faster than an array reference: load indirect from a pointer.

Many problems are better thought of as `list' problems than `array'
problems: reversing the order of a list is 'easier' than reversing
the order of an array (you reverse the pointers instead of copying the
elements), or appending two lists (this just requires a single pointer
change). Dynamic arrays would eliminate these benefits of using lists.

I do not think the `dynamic' arrays you like are a bad idea, but I do
think getting rid of lists in favor of them is bad since that incurs
a very large performance penalty.

John
gateley@tilde.csc.ti.com

marc@hpfcdc.HP.COM (Marc 'Sphere' Sabatella) (02/01/89)

/ hpfcdc:comp.lang.misc / mart@euteal.UUCP (Mart van Stiphout) / 10:57 am  Jan 30, 1989 /

>What I really like are arrays. They have numerous advantages:
>1. they can be processed easily.

This is subjective.

>2. elements can be accessed randomly in a cheap manner.

This is not always a necessity - for many applications, sequential access is
sufficient.

>3. you don't have to mess around with pointers.
>4. you don't have to allocate memory.

There are disadvantages too.  Say you have an array of 1000 elements, and you
want to insert a new one at position 10.  You have to copy 990 elements to
do this (OK, you could move the first 10 down, if your language supports the
memory management necessary).  Unless you use pointers, this means moving 990
objects which may be of arbitrary size.

Even appending to an array via realloc() may be expensive if it has to copy the
old data into the new block.

The pain of using linked lists is lessened by Lisp, which does all the right
allocation, 'next' (cdr) pointer, etc. for you.  However, you can right a
fairly nice linked list manager in C, Pascal, or Ada that will make them
almost as painless, and certainly more efficient than arrays for many tasks.

djones@megatest.UUCP (Dave Jones) (02/01/89)

From article <GARY.89Jan31114300@milo.mcs.clarkson.edu>, by gary@milo.mcs.clarkson.edu (Gary Levin):

...

>    I hereby propose to do away with linear linked lists.

   Anybody wanna check that bathwater for babies before we
   throw it out?  BTW, "linear" and "list" seem redundant.

>    Instead we should use flexible arrays or sparse arrays.

I've got plenty of "container-class" packages in C and C++.
Some are based on links, some are "flexible", as the application
requires.  All have caching memory-allocation mechanisms which
are much more efficient than raw malloc().

(BTW, Pascal's type-system would make this somewhere between
 tedious and impossible.)


	  List             ... linked lists
          Hash             ... hash-tables with switchable
                                 ("virtual") hash and equiv function ptrs
          Avl              ... avl-trees
          PQ               ... priority queue (array, balanced heap)
          Fifo             ... queue
          Lifo             ... stack
          Vbuf             ... flexible byte-buffer
          Flex             ... flexible array
          Dig              ... directed graph

robison@m.cs.uiuc.edu (02/01/89)

>My main point is: I REALLY HATE LISTS.

Anyone prejudiced towards arrays and lists should look at
the Smalltalk language and its `collection' class.  The internal
representation (array vs. links) should be separate from what
it's representing (set vs. indexed tuple vs. linear sequence).
The programmer should first decide what operations are necessary
and then pick an implementation.  Smalltalk provides quite a
few kinds of collections, such as : sets, multisets, indexed collections,
linear lists, and hash tables to name a few.  Corresponding operations
in different kinds of collections have the same name, 
so the implementation can be changed easily.

> Instead we should use flexible arrays or sparse arrays.

A language should not provide any arrays or lists, except
that provided by common hardware.  All other data-structures
should be defined by libraries.  Why limit the user to
two particular representations of a collection, when there 
are lots out there, each with its own efficiency niche.  
E.g. balanced trees, 2-3 trees, red-black trees, tries, 
arrays, hash tables, B-trees, linked-lists, and bit-vectors.

To summarize, languages should let the user make the list vs. array
decision last, and not limit the choice to just those two.

- Arch

hafer@infbs (Udo Hafermann) (02/01/89)

Two powerful, interactive array handling languages (they MUST have been
mentioned) are NIAL (developed, I believe, in Ontario, Canada) and APL
(sometimes even read as Array Processing Language).  For both, PC
implementations exist.

ken@aiai.ed.ac.uk (Ken Johnson) (02/02/89)

In article <21@euteal.UUCP> mart@euteal.UUCP (Mart van Stiphout) writes:
>
>I hereby propose to do away with linear linked lists.
>Instead we should use flexible arrays or sparse arrays.
>A sparse array can be thought of as a linked list but not implemented
>as one.

An obvious follow up question (which doubtless everyone else knows the
answer to): Are there data structures that give the same operations as
linked lists but which are more efficient for random access and for
operations like sorting?

Please reply by e-mail, there is so much stuff in this group that I
can't read all of it.