[comp.lang.apl] Problem with )COPY

paul@moore.UUCP (Paul Maclauchlan) (04/20/89)

When trying to clean up a heavily used workspace on the system
we attempted to create a new "clean" version by:

1. )CLEAR, to make a fresh space for the new ws

2. )SYMB 3000, to match the symbol table size in the old ws

3. )COPY OLDWS

4. Assign QuadLX and QuadELX

5. )WSID OLDWS

6. )SAVE

This had been our "traditional" way of generating clean copies
of old workspaces on IPSA and Comshare's APL.

At step 3, we got an error message:

	NOT COPIED: VAR1 VAR2

This was unusual: the documentation claims this can only happen
if the active workspace is full, the symbol table is full, or
the saved workspace is damaged.  We ruled out the first two, but
didn't know what to do about the third.

Since the variables that weren't copied were small ('',QuadCR, etc)
we recreated them and continued.  Now we thought we had a clean,
new version of the original ws.

Out of curiosity about the error message, we tried the procedure
again, this time starting with our second-generation ws.  The same
thing happened, but with different variables not being
copied.

We stopped, cleared the active ws and did another )COPY.  Same
error, different variables.

We tried QuadCOPY and received a result value of -3, "cannot copy
while name is defined as a label".  None of the variables share
names with labels in any of the functions.

Further experimenting revealed that if we eliminated step 2 of
our procedure, the )SYMB 3000, all of the variables were copied
correctly.

Our questions are as follows:

1. Why does setting the symbol table size affect the )COPY command?

	1a. Shouldn't matching the symbol table size to that of
	    the original workspace be the "right thing" to do?

	1b. Have we encountered some kind of limit to the size
	    of the symbol table?  The documentation claims the
	    Symbol-limit is 32767.

2. Are the )COPY and QuadCOPY commands reliable?

3. Is our procedure for "cleaning up" old workspaces appropriate?

	3a. Does it make a difference in system performance to
	    have a "clean" workspace?

	3b. Does it make a difference in performance to use a
	    workspace saved with a value assigned to QuadDM?

	3c. How are we supposed to fix a "damaged" workspace?


Any help we can get answering these questions will be appreciated.


Technical notes:
Some background: we are running STSC APL*PLUS/UNX 3.1.1 for the
Spectrix computer, under SCO XENIX Version 3.1.  The workspace we
are copying is in a Xenix file 176,000 bytes long.  We start APL
with a workspace size of 500,000 bytes.

-- 
--
.../Paul Maclauchlan
Moore Corporation Limited, Toronto, Ontario (416) 364-2600
paul@moore.UUCP  -or-    ...!uunet!attcan!telly!moore!paul
"And in the end, the love you take is equal to the love you make."/JL&PMcC'69

jaxon@uicsrd.csrd.uiuc.edu (04/21/89)

The only cleaning-up that such a procedure does is to remove the )SI state
(you could use )RESET for that) and to give all system variables their default values.

"Heavy use" of a library workspace only affects it when you )SAVE.

)COPY is THE most complex system command to implement (P.S., I claim to
have the fastest version for nested-array WSs).  Perhaps IPSA is "cleaning-up"
on CPU and I/O billings by encouraging its use.

jaxon@uicsrd.csrd.uiuc.edu (04/26/89)

There is no  )SYMB or )SYMBOL command in the ISO standard.
There is none in Unisys'  APLB.

In general, any physically limited resource ought to be made virtual.

In APLB we made the "binding" between a name and a value be represented
by an object in the virtual workspace.  The symbol table is just a function
that maps names into bindings, and the memory used to store or compute that
function is entirely its own concern.  Now the tokenizer (and other parts
of our incremental compiler), and the system functions that take names
as arguments, look up names in the symbol table.  BUT what they get back is
a "binding cell", not a "symbol table index".

All the usual APL computations act on these binding cells:
    Assignments directly modify the "value" component.
    References read the current "value" component.
    Localization hides the old "value" somewhere and replaces it with NIL.
    
None of these incur a symbol table lookup, and none depend upon the
arrangement or size of the symbol table.  Knuth (vol. 1) describes a few
ways to expand and contract hash tables that are now safe to use.
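
The binding-cell idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not APLB's implementation: the names BindingCell, SymbolTable, localize, unlocalize and rehash are all hypothetical, and a Python dict stands in for whatever hash scheme the real table uses. It shows that once a name has been resolved to a cell, assignment, reference and localization touch only the cell, so the table can be rebuilt or resized without disturbing them.

```python
# Hypothetical sketch of "binding cells"; not taken from any real APL system.
NIL = object()  # stands in for "no value"


class BindingCell:
    """One name/value binding; code holds the cell, not a table index."""

    def __init__(self, name):
        self.name = name
        self.value = NIL
        self.shadowed = []  # outer values saved while the name is localized

    def assign(self, value):
        # Assignment modifies the value component directly.
        self.value = value

    def reference(self):
        # Reference reads the current value component.
        if self.value is NIL:
            raise NameError("VALUE ERROR: " + self.name)
        return self.value

    def localize(self):
        # Localization hides the old value and replaces it with NIL.
        self.shadowed.append(self.value)
        self.value = NIL

    def unlocalize(self):
        # Leaving the function restores the hidden value.
        self.value = self.shadowed.pop()


class SymbolTable:
    """Just a function from names to binding cells."""

    def __init__(self):
        self._cells = {}
        self.lookups = 0  # counts how often a name is actually looked up

    def lookup(self, name):
        self.lookups += 1
        cell = self._cells.get(name)
        if cell is None:
            cell = self._cells[name] = BindingCell(name)
        return cell

    def rehash(self):
        # Rebuild the table storage; existing cells are left untouched.
        self._cells = dict(self._cells)
```

In this sketch a tokenizer would call lookup once per name when a line is first processed and keep the returned cell, so later executions of that line never consult the table.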

The only wrinkle is deciding how long a "binding cell" object must be kept
alive.  Ordinary objects can be collected when there are no more references
in the live workspace.  But the symbol table always has a reference, so it
looks alive even when it's not.  The extra cost is not very significant, and
there are many ways to keep these costs out of high frequency control paths.
Basically wherever you delete a reference that may be a "binding cell",
if the new reference count is 1, and the value is NIL, then the symbol table
can forget about the binding cell, and it can be collected.  

)ERASE does not remove a binding cell from the symbol table unless there
are no more references to that cell (e.g. in function lines).
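
The collection rule above (forget a cell once the symbol table holds the only remaining reference and the value is NIL) can be sketched as follows. Again this is a hypothetical Python illustration, not APLB code: explicit refcount bookkeeping, and the names add_reference, drop_reference and erase, are assumptions made for the example.

```python
# Hypothetical sketch of the "refcount is 1 and value is NIL" rule.
NIL = None  # stands in for "no value" in this sketch


class BindingCell:
    def __init__(self, name):
        self.name = name
        self.value = NIL
        self.refcount = 1  # the symbol table's own reference


class SymbolTable:
    def __init__(self):
        self.cells = {}

    def lookup(self, name):
        if name not in self.cells:
            self.cells[name] = BindingCell(name)
        return self.cells[name]

    def add_reference(self, cell):
        # E.g. a function line that mentions the name now holds the cell.
        cell.refcount += 1
        return cell

    def drop_reference(self, cell):
        # E.g. the function line holding the cell was deleted.
        cell.refcount -= 1
        self._maybe_forget(cell)

    def erase(self, name):
        # Like )ERASE: the value goes away, but the cell survives
        # while anything besides the table still refers to it.
        cell = self.cells.get(name)
        if cell is not None:
            cell.value = NIL
            self._maybe_forget(cell)

    def _maybe_forget(self, cell):
        # Only the table refers to the cell and it has no value:
        # the table can drop it and the cell can be collected.
        if cell.refcount == 1 and cell.value is NIL:
            self.cells.pop(cell.name, None)
```

The check sits only on the paths that delete a reference or a value, which is one way to keep the cost out of ordinary assignment and reference.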

In APLB we managed to essentially eliminate symbol-table lookup cost using
these techniques.  But we did not provide "ambivalent user functions".  In
fact we are seriously opposed to the APL2 notation for ambivalent function
templates because their scheme requires a use of #NC (Name Class), which
costs 1 symbol table lookup, to determine the valence of the current call
(something which the interpreter just figured out a microsecond ago).  If 
you add in a collection of dyadic functions that have been modified to
protect themselves from IBM's silly extension by testing #NC 'LEFT' on
each line [1], then lookups become more frequent, and the implementer has
to find a very low cost resizeable hash scheme.

Isn't anyone angry about APL2-style ambivalence?

regards - greg jaxon -
jaxon@uicsrd.uiuc.edu

stripes@wam.UMD.EDU (04/26/89)

In article <575@moore.UUCP> paul@moore.UUCP (Paul Maclauchlan) writes:
>
>When trying to clean up a heavily used workspace on the system
>we attempted to create a new "clean" version by:

Does "clean" mean "no extraneous globals", or internal junk?

>2. )SYMB 3000, to match the symbol table size in the old ws

I'm not sure what version of APL*PLUS/Unx you are running, but current
versions have a dynamic symbol table (but I think it only grows;
you have to shrink it yourself, and if you are adding a LOT of syms it's
faster [by a tad] to )SYMB the size yourself).

>1. Why does setting the symbol table size affect the )COPY command?

The author of the code (my Dad) was perplexed by this; the code has
since been re-written, and I don't think the bug exists on new systems.
His only idea was that when you re-sized the sym. table, somehow the
label bit remained set...

>	1a. Shouldn't matching the symbol table size to that of
>	    the original workspace be the "right thing" to do?

Not if the system will re-size on the fly for you.

>2. Are the )COPY and QuadCOPY commands reliable?

Yes, but they are not a panacea for real "WS DAMAGE" (sym table
damage).

>	3a. Does it make a difference in system performance to
>	    have a "clean" workspace?

Yes, you lose some.  Copy does not copy the P-code, so on the first execution
of each line it must be re-created.

>	3b. Does it make a difference in performance to use a
>	    workspace saved with a value assigned to QuadDM?

It shouldn't.

>	3c. How are we supposed to fix a "damaged" workspace?

Copy is the best method short of a disk editor and system-level documentation
(never created, let alone distributed).  As stated before, it's not
fool-proof.
You should also try "Quad-VR Each Quad-NL 3 Quad-FAPPEND tie-number"
(each is shift-1, I think), ")CLEAR", and "Quad-DEF Each Quad-READ Each
tie-number,Each Iota #of-comps".
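
The rebuild suggested above comes down to three steps: write the text of each function to its own file component (Quad-VR, Quad-NL 3, Quad-FAPPEND), clear the workspace, and re-define every function from its text (Quad-READ, Quad-DEF). A language-neutral sketch of that flow in Python follows. It is an analogy only: the workspace is modeled as a plain dict and the component file as a list, and dump_functions, restore_functions and function_name are hypothetical helpers, not APL*PLUS calls. Like the expression above, it carries functions only, so variables would still have to be moved separately.

```python
# Hypothetical model: workspace maps name -> (kind, text); file is a list.

def function_name(header):
    """Pick the function name out of a header line.

    "Z <- A PLUS B" -> "PLUS", "Z <- NEG B" -> "NEG", "RUN" -> "RUN".
    """
    tokens = header.split("<-")[-1].split()
    return tokens[1] if len(tokens) == 3 else tokens[0]


def dump_functions(workspace, component_file):
    """Append the source text of every function as its own component."""
    for name in sorted(workspace):
        kind, text = workspace[name]
        if kind == "function":
            component_file.append(text)


def restore_functions(component_file):
    """Build a fresh workspace by re-defining each function from text."""
    fresh = {}
    for text in component_file:
        header = text.splitlines()[0]
        fresh[function_name(header)] = ("function", text)
    return fresh
```

Because the functions come back from plain text rather than from the saved internal form, nothing from the old symbol table travels with them.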

>Any help we can get answering these questions will be appreciated.

You are welcome, but let it be known that I am not an expert, I have never
seen most of the code.

>Technical notes:
>Some background, we are running STSC APL*PLUS/UNX 3.1.1 for the
>Spectrix computer, under SCO XENIX Version 3.1.

Hate to tell you, but this version is no longer supported, nor is the
Spectrix computer.
-- 
           stripes@wam.umd.edu             Disclamer:
      Josh_Osborne@Real_World,The           I no longer work for STSC, but my
      "Just another dyslexic porgramer"     Dad does....

stripes@wam.UMD.EDU (04/27/89)

>Isn't anyone angry about APL2-style ambivalence?
>
>regards - greg jaxon -
>jaxon@uicsrd.uiuc.edu

"Mad" isn't exactly the term I would use, "unhappy".  And I *think* it is an
ISO mis-feechure.  The APL*PLUS NARS had braces around the left argument
(in the defn of the function), and it, as well as the current VM/SP version,
had Quad-Monadic & Quad-Dyadic for determining how the function was invoked...

-- 
           stripes@wam.umd.edu
      Josh_Osborne@Real_World,The
      "The dyslexic porgramer"

prins@prins.cs.unc.edu (Jan Prins) (04/29/89)

In article <49700010@uicsrd.csrd.uiuc.edu>, jaxon@uicsrd.csrd.uiuc.edu writes:
 ... 
   (intelligent remarks concerning use and management of symbols 
    in APL interpreters)
 ...
> 
> In APLB we managed to essentially eliminate symbol-table lookup cost using
> these techniques.  But we did not provide "ambivalent user functions".  In
> fact we are seriously opposed to the APL2 notation for ambivalent function
> templates because their scheme requires a use of #NC (Name Class), which
> costs 1 symbol table lookup, to determine the valence of the current call
> (something which the interpreter just figured out a microsecond ago).  If 
> you add in a collection of dyadic functions that have been modified to
> protect themselves from IBM's silly extension by testing #NC 'LEFT' on
> each line [1], then lookups become more frequent, and the implementer has
> to find a very low cost resizeable hash scheme.
> 
> Isn't anyone angry about APL2-style ambivalence?

It sure seems hokey and, as you point out, seems to require run-time 
symbol-table lookups ... even if you do not want to use it!

A simpler way to support ambivalence in user-defined functions is to permit 
multiple (well, two or three) definitions for the same name, and use the 
interpreter to resolve which definition is invoked.  This requires no run-time 
overhead (modulo the usual caveats about changes in valence and name class
of functions during execution by redefinition -- something which APLB rightly 
treats as the exception rather than the rule and handles accordingly only as
it happens).

I'm sure I've seen some APL system that worked this way.  Or am I thinking of 
direct-definition?  I suppose the objection is that typical definitions
will look something like this:

   DEL Z <- PLUS B    (monadic defn)
[1] Z <- 0 PLUS B
   DEL 

   DEL Z <- A PLUS B  (dyadic defn)
[1] Z <- A + B
   DEL

and one might be worried about the additional function call overhead.  I bet
it's not much worse than all the horsing around with #NC, and you could always
remove it by copying in the code from the called functions, I suppose (Yech).
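
To make the proposal concrete, here is a minimal Python sketch of resolving a call by valence. It is not drawn from any existing APL system: the Workspace class, define and bind_call are hypothetical names, and the pair (name, valence) as a key is just one way to hold two definitions under one name. The point is that the choice is made once, when the call site is bound, so the function body never has to ask how it was called.

```python
# Hypothetical sketch: separate monadic and dyadic definitions per name.

class Workspace:
    def __init__(self):
        self.defs = {}  # (name, valence) -> callable

    def define(self, name, valence, fn):
        """Store a definition; valence 1 is monadic, 2 is dyadic."""
        self.defs[(name, valence)] = fn

    def bind_call(self, name, has_left_argument):
        """Pick the definition for a call site, once, by its syntax."""
        valence = 2 if has_left_argument else 1
        try:
            return self.defs[(name, valence)]
        except KeyError:
            raise SyntaxError("no definition of %s with valence %d"
                              % (name, valence))


ws = Workspace()
# Dyadic defn:  Z <- A PLUS B
ws.define("PLUS", 2, lambda a, b: a + b)
# Monadic defn: Z <- PLUS B, written in terms of the dyadic one
ws.define("PLUS", 1, lambda b: ws.bind_call("PLUS", True)(0, b))

monadic_plus = ws.bind_call("PLUS", False)
dyadic_plus = ws.bind_call("PLUS", True)
```

A redefinition during execution would require rebinding the affected call sites, which is the exceptional case mentioned above.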

So what APL system supports ambivalence this way?

Jan Prins  (prins@cs.unc.edu)

> jaxon@uicsrd.uiuc.edu (Hi Greg!)