[net.math.stat] S seems incredibly slow...

dan@rna.UUCP (Dan Ts'o) (11/17/84)

Hi,
	S on our 11/750 - 4.2BSD seems very slow. I know that this situation
can be improved by binding commonly used functions into S's executive. How
much of an improvement can reasonably be expected ? Any hints as to which
functions are good candidates ?
	Here are two S examples which take 10-30 minutes to execute:

	s _ matrix(0, 20, 20)
	for (i in 1:20) {
		for (j in 1:20) {
			s[i,j] _ sin(j/10)
		}
	}

and
	m _ matrix(read("/tmp/junk"), 100, 100)		#read 100x100 matrix

	We are beginning S users and are thus naive about the "best" way of
doing things. Thanks.


					Cheers,
					Dan Ts'o
					Dept. Neurobiology
					Rockefeller Univ.
					1230 York Ave.
					NY, NY 10021
					212-570-7671
					...cmcl2!rna!dan

mikem@uwstat.UUCP (11/18/84)

> Hi,
> 	S on our 11/750 - 4.2BSD seems very slow. I know that this situation
> can be improved by binding commonly used functions into S's executive. How
> much of an improvement can reasonably be expected ? Any hints as to which
> functions are good candidates ?

We have EVERYTHING (except some device drivers) in the executive.  This
creates a big executive, but I "feel" a substantial speedup.  I don't
have any hard figures on this.  With virtual memory under 4.2 I see no
reason to have any stand alone functions, except for testing purposes.
Does anyone disagree with this?

-- 

Mike Meyer --  Phone (608) 262-1157

EASY ARPA:	mikem@statistics
CORRECT ARPA:	mikem@wisc-stat.arpa
UUCP	...!{allegra,ihnp4,seismo,ucbvax,
	     pyr_chi,heurikon,uwm-evax}!uwvax!uwstat!mikem

hubert@entropy.UUCP (Steve Hubert) (11/27/84)

I tried the first example on our 750 running 4.2.  It took 6 minutes
of elapsed time, of which about 3 or 4 minutes was actual CPU time.
Our S is configured to be as much in core as possible, i.e., I used
the biglist when compiling it.  The load average was only about 2
when I tried this.

Steve Hubert
 Dept. of Stat., U. of Wash, Seattle
 {allegra,decvax,ihnp4,ucbvax!lbl-csam}!uw-beaver!entropy!hubert
 hubert%entropy@uw-beaver

koenker@uiucuxc.UUCP (11/30/84)

On example one try:  s _ sin(rep(1,20) %o ((1:20)/10))  This should help.
S for loops are notoriously slow and every effort should
be made to find ways to exploit S's natural looping constructs.
The operator %o is a special case of the general function outer.
See also apply and sapply for useful functions.
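
For anyone checking an outer-product rewrite against the original loop,
here is a minimal sketch in Python rather than S.  The helper `outer` and
all variable names are illustrative assumptions, not S functions.  The
point is only that the loop's s[i,j] = sin(j/10) varies along columns
alone, so one factor of the outer product has to be a vector of ones;
the outer product of 1:20 with itself yields i*j, a different matrix.

```python
import math

def outer(x, y):
    """Hypothetical stand-in for an outer product: result[i][j] = x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]

n = 20

# Reference result: the nested loop from the question, s[i,j] = sin(j/10).
loop = [[math.sin(j / 10) for j in range(1, n + 1)] for i in range(1, n + 1)]

# Outer product of a vector of ones with (1..n)/10, then one elementwise sin.
cols = [j / 10 for j in range(1, n + 1)]
vectorized = [[math.sin(v) for v in row] for row in outer([1] * n, cols)]

# Outer product of 1..n with itself: entry [i][j] is i*j, not j/10.
products = outer(list(range(1, n + 1)), list(range(1, n + 1)))
```

Comparing `vectorized` with `loop` entry by entry is a cheap way to
confirm a loop-free version before trusting it on real data.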

				Roger Koenker
				Department of Economics
				University of Illinois

jiml@uwmacc.UUCP (Jim Leinweber) (12/03/84)

On a lightly loaded VAX 11/780 running 4.2 BSD, with nearly everything
in the S executive,

	write(1:10000, "/tmp/junk")

took about 25 seconds of user+system time (2 minutes real); shortly
afterwards,

	m <- matrix( read("/tmp/junk"),100,100)

took about 20 seconds of user+system time.  On the other hand,

	for (i in 1:20) for (j in 1:20) m[i,j] <- sin(j/10)

was so slow I didn't wait for it to finish.  I'm not an expert on the
bowels of S, but a cursory glance at $M/lang3.yr shows that each
occurrence of `<-' invokes $F/assign, which in turn calls $L/getds,
$P/pcopy, and $L/putds.  Thus one problem with the nested for loop is
that assignment is very expensive in S; apparently each assignment
copies a dataset from one file to another!  Doing O(n^2) complete file
rewrites to initialize a matrix is *bound* to be slow.
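
To put rough numbers on that, here is a back-of-the-envelope model,
written in Python rather than S.  It assumes (my reading of the code
above, not a measurement) that every element assignment copies the whole
n-by-n dataset exactly once, and it ignores interpreter overhead
entirely.  The function names are made up for the illustration.

```python
def copies_with_loop(n):
    """Elements copied if each of the n*n assignments rewrites the whole n-by-n dataset."""
    assignments = n * n           # one per (i, j) pair in the nested loop
    elements_per_rewrite = n * n  # assumed: the full matrix is copied each time
    return assignments * elements_per_rewrite

def copies_vectorized(n):
    """Elements written if the matrix is produced by one whole-matrix assignment."""
    return n * n

n = 20
loop_cost = copies_with_loop(n)
vector_cost = copies_vectorized(n)
ratio = loop_cost // vector_cost
print(loop_cost, vector_cost, ratio)
```

Under those assumptions the 20x20 loop moves 160000 elements against 400
for a single assignment, a factor of n^2, so the gap widens quickly as
the matrix grows.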

Fortunately, S has extensive resources for avoiding loops. Two examples
that spring to mind, taking perhaps 0.3 and 0.1 seconds respectively:

    m <- matrix(0,20,20);    m <- sin( col(m)/10)

    m <- matrix( sin(1:20/10), 20,20, byrow=T)

			Jim Leinweber

UUCP:  ...!{allegra,ihnp4,seismo,...}!uwvax!uwmacc!jiml
ARPA:  uwmacc!jiml@wisc-rsch.arpa
POST:  MACC, UW-Madison, 1210 W. Dayton St., Madison, Wi, 53706