[comp.sys.sgi] Weird Fortran i/o?

igraham@SMAUG.PHYSICS.MCGILL.CA (Ian Graham) (05/04/91)

Hello,

A user on our system was checking benchmarks for system i/o on 
IRIS/SUN/HP9000/DECSTATION and came up with the following quirky 
results: (THe code is given at the bottom)  THe program writes
100000 bunches of 4 integers into a direct access fie, rewinds, 
and reads 'em all back.   Here are the results for a 4D-25:

START-TIME         0.0100      0.0700
IO WRITE           8.9400      4.7600
IO READ           26.4600    353.0300 <<---!!!!!!

So what gives?  Why should it take 5 MINUTES to read some data written
in 5 SECONDS!   On a Decstation 3000 (same CPU chipset...)  the OPPOSITE 
is true -- it takes 5 minutes to WRITE the file, and only 5 seconds to 
READ it!  On a Sun-3 things are altogether slower, but again the write 
is much slower than the read:

IO WRITE         100.4800    320.0400
IO READ           50.7400      8.1800

Meanwhile  on an HP9000 (Snake!??!) it takes only 5 seconds to 
read and/or write.  Nice if you are heavily i/o bound and running 
Fortran (as is the case for this user.)   

Anyone know what's going on? 

HEre are the system specifics, as far as I know them, and the test code:

IRIS      4D-25,        IRIX 3.3.1
SUN    SUN3-180,        SUNOS3.5
DEC    Decsation 3100,  ??
HP     HP9000-720       ??

                PROGRAM QUICKTEST
C Date: Fri, 03 May 91 12:56:28 EDT
C From: "Dave Gilkinson, hero at large 398-6511 " <gilk@campy>
                REAL TARRAY(2)             
                A= DTIME(TARRAY)
                WRITE(6,100)TARRAY(1),TARRAY(2)
 100            FORMAT(' START-TIME ',F10.4,2X,F10.4)

                DO I = 1, 100000
                  WRITE(99) I,I+1,I+2,I+3
                ENDDO
                 B = DTIME(TARRAY)
                WRITE(6,130)TARRAY(1),TARRAY(2)
 130            FORMAT(' IO WRITE ', F10.4,2X,F10.4 )
                REWIND(99)                                  

               DO I = 1, 100000
                  READ(99)K,L,M,N
                ENDDO
                 B = DTIME(TARRAY)
               WRITE(6,140)TARRAY(1),TARRAY(2)
 140            FORMAT(' IO READ ', F10.4,2X,F10.4 )
                STOP
                END            


    ___________________________________________ Ian Graham ______________
						igraham@physics.mcgill.ca
					        Tel: (514) 398-6526
					        Fax: (514) 398-8434

dhinds@elaine18.Stanford.EDU (David Hinds) (05/04/91)

In article <9105031951.AA02478@smaug.physics.mcgill.ca> igraham@SMAUG.PHYSICS.MCGILL.CA (Ian Graham) writes:
>
>A user on our system was checking benchmarks for system i/o on 
>IRIS/SUN/HP9000/DECSTATION and came up with the following quirky 
>results: (THe code is given at the bottom)  THe program writes
>100000 bunches of 4 integers into a direct access fie, rewinds, 
>and reads 'em all back.   Here are the results for a 4D-25:
>
>START-TIME         0.0100      0.0700
>IO WRITE           8.9400      4.7600
>IO READ           26.4600    353.0300 <<---!!!!!!
>
>So what gives?  Why should it take 5 MINUTES to read some data written
>in 5 SECONDS!

    I don't know why reads in particular are so slow, but my experience
was that ALL fortran I/O was really slow.  I didn't make detailed timings,
but I remember that one program which took about an hour to read a large
unformatted file ran in about 3 minutes when I replaced a single Fortan
READ statement with a C function to read a short string of bytes.

 -David Hinds
  dhinds@cb-iris.stanford.edu

shc@fred.UUCP (Steve Couturie) (05/04/91)

Well,:
Re: Mail from Ian Graham --

  Re weird Fortran I/O times on IRIS 4D/25:
> 
> START-TIME         0.0100      0.0700
> IO WRITE           8.9400      4.7600
> IO READ           26.4600    353.0300 <<---!!!!!!
> 

where the columns are user time and system time, respectively:

I tried the test code, and sure enough, it's weird.  BUT:
a small modification makes it perform very well indeed:

Replace the REWIND with CLOSE.  Here are timings for 4 machines:

[376]fred:shc/junk> ./quickc			# 4D/20G, 16M mem, BUSY
 START-TIME       0.00        0.05
 IO WRITE         8.97        1.08
 IO READ         13.02        1.42

[21]dino:/tmp> ./quickc				# 4D/25TG, 64M mem, idle
 START-TIME       0.00        0.03
 IO WRITE         5.09        0.58
 IO READ          7.62        0.88

[25]bambam:/tmp> ./quickc			# 4D/240GTX, 56M mem, idle
 START-TIME       0.00        0.02
 IO WRITE         3.66        0.39
 IO READ          5.63        0.28

[103]lhdsy1:shc/junk> ./quickc			# DEC 5400 Server, 64M mem, ?
 START-TIME       0.01        0.04
 IO WRITE        13.69       74.98
 IO READ          7.53        2.24


A 4D/25 runs a bit slower (20 MHz vs. 25 MHz) CPU than does the 5400 or 
4D/240, while the 4D/20 is at 12.5 MHz.  The 20 and 25 are using SCSI
disks; the 240 is using ESDI's; the 5400 is using SCSI (I think).
The 20 and 240 are at IRIX 3.3.2, the 25 at IRIX 3.3.1, the 5400 at 
Ultrix 4.1.

I can only surmise that the IRIX Fortran I/O library is not, well, perfect.
---------------------- :-)
I also note that the READ's still take a bit longer on the IRISes than 
the writes; note that the system time has all but vanished, though.
Seems like a bug to me.

I'd be interested in any comments from the SGI/Mtn.View folks....

*****************************************************************************
Below is the modified code:


                PROGRAM QUICKC
C Date: Fri, 03 May 91 12:56:28 EDT
C From: "Dave Gilkinson, hero at large 398-6511 " <gilk@campy>
C
C Modified by Steve Couturie, COFRC.

                REAL TARRAY(2)             
                A= DTIME(TARRAY)
                WRITE(6,100)TARRAY(1),TARRAY(2)
 100            FORMAT(' START-TIME ',F10.2,2X,F10.2)

                DO I = 1, 100000
                  WRITE(99) I,I+1,I+2,I+3
                ENDDO
                 B = DTIME(TARRAY)
                WRITE(6,130)TARRAY(1),TARRAY(2)
 130            FORMAT(' IO WRITE   ', F10.2,2X,F10.2 )
C                REWIND(99)                                  
		CLOSE(99)

               DO I = 1, 100000
                  READ(99)K,L,M,N
                ENDDO
                 B = DTIME(TARRAY)
               WRITE(6,140)TARRAY(1),TARRAY(2)
 140            FORMAT(' IO READ    ', F10.2,2X,F10.2 )
                STOP
                END            

******************************************************************************

-- 
Steve Couturie                      Voice:     (213) 694-9332	
Chevron Oil Field Research Co.      FAX:       (213) 694-7228
P.O. Box  446                       Internet:  shc@chevron.com
La Habra, CA  90633-0446            UUCP:      ...!uunet!lhdsy1!shc

calvin@dinkum.wpd.sgi.com (Calvin H. Vu) (05/07/91)

In <9105040132.AA10813@fred> shc@fred.UUCP (Steve Couturie) writes:

| Well,:
| Re: Mail from Ian Graham --
| 
|   Re weird Fortran I/O times on IRIS 4D/25:
| > 
| > START-TIME         0.0100      0.0700
| > IO WRITE           8.9400      4.7600
| > IO READ           26.4600    353.0300 <<---!!!!!!
| > 
| 
| where the columns are user time and system time, respectively:
| 
| I tried the test code, and sure enough, it's weird.  BUT:
| a small modification makes it perform very well indeed:
| 
| Replace the REWIND with CLOSE.  Here are timings for 4 machines:
| 
| [21]dino:/tmp> ./quickc				# 4D/25TG, 64M mem, idle
|  START-TIME       0.00        0.03
|  IO WRITE         5.09        0.58
|  IO READ          7.62        0.88
| 
| [103]lhdsy1:shc/junk> ./quickc			# DEC 5400 Server, 64M mem, ?
|  START-TIME       0.01        0.04
|  IO WRITE        13.69       74.98
|  IO READ          7.53        2.24
| 
| 
	I ran the original unmodified program with the IRIX 4.0 release on a
4D/25 and got the following results:

 START-TIME     0.0000      0.0400
 IO WRITE     4.9300      0.2300
 IO READ     8.4100      0.5200

	So writing is slightly better than 3.3.1 and reading is
markedly better than 3.3.1 (but still not as fast as doing a CLOSE instead 
of a REWIND).

| I'd be interested in any comments from the SGI/Mtn.View folks....
| 
	It's a matter of second-guessing a user's intention whether he wants
    all readings or/and then all writings on a file or interspersed 
    reading/writings.  Depending on the guess is correct or not, some 
    implementation will be better than the others in a particular situation.
    This is a case where there is no perfect solution. It's just a matter
    of trying to get the best probability from the most general situations.

	The direct unformatted read bug has been fixed as you see in
    the timing result above.  [It was only a few days late in making
    the 3.3.2 maintenace release schedule :-(].

	As to why reading is still about 60% slower than writing I don't
    have the answer for you right now.  How about the next performance 
    improvement after the 4.0 release %-|.

	Doing a CLOSE when switching between READ and WRITE mode on the
    same file is a good strategy from the user so every time the file is
    opened only one type of I/O operation is done.  It should give the
    best performance available on the plaform when all readings are done
    at the same time and then all writings or vice versa.  Mixing 
    READ/WRITE/BACKSPACE/REWIND (as in transactional type of I/O 
    where people don't read the whole data file into a giant 10
    zillion byte array) would make a lot of systems hiccup in terms of
    performance.   For example, try the following benchmark:

	real tarr(2)
	real tend(2)
	start = etime(tarr)
	do 10 i=1,5000
	write(10) f, g
	rewind(10)
10	continue
	print *, "REWIND/WRITE loop: elapsed time = ",
	1 etime(tend)-start, tend(1)-tarr(1), tend(2)-tarr(2)


	start = etime(tarr)
	do 15 i=1,5000
	read(10) f, g
	rewind(10)
15	continue
	print *, "REWIND/READ loop: elapsed time = ",
	1 etime(tend)-start, tend(1)-tarr(1), tend(2)-tarr(2)

	open (2,file='scr8434.dat', access = 'direct',
     x  form='unformatted', recl=20)
	i1=1
	i2=2
	i3=3
	i4=4
	i5=5
	do 20 i=1,10
20	write(2, rec=i)i1, i2, i3, i4, i5
	start = etime(tarr)
	do 30 i=1,500
	do 40 j=1,10
	read(2, rec=j)i1, i2, i3, i4, i5
40	write(2, rec=j)i1, i2, i3, i4, i5
30	continue
	print *, "READ/WRITE loop: elapsed time = ",
	1 etime(tend)-start, tend(1)-tarr(1), tend(2)-tarr(2)
	end

	This is the timing result for the benchmark above on the 4D/25
    with 4.0 release:

 REWIND/WRITE loop: elapsed time =    4.350000       1.070000    
    3.280000    
 REWIND/READ loop: elapsed time =    2.600000      0.7099999       1.890000    
 READ/WRITE loop: elapsed time =    6.400000       1.720000       4.680000    


	Anyway, we did a minor overhaul on the I/O library for 4.0 release
    for many I/O operations and performance is much better now.  In 
    some cases, it could be as much as 50 times faster.  I created a 
    general I/O benchmark and got the 4.0 results for it which showed 
    super-turbo speed compared to 3.3 release.  But I assume it's proprietary
    so I can't really post it here :-(.   You just have to use your own 
    application/benchmark later and try it out I guess.

| -- 
| Steve Couturie                      Voice:     (213) 694-9332	
| Chevron Oil Field Research Co.      FAX:       (213) 694-7228
| P.O. Box  446                       Internet:  shc@chevron.com
| La Habra, CA  90633-0446            UUCP:      ...!uunet!lhdsy1!shc

- calvin
--
-----------------------------------------------------------------------------
Calvin H. Vu			   | "We are each of us angels with only one
Silicon Graphics Computer Systems  | wing.  And we can only fly embracing
calvin@sgi.com   (415) 962-3679	   | each other."