[comp.arch] File and other disk caches

johnl@iecc.cambridge.ma.us (John R. Levine) (02/03/91)

In article <PCG.91Feb1143600@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>[OS 360 usually has a buffer pool per open file]
>In way of principle (and depending on the access method, for example
>VSAM is completely different, and more SunOS 4/SVR4 like) OS does not
>have quite a filesystem in the Unix sense, it does not even have logical
>file IO; files are actually slices of a disk (each "file" can have a
>different sector size, and sector size is not invisible to programs!)
>and reading and writing is done via library subroutines that issue
>physical (well, not quite) IO operations.

The last time I looked, the regular access methods actually wrote channel
programs that they handed to the operating system to run.  The OS ran them
unmodified other than prefixing some commands that limited their ability
to seek outside of the range of the file.  From a modern viewpoint, this
seems pretty awful, yet the 360 had high enough level hardware I/O that it
was easy and common to write applications that could equally well read or
write cards, disks, or tapes.  I suppose that under MVS, since the I/O is
to physical rather than logical addresses, the OS has to map the CCW memory
addresses, which does require a little extra work, though to do so the OS
need only have the dimmest idea of what the channel program is doing.
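The prefixing arrangement described above can be sketched in C. Everything here is illustrative: the struct layout, the command code, and the helper name are invented (a real CCW packs opcode, address, flags, and count into 8 bytes, and the extent limits live in a separate parameter block that the prefix command points at):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rough model of a S/360-style Channel Command Word: an opcode, a data
   (storage) address, chaining flags, and a byte count.  The field
   layout here is illustrative, not the real 8-byte bit format. */
typedef struct {
    uint8_t  cmd;     /* channel command code */
    uint32_t addr;    /* data address in storage */
    uint8_t  flags;   /* e.g. command/data chaining */
    uint16_t count;   /* byte count */
} ccw_t;

#define CMD_DEFINE_EXTENT 0x63  /* hypothetical "limit seeks" prefix */
#define FLAG_CMD_CHAIN    0x40  /* chain on to the next CCW */

/* The OS runs the user's channel program unmodified, but first prefixes
   a command that confines seeks to the file's extent on the volume.
   Note it copies the user program verbatim: the OS needs no idea what
   the rest of the channel program does. */
int prefix_extent(ccw_t *out, int outlen,
                  const ccw_t *user, int n,
                  uint32_t extent_lo, uint32_t extent_hi)
{
    (void)extent_lo; (void)extent_hi;  /* would go in the extent parameter block */
    if (n + 1 > outlen) return -1;
    out[0].cmd   = CMD_DEFINE_EXTENT;
    out[0].addr  = 0;                  /* address of extent descriptor */
    out[0].flags = FLAG_CMD_CHAIN;     /* chain into the user's program */
    out[0].count = 16;
    memcpy(&out[1], user, n * sizeof(ccw_t));
    return n + 1;
}
```

The point of the sketch is only the shape of the trick: the guard is bolted on in front, and the rest of the program runs as written.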

>In this environment a central shared buffer cache does not make any
>sense; every time you open a file you say how many buffers you want to
>assign to this particular opening of that file, and so on.

Sure it does, but not necessarily under operating system control.  IBM disk
controllers commonly have 128MB of cache RAM in the controller itself.  I
gather that such cacheing is quite effective.  Program buffering is under
control of the program; it's my impression that most programs accept the
default of 3 buffers per file.

People with reasonably long memories will recall that TOPS-10 had per-file
buffers in the program's address space as well.  TOPS-10 was nowhere near
as spiffy as the more advanced TOPS-20 which managed all disk I/O through
the pager, but if your program would run under TOPS-10, it'd run a heck of
a lot faster there.

-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
" #(ps,#(rs))' " - L. P. Deutsch and C. N. Mooers

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/05/91)

On 2 Feb 91 20:21:23 GMT, johnl@iecc.cambridge.ma.us (John R. Levine) said:

johnl> In article <PCG.91Feb1143600@teachk.cs.aber.ac.uk>
johnl> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> [OS 360 usually has a buffer pool per open file] In way of
pcg> principle (and depending on the access method, for example VSAM is
pcg> completely different, and more SunOS 4/SVR4 like) OS does not have
pcg> quite a filesystem in the Unix sense, it does not even have logical
pcg> file IO; files are actually slices of a disk (each "file" can have
pcg> a different sector size, and sector size is not invisible to
pcg> programs!)  and reading and writing is done via library subroutines
pcg> that issue physical (well, not quite) IO operations.

johnl> The last time I looked, the regular access methods actually wrote
johnl> channel programs that they handed to the operating system to run.
johnl> The OS ran them unmodified other than prefixing some commands
johnl> that limited their ability to seek outside of the range of the
johnl> file. From a modern viewpoint, this seems pretty awful,

Precisely what I had hinted at -- that's physical I/O done via libraries
("access methods"). But I had wanted to spare the distinguished audience
the gory details.

johnl> I suppose that under MVS, since the I/O is to physical rather
johnl> than logical addresses, the OS has to map the CCW memory
johnl> addresses which does require a little extra work,

You suppose (more or less) right. That is also what VM/370 does with the
channel programs that operate on virtual disks etc.

pcg> In this environment a central shared buffer cache does not make any
pcg> sense; every time you open a file you say how many buffers you want to
pcg> assign to this particular opening of that file, and so on.

johnl> Sure it does, but not necessarily under operating system control.
johnl> IBM disk controllers commonly have 128MB of cache RAM in the
johnl> controller itself.

Ahem, ahem. It does not -- the controller cache is prevalent on
controllers that support devices that have fixed block sizes, designed
to support VSAM. On other devices, the controller is actually a fairly
powerful minicomputer that does all sorts of funny tricks. In particular
they do track caching.

A modern (and some old) 370 is internally a kind of dataflow machine,
externally a loosely coupled network of fairly powerful standalone
processors (just to give an example, the channels of some recent 370s
were actually PC/RTs, just like the DEC10s had PDP-11s as their
channels).

johnl> I gather that such cacheing is quite effective.

Alan J. Smith has published fairly conclusive evidence that the 370 cache
in the controller is not a very good idea, especially for disks that
support paging and temporary files, where LRU locality is not prevalent.
Much better to put the memory in the machine and use VSAM throughout.
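The sequential-workload half of that result is easy to reproduce in a toy simulation (C, with invented sizes): once a sequential scan touches more distinct blocks than the cache holds, a strict LRU cache never hits at all, no matter how many times the scan repeats.

```c
#include <assert.h>

/* Count hits when `passes` repeated sequential scans over `nblocks`
   distinct blocks run through an LRU cache of `cap` slots (cap <= 64).
   cache[0] is the most recently used entry.  With nblocks > cap, every
   block is evicted just before it is needed again -- the classic
   reason LRU is a poor fit for sequential file access. */
int lru_seq_hits(int cap, int nblocks, int passes)
{
    int cache[64];
    int used = 0, hits = 0;
    for (int p = 0; p < passes; p++)
        for (int b = 0; b < nblocks; b++) {
            int pos = -1;
            for (int i = 0; i < used; i++)
                if (cache[i] == b) { pos = i; break; }
            if (pos >= 0) hits++;                /* hit: refresh recency */
            else if (used < cap) pos = used++;   /* miss: fill a free slot */
            else pos = cap - 1;                  /* miss: evict the LRU tail */
            for (int i = pos; i > 0; i--)        /* move block b to the front */
                cache[i] = cache[i - 1];
            cache[0] = b;
        }
    return hits;
}
```

With cap 3 and a 5-block scan the hit count stays at zero however long you run it; shrink the scan to fit and every pass after the first hits completely.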

johnl> Program buffering is under control of the program; it's my
johnl> impression that most programs accept the default of 3 buffers per
johnl> file.

Well, yes, but only because nobody bothers; it's like people who buy
Unix workstations and run them with the generic kernel and manufacturer
defaults for tunables, without reconfiguring, and thus lose a lot of
performance.

johnl> People with reasonably long memories will recall that TOPS-10 had
johnl> per-file buffers in the program's address space as well.  TOPS-10
johnl> was nowhere near as spiffy as the more advanced TOPS-20 which
johnl> managed all disk I/O through the pager, but if your program would
johnl> run under TOPS-10, it'd run a heck of a lot faster there.

Another tragedy of not recognizing that LRU is not good for file
accesses. It is an interesting research topic to develop a metapager
that detects, perhaps dynamically, the type of access pattern that each
program does on each segment and then instructs the pager accordingly.
For example, most programs would access the pages of a large indexed
file LRU-fashion, while the system dumper would access them FIFO, and
so on.
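A minimal sketch of such a detector, with invented thresholds: watch the page numbers a program touches in one segment, and classify the stream as sequential if most references hit the successor of the previous page. A metapager could then switch that segment to FIFO-style read-ahead/drop-behind replacement and leave the rest on LRU.

```c
#include <assert.h>

typedef enum { PAT_SEQUENTIAL, PAT_RANDOM } pattern_t;

/* Toy access-pattern classifier.  `pages` is the recent reference
   trace for one segment.  If at least half of the transitions are to
   the next page in order, call the stream sequential.  The one-half
   threshold is an arbitrary choice for illustration. */
pattern_t classify(const int *pages, int n)
{
    if (n < 2) return PAT_RANDOM;   /* too little evidence: assume nothing */
    int seq = 0;
    for (int i = 1; i < n; i++)
        if (pages[i] == pages[i - 1] + 1) seq++;
    return (seq * 2 >= n - 1) ? PAT_SEQUENTIAL : PAT_RANDOM;
}
```

This is essentially what later madvise/fadvise-style hints let the program declare explicitly; the research question raised above is doing it without being told.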

As to computer hw architecture, I remember an article by Dijkstra where
he advanced the suggestion that it would be useful not just to have a
page fault when the referenced page is not in the working set, but also
an indication of the faults that would occur if the working set were
smaller or larger by a page than it is... Not easy to implement in
practice, but an interesting idea.
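For LRU replacement, at least, the information Dijkstra asked for can be computed offline from a reference trace, because LRU has the stack (inclusion) property: the contents at size k are always a subset of the contents at size k+1, so fault counts can only fall as memory grows. A sketch:

```c
#include <assert.h>

/* Count page faults for an LRU-managed memory of `cap` page frames
   (cap <= 64) over a reference trace.  Running this at cap-1, cap, and
   cap+1 gives exactly the "one page smaller / one page larger"
   indication suggested above; the inclusion property guarantees
   faults(cap-1) >= faults(cap) >= faults(cap+1). */
int lru_faults(const int *trace, int n, int cap)
{
    int cache[64];
    int used = 0, faults = 0;
    for (int t = 0; t < n; t++) {
        int b = trace[t], pos = -1;
        for (int i = 0; i < used; i++)
            if (cache[i] == b) { pos = i; break; }
        if (pos < 0) {                       /* fault */
            faults++;
            pos = (used < cap) ? used++ : cap - 1;  /* fill or evict LRU */
        }
        for (int i = pos; i > 0; i--)        /* move page b to the front */
            cache[i] = cache[i - 1];
        cache[0] = b;
    }
    return faults;
}
```

Doing this online in hardware, at every reference, is the part that is not easy in practice.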
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jackk@shasta.Stanford.EDU (jackk) (02/05/91)

In article <PCG.91Feb4164856@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>
>A modern (and some old) 370 is internally a kind of dataflow machine,
>externally a loosely coupled network of fairly powerful standalone
>processors (just to give an example, the channels of some recent 370s
>were actually PC/RTs, just like the DEC10s had PDP-11s as their
>channels).
>Piercarlo Grandi                   

This is not correct. Both the channel engine and the ROMP
microprocessor were derivatives of the Yorktown 801 architecture;
however, the implementations are very different. The channel engine was
implemented in ECL in a Thermal Conduction Module, while the ROMP
was a single-chip CMOS microprocessor. The cycle times were very
different.

Jack

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/06/91)

On 5 Feb 91 01:19:16 GMT, jackk@shasta.Stanford.EDU (jackk) said:

jackk> In article <PCG.91Feb4164856@odin.cs.aber.ac.uk>
jackk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> A modern (and some old) 370 is internally a kind of dataflow
pcg> machine, externally a loosely coupled network of fairly powerful
pcg> standalone processors (just to give an example, the channels of
pcg> some recent 370s were actually PC/RTs, just like the DEC10s had
pcg> PDP-11s as their channels).

jackk> This is not correct. Both the channel engine and the ROMP
jackk> microprocessor were derivatives of the Yorktown 801 architecture,
jackk> however the implementations are very different. The channel engine was
jackk> implemented in ECL in a Thermal Conduction Module, while the ROMP
jackk> was a single chip CMOS microprocessor.

Thanks for the correction. I had believed the 801 had gotten
reimplemented twice more or less identically as the channel and the
workstation. The DEC-10s instead literally had stock PDP-11s housed in
their bays.

jackk> The cycle times were very different.

I can easily imagine! A 370-style channel has to be FAAAAAAAAST. Sure,
all the guys that bought the PC/RT would have loved having an ECL TCM
implementation instead of the fairly anemic CMOS implementation of the
ROMP. Except for the water cooling, that is :-).
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

rbe@yrloc.ipsa.reuter.COM (Robert Bernecky) (02/09/91)

I don't think any of the /370 channels I have seen have been water-cooled.
The CPUs proper and cache, vector facility, are typically water-cooled,
but I can't recall seeing any coolant tubes on 308x or 3090-class machines.

They don't run that fast -- maximum transfer rate on most of them is either
3e6 bytes/sec or 4.5e6 bytes/sec. They also tend to overrun on tight data
chaining (scatter-read, gather-write type operations to dasd, for example),
which doesn't say much for their speed. When I was working at I.P. Sharp
on SHARP APL, we gave up on data chaining in mid-record because of the 
inability of the /370 channel to keep up with things. Chaining in an
interrecord gap was OK because you had about 100 or so byte times to
fetch the next channel command word (CCW). This technique effectively
stopped working in the mid-1970's when IBM announced the 370/145.
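The "about 100 byte times" budget is easy to turn into microseconds (C; the channel rates are the ones quoted above):

```c
#include <assert.h>

/* Back-of-envelope check: at a 3e6 byte/sec channel rate one byte time
   is 1/3 us, so an inter-record gap of ~100 byte times gives roughly
   33 us to fetch and decode the next CCW.  Returns the budget in whole
   microseconds (truncated). */
int ccw_fetch_budget_us(long bytes_per_sec, int gap_byte_times)
{
    return (int)((long long)gap_byte_times * 1000000 / bytes_per_sec);
}
```

At 4.5e6 bytes/sec the same gap shrinks to about 22 us, which is why faster channels made mid-record chaining even less forgiving.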

Still, the system does hum right along: We had sites with concurrent
user load in excess of 1200, offering trivial response time of about
.1 seconds - this on a general-purpose, interpretive APL system!

Now, whoever was it that was saying that compiler writers (interpreter
writers are a related breed) don't know how to exploit hardware?

Bob Bernecky
Snake Island Research Inc.

ps: I see that the origin of all this was something about disk caches.
A few other points: 

a. IBM dasd controllers cache in the controller, which is connected to the
   cpu by that dinky 3 mbyte/sec pipe. The main advantage of dasd cache is
   reduced seek and rollaround delays. If you are reading a lot of data
   at once, the cache advantage becomes minimal. If you're reading 80
   bytes (IBM still believes in card images...) off a 50 kbyte track, then
   the cache will help a lot.
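A crude service-time model makes point (a) concrete. The seek and half-rotation figures below are illustrative round numbers, not a particular drive's specifications:

```c
#include <assert.h>

/* Rough service time for a cached dasd read, in microseconds.  A cache
   hit costs only the channel transfer; a miss adds a seek plus (on
   average) half a rotation of positioning delay. */
long long read_us(long long bytes, long long chan_bps,
                  long long seek_us, long long half_rot_us, int hit)
{
    long long xfer = bytes * 1000000 / chan_bps;   /* channel transfer time */
    return hit ? xfer : seek_us + half_rot_us + xfer;
}
```

With a 3e6 byte/sec channel, a 16 ms seek, and an 8.3 ms half rotation, a cached 80-byte read is some hundreds of times faster than an uncached one, while for a full 50 kbyte track the cache saves well under a factor of three: exactly the small-read/big-read asymmetry described above.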

b. The access methods allow cacheing sort of, sometimes, if you're a
   sequential access fan, by letting you specify the number of buffers
   for that file. No more than 255 buffers, please. 2 is the default.
   Non-sequential files probably don't cache, but I never used those
   access methods anyway -- for performance, we rolled our own command
   chains.
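The buffering rule in (b) is simple enough to state as code (C sketch; the default of 2 and the cap of 255 are the figures given above, and the function name is invented):

```c
#include <assert.h>

/* BUFNO-style per-open buffer count: defaults to 2, hard-capped at
   255.  With two or more buffers the access method can overlap I/O on
   one buffer with processing of another (double buffering); more
   buffers just deepen that pipeline for sequential access. */
int effective_bufno(int requested)
{
    if (requested <= 0)  return 2;    /* nobody asked: take the default */
    if (requested > 255) return 255;  /* no more than 255, please */
    return requested;
}
```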