rs@mirror.UUCP (Rich Salz) (08/27/86)

We will probably be starting a research project in the area of free-text retrieval. Looking around shows that our major bottleneck is going to be how quickly we can read in large amounts of data (eventually, in the gigabyte range). I'm asking the net for any hints and kinks people have used to increase disk *read* throughput on Unix. We're at an exploratory phase, so I'm open to any and all suggestions, from "increase the block size to 16K" to "buy a Convex."

For example, does anyone have a feel for what kind of performance I can get if I open a raw disk on 4.[23]BSD and do vread(2)'s? (I will probably not need the Unix filesystem structure.) I know a great deal depends on the controller -- any word out there on what's "best" and "fastest"? At the start of the project the group will be sharing the machine with the rest of the company, although they will probably have one or two controllers and disks of their own. I expect they'll need multiple controllers because of the need for multiple simultaneous reads. The initial development machine will probably be something like a Pyramid 98x or a Sequent 21000.

This brings me to my second question. The group may eventually move up to a UTS-class machine in two years. Does anyone have any numbers on the overall disk throughput of machines in that class? Or, what other machines are good I/O pumps?

Please reply to me, and I will summarize for the net.

Thanks,
	/r$
-- 
Rich $alz	{mit-eddie, ihnp4, wjh12, cca, cbosgd, seismo}!mirror!rs
Mirror Systems	2067 Massachusetts Avenue	Cambridge, MA 02140
Telephone: 617-661-0777
"Hi, mom!"
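
P.S. To make the raw-disk question concrete, here is a minimal sketch of the kind of read loop I have in mind. The device name /dev/rxy0c and the 64K transfer size are made up, and it uses plain read(2) rather than vread(2); substitute your own raw partition and experiment with the size.

/* Raw-device read loop.  Device name and transfer size are
 * placeholders; tune XFER and substitute a real raw partition. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define XFER	(64 * 1024)	/* bytes per read(); try 16K on up */

int main(void)
{
	char *buf;
	int fd;
	ssize_t n;
	long total = 0;

	buf = malloc(XFER);
	if (buf == NULL) {
		fprintf(stderr, "out of memory\n");
		return 1;
	}

	/* The raw (character) device bypasses the buffer cache, so
	 * each read() goes straight to the controller. */
	fd = open("/dev/rxy0c", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Raw I/O usually wants the count to be a multiple of the
	 * sector size; malloc alignment suffices on most systems. */
	while ((n = read(fd, buf, XFER)) > 0)
		total += n;	/* ...process the data here... */

	if (n < 0)
		perror("read");
	printf("read %ld bytes\n", total);
	close(fd);
	return 0;
}

Timing that loop with time(1) at different transfer sizes should show roughly where a given controller tops out.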
We will probably be starting a research project in the area of free-text retrieval. Looking around shows that our major bottleneck is going to be how quickly we can read in large amounts of data (eventually, in the Gigabyte range). I'm asking the net for any hints and kinks people have used to increase disk *read* throughput on Unix. We're at an exploratory phase, so I'm open to any and all suggestions, from "increase the block size to 16K" to "buy a Convex." For example, does anyone have any feel for what kind of performance I can get if I open a raw disk on 4.[23]BSD and do vread(2)'s? (I will probably not need the Unix filesystem structure.) I know a great deal depends on the controller -- any word out there on what's "best" and "fastest"? At the start of the project, the group will be sharing the machine with the rest of the company, although they will probably have one or two controllers and disks of their own. I expect they'll need multiple controllers because of the need for multiple simultaneous reads. The initial development machine will probably be something like a Pyramid 98x or a Sequent 21000. This brings me to my second question. The group may eventually move up to a UTS-class machine in two years. Does anyone have any numbers on the overall disk throughput of those class of machines? Or, what other machines are good IO pumps? Please reply to me, and I will summarize for the net. Thanks, /r$ -- ---- Rich $alz {mit-eddie, ihnp4, wjh12, cca, cbosgd, seismo}!mirror!rs Mirror Systems 2067 Massachusetts Avenue Cambridge, MA 02140 Telephone: 617-661-0777 "Hi, mom!"