ee@atbull.UUCP (Erwin Eder) (09/05/89)
In article <265@atbull.UUCP> i write:
>_The configuration:
>	60830,UNIX V.3,16MB
>
>_The Situation:
>	2MB for BUFFERS
>		'sync' stops all terminal/system activity for
>		some seconds ( until buffers are written to
>		disk ? )
>
>	1MB for BUFFERS
>		no troubles with sync
>
>	( of course 2MB BUFFERS is preferable, but users
>	  complain about terminals freezing for no obvious
>	  reason )
>
>_The question:
>	Is it normal/ok for sync to freeze terminal activity ?

Answers :

1>From: tuvie!dpmizar!lcz (Lee Ziegenhals)
1>
1>I would appreciate any information you have on this problem.  I ran into the
1>same thing on a Motorola 68030 system running SystemV/68.  Motorola's response
1>was basically (1) set the file hardening switch (which turns the cache into
1>a write-through cache -- at a tremendous performance hit), or (2) use fewer
1>buffers.  I don't really consider either of these an acceptable solution.
1>
1>I'm hoping to get more information from the engineers at Motorola, but I'm
1>not holding my breath...
1>
1>-Lee Ziegenhals

2>From: Bruce Funk <tuvie!osiris.cso.uiuc.edu!funk>
2>
2>Please summarize or send copy of responses!

(Ok! Done...)

I hope there will be more responses in comp.unix.questions plus
comp.unix.wizards plus comp.bugs.sys5 !

Followups go to comp.unix.wizards.
Please respond by e-mail.  I will post a summary if there is enough
interest.

thanx
Erwin Eder
-- 
+---/~~~~\---------------+--------------------------------------------+
|  /      \  <-- this is | and this    uunet!mcvax!tuvie!atbull!ee    |
|  \____/    a tree.     | is me -->   In-Real-Life  Erwin Eder       |
+-----||-----------------+--------------------------------------------+
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (09/06/89)
Gah!
-- 
Andy "Krazy" Glew,  Motorola MCD,             aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.    {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (09/06/89)
To: tuvie!dpmizar!lcz
Subject: Buffers and interactive response
Reply-To: aglew@urbana.mcd.mot.com
Bcc:
Date: Tue, 05 Sep 89 21:14:38 CDT
From: Andy 'Krazy' Glew <aglew@chant>

I'm trying to post this, but having problems.

In-reply-to: ee@atbull.UUCP's message of 5 Sep 89 01:44:01 GMT
Newsgroups: comp.unix.wizards
Followup-To: comp.unix.wizards
Subject: Re: Should 'sync' stop terminal/system activity ? (SUMMARY)
References: <265@atbull.UUCP> <266@atbull.UUCP>
Distribution: world

>In article <265@atbull.UUCP> i write:
>>_The configuration:
>>	60830,UNIX V.3,16MB
>>
>>_The Situation:
>>	2MB for BUFFERS
>>		'sync' stops all terminal/system activity for
>>		some seconds ( until buffers are written to
>>		disk ? )
>>
>>	1MB for BUFFERS
>>		no troubles with sync
>>
>>	( of course 2MB BUFFERS is preferable, but users
>>	  complain about terminals freezing for no obvious
>>	  reason )
>>
>>_The question:
>>	Is it normal/ok for sync to freeze terminal activity ?

>Answers :

>1>From: tuvie!dpmizar!lcz (Lee Ziegenhals)
>1>
>1>I would appreciate any information you have on this problem.  I ran into the
>1>same thing on a Motorola 68030 system running SystemV/68.  Motorola's response
>1>was basically (1) set the file hardening switch (which turns the cache into
>1>a write-through cache -- at a tremendous performance hit), or (2) use fewer
>1>buffers.  I don't really consider either of these an acceptable solution.
>1>
>1>I'm hoping to get more information from the engineers at Motorola, but I'm
>1>not holding my breath...
>1>
>1>-Lee Ziegenhals

This isn't the correct forum for a formal announcement of
functionality, and what I say must not be understood as an official
Motorola policy, but I feel a bit bad about Lee Ziegenhals' "not
holding his breath" for help from Motorola, so...

Yep, we found this performance problem, large buffer caches producing
big jerks in interactive response, as soon as we started living on
large memory machines.  So far the biggest jerk I measured was 13
seconds!

The problem was an O(n^2) algorithm in the buffer cache scanning code
(standard UNIX), when a lot of buffers were dirty.

In Motorola SYSTEM V/68 R3V6 we have provided a different buffer cache
scanning algorithm, that is O(n), but, moreover, reduces the "jerk" by
scanning the buffer cache in segments.  So, if you are scanning the
buffer cache (BDFLUSHR) once every second, then we can now split up
the work into, say, 1/60 as much on every clock tick.

Yes, it performs a lot better.  First of all, empirically (it's my job
to measure these things).  Secondly, "feel" -- we installed the fix on
our production machines, and then took it off so that I could provide
before and after measurements of jerkiness on a real system.  I was
almost lynched when I took it off.  It's back now (down, down, angry
programmers!)

There are still a few other O(n^2) algorithms in UNIX (remember,
simple, not sophisticated, algorithms?  Uh-huh), but I think the
buffer cache was the biggy.  Tell me if it's still a problem after you
update to R3V6 -- I know how to fix some of 'em, just need the time
and justification (I cannot go fixing things that we have no evidence
are problems - not without real good reason).

Motorola System V/68 R3V6 is not, I believe, formally released yet,
and you may want to double-check that the "syncfix" functionality is
in it.

For the moment, if you have a Motorola System V/68 R3V5 or earlier
system, and are having trouble with interactive response, you might:

-- reduce NBUF
   (to reduce the number of buffers you need to scan.  Take a look at
   your buffer cache hit statistics, to see if you really need all
   those buffers)

-- turn on FILEHARDN
   (with the problems mentioned above)

-- change BDFLUSHR
   This is one that hasn't yet been mentioned, that you might want to
   consider.  However, it's a bit of a toughie:

The O(n^2) behaviour I describe above is really more like O(n*d),
where d is the number of dirty buffers (if d=c*n, then O(n^2)).

If your workload's buffer writing characteristics are such that you
dirty a lot of different buffers, you may want to reduce BDFLUSHR
(increase the rate at which scanning is done).  This way, you'll be
writing the data out more frequently, but hopefully fewer buffers will
have been dirtied each time, so the scan will take less time (you'll
have smaller jerks, though more often).

However, if you are constantly redirtying the same buffers, then a
higher flush rate will just mean that you're writing out more data -
probably not good.  In this case, you might increase BDFLUSHR
(frankly, I would reduce NBUF first - but I'm paranoid about
reliability).

If you really feel daring, you can patch the value of "bdflushr" on
the fly in your kernel:

	bdflushcnt = tune.t_bdflushr

Change tune.t_bdflushr.  (In R3V6 you have to change a different
variable.)  This way you could dynamically try out a few values, and
see which you prefer.

NB. THIS IS NOT MOTOROLA RECOMMENDED STANDARD PROCEDURE!!!  We do not
recommend changing tuneable parameters except via sysgen, and any
potential damage you do is on your own head.

Hope this helps.  If there was a standard newsgroup for Motorola
SYSTEM V/68 (and 88) systems, I'd cross-post to it.

...

Now, finally - mind if I be commercial for just a little bit?  (I'm
normally a really good net.citizen, talking about anything *except* my
company's products, but I've got this character flaw: I'm proud of the
company I work for (and I only work for companies I'm proud of)):

I'm sorry that some of you "don't hold your breath" for help from
Motorola, but -- Motorola Microcomputer Division (the part of Motorola
that sells computer systems as opposed to parts) is full of people
trying to make our products better.  Yeah, we've had problems, but
we're getting better quickly.  We've been challenged to produce the
same sort of quality in computer systems, hardware and software, that
other parts of Motorola put into chips and communications equipment.
What 99.9999% defect free means to software isn't always clear, but it
certainly means solving our customers' problems.  So keep those bug
reports coming - it may take a while to get 'em fixed, but we're
gonna.

End of inspirational commercial.

Please, please, please - report those bugs to your sales office or
customer support.  I didn't know about this buffer cache scanning
problem until I started working on a system with a lot of memory
myself.  This isn't a commercial - I know that other companies have
difficulty getting customers to report problems.  Hell - I know that
when I was a sysadmin at school I was often too lazy to report bugs.
But, believe it or not, people at system shops actually do look at
your problem reports.  Keep 'em coming!!
-- 
Andy "Krazy" Glew,  Motorola MCD,             aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.    {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.
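
The segmented scan described in this post can be sketched roughly as
follows.  This is a toy Python model, not the SYSTEM V/68 code: the
class BufferCache, its methods, and the figure of 60 clock ticks per
flush period are all assumptions made for illustration.  It only shows
how splitting one full pass into per-tick segments caps the work done
in any single burst while still reaching every buffer once per period.

```python
class BufferCache:
    """Toy buffer cache: dirty[i] is True when buffer i awaits writing."""

    def __init__(self, nbuf):
        self.nbuf = nbuf
        self.dirty = [False] * nbuf
        self.cursor = 0  # where the next segmented scan resumes

    def flush_all(self):
        """One full pass in a single burst (the source of the 'jerk')."""
        written = 0
        for i in range(self.nbuf):
            if self.dirty[i]:
                self.dirty[i] = False  # stands in for a disk write
                written += 1
        return self.nbuf, written  # (buffers visited, buffers written)

    def flush_segment(self, segments):
        """Visit only the next 1/segments of the cache, then stop."""
        span = -(-self.nbuf // segments)  # ceiling division
        written = 0
        for _ in range(span):
            i = self.cursor
            if self.dirty[i]:
                self.dirty[i] = False
                written += 1
            self.cursor = (self.cursor + 1) % self.nbuf
        return span, written


TICKS = 60  # assumed number of clock ticks per flush period

cache = BufferCache(2048)
for i in range(0, cache.nbuf, 2):  # dirty every other buffer
    cache.dirty[i] = True

# One call per clock tick; each call touches at most ceil(2048/60) buffers.
per_tick = [cache.flush_segment(TICKS)[0] for _ in range(TICKS)]
print(max(per_tick), any(cache.dirty))  # prints: 35 False
```

With these assumed numbers the full pass visits 2048 buffers in one
burst, while the segmented version never visits more than 35 per tick
and still leaves no dirty buffer behind after one period.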
df@phx.mcd.mot.com (Dale Farnsworth) (09/08/89)
Andy Glew already gave a very good response on O(n^2) buffer flushing
algorithms and the problems this causes for large buffer caches.
I would like to elaborate on Motorola's file system hardening switch.

Somebody wrote:
> Motorola's response was basically (1) set the file hardening switch
> (which turns the cache into a write-through cache -- at a tremendous
> performance hit), or (2) use fewer buffers.

Have you measured this "tremendous performance hit"?

Turning on file hardening does *not* make the cache write through.  It
enables the standard SVR3 file hardening code which essentially writes
through critical directory and inode updates.

This file hardening greatly increases the integrity of the file system
in the event of a system crash.  I strongly recommend that no system
run with file hardening disabled.

-Dale
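
To make the difference between file hardening and a true write-through
cache concrete, here is a toy count of how many writes have to reach
the disk before the caller continues.  It is only a sketch: the
function sync_writes, the three mode names, and the workload are
invented for illustration and say nothing about how the SVR3 code is
actually structured or how much time each write costs.

```python
def sync_writes(ops, mode):
    """Count writes that must reach disk before the call returns.

    ops  -- sequence of "data" or "meta" (inode/directory update) writes
    mode -- "delayed":       everything waits for the periodic flush
            "hardened":      only metadata updates are written through
            "write-through": every write is written through
    """
    if mode == "delayed":
        return 0
    if mode == "hardened":
        return sum(1 for op in ops if op == "meta")
    if mode == "write-through":
        return len(ops)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical workload: two metadata updates to create a file,
# 100 data block writes, then one final inode update.
workload = ["meta", "meta"] + ["data"] * 100 + ["meta"]

for mode in ("delayed", "hardened", "write-through"):
    print(mode, sync_writes(workload, mode))
```

For this made-up workload the hardened mode forces 3 synchronous
writes against 103 for write-through, which is the gap between the two
descriptions; a real measurement on your own workload is still the
only way to know the actual cost.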