mf@ircam.ircam.fr (Michel Fingerhut) (09/30/90)
Since "DEC" is listening to this group, and sometimes even responding, what about a response about this very serious problem (58n0 under Ultrix 4.0)? I.e.:

1. What are the problems as known to DEC today (so that we're less pissed off when we encounter them). That would be *some* help. I am rather upset that the local support tells me, when I call them with this problem "oh yeah we knew this from the start, why don't you just turn all but one CPUs off until further notice...". If they knew it from the start, don't send us 4.0 or warn us.

2. What they do or intend to do in order to solve them (other than suggesting we buy another machine and/or go to another vendor, as has already been suggested here).

Michael Fingerhut
alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/01/90)
In article <1990Sep30.123350.14441@ircam.ircam.fr>, mf@ircam.ircam.fr (Michel Fingerhut) writes:
> Since "DEC" is listening to this group, and sometimes even responding,
> what about a response about this very serious problem (58n0 under
> Ultrix 4.0)? I.e.:

"DEC" is a very big company and many of us that are listening don't have access to the wide variety of hardware needed to test customer problems. Many of those that respond do so because we happen to know the answer. Please don't confuse us with the people in the development group whose job it is to test and fix these sorts of problems. Occasionally someone in Engineering does respond, but usually they're working on the bug fixes and new features of the next version.

If you have a problem, the appropriate way to report it is to submit a Software Performance Report and/or go through the Customer Support Center nearest you.

> 1. What are the problems as known to DEC today (so that we're less
> pissed off when we encounter them). That would be *some* help.
> I am rather upset that the local support tells me, when I call them
> with this problem "oh yeah we knew this from the start, why don't
> you just turn all but one CPUs off until further notice...". If
> they knew it from the start, don't send us 4.0 or warn us.

Most problems that we know about go into the release notes, but sometimes the problems aren't found until after the release notes have been printed. It would be nice if there were an easy way to report verified problems back to you.

> 2. What they do or intend to do in order to solve them (other than
> suggesting we buy another machine and/or go to another vendor, as
> has already been suggested here).

Hopefully fix the problem once we know what's wrong. Of course, until the people that own the problem know about it they can't do anything. If they don't happen to be reading this newsgroup then it might be a while before they find out about it. I won't report a problem to them until >>>I<<< can verify it. Since I don't have a 58xx to test with there isn't much I can do.

One thing that would help is a better description of "slow". What is the program doing? Lots of system calls, disk I/O, network I/O, lots of memory use, paging? I suggest looking at cpustat(1), iostat(1), netstat(1) and vmstat(1). One of these days I'll see if I can put a source archive of monitor for V4 on gatekeeper.dec.com.

> Michael Fingerhut

--
Alan Rollow alan@nabeth.enet.dec.com
mf@ircam.ircam.fr (Michel Fingerhut) (10/01/90)
To Alan Rollow (alan@shodha.enet.dec.com): you miss the point. One would assume that DEC would have checked Ultrix 4.0 on 58n0 (n>1) *before* shipping it out, and would have realized that such commands as "ls" take several *seconds* for small directories. This is IMMEDIATELY noticeable. One would also have assumed that if a problem had been found then, it would have appeared either in the release notes, or in mandatory patches, or in a special page added to the release (as sometimes happens).

Well, this was not the case. So *either* the software was not tried on such configurations (hard to believe, but this would not be the first time, eh? Remember the GT62?) or *else* customers were not informed (which I believe is the case, since the support center was well aware of the problem when I called them).

As to reporting the problem: we can do it only by phone, are given a call number and most of the time hear that it will get in the next release, hopefully. But for this particular problem, I was also told it was a much more serious problem, namely design flaws in the 58n0. That was DEC's response. So you bet I'm worried.
aem@aber-cs.UUCP (Alec D.E. Muffett) (10/01/90)
In article <1990Sep30.123350.14441@ircam.ircam.fr> mf@ircam.ircam.fr (Michel Fingerhut) writes:
>1. What are the problems as known to DEC today (so that we're less
> pissed off when we encounter them).

Here in Aberystwyth we are running 2x DEC 5830s with Rev 179 Ultrix 4.0. We have observed that the Symmetric Multi-Processing behaves badly under a low machine load and are therefore permanently running 2 low-priority cpu-burning jobs which sleep for bursts of 15 seconds if the load average goes above 4.0.

When these jobs are running, the performance is greatly improved; we believe this is because the presence of the two jobs (1 per spare CPU) solves some sort of ordering problem in the scheduler. DEC definitely DO know about this; it has appeared on a list of SPRs sent to us. No solution is yet forthcoming, but we live in hope...

It's not the perfect solution, because the two jobs tend to eat away at the cpu, and if some user puts a heavily i/o bound job up as well, the machine starts to groan. Then we just kill them fast and put them back later...

So, DEC have given us the ultimate reciprocal machine... the more load you put on it, the faster it goes... 8)

alec (and robert :-) )
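The workaround Alec describes (a low-priority CPU burner per spare CPU that sleeps in 15-second bursts once the load average passes 4.0) might look roughly like the sketch below. It is a present-day Python illustration, not the jobs actually run at Aberystwyth: the 1-second spin interval, the use of the 1-minute load average, the nice value of 19, and all function names are assumptions made for the example.

```python
import os
import time

LOAD_LIMIT = 4.0       # from the post: back off when load average goes above 4.0
BACKOFF_SECONDS = 15   # from the post: sleep in bursts of 15 seconds
SPIN_SECONDS = 1.0     # assumption: how long to burn CPU between load checks


def should_back_off(load, limit=LOAD_LIMIT):
    """Return True when the machine is busy enough that the burner should yield."""
    return load > limit


def spin(seconds, clock=time.monotonic):
    """Busy-loop for roughly `seconds`, consuming CPU without doing any I/O."""
    deadline = clock() + seconds
    count = 0
    while clock() < deadline:
        count += 1
    return count


def run_burner(cycles=None, read_load=lambda: os.getloadavg()[0],
               sleep=time.sleep, spin_for=spin):
    """Alternate between burning CPU and backing off.

    cycles=None runs until the process is killed; a finite value is only
    useful for testing. Returns the number of cycles spent sleeping.
    """
    backoffs = 0
    done = 0
    while cycles is None or done < cycles:
        if should_back_off(read_load()):
            sleep(BACKOFF_SECONDS)
            backoffs += 1
        else:
            spin_for(SPIN_SECONDS)
        done += 1
    return backoffs


if __name__ == "__main__":
    os.nice(19)  # assumption: lowest priority, so real work always wins
    run_burner()
```

One copy would be started per spare CPU and killed by hand when an I/O-heavy job makes the machine struggle, as the post describes.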
rosenblg@cmcl2.NYU.EDU (Gary J. Rosenblum) (10/02/90)
What also gets me is that we have a 5820 running Ultrix 4.0 Rev 179, and have never received one SPR from DEC about the problem! It got so bad two weeks ago that we went to the top at DEC to get things straightened out! I've put in a request to DSIN, and when I get an answer, I'll post it here.

Gary

Gary J. Rosenblum      UNIX Systems Manager      rosenblg@nyu.edu
New York University                              gary@nyu.edu
jmg@cernvax.UUCP (mike gerard) (10/02/90)
In article <1990Oct1.080535.17017@ircam.ircam.fr> mf@ircam.ircam.fr (Michel Fingerhut) writes:
>As to reporting the problem: we can do it only by phone, are given a
>call number and most of the time hear that it will get in the next
>release, hopefully.

We suffer from the same restrictions, except for the fact that normally we don't even hear such pleasant news as "fixed in next release". In addition, "official" DEC channels refuse to comment on problems mentioned in places like this: they ask you to submit a bug report if you have a problem.

It is ridiculous that there is (apparently) no data base of known problems and, where possible, available patches. I know that there are various patches available, some of which seem to be considered "mandatory". However, access seems only to be for those people having wasted their time identifying a known problem on their own systems.

--
mike gerard
jmg@cernvax.uucp
jmg@cernvax.bitnet
J. M. Gerard, Div. DD, CERN, 1211 Geneva 23, Switzerland
rosenblg@cmcl2.NYU.EDU (Gary J. Rosenblum) (10/03/90)
Of course, with a problem of this magnitude, DEC won't say if there is a problem with SMP (I can see their stock plunging, heads rolling, etc) since it is a problem of HUGE magnitude. (I'm not saying I agree, BTW).

I got a call from DEC today; the person said that he will forward this to the local office. However, he said there are two things to look for. The first: is the configuration on an HSC, and if so, how many disks/requestors are configured? The second was a suggestion: changing bufcache in your config from 10% to 25%. If you make the change, let me know how it goes.

Gary J. Rosenblum      UNIX Systems Manager      rosenblg@nyu.edu
New York University                              gary@nyu.edu
alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/04/90)
In article <1990Oct1.080535.17017@ircam.ircam.fr>, mf@ircam.ircam.fr (Michel Fingerhut) writes:
> To Alan Rollow (alan@shodha.enet.dec.com): you miss the point. One
> would assume that DEC would have checked Ultrix 4.0 on 58n0 (n>1)
> *before* shipping it out, and would have realized that such commands as
> "ls" take several *seconds* for small directories. This is IMMEDIATELY
> noticeable. One would also have assumed that if a problem had been
> found then, it would have appeared either in the release notes, or in
> mandatory patches, or in a special page added to the release (as
> sometimes happens).

Actually I do get the point. You mention two possible variations of the problem:

1. We didn't test the configuration.
2. We knew about the problem and shipped V4.0 anyway.

I propose a 3rd. The problem doesn't occur on all systems and didn't occur on the systems we tested. Now I don't know exactly how our engineering group does their testing, but I KNOW that the DECsystem 5810, 5820, 5830 and 5840 were all tested. Furthermore, I've heard from people I trust that 5840s used internally aren't having the problem.

So, please provide us with as much information as possible to help us solve the problem. The official reporting channel for these things is an SPR. As you're aware you can submit them through the CSC or mail one of the stupid zillion carbon things to the address listed on it (*).

The sorts of things we need to know:

o Characterization of your work load. All interactive users? Doing what? NFS server? How many clients? Diskless workstation server? How many clients? What sort of workload on the clients? Local or remote paging?

o Configuration. How much memory? Which Ethernet controller? Disk controllers? Disks? What version of ULTRIX installed? Is the Mandatory patch installed and has the kernel been rebuilt (Rev. 179)? If the disks are connected via an HSC what version of HSC code?

o Load information. Collect what you can from iostat, vmstat, netstat and cpustat. Or, if you can, get the sources for Monitor V1.3, now available on gatekeeper.dec.com in pub/DEC/monitor_v4src.tar.Z.

(*) My personal opinion is that we should allow submitting SPRs via e-mail, but I'm only a system manager in the backwaters of Colorado Springs. Who's going to listen to me?

--
Alan Rollow alan@nabeth.enet.dec.com
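As a rough illustration of the "load information" item in Alan's list, the sketch below gathers one snapshot from each of the four tools he names into a single text report that could accompany an SPR. It is a present-day Python sketch under stated assumptions: each tool is run with no arguments (appropriate flags and sampling intervals vary by system and are left to the reader), tools that are not installed are skipped with a note (cpustat is ULTRIX-specific), and the function names and report layout are invented for the example.

```python
import shutil
import subprocess

# Tools named in the post; cpustat is ULTRIX-specific and may be absent elsewhere.
TOOLS = ["iostat", "vmstat", "netstat", "cpustat"]


def snapshot(tool, timeout=30):
    """Run one tool with no arguments and return its output, or a note on why not."""
    path = shutil.which(tool)
    if path is None:
        return "(not installed on this system)"
    try:
        result = subprocess.run([path], capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "(timed out after %d seconds)" % timeout
    # Some tools print usage or errors to stderr only; keep whichever is present.
    return result.stdout or result.stderr


def build_report(tools=TOOLS, take=snapshot):
    """Join the per-tool snapshots into one report with a header per tool."""
    sections = []
    for tool in tools:
        sections.append("==== %s ====\n%s" % (tool, take(tool).rstrip()))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(build_report(), end="")
```

Taking one report while the machine feels slow and one while it behaves normally would give the two data points the post is asking for.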
mf@ircam.ircam.fr (Michel Fingerhut) (10/08/90)
alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:
> The problem doesn't occur on all systems and didn't occur on
> the systems we tested. Now I don't know exactly how our engineering
> group does their testing...

... so please don't say it did not occur. DEC acknowledges the problem occurs, and that one of the problems is a design flaw in the scheduler which causes lousy response time for small interactive jobs or commands (such as ls) but which makes the 5820 a great machine for batch. Too bad. Should have gotten an IBM.

> So, please provide us with as much information as possible to
> help us solve the problem.

Response to all the information I gave was "next release".

> The official reporting channel for these things is an SPR.

No, at least not this side of the ocean.

> (*) My personal opinion is that we should allow submitting SPRs
> via e-mail

Yeah, mine too, with automatic acknowledgment, and the possibility to consult an online database of bug reports.

> but I'm only a system manager

What about some responses from people at DEC who KNOW what's happening?

Michael Fingerhut