[comp.sys.encore] Umax 4.3 virtual memory problem

gordoni@chook.adelaide.edu.au (Gordon Irlam) (09/18/90)

	  Some Notes on Umax 4.3 Virtual Memory Performance
	  =================================================

				 aka

	  "How to sell overpriced memory expansion boards."


		  Gordon Irlam, Adelaide University.
		     (gordoni@cs.adelaide.edu.au)

			  1990 September 18


Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain
a serious bug in the virtual memory system that prevents it from being
able to page out pages of processes under certain commonly occurring
circumstances.  This degrades system performance or, equivalently,
increases the amount of physical memory needed to obtain a given level
of performance.  In more extreme cases it may cause severe performance
problems or even deadlock.

Umax 4.3 is not able to page out copy on write pages.  The meaning of
this and its ramifications are explained below.


1) What is copy on write memory?
--------------------------------

Umax 4.3 has a fairly sophisticated virtual memory subsystem,
although not as sophisticated as that of Mach or SunOS.  In such
systems virtual memory is used in a very lazy fashion.  Pages are
shared between processes whenever possible, and pages are only
duplicated when strictly necessary.  Such systems page unmodified
pages directly from the file system, and only modified data pages need
to be paged to and from a swap partition.

When a process forks under Umax all of the modifiable pages of the
parent process are marked copy on write.  The same set of pages are
marked copy on write in the child process.  Because code pages are
read only they can be shared without being marked copy on write.
Marking a page copy on write means setting its protection to read
only, and then if a write to that page causes a translation fault a
copy of the page is made, the protection on the page is set to
read-write, and the faulting instruction re-executed.  Copy on write
pages minimize the cost of forking.

If when a copy on write fault occurs the copy on write page is no
longer shared with any other processes, say because the child has
exited, the page will be set to read-write without needing to make
a copy of the page.  Note that this final giving away of a copy on
write page is not performed as soon as the page becomes owned by a
single process, but only when the last owner of the page writes to it.
If the last owner never writes to the page it will remain copy on
write despite the fact that it is not shared with anyone else.
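
The visible effect of copy on write can be sketched with a small test.
This is only an illustration under assumptions, not Umax code: it
assumes a POSIX-style fork(), waitpid() and _exit(), and the helper
name parent_value_after_child_write() is made up for this example.
The child writes to a page it inherited; the parent then checks that
its own view of the page is unchanged, which is what a private copy
made at write time guarantees.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAGE_SIZE 4096          /* Assumed page size, as in example.c. */

/*
 * Hypothetical helper: fork, let the child write to a page it inherited,
 * and return the byte the parent sees in that page afterwards.
 * Returns -1 if the fork or the wait fails.
 */
int parent_value_after_child_write(void)
{
    static char page[PAGE_SIZE];
    pid_t pid;
    int status;

    page[0] = 'p';              /* Touch the page so it exists before the fork. */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        page[0] = 'c';          /* The child's write lands in its own copy. */
        _exit(page[0] == 'c' ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* In a sketch it is enough to abort if the child did not exit cleanly. */
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    return page[0];             /* The parent's page was never written by the child. */
}
```

Note that in this sketch the parent only reads the page after the
fork, so under the behaviour described above its page would stay
marked copy on write even after the child has exited.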


2) An example of the problem.
-----------------------------

Consider the following small C program.

---- start example.c ----
#define PAGE_SIZE 4096
#define MEGABYTE 1000000

#define SIZE (10*MEGABYTE)

static char space[SIZE];		/* 10M of zero-filled memory. */

main() {
 int i;

 for (i = 0; i < SIZE; i += PAGE_SIZE)	/* Touch pages to get them in core. */
   space[i] = 'x';

 if (fork () != 0) {			/* Fork. */
   while (1) {}				/* Child exits, parent doesn't. */
 }
}
---- end example.c ----

This process gets 10 megabytes of data in core by touching all the pages,
otherwise they would be marked as zero fill, and would not have been
created yet.  It then forks.  In forking all of the modifiable pages
are marked copy on write.  Because copy on write pages can not be
swapped out (even if the child has exited, and the parent is the sole
owner of them), the net result is a process occupying 10 megabytes of
non-pageable memory.  Only if the process modifies the pages (reading
them is not sufficient) will they cease to be copy on write and
become eligible for paging.

Needless to say it doesn't require too many copies of this program to
be run before Umax starts thrashing severely, or even deadlocks.
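
The arithmetic behind "not too many copies" can be sketched as
follows.  The constants are the ones from example.c; the function
names are invented for this illustration, and any physical memory
size passed to copies_to_fill() is the caller's assumption.  The
result is an upper bound, since it ignores the kernel, program text
and every other process on the machine.

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define MEGABYTE  1000000
#define SIZE      (10 * MEGABYTE)

/* Number of pages touched by a loop that walks `size` bytes one page at a time. */
long pages_touched(long size, long page_size)
{
    assert(size >= 0 && page_size > 0);
    return (size + page_size - 1) / page_size;
}

/* Bytes of physical memory held by one copy of example.c after its fork. */
long bytes_locked_per_copy(void)
{
    return pages_touched(SIZE, PAGE_SIZE) * PAGE_SIZE;
}

/*
 * Hypothetical estimate: how many copies of example.c fit into
 * `physical_bytes` of memory before all of it is locked down.
 */
long copies_to_fill(long physical_bytes)
{
    return physical_bytes / bytes_locked_per_copy();
}
```

Each copy touches 2442 pages, so on a machine assumed to have 64
megabytes (64 * 1024 * 1024 bytes) of physical memory, six copies are
enough to leave almost nothing pageable.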

The same effect could have been observed in the child process if the
child hadn't exited, or if the while loop was replaced by other
computations, or system calls, except obviously exec.


3) Implications for real systems.
---------------------------------

Fortunately many processes,
    1) do not fork, or
    2) fork but have a reasonably small amount of data, or
    3) shortly after forking both child and parent,
           a) exit, or
           b) exec, or
           c) modify nearly all their data pages, or
    4) only access a few data pages immediately prior to
       forking, and then only read a few data pages at any time
       subsequent to forking.

Those cases where these constraints are not met cause the most
problems, and to a certain extent case 4 can also cause problems.  In
case 4 where a process only touches a few pages immediately prior to
forking, if the system was heavily loaded at the time prior to the
fork, most pages will have been swapped out, and so will not end up
being locked down by the fork - unless they are subsequently read in.
But if the system was lightly loaded at the time of the fork then case
4 will still cause a large number of pages to be locked down.

The program shown above was an extreme example of the problems that
non-pageable copy on write memory can cause, however all programs that
fork will cause problems to a certain extent.  This prevents Umax from
being able to run processes whose total virtual memory size
significantly exceeds the amount of physical memory available, even
though such processes may be idle most of the time.  Our experience is
that we can not use much more swap space than twice the physical
memory on our machines, even though many of our processes are idle for
substantial periods of time.

We had considerable difficulty when we attempted to use a Multimax as
a server for a large number of X terminals.  The machine had
sufficient compute power, virtual, and physical memory for the
clients, but nearly all of the physical memory filled up with
non-pageable copy on write pages, that weren't even being used.
Unfortunately the xterm binary was both long lived and caused a large
number of pages to be locked down for long periods of time.


4) Fixing the problem.
----------------------

Identifying the problem is fairly easy.  Sysparam will be showing the
system paging heavily, but when you do a ps you will find some pages
of processes remain in memory, even when they are idle or stopped.  In
more severe cases all of the system's memory may end up becoming
non-pageable, preventing you from even being able to login.

Unless you have enough money to afford some extra memory that is
effectively unused, there is little you can do about this problem other
than be aware of it and try and manage your job mix accordingly.

If you are desperate however you could try applying a fix similar to
the one we applied to one or two of the programs that caused us the
most trouble, as outlined below.  I would recommend avoiding this if
at all possible.

We reported this bug to Encore around the end of April, so hopefully
they are aware of the problem and are working on a solution.  We have
not yet received a reply from Encore.  But I believe that the problem
warrants making available a new version of /Umax.image to those sites
that need it once it has been solved.

I thought this problem was sufficiently serious to bring it to the
attention of others.  It's a pity Encore doesn't - it would be useful
if Encore posted to the net details of serious bugs when they are
first discovered, and made a more complete list of bugs available for
anonymous ftp.


5) Caveat.
----------

It is true that this message is critical of Umax.  But this doesn't
mean that I think Umax is a poor operating system.  On the contrary,
all things considered, I think Umax is quite good.  In particular I
believe Encore have been very successful in parallelizing the BSD
kernel.


6) A nasty little hack.
-----------------------

The following routine can be called to make a process's pages
pageable.  To use it you will need sources to the programs you wish to
fix.  It works by writing to all of a process's data pages so that
they become exclusively owned.  Obviously this has performance
ramifications since it increases the amount of swap space used, and
most likely the amount of swap traffic that will occur.  This routine
should be called from those branches of a process that have just
forked and are not about to exit or exec.  It is possible that one or
two pages on the top of the stack may not get modified by this
routine, and will remain non-pageable.

---- start touch_pages.c ----
#define PAGE_SIZE 4096

#define DATA_START 0x400000
#define STACK_LIMIT 0xffffff000

#define FLOOR(p) ((char *) (((int) p) & ~ (PAGE_SIZE - 1)))

int zero()
{
 return (0);            /* This function is used to fool optimizers. */
}

touch_pages()
{
char stack_start;
int nothing;
char *start, *limit, *p;

nothing = zero();

start = FLOOR(DATA_START);              /* Modify data and bss pages. */
limit = FLOOR((int) sbrk(0) + PAGE_SIZE - 1);
for (p = start; p < limit; p += PAGE_SIZE)
    *p = *p + nothing;

start = FLOOR(&stack_start);            /* Modify stack pages. */
limit = FLOOR(STACK_LIMIT);
for (p = start; p < limit; p += PAGE_SIZE) {
    *p = *p + nothing;
 }
}
---- end touch_pages.c ----
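
For experimenting with the same idea on other systems, the page
touching loop can be written against an explicit address range.  This
is a sketch under assumptions, not a replacement for the routine
above: touch_range() is a hypothetical name, the range is passed in by
the caller instead of being derived from DATA_START and sbrk(), and a
volatile pointer is used in place of the zero() trick to keep the
compiler from discarding the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096          /* Assumed page size, as in touch_pages.c. */

/*
 * Hypothetical variant of touch_pages(): write one byte in every page
 * that overlaps [start, limit) back to itself.  The contents do not
 * change, but each page sees a real store.  Returns the number of
 * bytes written, which is one per page touched.
 */
size_t touch_range(char *start, char *limit)
{
    uintptr_t addr = (uintptr_t) start;
    uintptr_t end = (uintptr_t) limit;
    size_t touched = 0;

    /* The rounding mask below only works for a power-of-two page size. */
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);

    while (addr < end) {
        volatile char *p = (volatile char *) addr;

        *p = *p;                /* Same value, but a real store to the page. */
        touched++;

        /* Advance to the first byte of the next page. */
        addr = (addr & ~(uintptr_t) (PAGE_SIZE - 1)) + PAGE_SIZE;
    }
    return touched;
}
```

As with touch_pages(), the place to call something like this is the
branch that has just forked and is going to keep running, for example
touch_range(buffer, buffer + length) on each large writable region the
process owns.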

jdarcy@encore.com (Jeff d'Arcy) (09/18/90)

gordoni@chook.adelaide.edu.au (Gordon Irlam) writes:
>	  "How to sell overpriced memory expansion boards."

I'm sure we could think of better ways to waste VM memory than what you
mention.  Maybe I'll submit a proposal.  :-)

>The program shown above was an extreme example of the problems that
>non-pageable copy on write memory can cause, however all programs that
>fork will cause problems to a certain extent.  This prevents Umax from
>being able to run processes whose total virtual memory size
>significantly exceeds the amount of physical memory available, even
>though such processes may be idle most of the time.

This is true ONLY for programs that exhibit the type of behaviour you
describe.  As you've already pointed out, such behaviour is relatively
rare and programs exhibiting it could well be considered misbehaved.
This does *not* mean that they should be able to use up physical memory,
but merely that the UMAX 4.3 kernel is not doing this all by itself.

>We reported this bug to Encore around the end of April, so hopefully
>they are aware of the problem and are working on a solution.  We have
>not yet received a reply from Encore.

. . .so you posted this.  We appreciate being made aware of the problem
(as if we weren't already), but I'm sure you can appreciate that we'd
prefer a different methodology from what you suggest.  As for your not
receiving a reply, all I can say is that I'm not in customer service so
I won't attempt to speak for them.

>it would be useful
>if Encore posted to the net details of serious bugs when they are
>first discovered, and made a more complete list of bugs available for
>anonymous ftp.

Yes, useful to our users, and even more useful to our competitors.  The
fact is that any system as complex as an SMP UNIX kernel is going to
have bugs, and some of those are going to be pretty scary.  Advertising
our bugs while our higher-market-share competitors don't reciprocate
gives them an unparalleled opportunity to slime us.  As I'm sure you're
aware, some of our competitors *already* make plenty of misleading claims
about the relative merits of our machines vs. theirs, and this would be a
godsend for their marketing/sales departments.

Obviously I can't comment on the availability of future releases, and I
honestly don't know the status of this *particular* problem.  What I can
say is that we have been aware of UMAX 4.3 VM problems for some time (we
use it in-house too) and have no reason except resources to avoid fixing
them.  That's not *any* sort of a commitment; remember that I speak only
for myself and not for Encore even in this group.  Also, my mention of
VM problems in UMAX 4.3 should not alarm anyone.  These things are a fact
of life, and I'll bet that at least one if not all of Dynix, IRIX and OSx
have worse problems lurking somewhere.
--

Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
      Nothing was ever achieved by accepting reality

phil@eecs.nwu.edu (William LeFebvre) (09/19/90)

In article <jdarcy.653657607@zelig>, jdarcy@encore.com (Jeff d'Arcy) writes:
|>>We reported this bug to Encore around the end of April, so hopefully
|>>they are aware of the problem and are working on a solution.  We have
|>>not yet received a reply from Encore.
|>
|>. . .so you posted this.  We appreciate being made aware of the problem
|>(as if we weren't already), but I'm sure you can appreciate that we'd
|>prefer a different methodology from what you suggest.

I disagree.  Ever since we got 4.0.0 we have been complaining about the
absolutely rotten VM performance.  All we ever got from Encore was "we
don't see the problem."  It is NICE....REALLY NICE....to know that we are
not alone, that this is not a fluke but a genuine wide-spread bug.  Without
this person's posting, I would still be in the dark.

|>Yes, useful to our users, and even more useful to our competitors.  The
|>fact is that any system as complex as an SMP UNIX kernel is going to
|>have bugs, and some of those are going to be pretty scary.  Advertising
|>our bugs while our higher-market-share competitors don't reciprocate
|>gives them an unparalleled opportunity to slime us.  As I'm sure you're
|>aware, some of our competitors *already* make plenty of misleading claims
|>about the relative merits of our machines vs. theirs, and this would be a
|>godsend for their marketing/sales departments.

So in the hopes of sustaining and increasing future sales, you sacrifice
your current customers?  Seems like a pretty bad tradeoff to me.

Sun distributes something called the "Customer Distributed Bugs List"
(it is VERY thick).  They even have a bulletin-board-style system set up
to enable customers to check the bugs database online.  In this way,
customers can find out about bugs but (presumably) the so-called
"competitors" access to this list is limited.  Does Encore have anything
equivalent?

Sun willingly posts information about serious bugs to appropriate mailing
lists and newsgroups.  They used to put fixes and patches out on uunet,
although that hasn't happened much recently because of a change in
support staff.  Has Encore ever done anything like that?

I'm only using Sun as an example because I am familiar with them....

|>Obviously I can't comment on the availability of future releases, and I
|>honestly don't know the status of this *particular* problem.  What I can
|>say is that we have been aware of UMAX 4.3 VM problems for some time (we
|>use it in-house too) and have no reason except resources to avoid fixing
|>them.

Then FIX them!  I'm pretty fed up with the problem at this point.
If this computer didn't use whiz-bang 20 layer interleaved super-duper
high-speed memory that costs more than my house, I'd just go buy more
memory.....

|>That's not *any* sort of a commitment; remember that I speak only
|>for myself and not for Encore even in this group.  Also, my mention of
|>VM problems in UMAX 4.3 should not alarm anyone.  These things are a fact
|>of life, and I'll bet that at least one if not all of Dynix, IRIX and OSx
|>have worse problems lurking somewhere.

It is also a fact of life that serious problems which go unfixed for
a long time tend to get customers very angry and tend to make them wish
that they were not your customers.  Not that I am personally at that
point...........yet!

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>
                                                        

alan@cunixf.cc.columbia.edu (Alan Crosswell) (09/19/90)

In article <jdarcy.653657607@zelig> jdarcy@encore.com (Jeff d'Arcy) writes:
>gordoni@chook.adelaide.edu.au (Gordon Irlam) writes:
>>	  "How to sell overpriced memory expansion boards."

We also "discovered" that the u area is not swappable in UMAX 4.3.  We
ran into this when our mostly-idle 4-processor 510 ran out of process
slots (at around 450 or so) since it had "only" 64M of memory.  There
were a lot of idle logged-in users tying up proc slots.  We fixed it
by accelerating our original plan to purchase more memory which we knew
we would need for performance reasons.

I can live with performance degradation when there's more virtual than
physical memory required, but stopping dead due to a physical memory
limit is silly.  After all, the u and proc tables were decoupled in
the original Unix design just so the large u area could be swapped
since it wasn't needed for process scheduling.  Again, understandable
that Encore ran into problems parallelizing a non-parallel kernel
design, but some things need to get fixed -- even if it means
scrapping the 4.3 kernel and dropping Mach in instead.  After all, not
much point in supporting two kernels that both provide a 4.3 system
call interface.

I'm not too upset since the future does offer better software for this
platform -- namely OSF/1 and/or Encore Mach.  Hopefully OSF/1 will
become a product soon....  Of course Encore Mach is now.  So, I'm not
writing off the Multimax -- it's still a neat machine and there is
a better VM implementation already running on it.

Alan Crosswell
Center for Computing Activities
Columbia University

jdarcy@encore.com (Jeff d'Arcy) (09/19/90)

alan@cunixf.cc.columbia.edu (Alan Crosswell) writes:
>some things need to get fixed -- even if it means
>scrapping the 4.3 kernel and dropping Mach in instead.  After all, not
>much point in supporting two kernels that both provide a 4.3 system
>call interface.

OK, Devil's Advocate time.  You say that there's not much point in
supporting two kernels that both provide a 4.3 system call interface,
and I agree.  Now. . .why not just go with Mach?  I'm sure I don't
need to explain the advantages of Mach over 4.3, although the one
you mention (better VM implementation) is probably very relevant to
this discussion.

>Hopefully OSF/1 will
>become a product soon....  Of course Encore Mach is now.

As far as I know - and my information may be incomplete since I'm in a
different group - Encore Mach is not a commercially supported product
but is rather considered an ongoing research project.
--

Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
      Nothing was ever achieved by accepting reality

loverso@westford.ccur.com (John Robert LoVerso) (09/19/90)

In a recent article, Alan Crosswell writes:
> Again, understandable
> that Encore ran into problems parallelizing a non-parallel kernel
> design, but some things need to get fixed -- even if it means
> scrapping the 4.3 kernel and dropping Mach in instead.

Note: The "4.3" here should only refer to the UMAX4.3 kernel, which, in
turn, should not be confused in any way with a parallelized version of
the 4.3BSD kernel.  UMAX4.3 only provides 4.3BSD-like kernel features
over a base UMAX"4.2" kernel (VM, process control, etc) - it is not
a "port" of the 4.3BSD  kernel.  [The kernel network code is a major
exception to this].  That base kernel is still a reflection of the
original (1985) design for UMAX.  This is not to say that design is
wrong (which would be wrong to say, because it mostly works!), but
rather to point out that some continuing problems - such as VM -
are because of the original design.

As for a future UMAX4.x with MACH parts transplanted in: I place my
chips on a sysV.4 UMAX to replace the current UMAXV and UMAX4.3...

-- 
John Robert LoVerso			Home: john@loverso.loem.ma.us
Concurrent Computer Corp		Work: loverso@westford.ccur.com
RTU Network Group - FDDI project	"No terminal servers, thank you".

jdarcy@encore.com (Jeff d'Arcy) (09/19/90)

Before I get started, I'd like to remind everyone that I'm posting on my own
time and do not IN ANY MANNER OR CAPACITY speak for Encore.  Just think of
me as another UMAX user who just happens to write the OS.  :-)

phil@eecs.nwu.edu (William LeFebvre) writes:
>I disagree.  Ever since we got 4.0.0 we have been complaining about the
>absolutely rotten VM performance.  All we ever got from Encore was "we
>don't see the problem."  It is NICE....REALLY NICE....to know that we are
>not alone, that this is not a fluke but a genuine wide-spread bug.  Without
>this person's posting, I would still be in the dark.

So why didn't YOU post?  Now that there's a bandwagon to jump onto you're
pretty quick to flame Encore, but I didn't see you doing anything to get
the ball rolling.  You may even remember the last time people started
bashing Encore in this group; I posted an article asking for suggestions
on how we could use this newsgroup to our customers' best advantage.  I
got two lukewarm responses and deafening silence from everyone else.  If
you're not part of the solution. . .

>So in the hopes of sustaining and increasing future sales, you sacrifice
>your current customers?  Seems like a pretty bad tradeoff to me.

We're not sacrificing anyone.  The chain is simple:

	sales -> development resources -> bug fixes

The conclusion is obvious; no sales, no bug fixes.  Maybe no Encore.  That's
not exactly to our customers' advantage.

>Sun distributes something called the "Customer Distributed Bugs List"
>(it is VERY thick).

I've seen it, and after using Suns I'm not surprised at its thickness.  :-)

>In this way,
>customers can find out about bugs but (presumably) the so-called
>"competitors" access to this list is limited.

Well, we're not really competitors to Sun, but I'm sure my counterparts
at HP-Apollo, MIPS, SGI, etc. have copies.  So much for limited access.

>Does Encore have anything
>equivalent?

Sun can afford to do this because they have market share.  We don't and
therefore can't.  In an ideal world of educated customers who didn't
get scared at the mere mention of a system crash and competitors who
would respond in kind instead of taking the opportunity to slime us in
their sales presentations, I'd be all for an open bug list.  However,
we don't live in such a world.

>|>Obviously I can't comment on the availability of future releases, and I
>|>honestly don't know the status of this *particular* problem.  What I can
>|>say is that we have been aware of UMAX 4.3 VM problems for some time (we
>|>use it in-house too) and have no reason except resources to avoid fixing
>|>them.
>
>Then FIX them!  I'm pretty fed up with the problem at this point.
>If this computer didn't use whiz-bang 20 layer interleaved super-duper
>high-speed memory that costs more than my house, I'd just go buy more
>memory.....

Just that simple, eh?  Fix them.  Why didn't we think of that?  Do you
really think we're not doing our best to serve our customers, or that
we wouldn't like to have the highest-performance, most robust SMP UNIX
box in the world?  I'm sure that this problem's visibility has caused
its priority to rise to the top of the list, but since it's a complex
problem affecting a large and critical portion of the kernel, I'm sure
you can appreciate that our turnaround will not be instantaneous.  We
do the best we can with the resources we have.  If you really want to
help, how about buying a few more Multimaxes so we'll have the revenue
to justify hiring more engineers?  :-)

>It is also a fact of life that serious problems which go unfixed for
>a long time tend to get customers very angry and tend to make them wish
>that they were not your customers.  Not that I am personally at that
>point...........yet!

It is also a fact of life that many people like to complain and make
threats, especially when they feel their target is unable to defend
itself, but they don't do much to help the situation.  Gordon Irlam's
post was well-researched and very reasonable, and we at Encore most
sincerely appreciate his efforts.  I just wish we had more customers
like him.

Again, for those who missed it at the top: I DO NOT speak for Encore
AT ALL.  I'm posting on my own time as a private individual who uses
UMAX and just happens (as if by coincidence) to work at Encore.
--

Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
      Nothing was ever achieved by accepting reality

boykin@encore.com (Joseph Boykin) (09/19/90)

> I'm not too upset since the future does offer better software for this
> platform -- namely OSF/1 and/or Encore Mach.  Hopefully OSF/1 will
> become a product soon....  Of course Encore Mach is now.  So, I'm not
> writing off the Multimax -- it's still a neat machine and there is
> a better VM implementation already running on it.

A few points.  First, Encore distributes Mach in the "Advanced
Technology Research Product" category.  That means limited support and
limited availability of layered products.  The reason for this is
simple:  Mach is still under development, it changes regularly (changes
from both CMU and Encore) and, while reasonably stable, isn't likely
to stay up for months at a time under load.  On the other hand, 0.6
(about to go into Beta test) is "encore.com" (our gateway to the
world), runs 50 users regularly (3 XPC's 64MB), serves news to all
in-house users and is our internal and external mail gateway machine.
Average uptime under constant and *very* heavy load is about 5 days.

Second, unfortunately, the folks at CMU thought that statically
specifying how large particular memory resources (such as u areas) can
be and panic'ing when you run out was an acceptable way to go.  Maybe
it is at a University where you're running a research OS on your
private workstation with 16 or 32MB of memory and one user.  On a
Multimax, it isn't.  So, the particular point you made isn't
(unfortunately) true.  We've been working on a number of those issues,
but let's face it, Mach isn't a "commercial" operating system and we
don't have the sales to warrant the support staff necessary to do all
the "little things".  Interestingly enough, we recently *did* "find"
some additional money to do a lot more of these things.  We're in
the process of putting a lot of manpower into Mach that really hasn't
been available before.

A lot of these problems are being addressed for OSF/1.  However,
as the sales and marketing folks keep telling me "no one is asking
for OSF/1".  If you want it, PLEASE PLEASE PLEASE send your
cards and letters in to Encore and ask for it!  If they don't see
that users want it, they won't turn it into a fully supported product.

----

Joseph Boykin
Manager, Mach OS Development
Encore Computer Corp
Treasurer, IEEE Computer Society

Internet: boykin@encore.com
Phone: 508-460-0500 x2720

phil@eecs.nwu.edu (William LeFebvre) (09/19/90)

In article <jdarcy.653747915@zelig>, jdarcy@encore.com (Jeff d'Arcy) writes:
|>Before I get started, I'd like to remind everyone that I'm posting on my own
|>time and do not IN ANY MANNER OR CAPACITY speak for Encore.  Just think of
|>me as another UMAX user who just happens to write the OS.  :-)

Understood.  Those of you who want to hear me say something positive about
Encore, be sure to read the end of this message.

|>phil@eecs.nwu.edu (William LeFebvre) writes:
|>>I disagree.  Ever since we got 4.0.0 we have been complaining about the
|>>absolutely rotten VM performance.
|>So why didn't YOU post?

Because I assumed that complaining directly to Encore would be more
appropriate and more effective than posting to the world.  Usually
manufacturers like to get "first shot" at fixing a problem.  People
at Sun have chided me in the past for posting a problem to the net
BEFORE reporting it to Sun.  I guess that Gordon Irlam was the first
to get fed up enough to take the time to research the problem and
report to the net about it.

|>Now that there's a bandwagon to jump onto you're
|>pretty quick to flame Encore, but I didn't see you doing anything to get
|>the ball rolling.

I have done some things in the past to attempt to "get the ball rolling"
on this particular bug and I heard nothing but a deafening silence from
Massachusetts.

|>You may even remember the last time people started
|>bashing Encore in this group; I posted an article asking for suggestions
|>on how we could use this newsgroup to our customers' best advantage.

I probably should have said something, yes, but I was too busy at the time.
I really shouldn't be taking the time to do this, either.  The one idea 
that was suggested by Mr. Irlam at the end of his post was shot down
by you as being too advantageous to your competitors.  And I didn't see
that posting asking for suggestions as a solicitation for stating specific
problems.

|>Sun can afford to do this because they have market share.  We don't and
|>therefore can't.

Here is my suggestion.  Find SOME way to distribute to your customers a
list of currently known bugs (big and small).  If the bug has been fixed
in a recent release, update, or patch, then distribute that information
as well.  If you suddenly fixed the VM problem, how would I find out?
And how would I know where to go to get the patches?

We (the group participants) have suggested two separate ways to distribute
this information (on-line and via hardcopy a la Sun).  You didn't like
either one of them.  Someone (perhaps someone in Encore?) needs to find
a way to do this that is to everyone's satisfaction.  THAT is how you can
help us!

|>In an ideal world of educated customers who didn't
|>get scared at the mere mention of a system crash and competitors who
|>would respond in kind instead of taking the opportunity to slime us in
|>their sales presentations, I'd be all for an open bug list.  However,
|>we don't live in such a world.

I agree with that sentiment.  I personally don't get scared at the thought
of a system crash:  I've certainly seen enough of them (from Sun, DEC, and
IBM as well as Encore)!  EVERY operating system has bugs.  Every single one.
Heck, even TeX has bugs!  But I don't know of any way to convince the blue
suits (vernacular for upper level management) of this simple fact.

|>>Then FIX them!  I'm pretty fed up with the problem at this point.
|>>If this computer didn't use whiz-bang 20 layer interleaved super-duper
|>>high-speed memory that costs more than my house, I'd just go buy more
|>>memory.....
|>
|>Just that simple, eh?  Fix them.  Why didn't we think of that?  Do you
|>really think we're not doing our best to serve our customers,

Well, statements like "we can't tell our customers what bugs are in
the software that they use because our competitors might use it against
us" gives me a small reason to doubt.

|>or that
|>we wouldn't like to have the highest-performance, most robust SMP UNIX
|>box in the world?

That goes without saying.

|>I'm sure that this problem's visibility has caused
|>its priority to rise to the top of the list, but since it's a complex
|>problem affecting a large and critical portion of the kernel, I'm sure
|>you can appreciate that our turnaround will not be instantaneous.

I hope it does get to the top of the list.  But I would also like to point
out that you have had years to work on this problem.  Not just weeks or
months, but YEARS.

|>We
|>do the best we can with the resources we have.  If you really want to
|>help, how about buying a few more Multimaxes so we'll have the revenue
|>to justify hiring more engineers?  :-)

I would, but they're too expensive!  :-)

Now for the promised "positive" statement....

Encore's customer service guys are really great!  When I first put
our encore into production, I had some very very serious problems.
These problems took months and some rather drastic action on the part
of Encore to fix.  But through the entire time, the customer service
people that I talked to (the folks in Massachusetts) were truly
wonderful, helpful, sympathetic, and understanding.  I have had some
very positive experiences with Encore.

		William LeFebvre

rackow@ANTARES.MCS.ANL.GOV (09/19/90)

Jeff,

  I do not think that anyone that has posted things
in this forum is actually directing the comments
at you, but at Encore the corporation.  You just
happened to be available at the time.  There is
also the thought that posting here will get someone
to take notice and fix things, or that IF someone
from Encore is reading this list, maybe they can 
push some inside track we can't get to.

I also do not want to cause you to stop reading AND
replying to problems given here.  Having people from 
encore respond is a good thing, be it official or
not. 

Note that for most messages here, I read 
"we" or "I" to be the customers of Multimaxen.
"encore" is to be taken as the corporation supporting
us, not individuals in the company.  Granted some of
those individuals are in position to guide the path
of encore the corporation and should take special note
of how the customers feel. 

As to Phil jumping on the band wagon, I remember
seeing him complaining about things during the 
beta test of 4.3 and before.  Phil is at a different
location so I can't tell how similar this is, but it
is also the case that we usually need an "easily
repeatable" problem to report. I have been tracking a
problem here for some time now that happens maybe once 
every week or three.  Unless I know what is happening, it
does no good to report a problem.  The load on my
system is so varied between users and apps that it is
really hard to determine what caused the system to
hang.  After reading the post about more specifics on
the VM, I have another lead. Phil and I are NOT
jumping on the bandwagon, we are just stating that
"we" have had these problems for a LONG time
and it is getting old.  The bottom line is that
the VM problem among others has been a sore spot
in Umax since day 1 and other than fixing the
things that are completely horrible, encore has
shown little improvement to the system.  I do not 
think it appropriate to POST any time there is a 
problem, but if POSTING is what it takes to get the
attention of encore, then so be it.  This list should
then be renamed "info-encore-angry-customers". ;-)
I feel that posting should be a last resort to the 
system.

I know that I have reported a few things in the past
as problems, and my feelings are that the bugs are
falling on deaf ears.  A short test is done to say
that "yes there is a problem there" then nothing.

As to your competitors, I am sure that they are all
well armed with Encore's VM problem as a reason not
to buy Encore since that problem has been around for
way too long.  They also have a list of satisfied
customers that can be called, etc. etc.  Granted
the book that Sun puts out is BIG, but if Encore
put theirs into the same format, it would be
comparable.  If you don't think that is the case,
you should take another look at your system.

 

rickert@mp.cs.niu.edu (Neil Rickert) (09/20/90)

In article <jdarcy.653747915@zelig> jdarcy@encore.com (Jeff d'Arcy) writes:

>Sun can afford to do this because they have market share.  We don't and
>therefore can't.  In an ideal world of educated customers who didn't
>get scared at the mere mention of a system crash and competitors who
>would respond in kind instead of taking the opportunity to slime us in
>their sales presentations, I'd be all for an open bug list.  However,
>we don't live in such a world.

 Perhaps I should point out that this can work both ways.  When we were
considering a purchase one of Encore's competitors took the opportunity to
'slime' them.  We rapidly dropped that competitor from our list of vendors.

     And we were one of the early purchasers who thereby helped Encore get
the ball rolling.

-- 
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Science               <rickert@cs.niu.edu>
  Northern Illinois Univ.
  DeKalb, IL 60115.                                  +1-815-753-6940