[comp.databases] Single Server Bottlenecks

cs_bob@gsbacd.uchicago.edu (05/11/89)

 
In article <4185@sybase.sybase.com>, tim@phobos.sybase.com (Tim Wood) writes...
 
>In article <1588@bilpin.UUCP> mcvax!ukc!icdoc!bilpin!nick (Nick Price) writes:

>>particularly where bottle-necks are introduced by a single server
>>architecture. 
> 
>Can you give examples of these bottlenecks?  Is your point theoretical
>or from experience with a particular system?  If theoretical, could there
>be a server design that exploits SMP architecture?
> 
One example, from experience with Ingres Release 6.1, running under VMS,
which applies to SINGLE servers:

Imagine a small interactive, multi-user system where a typical mix of users
includes 5 interactive CPU bound non-DBMS jobs and 10 Ingres users. 
Under Ingres 5.0, each of the ten Ingres users had their own backend which
competed with other processes for CPU. Thus, in the extreme case where all
ten Ingres users become CPU bound, roughly 2/3 of the processor will be
available to the backends (that is, 10 out of 15 CPU bound jobs will be Ingres
backends). Under version 6.1, only 1/6 of the processor will be available,
since there is only one backend.
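
A back-of-the-envelope sketch of that arithmetic (Python, assuming an
idealized round-robin scheduler that gives every CPU-bound process an
equal slice; the figures are illustrative, not measured):

    # Worst case: all Ingres users and all non-DBMS jobs CPU bound.
    non_dbms_jobs = 5
    ingres_users = 10

    # Ingres 5.0: one backend process per user.
    share_50 = ingres_users / (non_dbms_jobs + ingres_users)
    print(f"5.0 backends' share: {share_50:.2f}")   # 0.67, i.e. 2/3

    # Ingres 6.1: a single server process for all users.
    share_61 = 1 / (non_dbms_jobs + 1)
    print(f"6.1 server's share:  {share_61:.2f}")   # 0.17, i.e. 1/6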

It is not feasible to raise the Ingres server to a higher priority, since
large reports/sorts do consume large amounts of CPU and can starve interactive
users. The only practical solution in this case is to start several backends,
but there are a couple of problems with this approach:

a) it cannot be done dynamically. That is, one cannot improve a bad situation
post hoc by starting new servers, because the current DBMS jobs running under
one server cannot be off-loaded to a new one. Even worse, in the true single
server environment, RTI provides FAST COMMIT and GROUP COMMIT options which
can only be disabled by taking down the server. If a server is running
with these features, designed to dramatically improve OLTP in particular,
no new servers can be started until it is taken down.

b) a corollary to this is that any multi-server configuration loses the
advantages of FAST COMMIT (including VAX clusters with one server per node).

An obvious solution to this would be to run the Ingres server at priority 5,
but have it monitor its own usage vis a vis other processes in the system
and periodically give up the processor if it is starving other processes.
While obvious, this solution is not exactly simple, and at the present time
the Ingres 6.1 server can very definitely become a system bottleneck.
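
For illustration only, a minimal sketch of such a self-throttling
loop; the target share, check interval, and work function are all
invented, and this is not how the Ingres server actually behaves:

    import time

    TARGET_SHARE = 0.5      # assumed fair share of the processor
    CHECK_EVERY = 1.0       # seconds between self-checks

    def do_some_work():
        # stand-in for one unit of server work (parsing, sorting, ...)
        sum(i * i for i in range(10000))

    last_cpu = time.process_time()
    last_wall = time.monotonic()
    while True:
        do_some_work()
        wall = time.monotonic() - last_wall
        if wall >= CHECK_EVERY:
            share = (time.process_time() - last_cpu) / wall
            if share > TARGET_SHARE:
                # starving others: voluntarily give up the processor
                time.sleep(wall * (share - TARGET_SHARE))
            last_cpu = time.process_time()
            last_wall = time.monotonic()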

Note also that in the example cited above, one would have to increase overall
CPU throughput dramatically to achieve the equivalent worst case capacity
provided under version 5.0. While the worst case will probably never be
achieved, it has become fairly common for us to see an Ingres server running
on a VAX 8650 with only 4 or 5 Ingres users to become CPU bound while only
getting 15-20% of the available CPU. Moreover, this happens when the Ingres
users are competing with only 4 or 5 other processes for the processor.

jas@ernie.Berkeley.EDU (Jim Shankland) (05/12/89)

In article <3176@tank.uchicago.edu> cs_bob@gsbacd.uchicago.edu writes
[about bottlenecks in single-server DBMS architectures, describing
INGRES 6.1 running under VAX/VMS on a "small interactive" system.
As a single process, the INGRES server gets inadequate CPU, since
it is competing for time-slices on an even footing with various
non-DBMS users, even though it is serving multiple users, and thus
should get a bigger slice of CPU.  Bumping the priority of the INGRES
server doesn't work, because then the non-DBMS users starve when the
server becomes CPU-bound.]

That sounds like a deficiency in the OS, rather than with the idea
of a single-server architecture (although I suppose one could argue
that a single-server architecture is inappropriate for such a badly
crippled OS).

I barely know VAX/VMS.  Are you really saying it provides no way to
grant a process a multiple of the resources (CPU, memory, etc.)
available to other processes without potentially starving the other
processes when the high-priority one goes completely CPU-bound?
Even UNIX, which has been roundly criticized at times for being
deficient in this area, can manage that with no problem:
if I "nice" a process to its highest possible (numerically lowest)
priority, and that process goes into a tight loop, other processes
will continue to get *some* CPU.

For my money, where the single-server architecture really starts
to hurt is in a multi-CPU environment.  Then you just have to
go to multiple servers to get reasonable CPU usage.  Even with multiple
servers, it may be difficult to exploit intra-query parallelism.
(Not impossible; in fact, a recent posting claimed this will soon
be in INGRES.)

Jim Shankland
jas@ernie.berkeley.edu

"Blame it on the lies that killed us, blame it on the truth that ran us down"

jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/13/89)

cs_bob writes:

>it has become fairly common for us to see an Ingres server running
>on a VAX 8650 with only 4 or 5 Ingres users to become CPU bound while only
>getting 15-20% of the available CPU. Moreover, this happens when the Ingres
>users are competing with only 4 or 5 other processes for the processor.

Can you give us some DATA?  How common?  Under what conditions?  What
are the users doing?  What performance degradation is measured for the
individual user?  For system throughput?

>Imagine a small interactive, multi-user system where a typical mix of users
>includes 5 interactive CPU bound non-DBMS jobs and 10 Ingres users. 
>Under Ingres 5.0, each of the ten Ingres users had their own backend which
>competed with other processes for CPU. Thus, in the extreme case where all
>ten Ingres users become CPU bound, roughly 2/3 of the processor will be
>available to the backends (that is, 10 out of 15 CPU bound jobs will be Ingres
>backends). Under version 6.1, only 1/6 of the processor will be available,
>since there is only one backend.

Not that simple.  Each user has his own front end, too.  And processes
don't get time slices in proportion to their relative number on the
processor.  It has a lot to do with the scheduler and the other
processes' behavior.  It also depends on memory management and i/o.
For instance,
it's common to have significant idle time (unused processor time) even
when system load is high (many jobs in the run queue at any given
instant, or many processes in COMputable state for you VMS users).
And database applications are biased toward interactive workloads,
where each user cycles think time ==> input ==> wait for system
output ==> look at output (think time again).  This means that 10
database users may be added before they "compete" for CPU in any real
sense.  And this is just scratching the surface: scheduling and
prioritization for mixed workloads is a hard problem in realtime
allocation of resources in more or less optimal ways.  For instance,
it's been shown that fairness must be traded off against optimality:
class schedulers versus round-robin schedulers are a case in point.
Another case in point is your observation:

>It is not feasible to raise the Ingres server to a higher priority, since
>large reports/sorts do consume large amounts of CPU and can starve interactive
>users.

Clearly, you can be more optimal or you can be more fair.  There are
also tradeoffs that meet some needs better than others.  But this is
not a problem specific to INGRES; all allocations of finite resources
suffer from this problem.  Consider for example how VMS sets
priorities for SWAPPER, OPCOM, JOB_CONTROL, or the simpler UNIX
solution of just requiring certain system code and data structures
always to reside in physical memory -- clearly this is neither a fair
nor optimal use of memory resources, it just happens that it seldom
makes a critical difference in overall fairness or performance.

>The only practical solution in this case is to start several backends,

No, there are several other solutions:

	If you want to support multiple fully runnable (COMputable)
	jobs without interference, you need a multiprocessor.  Buy one.
	Configure your servers as you find optimal for best throughput
	and fair for INGRES versus non-INGRES applications.

	If you want to support multiple fully runnable jobs but accept
	some interference, decide how much and how often, buy the
	minimum sized processor (and balanced config) to support this,
	and limit system load by classic mechanisms such as
	restricting access, shifting usage to off-peak hours, etc.

	If you want to support different applications but they need
	not share a common system image, offload the compute intensive
	ones to cheaper systems (dedicated systems are always cheaper
	than shared and general purpose ones)

	You could implement (or buy) a class scheduler for VMS, which
	guarantees the INGRES server a certain percentage of the
	processor and more if available (a sketch of the idea follows
	this list).  This prevents the "high priority" problem of the
	round robin scheduler: to wit, either INGRES or other highly
	computable processes starving the others indefinitely.

	There are others, these are just four examples.
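
For the curious, here is a minimal sketch (Python, with invented
class names and percentages) of the proportional-share idea behind
such a class scheduler; it illustrates the concept, not any actual
VMS product:

    import heapq

    # Minimal proportional-share ("stride"-style) scheduler sketch.
    # Each class advances by its stride (BIG / weight) every time it
    # runs, and the class with the smallest pass value runs next.
    classes = {"ingres": 60, "other": 40}   # guaranteed CPU percentages

    BIG = 10000
    queue = [(BIG // w, name, BIG // w) for name, w in classes.items()]
    heapq.heapify(queue)

    quanta = {name: 0 for name in classes}
    for _ in range(1000):                   # schedule 1000 quanta
        passval, name, stride = heapq.heappop(queue)
        quanta[name] += 1                   # this class runs one quantum
        heapq.heappush(queue, (passval + stride, name, stride))

    print(quanta)   # roughly {'ingres': 601, 'other': 399}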

>but there are a couple of problems with this approach (multiple servers):

Yes, for one you're just giving the scheduler more mouths to feed and
then expecting better or fairer treatment because more of the mouths
are those of your people.

>a) it cannot be done dynamically. That is, one cannot improve a bad situation
>post hoc by starting new servers, because the current DBMS jobs running under
>one server cannot be off-loaded to a new one.

As pointed out above, if all you have is a single processor, nothing
gets off-loaded anyway, you just increase the granularity of spreading
things thinner.  If you have multiple processors, you can use them for
multiple servers and let the system software do the offloading in a
transparent and flexible way.

Thus one doesn't improve the situation just by finding more mouths to
feed and dividing them up in ways that favor one group over another.
You need to ship some of those mouths over to where there really are
more resources, and if that's possible, why not allocate resources to
mouths in the first place?

>Even worse, in the true single
>server environment, RTI provides FAST COMMIT and GROUP COMMIT options which
>can only be disabled by taking down the server. If a server is running
>with these features, designed to dramatically improve OLTP in particular,
>no new servers can be started until it is taken down.

This is exactly the tradeoff of SYSGEN options under VMS: some are
dynamic and can be changed on running systems, some require a reboot.
The cost of making them all dynamic is higher development cost and
lower execution performance.  Clearly some proper subset should be
dynamic; we can argue about which options should be members of that set.

But again it comes back to an invalid assumption that more mouths is a
way to get more resources or a good way to allocate existing
resources.  Look, consider the generic case of a single fully
computable non-INGRES process competing with a single fully computable
INGRES server, whether from one INGRES user's requests or a hundred.
They both sink to base priority under VMS priority promotion.  They
then compete.  Your point is that the one non-INGRES user gets half
the pie, and the remaining possibly one hundred divide up the other
half among them.  This is absolutely true.  This remains true as long
as neither process page faults, reads or writes to disk or other
devices, or sleeps (LEF state) pending user input.  This is simply
uncharacteristic of database queries: they constantly read from disk
and write to networks.  Every time they do they get priority promotion
over the other process.

>b) a corollary to this is that any multi-server configuration loses the
>advantages of FAST COMMIT (including VAX clusters with one server per node).

No, this is unrelated to your point.  How many mouths to feed has
nothing to do with the tradeoffs of removing computation bottlenecks
versus removing disk i/o bottlenecks.  In point of fact the VMS
scheduler only allocates processor time, not working sets or i/o queue
ordering.  You can play with priorities all you want and get no
advantage if you were i/o bound; in that case you need to work harder
or smarter on i/o.  Harder might be faster disks, such as the CDC
Wren.  Smarter might be fast commit.  If, however, you were compute
bound, priorities might be the answer, or other system management
tools and practices such as the ones I list above, including multiple
servers if you have multiple processors.

>An obvious solution to this would be to run the Ingres server at priority 5,
>but have it monitor its own usage vis a vis other processes in the system
>and periodically give up the processor if it is starving other processes.
>While obvious, this solution is not exactly simple, and at the present time
>the Ingres 6.1 server can very definitely become a system bottleneck.

You're about to re-invent the class scheduler without teeth, also
known as TSX-11.  It's dealt with above, I don't think anything need
be added here.  Instead, consider your use of terms: a "system
bottleneck" is a system resource that critically limits some
application or workload.  Therefore the server isn't a system
bottleneck; it's something that bottlenecks affect.  In this case
the system bottleneck is that schedulers don't know that some
applications serve more users than others, and thus allocate
processor time in equally sized quanta.

In other words, a valid point related to the one you were making is
that servers pool the identities and thus quotas of individual
processes.  This is true, but again hardly unique to INGRES.  For
instance, memory managers, device drivers, and network processes all
serve multiple users without being able to charge back the costs of
each operation to the correct user served.  Multithreaded i/o allows
originating processes to queue multiple requests, which prevents bad
citizens from slowing down other processes just by filling up queue
slots, but other resources can still be unfairly and/or suboptimally
allocated.

For instance, consider VMS memory management: per-process quotas for
working sets were designed to prevent bad citizens from hurting anyone
but themselves.  This succeeds to the extent that other processes are
allocated the processor time while the bad citizen is waiting for its
pages to be faulted in.  But it fails when the paging causes disk i/o
whose seeks now compete with other processes' seeks.  Clearly the fair
thing to do is allocate equal seeks per user, but we can't do that
because we don't know how many users each seek represents.  Thus, just
as in the case you cite, the needs of the many may be forced to
compete on an equal basis with the needs of the few or the one.

This is a fact of life.  The cost of getting VMS to become more fair
about memory management, including its disk i/o ramifications, is more
complexity in the operating system, higher resulting cost to the user,
and poorer performance for the usual and expected case.  Sure, we
could add internal accounting to support per-user quotas on seeks, but
it isn't worth it, as far as we can tell at this time.  The cost of
getting the scheduler to give some processes quotas proportional to the
number of users they serve may or may not be worth the increased
fairness, but this is a question to be settled by measurement.  Do you
have any data to support your contention that it's currently highly
suboptimal or unfair?   How suboptimal?  How unfair?  How often does
this come up?

-- Jon
-- 

elgie@canisius.UUCP (Bill Elgie) (05/14/89)

In article <518@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
> [ re argument that a single database backend (INGRES, in this case) suffers
>   because it is treated as a single process by the operating system ]
> Not that simple.  Each user has his own front end, too. 

  Most of the work (INGRES, again) is done by the back end.  INGRES front
  ends typically do less than 20% of the work (as measured in cpu time).
  User applications may consume more; we have one extreme that results in
  close to a 50/50 split.  But it still does very little i/o.
 
> And database applications are biased toward interactive workloads,
> where each user cycles think time==>input=>wait for the system
> output==>look at output (think time again).  This means that 10
> database users may be added before they "compete" for CPU in any real
> sense.

  "Interactive workloads" do not equate to low cpu utilization.  It's a
  function of the application.

  
  My own sense is that the direction that database developers are pushing
  us (consciously or implicitly) is towards dedicated back end servers and,
  in the not-too-distant future, workstation-based front end processors,
  concentrating on managing the user interface.  While the latter are still
  expensive to provide for large-volume use, a mix of the former with small
  multi-user systems, windowing terminals, and some workstations for those
  who do need the computing power may be a cheaper and more capable
  alternative to large-scale multi-user systems (including multi-processor
  boxes).

  I know that I am not saying anything new.  But I do believe that in
  the not-too-distant future, this will make the question of single-threaded
  versus multi-threaded back ends somewhat moot.
  
  greg pavlov (under borrowed account), fstrf, amherst, ny

cs_bob@gsbacd.uchicago.edu (05/15/89)

>cs_bob writes:
> 
>>it has become fairly common for us to see an Ingres server running
>>on a VAX 8650 with only 4 or 5 Ingres users to become CPU bound while only
>>getting 15-20% of the available CPU. Moreover, this happens when the Ingres
>>users are competing with only 4 or 5 other processes for the processor.
> 
>Can you give us some DATA?  How common?  Under what conditions?  What
>are the users doing?  What performance degradation is measured for the
>individual user?  For system throughput?
> 

I think that you're missing the point. My posting was an attempt to point
out that single server bottlenecks can and do exist. I'm not attacking Ingres,
but the fact remains that under VMS, with its retarded scheduling strategy,
the Ingres 6.1 server can become a true bottleneck.

If any 5.0 users want to determine whether or not this could happen to
them, I suggest the following. 

a) when you suspect the database activity is heavy, run MONITOR PROCESS/TOPCPU
and look for the ING_BACK_* processes. You should see the busiest backends,
and if you sum the percentage of CPU they're getting, you should get a
rough estimate of the total CPU being provided the backends.

b) run MONITOR STATE/AVE for a typical working day to get an idea
of the typical CPU load (i.e. the average number of processes in the COM
state throughout the day.)

If we call the result of a) CPU_USED and remember that it is a fraction
strictly between 0 and 1, and if we call the result of b) CPU_LOAD, then
IF CPU_USED > 1 / CPU_LOAD YOU WILL PROBABLY EXPERIENCE A BOTTLENECK 
RUNNING A SINGLE INGRES 6.1 SERVER. 
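
In code, the test amounts to the following (a sketch with invented
numbers; the real inputs come from the MONITOR steps above):

    # Sketch of the rule of thumb above, with invented numbers.
    # CPU_USED: fraction of the CPU the ING_BACK_* processes get,
    # summed from MONITOR PROCESS/TOPCPU; CPU_LOAD: average number of
    # COM-state processes, from MONITOR STATE/AVE.
    cpu_used = 0.20    # backends getting 20% of the CPU
    cpu_load = 6.0     # average of 6 COMputable processes

    if cpu_used > 1.0 / cpu_load:   # equivalently, cpu_used * cpu_load > 1
        print("likely bottleneck with a single Ingres 6.1 server")
    else:
        print("a single server is probably adequate")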

Most of Mr. Krueger's posting is a smokescreen. He asks for hard data, then
himself proceeds with a thoroughly general, theoretical treatment of
scheduling problems. Most of what he says applies equally to all DBMS
systems, Ingres 5.0 as well as 6.1. My point is that in this respect,
Ingres 6.1 can provide inferior performance to Ingres 5.0.

My favorite of Mr. Krueger's suggestions:

>	If you want to support multiple fully runnable (COMputable)
>	jobs without interference, you need a multiprocessor.  Buy one.
>	Configure your servers as you find optimal for best throughput
>	and fair for INGRES versus non-INGRES applications.
> 
Got that? If you want your Ingres 6.1 performance to equal your Ingres 5.0
performance, Jonathan Krueger recommends that you buy yourself a
multiprocessor.

R.Kohout

#include <standard_disclaimer.h>

jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/17/89)

In article <3241@tank.uchicago.edu>, cs_bob@gsbacd (R. Kohout?) writes:
>My favorite of Mr. Krueger's suggestions:
>
>>	If you want to support multiple fully runnable (COMputable)
>>	jobs without interference, you need a multiprocessor.  Buy one.
>>	Configure your servers as you find optimal for best throughput
>>	and fair for INGRES versus non-INGRES applications.
>> 
>Got that? If you want your Ingres 6.1 performance to equal your Ingres 5.0
>performance, Jonathan Krueger recommends that you buy yourself a
>multiprocessor.

I think your criticisms would have something worthwhile to contribute
if you read what I wrote.  Material quoted above doesn't lead me to
believe that you have.

At no time, in the part you cite or at any other point, have I said
anything about INGRES 6.x performance, relative to 5.x performance,
some absolute standard, or other vendor product.  In particular,
having no data on relative performance, I have no opinions on it, have
not expressed any, and in fact, am not overly concerned with the
topic.  I take it that you are, and you're unhappy with 6.x
performance.  That's fine.  INGRES 6.x performance issues are
appropriate to this group.  But they're distinct from the
multiprocessor issues.  Let me be boringly clear: at no time did I say
that you need, want, or should get a multiprocessor to run 6.x, nor do
I believe this, for performance reasons or anything else.

>I think that you're missing the point. My posting was an attempt to point
>out that single server bottlenecks can and do exist. I'm not attacking Ingres,
>but the fact remains that under VMS, with its retarded scheduling strategy,
>the Ingres 6.1 server can become a true bottleneck.

Of course they can, of course it can.  Again, if you want to be
helpful, now that you've explored the nature of the bottleneck, could
you give us estimates of its extent?  How often, how bad, how
pathological?

>If any 5.0 users want to determine whether or not this could happen to
>them, I suggest the following. 
>
>a) when you suspect the database activity is heavy, run MONITOR PROCESS/TOPCPU
>and look for the ING_BACK_* processes. You should see the busiest backends,
>and if you sum the percentage of CPU they're getting, you should get a
>rough estimate of the total CPU being provided the backends.
>
>b) run MONITOR STATE/AVE for a typical working day to get an idea
>of the typical CPU load (i.e. the average number of processes in the COM
>state throughout the day.)
>
>If we call the result of a) CPU_USED and remember that it is a fraction
>strictly between 0 and 1, and if we call the result of b) CPU_LOAD, then
>IF CPU_USED > 1 / CPU_LOAD YOU WILL PROBABLY EXPERIENCE A BOTTLENECK 
>RUNNING A SINGLE INGRES 6.1 SERVER. 

Well, your units aren't comparable, and that provokes some doubt about
the validity of conclusions drawn from the metric.  Specifically, what
you call CPU_USED, or INGRES processor share, has units

	sum of processor time used by INGRES processes
	----------------------------------------------
			clock time

over an undefined sampling period (and recall that MONITOR returns
snapshots, not sum over time window: the INTERVAL parameter merely
sets sampling resolution, not reporting or updating resolution.  Over
short time windows this can add significant error when the underlying
unit is continuous, as in time).  The time units cancel, yielding a
ratio; it is not a pure unit but a time share, as in timesharing.

To be boringly clear, this isn't what you said, it's what I assume you
meant.  To "sum the percentages of cpu they're getting" is to arrive
at a meaningless number; I assume you meant instead to average them.
This can be measured and expressed in useful ways, such as deriving it
from cpu time over clock time as shown.

Now, what you call CPU_LOAD has units

	sum of number of processes in COM state
	---------------------------------------
	    	number of samples

over a sampling period, in your example a day.  This is long enough
that the snapshots collected by MONITOR should approach load averages.
Thus we can neglect quantization effects: the underlying unit is
discrete and we collect many samples.  The numbers cancel, yielding a
ratio which is a pure unit: the average number of COM processes over
the time measured.
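
Concretely, the two derivations look like this (a sketch with invented
sample values, not MONITOR output):

    # CPU_USED: average the backends' share across snapshots (summing
    # across snapshots would be meaningless, as noted above).
    share_samples = [0.22, 0.15, 0.19, 0.16]   # share per snapshot
    cpu_used = sum(share_samples) / len(share_samples)   # 0.18

    # CPU_LOAD: average count of COM-state processes across snapshots.
    com_samples = [6, 9, 7, 10]                # COM processes per snapshot
    cpu_load = sum(com_samples) / len(com_samples)       # 8.0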

So the two numbers CPU_USED and CPU_LOAD aren't just different
measurements, they're not in comparable units.  Neither one can be
expressed in terms of the other.  Worse, experience shows that real
measurements of the two are not well related; that is, neither is very
predictive of the other.  Each varies inversely with the other, but
not in a well-behaved manner.  They somewhat complement each other for
performance analysis.  But not in the way you suggest:

	CPU_USED > 1 / CPU_LOAD

which may be re-expressed as

	CPU_USED * CPU_LOAD > 1

which, following my notation, means

	INGRES processor share * load average > 1

Substituting in pure units, and specifying a common time window for
data collection, this reads

	processor time used by INGRES	  number of COM processes
	-----------------------------  *  -----------------------  > 1
		total clock time	     number of samples


Take again the simple case of one fully computable INGRES process and
one fully computable non-INGRES process, at equal priorities in a
round-robin scheduler.  By this metric, "you will probably experience
a bottleneck running a single INGRES 6.1 server."  In fact, this makes
sense, you probably will, although "probably" and "bottleneck" have
yet to be expressed quantitatively: how probable and how bad.

Now take the case of the two processes staying about half compute
bound and half i/o bound.  If their i/o and computation overlap well,
they'll never see each other; if they don't, they'll interfere with
each other exactly as much as in the previous case.  But the metric
doesn't distinguish between these two sets of conditions.  Thus it's
easy to show beta error, the metric falsely predicts no problem.
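
A toy model of the two overlap conditions (entirely synthetic: each
process alternates one tick of compute with one tick of i/o):

    import itertools

    def phases(offset):
        # alternate one tick of 'cpu' and one tick of 'io'
        for t in itertools.count():
            yield "cpu" if (t + offset) % 2 == 0 else "io"

    def clashes(off_a, off_b, ticks=1000):
        a, b = phases(off_a), phases(off_b)
        return sum((next(a), next(b)) == ("cpu", "cpu")
                   for _ in range(ticks))

    print(clashes(0, 1))   # 0: good overlap, no contention
    print(clashes(0, 0))   # 500: bad overlap, constant contention
    # Either way each process is half compute bound, so the metric
    # computes the same value for both cases.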

Alpha error is even easier to show: consider the case of ten INGRES
users and two non-INGRES.  Number of COM processes can average 2 or
more with 30% idle time.  With good overlapping this again means that
the next pending INGRES job (process that goes from LEF to COM) will
have processor available.  VMS priority promotion favors that job over
the more recently served and higher computable ones.  Thus no problem,
but anytime INGRES processes get more than half the used processor
time, the metric falsely predicts a problem.

Alpha error points out a deeper problem with this analysis: the metric
complains about bottlenecks hurting INGRES when INGRES processes are
getting the best of things!  For an extreme example, consider what
happens if we modify priorities so that INGRES processes always
pre-empt the non-INGRES.  Now let INGRES loads increase to approach
100% use of the processor.  The other jobs remain COMputable, in fact
they'll wait for processor share indefinitely.  It's trivial to put 10
or more other jobs into the run queue, they just pile up waiting for
processor because they're not waiting for memory or i/o.  The metric
now says

	processor time used by INGRES	  number of COM processes
	-----------------------------  *  -----------------------
		total clock time	  number of samples


	   n seconds			   ~11 * (interval / seconds)
=	----------------		*  --------------------------
	n + delta seconds		   (interval / seconds)

which, for small delta (as INGRES loads increase to 100%),

=	11 >> 1

The metric predicts bottlenecks, imposed by "single server
architecture", where non-INGRES jobs get an unfair boost over INGRES.
In point of fact it's exactly the other way around, we've set it up so
that the INGRES jobs are beating the stuffing out of the other jobs,
but the metric doesn't know this.  Of course, all metrics have
contexts, and we could say this is outside of the context of
usefulness of this metric.  If we went into this further, we'd
probably agree that part of the context is that the other jobs have to
proceed too.  That makes us wonder if the context isn't getting and
giving fair share of shared use of uniprocessors.

So my point is, yes you have a bottleneck, it's just not one of INGRES
competing at a disadvantage with other jobs, as you suggest.  If
you're compute bound on a single shared processor, your bottleneck is
that processor.  There are different methods of attaining higher
performance, but multiple servers isn't one of them.  Greg Pavlov
points out one solution: move INGRES to dedicated resources.  Another
solution is to increase the shared resources, such as more or faster
processors.  Software can't work miracles and create more resources.
All it can do is ration existing resources more fairly, flexibly,
optimally, or closely in accordance with local policies and needs.

>most of Mr. Krueger's posting is a smokescreen. He asks for hard data, then
>himself proceeds with a thoroughly general, theoretical treatment of
>scheduling problems. Most of what he says applies equally to all DBMS
>systems, Ingres 5.0 as well as 6.1.

Thanks, I couldn't ask for a better review.  It was my intention to
discuss a more general set of issues.  Since I'm not making any
specific claims, hard data from specific systems would not be
appropriate.  You, on the other hand, are: do you have any?

>My point is that in this respect,
>Ingres 6.1 can provide inferior performance to Ingres 5.0.

And your point is well taken.  Okay, it can.  Does it?  How often?
How inferior?  If you want to be helpful, please tell us not only the
what and where of bottlenecks but also when and how much.

-- Jon
-- 

cs_bob@gsbacd.uchicago.edu (05/17/89)

> 
>And your point is well taken.  Okay, it can.  Does it?  How often?
>How inferior?  If you want to be helpful, please tell us not only the
>what and where of bottlenecks but also when and how much.
> 

There is very little in your excellent analysis which I can dispute. It
is much more detailed than I ever intended to be, or could even pretend
to be. However, I would like to point out a few things:

1) My analysis was directed at one special case, which greatly simplifies
everything. That is, it is directed at instances where the version 6 backend
becomes truly compute bound (which is to say, there are no free cycles, and
the backend may actually have use for 100% or more of the processor.)

2) I never intended to give a detailed prescription for the analysis of
all situations. I realize that different sites have a wide variety of
computing environments, and individual users will have to take this into
account. For example, a machine dedicated to serving Ingres database
requests will experience no ill effects. However, in our mix, which
for a given processor includes 3 or 4 software developers compiling,
linking, etc. at any given time, a variable number of students and
faculty doing various things, and generally several large, CPU-bound
batch jobs running at priority 3, the bottleneck I observed is very
real. (Note too that we force large CPU-bound jobs to run in batch by
enforcing a CPU limit for interactive jobs.)

3) For the purpose of rough estimation, I still stand by my metric. Users
need to understand the implications of the numbers derived, and Mr. Krueger's
analysis is a good start. There are a number of ways of calculating load.
Again, most of my analysis presumes that the CPU is busy with priority 4
jobs, including non-Ingres DBMS work.

Any statistics I give regarding frequency and extent would of course only
apply to my particular installation. Nonetheless they stem from a problem
that arises from the combination of Ingres's single server architecture
and the VMS scheduling philosophy. If the server is running at priority
4, VMS is going to treat it on a par with all interactive jobs. This is
not a problem as long as the database server is primarily responsible
for issuing I/O requests, which is what it usually does. However, if
and when the server becomes CPU bound (and in our installation it happens
2-3 times a day for 10 to 30 minute periods, when there are no more than
6 or 7 users on the server), the server is a bottleneck. It is treated by
VMS as a single process, and is provided with a single process's share
of CPU. However, it is effectively providing cycles for 6, 7 or theoretically
many more users.

I never intended to provide a detailed performance analysis for my own site,
let alone "the general case." My initial posting was in response to a request
for examples of single server bottlenecks. I believe that the potential for
such a bottleneck exists in Ingres 6.1 running under VMS. 

-- Bob Kohout