[comp.databases] Performance Data

tim@binky.sybase.com (Tim Wood) (11/22/89)

In article <126@tacitus.tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>In article <666@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>>dg@sisyphus.sybase.com (David Gould) writes:
>>
>>>'exactly how much measurable benefit ...'. This is of course proprietary
>>>information
>>
>>For you, perhaps.  Some of your competitors can substantiate their
>>performance claims with DATA.
>
>
>As I tried to emphasize by my `measurable benefit' question: design is one
>thing, performance is another.  [ ... analogy deleted ... ]
>I'm not trying to accuse Sybase of
>having an inferior product; I just don't like unsubstantiated performance
>claims.
>
>Chris Hermansen                         Timberline Forest Inventory Consultants

Your question puts us in something of a bind.  If we don't give data,
folks can say, "shucks, we want data."  If we give data, folks might say, 
"you're a vendor, we don't believe your data; besides, it doesn't measure my
application--TP1's are meaningless."  And so on.  

We have numbers we like from competitive benchmarks against other products, 
conducted by our customers & prospects.  However, publicizing those numbers is 
another issue.  IMO, the most beneficial thing would be for a publication
like _Digital Review_ to conduct a benchmark.  Their trials of hardware and
software seem to be very well conducted and documented.  Also note that
it's quite difficult to design a meaningful DBMS benchmark--about as hard
as designing a DBMS schema for a real application.  I'd like to see more
consensus on what a meaningful benchmark is; then someone could measure
our performance against it publicly.  But the only benchmarks that finally
matter are people's applications.
-TW

Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
		This message is solely my personal opinion.
		It is not a representation of Sybase, Inc.  OK.

sullivan@aqdata.uucp (Michael T. Sullivan) (11/23/89)

From article <7169@sybase.sybase.com>, by tim@binky.sybase.com (Tim Wood):
> 
> IMO, the most beneficial thing would be for a publication
> like _Digital Review_ to conduct a benchmark.

_EE Times'_ SPEC is, I believe, working on this.  However, it will take
some time to get together.
-- 
Michael Sullivan          uunet!jarthur.uucp!aqdata!sullivan
aQdata, Inc.              aqdata!sullivan@jarthur.claremont.edu
San Dimas, CA

jkrueger@dgis.dtic.dla.mil (Jon) (11/23/89)

tim@binky.sybase.com (Tim Wood) writes:

>Your question puts us in something of a bind.  If we don't give data,
>folks can say, "shucks, we want data."  If we give data, folks might say, 
>"you're a vendor, we don't believe your data; besides, it doesn't measure my
>application--TP1's are meaningless."  And so on.  

Just provide measurements we can replicate, at least in principle.  
Cf. the MIPS Performance Brief.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

dhepner@hpisod2.HP.COM (Dan Hepner) (11/29/89)

From: hargrove@harlie.sgi.com (Mark Hargrove)
>
>The client and server don't have to run on the same machine.  In fact,
>as Jon Forrest (correctly) points out, in the general case, you don't
>*want* them to run on the same machine.

How much this will buy you is directly dependent upon the distribution
of CPU cycle requirements between the clients and the server(s), and
the relative cost of remote vs local communication between the clients 
and the server.  

1. Is it your experience that more than 10% of the work is done by 
   the clients?

2. Is it your experience that remote communication costs don't end
   up chewing into the savings attained by moving the clients 
   somewhere else?

>(and in the extreme (and not at all impractical) case, you run each 
> client and each server on its own machine).  This model is simple, 
> elegant, and fundamentally right.

This would require basically a 50-50 split of the workload between
the client and server. A practical assumption? 
 
Dan Hepner

jkrueger@dgis.dtic.dla.mil (Jon) (11/30/89)

dhepner@hpisod2.HP.COM (Dan Hepner) writes:

>1. Is it your experience that more than 10% of the work is done by 
>   the clients?

Sometimes.  If it's only 10%, we may then assign 10 clients per server,
thus balancing the load.  Yes, the server load increases too, but not
proportionately; balance might be 12 or 15 clients per server.

>2. Is it your experience that remote communication costs don't end
>   up chewing into the savings attained by moving the clients 
>   somewhere else?

No, the lower bandwidth is more than offset by multiprocessing.
When this isn't true, you probably have a poorly partitioned problem,
not insufficient communications hardware.  The same profiling that
tells you who's shouldering more of the processing burden will also
reveal if both sides are waiting for communications.

>>(and in the extreme (and not at all impractical) case, you run each 
>> client and each server on its own machine).  This model is simple, 
>> elegant, and fundamentally right.

This isn't the extreme case.  Multiple processors can divide work
with better granularity than client and server processes.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

hargrove@harlie.sgi.com (Mark Hargrove) (11/30/89)

In article <13520004@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner)
writes:

   From: hargrove@harlie.sgi.com (Mark Hargrove)
   >
   >The client and server don't have to run on the same machine.  In fact,
   >as Jon Forrest (correctly) points out, in the general case, you don't
   >*want* them to run on the same machine.

   How much this will buy you is directly dependent upon the distribution
   of CPU cycle requirements between the clients and the server(s), and
   the relative cost of remote vs local communication between the clients 
   and the server.  

Huh?  I'm afraid I don't understand what you're getting at wrt
"distribution of CPU cycle requirements".  Naturally, you are correct in
assuming that you do have to ponder communication costs when building
client-server models.

   1. Is it your experience that more than 10% of the work is done by 
      the clients?

10% of *what* work?  It's my experience that a client does 100% of the
work that's appropriate for the client to perform.  The content of
this work is highly variable -- it depends upon your application.  Clients
make requests to servers, and then do something with the results.  A
"typical" client-server application might have the client application
handling presentation, user interface, and flow of control, while making
requests of database servers (and perhaps of directory servers to locate
the DB servers, and perhaps of Kerberos-type servers to handle 
authentication, etc.)

   2. Is it your experience that remote communication costs don't end
      up chewing into the savings attained by moving the clients 
      somewhere else?

What do you mean by "remote"?  What do you mean by "cost"?  If my
client and server live on the same ethernet (or FDDI ring, or UltraNet
backbone) then no, I don't see a problem.  On the other hand, if I'm
communicating over a 300 baud dial-up network, then I'd better be pretty
sure my clients aren't impatient for responses from the servers.
Nevertheless, there are clear cases where even this slow link is
perfectly OK.  Try rephrasing your question :-).

   >(and in the extreme (and not at all impractical) case, you run each 
   > client and each server on its own machine).  This model is simple, 
   > elegant, and fundamentally right.

   This would require basically a 50-50 split of the workload between
   the client and server. A practical assumption? 

No!  Not at all.  I'm not sure you really understand the notion of
client-server.  Have you read Mike Harris' recent postings?  He gives
good examples of what client-server is all about.  I think *you're*
drifting in the direction of distributed processing, where a single
problem is broken up and shared by several machines.  This is NOT what
we're talking about here.



--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mark Hargrove                                       Silicon Graphics, Inc.
email: hargrove@harlie.corp.sgi.com                 2011 N.Shoreline Drive
voice: 415-962-3642                                 Mt.View, CA 94039

tim@binky.sybase.com (Tim Wood) (12/01/89)

In article <13520004@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>From: hargrove@harlie.sgi.com (Mark Hargrove)
>>
>>The client and server don't have to run on the same machine.  In fact,
>>as Jon Forrest (correctly) points out, in the general case, you don't
>>*want* them to run on the same machine [...]
>>(and in the extreme (and not at all impractical) case, you run each 
>> client and each server on its own machine).  This model is simple, 
>> elegant, and fundamentally right.
>
>This would require basically a 50-50 split of the workload between
>the client and server. A practical assumption? 
> 

This seems a bit simplistic.  Database servers and client applications
do fundamentally different kinds of work.  Assuming an update-intensive 
workload, the server should try to keep the disks and the client connections 
busy.  That is, as the volume of requests rises, the I/O activity, and
commit rate, should rise linearly up to the saturation level of the I/O
system.  To speed up an I/O-saturated system, you add more I/O bandwidth.  
Conversely, if the CPU can't keep the disks busy under heavy workload,
you need a faster CPU (or a different DBMS :-).
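
To make that concrete, here is a back-of-envelope model (the figures are
hypothetical, not Sybase measurements): commit rate tracks the offered load
until whichever resource, disk or CPU, saturates first.

```python
# Hypothetical saturation model: commits/sec rise linearly with offered
# load until the I/O system or the CPU becomes the bottleneck.

def commit_rate(offered, ios_per_commit, io_bandwidth,
                cpu_per_commit, cpu_capacity):
    """Commits/sec for an update-intensive server."""
    io_limit = io_bandwidth / ios_per_commit    # what the disks allow
    cpu_limit = cpu_capacity / cpu_per_commit   # what the CPU allows
    return min(offered, io_limit, cpu_limit)

# With 600 I/Os/sec of disk bandwidth and 4 I/Os per commit, the server
# tops out at 150 commits/sec no matter how fast requests arrive; more
# I/O bandwidth, not a faster CPU, raises that ceiling.
print(commit_rate(500, 4, 600, 0.002, 1.0))   # 150.0
```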

The client side is concerned with very different issues, like window 
refresh strategies, presentation styles and local data analysis.  
All those things are very CPU-intensive.  

So, the different character of the client and server workloads makes
comparing them a case of apples vs. oranges, and underscores the
benefit of client/server: each party in the relationship does a
few things well.
-TW


Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
		This message is solely my personal opinion.
		It is not a representation of Sybase, Inc.  OK.

dhepner@hpisod2.HP.COM (Dan Hepner) (12/01/89)

From: jkrueger@dgis.dtic.dla.mil (Jon)
> 
> >1. Is it your experience that more than 10% of the work is done by 
> >   the clients?
> 
> Sometimes.  If it's only 10%, we may then assign 10 clients per server,
> thus balancing the load.  Yes, the server load increases too, but not
> proportionately; balance might be 12 or 15 clients per server.

In the example, if one moved 10 clients taking 10% of a 100% used CPU, 
we would simplistically end up with the client CPU 10% used, and 
the server CPU still 90%. Adding one more client, we would end up with 
a saturated system with 11 Clients on an 11% utilized client machine, 
while the server was now 99% used.  If this were so, it wouldn't 
seem either all that balanced, and probably an economically unjustifiable
move: a 100+% increase in hardware cost yielding a 10% increase in
throughput.  I don't see where the 12 or 15 came from, but even if
true they don't seem on the surface to be all that good a deal.
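
For what it's worth, the arithmetic above can be sketched in a few lines
(reading the 10% as the ten clients' combined share; all figures
hypothetical):

```python
# Hypothetical split: each user costs 1% of a CPU on the client side
# and 9% on the server side; all clients share one client machine.

def utilizations(n_users, client_pct, server_pct):
    """Percent load on the client machine and on the server machine."""
    return n_users * client_pct, n_users * server_pct

print(utilizations(10, 1, 9))   # (10, 90): client box nearly idle
print(utilizations(11, 1, 9))   # (11, 99): server saturates first
```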

> >2. Is it your experience that remote communication costs don't end
> >   up chewing into the savings attained by moving the clients 
> >   somewhere else?
> 
> No, the lower bandwidth is more than offset by multiprocessing.

Let's assume you have plenty of bandwidth, but not plenty of CPU
cycles at the server.  Remote communication, especially reliable remote 
comm, is more expensive than local communication.  The extreme of my 
concern would be illustrated if the remote communication costs at the server
end exceeded the processing/terminal handling done by the client, 
in which case one would actually lose by adding a remote machine 
for the clients. 

> >>(and in the extreme (and not at all impractical) case, you run each 
> >> client and each server on its own machine).  This model is simple, 
> >> elegant, and fundamentally right.
> 
> This isn't the extreme case.  Multiple processors can divide work
> with better granularity than client and server processes.

Maybe you can clarify.  The case in question was how frequently it would
be practical to put each client and each server on its own machine, with
the assertion that if the client/server workload split weren't near
50-50, it wouldn't be practical. 

The points of confusion:
   1) "Multiple processors" can be ambiguous as to remoteness, but given
      the context I'll assume remoteness. (right?)
   2) Granularity. Are you postulating a flexible division of the work 
      between client and server?  A server which is flexibly divisible 
      over both machines?

I think all of these questions are facets of the same underlying question: 
how much of the typical application can be done at the client?

> -- Jon

Dan Hepner

jkrueger@dgis.dtic.dla.mil (Jon) (12/02/89)

dhepner@hpisod2.HP.COM (Dan Hepner) writes:

>From: jkrueger@dgis.dtic.dla.mil (Jon)
>> 
>> >1. Is it your experience that more than 10% of the work is done by 
>> >   the clients?
>> 
>> Sometimes.  If it's only 10%, we may then assign 10 clients per server,
>> thus balancing the load.  Yes, the server load increases too, but not
>> proportionately; balance might be 12 or 15 clients per server.

>In the example, if one moved 10 clients taking 10% of a 100% used CPU, 
>we would simplistically end up with the client CPU 10% used, and 
>the server CPU still 90%.

Perhaps I'm not making myself clear.  That's 10% per client.
10% of the work is done by the client; this client serves a
single user.  Each additional concurrent user gets another
client, which consumes another 10%, in this example.

>Adding one more client, we would end up with 
>a saturated system with 11 Clients on an 11% utilized client machine, 
>while the server was now 99% used.  If this were so, it wouldn't 
>seem either all that balanced, and probably an economically unjustifiable
>move.

All you're saying is that a two-process model doesn't scale well if
we're already bottlenecked on either process.  This is a tautology.

>100+% increase in hardware cost yielding a 10% increase in
>throughput.

Indeed, it's worse than that: the interconnects aren't free.
One doesn't win by distributing inherently sequential problems
that one doesn't know how to decompose.  Again, a tautology.

>> >2. Is it your experience that remote communication costs don't end
>> >   up chewing into the savings attained by moving the clients 
>> >   somewhere else?
>> 
>> No, the lower bandwidth is more than offset by multiprocessing.

>Let's assume you have plenty of bandwidth, but not plenty of CPU
>cycles at the server.  Remote communication, especially reliable remote 
>comm, is more expensive than local communication.

In exactly the same way that reading bytes off disks costs more
cycles than referencing memory, yes.  But compelling cases for
not requiring databases to reside in main memory can be made, no?

>The extreme of my 
>concern would be illustrated if the remote communication costs at the server
>end exceeded the processing/terminal handling done by the client, 
>in which case one would actually lose by adding a remote machine 
>for the clients. 

A valid concern.  Got any data?  Measured degradation in latencies?
Throughput?  I don't deny it can happen, just asking how often it
does.

And again, you're simply saying that sometimes costs of distributing
the load are greater than benefits achieved.  How true:  sometimes the
problem is intractable, or you don't know enough to decompose it, or
your tools are poor, or the implementation is poor.  Then you get the
biggest monoprocessor you can afford, indeed.  You've admitted you
can't work smarter, so you'd better work harder.

>> >>(and in the extreme (and not at all impractical) case, you run each 
>> >> client and each server on its own machine).  This model is simple, 
>> >> elegant, and fundamentally right.
>> 
>> This isn't the extreme case.  Multiple processors can divide work
>> with better granularity than client and server processes.

>Maybe you can clarify.  The case in question was how frequently it would
>be practical to put each client and each server on its own machine, with
>the assertion that if the client/server workload split weren't near
>50-50, it wouldn't be practical. 

The usual assumption is that each client can get its own machine, but
the server has to share a single machine.  This makes the server the
bottleneck, in general.  It's also a bad assumption: multithreaded
servers can use multiprocessors to scale up, distributed DBMS can use
distributed hosts to execute queries, and parallel servers can apply
processors to each component of each query.  The first two animals
exist now.

>The points of confusion:
>   1) "Multiple processors" can be ambiguous as to remoteness, but given
>      the context I'll assume remoteness. (right?)

Wrong, as in previous graf.

>   2) Granularity. Are you postulating a flexible division of the work 
>      between client and server?  A server which is flexibly divisible 
>      over both machines?

Nope, a flexible approach to designing database engines.  Remember,
your query language can't tell the difference anyway.

>I think all of these questions are facets of the same underlying question: 
>how much of the typical application can be done at the client?

Fair question, but needlessly special.  The general question is how
can we divide up work, and what tools do we need, and how many of
them exist yet?

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

dhepner@hpisod2.HP.COM (Dan Hepner) (12/02/89)

From: hargrove@harlie.sgi.com (Mark Hargrove)
> 
>    >(and in the extreme (and not at all impractical) case, you run each 
>    > client and each server on its own machine).  This model is simple, 
>    > elegant, and fundamentally right.
> 
>    This would require basically a 50-50 split of the workload between
>    the client and server. A practical assumption? 

> Have you read Mike Harris' recent postings?  He gives
> good examples of what client-server is all about.  I think *you're*
> drifting in the direction of distributed processing, where a single
> problem is broken up and shared by several machines.  This is NOT what
> we're talking about here.

Why would "you run each client and each server on its own machine" 
if not to break a single problem up to be shared by more than one
machine?  Indeed, I'm talking about distributed processing, but I 
disagree with your protest that I'm alone. Without a distributed 
processing goal, what is the point of a C/S architecture?

[Mark points out miscommunication]

Clearly it's pointless to dispute the definitions of words, so let's
get to some meat.  Here's my claim:

1) If I need to do a set of several tasks, this becomes my "problem". 
   Now I can probably break this problem easily into Mark's "single 
   problems", or tasks which share no common resource. I 
   can place each of those problem solutions on a different machine,
   and establish a communications network allowing me access to each,
   from my terminal, or whatever I have on my desk.  I may even have
   intelligence on my desk to translate my high level human requests 
   into a complex series of interactions with those solution machines.

   If you'd like me to stop here, and call that the fulfillment of a C/S 
   architecture, some might contest how important your definition of C/S was.
   You might also call this distributed processing, but some might
   contest how important your definition of distributed processing was.

2) We may not agree on whether we've been discussing distributed processing,
   but maybe we'll agree on the desirability of distributed processing, 
   even to the extent of 
            Mike Harris> Imagine a world where everything was a PC.

3) The essence of distributed processing is, to use Mark's
   terminology, "breaking single problems into parts which 
   can be worked on by different machines", where a "single problem"
   implies a shared resource, e.g. a database.

4) The division of any "single problem" into a C/S allows for
   distribution, by allowing the client and server to be on
   different machines (serial connected, ethernet if you like).
   This I claim is the non-trivial definition of C/S, and indeed
   the primary purpose of doing so.  This definition is certainly
   meaningful across an arbitrary number of levels of abstraction.

5) The value of that C/S distribution will be proportional to the
   percentage of the problem cost, as typically measured by
   CPU cycles, which is moved into the client.  Dividing a
   problem into 10% client and 90% server isn't all that valuable.
   Doing so across 10 layers of abstraction is demonstrably ridiculous. 
   [Yes, one should consider the security advantages in some C/S 
   divisions to be of some value]

   On the other hand, dividing a problem into 90% client, and 
   10% server would yield an immediate potential for a 10X improvement
   using equal technology hardware.  And doing so across 10 layers
   of abstraction would achieve the distributed dream: a world of
   PCs.  This dream is what drives statements which describe C/S 
   as  "simple, elegant, and fundamentally right".

6) Unfortunately, the state-of-the-art C/S divisions actually available 
   from vendors usually end up without much work being done by the
   client, and many such products IMHO would be better off without
   having bothered.  Hopefully this will change with time.  If you have 
   any notion of a world of only PCs, you gotta hope so too.

Dan Hepner
dhepner@hpda.hp.com

Disclaimer: HP may well disagree with every word of this opinion.