[comp.arch] Fastest Kernel Make

ram@shukra.UUCP (08/30/87)

Hi,

    I am curious about kernel make times. Last week I went to the
    local Unix user group (called SVnet - Silicon Valley ...), where
    Ron Parsons, Sequent's Marketing Mgr, gave a presentation on the
    Balance 21XXX.  He claimed that a kernel make on the Balance
    takes ~3 hrs.  I have seen ads by Amdahl which claim 3 minutes.
    I know that John raised this issue some time back.  I am
    on the hunt for kernel make numbers.  If anybody knows or has
    those numbers, please be kind enough to mail or post them.

    I know that the make time depends on the number & types of drivers,
    etc.  Include your numbers & whatever other details you want to.

---------------------
   Renu Raman				ARPA:ram@sun.com
   Sun Microsystems			UUCP:{ucbvax,seismo,hplabs}!sun!ram
   M/S 5-40, 2500 Garcia Avenue,
   Mt. View,  CA 94043

malcolm@spar.UUCP (08/31/87)

In article <26853@sun.uucp> ram%shukra@Sun.COM (Renu Raman, Sun Microsystems) 
>    I am curious about kernel make times. Last week I went to the
>    local Unix user group (called SVnet - Silicon Valley ...), where
>    Ron Parsons, Sequent's Marketing Mgr, gave a presentation on the
>    Balance 21XXX.  He claimed that a kernel make on the Balance
>    takes ~3 hrs.  
This number (3 hours) was for making the ENTIRE system.  This includes
the kernel, man pages, user programs and everything else.  Amdahl (and
previous discussions on the net) were talking about just the kernel.

								Malcolm

henry@utzoo.UUCP (Henry Spencer) (08/31/87)

>    I know that the make time depends on the number & types of drivers,
>    etc.  Include your numbers & whatever other details you want to.

It also depends on the flavor of kernel.  A V6 kernel compiled on, say,
a Sun 4 (assuming you could get the V6 kernel past a modern C compiler!)
would probably beat the Amdahl record, simply because the V6 kernel wasn't
very big.
-- 
"There's a lot more to do in space   |  Henry Spencer @ U of Toronto Zoology
than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry

parsons@sequent.UUCP (Ron V. Parsons) (09/03/87)

I'm Ron Parsons, the Technical Marketing Manager for Sequent Computers.
Renu Raman of Sun Microsystems mentioned my talk on kernel make
numbers.  Here is some more data on the subject:


      Building the DYNIX system represents a significant amount of
      work.  The DYNIX system includes all of 4.2bsd (with parallel
      enhancements) and much of AT&T System V Release 2.2.  There are
      almost 6000 files in the DYNIX binary distribution.

      Of these, over 3000 must be compiled from C source, almost 300
      must be interpreted by make, and 60 must be directly assembled.
      There are almost 4000 compilations and assemblies and over 600
      invocations of the nroff text formatter to build the on-line
      documentation.

      Low-effort, large-grained parallelization of the make utility
      reduced the time required to build the DYNIX system on the
      Balance 8000 computer by a factor of 7.5 from the single-stream
      version of make.  Table 1 shows the DYNIX system build times on
      a VAX(tm)11/750 and on various hardware configurations of the
      Balance 8000 system.  The percentage of CPU usage indicates how
      well the build is utilizing the available resources.  As
      expected, highly parallel builds use a greater percentage of the
      available CPU time in both monoprocessor and multiprocessor
      configurations.
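      A minimal sketch of what large-grained parallelism means here,
      assuming the targets are independent (no target needs another's
      output): run the build commands at most P at a time, roughly what
      a -P value asks make to do.  The name build_all, the thread pool
      and the stand-in jobs are illustrative assumptions; this is not
      how DYNIX make is implemented.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def build_all(commands, max_parallel):
    """Run independent build commands, at most max_parallel at a time.

    Each command is an argv list (e.g. one cc invocation per target).
    Returns the exit status of each command, in input order.
    """
    def run(cmd):
        # Each worker thread only waits on its own child process; the real
        # parallelism comes from the child processes running concurrently.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))

if __name__ == "__main__":
    # Hypothetical stand-in jobs: eight short Python processes instead of cc.
    jobs = [[sys.executable, "-c", "sum(range(10**6))"] for _ in range(8)]
    print(build_all(jobs, max_parallel=4))  # prints [0, 0, 0, 0, 0, 0, 0, 0]
```

      The coarse grain is what keeps the effort low: whole compiles are
      farmed out, so make never has to reason about work inside a job.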

                    Table 1
      DYNIX 2.0 build times and CPU usage


_______________________________________________________________________________
|              |   Single-stream    | Modest parallelism | Maximal parallelism|
|              |       (-P1)        |       (-P2)        |       (-P4)        |
|              |____________________|____________________|____________________|
| Config       |   Time   | CPU use |   Time   | CPU use |   Time   | CPU use |
|              |  (hh:mm) |         |  (hh:mm) |         |  (hh:mm) |         |
|______________|__________|_________|__________|_________|__________|_________|
| VAX11/750    |   30:25  |   85%   |   28:00  |   94%   |     -    |    -    |
|______________|__________|_________|__________|_________|__________|_________|
| Balance 8000 |   22:30  |   90%   |     -    |    -    |     -    |    -    |
| with 1 proc  |          |         |          |         |          |         |
|______________|__________|_________|__________|_________|__________|_________|
| Balance 8000 |     -    |    -    |   10:42  |   185%  |     -    |    -    |
| with 2 procs |          |         |          |         |          |         |
|______________|__________|_________|__________|_________|__________|_________|
| Balance 8000 |     -    |    -    |    5:30  |   374%  |    4:14  |   505%  |
| with 6 procs |          |         |          |         |          |         |
|______________|__________|_________|__________|_________|__________|_________|
| Balance 8000 |     -    |    -    |     -    |    -    |    3:03  |   711%  |
| with 12 procs|          |         |          |         |          |         |
|______________|__________|_________|__________|_________|__________|_________|



    This table presents the build times (in real time) for the
    VAX11/750 and for various configurations of the Balance 8000
    system.  The percent CPU usage for each build is also given.
    The VAX11/750 configuration includes 8 Mbytes of memory and
    Fujitsu Eagle disks.  The Balance 8000 configurations include
    10 Mbytes of memory and Fujitsu Eagle disks.  The -P values
    indicate the relative amounts of parallelism for each build.

    CPU usage for multiprocessor configurations reflects the
    percentage of a single CPU (i.e. 505% CPU use in a 6-processor
    system is equivalent to 100% usage of 5.05 CPUs).
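    The arithmetic behind Table 1 can be sketched as follows (the helper
    names are hypothetical).  Speedup is the baseline time divided by the
    parallel time; because CPU use is quoted as a percentage of one CPU,
    dividing by 100 and then by the processor count gives the share of
    the whole machine that the build kept busy.

```python
def minutes(hhmm):
    """Convert an 'hh:mm' entry from Table 1 into minutes."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def speedup(baseline, parallel):
    """How many times faster the parallel build is than the baseline."""
    return minutes(baseline) / minutes(parallel)

def machine_share(cpu_use_percent, nprocs):
    """CPU use is a percentage of ONE CPU (505% = 5.05 CPUs); dividing
    by the processor count gives the fraction of the machine in use."""
    return (cpu_use_percent / 100) / nprocs

print(round(speedup("22:30", "3:03"), 2))   # 7.38: 1 proc -P1 vs. 12 procs -P4
print(round(machine_share(505, 6), 2))      # 0.84: 6 procs at -P4
print(round(machine_share(711, 12), 2))     # 0.59: 12 procs at -P4
```

    On these figures the 12-processor build is about 7.4 times faster
    than the 1-processor single-stream build, close to the factor of
    7.5 quoted above, while keeping roughly 59% of its processors busy
    versus about 84% on the 6-processor machine.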

steve@nuchat.UUCP (Steve Nuchia) (09/04/87)

In article <8526@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
[things kernel make times depend on...]
> It also depends on the flavor of kernel.  A V6 kernel compiled on, say,
> a Sun 4 (assuming you could get the V6 kernel past a modern C compiler!)
> would probably beat the Amdahl record, simply because the V6 kernel wasn't
> very big.

Being a software developer, I pay a lot of attention to one benchmark
in particular: the compiler.  I once had the immense pleasure of
developing serious code on a Sun, and made extensive use of its
graphical system performance monitoring facilities during that
project.  It is instructive to observe how little of the elapsed
time of a compile is spent at 100% CPU utilization and how much is
spent at maximum I/O bandwidth.

The "make <something large>" benchmark will usually tell you more
about the I/O performance of the machine than it does about the
processor.  That is a good thing, since I/O performance (including
system call overhead, of course) is a pretty important system parameter.
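One way to see that split is to compare a command's CPU time with its
elapsed (wall-clock) time, which is presumably also how the "CPU use"
column in the DYNIX table was computed.  A minimal sketch, assuming a
Unix system (Python's resource module is Unix-only); cpu_share and the
sleeping stand-in for an I/O-bound phase are hypothetical.

```python
import resource
import subprocess
import sys
import time

def cpu_share(cmd):
    """Run cmd to completion; return (elapsed_s, cpu_s, cpu_percent).

    cpu_s is the child's user + system time.  A job that spends much of
    its life waiting (on the disk, say) shows a cpu_percent well below 100.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)          # waits, so rusage includes it
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return elapsed, cpu, 100.0 * cpu / elapsed

if __name__ == "__main__":
    # Hypothetical workloads: one CPU-bound loop, and one that mostly sleeps
    # (a crude stand-in for a phase that is waiting on I/O).
    busy = [sys.executable, "-c", "sum(range(10**7))"]
    idle = [sys.executable, "-c", "import time; time.sleep(1)"]
    for name, cmd in (("busy", busy), ("idle", idle)):
        elapsed, cpu, pct = cpu_share(cmd)
        print(f"{name}: {elapsed:.2f}s elapsed, {cpu:.2f}s CPU, {pct:.0f}% CPU use")
```

Running the same measurement over a whole make gives a single figure
that mixes both effects, which is why the make benchmark says as much
about the I/O system as about the processor.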

Using make as a benchmark we were able to quantify the braindamage of
quite a few machines, some of which made claims of being high-performance.

The specific benchmark we used at my former employer was to compile
the product, which weighed in at about 30,000 lines of C and made
four or five large programs and a couple of dozen smaller ones.  We
then ran a comprehensive set of automated tests on the resulting
product.  The whole job took 8 - 12 hours on the typical "super-micro"
of 2 - 3 years ago.

The interesting things about it, as with most benchmarks, are the
standout exceptions.  We thought the Pyramid was fast when it turned in
a time of around three hours.  Then we tried an Amdahl, and the whole
thing, including reading and writing the tapes, took 45 minutes on a
heavily loaded system.

The testing part would normally display the faked user interactions
as it went along, so serial I/O was included in the timing.  For
most machines this didn't make much difference; they couldn't work
fast enough to saturate a 9600 baud line.  Then we got our first
68020 box, the Convergent MightyFrame (this was well over a year ago).
It was so fast it could run the testing on two 9600 baud terminals
in the same elapsed time as on one.  It was using less than half its
CPU and I/O capacity to keep each 9600 baud line saturated.
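For scale, a rough calculation of what saturating a 9600 baud line
means, assuming baud equals bits per second and the common asynchronous
framing of one start bit, eight data bits and one stop bit (10 bits per
character); the post does not say which framing was used.

```python
def chars_per_second(baud, bits_per_char=10):
    """Characters per second an asynchronous serial line can carry.

    Assumes baud == bits per second and 10 bits on the wire per character
    (1 start + 8 data + 1 stop), a common setting the post doesn't specify.
    """
    return baud // bits_per_char

per_line = chars_per_second(9600)
print(per_line)        # 960 characters/second for one terminal
print(2 * per_line)    # 1920 characters/second to keep two terminals busy
```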

Then there were the losers.  The old Radio Shack 16Bs had a Z80
handling all the I/O, including the hard disk.  That thing took
over 24 hours to run the whole cycle.  And the Arete' box, which
is supposed to have a lot of I/O power, failed.  The master processor
couldn't handle the system call overhead needed to keep the slave busy.
The box we looked at had two 68010s; they claim the '020 versions work
better, but we didn't buy the box.  The Plexus uniprocessor machine,
with its intelligent I/O subsystems, outperformed it on the software
developer's benchmark.  That evaluation cemented my conviction that
multiprocessor architectures should avoid distinguishing among the
processors.  Now I want a Sequent.

I should point out that the compiler/linker system provided with
most Unix boxes is not well balanced in its resource usage.  If
you look at what it's doing during a large make (say, 2.11 news),
it spends a lot of time forking and execing the myriad phases of
the compiler, each of which must then read the data that the previous
phase just wrote.  Engineering improvements in the compiler software
could eliminate its sensitivity to I/O bandwidth.  The benchmark
is also very sensitive to things like setting the sticky bit on the
compiler phases.
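One such engineering improvement is to run the phases concurrently and
connect them with pipes, so intermediate output never touches the disk
(later compilers such as GCC offer this as the -pipe option).  A minimal
sketch, where the two "phases" are hypothetical stand-ins (upper-casing
and reversing text) for the real cpp, ccom and as passes.

```python
import subprocess
import sys

# Hypothetical stand-ins for compiler phases (think cpp -> ccom -> as):
# each reads all of stdin, transforms it, and writes the result to stdout.
PHASES = [
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
    "import sys; sys.stdout.write(sys.stdin.read()[::-1])",
]

def run_piped(source):
    """Run all phases at once, connected by pipes, and return the last
    phase's output.  No intermediate result goes to a temporary file."""
    # Phase 0 simply emits the source text given on its command line.
    procs = [subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", source],
        stdout=subprocess.PIPE,
    )]
    for code in PHASES:
        proc = subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=procs[-1].stdout,   # read end of the previous phase's pipe
            stdout=subprocess.PIPE,
        )
        # The new child now holds the read end; drop the parent's copy.
        procs[-1].stdout.close()
        procs.append(proc)
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output.decode()

print(run_piped("main"))  # prints NIAM
```

The processes still fork and exec once per phase, so this removes the
temporary-file traffic but not the process-creation cost.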

-- 
Steve Nuchia			Of course I'm respectable!  I'm old!
{soma,academ}!uhnix1		Politicians, ugly buildings, and whores
!nuchat!steve			all get respectable if they last long enough.
(713) 334 6720				- John Huston, Chinatown

aegl@root44.UUCP (09/06/87)

We have a Sequent Balance 21000 with 8 processors, which we now use as a
cross-development environment for UniPlus+ (Trademark of UniSoft Corporation).
During the day, when everyone is working on it, using the "-P" flag to make
to use more processors is mildly frowned upon, but in the evening or at
weekends, if I'm the only person on, I like to speed things up a bit. I used
to just throw in "-P8" to use all 8 processors ... but then I watched the
fancy light display on the front of the cabinet and noticed a fair amount
of flicker in the processor activity lights ... Ah! I thought, the C compiler
isn't completely CPU bound (with a 4.2 fast file system, Sequent's whizzo
disk controller and loads of buffers it's close, but there's still a little
bit of I/O). So I tried plotting a graph of real time vs. -Pn, and found
the lowest real time came out at about -P12. So if the people at Sequent
trying for the 1-minute kernel compile haven't tried it yet, they should try
using '-P' values a little higher than the number of processors they have.
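A back-of-the-envelope model of this observation (an assumption, not
something stated above): if each compile keeps a CPU busy only a fraction
of the time and waits on I/O for the rest, you need about nprocs/fraction
concurrent jobs to keep every processor busy.  The function names are
hypothetical, and the model ignores memory and disk contention, which is
why measuring real time against -Pn is still the way to find the best value.

```python
def jobs_to_saturate(nprocs, cpu_percent_per_job):
    """Rough number of concurrent jobs needed to keep nprocs CPUs busy.

    Model (an assumption): each job keeps a CPU busy cpu_percent_per_job
    percent of the time, and the jobs' I/O waits overlap perfectly.
    Integer arithmetic avoids floating-point surprises in the ceiling.
    """
    return -(-nprocs * 100 // cpu_percent_per_job)   # ceil(nprocs / fraction)

def implied_cpu_percent(nprocs, best_jobs):
    """Invert the model: per-job CPU share implied by the best -P value."""
    return 100 * nprocs / best_jobs

print(jobs_to_saturate(8, 100))            # 8: purely CPU-bound jobs
print(jobs_to_saturate(8, 67))             # 12
print(round(implied_cpu_percent(8, 12)))   # 67
```

Under this model, a best time at about -P12 on 8 processors corresponds
to each compile using roughly two-thirds of a CPU.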

-Tony Luck
--------------
Disclaimer: I don't work for Sequent (but if you rush out and buy one today,
tell them I sent you - perhaps I'll get lucky and get some commission from
them - I'll split it with you ... honest!)