dt@nixtdc.uucp (David Tilbrook) (12/15/90)
Embarrassing as it may be for a Unix old-timer to admit, I am not sure that this will get out, so if someone who knows me reads this, please acknowledge that they did so. Due to the wide range of issues covered in this posting (am I violating news etiquette?), the following is a summary of the discussions and requests that follow:

Brief comments on:
 - Does testing belong in this newsgroup?
 - Are programming style standards relevant?

Requests for information on experience with, opinions of, availability of, or solutions for:
 - Barton Miller's fuzz package
 - test data generators that reuse randomly generated data
 - ways to convert rand() into desired distributions and express the conversion
 - references to schemes that embed documentation and testing specs in source files
 - using write-once data storage systems for a version system
 - statistics on versioning system use
 - is it a platform, machine, configuration, or environment, and how does one name it?
 - strategies for coping with interface discrepancies

First, w.r.t. the discussions (D) and my comments (C):

D) Does testing belong in this newsgroup?
C) God I hope so ... if a software-engineering grope [sic] isn't concerned with testing, who is? Testing is an integral part of the software process, whatever model one uses.

D) Are programming style standards relevant?
C) Only if you can test them and/or they help in the testing. I welcome more discussion on this topic, particularly w.r.t. what belongs in a style guide and, most importantly, why.

Request: Does anybody have, or know how to get, a copy of fuzz and ptyjig (the testing tools described in ``An Empirical Study of Reliability of Operating System Utilities'', Barton P. Miller et al., Usenix Software Management Workshop, 1989)? Any experience with this package would be appreciated. Better yet would be word on how I could acquire a copy. (Bart - are you there?)

Request: I have a reverse grammar generator (tdg) that we use to generate test data.
However, it does not have a way of saving generated information for later incorporation. If anyone has experience with a random test-data generator that can save and reuse previously randomly generated data, I would appreciate discussion of such experience and perhaps some evaluation of how to control it. For example, I recently used tdg to generate random requests to a network-wide database server by generating shell commands that requested appends, replacements, and deletes. This worked a treat, but had the problem that the range of database keys had to be limited to ensure that the deletes would occasionally try to delete an existent key. Any ideas? Any P.D. tools? (tdg is available as part of the EUUG (whoops, I mean EurOpen) Fall 90 distribution.)

Request: Does anyone have a nice way of turning rand() results into various string-specified distributions for incorporation into a test data generator? I would be particularly interested in a mechanism to express a normal or Student's t distribution over a limited range ... nothing too sophisticated ... think of generating words or sentences of normal length to test an editor, or a duration in minutes for a phone call (Student's t). The distribution needs to be expressed as a simple string so that it could be given as an argument to the generator.

Request: I am writing a paper on the way we embed data-base entries into source code to contain the documentation, interface specifications, and regression-testing information for the related module. Any references to other such schemes (e.g., Mangle, Sob, and Wheeze) would be appreciated. If there is interest, I will post a brief overview of my work and our experience in applying it to our product, most of which is a large set (750 modules) of subroutines.

Request: I am currently doing a requirements analysis of a versioning system. One aspect of this research is the problems inherent in using a write-once storage system.
For obvious reasons, using the change-recording mechanisms used by either SCCS or RCS would not be satisfactory to anyone other than a WORM salesperson (they would consume disks at an unaffordable rate). So the system should probably use some sort of mechanism that does not rewrite existing information ... however, the system must retrieve any version of the source in equal time (like SCCS but unlike RCS). Any comments? Any interest?

Request: As part of the aforementioned research, I need statistics on the use of versioning systems for long-running systems. For example, I was recently informed that for one package running for some five years, the average number of versions per module was 80 and the average number of branches was 17 -- that's right, seventeen! Within our group, there have been 44,000 deltas made to some 6,000 modules over the last 18 months, although many of the modules have died or been renamed, and thousands of the deltas are minor cosmetics (e.g., changing the name of our company from Nixdorf to Siemens Nixdorf ...). My numbers are somewhat inflated, as initial file creation and file renaming each count for 2 deltas. I would really appreciate it if people would send me such raw statistics from their own experience.

Request: Our product is built from a single source system on nine platforms simultaneously and has been or needs to be ported to many more. There is a single parameterization file for each build tree, and one setting within this file - namely the name/type of the platform - is used to map to the capabilities or select appropriate facilities throughout the source. The first request is for a clarification of terminology: what noun does one use to refer to the class of names that state the machine type and the operating system type and its release, version, and/or flavour? Is this a ``platform'', a ``configuration'', a ``system'', an ``environment'', or a ``whats_it''? Is there any sort of standard for the assignment of a name within this class?
Is there a need for a universal registry of such names? My own scheme uses a simple concatenation of a CPU brand name, the vendor's specific OS version number and/or name, an underbar, and the closest flavour of a ``standard'' operating system (e.g., ``v?'', ``4.?bsd'', ``unix5.?''). The software tools that I use provide mechanisms to do shell-like pattern matches against this name to do selection or suppression. Conforming to a widely accepted naming standard is obviously desirable.

Request: Inherent in the facilities/capabilities configuration when porting to a large number of platforms is identifying the subtle differences that arise between different releases of the same base system. Fortunately, my experience is that the number of these subtle differences is usually small, or they are detectable by the compilers. Unfortunately, most suppliers are very poor at identifying and/or documenting the differences that are not benign or detectable (e.g., the differing semantics of fopen(file, "a+")). If readers have opinions on how these should be identified and resolved, I would be interested in hearing their views. What is required is a strategy that can be applied to deal with such discrepancies as they constantly arise; one wants a mechanism that can be applied quickly, while ensuring that any previous port is not going to be broken. For example, the ``#ifdef <header_file_manifest>'' tactic is popular, but is frequently inadequate and can get excessively cumbersome without an #elif construct, or in situations where there are a large number (i.e., >= 3) of variations. Furthermore, there are many examples of situations where it just does not work. For example, my D-Tree used to use:

    #include <envir/sys_stat.h>  /* map to appropriate stat.h */
    ...
    #ifdef S_IFLNK
        /* assume lstat(2), symlink() provided */
    #else
        /* assume lstat(2), symlink() NOT provided */
    #endif

This worked on some forty to fifty ports, until I encountered a system for which S_IFLNK was defined but lstat(2) was not provided.

Well, that's enough for now - apologies for the wide range of topics raised, but I chose to do one posting rather than eight. If you want to raise a discussion on any of the above issues, please do so. For those issues for which I receive data, I will endeavour to post summaries.
--
-----------------------------------
David Tilbrook
Siemens Nixdorf Information Systems Ltd.
rh@smds.UUCP (Richard Harter) (12/21/90)
In article <1990Dec15.071555.14971@nixtdc.uucp>, dt@nixtdc.uucp (David Tilbrook) writes:

> Request: I am currently doing a requirements analysis of a
> versioning system. One aspect of this research is the
> problems inherent in using a write-once storage system. For
> obvious reasons, using the change-recording mechanisms used
> by either SCCS or RCS would not be satisfactory to anyone
> other than a WORM salesperson (would consume disks at an
> unaffordable rate). So the system should probably use some
> sort of mechanism that does not rewrite existing information
> ... however, the system must retrieve any version of the
> source in equal time (like SCCS but unlike RCS). Any
> comments? Any interest?

Surprisingly enough, a sequential update system, which incorrect rumor credits with terrible performance, will do what you want quite nicely. The general scheme runs as follows. Store the original file with (implicitly) numbered records in order. Store the deltas as units. The only trick is that each new record gets the next insertion number as an ID. Inserts are stated as inserts after previous record n; deletes are stated as deletes of previous records using absolute record numbers. You can then rebuild any prior version using a pick-and-choose strategy.

At first sight it would seem that this would be expensive. However, _if you arrange the processing correctly_, each record is only accessed once -- the cost is equivalent to the cost of using the SCCS scheme. The reason that sequential update reconstruction has a bad reputation is that it is very expensive if you rebuild each intermediate version as a complete file -- the cost being that a record is moved many times, once for each intermediate version. However, the actual cost need only be proportional to the total number of records.
--
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh  Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.
This sentence short. This signature done.
rh@smds.UUCP (Richard Harter) (12/21/90)
In article <1990Dec15.071555.14971@nixtdc.uucp>, dt@nixtdc.uucp (David Tilbrook) writes:

> Request: As part of the aforementioned research, I need statistics
> on the use of versioning systems for long-running systems.
> For example, I was recently informed that for one package
> running for some five years, the average number of versions
> per module was 80 and the average number of branches was 17
> -- that's right, seventeen! Within our group, there have been
> 44,000 deltas made to some 6,000 modules over the last 18
> months, although many of the modules have died or been
> renamed, and thousands of the deltas are minor cosmetics
> (e.g., changing the name of our company from Nixdorf to
> Siemens Nixdorf ...). My numbers are somewhat inflated, as
> initial file creation and file renaming each count for 2 deltas.
> I would really appreciate it if people would send me such raw
> statistics from their own experience.

80 versions per module is bizarre. SMDS markets Aide-De-Camp, which is a configuration management/version control/etc. package. The product includes a report program which produces these kinds of statistics. From time to time some of our customers send us summary reports, so we have a fair picture of what kind of numbers to expect. Typical numbers for long-term projects (5+ years) are more like 4.5 deltas per file (not counting file creation or renaming). Your numbers are representative. Interestingly enough, the ratio is fairly stable over time. The reason seems to be that most files are stable after a few changes, with a minority of files being hot spots. For example (this is from an old report in our literature), out of 834 files we have:

    # deltas   # files
         0       295
         1        99
         2        51
         3        51
         4        48
         5        49
         7        46
         8        36
         9        23
        10        12

with a long tail. (Read the above as 295 files not changed, 99 changed once, 51 changed twice, etc.)
The reason for the inflated numbers in the project you referenced may be that they store deltas after every editing session automatically, or that they have some kind of automated tool that records changes every night. I have to admit that this explanation is inconsistent with the reported number of branches. I hate to say that they simply do awful software engineering on the basis of a couple of numbers, but that does seem like the most likely explanation.

The numbers from ADC-controlled projects might be a little bit lower than the industry norm (this is a guess). The reason is that ADC is usually used in the context of an established change control policy, which tends to lower the total number of deltas. Basically, what happens is that if you use change orders (work orders) with controlled sign-in (check-in), you get fewer incomplete (incorrect) deltas.
--
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh  Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb. This sentence short. This signature done.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/26/90)
On 15 Dec 90 07:15:55 GMT, dt@nixtdc.uucp (David Tilbrook) said:

dt> - using write-once data storage systems for a version system

The classic reference is (more or less): Reid, ``Naming and Synchronization in a Distributed Computer Network'', an MIT LCS TR. The MIT people have done a lot of further research into versioning systems for WORMs. Anything that is about ``SWALLOW'', a WORM versioning repository, is the right thing. Scan a list of MIT LCS TRs.
--
Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk