dt@nixtdc.uucp (David Tilbrook) (12/15/90)
Embarrassing as it may be for a Unix old-timer to admit, I am not sure that this will get out, so if someone who knows me reads this, please acknowledge that they did so. Due to the wide range of issues covered in this posting (am I violating news etiquette?), the following is a summary of the discussions and requests that follow:

Brief comments on:
 - Does testing belong in this newsgroup?
 - Are programming style standards relevant?

Requests for information on experience with, opinions of, availability of, or solutions for:
 - Barton Miller's fuzz package
 - test data generators that reuse randomly generated data
 - ways to convert rand() into desired distributions and express the conversion
 - references to schemes that embed documentation and testing specs in source files
 - using write-once data storage systems for a version system
 - statistics on versioning system use
 - is it a platform, machine, configuration, or environment, and how does one name it?
 - strategies for coping with interface discrepancies

First, w.r.t. the discussions (D) and my comments (C):

D) Does testing belong in this newsgroup?
C) God I hope so ... if a software-engineering grope [sic] isn't concerned with testing, who is? Testing is an integral part of the software process, whatever model one uses.

D) Are programming style standards relevant?
C) Only if you can test them and/or they help in the testing. I welcome more discussion on this topic, particularly w.r.t. what belongs in a style guide and, most importantly, why.

Request: Does anybody have, or know how to get, a copy of fuzz and ptyjig (the testing tools described in ``An Empirical Study of Reliability of Operating System Utilities'', Barton P. Miller et al., Usenix Software Management Workshop, 1989)? Any experience with this package would be appreciated. Better yet would be word on how I could acquire a copy. (Bart - are you there?)

Request: I have a reverse grammar generator (tdg) that we use to generate test data.
However, it does not have a way of saving generated information for later incorporation. If anyone has experience with a random test-data generator that can save and reuse previously randomly generated data, I would appreciate discussion of such experience and perhaps some evaluation of how to control it. For example, I recently used tdg to generate random requests to a network-wide database server by generating shell commands that requested appends, replacements, and deletes. This worked a treat, but had the problem that the range of database keys had to be limited to ensure that the deletes would occasionally try to delete an existent key. Any ideas? Any P.D. tools? (tdg is available as part of the EUUG (whoops, I mean EurOpen) Fall 90 distribution.)

Request: Does anyone have a nice way of turning rand() results into various string-specified distributions for incorporation into a test data generator? I would be particularly interested in a mechanism to express a normal or Student's t distribution over a limited range ... nothing too sophisticated ... think of generating words or sentences of normal length to test an editor, or a duration in minutes for a phone call (Student's t). The distribution needs to be expressed as a simple string so that it could be given as an argument to the generator.

Request: I am writing a paper on the way we embed data-base entries into source code to contain the documentation, interface specifications, and regression-testing information for the related module. Any references to other such schemes (e.g., Mangle, Sob, and Wheeze) would be appreciated. If there is interest, I will post a brief overview of my work and our experience in applying it to our product, most of which is a large set (750 modules) of subroutines.

Request: I am currently doing a requirements analysis of a versioning system. One aspect of this research is the problems inherent in using a write-once storage system.
For obvious reasons, using the change-recording mechanisms used by either SCCS or RCS would not be satisfactory to anyone other than a WORM salesperson (they would consume disks at an unaffordable rate). So the system should probably use some sort of mechanism that does not rewrite existing information ... however, the system must retrieve any version of the source in equal time (like SCCS but unlike RCS). Any comments? Any interest?

Request: As part of the aforementioned research, I need statistics on the use of versioning systems for long-running systems. For example, I was recently informed that for one package running for some five years, the average number of versions per module was 80 and the average number of branches was 17 -- that's right, seventeen! Within our group, there have been 44,000 deltas made to some 6,000 modules over the last 18 months, although many of the modules have died or been renamed, and thousands of the deltas are minor cosmetics (e.g., changing the name of our company from Nixdorf to Siemens Nixdorf ...). My numbers are somewhat inflated, as initial file creation and file renaming each count for 2 deltas. I would really appreciate it if people would send me such raw statistics from their own experience.

Request: Our product is built from a single source system on nine platforms simultaneously and has been or needs to be ported to many more. There is a single parameterization file for each build tree, and one setting within this file - namely the name/type of the platform - is used to map to the capabilities or select appropriate facilities throughout the source. The first request is for a clarification of terminology: what noun does one use to refer to the class of names that state the machine type and the operating system type and its release, version, and/or flavour? Is this a ``platform'', a ``configuration'', a ``system'', an ``environment'', or a ``whats_it''? Is there any sort of standard for the assignment of a name within this class?
Is there a need for a universal registry of such names? My own scheme uses a simple concatenation of a CPU brand name, the vendor's specific OS version number and/or name, an underbar, and the closest flavour of a ``standard'' operating system (e.g., ``v?'', ``4.?bsd'', ``unix5.?''). The software tools that I use provide mechanisms to do shell-like pattern matches against this name to do selection or suppression. Conforming to a widely accepted naming standard is obviously desirable.

Request: Inherent in the facilities/capabilities configuration when porting to a large number of platforms is identifying the subtle differences that arise between different releases of the same base system. Fortunately, my experience is that the number of these subtle differences is usually small, or they are detectable by the compilers. Unfortunately, most suppliers are very poor at identifying and/or documenting the differences that are not benign or detectable (e.g., the differing semantics of fopen(file, "a+")). If readers have opinions on how these should be identified and resolved, I would be interested in hearing their views. What is required is a strategy that can be applied to deal with such discrepancies as they constantly arise; one wants a mechanism that can be applied quickly, while ensuring that any previous port is not going to be broken. For example, the ``#ifdef <header_file_manifest>'' tactic is popular, but is frequently inadequate and can get excessively cumbersome without an #elif construct, or in situations where there are a large number (i.e., >= 3) of variations. Furthermore, there are many examples of situations where it just does not work. For example, my D-Tree used to use:

    #include <envir/sys_stat.h>  /* map to appropriate stat.h */
    ...
    #ifdef S_IFLNK
        /* assume lstat(2), symlink() provided */
    #else
        /* assume lstat(2), symlink() NOT provided */
    #endif

This worked on some forty to fifty ports, until I encountered a system for which S_IFLNK was defined but lstat(2) was not provided.

Well, that's enough for now - apologies for the wide range of topics raised, but I chose to do one posting rather than eight. If you want to raise a discussion on any of the above issues, please do so. For those issues for which I receive data, I will endeavour to post summaries.
--
-----------------------------------
David Tilbrook
Siemens Nixdorf Information Systems Ltd.
rh@smds.UUCP (Richard Harter) (12/21/90)
In article <1990Dec15.071555.14971@nixtdc.uucp>, dt@nixtdc.uucp (David Tilbrook) writes:

> Request: I am currently doing a requirements analysis of a
> versioning system. One aspect of this research is the
> problems inherent in using a write-once storage system. For
> obvious reasons, using the change-recording mechanisms used
> by either SCCS or RCS would not be satisfactory to anyone
> other than a WORM salesperson (would consume disks at an
> unaffordable rate). So the system should probably use some
> sort of mechanism that does not rewrite existing information
> ... however, the system must retrieve any version of the
> source in equal time (like SCCS but unlike RCS). Any
> comments? Any interest?

Surprisingly enough, a sequential update system, which incorrect rumor credits with terrible performance, will do what you want quite nicely. The general scheme runs as follows. Store the original file with (implicitly) numbered records in order. Store the deltas as units. The only trick is that each new record gets the next insertion number as an ID. Inserts are stated as inserts after previous record n; deletes are stated as deletes of previous records using absolute record numbers. You can then rebuild any prior version using a pick-and-choose strategy.

At first sight it would seem that this would be expensive. However, _if you arrange the processing correctly_, each record is only accessed once -- the cost is equivalent to the cost of using the SCCS scheme. The reason that sequential update reconstruction has a bad reputation is that it is very expensive if you rebuild each intermediate version as a complete file -- the cost being that a record is moved many times, once for each intermediate version. However, the actual cost need only be proportional to the total number of records.
--
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh  Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.
This sentence short. This signature done.
rh@smds.UUCP (Richard Harter) (12/21/90)
In article <1990Dec15.071555.14971@nixtdc.uucp>, dt@nixtdc.uucp (David Tilbrook) writes:

> Request: As part of the aforementioned research, I need statistics
> on the use of versioning systems for long-running systems.
> For example, I was recently informed that for one package
> running for some five years, the average number of versions
> per module was 80 and the average number of branches was 17
> -- that's right, seventeen! Within our group, there have been
> 44,000 deltas made to some 6,000 modules over the last 18
> months, although many of the modules have died or been
> renamed, and thousands of the deltas are minor cosmetics
> (e.g., changing the name of our company from Nixdorf to
> Siemens Nixdorf ...). My numbers are somewhat inflated, as
> initial file creation and file renaming each count for 2 deltas.
> I would really appreciate it if people would send me such raw
> statistics from their own experience.

80 versions per module is bizarre. SMDS markets Aide-De-Camp, which is a configuration management/version control/etc. package. The product includes a report program which produces these kinds of statistics. From time to time some of our customers send us summary reports, so we have a fair picture of what kind of numbers to expect. Typical numbers for long-term projects (5+ years) are more like 4.5 deltas per file (not counting file creation or renaming). Your numbers are representative. Interestingly enough, the ratio is fairly stable over time. The reason seems to be that most files are stable after a few changes, with a minority of files being hot spots. For example (this is from an old report in our literature), out of 834 files we have:

    # deltas   # files
         0       295
         1        99
         2        51
         3        51
         4        48
         5        49
         7        46
         8        36
         9        23
        10        12

with a long tail. (Read the above as 295 files not changed, 99 changed once, 51 changed twice, etc.)
The reason for the inflated numbers in the project you referenced may be that they store deltas after every editing session automatically, or that they have some kind of automated tool that records changes every night. I have to admit that this explanation is inconsistent with the reported number of branches. I hate to say that they simply do awful software engineering on the basis of a couple of numbers, but that does seem like the most likely explanation.

The numbers from ADC-controlled projects might be a little bit lower than the industry norm (this is a guess). The reason is that ADC is usually used in the context of an established change control policy, which tends to lower the total number of deltas. Basically, what happens is that if you use change orders (work orders) with controlled sign-in (check-in), you get fewer incomplete (incorrect) deltas.
--
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh  Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb. This sentence short. This signature done.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/26/90)
On 15 Dec 90 07:15:55 GMT, dt@nixtdc.uucp (David Tilbrook) said:

dt> - using write-once data storage systems for a version system

The classic reference is (more or less): Reid, ``Naming and Synchronization in a Distributed Computer Network'', an MIT LCS TR. The MIT people have done a lot of further research into versioning systems for WORMs. Anything that is about ``SWALLOW'', a WORM versioning repository, is the right thing. Scan a list of MIT LCS TRs.
--
Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk