jxh@cup.portal.com (05/11/88)
Some time ago, there was a discussion among a few friends of mine about source-control programs. I thought it was time to revive this issue, and now that I have a news feed I thought that comp.software-eng was an appropriate place for it to rage. What follows is the entire text of the discussion as I received it over email. Beware of people's email addresses: several have changed since then, notably mine. Also, bear in mind that these were personal messages originally, and were not composed for net consumption, so first names are used freely. Here is a list of the people involved, and their real names and (I hope) current addresses:

    Jim Hickstein  (jxh)  jxh@cup.PORTAL.COM  [me]
    Jeff Woolsey   (jlw)  woolsey@nsc.NSC.COM
    Jeff Pomeroy   (jlp)  jlp@CRAY.COM
    Dan Germann    (deg)  deg@kksys.UUCP
    Mark Ransom    (msr)  msr@kksys.UUCP
    Alan Arndt     (aga)  (not on the net just now: send care of jxh)

I intended this discussion to revolve around issues of implementing our own such program. Now that I open this to the net, I would welcome thoughtful insights about features of various (perhaps extant) programs, but still with an eye toward implementing something. Please, no flames about somebody-or-other's lousy program that ate your file. We should work toward specifying a program that will solve problems for those involved.
============================================================================

From tsec!nsc!amdahl!meccts!kksys!deg Fri Sep 11 08:13:22 1987
Received: by nsc.NSC.COM; Thu Sep 10 21:44:44 1987
Received: by amdahl.UUCP (4.12/UTS580_/\o-/\) id AA07489; Thu, 10 Sep 87 16:36:09 PDT
Received: by meccts.MECC.MN.ORG (smail2.5) id AA16368; 10 Sep 87 04:57:37 CDT (Thu)
Received: by kksys.UUCP (smail2.3) id AA01692; 10 Sep 87 03:36:15 CDT (Thu)
To: deg, meccts!amdahl!nsc!tsec!nsc-nca!jxh, meccts!amdahl!nsc!woolsey, msr, umn-cs!cray!jlp
Subject: Welcome to library-people mailing list
Date: 10 Sep 87 03:36:15 CDT (Thu)
From: deg@kksys.UUCP
Message-Id: <8709100336.AA01692@kksys.UUCP>

A week or so ago, Jim and I were talking about Revise, and noted that it would be nice to have all our discussions down on paper. We also thought it would be nice if our discussions were not just between the two of us, given that there are several people who are interested in source library maintenance. Therefore, I have created this mailing list. Our intention is to discuss the functions and features desirable in tools which maintain source programs. Anyone care to kick things off?

-Dan

From jxh Fri Sep 11 18:31:15 1987
To: jxh, tsec!nsc!deg@kksys, tsec!nsc!jlp@cray, tsec!nsc!msr@kksys, tsec!nsc!woolsey
Subject: library-people kickoff

Well, I suppose I'll start, but this is probably not the first kickoff, since we're not synchronized. Let me first propose that Alan Arndt be added to this group: Al works for Ultron Labs in San Jose, and uses Polytron PVCS (Polytron Version Control System). He has quite a bit of experience with it. I do not propose that he be added to sell Polytron to us, but that his experience with its good *and* bad points may serve us well. He is not on the net as I speak, but I intend to help him rectify this situation soon. Tentatively, send to nsc-nca!aga.

For those of you who just tuned in, let me recap my discussions with Dan.
I have been sort of working on Revise, which is a descendant of UPDATE/MODIFY from the Cybers, in that it has some of the same basic assumptions, namely that there are Decks containing Lines (they got rid of the word Card at this point), and that lines may be Active or Inactive, and that they may be inserted, and deactivated, but not actually deleted, by a modset. A modset can also reactivate "deleted" lines.

It is different from UPDATE/MODIFY in that it is a standard Pascal program designed for portability, and it operates on simple Text files: the PL (program library) format is specified to be so general as not to cause problems on any known Pascal. Tradeoffs were made in favor of portability over efficiency of time and space. This is a very pure example of the tenet that the more adaptable a system is, the less well-adapted it is in each different circumstance. It is also different from UPDATE/MODIFY in that it operates on only one PL at a time. Dan and I identified this is a major departure from a quite useful feature, namely the *OPL directive, which indicates that several OPL files be considered as "OldLib" for one run.

There are those in Unix land who use a program called SCCS, and another called RCS (not caps). I am less familiar with these, but I think that PVCS is related: They operate by controlling the "checking out" and subsequent "checking in" of modules. One checks out a module, edits it, and checks it in, identifying a "version" as the label for this instance of that module. Basically, these programs run COMPARE over the current and new modules, and store the differences. Also, they tend to store the most "recent" "version" in a form most efficient to access: the stored changes are "inverted" so that the program applies a change to "go back" to a "previous" version. All this is in quotes because, as we know, program modifications are not necessarily monotonic with time.
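[Editorial aside: the Deck/Line model above — lines inserted or deactivated by a named modset, never physically deleted, so the modset can later be backed out — can be sketched roughly as follows. This is a toy illustration, not Revise's actual data format; every name in it is invented.]

```python
# Toy sketch of the UPDATE/MODIFY line model (invented names, not Revise's
# actual PL format). Lines are inserted or deactivated by a named modset,
# never physically deleted, so yanking the modset restores the prior state.

class Line:
    def __init__(self, text, origin):
        self.text = text          # the source text of this line
        self.origin = origin      # deck or modset that inserted it
        self.deleted_by = None    # modset that deactivated it, if any

    @property
    def active(self):
        return self.deleted_by is None

class Deck:
    def __init__(self, name, lines):
        self.name = name
        self.lines = [Line(t, name) for t in lines]

    def _active(self):
        return [l for l in self.lines if l.active]

    def text(self):
        # only active lines appear in the COMPILE-file view of the deck
        return [l.text for l in self._active()]

    def deactivate(self, index, modset):
        # "delete" the index-th active line on behalf of a modset
        self._active()[index].deleted_by = modset

    def insert_after(self, index, text, modset):
        # insert new text after the index-th active line
        target = self._active()[index]
        self.lines.insert(self.lines.index(target) + 1, Line(text, modset))

    def yank(self, modset):
        # back out a modset: drop its insertions, reactivate its deletions
        self.lines = [l for l in self.lines if l.origin != modset]
        for l in self.lines:
            if l.deleted_by == modset:
                l.deleted_by = None
```

Because inactive lines stay in the deck, a later modset can reference or reactivate a line that an earlier one "deleted" — which is what makes yanking and re-applying modsets in arbitrary order possible at all.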
An example is the distribution kit for Revise, which gives a base PL, and two Modsets "CYBER" and "VAX", one of which is applied to make Revise compile in each environment. Neither CYBER nor VAX is more "recent" than the other, although one could certainly say that they are "versions." I often use the word "flavors" to express this concept.

The differences between SCCS/RCS and UPDATE/MODIFY are radical. They are so different that I say that each represents a "paradigm" of source control generally. The UPDATE/MODIFY paradigm is the one with which I am most familiar, so I naturally want a program that embodies it to run in my preferred (enforced?) environment. Those "brought up" on RCS or SCCS cannot see any merit in the other paradigm, at least those to whom I have talked.

I invite you to make your own observations about these paradigms, and about specific programs. Certainly, there are features conceivable that exist in neither of these worlds: let us consider them, as well. Our goal should be to establish a new paradigm which embodies the best features of all others, and can be implemented well. We can then move to design questions about specific implementations. Fire away.

Jim Hickstein  ...!nsc!tsec!nsc-nca!jxh
VSAT Systems, Inc.  San Jose, CA

From tsec!nsc!nsc.NSC.COM!woolsey Tue Sep 15 20:23:20 1987
Received: by nsc.NSC.COM; Tue Sep 15 10:32:16 1987
Received: by pubs.nsc.com; Tue Sep 15 10:30:31 1987
Date: Tue, 15 Sep 87 10:30:31 PDT
From: Jeff Woolsey <woolsey@nsc.NSC.COM>
Message-Id: <8709151730.AA13546@pubs.nsc.com>
To: deg@kksys.UUCP, jlp@cray.COM, msr@kksys.UUCP, nsc!tsec!nsc-nca!jxh, tsec!nsc-nca!aga
Subject: library-people first down

>[Al] uses Polytron PVCS (Polytron Version Control System).

Department of Redundancy Department

I looked into a similar package while at TGC; I think Grady [Davis, now @ VSI] got it for us to evaluate. I can't remember the name of the package. That's some indication of how impressed I was. Perhaps Jim can remember its name.
It was of the general paradigm rampant among small machines for such packages. (For these purposes a small machine is anything of VAX power or smaller, or anything running Unix (sorry, Jeff).)

>it has some of the same basic assumptions, namely that
>there are Decks containing Lines (they got rid of the word Card at this point),

but not the word "Deck"?

>it is a standard Pascal program designed for portability, and it operates on
>simple Text files: the PL (program library) format is specified to be so
>general as not to cause problems on any known Pascal. Tradeoffs were made in
>favor of portability over efficiency of time and space.

Weren't there some cases in the code for Revise where an order of magnitude improvement could be had at no cost in portability? Didn't you find some of those, Jim?

>It is also different from UPDATE/MODIFY in that it operates on only
>one PL at a time. Dan and I identified this is [sic]
>a major departure from a quite useful feature, namely the *OPL directive,
>which indicates that several OPL files be considered as "OldLib" for one
>run.

Strike one. It could be difficult to restore this feature efficiently, judging from present Revise performance with one PL on machines we can afford to purchase.

>There are those in Unix land who use a program called SCCS, and another called
>RCS (not caps).

Huh? RCS is still the name of RCS, even though RCS might be invoked as rcs.

>I am less familiar with these, but I think that PVCS is
>related: They operate by controlling the "checking out" and subsequent
>"checking in" of modules. One checks out a module, edits it, and checks it
>in, identifying a "version" as the label for this instance of that module.

Can any of these programs operate correctly if the directory and files where the "library" are are read-only? The check-out process usually wants to note somewhere that someone is doing something which could cause inconsistencies in someone's view of the state of the "library".
M/U users used a property of the NOS file system, namely that a D/A file was BUSY (write-locked by someone else) or unwritable (someone has it in READ without ALLOW-MODIFY or ALLOW-APPEND).

>Basically, these programs run COMPARE over the current and new modules, and
>store the differences. Also, they tend to store the most "recent" "version"
>in a form most efficient to access: the stored changes are "inverted" so that
>the program applies a change to "go back" to a "previous" version. All
>this is in quotes because, as we know, program modifications are not
>necessarily monotonic with time. An example is the distribution kit for
>Revise, which gives a base PL, and two Modsets "CYBER" and "VAX", one of
>which is applied to make Revise compile in each environment.

I think the importance of this concept cannot be overstated, but has been overlooked by all of the "small system" [as above] library tools. This is feature code, a sort of conditional compilation/assembly pulled back one level of processing.

>The differences between SCCS/RCS and UPDATE/MODIFY are radical.

That's putting it mildly. The only thing in common is that they are attempts to solve the same problem. Well, almost. I think MODIFY/UPDATE recognized additional sub-problems and tried to solve those, too.

>They are
>so different, that I say that each represents a "paradigm" of source

No need for quotes here...

>control generally. The UPDATE/MODIFY paradigm is the one with which I am
>most familiar, so I naturally want a program that embodies it to run in my
>preferred (enforced?) environment. Those "brought up" on RCS or SCCS cannot
>see any merit in the other paradigm, at least those to whom I have talked.

Indeed. Perhaps we can attack their character by saying that they never saw the need because they never worked on a VERY large product (such as NOS) requiring coordinated effort by sizable teams.
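[Editorial aside: the "inverted changes" storage quoted above — the newest revision kept whole, each older revision reachable by applying a reverse delta — can be sketched as below. This is an illustrative reconstruction using Python's difflib, not the actual SCCS or RCS storage format; all names are invented.]

```python
# Illustrative sketch of RCS-style reverse-delta storage (invented names,
# not the real RCS file format). The tip revision is stored whole; each
# check-in stores only the operations that turn the new tip back into
# the previous revision.
import difflib

def reverse_delta(new, old):
    """Opcodes that rebuild `old` (a list of lines) from `new`."""
    ops = difflib.SequenceMatcher(a=new, b=old).get_opcodes()
    # For 'equal' spans we can copy from `new`; otherwise the delta
    # must carry the old lines along with it.
    return [(tag, i1, i2, old[j1:j2]) for tag, i1, i2, j1, j2 in ops]

def apply_delta(new, delta):
    out = []
    for tag, i1, i2, old_lines in delta:
        out.extend(new[i1:i2] if tag == "equal" else old_lines)
    return out

class ReverseDeltaStore:
    def __init__(self, text):
        self.tip = list(text)   # revision 0, stored in full
        self.deltas = []        # deltas[k] turns revision k+1 back into revision k

    def checkin(self, new_text):
        new_text = list(new_text)
        self.deltas.append(reverse_delta(new_text, self.tip))
        self.tip = new_text     # the tip stays whole, so it is cheap to fetch

    def revision(self, n):
        # walk backward from the tip, applying inverted changes
        text = self.tip
        for delta in reversed(self.deltas[n:]):
            text = apply_delta(text, delta)
        return text
```

The design choice is visible here: fetching the tip costs nothing, while each step backward costs one delta application — roughly the opposite of SCCS's interleaved-delta scheme, where extracting any revision costs about the same.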
I see here also another major difference between M/U and *CS: the former is monolithic, while the latter is incremental. Let me explain.

M/U run on machines where (at least theoretically) there is enough power available that it is no great drain on resources to keep editing a modset and creating a COMPILE file every time you want to assemble/compile something. You do not notice how much (possibly redundant) work you are asking the machine to perform, because small increments of work are not noticeable. This is true only up to a point, as I would sometimes go the *CS route: pull out a source file, edit it for two days, THEN use compare to make a modset.

*CS run on machines without sufficient power to hide these small increments of work. Thus the paradigm changed to permit the elimination of most of the MODSET -> COMPILE file operations in an edit cycle. The time required for the edit cycle remains on the not-enough-time-to-get-coffee side of the line, whereas with M/U (and Revise, as we have seen) it retreats past coffee and on into time-enough-to-read-War-and-Peace territory. Nothing like breaking a train of thought to introduce errors and reduce engineer effectiveness. So small increments of change are evaluated. Other examples of this technique are incremental compilers, and Chess 0.5 (as featured in BYTE some time ago). There was even some talk at UCC about a text editor that would spit out a modset when you were done.

>I invite you to make your own observations about these paradigms, and about
>specific programs. Certainly, there are features conceivable that exist in
>neither of these worlds: let us consider them, as well. Our goal should be
>to establish a new paradigm which embodies the best features of all others.

Dream on, then? OK. I like named modsets. I like independent modsets. I like being protected from disaster (significant effort required to make changes stick). I like to remove modsets.
I like to make virtual OPLs (*OPLFILE). (Often I'd use four such PLs in building the Cray Station.) I would like to minimize work and maximize speed using incremental techniques. I'm not completely comfortable with the smallest unit of change being a line, as the significance of lines diminishes in modern languages. For that matter, we aren't always maintaining programs. Soon library-people shall enter the world of the DBMS.

The biggest difference between MODIFY and REVISE/UPDATE is random access, and everything you can do with it.

Your turn.

From tsec!nsc!amdahl!meccts!kksys!deg Sat Oct 3 18:19:31 1987
Received: by nsc.NSC.COM; Tue Oct 6 00:28:21 1987
Received: by amdahl.UUCP (4.12/UTS580_/\o-/\) id AA25752; Mon, 5 Oct 87 23:39:04 PDT
Received: by meccts.MECC.MN.ORG (smail2.5) id AA06036; 6 Oct 87 00:04:28 CDT (Tue)
Received: by kksys.UUCP (smail2.3) id AA10528; 5 Oct 87 22:30:41 CDT (Mon)
To: deg, meccts!amdahl!nsc!tsec!nsc-nca!aga, meccts!amdahl!nsc!tsec!nsc-nca!jxh, meccts!amdahl!nsc!woolsey, msr, umn-cs!cray!jlp
Subject: Ramblings from one of the guys "in the back"
Date: 5 Oct 87 22:30:41 CDT (Mon)
From: deg@kksys.UUCP
Message-Id: <8710052230.AA10528@kksys.UUCP>

>SCCS, RCS, etc...

SCCS attempted to enforce module integrity by allowing only one user to gain access to the "source" of the module in "edit" mode at one time. There were several commands in the SCCS package: "admin" to administrate the SCCS files, "delta" to apply a change to a SCCS file, "get" to obtain the source to a SCCS file, etc. I believe that "get -e" (or something) was used to declare that you intended to make changes to the file, rather than just look at or compile it. You were not allowed to do a "get -e" on a file that someone else had interlocked by their "get -e". The interlock was cleared when the "delta" to the SCCS file was posted. The "teeth" in SCCS were due to the files being owned by a project leader, who created and modified the SCCS files.
He (actually, SCCS probably did this by default) would set the file permissions so that only he would be able to write the SCCS versions of the files. The project team members would be in a unix group that had permission to read the SCCS files. I believe that RCS works similarly, except that the commands are "ci" (check in) [don't mistype your editor name or the file disappears!] and "co" (check out). I have no idea how they work.

In either case, if the SCCS/RCS files were not protected BY THE OPERATING SYSTEM, any person with write permission to the files could apply any change at any time. In fact, they could trash the files completely. What an interesting implementation of "source code control". Actually, this is no different from Modify/Update/Revise/yournamehere; perhaps we will have to rely on the file protection facilities available in the host operating systems to ensure PL integrity.

>...non-monotonic program modifications...

Another advantage to the M/U paradigm is the ability to have a "debug" or "test" modset that is not kept in the PL. The Pascal Group at UCC had a modset that introduced all sorts of good writelns into the compiler source. We used this whenever there was a bizarre code generation error. All we had to do was apply the modset and recompile the compiler. Unfortunately, when we made MAJOR changes to the compiler, we had to resequence the modset. Fortunately, this was an infrequent occurrence. We could have made the modset a permanent part of the PL, but it would have blurred the otherwise clear boundary between the compiler and the debug code. As part of the PL, the modset would have required updating whenever a compiler change affected it. If we put the debug modset corrections in the compiler change, we could no longer simply "yank" the debug modset to remove the debug code. If we made the corrections in a second compiler modset, we would have been unable to "yank" the compiler changes without also yanking the debug code modset.
This sounds like a good issue to address; perhaps what we need is an ability to group modsets: when mod1 is yanked, also yank mod2, and when mod3 is yanked, also yank mod2. Then again, maybe this is a load of rubbish.

I recently found myself longing for this capability here at CFA. We have two versions of a printer driver: production and test [I know, I know... it sounds like an IBM shop. sorry.]. In the beginning of this year, we drastically changed the production version. As we all know, test versions of software hang around forever. This printer driver is no exception. If I had a modset that could flip between the two versions, I'd be ecstatic. However, this is CFA... stone knives and bearskins, remember? At least we take offsite dumps every now and then... but source control? Forget it... we come as close to good source control as Bork comes to having a real beard.

>what else would you call the other Revise feature-vestige but "Common
>Deck?"

Associate(d) Deck?

>Observe, however, that if the logic of Revise changes too drastically,
>the source will not be recoverable. (It also helps to take executable
>versions of Revise on field trips.) This is the bootstrapping problem.

Gee, that's a good point.

>Virtual OPLs (*OPLFILE)...

Maybe we should reassess the need for this feature. Why was it there (in Modify) in the first place? As near as I can tell, the only reason for *OPLFILE was for an ABS assembly, where you had to *CALL all the subroutines you used into the source fed to COMPASS so there would be no external references. Why did we have ABS assemblies? Another good question. Answers include being able to create multiple entry point programs, having the "good" loader tables (needed by the Cyber loader if you ran under RFL,0), and being able to fix inter-program communication areas at specified addresses (did we REALLY do that?). I'm not sure if any of these reasons applies to what we're doing today.
We all use language processors that generate relocatable object files, and don't complain too much about having to link everything. OK, so we use Turbo Pascal, too. But we've already complained about that. Maybe they will fix these annoyances in 518 ... er ... version 4.0. At any rate, do we really need to include subroutines with *CALL if we have pre-compiled versions of them in a library somewhere? Just think, if we don't have to compile them every time, we can cut down the compile-test-edit cycle time.

>Revise, as Dan convinced me with great eloquence and amplitude, is in the
>past. It is rather too fixed to bother spending much time reworking it...

>>Strange though this sounds, I think I now understand Revise
>>well enough to abandon it. I embarked on my translation project
>>because I believed that Revise contained the Final Wisdom about
>>library management (well, sort of) if I could only get the oracle
>>to speak. Having dug deep enough to understand Revise quite
>>completely, I can now see its shortcomings clearly.
>
>Abandoning it is perhaps a bit drastic. Revise does embody some useful
>concepts. It's just that they are in a form that is not abstract
>enough for our discussions.

There are many good things present in Revise; it's just that, along with all the useful features and familiarity (with the M/U-like interface), there's a lot of deadwood. Jeff P. keeps telling me about the source maintenance utilities they use at Cray. I believe he said it was a collection of three or four C-language programs, one to three pages each (how long is Revise?). He also said that most of the good features in Modify were implemented. Jeff, I'd sure like to hear more about those programs.

Other reasons for starting from scratch include having the freedom to come up with a package (which, from the sounds of things, is heading toward a DBMS... there, now we've all said it) capable of performing the functions we deem necessary, without having to work around an existing inadequate framework, and being able to do with the resultant package as we please.

>The difference here is that we want transactions that can be backed out,
>and we want transactions that are independent of one another.

Well said, Jeff. However, does this necessarily imply that a change to the database, consisting of multiple interrelated transactions, will result in a consistent update? What I'm talking about here is the "classic" OPL and JPL system used at UCC.

>>I think the edit-debug cycle followed by the
>>process of making a modset to describe what you've just been
>>through is valid.
>
>What that does is solidify (freeze) a set of changes. By making them
>more permanent you're supposed to think about them more. But the existence
>of COMPARE along with text editors subverts that intent.

I don't agree that Compare is responsible for sloppy work. It does make modset creation easier, but aren't we all looking for ways to make source maintenance easier? It is up to us to continue to be responsible and professional about our changes; being able to easily generate a modset from an edited source file is no excuse for failure to do a thorough job.

>The more I think about this, the less I know.

One thing is for certain: by the time we have a product, we'll all know what the pitfalls of source maintenance are. That, itself, may be the most valuable thing each of us will carry away from this project.

>P.S. How come I haven't heard anything from anyone but JLW? Am
>I not getting through to faraway parts? Can you HEAR ME IN THE
>BACK, THERE? (This is the second note I have sent to
>library-people).

I don't think Mark has logged in yet, and I'm terminally disorganized. Steve Oyanagi was mentioning a new Unicos release due out soon, so JLP is probably up to his ears in testing.
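[Editorial aside: generating a modset from an edited copy, COMPARE-style, is mechanical enough to sketch. The directive syntax below (*D and *I with deck.line addresses) is only loosely modeled on MODIFY, and every name here is invented for illustration.]

```python
# COMPARE-style modset generation (invented directive syntax, loosely
# modeled on MODIFY's *D deck.m,deck.n and *I deck.n forms).
import difflib

def make_modset(deck, old, new):
    """Emit directives that transform `old` (a list of lines) into `new`."""
    directives = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag in ("delete", "replace"):
            # deactivate old lines i1+1 .. i2 (1-based deck.line addresses)
            directives.append(f"*D {deck}.{i1 + 1},{deck}.{i2}")
        if tag in ("insert", "replace"):
            # insert the new text after the last old line touched
            directives.append(f"*I {deck}.{i2}")
            directives.extend(new[j1:j2])
    return directives
```

For example, editing the second line of a three-line deck and appending a fourth line yields one *D directive for the changed line plus two *I groups carrying the new text — exactly the sort of output one would then clean up and name before submitting.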
From tsec!nsc!nsc.NSC.COM!woolsey Mon Sep 28 22:44:45 1987
Received: by nsc.NSC.COM; Thu Oct 1 14:39:28 1987
Received: by pubs.nsc.com; Thu Oct 1 14:39:21 1987
Date: Thu, 1 Oct 87 14:39:21 PDT
From: Jeff Woolsey <woolsey@nsc.NSC.COM>
Message-Id: <8710012139.AA04751@pubs.nsc.com>
To: deg@kksys.UUCP, msr@kksys.UUCP, nsc!tsec!nsc-nca!jxh, pyramid!crayamid!cray!jlp, tsec!nsc-nca!aga
Subject: more ramblings

>To: jxh, tsec!nsc!amdahl!meccts!kksys!deg, tsec!nsc!amdahl!meccts!kksys!msr,
>    tsec!nsc!amdahl!meccts!umn-cs!cray!jlp, tsec!nsc!woolsey

Well, it ought to get to those other sites. I just hope that tsec is smart enough to realize that all the recipients for the copies it got are going the same direction.

>Subject: Response to JLW's First Down

I see you learned how to do article quoting. Usually that is the majority of any article over 50 lines. Not in this case. How unusual.

>what else would you call the other Revise feature-vestige but "Common
>Deck?"

Other extant names are "include file" and "header file", neither of which I like very much. Somewhere out there, there is a gem of a term for describing this concept ("subroutine"? "macro"?) but it hasn't presented itself yet.

>Not at *no* cost. I adapted Revise (not to say hosed over) to
>take better advantage of my environment. Your statement does
>not imply respect for the authors. I must defend them (having
>seen their code most closely): they implemented tradeoffs in
>favor of portability, but they implemented them well. Revise is
>a beautiful Pascal work-of-art. Alas, art is seldom utilitarian.
>Among other things, Pascal's character I/O was leaned on heavily;
>it is not implemented well in the Pascal compilers available to me.
>Borland's "product" (for all its good traits, it is *NOT* a Pascal compiler
>if it can't compile Revise) fails to implement character I/O *at all*,

Your statement does not imply respect for the authors.
I must defend them (having used their code most closely): I contend that they DID implement character I/O. I base my statement on the presence of big, ugly kludges like BLOCKREAD and the untyped FILE to provide block I/O. Granted, the character I/O of which you speak is characterized [sorry] by file POINTERS, but that wasn't blindingly obvious in your tirade. As for the original authors of Revise, I intended not to impugn their abilities as Pascal programmers--indeed, some, if not all, of them had their fingers in the P-6000 pie. I'd be rather surprised [why does that word have two r's in it? -- never mind] if Revise was not already fairly optimal from P-6000's point of view. Rather, I meant to have my recollection of Jim's efforts clarified, as was obviously needed.

Incidentally, I sure hope that our [NSC's] forthcoming Pascal compiler has file pointers. I think it does, owing to reports of efforts to pass the Tasmanian test suite.

>>>...OPLFILE...
>
>>It could be difficult to restore this feature efficiently...
>
>It is not my intention to *restore* features of MODIFY to Revise.

It would be easier to add it to something else.

>Rather we should concentrate on specifying a new program which
>has these desirable features. Revise, as Dan convinced me with
>great eloquence and amplitude, is the past. It is rather too
>fixed to bother spending much time reworking it until it no
>longer resembles its former self. That was just my approach
>during the early stages of my work in this realm precisely
>because I could not afford to change the logic of Revise for fear
>of destroying the only *documentation* of its behavior on other
>systems: the code itself.

There's a paradox here. Fear of destroying this "documentation" can be alleviated by making a copy. Or by applying your changes as modsets. Observe, however, that if the logic of Revise changes too drastically, the source will not be recoverable. (It also helps to take executable versions of Revise on field trips.)
This is the bootstrapping problem.

>Strange though this sounds, I think I now understand Revise
>well enough to abandon it. I embarked on my translation project
>because I believed that Revise contained the Final Wisdom about
>library management (well, sort of) if I could only get the oracle
>to speak. Having dug deep enough to understand Revise quite
>completely, I can now see its shortcomings clearly.

Abandoning it is perhaps a bit drastic. Revise does embody some useful concepts. It's just that they are in a form that is not abstract enough for our discussions.

>>>... They operate by controlling the "checking out" and
>>>subsequent "checking in" of modules.
>
>>Can any of these programs operate correctly if the directory and
>>files where the "library" are are read-only?
>
>Perhaps you phrase your question too narrowly. How about: Can
>these programs utilize existing file protection mechanisms to
>protect the library from accidental or unauthorized modification?
>I believe that PVCS has a network *version* ( <-irony ) which
>can work with, e.g. PC-Network-compatible thingies.

I have a great deal of difficulty believing that the mere presence of a network version of something like PVCS prevents me from using a non-network version to modify the library, or from going at the library with ordinary DOS commands. Protection of the library is important, and UPDATE was able to take advantage of the simple scheme available with 1/2" tape: write-rings. (UPDATE (and I guess Revise, too) could process a library sequentially.) The trouble is that existing file protection mechanisms vary widely, and may not even exist in some places.

>But even if some available program does this, it probably won't do it to our
>liking. PVCS is awfully tightly coupled to the architecture of
>the "network", in that the application program interface for
>file- and record-locking and -sharing is quite specific to PC
>DOS.
>I would hesitate to try to write a "portable" program which
>assumed that this interface existed everywhere.

Oh, you could go ahead and write such a program. It would only be portable to systems that supported that interface. All you have to do, then, is select an interface which is available everywhere that you would want to take your program. This is what standards committees are for, and what they occasionally accomplish. (e.g. POSIX. (P = Portable))

>I think we should simply say that our ideal program must have
>this capability.
      ^^^^^^^^^^
Property, please. This should not be optional.

>Thinking further about this: MODIFY could use
>read-only PLs without telling anyone about it (i.e. writing in
>the PL some indication that I am making a modset); that
>information really just prevents two people making conflicting
>modsets;

Actually, it just keeps files consistent. As applied to MODIFY, this provides the prevention mentioned.

>in MODIFY, that conflict was resolved later by the
>authority responsible for actually applying changes to the
>system-wide PL. I think it would be neat if we could somehow
>automatically avoid such conflicts or at least automatically
>resolve them.

Avoiding them is easier than resolving them.

>Perhaps modsets being applied at "the same time"
>can be sorted out by a program and adjusted. One such conflict
>that seems a good candidate is that of one modset naming a line
>just deleted by another; a program that is "co-applying" these
>modsets would "know" about this line until all modsets were
>applied. There. I just opened THE big can of worms. Come and
>get it!

What an appealing metaphor. Yecch.

In the MODIFY universe, there are names for each type of problematic combination of modsets. I'll describe them here and let you match them to your ideas above (and below, it turns out). Dependent modsets relate such that one modset refers to a line that the other inserted or deleted.
Modsets can refer to deleted lines; actually, they are inactive rather than deleted. This kind of reference generated warnings from UPDATE or some such program. The trivial case of conflicting modsets would be two identical modsets with different names. Applying the second one should have no effect other than doubling the inserted groups. These modsets may be yanked or deleted independently of one another. The generalized case is two modsets that do the same thing to a line in a third modset or the original text. Another kind of dependency [is that a word?] is two modsets that insert after the same card (line). Here the dependence is upon what order the modsets are applied, with the newest modset's text appearing first. Some or all of these conditions could be detected automatically, although MODIFY never bothered. UCC had a program called DEPEND to address some of these issues. Obviously, conflicting modsets are the product of disorganization among programmers (or a deranged mind if the same programmer produced both conflicting modsets). Dependent modsets were solved by producing a GEN modset whose sole purpose in life was to be depended upon.

>Perhaps we are asking too much in an age when many language
>processors have, indeed, conditional preprocessors. But using
>them in lieu of the feature at hand doesn't solve any problems of
>source control: it merely takes them out of the hands of the
>library management program and puts them squarely back onto the
>broad shoulders of the programmer, which is precisely where we
>are trying to AVOID putting such things. If a library management
>program did not introduce horrendous delays in the
>edit-compile-debug cycle, I think it could be relied upon to take
>over this task from the language processor.
>There are two sides
>to this coin, though: 1) Although it makes such a feature uniform
>across all language processors in use (notably those which have
>no such facility already), 2) it probably will not have exactly
>those features of the preprocessor in use that the programmer
>likes and uses. Discussion is indicated.

The conditional "pre"processing available with various languages ranges from none, to simple (include files, *CALL) but unconditional, to conditional (IF ELSE ENDIF), to include macro processing, and to include kitchen-sink processing. Typically your overgrown assembler is of this last variety (remember COMPASS's DUP, IRP, and other pseudo-ops?). Another avenue for complicated preprocessing is to divorce such activity from any language definition. This yields things like m4, cpp (used by languages other than C, and things known at cpp-time are unknown at compile-time), and things like sed, awk, & lex. I may be wandering a bit, as preprocessors should be transparent to anything that isn't directives. I.e., I'm not sure such general preprocessing belongs in a library tool, except as warranted for include files that are in the library.

Another point worth considering is that selecting between using a preprocessor and a source-control tool to provide feature code depends upon which is more widely available among the recipients. Back in the CDC days, everyone who received NOS had MODIFY, thus NOS was distributed in source-control form.

>Perhaps MODIFY/UPDATE caused more problems, that then needed
>solving, but I think not. M/U really did attempt to solve
>problems, but not in the normal fashion: that of so automating a
>task that the programmer could not hurt himself. It would be
>nice if automation of this sort of thing took the form: the easy
>way is also the good way. So what, then, is the easy way?
>What, then, is that problem that needs solving?
>Seems to me that
>*the problem* can be simply stated as: multiple, simultaneous,
>dependent and independent modifications of a database, with the
                                              ^^^^^^^^
Yes, I think we are heading into that realm. The more generalized we want things, the more general must the tools be. For example, the notion of OPLFILE, which is really a virtual OPL, abstracts into creating a virtual database by logically combining two or more smaller (an uncommon word in the database world) ones.

>special case that the database is a set of source files subject
>to an iterative edit-compile-debug cycle.

Peanuts compared to transaction processing on a large database.

>"Dependent and independent" means that two kinds of conflicts can
>exist among any given set of modifications. Independent mods
>(to the same file) are those which do not overlap; that is, they
>do not modify the same part of a file. They are tantamount to
>modsets to separate files, with the line between files being
>rather arbitrary.

The difference here is that we want transactions that can be backed out, and we want transactions that are independent of one another. I'm not yet sure whether the database model of program library maintenance needs any extensions. Every database that I can think of is a model of some physical phenomenon, such as the existence of people or instruments. People are unique (though their names might not be; I don't think we are that interested in this problem) and financial instruments must be unique or we quickly find ourselves in financial ruin. We can also think of these instruments as transactions, and one would not want (in the interest of correctness) the same transaction applied multiple (or zero) times!

>>There was even some talk at UCC about a text editor that would
>>spit out a modset when you were done.
>
>Well, let's not get into the holy war over "your favorite text
>editor" here. That's not the point.
Any editor could be encased in a procedure file or shell script which would run COMPARE or diff between the before and after files.

>I think the edit-debug cycle followed by the
>process of making a modset to describe what you've just been
>through is valid.

What that does is solidify (freeze) a set of changes. By making them more permanent you're supposed to think about them more. But the existence of COMPARE along with text editors subverts that intent.

>>I like to make virtual OPLs (*OPLFILE)...
>
>Let's generalize that: This is really an artifact of the line
>between one "file" and the next. It may be that some environment
>blurs this line (PLATO comes to mind, where one "lesson" had many
>different types of "blocks" that were related by being part of
>the same lesson. Sort of a subdirectory.) An implementation
>must not allow arbitrary lines between "files" to define the
>boundary of the "library."

Paradoxically, on Cybers you can concatenate files full of REL records and get one file of REL records (and the loader will like it), while on UNIX you can concatenate two tar files and get one file that tar won't like.

>>The biggest difference between MODIFY and REVISE/UPDATE is
>>random access, and everything you can do with it.

Even in an environment without random access, the library maintenance tool can still maintain an index, for use if the library is ever taken to an environment that has random access, or if such an environment should arrive where the library is.

This is far too long. I think a summary is in order, next time. The more I think about this, the less I know.
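The editor-plus-COMPARE workflow described above is easy to sketch with modern tools. Here is a toy diff-to-modset generator, assuming the deckname.sequence line numbering the thread describes for MODIFY/UPDATE; the convention that text following *DELETE acts as replacement text is my assumption about directive syntax, and none of this is any real COMPARE implementation:

```python
# Toy sketch: turn a before/after pair of source listings into
# MODIFY/UPDATE-style directives. Line N of the old deck is assumed
# to carry sequence ID DECK.N (1-based), per the thread's description.

import difflib

def make_modset(deck, old, new):
    """old/new: lists of source lines. Returns directive + text lines."""
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if op == "equal":
            continue
        if op in ("delete", "replace"):
            # Deactivate old lines i1+1 .. i2 (1-based, inclusive).
            out.append(f"*DELETE {deck}.{i1 + 1},{deck}.{i2}")
        elif op == "insert":
            # Insert after the preceding surviving line. (Insertion
            # before line 1 would need the deck-header form; not handled.)
            out.append(f"*INSERT {deck}.{i1}")
        if op in ("insert", "replace"):
            out.extend(new[j1:j2])
    return out
```

A one-line change then comes out as a one-directive modset, which is the property that made COMPARE-generated modsets so convenient (and, as Jeff notes, so subversive of careful review).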
From tsec!nsc!nsc.NSC.COM!woolsey Sun Oct 4 18:17:12 1987 Received: by nsc.NSC.COM; Tue Oct 6 16:28:22 1987 Received: by pubs.nsc.com; Tue Oct 6 16:27:57 1987 Date: Tue, 6 Oct 87 16:27:57 PDT From: Jeff Woolsey <woolsey@nsc.NSC.COM> Message-Id: <8710062327.AA20770@pubs.nsc.com> To: deg@kksys.UUCP, msr@kksys.UUCP, nsc!tsec!nsc-nca!jxh, pyramid!crayamid!cray!jlp, tsec!nsc-nca!aga Subject: good seats (I say, these subject lines are getting pretty stupid. We all know what we're talking about: general ramblings on the subject of source control. When (if?) we start getting more specific, they'll make more sense.) >He (actually, SCCS probably did this by >default) would set the file permissions so that only he would be able >to write the SCCS versions of the files. The project team members would >be in a unix group that had permission to read the SCCS files. Boy, were we spoiled with the PERMIT command. There was a discussion raging in netnews many moons ago about grafting Access Control Lists onto Unix. Not much became of it, except lament for the systems that people had left that had had them. I mean, the method above sounds like a kludge, and I still don't know whether the SCCS tools need to be setgid to the project group, or setuid to the project leader, or what. And if so, how does one use it on multiple projects? >I beileve [sic] that RCS works similarly, except that the commands are >"ci" (check in) [don't mistype your editor name or the file disappears!] In my case, ci is an alias for vi. So there. cu used to conflict, too, but tip, and later telnet, came along. >In either case, if >the SCCS/RCS files were not protected BY THE OPERATING SYSTEM, any person >with write permission to the files could apply any change at any time. CDC manuals suggested that you access the system OPL with COMMON(OPL). The system allowed certain people to create or access these files; a deadstart (or judicious CM editing from the console followed by a DTKM) was required to delete them. 
(Incidentally, how many access word bits were required to make an ECS file COMMON? ECS files were an ancestor of RAM-disk, though we did not know it back then.)

>perhaps we will have to rely on
>the file protection facilities available in the host operating systems
>to ensure PL integrity.

I think that that is a tautology.

>>...non-monotonic program modifications...
>
>Another advantage to the M/U pardigm [sic] is the ability to have a "debug"
>or "test" modset that is not kept in the PL.

I had such a modset for USERS/DSDSIM. It used 1DS and QFM to do dangerous things during system time.

>The Pascal Group at UCC
>had a modset that introduced all sorts of good writelns into the compiler
>source. We used this whenever there was a bizarre code generation error.
>All we had to do was apply the modset and recompile the compiler.

And I bet that the reason it was done that way instead of

    CONST DEBUG = TRUE;

    IF DEBUG THEN
      WRITELN('DAG NODE = ', P^.LEFT^.DAG[PI].STYPE^.TOKEN: 6 OCT);

was code size in the production compiler. I don't generally count on HLL compilers to recognize this as a form of conditional compilation, though an increasing number do. Besides, this way you still have to recompile. Using DEBUG as a variable that is turned on by a switch is probably a good idea in early stages of compiler building, but as the bug density diminishes, this gets to be a drag.

>This sounds like
>a good issue to address; perhaps what we need is an ability to group
>modsets: when mod1 is yanked, also yank mod2, and when mod3 is yanked,
>also yank mod2. Then again, maybe this is a load of rubbish.

So you want to teach the source control system about change dependency. Sounds like a directed graph to me. What if you yank mod3, but not mod1? What happens to mod2?

>However, this is CFA... stone knives and bearskins, remember? At least we
>take offside dumps every now and then... but source control? Forget it...
     ^^^^^^^
A five-yard penalty. Hey, this football metaphor sure is pervasive!
>we come as close to good source control as Bork comes to having a real beard. Have you been reading SU-BBOARD??? Nothing like turning a technical discussion into a political one. It's happened to SDI, and disarmament, and 55 mph, and .... Does the fact that half this list's members are bearded have anything to do with that remark? >>what else would you call the other Revise feature-vestige but "Common >>Deck?" > >Associate(d) Deck? But it's still a DECK! Next thing you know, we'll have portholes, gangways, bulkheads... >>Observe, however, that if the logic of Revise changes too drastically, >>the source will not be recoverable. (It also helps to take executable >>versions of Revise on field trips.) This is the bootstrapping problem. > >Gee, that's a good point. I thought so. Has this been rubbed far enough in yet? >>Virtual OPLs (*OPLFILE)... > >Maybe we should reassess the need for this feature. Why was it there >(in Modify) in the first place? As near as I can tell, the only reason >for *OPLFILE was for an ABS assembly, where you had to *CALL all the >subroutines you used into the source fed to COMPASS so there would be >no external references. You're forgetting your history. The thing that was *CALLed is called a COMMON deck. It was intended for use in FORTRAN programs, to contain the COMMON declarations for the program. >Why did we have ABS assemblies? Another good >question. Answers include being able to create multiple entry point >programs, having the "good" loader tables (needed by Cyber loader if >you ran under RFL,0), ... to make sure that no unneeded trash (like CMM) got hauled in at link-time ... >and being able to fix inter-program communication >areas at specified addresses (did we REALLY do that?). Sure. ARGR, CDDR, FWPR, uh... Gee, this stuff evaporates quickly without an Instant. Actually, those didn't need ABS; the loader just started the text high enough that these constants could be used for IPC. 
>I'm not sure if
>any of these reasons applies to what we're doing today. We all use
>language processors that generate relocatable object files, and don't
>complain too much about having to link everything.

Oh yeah? I miss having 1AJ call the loader when it didn't know _what_ it was looking at.

>OK, so we use
>Turbo Pascal, too. But we've already complained about that. Maybe they
>will fix these annoyances in 518 ... er ... version 4.0.

I doubt it.

>At any rate,
>do we really need to include subroutines with *CALL if we have pre-compiled
>versions of them in a library somewhere? Just think, if we don't have to
>compile them every time, we can cut down the compile-test-edit cycle time.

It becomes compile-link-test-edit cycle time. Linkers can be slow, too.

I'll give you one very good reason for having done ABS assemblies. Confidence. ABSolute confidence. If I know that it was I who assembled (or compiled) every last line of code in a program, I'll have all of the information available about how it was created and what's in there, so that there are no surprises. All of the binaries (relocatable, even) that go into making something are built from sources that I can actually see. There's even a little motivation for disassemblers here, too, in case you must use a library without source (e.g. Wreckage Mangler, or CTI). "If you want something done right, do it yourself." Still, that can be a lot of work, mitigated by the machine speed...

>There are many good things present in Revise; it's just that, along with
>all the useful features and familiarity (with the M/U-like interface),
>there's a lot of deadwood. Jeff P. keeps telling me about the source
>maintenance utilities they use at Cray.

Perhaps he should tell _us_.

>I believe he said it was a collection
>of three or four C-language programs, one to three pages each (how long is
>Revise?). He also said that most of the good features in Modify were
>implemented. Jeff, I'd sure like to hear more about those programs.
I wonder just how portable are sources maintained with these tools. (Not that they needed to be; they're proprietary, aren't they?)

>Other reasons for starting from scratch include having the freedom to
>come up with a package (which, from the sounds of things, is heading toward
>a DBMS... there, now we've all said it) capable of performing the functions
>we deem necessary, without having to work around an existing inadequate
>framework, and being able to do with the resultant package as we please.

but not anytime soon.

>>The difference here is that we want transactions that can be backed out,
>>and we want transactions that are independent of one another.
>
>Well said, Jeff. However, does this necessarily imply that a change to
>the database, consisting of multiple interrelated transactions, will result
                                                                 must
>in a consistent update? What I'm talking about here is the "classic" OPL
>and JPL system used at UCC.

"Consistency is the mother of strange hobgoblins." There's a parallel here between updating a database and managing a semaphore.

>>>I think the edit-debug cycle followed by the
>>>process of making a modset to describe what you've just been
>>>through is valid.
>>
>>What that does is solidify (freeze) a set of changes. By making them
>>more permanent you're supposed to think about them more. But the existence
>>of COMPARE along with text editors subverts that intent.
>
>I don't agree that Compare is responsible for sloppy work. It does make
>modset creation easier, but aren't we all looking for ways to make source
>maintenance easier?

Not only easier, but more reliable.

>>The more I think about this, the less I know.

...until pretty soon I know nothing about everything.

>One thing is for certain: by the time we have a product, we'll all know
>what the pitfalls of source maintenance are. That, itself, may be the
>most valuable thing each of us will carry away from this project.

Platitudes, eh? OK.
Source maintenance is an attempt to cure the too-many-cooks disease. (Sometimes one is too many.)

>>P.S. How come I haven't heard anything from anyone but JLW? Am
>>I not getting through to faraway parts? Can you HEAR ME IN THE
>>BACK, THERE? (This is the second note I have sent to
>>library-people).
>
>I don't think Mark has logged in yet, and I'm terminally disorganized.
>Steve Oyanagi was mentioning a new Unicos release due out soon, so JLP
>is probably up to his ears in testing.

I think someone's implying that Jim and I have little to do... Too much free time on our hands... We can continue this discussion in person (Jim can debrief me) this weekend.

--
LERMINATING PREVIOUS SESSION. PQEASE RETRY.
Jeff Woolsey  National Semiconductor  woolsey@nsc.UUCP  woolsey@umn-cs.EDU

From tsec!nsc!pyramid!crayamid!cray!jlp Mon Oct 5 17:17:55 1987
Received: by nsc.NSC.COM; Thu Oct 8 04:58:58 1987
Received: by pyramid.UUCP (5.51/OSx4.0b-870424) id AA20821; Thu, 8 Oct 87 04:34:57 PDT
Received: by vax2.cray.uucp (4.12/25-eef) id AA19316; Wed, 7 Oct 87 22:23:08 cdt
Date: Wed, 7 Oct 87 22:23:08 cdt
From: cray!jlp (Jeff Pomeroy)
Message-Id: <8710080323.AA19316@vax2.cray.uucp>
To: crayamid!!pyramid!nsc!tsec!nsc-nca!aga, crayamid!!pyramid!nsc!tsec!nsc-nca!jxh, crayamid!!pyramid!nsc!woolsey, umn-cs!meccts!kksys!deg, umn-cs!meccts!kksys!msr
Subject: Is it too late to buy a round trip ticket to ...

Hi gang. I have been reading the messages as they go by. I think that i should start off by explaining what i have been up to for the past two and a half years. I work at Cray handling the source code of the UNICOS operating system. Cray has something called UPDATE that was based on CDC's UPDATE and has run under COS for years and UNICOS for about a year. UPDATE is written in CFT, and thus runs only on the Cray mainframes. Cray also has something called 'scm', which is a locally designed and written thingy that was intended to put UPDATE out of business. Scm runs on the Crays, the VAX and the Suns.
Some people within Cray also use SCCS on the VAX. I have to deal with all of them. We have had many battles over source control, which has shown one thing: there is no winner. So, what can i add to this discussion? I really do not want to start complaining about my day-to-day problems, and have you guys waste your time trying to think up some solution(s) which i am sure have little or no impact on what we do around here. But, i could relate to you some of the things that have happened around here in the past, just as examples, to show how these things work in the real world.

I will start with SCCS. It is owned and operated by AT&T. They only let people use it. You may think i am kidding. We have both Suns running 4.2 and VAXes running System V, side by side, on the same network. The Sun SCCS was based on UNIX version 7 (or PWB). Guess what? AT&T changed SCCS around System III so they are not compatible. OK, so some things did stay the same. Quick, what are they? (Strike One)

SCCS has magic cookies. (do i need to explain this?) A magic cookie in this case means in-band data. (That should wake jim up) SCCS will do different things depending on the content of the file you are putting under source control. Another way to say this is: "what goes in does not come out". (if this gives you a bad feeling deep in your gut, good) The SCCS magic cookies are of the form percent, letter, percent. Like '%D%', which i think changes to the current date. Nice, eh? What about:

    printf(" %D%D ", i, j);

(I think that %D is the double precision decimal format...) This is silently destroyed by SCCS. This really happened in the Berkeley 4.2 release where under the user contributed software, some guy had developed a program without SCCS, but Berkeley passed it through SCCS before releasing it.

At Cray, some people use SCCS to hold their assembly code. What about:

    IDENT  SIN
    ENTRY  SIN
    ENTRY  SIN%
    ENTRY  SIN%R%

The entry for SIN saves and restores all registers.
The entry for SIN% does not save or restore; this is used by the compiler when it knows it is 'safe' to bash registers. And lastly, SIN%R% means that operands are passed in registers and not on the stack. As i recall, SCCS changed %R% to the current release level of that module. (Strike Two)

SCCS is made up of many commands. One of them is called 'help'. This command seems to be there for the sole purpose of almost but not quite helping the user. Here i will use proof by example...

VAX running AT&T System V:
    Type:    % help
    Output:  %

Cray running UNICOS:
    Type:    % help
    Output:  The help command is unfortunately not what it would seem to be.
             It provides a limited amount of help and only for SCCS commands.
             Its arguments are the small alpha-numeric strings that accompany
             the error messages from those commands. If you are looking for
             more general help in using the rest of the system try the
             on-line manual that is available through the man command.
             Try: man man

Sun running Sun UNIX 4.2 Release 3.2:
    Type:    % help
    Output:  msg number or comd name?
    Type:    return
    Output:  ERROR: not found (he1)
    Type:    % help
    Output:  msg number or comd name?
    Type:    he1
    Output:  he1: "not found"
             No helpful information associated with your argument was found.
             If you're sure you've got it right, do a "help stuck".
    Type:    help stuck
    Output:  stuck: First, if you know the value of the system error number
             (errno), you can either look up a description of it in INTRO(II),
             or execute "help err<number>" (e.g., if the error number is 1
             execute "help err1"). If you don't know the error number, or you
             don't understand what's going on - Try the following, in order:
             1. Make sure the answer isn't in the documentation.
             2. Try to write(I) to anyone logged in as "adm".
             3. Contact your PWB/UNIX counsellor.
             4. File an MR (see System Administrator for instructions).

You call this help? Just for a laugh, i tried 'help he2' and got:

    he2: "argument too long"

Dost thou jest? Wilst thou mock HELP??
Please limit your blitherings in arguments to less than fifty (50) characters. (Strike Three, you're out!)

Lastly, here is a quote from the Sun man page for the SCCS get command: (Let's see if anyone in the home audience can figure out why)

    BUGS
    If the effective user has write permission (either explicitly
    or implicitly) in the directory containing the SCCS files, but
    the real user doesn't, only one file may be named when the -e
    option is used.

The list goes on and on... I will never put any of my own programs under SCCS. As deg would say... "SCCS is a 'FINE' piece of software"

It is getting late.
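The in-band "magic cookie" hazard jlp describes can be illustrated with a toy keyword expander. This is not SCCS code, and real SCCS expands many more keywords and only at get time; the point is simply that pattern-based in-band expansion silently alters any source that happens to contain the same pattern, with no escape mechanism:

```python
# Toy illustration (not real SCCS) of %-letter-% keyword expansion.
# A literal "%D%" in the user's source is indistinguishable from the
# keyword, so it gets rewritten -- "what goes in does not come out".

import re

def expand_keywords(text, date="88/05/17"):
    # Replace every %D% with the supplied date, as a get-time
    # keyword expander would.
    return re.sub(r"%D%", date, text)
```

Feeding it jlp's printf example shows the silent destruction: the format string's first conversion spec is consumed by the expander.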
franka@mmintl.UUCP (Frank Adams) (05/14/88)
In article <5291@cup.portal.com> jxh@cup.portal.com writes:
>I have been sort of working on Revise, which is a descendent of UPDATE/MODIFY
>from the Cybers, in that it has some of the same basic assumptions, namely
>that there are Decks containing Lines (they got rid of the word Card at
>this point), and that lines may be Active or Inactive, and that they may be
>inserted, and deactivated, but not actually deleted, by a modset. A
>modset can also reactivate "deleted" lines.
>
>There are those in Unix land who use a program called SCCS, and [others].
>They operate by controlling the "checking out" and subsequent "checking
>in" of modules. One checks out a module, edits it, and checks it in,
>identifying a "version" as the label for this instance of that module.

I have considerable familiarity with this latter class of code control systems (including having written one), but I have never before encountered the former kind. I am having some difficulty understanding just how it is supposed to work. Everyone involved in the discussion apparently was quite familiar with them, so the above is the best description of them supplied. It leaves quite a bit unanswered. I will attempt to describe the system based on my understanding of it; I would appreciate it if the original poster or someone equally competent would review this, note any misconceptions, and answer my questions. I will be concentrating on "how to use the system", not "how the system works"; the lines quoted above seem to cover that pretty well.

It appears that the main editing done by programmers using such a system is the creation of "modsets". These, in general, specify that certain lines are to be inserted into a particular piece of source code. (I gather some systems allow more than one piece of source (source file or "deck") to be updated with the same modset. Maybe they all do?)
In addition, a modset may specify that certain lines of code be deleted (deactivated), or that certain lines which were previously deleted be restored. I don't know how the lines to be inserted or deleted are identified. I would guess that each line has a line number, and that new lines are inserted so that their line numbers remain in order.

It appears that traditionally, programmers directly created modsets, and that it is a relatively new and far from universal thing for them to edit the entire source file, and create the modfile mechanically.

It appears that the compilation step (in the development cycle) with such a system is preceded by combining the programmer's modsets with the current state of the code control system to produce the actual source to be compiled.

When a programmer is satisfied with his changes, he "signs in" his modfile(s). There may(?) be a review by someone before this actually becomes part of the standard.

--------

It seems to me that the main problem with this kind of system is sorting out simultaneous changes to the same piece of code. It seems to me that the advocates of this approach have become so used to dealing with this, that they simply accept it as part of the system. (In the comments I saw, there were several suggestions for *mitigating* the problem, but no hint that it was something one might want to *eliminate*. It is a key advantage of the SCCS-style approach that it does avoid the problem.)

I should note my own preference (not fully shared by my current co-workers). I prefer an SCCS-style code control system, in conjunction with a convention that there is only one entry point per file. If, as is good for other reasons as well, the functions are all kept relatively small, then all the source files are small. This means that two programmers trying to access the same file at the same time are really trying to change the same code, and one of them should wait for the other to finish.
(The inability for two people to modify the same module at the same time is the characteristic problem of this style of code control system, as integrating simultaneous changes is the characteristic problem of the other.) One *must* support this with some kind of include file system, so that declarations can be consistent across modules. The inclusion process need not be regarded as the responsibility of the code control system, however. (The include files themselves are source files to be maintained; but that is another matter.) -- Frank Adams ihnp4!philabs!pwa-b!mmintl!franka Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
jxh@cup.portal.com (05/17/88)
In <2846@mmintl.UUCP>, Frank Adams (franka@mmintl.UUCP) writes:
>>...a descendent of UPDATE/MODIFY from the Cybers...
>It leaves quite a bit unanswered.

Yes, sorry about that. Your observation that the original parties to the discussion were familiar with UPDATE/MODIFY is quite true. I will try to summarize the relevant facts about that universe (or perhaps Dan Germann or Jeff Woolsey will; they used it much more than did I); but your "understanding" will stand for the moment. It's pretty good.

>I will attempt to describe the system based on my understanding of it.
>I don't know how the lines to be inserted or deleted are identified. I
>would guess that each line has a line number, and that new lines are
>inserted so that their line numbers remain in order.

Lines were identified by a deckname or modname, followed by a sequence number, e.g. DECK.1, DECK.2, ..., DECK.999. Sequence numbers were assigned beginning with 1 for each different name. Thus: DECK.1, DECK.2, MODSET.1, DECK.3 ... . The order of these lines in the Program Library (PL) defined the sequence of lines in the actual source. Modsets consisted of directives, such as:

    *DELETE DECK.1,DECK.4    <deletes range of lines, inclusive>
    *INSERT DECK.2           <inserts following lines after DECK.2,
                              assigning insertion text sequence
                              numbers to each: MODSET.1, MODSET.2>
    more insertion text
    *more directives

>It appears that traditionally, programmers directly created modsets, and
>that it is a relatively new and far from universal thing for them to edit
>the entire source file, and create the modfile mechanically.

Just so. A separate text-comparison program was given the ability to express differences as a set of directives suitable for feeding to UPDATE for just this purpose. Actually, this may be more universal than I thought, as more-capable editors become available under NOS.
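The deckname.sequence scheme above can be made concrete with a small sketch. The data layout (a PL as a list of (sequence-id, text, active) triples) is illustrative, not any real tool's file format; the behavior follows the thread's description, with *DELETE as deactivation rather than removal, so "deleted" lines stay addressable:

```python
# Sketch of applying one modset to a PL. Each PL entry is
# (line_id, text, active); DELETE flips lines inactive, INSERT
# splices in new lines numbered MODNAME.1, MODNAME.2, ...

def apply_modset(pl, modname, directives):
    """directives: ("DELETE", first_id, last_id) or ("INSERT", after_id, lines)."""
    seq = 0
    for d in directives:
        if d[0] == "DELETE":
            _, first, last = d
            on = False
            for i, (lid, text, _active) in enumerate(pl):
                if lid == first:
                    on = True
                if on:
                    pl[i] = (lid, text, False)   # deactivate, don't remove
                if lid == last:
                    break
        else:  # INSERT
            _, after, lines = d
            at = next(i for i, entry in enumerate(pl) if entry[0] == after)
            new = []
            for text in lines:
                seq += 1
                new.append((f"{modname}.{seq}", text, True))
            pl[at + 1:at + 1] = new
    return pl

def compile_text(pl):
    """Active lines only -- roughly what UPDATE would put on COMPILE."""
    return [text for (_lid, text, active) in pl if active]
```

Yanking a modset would then just mean reversing these marks, which is why the inactive-not-deleted representation matters.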
>It appears that the compilation step (in the development cycle) with such a
>system is preceded by combining the programmer's modsets with the current
>state of the code control system to produce the actual source to be compiled.

You're batting 1.000. This step takes the PLs and modsets, and creates the COMPILE file, which is the modified source. COMPILE was typically an alternate default filename on, e.g., the assembler, to eliminate steps from your batch job. (Egad!)

>When a programmer is satisfied with his changes, he "signs in" his
>modfile(s). There may(?) be a review by someone before this actually
>becomes part of the standard.

Someone else (Messrs. Woolsey or Germann) should elaborate on the procedural aspects of source control that developed in their shop. For the moment, let me simply say "code review" and "proposed modset" and hope that gives the right impression. I think the fact that modsets were "hard" to introduce into the database permanently had a positive effect on quality, as they were subjected to tremendous scrutiny.

--------

>It seems to me that the main problem with this kind of system is sorting out
>simultaneous changes to the same piece of code. It seems to me that the
>advocates of this approach have become so used to dealing with this, that
>they simply accept it as part of the system. (In the comments I saw, there
>were several suggestions for *mitigating* the problem, but no hint that it
>was something one might want to *eliminate*. It is a key advantage of the
>SCCS-style approach that it does avoid the problem.)

Well. Here we go. I would settle for *mitigating* if it meant I could get named modsets. *Eliminating* would be nice, but implies the (imho) too-restrictive locking of source files. How about *automating* the process of identifying (and perhaps resolving) conflicts?

>I prefer an SCCS-style code control system, in conjunction with a convention
>that there is only one entry point per file.
If, as is good for other >reasons as well, the functions are all kept relatively small, then all the >source files are small. This means that two programmers trying to access >the same file at the same time are really trying to change the same code, >and one of them should wait for the other to finish. (The inability for two >people to modify the same module at the same time is the characteristic >problem of this style of code control system, as integrating simultaneous >changes is the characteristic problem of the other.) It is true that, on the Cybers, programs tended to be huge and monolithic (not because of bad programming practice but because of institutional biases such as *LOCAL FILE LIMIT*); whereas small-computer programs tend to have lots of small parts. I applaud modularity when it is APPROPRIATE FOR THE PROGRAM, not imposed by outside forces such as making source control easier, or making things compile faster (however worthy those goals certainly are). I detect a bias toward C programs, where all functions are at the same scope level; in this case, putting them into separate files makes sense. However, Modula-2 programs tend to have many routines within a module; and this tends to make single modules rather monolithic themselves. (I have broken a single module into several because of an implementation restriction, when I would have preferred keeping it in one piece to ensure data integrity of private objects.) Making simultaneous changes to one module might easily mean implementing two completely different and, conceptually, independent modifications. A source control program should allow me to modify the top while you're working on the bottom; if there is no real conflict, then file-locking should not preclude my getting some useful work done while you, presumably, do the same. 
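A first cut at automating conflict identification, as asked for above, can be sketched as follows. The data model (modsets as lists of insert/delete directives keyed by the line IDs they touch) is illustrative, and the classification follows Jeff's earlier taxonomy: a pair conflicts only when the touched regions intersect, so a change to the top of a module and a change to the bottom coexist without any file locking:

```python
# Sketch: classify a pair of modsets as independent, conflicting,
# or dependent, from the line IDs they touch and introduce.
# Directives are ("insert", anchor_id, [text...]) or ("delete", line_id, []).

def lines_touched(modset):
    """Base-line IDs this modset inserts after or deletes."""
    return {d[1] for d in modset["directives"]}

def lines_introduced(modset):
    """Line IDs the modset itself creates: NAME.1, NAME.2, ..."""
    n = sum(len(d[2]) for d in modset["directives"] if d[0] == "insert")
    return {f'{modset["name"]}.{i}' for i in range(1, n + 1)}

def relationship(a, b):
    if lines_touched(a) & lines_introduced(b):
        return "a-depends-on-b"    # a refers to lines b created
    if lines_touched(b) & lines_introduced(a):
        return "b-depends-on-a"
    if lines_touched(a) & lines_touched(b):
        return "conflict"          # both anchor on the same base line
    return "independent"           # disjoint regions: no lock needed
```

A tool built on this check could accept independent pairs silently, warn on dependent ones (as UPDATE did), and demand human resolution only for true conflicts.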
PVCS tries to allow this by "branching," which seems simply to be a cop-out, the program recognizing that a conflict is (potentially) being created that will have to be sorted out later when the two branches "merge."

Furthermore, my change might remain unrelated to yours forever. Perhaps I want to apply a temporary change to see what would happen. I could, of course, do this by making a local copy of the source file in question, and do as I please with it, but if my prattlings in my sandbox become worthwhile I would like to be able to put them into the real source for everyone's benefit without having to reconstruct what I did. A modset from the common base source describes my actions succinctly, and can be stored in rather less space. (Flame off. Sorry. I really don't know the first thing about SCCS, so if all this is possible there simply by sleight of hand, please let me know.)

Oh, this is exciting! I thought this newsgroup would be a good place for this discussion!

P.S. My boss just walked into my office brandishing a copy of the glossy for PVCS and, in a moment of candor, I told him that we should get it and use it; that it is, at least, a giant step in the right direction, even if it isn't quite all I hoped for. Of course, we should all get Suns and avoid the PC problems altogether, but he's not prepared to hear that. Not yet.

--
Jim Hickstein, VSAT Systems, Inc., San Jose CA
jxh@cup.portal.com ...!sun!portal!cup.portal.com
woolsey@nsc.nsc.com (Jeff Woolsey) (05/17/88)
In article <2846@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:

>I have considerable familiarity with this latter class of code control (SCCS/RCS/PVCS et al. (S/R/P hereinafter))
>systems (including having written one), but I have never before encountered
>the former kind. (MODIFY/UPDATE/REVISE et al. (M/U/R hereinafter))
>I am having some difficulty understanding just how it is supposed to work.
>Everyone involved in the discussion apparently was quite familiar with them,
>so the above is the best description of them supplied. It leaves quite a
>bit unanswered.

Let me provide a simple, yet complete run-down of the M/U/R model. The differences between the three are implementation issues, or small variances within the paradigm.

There exists a Program Library (PL) consisting of source program modules (decks), include files (common decks), named changes (modsets), and a directory (for random access). The PL is a single file. Its appeal is its integrity. Some effort is required to delete part of a PL. If you have the PL, you have the whole source.

When you want to compile a program in a PL, you direct M/U/R to make a compile file. This file is fed to the assembler or compiler as is. Each line in it consists of the source line and a sequence number. The sequence number is the name of the deck or modset followed by the ordinal of that line within the deck or modset.

Common decks are the same as source decks except that they have a bit set meaning that they can be included (at compile-file time) in another source deck. Their original purpose was to hold all the COMMON declarations in FORTRAN programs. You cannot generate a compile file from just a common deck.

A modset consists of its name, the name of the deck to modify, and insertion or deletion directives, each followed by arbitrarily long sections of text. The text is inserted at the point the insertion or deletion directives refer to.
The insertion points are designated as modname.seqno or deckname.seqno, or just seqno if an original card in the deck is desired. Deletion points can specify a range of lines. The resemblance to drawers of punched card decks is the obvious ancestry. It's probably obsolete, although lines are still pretty pervasive as the unit of change in source control.

Modset creation generally means that you take a nice assembly/compiler listing (made from the compile file with its sequence numbers) and go off into a corner and mark it all up. Then you figure out which lines are the insertion and deletion points, and type up your new text in between directives. Then you reenter the edit-compile-debug loop. Eventually the process was automated with tools analogous to diff -e.

If a source deck contained enough modsets such that not more than some arbitrary number of lines were original code, it was said to need resequencing. This is a fairly major event in the life of a deck, and probably only happens twice. It destroys all the modset information while retaining the net changes. The OS vendor would resequence decks occasionally with new releases, and installation-specific modsets would have to be converted because the line numbers changed. Locally-developed programs could suffer resequencing, too.

A set of miscellaneous tools rounded things out with the ability to extract everything back into source form, expunge modsets, and perform other mundane operations.

>It appears that the main editing done by programmers using such a system is
>the creation of "modsets". These, in general, specify that certain lines
>are to be inserted into a particular piece of source code. (I gather some
>systems allow more than one piece of source (source file or "deck") to be
>updated with the same modset. Maybe they all do?)

For reference purposes, UPDATE/REVISE treat decks and modsets as the same thing, whereas MODIFY distinguishes them.
This just dictates whether there are "DECK" directives in your modset, and what the modset sequence numbers are. However, this ability of one named change to alter several related source entities simultaneously and indivisibly is one of several features I miss from the M/U/R paradigm. Another feature is independent, non-monotonic changes, any particular collection of which may be selected to build a particular flavor of product (analogous to #ifdef FEATURE). The third is PL integrity (mentioned above).

>In addition, a modset
>may specify that certain lines of code be deleted (deactivated), or that
>certain lines which were previously deleted be restored.

Restoring deleted lines, though supported, was a no-no as far as OS support was concerned. A real good way to cause dependencies and other conflicts. You were to accomplish the same net effect by inserting new copies of the originally-deleted lines. COMPARE (diff -e) would do it this way. Joe User, in complete control of his own PLs, however, is perfectly welcome to create conflicts and dependencies, as long as he can deal with the results.

>I don't know how the lines to be inserted or deleted are identified. I
>would guess that each line has a line number, and that new lines are
>inserted so that their line numbers remain in order.

By name.number. "number" is always sequential, and there are no fractions. Each change (modset) has a unique name (modname). The number of a line never changes, but it could be deactivated, and an identical line with a different modname and number could replace it. This is one reason why resequencing is so traumatic.

>It appears that traditionally, programmers directly created modsets, and
>that it is a relatively new and far from universal thing for them to edit
>the entire source file, and create the modfile mechanically.

"Relatively" is relative here. This basically describes the state of affairs four years ago and more at a large university running mainframes.
Eventually each programmer would discover that COMPARE could generate modsets.

>It appears that the compilation step (in the development cycle) with such a
>system is preceded by combining the programmer's modsets with the current
>state of the code control system to produce the actual source to be compiled.

Essentially correct. It is a special case. Even in the case of product installation, a compile file is built. The PL is really a list of lines, each of which contains a pointer to each modset that referenced it and how (active or not). When applying a modset, another set of these pointers is built, and making the compile file involves looking at the list of pointers to see the net effect. If the line is now active, it is copied to the compile file.

>When a programmer is satisfied with his changes, he "signs in" his
>modfile(s). There may(?) be a review by someone before this actually
>becomes part of the standard.

For the OS with local changes, yes. The system programmers "submitted" their modsets to a coordinator, who then figured out which (if any) modsets conflicted with or depended upon one another or the same thing. (This process was eventually automated.) The submitters were then asked to reconcile their modsets to resolve these problems. Then the whole mess was printed and circulated for code review.

>--------
>It seems to me that the main problem with this kind of system is sorting out
>simultaneous changes to the same piece of code. It seems to me that the
>advocates of this approach have become so used to dealing with this, that
>they simply accept it as part of the system. (In the comments I saw, there
>were several suggestions for *mitigating* the problem, but no hint that it
>was something one might want to *eliminate*. It is a key advantage of the
>SCCS-style approach that it does avoid the problem.)

So what does a programmer do while waiting for the code to be available? Go to the listings on the shelf and work out the changes based on that version.
Then when the source is free, apply the changes. Surprise, reality has changed. But there are usually many ways to subvert the intentions of a source code control system.

>I should note my own preference (not fully shared by my current co-workers).
>I prefer an SCCS-style code control system, in conjunction with a convention
>that there is only one entry point per file. If, as is good for other
>reasons as well, the functions are all kept relatively small, then all the
>source files are small. This means that two programmers trying to access
>the same file at the same time are really trying to change the same code,
>and one of them should wait for the other to finish. (The inability for two
>people to modify the same module at the same time is the characteristic
>problem of this style of code control system, as integrating simultaneous
>changes is the characteristic problem of the other.)

Usually the integration is painless, if the two programmers are doing different things and don't bump into each other.

>One *must* support this with some kind of include file system, so that
>declarations can be consistent across modules. The inclusion process need
>not be regarded as the responsibility of the code control system, however.
>(The include files themselves are source files to be maintained; but that is
>another matter.)

Indeed. It just so happened that M/U/R provided the include mechanism because the included portions are part of the program library. They could also be included from another such library, if needed (usually the case for the system common decks).

--
Scrape 'em off, Jim!
Jeff Woolsey
National Semiconductor
woolsey@nsc.NSC.COM -or- woolsey@umn-cs.cs.umn.EDU
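[Moderator's note: the PL model Woolsey describes, in which every line carries a permanent "name.ordinal" identifier plus a history of modset actions, and the compile file is the net effect of that history, can be sketched roughly as follows. This is a toy model with invented names, not any real MODIFY/UPDATE/REVISE code.]

```python
# A toy sketch (invented names, not real MODIFY/UPDATE source) of the PL
# model: each line keeps a permanent "name.ordinal" identifier and a
# history of modset actions; the compile file is the net effect.

class Line:
    def __init__(self, ident, text):
        self.ident = ident      # e.g. "mydeck.3" or "fixbug.1"
        self.text = text
        self.history = []       # [(modset_name, made_active), ...]

    def active(self):
        # Original lines start active; the most recent action wins.
        return self.history[-1][1] if self.history else True

class Deck:
    def __init__(self, name, source_lines):
        self.name = name
        self.lines = [Line("%s.%d" % (name, i), t)
                      for i, t in enumerate(source_lines, start=1)]

    def _index(self, ident):
        for i, line in enumerate(self.lines):
            if line.ident == ident:
                return i
        raise KeyError(ident)

    def apply_modset(self, modset, inserts=(), deletes=()):
        # deletes: idents to deactivate; inserts: (after_ident, [text, ...]).
        # Nothing is ever physically removed, only marked inactive.
        for ident in deletes:
            self.lines[self._index(ident)].history.append((modset, False))
        for after_ident, texts in inserts:
            pos = self._index(after_ident) + 1
            new = [Line("%s.%d" % (modset, i), t)
                   for i, t in enumerate(texts, start=1)]
            for line in new:
                line.history.append((modset, True))
            self.lines[pos:pos] = new

    def compile_file(self):
        # Only currently active lines reach the assembler, each tagged with
        # its sequence number (deck or modset name plus ordinal).
        return [(l.ident, l.text) for l in self.lines if l.active()]

deck = Deck("mydeck", ["a", "b", "c"])
deck.apply_modset("fix1", deletes=["mydeck.2"], inserts=[("mydeck.2", ["B"])])
print(deck.compile_file())
# [('mydeck.1', 'a'), ('fix1.1', 'B'), ('mydeck.3', 'c')]
```

Because deactivated lines stay in the deck with their histories intact, this also shows why "the number of a line never changes" and why resequencing (renumbering everything) destroys the modset information.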
smryan@garth.UUCP (Steven Ryan) (05/19/88)
Gee, I never thought anybody outside of CDC knew of MODIFY (NOS), UPDATE (NOS/BE, NOS, VSOS), and now SCU (NOS/VE).

All of the source code is stored in what is called a program (or source) library (called the PL). The program library is divided into decks (as in punched cards), which are divided into individual lines. A deck can be a module, data structure, or whatever conceptual chunk you wish to use. Each deck has a unique name. MODIFY maintains a last-modification date for each deck.

A line is a line of source text, an identifier, and a modification history. UPDATE line identifiers are unique across the PL; MODIFY identifiers are only unique within the deck. Lines in the original deck have identifiers like deckname.1, deckname.2, ... Lines subsequently inserted have identifiers like id.1, id.2, ...

A line can be deactivated (deleted), activated, deactivated, et cetera, an arbitrary number of times as the result of a series of idents (changes). The modification history attached to each line refers to successive activations/deactivations and which ident did it. Only lines which are currently activated are listed. All the deleted lines are still there, though, which is useful in the whoops! mode of programming.

Once a group of idents is permanently added to the PL and the system built, if you discover that, whoops!, ident F7B022 just broke the entire vectoriser, you just do a *YANK F7B022 to magically erase F7B022 from the system. Actually, it goes through the modification history and adds an activation to lines deleted by F7B022 and a deactivation to lines inserted. As long as subsequent idents do not refer to the same lines, in principle, F7B022 can be *YANKed and *UNYANKed as many times as necessary to get it right.

Programmers have to handcode the changes line by line, not very pleasant, unless SCOOP is working. But having all the line changes in an ident has the advantage of making the changes very visible, simplifying the code reviews.
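[Moderator's note: the *YANK behavior described above (append a compensating activation or deactivation for every line the ident touched, rather than erasing anything) can be sketched like this. Again a toy model with invented data, not CDC's implementation.]

```python
# Toy sketch of *YANK: yanking an ident appends a compensating event to
# each line that ident touched; nothing is erased, so the same routine
# run on the yank events would serve as *UNYANK.

def active_text(lines):
    # A line with no history is an original, active line; otherwise the
    # most recent activation/deactivation wins.
    return [l["text"] for l in lines
            if (l["history"][-1][1] if l["history"] else True)]

def yank(lines, ident):
    # lines: [{"id": ..., "text": ..., "history": [(ident, made_active)]}]
    for line in lines:
        for who, made_active in line["history"]:
            if who == ident:
                # Reverse this ident's effect: re-activate what it deleted,
                # deactivate what it inserted. (If a *later* ident touched
                # the same line, this override is exactly the conflict the
                # text above warns about.)
                line["history"].append(("*YANK " + ident, not made_active))
                break

pl = [
    {"id": "vec.1",    "text": "loop:",  "history": []},
    {"id": "vec.2",    "text": "old op", "history": [("F7B022", False)]},
    {"id": "F7B022.1", "text": "new op", "history": [("F7B022", True)]},
]
print(active_text(pl))   # ['loop:', 'new op']
yank(pl, "F7B022")
print(active_text(pl))   # ['loop:', 'old op']
```

The deactivated lines never leave the list, which is the "whoops! mode of programming" the author describes: the full history is always there to be replayed or reversed.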
Idents contain both editing commands and comments, which carry the programmer's name, other identification, and an explanation of the change and its reason, all in one bundle. Far from making collisions in the source more burdensome, it usually makes them less so. Two separate programmers can modify the same deck without error as long as they modify distinct lines. Generally safe. The project leader is supposed to review all idents for interferences, but this includes interferences which might be on separate decks.

All idents for a PL are collected and applied en masse for each build cycle. At this point, if two idents affect the same line, MODIFY/UPDATE squeals loudly, and the project adjusts for the unexpected overlap. This does happen, but usually like once a year, and takes a five-minute change.

Hafa an godne daege.
sm ryan

ps. Control Data/ETA has no access to this network that I know of. For this reason, you may never get a response from CDC on this or any other subject.
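[Moderator's note: the build-cycle check described above, where all idents for a PL are collected and the tool squeals if two of them touch the same line, amounts to a simple inverted index. A toy sketch, with invented ident and line names:]

```python
# Toy sketch: detect overlapping idents before a build by mapping each
# referenced line identifier to the set of idents that touch it.

from collections import defaultdict

def find_conflicts(idents):
    # idents: {ident_name: set of line ids it inserts after or deletes}
    touched = defaultdict(set)
    for name, line_ids in idents.items():
        for line_id in line_ids:
            touched[line_id].add(name)
    # Any line claimed by more than one ident is a conflict to reconcile.
    return {line_id: sorted(names)
            for line_id, names in touched.items() if len(names) > 1}

idents = {
    "AB1234": {"os.10", "os.11"},
    "CD5678": {"os.11", "io.3"},   # overlaps AB1234 on os.11
    "EF9012": {"io.7"},
}
print(find_conflicts(idents))   # {'os.11': ['AB1234', 'CD5678']}
```

Note that, as the posts point out, this catches only textual overlap on the same line; semantic interference between idents on separate decks still needs a human reviewer.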
jxh@cup.portal.com (05/20/88)
I just got my copy of Polytron's PVCS (Polytron Version Control System). I will be launching myself into it shortly. When I surface again, I'll bring a full report. -Jim Hickstein, VSAT Systems, Inc, San Jose, CA jxh@cup.portal.com ...!sun!portal!cup.portal.com!jxh
dricej@drilex.UUCP (Craig Jackson) (05/21/88)
This is a discussion that is near and dear to my heart. While the *CS programs have many advantages, there are a few things which I sorely miss from 'mainframe' source code control systems (of which MODIFY/UPDATE is one).

The biggest thing that I miss in the Unix world is the ability to easily have independent development on a common body of source by two sets of programmers in two locations. The most common case of this is a vendor sending out source updates, and a local site making patches. In the Unix world, each time AT&T comes out with a release, all of the System V vendors need to re-do their port. Now Unix is portable, but it isn't so portable that unnecessary ports are to be desired.

In a system such as M/U, there is a unique identifier attached to each line of the source file. Two modsets can affect different regions of the file with no interference whatsoever. In the Unix world, diff -c and patch attempt to provide the same utility. However, if there isn't enough common context, things fall apart. Also, diff -c only comes from Berkeley; you're left to the net to pick up diffc if you're on a USG system. 'patch' only comes from the net in the first place.

You may denigrate the need for 'line numbers' or 'line identifiers' in systems such as M/U. Yes, they are extra baggage. Yes, they do go along with such things as fixed-length source lines and source listings. Yes, they do imply occasional resequencing. However, by uniquely identifying each line, it's possible to unambiguously talk about precise regions of code. I only wish I could give each token in the source file a unique identifier, but it isn't really feasible.

--
Craig Jackson
UUCP: {harvard!axiom,linus!axiom,ll-xn}!drilex!dricej
BIX: cjackson
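[Moderator's note: Jackson's point about stable line identifiers can be illustrated with a toy example (names invented, not drilex's tooling): when each line is addressed by a permanent identifier rather than by surrounding context, a vendor change and a local change to different lines compose in either order, with no context matching at all.]

```python
# Toy illustration: with permanent per-line identifiers, changes to
# disjoint sets of lines commute, so vendor and local patch streams can
# be applied in either order.

def apply_change(doc, change):
    # doc: {line_id: text}; change: {line_id: new_text, or None to delete}
    out = dict(doc)
    for line_id, new_text in change.items():
        if new_text is None:
            out.pop(line_id, None)
        else:
            out[line_id] = new_text
    return out

base   = {"d.1": "init()", "d.2": "run()", "d.3": "done()"}
vendor = {"d.1": "init(fast)"}       # vendor release touches d.1
local  = {"d.3": "done(); log()"}    # local site patches d.3

a = apply_change(apply_change(base, vendor), local)
b = apply_change(apply_change(base, local), vendor)
print(a == b)   # True: no common-context matching needed
```

Context-based patching, by contrast, must re-locate each hunk by its surrounding text, which is exactly what "falls apart" when the vendor's release has rewritten the context around a local change.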