[comp.arch] A Shared Libraries Solution

craig@unicus.UUCP (Craig D. Hubley) (10/07/87)

One effective way to deal with revisions to shared libraries is to keep
several versions around, and have PROGRAMS know which revision levels they 
can count on to perform correctly.

`Services' (programs such as print or mail services, though they could
just as easily be shared libraries, which should almost never be compiled
into the program itself) have a revision level, such as 10.2, where the 10
is a major revision level and the .2 signifies changes that are not known
to cause ANY program to break.  Each `service' or library knows what
revision levels it has available or can emulate.
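
By way of illustration only (the structure and routine names here are
invented for this sketch, not taken from any real system), a service's
table of revision levels and its compatibility check might look something
like this in C:

	/* Sketch only: a service advertises the major revision levels
	 * it can provide or emulate.  All names are made up. */
	struct revision {
		int major;	/* the 10 in `10.2' */
		int minor;	/* the .2: not known to break any program */
	};

	struct service {
		char		*name;		/* e.g. `Print Service' */
		struct revision	*supported;	/* levels provided or emulated */
		int		 nsupported;
	};

	/* Can this service stand in for the requested major revision?
	 * Minors are ignored, since by definition a later minor is not
	 * known to break anything. */
	int
	can_serve(struct service *s, int wanted_major)
	{
		int i;

		for (i = 0; i < s->nsupported; i++)
			if (s->supported[i].major == wanted_major)
				return 1;
		return 0;
	}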

If a program breaks inside or `on the border' of a library routine,
it stores that fact, and the revision level of the library it was using.
Thereafter, it will ask for `Print Service 8.0 - 10.0', and if only 10.2
is available, it will fail with a robust error, perhaps searching elsewhere
on the system for another print server or archived library of old service
routines.  In fact, most services can deal quite handily with such problems,
simply by having backup disk storage that contains the older services, if
a particular site has programs that need them.  If you don't need the older
services, then don't store them.  The worst that will happen is that your
program will try the new one, fail, back up to the error (if possible),
or restart if not, and ask you to make the old one available.
On a microcomputer, this might mean inserting a floppy.  Quite a bit friendlier
than weird data errors, hein?  
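
A minimal sketch of the calling side, again with invented names and a
canned table of installed revisions standing in for whatever the real
system would consult:

	/* Sketch only: the program asks for a revision range, skips
	 * levels it has already recorded as bad, and reports a robust
	 * failure if nothing acceptable is on line. */
	#include <stdio.h>

	static int bad_majors[16];	/* majors this program has seen fail */
	static int nbad;

	static int installed[] = { 8, 9, 10 };	/* what the system has on line */

	static int
	is_bad(int major)
	{
		int i;

		for (i = 0; i < nbad; i++)
			if (bad_majors[i] == major)
				return 1;
		return 0;
	}

	/* Newest installed revision in [lo, hi] not known to be bad, or -1. */
	static int
	pick_revision(int lo, int hi)
	{
		int i, best = -1;

		for (i = 0; i < (int)(sizeof installed / sizeof installed[0]); i++)
			if (installed[i] >= lo && installed[i] <= hi &&
			    !is_bad(installed[i]) && installed[i] > best)
				best = installed[i];
		return best;
	}

	int
	main(void)
	{
		int rev;

		bad_majors[nbad++] = 10;	/* say revision 10 broke us last time */
		rev = pick_revision(8, 10);	/* `Print Service 8.0 - 10.0' */
		if (rev < 0) {
			fprintf(stderr, "no acceptable Print Service on line; "
			    "please make revision 8 or 9 available\n");
			return 1;
		}
		printf("using Print Service %d.x\n", rev);
		return 0;
	}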

XNS uses a similar system for services to know whether or not they can 
serve various programs, though I don't know if the program-based revision
tracking is automatic.  It might be by now.

This method is effective because:

	It frees you, unlike "check the interface", from having to debug
	the whole system before getting an actual solution.

	The revision-tracking is automatic.

	Programs assume new services will work until they actually fail.

	Programs can find problems and log them, notifying the user,
	or users can find problems.  In either case, the `buggy' revision
	will no longer be used by that program.  Or at least, that copy
	of that program.  An alternative would be to have the library store
	the failed-program data, but that would impose a burden.

	Unlike "Revision X.Y or greater", such as the Amiga uses, it does
	not assume that upgrades are always robust.  As anyone involved in
	large systems design should know, the NUMBER of bugs remains constant
	above a certain size... they only move around.

	It is being effectively employed, at least partially, in XNS, and 
	I believe that a similar, though less straightforward, system is
	used in IBM mainframes.

Some disadvantages:

	Programs would have to store failed-version information on every
	shared library they use.  This is fairly minimal in terms of size,
	but restarts and retries could use up a fair bit of computing time
	where libraries change often or many copies of a program exist.

	The automatic-logging aspect of the system would be subject to bugs.

	Users could become `spoiled' enough to count on the system to find
	incompatibilities, and fail to look for data errors themselves.

	Shared libraries would have to be checked, on open, for compatibility.

Considering that some of these problems already exist in current
bug-spotting procedures, and that the worst addition is a little extra data
and a few more cycles to open libraries, the scheme seems a net win overall.

Any comments, particularly from those who have used distributed services
under such a system?

Perhaps more importantly from a UNIX point of view, could it be effectively
implemented on today's systems?

This has been an interesting debate.  Keep it up.

	Craig Hubley, Unicus Corporation, Toronto, Ont.
	craig@Unicus.COM				(Internet)
	{uunet!mnetor, utzoo!utcsri}!unicus!craig	(dumb uucp)
	mnetor!unicus!craig@uunet.uu.net		(dumb arpa)

steve@nuchat.UUCP (Steve Nuchia) (10/16/87)

In article <1057@unicus.UUCP>, craig@unicus.UUCP (Craig D. Hubley) writes:
> One effective way to deal with revisions to shared libraries is to keep
> several versions around, and have PROGRAMS know which revision levels they 
> can count on to perform correctly.

With you so far...

> If a program breaks inside or `on the border' of a library routine,
> it stores that fact, and the revision level of the library it was using.
> Thereafter, it will ask for `Print Service 8.0 - 10.0', and if only 10.2
> is available, it will fail with a robust error, perhaps searching elsewhere
> on the system for another print server or archived library of old service
> routines.  In fact, most services can deal quite handily with such problems,
> simply by having backup disk storage that contains the older services, if
> a particular site has programs that need them.  If you don't need the older
> services, then don't store them.  The worst that will happen is that your

Still with you...

> program will try the new one, fail, back up to the error (if possible),
> or restart if not, and ask you to make the old one available.

Does this not beg the question of how the program _detects_ the failure?

> On a microcomputer, this might mean inserting a floppy.  Quite a bit friendlier
> than weird data errors, hein?  

Jah, if it works.

> XNS uses a similar system for services to know whether or not they can 
> serve various programs, though I don't know if the program-based revision
> tracking is automatic.  It might be by now.

Scarier and scarier.

> This method is effective because:
> 	It frees you, unlike "check the interface", from having to debug
> 	the whole system before getting an actual solution.

I think I understand you to be saying that your approach allows the system
to run in the presence of a new, untested library?  How does this differ
(in the light of the sequel) from the "old way"?

> 	The revision-tracking is automatic.

True, under the assumptions.  Is this a Good Thing?

> 	Programs assume new services will work until they actually fail.

This is the heart of the matter.  Your proposal is for an optimistic
policy, whereas the traditional approach is pessimistic.  In the pessimistic
approach a program asks for the library (or libraries) it has been tested
with, and someone has to manually update its idea of which libraries are
good.  In your optimistic approach a program would use the latest available
library that had not been _found_to_be_buggy_ (in a relative sense).
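
To make the contrast concrete (a sketch only, with invented names; neither
list is meant to be any real system's format):

	/* Pessimistic: use only a revision someone has explicitly
	 * blessed after testing; the known_good list is maintained
	 * by hand. */
	int
	pick_pessimistic(int *avail, int navail, int *known_good, int ngood)
	{
		int i, j, best = -1;

		for (i = 0; i < navail; i++)
			for (j = 0; j < ngood; j++)
				if (avail[i] == known_good[j] && avail[i] > best)
					best = avail[i];
		return best;
	}

	/* Optimistic: use the newest revision the program has not yet
	 * recorded as buggy; the known_bad list grows automatically
	 * as failures are detected. */
	int
	pick_optimistic(int *avail, int navail, int *known_bad, int nbad)
	{
		int i, j, bad, best = -1;

		for (i = 0; i < navail; i++) {
			for (j = 0, bad = 0; j < nbad; j++)
				if (avail[i] == known_bad[j])
					bad = 1;
			if (!bad && avail[i] > best)
				best = avail[i];
		}
		return best;
	}

The only real difference is which list a human has to maintain, and when
it gets updated.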

> 	Programs can find problems and log them, notifying the user,
> 	or users can find problems.  In either case, the `buggy' revision
> 	will no longer be used by that program.  Or at least, that copy
> 	of that program.  An alternative would be to have the library store
> 	the failed-program data, but that would impose a burden.

Exactly how are programs to do this?  Is this not a close relative to
the halting problem?  I've heard that the ESS5 control program was
over 75% "audit" code - keeping an eye on the other 25%.  This seems
like an extreme penalty (if my understanding is correct) for not
proving the operative 25%, and illustrates the practical difficulty
of software self-test.

> 	Unlike "Revision X.Y or greater", such as the Amiga uses, it does
> 	not assume that upgrades are always robust.  As anyone involved in
> 	large systems design should know, the NUMBER of bugs remains constant
> 	above a certain size... they only move around.

Agreed, the x.y or greater approach is even more optimistic than yours,
since it makes no explicit provision for buggy (just "old") libraries.

> 	It is being effectively employed, at least partially, in XNS, and 
> 	I believe that a similar, though less straightforward, system is
> 	used in IBM mainframes.

Perhaps I misunderstand you.  Do these operational systems employ human
intervention in the error detection loop?

> Some disadvantages:
> 	Programs would have to store failed-version information on every
> 	shared library they use.  This is fairly minimal in terms of size,
> 	but restarts and retries could use up a fair bit of computing time
> 	where libraries change often or many copies of a program exist.

Looks like a proper analysis.

> 	The automatic-logging aspect of the system would be subject to bugs.

True, but such things can be managed easily - any _specific_ library
service can be made robust; the difficulty that brings us to this
discussion lies in making a large, diverse, and ever-changing
collection of services robust in the aggregate.

> 	Users could become `spoiled' enough to count on the system to find
> 	incompatibilities, and fail to look for data errors themselves.

Naive users are a problem in many areas, password security being one of the
best known, with inadequate failure reporting running a close second.

> 	Shared libraries would have to be checked, on open, for compatibility.

If by this you mean comparing them against the stored list of compatibilities,
I had understood this to be a part of the overhead of that scheme.  Do you
have something else in mind?  Perhaps you allude to the "testing" of the
library on first encounter?

> Considering that some of these problems already exist in current
> bug-spotting procedures, and that the worst addition is a little extra data
> and a few more cycles to open libraries, the scheme seems a net win overall.

Actually, assuming I properly understand you, user complacency is
probably the worst thing that gets added, especially if it extends to
the software engineering folks, who _should_ be testing things and
not relying on a mathematically unsound (isomorphic with the halting
problem) problem-detection and logging scheme.

> Any comments, particularly from those who have used distributed services
> under such a system?

I think the system you advocate, call it "optimistic but reactionary",
is a useful addition to the family of library-sharing algorithms.  It
should not be expected to work miracles, and indeed should be seen as
a way of integrating _user_ problem reporting into the library upgrade
cycle rather than eliminating human testing.

> This has been an interesting debate.  Keep it up.
I concur.
-- 
Steve Nuchia	    | [...] but the machine would probably be allowed no mercy.
uunet!nuchat!steve  | In other words then, if a machine is expected to be
(713) 334 6720	    | infallible, it cannot be intelligent.  - Alan Turing, 1947

daveb@geac.UUCP (10/18/87)

In article <400@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>In article <1057@unicus.UUCP>, craig@unicus.UUCP (Craig D. Hubley) writes:
>> program will try the new one, fail, back up to the error (if possible),
>> or restart if not, and ask you to make the old one available.
>
>Does this not beg the question of how the program _detects_ the failure?
> ...
>I think I understand you to be saying that your approach allows the system
>to run in the presence of a new, untested library?  How does this differ
>(in the light of the sequel) from the "old way"?

  One technique actually used was to have a human detect certain
errors by running with "EXperimental_Library" in her search path
before the tested libraries.  If a program-detectable error (a
mismatch, in practice) occurred, she got an error message.  If she
detected an error herself or received a message, she contacted the
author of the library or the system administrator (since that was
easy) before setting up a special referencing domain for the program
(which was not so easy, but at least possible).
  Humans would actually use >exl in their search paths, to be sure
of getting the newest versions of things.  Even I did. 
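
That ordering trick is easy to picture in C terms (this is not Multics
code; the directory names and the resolve() routine are invented for the
sketch):

	#include <stdio.h>
	#include <unistd.h>

	/* Directories earlier in the list shadow later ones, so putting
	 * an experimental directory first means new libraries get
	 * exercised while the tested ones still back them up. */
	static const char *rules[] = {
		"/lib/experimental",	/* the tester's >exl analogue */
		"/lib/tested",
	};

	int
	resolve(const char *name, char *path, size_t pathlen)
	{
		int i;

		for (i = 0; i < (int)(sizeof rules / sizeof rules[0]); i++) {
			snprintf(path, pathlen, "%s/%s", rules[i], name);
			if (access(path, R_OK) == 0)
				return 0;	/* first hit wins */
		}
		return -1;			/* not found anywhere */
	}

Putting the experimental directory first in one's own rules gets the newest
library exercised without disturbing anyone who uses the default rules.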

  There were other support facilities underneath the human tester,
obviously.  The most important was a translate-to-different-version
routine that the author of the new, improved library was required to
write for each incompatible data-structure or file-format change.
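
For what it's worth, a sketch of the shape of such a routine (the record
layouts here are invented, not the ones we actually had):

	#include <string.h>

	struct print_req_v1 {		/* old layout */
		char	file[64];
		int	copies;
	};

	struct print_req_v2 {		/* new layout adds a banner field */
		char	file[128];
		int	copies;
		char	banner[32];
	};

	/* One such translator per incompatible structure or file format,
	 * supplied by the author of the new library. */
	void
	print_req_v1_to_v2(const struct print_req_v1 *old, struct print_req_v2 *out)
	{
		memset(out, 0, sizeof *out);
		memcpy(out->file, old->file, sizeof old->file);
		out->copies = old->copies;
		strcpy(out->banner, "none");	/* default for the new field */
	}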

 --dave (I was discussing Multics, you understand) c-b
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

blarson@skat.usc.edu (Bob Larson) (10/20/87)

In article <1629@geac.UUCP> daveb@geac.UUCP (Dave Collier-Brown) writes:
>  One technique actually used was to have a human detect certain
>errors by running with "EXperimental_Library" in her search path
>before the tested libraries.  If a program-detectable error (a

It is possible to do this on (modern) Primos as well.  I've never
seen the code to rearrange search rules for a single program, but it is
possible.  (Although it is probably easier to fix the new library...)
--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson		blarson@skat.usc.edu
Prime mailing list (requests):	info-prime-request%fns1@ecla.usc.edu