[comp.risks] RISKS DIGEST 9.66

risks@CSL.SRI.COM (RISKS Forum) (02/06/90)

RISKS-LIST: RISKS-FORUM Digest  Monday 5 February 1990   Volume 9 : Issue 66

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS 
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Another SAGE memoir (Jon Jacky)
  DoD plans another attack on the "software crisis" (Jon Jacky)
  The Cultural Dimensions of Educational Computing (Phil Agre)
  Vincennes' Aegis System: Why did RISKS ignore specifications? (R. Horn)
  Computer Virus Book of Records (Simson L. Garfinkel)
  Re: AT&T (Gene Spafford, David Keppel, Stanley Chow)
  Sendmail (Brian Kantor, Rayan Zachariassen, Geoffrey H. Cooper, 
    Kyle Jones, Craig Everhart)
  Re: Risks of Voicemail systems (Randall Davis)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in good
taste, objective, coherent, concise, and nonrepetitious.  Diversity is welcome.
CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive "Subject:" line
(otherwise they may be ignored).  REQUESTS to RISKS-Request@CSL.SRI.COM.
TO FTP VOL i ISSUE j:  ftp CRVAX.sri.com<CR>login anonymous<CR>AnyNonNullPW<CR>
  cd sys$user2:[risks]<CR>get risks-i.j .  Vol summaries now in risks-i.0 (j=0)

----------------------------------------------------------------------

Date:    Sun, 4 Feb 1990 14:46:36 PST
From: JON@GAFFER.RAD.WASHINGTON.EDU   (Jon Jacky)
Subject: Another SAGE memoir

Les Earnest's posting on SAGE reminded me of an anecdote James Fallows tells
in his book, NATIONAL DEFENSE (Vintage Books, 1982, p. 59):

  "As a child in California, I grew up five miles from SAGE headquarters at
  Norton Air Force Base.  Each year our classes would take school field trips
  to Norton.  The dramatic conclusion came when we were ushered into the SAGE
  control room.  The commanding general would appear at this point and attempt
  a demonstration of how quickly and reliably his system worked.  In every
  instance I can remember, there was a technical screw-up of some kind, and the
  general would lead us out, assuring us that, heh heh, this sort of thing did
  not happen very often."

Much of Fallows' book is a critique of technically complex weapons systems,
which many readers of this digest would find interesting.  Fallows summarizes
the SAGE story this way (also on p. 59):

  "Wouldn't it be wonderful if, instead of leaving aerial combat to a group of
  pilots trying to figure it out for themselves which enemy planes to destroy,
  the whole enterprise could be automatically controlled from the ground?  If
  you had a huge radar and computer complex, it might be able to identify all
  the "friendly" and "enemy" planes in the sky and rationally distribute
  assignments for shooting them.  Then it could transmit commands to each
  fighter plane, guiding it precisely to its target.  Visions of this sort lay
  behind a $20 billion radar complex of the sixties known as SAGE --- which,
  after countless revisions, finally foundered due to the technical complexity
  of devising a computer program that could keep the friendly and enemy planes
  straight.  Nonetheless, the Air Force and Navy have invested further billions
  in radar planes known as the AWACS and E-2, which face the far greater
  technical challenge of doing the same thing from a single plane in the air."

In a footnote Fallows says more of the technical difficulties of distinguishing
friend from foe:

  "The real problem was that since planes in a dogfight fly in unpredictable
  patterns, when two "blips" from two planes crossed on the radar screens the
  computer could not be sure which plane was which when they separated again.
  ..."

(Fallows doesn't give a source for this explanation of the foundering of SAGE;
the more usual explanation is that SAGE became obsolete after the Soviet Union
concentrated on aiming ICBM's, rather than manned bombers, at the continental
U.S.A.  I remember a local news story around 1982 saying they were finally
shutting down the SAGE installation at McChord Air Force Base near Seattle,
Washington).

- Jon Jacky, University of Washington

------------------------------

Date:    Sun, 4 Feb 1990 15:23:21 PST
From: JON@GAFFER.RAD.WASHINGTON.EDU   (Jon Jacky)
Subject: DoD plans another attack on the "software crisis"

Here are excerpts from ELECTRONICS ENGINEERING TIMES, Jan 29 1990, p. 16:

DOD PLAN ADDRESSES SOFTWARE PROBLEMS by Brian Robinson

Washington - The Defense Department is expected to go public next month with an
ambitious plan aimed at solving its growing software problems.  The product of
an agency-wide collaboration, the plan represents the first time the Pentagon
has managed to get a broad consensus on the issue.

The master plan, to be implemented over five years, will tackle the rapidly 
expanding size and development costs of defense software, a problem made worse 
by the tendency of different groups within the Pentagon to go their separate
ways when it comes to software requirements.

Some 20 groups within the defense community reportedly took part, including the
Army, Navy, Air Force, Defense Communications Agency, National Security Agency,
and the Defense Advanced Research Projects Agency.

The plan covers six major topics: software acquisition and life cycle
management, government software policies, organizational coordination and
cooperation, personnel, the software technology base and software technology
transition. ...

Pentagon analysts have predicted growing problems as military systems expand
in size and complexity and as projects are developed that require programs
with many millions of lines of code, the Strategic Defense Initiative 
being the prime example. ...

A House investigations subcommittee report late last year accused the Defense
Department and other federal agencies of putting lives at risk and wasting 
billions of dollars with substandard software ...

The National Research Council also condemned the current state of software and
development practices, contending that researchers in government and industry
had not kept up with the development of complex software systems.

The DoD will collect public comments on the plan at a forum April 3 -- 5 in 
Falls Church, VA.

(The article does not mention who, or which agency, was the source of this
story.  The article also does not mention any of the DoD agencies and projects
already charged with this problem, including STARS, SEI, AJPO, or the 1987
Defense Science Board study).

- Jon Jacky, University of Washington

------------------------------

Date: Sun, 4 Feb 90 18:05:30 199
From: "Phil Agre" <agre@gargoyle.uchicago.edu>
Subject: The Cultural Dimensions of Educational Computing

Anyone who is interested in technology as a cultural phenomenon will probably
want to read the following book:

C. A. Bowers, The Cultural Dimensions of Educational Computing: Understanding
the Non-Neutrality of Technology, Teacher's College Press, 1988.

It is often said that computers are neutral in that, like pencils and hammers,
they can be used for either good or evil.  This might be true on some possible
interpretations, but Bowers argues that it is false on a long list of others.
Specifically, he argues that particular computer systems for education often
incorporate unarticulated assumptions about computers, about thinking, about
society, and about the relations among these things, and that the use of these
systems can inculcate or reinforce the uncritical, even unwitting acceptance
of those assumptions by students and teachers alike.  He gives many examples,
and his arguments seem to me to apply equally well to a wide variety of other
applications of computers.  In developing his arguments, he touches on a wide
variety of critiques of technology and of computation as social phenomena.
Although he has done a valuable public service in presenting these ideas in
accessible ways, the principal weak spot of the book is a sometimes excessive
credulity toward the critiques themselves, which are not of uniform quality.
Reading him thus calls for a critical and selective attitude as well as an
open mind, but then it is exactly his point that we should bring such an
attitude to everything that concerns technology.  Highly recommended.

------------------------------

Date: Mon, 5 Feb 90 09:15 EST
From: HORN%HYDRA@sdi.polaroid.com
Subject: Vincennes' Aegis System: Why did RISKS ignore specifications? 

Recent continuation of the Vincennes controversy in the Naval press
spurs an observation:

     When faced with a catastrophic failure, the non-computer naval
     community is analyzing the system specifications and comparing
     them with the actions taken (albeit non-mathematically).  When
     faced with the same catastrophe, the RISKS community utterly
     ignored the specifications and leapt into discussion of potential
     software flaws and changes.  This from a group where proof of
     conformance to specification has a strong following.

By specifications I refer not to the engineering documents used in building the
shipboard equipment.  I mean the laws and treaties governing the behaviour of
combatant and non-combatant in areas of conflict.  They did and do have direct
relevance to the computer systems.

There have been at least five relevant treaties covering such behaviour in the
last century.  There is a tremendous literature exploring issues and
alternatives.  Situations like the Vincennes are explicitly explored and
analyzed.  These are the recognized specifications for how all parties should
behave.

I am not interested in renewing controversy over the events.  Parties
interested in the relevant treaties and laws might start with the overview in
_International Law for Seagoing Officers_, Roberts, and branch out from there
into details and current discussions.

I am interested in some introspective analysis of why the computer community of
RISKS totally ignored the specifications.  Understanding this behaviour can
lead to understanding a common failure mode of computer systems.

Was it ignorance?  If so, why no requests for information?  Why such
willingness to proceed in ignorance?

Was it fear that the discussion of treaty and law might degenerate into
political arguments?  If so, how can systems involving political sensitivity be
subject to specification and analysis?

Was it other group dynamics?  If so, how can these be controlled productively?
(In my own case I would place this as dominant.  I became interested in the
discussions and overlooked the growing irrelevancy.  Furthermore, the group
dynamic discouraging radical departures from the current topic of discussion
made me hesitate to change topics.)

What other factors were involved?

I can think of only one invalid excuse.  Unlike most systems, these
specifications are readily available to the public world wide.

R Horn    horn%hydra@polaroid.com

  [I don't think the RISKS community completely ignored the `specifications'.
  See my summary of Matt Jaffe's discussion, RISKS-8.74, 26 May 1989.  PGN]

------------------------------

Date: 4 Feb 90 14:46:06 EST (Sun)
From: simsong@prose.CAMBRIDGE.MA.US (Simson L. Garfinkel)
Subject: Computer Virus Book of Records

(This is chart 74 from the National Center for Computer Crime Data's
1989 report, Commitment to security.)  Please forgive any typos.

$97,000,000	John McAfee's estimate of the cost of the "Internet Worm."
		[John McAfee is president of the Computer Virus Industry
		Association.]

$10,000,000	Cliff Stoll's estimate of cost of "Internet Worm."
		(See $100,000.)

250,000		Richard Brandow's estimate of the number of computers
		his "World Peace Virus" infected.

$250,000	Cost of reacting to "Internet Worm" at Los Alamos
		National Laboratory.

$200,000	Gene Spafford's estimate of cost of "Internet Worm."

168,000		Records destroyed by one computer trojan horse planted
		in Texas.

$100,000	Cliff Stoll's low bound estimate of cost of "Internet
		Worm" (see $10,000,000)

$72,500		NASA Ames' estimate of its loss from "Internet Worm"

8,000		Gene Spafford's estimate of personnel hours lost battling
		"Internet Worm"

6,000		NCCCD's estimate of number of these hours which were not
		compensated.

6,000		Most common estimate of the number of computers
		affected by "Internet Worm."

3,000		Copies of "world peace" virus found in Aldus software

2,000		Cliff Stoll's estimate of the number of computers
		infected by the "Internet Worm."

800		Computer virus incidents reported to Computer Virus
		Industry Association in first 8 months of 1988 (see 96)

130		Countries in which computers were infected by
		"Christmas Tree" virus

96		Percentage of reports received by Computer Virus
		Industry Association which incorrectly identified viruses.

53		Percentage of National Center survey respondents who
		expected to be using anti-virus software in 1991

28		Articles about viruses listed in Reader's Guide to
		Periodical Literature in 1988 (see 1)

22		Percentage of National Center security survey
		respondents who were using anti-virus products in 1988
		(see 1.5)

21		Editorials arguing that the "Internet Worm"
		demonstrates the need for greater commitment to
		security (see 10).

14		Computer virus cartoons collected at NCCCD

10		States currently considering new computer crime laws
		to fight viruses

10		Letters to the editor saying we should applaud rather
		than punish those who set loose computer viruses (see 21)

8		Letters to the editor and columns calling for
		punishment of those who set loose computer viruses (see 10)

7		Editorials calling for tough law enforcement against
		computer virus vandals.

6		Books in English on viruses

5		Years since term "virus" was coined.

5		Publicized calls for computer ethics in light of
		"Internet Worm"

4		State computer crime law virus prosecutions

2		Civil suits over viruses

1.5		Percentage of National Center survey respondents who
		were using anti-virus products in 1985.  (See 22,53)

1		Articles about viruses listed in Reader's Guide to Periodical
		Literature in 1987 (see 28)

1		Federal computer crime law virus prosecutions.

------------------------------

From: spaf@cs.purdue.edu (Gene Spafford)
Subject: Re: AT&T (RISKS-9.62)
Date: 27 Jan 90 17:52:42 GMT

In article <CMM.0.88.633398480.risks@hercules.csl.sri.com> risks@csl.sri.com writes:
>From Telephony, Jan 22, 1990 p11:
>
>    The problem began the afternoon of Jan 15 when a piece of trunk
>    interface equipment developed internal problems for reasons that
>    have yet to be determined.

An interesting twist to this: several members of the media have gotten phone
calls from a rogue hacker claiming that he and a few friends had broken into
the NYC switch and were "looking around" at the time of the incident.

This raises two interesting (at least to me) possibilities: 

  1) They had, indeed, broken in, and were responsible for the crash.
     (Don't blindly accept published statements from AT&T that it
     was all a simple glitch.  Stories told off-the-record by law
     enforcement personnel and telco security indicate this kind of
     break-in is common.)

     If this is true, what to do from here?  Obviously, this raises
     some major security questions about how best to protect our phone
     systems.  It also raises some interesting social/legal questions.
     The nationwide losses here are probably greater than the Internet
     Worm, but the federal Computer Fraud and Abuse Act doesn't cover it
     (only one system tampered with).  Other laws may cover it, but
     is there any hope of proving it and prosecuting?

  2) These guys were not on the machine but are trying to get the
     press to publish their names as the ones responsible.  This would
     greatly enhance their image in the cracker/phreaker community.
     It's akin to having the Islamic Jihad call up and claim that a
     suicide caller had crashed the system (to protest dial-a-prayer
     and dial-a-porn, perhaps; remember that the Great Satan is a
     local call from NYC :-).   It raises interesting questions about how
     the press should handle such claims, and how we should react to them.

A third possibility exists, of course, that those guys had hacked into the
switch, but they had nothing to do with the failure.  That raises both sets of
questions.

I worry that it won't be long before this kind of thing happens and the phone
calls ARE from some terrorist group claiming responsibility: "We are holding
your dial tone hostage until you get your troops out of Panama, make abortion
illegal, stop killing animals for fur, and prevent Peter Neumann from making
more puns."

Or, perhaps AT&T security gets a call like: "We've planted a logic bomb in the
switching code.  Put $1 million in small unmarked bills in the following locker
at the bus station, or in 4 hours every call made in Boston will get routed to
dial-a-porn numbers in NYC.  We'll tell you how to fix it as soon as we get the
money."

Any bets that something like this will happen this year?  Last year's WANK worm
and politically-motivated viruses seem to suggest the time is ripe.

Gene Spafford, NSF/Purdue/U of Florida  Software Engineering Research Center,
Dept. of Computer Sciences, Purdue University, W. Lafayette IN 47907-2004
                         uucp	...!{decwrl,gatech,ucbvax}!purdue!spaf

    [By the way, AT&T is certain it was an open&shut (a no-pun&shut?) case of 
    a hardware-triggered software flaw, reproducible in the testbed ...  PGN]

------------------------------

Date: 1 Feb 90 02:26:40 GMT
From: pardo@cs.washington.edu (David Keppel)
Subject: Re: AT&T (RISKS-9.63)

In RISKS 9:63, William Murray (WHMurray.Catwalk@DOCKMASTER.NCSC.MIL) writes:
>AUTOMATE ONLY THOSE THINGS THAT HAPPEN WITH SUFFICIENT FREQUENCY THAT
>AUTOMATION IS JUSTIFIED.  AVOID GRATUITOUS AUTOMATION.

``Sufficient frequency'' should be qualified.  When I was a system manager I
was frequently glad that our Unix would reboot itself after minor panics.

That reminds me of a chap named Ferguson who built cars in the 60's.  They had
4-wheel drive and anti-skid braking.  He said ``They [the safety features] only
need to save your life *once* to have them pay for themselves.''

       		;-D on  ( Rhino boot )  Pardo

------------------------------

Date: 	Wed, 31 Jan 90 00:30:14 EST
From: Stanley Chow <schow@bcarh185.BNR.CA>
Subject: Re: AT&T (RISKS-9.62)

I would like to make a comment regarding the AT&T incident. 

I want to state clearly that I work for Bell-Northern Research. We are the R&D
arm of Northern Telecom, which happens to be in hot competition with AT&T. In
particular, I work on the DMS switches, for which 4ESS is the prime
competition.  In no way do I represent the official views of BNR or NT. What
follows is strictly the observations of a computer professional.

The article does not answer the key question:

   How can such a simple and REPRODUCIBLE bug be released to the
   field? Especially in such a critical arena?

Note that a number of things had to happen:

 1) The failure of a single piece of equipment turns into the
    (perceived) failure of all equipment at the same site. Thus,
    "Multiplying" the problem.

    I.e., recovery of a single trunk ends up sending messages out through
    ALL trunks.

 2) Recovery action of "healthy" sites ends up "Spreading" the problem.

    The recovery of a trunk should not cause the collapse of the whole site.

 3) Testing does not catch the problem.

    The problem is reproducible and should have been caught in a pre-release
    simulated real-life testing of the error recovery system.

All of these are likely flaws in any system. Error recovery is rarely
needed - on most systems, you just reboot the machine or just logoff/logon to
your account again. As a result, most people don't think about error recovery,
much less test it. Unfortunately, error recovery is very difficult to design
and even harder to test.

Stanley Chow, BNR BitNet:  schow@BNR.CA  UUCP: ..!psuvax1!BNR.CA.bitnet!schow
      (613) 763-2831		     ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh185

------------------------------

Date: Fri, 2 Feb 90 22:12:52 -0800
From: brian@ucsd.edu (Brian Kantor)
Subject: Re: sendmail flaw

It is not an SMTP requirement to complete delivery between receipt of the "."
signifying the end of the message and returning the "250 OK" message.  It is
perfectly valid to simply store the message and return the OK; you do NOT have
to deliver in real time while the sender waits.

That sendmail often does this is perhaps a common flaw, but don't confuse it
with any RFC requirement!  It's valid to accept the message and mail back a
delivery failure later.  Probably sendmail should do that.

The most common cause of long waits is expanding mailing lists;
sometimes this takes so long that the sender times out and resends the
message on the assumption that it's failed.  However, the recipient
sendmail believes it to have succeeded, so lots of people get lots of
copies of the message until such time as the network environment lets
things happen within the timeout limits.

We at UCSD have most of our mailing lists explode in deferred time so
that an incoming message for one of them is just stored and immediately
acknowledged.  It is then delivered later.
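
The store-then-acknowledge arrangement described above can be sketched
roughly as follows.  This is only an illustration under stated assumptions,
not sendmail's or UCSD's actual code: the spool layout, the file naming, and
the names accept_message, run_queue, and deliver are all hypothetical.

```python
import os
import tempfile
import uuid


def accept_message(spool_dir, recipients, body):
    """Store the message durably, then return the SMTP reply to send.

    No list expansion or delivery happens here, so the reply can go out
    as soon as the data is safely on disk.  (Hypothetical spool format:
    one recipient per line, a blank line, then the body.)
    """
    queue_id = uuid.uuid4().hex
    tmp_path = os.path.join(spool_dir, "tmp-" + queue_id)
    final_path = os.path.join(spool_dir, "msg-" + queue_id)
    with open(tmp_path, "w") as f:
        f.write("\n".join(recipients) + "\n\n" + body)
        f.flush()
        os.fsync(f.fileno())          # force the data to disk before replying
    os.replace(tmp_path, final_path)  # atomic rename: the message is now queued
    return "250 OK", queue_id


def run_queue(spool_dir, deliver):
    """Later, in deferred time: deliver each queued message, then remove it."""
    delivered = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.startswith("msg-"):
            continue                  # skip partially written temp files
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            header, _, body = f.read().partition("\n\n")
        for rcpt in header.splitlines():
            deliver(rcpt, body)       # 'deliver' is a caller-supplied stand-in
            delivered.append(rcpt)
        os.remove(path)
    return delivered
```

The point of the split is that the sender's wait covers only the disk write
in accept_message; slow list expansion or nameserver lookups would happen
inside run_queue, where no remote timeout is running.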

Another common reason for long waits between "." and "250 OK" is the
time taken to process headers; if that invokes calls to the system
nameserver to look something up in the DNS, there might well be a
delay.  That's not good planning but lots of people do it.

Sending sites probably ought to have their timeout set to around 15
minutes to a couple of hours to avoid these problems, at least until
sendmail is fixed.

Finally, sendmail has a tendency to do invalid longjumps on timeouts of
various kinds: occasionally the stack is buggered and the longjump
winds up causing it to die horribly, leaving the list of delivered
addresses un-updated.  Then the next queue run happens and the people
early on the list get the message again and again and again....

I've heard rumors of a patch to sendmail that makes it checkpoint the
delivery list (the qf file) after every successful delivery.  That
solves that problem, but it's really a bandaid on the bad longjump
problem.  I don't have a copy of that patch.

I know people swear at sendmail; it's a difficult program to understand
and it's been worked on by a lot of people, so some degree of bit-rot
has indeed set in.  But I'm pleased that it works as well as it does.
It just happens to be one of those programs that causes maximum user
annoyance when it goes wrong.
                         			brian@ucsd.edu ucsd!brian
Brian Kantor, UCSD Network Operations, UCSD C-024, La Jolla, CA 92093-0124

------------------------------

Date: 	Sat, 3 Feb 90 16:37:14 EST
From: rayan@cs.toronto.edu (Rayan Zachariassen)
Subject: Re: Sendmail Flaw 

I would rephrase the SMTP problem slightly, and draw different conclusions:

After the message terminator ("." CRLF) has been sent, there are three
possible states:

1.  The server SMTP crashes before accepting responsibility for delivery
    (defined by receipt of an OK code at the client SMTP).
2.  The server SMTP crashes after accepting responsibility for delivery
    but before it can deliver the OK code to the client SMTP.
3.  The server SMTP doesn't crash.

What makes this bad is that during synchronous delivery, the final
acceptance OK code isn't returned until the server SMTP has delivered
the message to its recipients.  If the recipient is really an address
exploder, some addresses may be processed to completion before the
server SMTP crashes.  This is a state 1 condition because the server
SMTP has implicitly accepted responsibility for delivery to *some* of
the recipients of the message, but not yet all.

There is also a vulnerable window in state 2 above.  You would think
that the window is very small, but there is ample opportunity for a
swapout or some other act-of-God delay in the execution of the
acknowledgement delivery, during which time the server can crash.

Both of these seem to happen more frequently than people thought.
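
A toy model may make the two failure windows concrete.  It is a sketch under
simplifying assumptions (the server crashes at most once, the client always
retries the whole message, and the retry succeeds); server_attempt,
send_with_retry, and crash_after are hypothetical names, not part of any
real SMTP implementation.

```python
def server_attempt(recipients, copies, crash_after):
    """One synchronous delivery pass; returns True if the OK reaches the client.

    crash_after is the number of deliveries completed before the server
    crashes (None means no crash).  crash_after == len(recipients) models
    state 2: everything was delivered, but the OK was never sent.
    """
    for i, rcpt in enumerate(recipients):
        if crash_after is not None and i == crash_after:
            return False              # crash part-way through the list
        copies[rcpt] += 1
    return crash_after is None


def send_with_retry(recipients, crash_after):
    """Client resends until it sees an OK; returns copies seen per recipient."""
    copies = {rcpt: 0 for rcpt in recipients}
    ok = server_attempt(recipients, copies, crash_after)
    while not ok:
        ok = server_attempt(recipients, copies, None)  # retry, no crash
    return copies
```

For three recipients, a crash after two deliveries leaves the first two with
two copies each, and a crash after all three (state 2) duplicates the
message for everyone; only a crash before any delivery is harmless.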

# (BTW, it was decided in SMTP's design that it was better to have multiple
# messages than to have messages get lost, so it is not considered acceptable
# for the SMTP server to queue the message but not deliver it [synchronously]).

On the contrary, the server SMTP may do anything it wants as long as it
takes responsibility for delivery of the message.  In particular this
means using asynchronous delivery, after simply queueing the message to
decrease the vulnerable window (of state 1).  Some people like the 'real-time'
feedback of synchronous delivery, but it is a dangerous thing to like
given the cost.

There are economic arguments for doing synchronous address verification
in the SMTP protocol (if you are on a volume-charged network, you don't
want to transfer the message data until you know the server SMTP knows
what to do with the message), but doing so also leads to instability on
client SMTP computers as queues build up waiting for a slow server.

Barring economic/bandwidth issues, in message transfer the HOT ROCK
model is very appropriate: you try to get rid of a queued message as
quickly as possible, by almost any means.  This requires asynchronous
checking and delivery in server SMTPs.

See also RFC1047 by Craig Partridge on "Duplicate messages and SMTP".

rayan

------------------------------

Date: Sat, 3 Feb 90 18:43:03 PST
From: geof@aurora.com (Geoffrey H. Cooper)
Subject: Re: Re: sendmail flaw

Thanks for your message.

 >      It is not an SMTP requirement to complete delivery between receipt of
 >      the "." signifying the end of the message and returning the "250 OK"
 >      message.

I stand corrected.  The problem I bring up is in the design of the
protocol and the consequent generalization to software design.  It
would be different if the protocol spec SPECIFIED that no processing
was to be done during this time.  That would certainly diminish the
problem, and one could make the valid argument that this fixes the
problem "enough." After all, by the same top-level reliability
argument, SMTP itself can never guarantee truly reliable mail (only
the sender and recipient of the mail can do that). 

 >      Another common reason for long waits between "." and "250 OK" is the
 >      time taken to process headers; if that invokes calls to the system
 >      NAMESERVER to look something up in the DNS, there might well be a
 >      delay.  That's not good planning but lots of people do it.

(my emphasis added) That one is interesting, since SMTP (and, I admit,
my significant exposure to it) somewhat predates domain naming.  An
example where a lingering bug in a design is made worse by changing
system requirements. 

 >      I know people swear at sendmail.

I'm not one of them.  Although I hate debugging sendmail scripts as
much as the next system type, I'd much rather do that than deal with
binary distribution software that is non-configurable.  After all, my
systems requirements change from time to time, too.

- Geof

------------------------------

Date: Sat, 3 Feb 90 21:09:53 EST
From: kyle@cs.odu.edu (Kyle Jones)
Subject: re: Sendmail Flaw

In RISKS 9.65, Geoffrey H. Cooper writes:
 > The sendmail problem to which our moderator frequently refers is
 > actually a design problem in the SMTP protocol [...]
 >
 > (BTW, it was decided in SMTP's design that it was better to have
 > multiple messages than to have messages get lost, so it is not
 > considered acceptable for the SMTP server to queue the message but
 > not deliver it during the pause in [5]).

I never knew of such a design decision.  It's certainly not applicable
to the mail on the Internet today, considering that the domain system
allows mail to be sent to hosts on networks not directly connected to
the Internet.  Queueing is inevitable since there is no way for the
SMTP-server to wait for final delivery on a network that does not
support notification of that event.

RFC 821, page 2:

   When the recipients have been negotiated the SMTP-sender sends the
   mail data, terminating with a special sequence.  If the SMTP-receiver
   successfully processes the mail data it responds with an OK reply.

Note the word used is "processes", not "delivers".

The RFC also specifies that if the server finds that it can deliver to
some of the recipients but not others, then it should respond with an OK
reply, but also compose and send an "undeliverable mail" notification
message back to the original sender of the message.

If I were writing an SMTP-server I would take the above as an
invitation to queue the message after doing a cursory check of the
recipient addresses, send an OK reply, and dispose of the message at
leisure, sending error notifications as necessary.
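
As a rough sketch of that approach (not code from any real server): the
in-memory list standing in for the queue, the looks_valid and deliver
callbacks, and the function names are all hypothetical, and the reply texts
are illustrative; RFC 821 lists 250 as the success reply and 554 among the
failure replies after the mail data.

```python
def handle_data(sender, recipients, body, queue, looks_valid):
    """Cursory recipient check, queue the message, reply at once.

    No delivery is attempted here; 'queue' is a plain list standing in
    for a real on-disk queue.
    """
    plausible = [r for r in recipients if looks_valid(r)]
    if not plausible:
        return "554 Transaction failed"
    queue.append({"sender": sender, "recipients": plausible, "body": body})
    return "250 OK"


def dispose(queue, deliver):
    """At leisure: attempt delivery; failures become notices to the sender."""
    notifications = []
    while queue:
        msg = queue.pop(0)
        failed = [r for r in msg["recipients"]
                  if not deliver(r, msg["body"])]
        if failed:
            notifications.append({
                "to": msg["sender"],
                "subject": "Undeliverable mail",
                "failed": failed,
            })
    return notifications
```

Here the client's wait ends with the cursory check, and a recipient that
later proves undeliverable produces a notification message rather than a
delayed or failed SMTP reply.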

kyle jones   <kyle@cs.odu.edu>   ...!uunet!talos!kjones

------------------------------

Date: Mon,  5 Feb 90 10:11:48 -0500 (EST)
From: Craig_Everhart@transarc.com
Subject: Re: Sendmail Flaw

Certainly Geof Cooper's problem is inherent in SMTP, but I assumed that PGN's
distribution problems were more sendmail-specific.  To wit: sendmail maintains
each outgoing mail message as a pair of files, a qfXXX file listing headers and
recipients and a dfXXX file listing the mail body (where the two values of XXX
match).  Sendmail processes an outgoing mail request by locking the qf/df pair
(well, the XXX value) and attempting delivery to each of the recipients listed
in the qfXXX file.  When it's made an attempt on each recipient, it writes a
new qfXXX file recording the recipients to which the mail has yet to be
delivered.

In our environment, sendmail executions got interrupted all the time: we
rebooted our mail-handling servers daily, and our sendmail processes would get
stuck on an SMTP connection for all kinds of reasons.  Thus, when our sendmail
would start processing a message with many recipients, its run would often be
interrupted before it had made a complete pass through all the recipients; in
such cases, it would never record the fact that delivery was successful to any
of the recipients.  The next time sendmail started processing that long list of
recipients, it would try 'em all again: bingo, duplicates.

My solution was to have sendmail update the qfXXX file (containing the list of
recipients) after every successful delivery.  This required a little
source-code hacking, but it was very helpful for us.  Not only did we stop
generating lots of duplicate mail, but we also reduced our mail-processing load
so that processing of those many-recipient lists would terminate!
                                                    		    Craig Everhart
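
The per-delivery checkpoint described above might look something like the
following.  This is a sketch, not the actual sendmail patch: the JSON file
is a hypothetical stand-in for the qfXXX control file (whose real format is
different), and write_pending, run_queue_entry, and deliver are made-up
names.

```python
import json
import os
import tempfile


def write_pending(qf_path, pending):
    """Atomically replace the control file with the remaining recipients."""
    tmp = qf_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(pending, f)
        f.flush()
        os.fsync(f.fileno())      # make the checkpoint durable
    os.replace(tmp, qf_path)      # old file stays intact until this succeeds


def run_queue_entry(qf_path, deliver):
    """Attempt each pending recipient, checkpointing after every success.

    If the run is interrupted (modelled here by deliver() raising), the
    control file already omits everyone delivered so far, so the next
    run does not send them the message again.
    """
    with open(qf_path) as f:
        pending = json.load(f)
    for rcpt in list(pending):
        if deliver(rcpt):
            pending.remove(rcpt)
            write_pending(qf_path, pending)
    return pending
```

A crash between a successful deliver() and the following write_pending()
can still produce one duplicate, so this narrows the window to a single
recipient rather than closing it entirely.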

------------------------------

Date: Wed, 24 Jan 90 20:11:14 est
From: davis@ai.mit.edu (Randall Davis)
Subject: Re: Risks of Voicemail systems (RISKS-9.61)

  Date: Thu, 18 Jan 90 08:24:18 EST
  From: r.aminzade@lynx.northeastern.edu
  Subject: Risks of Voicemail systems that expect a human at the other end

  Last night my car had a dead battery (I left the lights on -- something that
  a very simple piece of digital circuitry could have prevented...

Yes, indeed.  And the first time that piece of circuitry failed in any
interesting, amusing, or dangerous way, 40 people would send articles to RISKS
deploring the inexorable trend toward technological overkill in today's
society, suggesting how dumb the engineers were to have replaced the good, old
fashioned manual switches, and pointing out how that sort of failure NEVER
happened with manual switches.

They would of course be right (manual switches fail differently) and they
would have forgotten all the dead batteries that didn't happen.

Three morals:

    Accidents that don't happen rarely make it into the papers or the
    public consciousness, or get factored into the ire over failures.

    As Don Norman put it rather nicely a while back, the baseline on any
    technology isn't zero defects.  Nothing is perfect now, and for any
    change the relevant question is how it works, how it fails, and whether
    on balance it's better than what we had; not whether it's perfect.

    There is no free lunch: if you want the convenience, you have to accept
    the attendant, inevitable risks.

Applying this to the phone system failure, the only perfectly reliable
communication medium is none at all; you have to be in the same room with
someone.  If you want to be someplace else and talk to them, you have to accept
the risk of malfunction.  And you want direct dial international calls, call
waiting, one-touch memory dialing, conference calls, and call forwarding, too?
Then accept the risks inherent in increased complexity that inevitably come
along.  It won't be perfect, but you might be better off than you were.

------------------------------

End of RISKS-FORUM Digest 9.66
************************