[comp.lang.misc] Need reference for "firewall" modularization

masticol@cadenza.rutgers.edu (Steve Masticola) (10/25/90)

Can anyone help out with a reference to "firewalls"? A professor here
says they're a modularization structure which is intended to stop the
spread of error effects within a software system. Unfortunately, he
doesn't remember where he saw the reference.

Thanks for your help!

- Steve (masticol@athos.rutgers.edu)

strom@arnor.uucp (10/26/90)

In article <Oct.24.22.04.05.1990.393@cadenza.rutgers.edu>, masticol@cadenza.rutgers.edu (Steve Masticola) writes:
|> Can anyone help out with a reference to "firewalls"? A professor here
|> says they're a modularization structure which is intended to stop the
|> spread of error effects within a software system. Unfortunately, he
|> doesn't remember where he saw the reference.
|> 
|> Thanks for your help!
|> 
|> - Steve (masticol@athos.rutgers.edu)

I don't know if this is the reference intended by your professor, but I used
this term in my paper with Shaula Yemini, ``Typestate: A Programming Language
Concept for Enhancing Software Reliability'' (IEEE Trans. Software Eng.,
SE-12, 1, January 1986).

As you point out, a firewall is a form of protection.  If programs A and B
are running together on the same machine, and program A contains an error,
it is desirable to confine the effects of the error so that program B
is not affected.  Having firewalls improves reliability (since program
B will still work), security (since program A cannot sabotage program B),
and problem determination (if program A misbehaves, I need not search
in B for the possible cause).  All other things being equal, the
finer-grained your firewalls, the better.

The most common firewalls are the address spaces provided by operating
systems---e.g. UNIX processes, Mach tasks, VM ``virtual machines''.
These firewalls are effective, but heavyweight.  Communication
across address spaces is more expensive than communication within
address spaces.  It would not be practical to put each *module* of
a large software system in a separate address space.
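The tradeoff is easy to demonstrate in any language with access to OS processes. Below is a small illustrative sketch in Python (all names invented; nothing here is from the systems under discussion): the same request served by a plain in-process call and by a child process behind a pipe. The child's separate address space is a genuine firewall, but every call across it pays for message copying and context switches.

```python
# Illustrative sketch: two granularities of "firewall".
from multiprocessing import Pipe, Process

def service(x):
    # The "module" whose failures we would like to confine.
    return x * x

def worker(conn):
    # Runs in a separate address space (a separate process); a wild
    # store here cannot corrupt the parent's memory.
    conn.send(service(conn.recv()))
    conn.close()

if __name__ == "__main__":
    # In-process: cheap, but no firewall between caller and callee.
    assert service(7) == 49

    # Cross-process: protected by the address-space firewall, but
    # every request is copied across the boundary.
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(7)
    assert parent_end.recv() == 49
    p.join()
```

Both paths compute the same answer; only the cost and the degree of isolation differ, which is why one address space per *module* is impractical.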

We would ideally like firewalls to have a granularity as small as a single
module, but without any performance penalty.  Our approach is to
rely on compile-time checking to detect those kinds of programming
errors which, if undetected, would result in undefined, 
implementation-dependent side-effects on other modules.  
Conventional type-checking is inadequate, since
many program bugs are the result of issuing otherwise correct operations
*in the wrong order* --- e.g. storing into a buffer before it has been
allocated.  *Typestate* checking is a dataflow technique which
statically identifies which subset of operations on a particular data
object are legal at which program points.  When an operation is
issued from an incorrect context, it is flagged as an error.  
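To make the mechanism concrete, a typestate checker can be sketched as a single forward dataflow pass that tracks each variable's current state and flags any operation issued from the wrong state. The following is an illustrative Python sketch with invented state and operation names; it is not NIL or Hermes syntax.

```python
# Each operation names the typestate it requires and the typestate it
# leaves the object in.  (Invented states/operations, for illustration.)
TRANSITIONS = {
    "allocate": ("unallocated", "allocated"),
    "store":    ("allocated",   "initialized"),
    "read":     ("initialized", "initialized"),
    "free":     ("initialized", "unallocated"),
}

def check(program):
    """One dataflow pass over straight-line code: report every
    operation issued from an incorrect context."""
    state, errors = {}, []
    for i, (op, var) in enumerate(program):
        required, result = TRANSITIONS[op]
        current = state.get(var, "unallocated")
        if current != required:
            errors.append(f"op {i}: '{op} {var}' needs typestate "
                          f"'{required}', but '{var}' is '{current}'")
        else:
            state[var] = result
    return errors

# The bug from the text: storing into a buffer before it is allocated.
print(check([("store", "buf"), ("allocate", "buf")]))
```

The buggy program is rejected with a diagnostic for the premature store, while the correctly ordered allocate/store/read sequence checks cleanly.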

We incorporated typestate checking in an experimental language called
NIL (Network Implementation Language), and in Hermes.  In these languages,
any module which successfully compiles is guaranteed at execution time
not to corrupt other modules, even if these modules are running in the
same address space.  We thus get the effect of fine-grained ``firewalls''
between modules without the performance penalty.   Modules belonging
to different applications can safely coexist in one address space,
but can communicate as cheaply as modules of the same application.
As a further benefit, we catch an additional class of programming
errors at compile time.


What do you give up to get typestate checking?  First, typestate checking
requires that any aliasing of variables be detectable by the compiler.
As has already been discussed on comp.lang.misc, Hermes is a pointer-free
and hence alias-free language, so this requirement was already met.
Other languages would have to be restricted to meet this requirement.
Second, you have to structure your program to avoid ambiguous typestates.
For example, a variable which is initialized on some paths to a
statement and uninitialized on other paths would have to be explicitly
declared as a variant.  In our experience, the potential for
error detection, the gain in efficiency of cross-application communication,
and the security/reliability/debugging advantages of firewalls are
benefits which are well worth the costs.
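The "ambiguous typestate" restriction can also be sketched: at a control-flow join, the checker merges the typestates arriving along each path, and a disagreement is an error unless the variable is explicitly declared as a variant. Again this is illustrative Python with invented names, not Hermes syntax.

```python
# Illustrative sketch: merging one variable's typestates at a join.
def merge(path_states, declared_variant=False):
    """Combine the typestates reaching a join along several paths."""
    states = set(path_states)
    if len(states) == 1:
        return states.pop()        # unambiguous: keep the single state
    if declared_variant:
        return frozenset(states)   # variant: carry the set, checked at use
    raise TypeError(
        f"ambiguous typestate {sorted(states)} at join; "
        "declare the variable as a variant")

# x initialized on the then-branch only: rejected.
try:
    merge(["initialized", "uninitialized"])
except TypeError as e:
    print(e)

# Explicitly declared as a variant, the merge is accepted.
print(sorted(merge(["initialized", "uninitialized"], declared_variant=True)))
```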


-- 
Rob Strom, strom@ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

nick@cs.edinburgh.ac.uk (Nick Rothwell) (10/26/90)

In article <1990Oct25.193935.375@arnor.uucp>, strom@arnor.uucp writes:
> Conventional type-checking is inadequate, since
> many program bugs are the result of issuing otherwise correct operations
> *in the wrong order* --- e.g. storing into a buffer before it has been
> allocated.

Just a minor point here - that's a fault with conventional procedural
languages with assignable variables, and nothing to do with typechecking.
Functional and logic languages don't have this problem at all.

> In our experience, the potential for
> error detection, the gain in efficiency of cross-application communication,
> and the security/reliability/debugging advantages of firewalls are
> benefits which are well worth the costs.


-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
 "Now remember - and this is most important - you must think in Russian."

gudeman@cs.arizona.edu (David Gudeman) (10/27/90)

In article  <961@skye.cs.ed.ac.uk> nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
]In article <1990Oct25.193935.375@arnor.uucp>, strom@arnor.uucp writes:
]> many program bugs are the result of issuing otherwise correct operations
]> *in the wrong order*

]Just a minor point here - that's a fault with conventional procedural
]languages with assignable variables, and nothing to do with typechecking.
]Functional and logic languages don't have this problem at all.

First, some functional languages and the only well-known logic
language (Prolog) certainly _do_ have this problem.  If you do things
in the wrong order, you may get peculiar results like non-termination.

Second, the implication that this would show some sort of superiority
for non-procedural languages is just wrong.  I could just as well say
"most problems in functional programs are caused by composing the
right functions in the wrong ways -- procedural languages don't have
this problem since you can't compose functions."  or "most problems in
logic languages are caused by unifying otherwise correct terms with
the wrong variables -- procedural languages don't have this problem
since they don't have unification".

Basically, someone screwed up somewhere, and changing the paradigm
just changes the nature of the screw-ups, not the quantity or the
quality.  This is not to say that changing the language itself can't
reduce errors, just that there is no reason to assume that functional
or logic languages by the fact of being function-based or predicate
logic-based reduce errors.  To be sure, functional and logic languages
tend to be higher-level than procedural languages, and higher-level
languages seem to reduce errors, but this matter of semantic level is
independent of whether the language is based on functions, predicates,
procedures, or some combination of the above.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

zed@mdbs.uucp (Bill Smith) (10/28/90)

In article <1990Oct25.193935.375@arnor.uucp> strom@andreadoria.watson.ibm.com (Rob Strom) writes:
>
>In article <Oct.24.22.04.05.1990.393@cadenza.rutgers.edu>, masticol@cadenza.rutgers.edu (Steve Masticola) writes:
>|> Can anyone help out with a reference to "firewalls"? A professor here
>|> says they're a modularization structure which is intended to stop the
>|> spread of error effects within a software system. Unfortunately, he
>|> doesn't remember where he saw the reference.
>|> 
>|> Thanks for your help!
>|> 
>|> - Steve (masticol@athos.rutgers.edu)
>
>I don't know if this is the reference intended by your professor, but I used
>this term in my paper with Shaula Yemini, ``Typestate: A Programming Language
>Concept for Enhancing Software Reliability'' (IEEE Trans. Software Eng.,
>SE-12, 1, January 1986).
>
>As you point out, a firewall is a form of protection.  If programs A and B
>are running together on the same machine, and program A contains an error,
>it is desirable to confine the effects of the error so that program B
>is not affected.  Having firewalls improves reliability (since program
>B will still work), security (since program A cannot sabotage program B),
>and problem determination (if program A misbehaves, I need not search
>in B for the possible cause).  

I find this an excellent philosophy for life under pressure, although I
haven't empirically determined its worth.  It might be too rigid for
me, personally.  (Of course, I'm too rigid for me, personally, too. ;-)

The idea I mean is that each person has to have a private self that
they keep apart from everyone else.  In this way, they become one
with their private self and are able to keep it in whatever shape they
want to.  "Give me some space" is a slang equivalent of "You're trying
to beat on my firewall.  Let me keep myself reliable so that I'm able
to be of use to you when you need me.  I still love you, but everyone
has their limits too."

>All other things being equal, the
>finer-grained your firewalls, the better.

Each firewall should be set according to the desires of the one it
protects.  If a program doesn't work with big, cement firewalls, then
it shouldn't have them.  "Tear down the walls."  But the walls have
to be there until everything works the way the program requires.

>We would ideally like firewalls to have a granularity as small as a single
>module, but without any performance penalty.  Our approach is to
>rely on compile-time checking to detect those kinds of programming
>errors which, if undetected, would result in undefined, 
>implementation-dependent side-effects on other modules.  

Well, run-time checking will always be necessary.  Someone might spill
some Coke on the motherboard.  (Or spill some coke down your nose... ;-)
You don't know what might happen, so it's hard to have a program
without some run-time firewalls, some boundaries that are always set to
prevent it from hurting itself by reformatting the hardware.  This is
the essence of fault tolerance.

>Conventional type-checking is inadequate, since
>many program bugs are the result of issuing otherwise correct operations
>*in the wrong order* --- e.g. storing into a buffer before it has been
>allocated.  

A program has to be willing to work with the operating system first, then
with itself, not the other way around.  If you don't know your own software,
how can you be sure how you relate to the operating system?  I think I
know myself, but do I really?  How can I KNOW myself?  I don't know what
I'll do next.  I pray that it will be good for me and good for others,
but I must take a leap of faith that the OS has been designed by a
good OS development team.  Even if there is "proof" that my program and
the OS work together, I'll still have to take a leap of faith to accept
the proof, the proof system, or whatever it is that makes me sure.

>*Typestate* checking is a dataflow technique which
>statically identifies which subset of operations on a particular data
>object are legal at which program points.  

Personifying this technique, what I'll do is (somehow) find out ahead of
time a set of rules that each person (program) needs to keep his or her
firewalls in good shape.  These rules will have to be appropriate for
a particular data object (person) and chosen by that person (program points).
It is then up to the typestate (person) to follow the situation at a given
time and maintain reality checks (assertions, in CS lingo), so that if a
violation occurs somewhere, the firewall may be activated in full force to
support a safe landing and prevent program crashes.

>When an operation is
>issued from an incorrect context, it is flagged as an error.  

I will complain if you hit me, but is it an error?  Only by examining the 
context will we be able to know for sure.  Until you are able to read my
mind without even looking at me (I hope you will soon) you won't know 
my context.

>We incorporated typestate checking in an experimental language called
>NIL (Network Implementation Language), and in Hermes.  In these languages,
>any module which successfully compiles is guaranteed at execution time
>not to corrupt other modules, even if these modules are running in the
>same address space.  We thus get the effect of fine-grained ``firewalls''
>between modules without the performance penalty.   Modules belonging
>to different applications can safely coexist in one address space,
>but can communicate as cheaply as modules of the same application.
>As an additional benefit, we catch an additional class of programming
>errors at compile-time.

Wow!  I am impressed with your creativity.  Please do not send me
any information about the implementation details of your project so
that I am not obligated to IBM for possible infringement unless you have
patented these ideas.

>What do you give up to get typestate checking?  First, typestate checking
>requires that any aliasing of variables be detectable by the compiler.
>As has already been discussed on comp.lang.misc, Hermes is a pointer-free
>and hence alias-free language, so this requirement was already met.
>Other languages would have to be restricted to meet this requirement.

What is required is self-discipline, not laws and restrictions.

>Second, you have to structure your program to avoid ambiguous typestates.
>For example, a variable which is initialized on some paths to a
>statement and uninitialized on other paths would have to be explicitly
>declared as a variant.  

Life does not seem to have these requirements.  All names (aliases) belong
to God who prevents the confusion that could result from literal interpretation
of each sound uttered.  

>In our experience, the potential for
>error detection, the gain in efficiency of cross-application communication,
>and the security/reliability/debugging advantages of firewalls are
>benefits which are well worth the costs.

Ambiguity is inherent in life.  Life is without price.

>Rob Strom, strom@ibm.com, (914) 784-7641
>IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10958

Bill Smith
pur-ee!mdbs!zed
[Specific disclaimer: The use to which I would like to put these ideas
 is not part of any project planned by the management of mdbs Inc.]

nick@cs.edinburgh.ac.uk (Nick Rothwell) (10/29/90)

In article <26865@megaron.cs.arizona.edu>, gudeman@cs.arizona.edu (David Gudeman) writes:
> In article  <961@skye.cs.ed.ac.uk> nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
> ]In article <1990Oct25.193935.375@arnor.uucp>, strom@arnor.uucp writes:
> ]> many program bugs are the result of issuing otherwise correct operations
> ]> *in the wrong order*
> 
> ]Just a minor point here - that's a fault with conventional procedural
> ]languages with assignable variables, and nothing to do with typechecking.
> ]Functional and logic languages don't have this problem at all.
> 
> First, some functional languages and the only well-known logic
> language (Prolog) certainly _do_ have this problem.  If you do things
> in the wrong order, you may get peculiar results like non-termination.

Ok, I stand corrected. The way I read the original article is that there
are problems with referring to variables which are unassigned or which
go out of scope (dangling pointers and the like). Higher-level languages
don't have these problems. But, yes, you still have to "do things in
the right order" in the sense you mean.

> Basically, someone screwed up somewhere, and changing the paradigm
> just changes the nature of the screw-ups, not the quantity or the
> quality.  This is not to say that changing the language itself can't
> reduce errors, just that there is no reason to assume that functional
> or logic languages by the fact of being function-based or predicate
> logic-based reduce errors.

True, but I think the fact that assignment is non-existent (or kept
to a minimum), and that the languages are garbage-collected and
heap-safe, makes them "better" in this sense.  That isn't an argument
about which paradigm to use (and that wasn't really the impression
I wanted to give).


> 					David Gudeman

		Nick.

-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
 "Now remember - and this is most important - you must think in Russian."

gudeman@cs.arizona.edu (David Gudeman) (10/30/90)

In article  <1087@skye.cs.ed.ac.uk> nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
]Ok, I stand corrected. The way I read the original article is that there
]are problems with referring to variables which are unassigned or which
]go out of scope (dangling pointers and the like). Higher-level languages
]don't have these problems...

If that's what you meant, then I agree, assuming we are using the same
definition of "higher-level" (but I don't think we are).  When you say

]...the fact that assignment is non-existent (or kept
]to a minimum),

I suspect that you have a mental linkage between the term
"higher-level" and the term "applicative".  You aren't alone in this,
but I think there are two distinct concepts there, and they should be
kept separate.  I _will_ agree with

]the fact that the languages are garbage-collected and
]heap-safe, make them "better" in this sense.

since if you don't have automatic storage management, then you are
extremely limited in the types of first-class objects you can have.
In fact, I am tempted to define "higher-level" in terms of the
built-in data types the language supports.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

nick@cs.edinburgh.ac.uk (Nick Rothwell) (10/30/90)

In article <26931@megaron.cs.arizona.edu>, gudeman@cs.arizona.edu (David Gudeman) writes:
> In article  <1087@skye.cs.ed.ac.uk> nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
> I suspect that you have a mental linkage between the term
> "higher-level" and the term "applicative".  You aren't alone in this,
> but I think there are two distinct concepts there, and they should be
> ]kept separate.

You're probably right; that's because the properties I associate with
higher level languages (fewer restrictions on built-in datatypes, first-class
status of data objects, extensible types, heap security, abstraction,
interfaces, modularisation and so on) are mostly seen in applicative
languages. I'm sure that a non-applicative language could support these
properties as well, but I'm not aware of one (although Eiffel comes close,
I suppose, and Modula-3, although it's fairly conventional).

> since if you don't have automatic storage management, then you are
> extremely limited in the types of first-class objects you can have.
> In fact, I am tempted to define "higher-level" in terms of the
> built-in data types the language supports.

... and how it allows them to be used (as arguments, results, via
polymorphism, in abstractions, and so on). I'd also judge the level of the
language by the sophistication, flexibility, and *soundness* of the type
system (which excludes a lot of languages).

Note that I've refrained from mentioning "pointers"... :-)

> 					David Gudeman

-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
 "Now remember - and this is most important - you must think in Russian."

strom@arnor.uucp (11/01/90)

In article <1132@skye.cs.ed.ac.uk>, nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
|> In article <26931@megaron.cs.arizona.edu>, gudeman@cs.arizona.edu (David Gudeman) writes:
|> > In article  <1087@skye.cs.ed.ac.uk> nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
|> > I suspect that you have a mental linkage between the term
|> > "higher-level" and the term "applicative".  You aren't alone in this,
|> > but I think there are two distinct concepts there, and they should be
|> > kept separate.
|> 

I agree with David.  Applicative languages are assignment-free languages based
on function application.  High-level languages are languages in which machine
representations are hidden from the programmer --- the compiler is free to
choose data representations and to
apply ``aggressive optimizations''.  Low-level languages retain
``performance transparency'' --- the property that the reader of the
source program can determine what the implementation will be doing at least
to the degree that performance can be estimated.  Applicative languages
and imperative languages can be either high or low level.

|> You're probably right; that's because the properties I associate with
|> higher level languages (fewer restrictions on built-in datatypes, first-class
|> status of data objects, extensible types, heap security, abstraction,
|> interfaces, modularisation and so on) are mostly seen in applicative
|> languages. I'm sure that a non-applicative language could support these
|> properties as well, but I'm not aware of one (although Eiffel comes close,
|> I suppose, and Modula-3, although it's fairly conventional).
|> 

Hermes meets all the requirements that you list. 
(1) All datatypes, including builtin datatypes, are machine-independent.
Word size, structure layout, bit/byte order, etc. do not show through.
The implementation is free to use clever representations (e.g. storing
only a single copy of a large structure).  I am assuming that this
is what you meant by "fewer restrictions on built-in datatypes".
(2) All types are first-class.  That is, they can be put in tables,
sent in messages, passed as parameters,  etc.  
(3) There is a type-definition mechanism and a powerful set of
type constructors for tuples, tables, call-messages, etc.
(4) Storage leaks are avoided through typestate checking (see my earlier
posting in this thread).  This guarantees that all objects are finalized
on termination of a process.
(5) Super-lightweight processes provide "abstraction, interfaces,
modularisation and so on".
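Point (2) can be illustrated in any language that reifies types as run-time values. The following is a hypothetical Python sketch (not Hermes syntax; in Hermes the corresponding checks are static): types are stored in a table and passed as ordinary parameters.

```python
# Illustrative sketch: types as first-class values.
def make_default(t):
    # A type arrives here as a parameter, like any other value.
    return t()

# Types kept in a table, retrievable by key.
registry = {"counter": int, "name": str}

assert make_default(registry["counter"]) == 0
assert make_default(registry["name"]) == ""
assert isinstance(registry["counter"], type)
```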

Hermes is strictly imperative.  Each process has variables and
assignments.  These variables are not visible to other processes, however.

Wirth, who developed Pascal and was a designer and advisor
for the Modula-n efforts,  supported performance transparency
and opposed complex compilers.  Our research group is exploring the
opposite philosophy.   We believe that hiding low-level details
makes programming easier, and programs more portable.  We conjecture
that starting from a more abstract model facilitates optimizations
that will generate efficient implementations on
diverse target platforms.  We're willing to give up 
performance transparency (most of the time) in exchange for performance. 

We are currently exploring some ``aggressive optimizations''
for distributed and multithreaded environments.   An example is
transparent process replication to increase concurrency and
reduce communications costs of a process which at the source level
looks like a performance bottleneck.

Other imperative languages (e.g. SETL, CLU, APL) are high-level.  
I therefore disagree that high-level in practice implies applicative.

|> > since if you don't have automatic storage managment, then you are
|> > extremely limited in the types of first-class objects you can have.
|> > In fact, I am tempted to define "higher-level" in terms of the
|> > built-in data types the language suports.
|> 
|> ... and how it allows them to be used (as arguments, results, via
|> polymorphism, in abstractions, and so on). I'd also judge the level of the
|> language by the sophistication, flexibility, and *soundness* of the type
|> system (which excludes a lot of languages).
|> 

I understand *soundness* for inferencing/checking techniques but can you
elaborate on what it means for the type system itself to be sound?  

|> Note that I've refrained from mentioning "pointers"... :-)
|> 
|> > 					David Gudeman
|> 
|> -- 
|> Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
|> 		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick


-- 
Rob Strom, strom@ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598