[net.lang.mod2] Verbatim Mail Messages - Trichotomous String Comparison

PATTIS@WASHINGTON.ARPA (Richard Pattis) (07/10/86)

As promised in my previous message, here they are.  I apologize if my selective
display did not adequately communicate their actual contents.

Rich

--------------------------------------------------------------------------------

29-May-86 22:56:47-PDT,5665;000000000001
Return-Path: <PATTIS@WASHINGTON.ARPA>
Received: from ur-cayuga.rochester.arpa (ROCHESTER.ARPA.#Internet) by WASHINGTON.ARPA with TCP; Thu 29 May 86 22:56:39-PDT
Received: from ur-seneca.rochester.arpa (ur-seneca) by ur-cayuga.rochester.arpa id AA05496 (4.12w); Fri, 30 May 86 01:54:55 edt
Received: from ur-cayuga.rochester.arpa (ur-cayuga) by ur-seneca.rochester.arpa id AA10675 (4.12w); Fri, 30 May 86 01:56:09 edt
Errors-To: owner-info-modula-2@rochester.arpa
Received: from WASHINGTON.ARPA by ur-cayuga.rochester.arpa id AA05492 (4.12w); Fri, 30 May 86 01:54:32 edt
Date: Thu 29 May 86 22:55:13-PDT
From: Richard Pattis <PATTIS@WASHINGTON.ARPA>
Subject: A New Modula-2 Text & A Modula-2 Problem
To: info-modula-2@rochester.arpa
Message-Id: <12210732185.7.PATTIS@WASHINGTON.ARPA>


A New Modula-2 Book:

  I recently received a copy of a new Modula-2 book.  It was written by Arthur
Sale of the Univeristy of Tasmania in Australia, and it is titled "Modula-2:
Discipline & Design".  It is published by Addison-Wesley and costs about $25.
I'll quote a section from the preface to show what I think is the flavor of
this book (all typos are my own):

    "My purpose in writing this book is to bring together a precise
     discussion of Modula-2 and its features and modern concepts of
     professional large program writing.  There are few books which are
     accessible to learning students which use Modula-2 at the present
     time, and this is an inhibiting factor in its use as a teaching
     language.  However, even more important in this book is the break
     with the traditional approaches to first language texts.  It is my
     experience over ten years that learning students can cope with
     concepts of considerable abstraction, such as predicate transformers
     and invariants.  Not all the implications are realized by students,
     but the concepts can be planted and used.  The best students progress
     much faster, but even average students see the ideas which are presented
     as a result of modern computer science research as converting the dark
     art of programming into something which is understandable.  The era of
     learning programming by osmosis, which has persisted for far too long,
     is ending, and programming has become a subject in which design and
     discipline reign and can be taught"


A Modula-2 Problem:

  In Sale's discussion of strings, he decides to store each string padded with
0Cs (not just one, but 0Cs all the way up thru the last element in the array).
Of course, if the string fills the array, there is no padding at all.  Also,
note that the string terminator, 0C, compares less than any other character.
Given any two parameter strings that follow this convention, Sale writes the
following function to compare them (I've tried to transcribe the code directly
from this introductory text, using the same indentation and spacing):


  TYPE Order = (Before,Same,After);


  PROCEDURE CompareStrings (S1: ARRAY OF CHAR;  S2 : ARRAY OF CHAR) : Order;
  CONST
    NUL = 00C;  (* Terminating Marker *)
  VAR
    i : CARDINAL;
    Shortest : CARDINAL;
    Result : Order;
    ResultKnown : BOOLEAN;
  BEGIN
    Shortest:= Min(HIGH(S1),HIGH(S2));
    ResultKnown:= FALSE;
    i:= 0;
    WHILE NOT ResultKnown DO

       IF S1[i] < S2[i] THEN
            (* Found unmatched character *)
            Result:= Before;
	    ResultKnown:= TRUE;
       ELSIF S1[i] > S2[i] THEN
            (* Found unmatched character *)
            Result:= After;
	    ResultKnown:= TRUE;
       ELSE (* S1[i] = S2[i] *)
           IF S1[i] = NUL THEN
	        (*   Both strings same up to two NULs. *)
                Result:= Same;
		ResultKnown:= TRUE;
	   ELSE
	        (*   Task unresolved, must look at next component. *)
	       i:= i+1;
	       IF i > Shortest THEN
		      (*   Run out of one string's components. *)
	       	      IF HIGH(S1) = HIGH(S2) THEN (* Both together. *)
		      	   Result:= Same;
	       	      ELSIF Shortest = HIGH(S1) THEN (* S1 shorter *)
		      	   Result:= Before;
	       	      ELSE (* S2 shorter *)
		      	   Result:= After;
		      END;
		      ResultKnown:= TRUE;
	       END;
	   END;
       END (* of IF comparing S[i]'s *);

    END (* of WHILE *);
    RETURN Result;
END CompareStrings;


(1) I assert, that this function computes wrong answers.  In what cases does
    it do so?  Be as general as possible and remember the assumptions above.

(2) Describe how we could rewrite this function to correct this deficiency.

(3) Given the same assumptions, completely rewrite this function to be as
    simple as possible. It should also execute as quickly as possible.

(4) Now, remove the padding assumption; instead, assume that only one 0C is
    placed at the end of each string that doesn't fill its array.  Rewrite
    this function to work correctly with strings following this convention.

(5) What interesting conclusions can you draw from this exercise?

  For parts 3 & 4, use your best programming style.  Write these function as
clearly as possible.


  Please send your answers to me, PATTIS@WASHINGTON.  After you send me your
analysis, I'll send you mine.  I'll also keep a compendium of the analyses
that I receive, and send it out to the mailing list when responses stop.  If
you don't want your analysis shown to anyone, or if you don't want your name
attached to it, please tell me when you mail in your analysis.  If there is
enough interest in this exercise, I'll construct a similar one.


Rich
-------
30-May-86 16:40:58-PDT,3880;000000000001
Return-Path: <tomr@src.DEC.COM>
Received: from src.DEC.COM by WASHINGTON.ARPA with TCP; Fri 30 May 86 16:40:43-PDT
Received: from clark.DEC.COM (clark) by src.DEC.COM (4.22.05/4.7.34)
	id AA21224; Fri, 30 May 86 15:59:36 pdt
Received: by clark.DEC.COM (4.22.05/4.7.34)
	id AA26069; Fri, 30 May 86 16:02:04 pdt
From: tomr@src.DEC.COM (Tom Rodeheffer)
Message-Id: <8605302302.AA26069@clark.DEC.COM>
Date: 30 May 1986 1601-PDT (Friday)
To: Richard Pattis <PATTIS@WASHINGTON.ARPA>
Cc: tomr@src.DEC.COM
Subject: Re: A New Modula-2 Text & A Modula-2 Problem
In-Reply-To: Your message of Thu 29 May 86 22:55:13-PDT.
             <12210732185.7.PATTIS@WASHINGTON.ARPA>

In your summary of Sale's definition of strings, you say that a string
is padded out with as many 0C characters as are necessary to fill the
array in which the string is stored.  You do not explicitly say whether
or not 0C itself can also be a data character in a string; however, the
phrase "string terminator" implies that it cannot.  Of course, "string
terminator" itself is a misnomer, because if no padding is needed then
no string terminator will be present.  The fact that strings must be
padded until the end of the array argues for the view that 0C could be
a legal data character; otherwise only one 0C would be necessary to
terminate the string and the rest would be superfluous.  Under this
model, the universe of strings would consist of exactly those infinite
length strings that contained only 0C after some point.  Of course, the
extra padding 0C characters may be superfluous anyway, and Sale may
just be overspecifying the storage representation.  I assume that 0C is
not a legal string data character.

Sale's algorithm fails when two equal strings are stored in arrays of
different sizes, one array being exactly the length of the string, so
that no padding is required, the other being longer, so that at least
one padding character is present.  Sale's algorithm will say that the
string stored in the shorter array compares before the other.  The
problem occurs when the algorithm reaches the end of the shorter string
without finding a difference or a string terminator.  The algorithm
neglects to check if the longer string has a string terminator in the
next character position.

The clearest correct algorithm is as follows:

    TYPE Order = (Before,Same,After);

    PROCEDURE CompareStrings (s1,s2 : ARRAY OF CHAR) : Order;
    VAR
        i  : CARDINAL;
        c1 : CHAR;
	c2 : CHAR;
    BEGIN
        i := 0;
	WHILE TRUE DO
            IF i <= HIGH(s1) THEN c1 := s1[i] ELSE c1 := 0C END;
            IF i <= HIGH(s2) THEN c2 := s2[i] ELSE c2 := 0C END;
            IF c1 < c2 THEN RETURN Before END;
	    IF c1 > c2 THEN RETURN After END;
            IF c1 = 0C THEN RETURN Same END;
            i := i + 1;
	END;
    END CompareStrings;

This algorithm views each array as if it were followed by an endless
source of string terminators.  Thus, under this view, strings always
end with a string terminator.  The terminator will come either from the
array or from the view.  Given these conditions, it is easy to compare
the strings.

The algorithm may be made more efficient by writing one section of code
to handle character positions that occur in both arrays, and another
section to handle the next character position, which does not occur in
both arrays.  However, this makes the algorithm more complicated and
harder to understand.

The algorithm as written will work correctly if the arrays always
contain string terminators.  If this condition is made an assertion,
the algorithm can be optimized for it by deleting the view processing.

It is interesting how easy it is to stumble over special cases, even
for people who work with invariants and other program proof techniques.

-Tom Rodeheffer
 2-Jun-86 10:38:02-PDT,5024;000000000001
Return-Path: <"brown jonathan%e.mfenet"@LLL-MFE.ARPA>
Received: from LLL-MFE.ARPA by WASHINGTON.ARPA with TCP; Mon 2 Jun 86 10:37:55-PDT
Date: Mon, 2 Jun 86 10:38 pst
From: "brown jonathan%e.mfenet"@LLL-MFE.ARPA
Subject: Re: A Modula-2 Problem 
To: pattis@washington.arpa

> A Modula-2 Problem:
>
>   In Sale's discussion of strings, he decides to store each string padded with
> 0Cs (not just one, but 0Cs all the way up thru the last element in the array).
> Of course, if the string fills the array, there is no padding at all.  Also,
> note that the string terminator, 0C, compares less than any other character.

I interpret this to mean that a string is composed of either the characters
in the array before the first 0C, or of the entire array if there is no 0C.
The character used for padding after the first 0C is irrelevant.

> Given any two parameter strings that follow this convention, Sale writes the
> following function to compare them (I've tried to transcribe the code directly
> from this introductory text, using the same indentation and spacing):
>
>            (Sale's function)
>
> (1) I assert, that this function computes wrong answers.  In what cases does
>     it do so?  Be as general as possible and remember the assumptions above.

Sale's function will return the wrong result when the following three
conditions arise:
     1) the two strings are identical
     2) the lengths of the arrays containing the strings differ
     3) the string in the shorter array uses up the entire array

> (2) Describe how we could rewrite this function to correct this deficiency.

After finding that all the characters up through Shortest are equal, and that
HIGH(S1) # HIGH(S2), test to see if the i'th (i = Shortest + 1) character in
the longer array is a 0C before hastily assuming that there is more to the
string.

> (3) Given the same assumptions, completely rewrite this function to be as
>     simple as possible. It should also execute as quickly as possible.
>
> (4) Now, remove the padding assumption; instead, assume that only one 0C is
>     placed at the end of each string that doesn't fill its array.  Rewrite
>     this function to work correctly with strings following this convention.

I don't think it matters what characters are padded into the array after the
string terminator.  Sale probably used 0C's because they are convenient.  So
the solutions to (3) and (4) are identical.  Let me know if I'm wrong.

For my solution, I'm going to be lazy and just fix the bug in Sale's function.
If there is a simpler, quicker routine, it is not obvious to me.


  TYPE Order = (Before,Same,After);


  PROCEDURE CompareStrings (S1 : ARRAY OF CHAR;  S2 : ARRAY OF CHAR) : Order;
  CONST
    NUL = 00C;  (* Terminating Marker *)
  VAR
    i : CARDINAL;
    Shortest : CARDINAL;
    Result : Order;
    ResultKnown : BOOLEAN;
  BEGIN
    Shortest:= Min(HIGH(S1),HIGH(S2));
    ResultKnown:= FALSE;
    i:= 0;
    WHILE NOT ResultKnown DO

       IF S1[i] < S2[i] THEN
            (* Found unmatched character *)
            Result:= Before;
            ResultKnown:= TRUE;
       ELSIF S1[i] > S2[i] THEN
            (* Found unmatched character *)
            Result:= After;
            ResultKnown:= TRUE;
       ELSE (* S1[i] = S2[i] *)
           IF S1[i] = NUL THEN
                (*   Both strings same up to two NULs. *)
                Result:= Same;
                ResultKnown:= TRUE;
           ELSE
                (*   Task unresolved, must look at next component. *)
               i:= i+1;
               IF i > Shortest THEN
                       (*   Run out of one string's components. *)
                       IF HIGH(S1) = HIGH(S2) THEN (* Both together. *)
                            Result:= Same;
                       ELSIF (Shortest = HIGH(S1)) & (S2[i] # NUL) THEN
                            (* S1 shorter, but strings not equal *)
                            Result:= Before;
                       ELSIF S1[i] # NUL THEN
                            (* S2 shorter, but strings not equal *)
                            Result:= After;
                       ELSE
                            Result:= Same;
                       END;
                       ResultKnown:= TRUE;
               END;
           END;
       END (* of IF comparing S[i]'s *);

    END (* of WHILE *)
    RETURN Result;
END CompareStrings;


> (5) What interesting conclusions can you draw from this exercise?

I think you are incorrectly assuming that Sale intended the padding
characters to be significant.  I have no reason to believe this from
what you have told us.

There is a bug in Sale's function that results in an erroneous value
returned in certain (probably rare) circumstances.

My solution looks bug-free to me, but I haven't tested it so I won't
swear by it.

                        Jonathan Brown
                   brown#jon%mfe@lll-mfe.arpa
 3-Jun-86 14:13:42-PDT,3558;000000000001
Return-Path: <mcvax!daimi!erja@seismo.CSS.GOV>
Received: from seismo.CSS.GOV by WASHINGTON.ARPA with TCP; Tue 3 Jun 86 14:13:09-PDT
Return-Path: <mcvax!daimi!erja>
Received: from mcvax.UUCP by seismo.CSS.GOV with UUCP; Tue, 3 Jun 86 16:10:34 EDT
Received: by mcvax.uucp; Tue, 3 Jun 86 20:43:55 +0200 (MET)
Received: by diku.uucp; Tue, 3 Jun 86 20:21:20 -0100 (MET)
Received: by daimi.uucp (2.0/SMI-2.0)
	id AA03148; Tue, 3 Jun 86 19:43:11 -0200
Date: Tue, 3 Jun 86 19:43:11 -0200
From: mcvax!daimi!erja@seismo.CSS.GOV (Erik Jacobsen)
Message-Id: <8606031743.AA03148@daimi.uucp>
To: PATTIS@WASHINGTON.ARPA

Dear Rich.
I found your little MODULA-2 problem/exercise interesting. I am about
to write (part of) a system in MODULA-2, and working through this
exercise has been my first encounter with MODULA-2.

(1,2): The problem is that HIGH(...) is the declared length of the
       ARRAY, not the actual length of the string. One way to patch
it could be to start finding the actual length of each string, and
use these lengths instead of HIGH(...) - but perhabs more is wrong.

(3,4): Rewriting the function to be simpler, I first observe that we
       have two criteria to indicate the length of a string - either
the ARRAY is filled or it is padded with NULs. I would prefer a simpler
representation, e.g. that there always will be a NUL at the end of a
string. One way to accomplish this can be to look at a string through
this function:

    PROCEDURE CharPos(S: ARRAY OF CHAR; Pos: CARDINAL): CHAR;
    BEGIN
      IF Pos > HIGH(S) THEN
        RETURN NUL
      ELSE
        RETURN S[Pos]
      END
    END;

Rewriting CompareStrings as it stands is fairly simple. Use CharPos(Sx,i)
instead of Sx[i] - or declare som CHAR-variables. As you will always end
up with NUL, you can erase the code following "IF i>shortest". CompareStrings
still looks funny though - it's not easy to check the correctness. My way
of doing it is:

    i := 0;
    REPEAT
      Ch1 := CharPos(S1,i); Ch2 := CharPos(S2,i);
      i := i + 1
    UNTIL (Ch1 <> Ch2) OR (Ch1 = NUL);

    IF    Ch1 < Ch2 THEN
      RETURN Before
    ELSIF Ch1 > Ch2 THEN
      RETURN After
    ELSE
      RETURN Same
    END;

I have only used the assumption, that there is at least one NUL at the end
of the string, and I haven't considered efficiency (and I usually don't care).

5) Conclusions? Well, it's (often) amusing finding an error like this. Too
   bad it turned up in a textbook on programming!

Referring to Kernighan and Plauger: The Elements of Programming Style, you
could say that the author has violated the following rules:

  "Write clearly - don't be to clever".
  "Write clearly - don't sacrifice for efficiency".
      Whatever the reasons were, the original function wasn't written clearly.
  "Choose a datarepresentation that makes the program simple".
  "Terminate input by end-of-file or marker - not by count".
      Having two conditions for ending a string will make a program complicated.
  "Don't patch bad code - rewrite it".
      As was suggested in 3) and 4).
  A rule of my own: "A simple problem must have a simple solution",
      so keep rewriting your code until it doesn't look more complicated
      than the problem.

-----------------------------------------------------------------------------
      Erik Jacobsen, Computer Science Department, University of Aarhus,
               Ny Munkegade, DK-8000 Aarhus C, Denmark.
                        ...diku!daimi!erja


 5-Jun-86 11:31:27-PDT,3339;000000000001
Return-Path: <mcvax!ethz!owf!zjwa@seismo.CSS.GOV>
Received: from seismo.CSS.GOV by WASHINGTON.ARPA with TCP; Thu 5 Jun 86 11:31:09-PDT
Return-Path: <mcvax!ethz!owf!zjwa>
Received: from mcvax.UUCP by seismo.CSS.GOV with UUCP; Thu, 5 Jun 86 13:55:40 EDT
Received: by mcvax.uucp; Thu, 5 Jun 86 12:52:51 +0200 (MET)
Received: by cernvax.UUCP (4.12/4.7)
	id AA03419; Thu, 5 Jun 86 12:05:34 +0200
Received: from owf by vlsivax.ethz; Thu, 5 Jun 86 11:30:59 -0200
Received: by owf.uucp (2.0/SMI-2.0)
	id AA00915; Thu, 5 Jun 86 11:25:18 -0200
Date: Thu, 5 Jun 86 11:25:18 -0200
From: mcvax!cernvax!owf!zjwa@seismo.CSS.GOV (Juerg Wanner)
Message-Id: <8606050925.AA00915@owf.uucp>
To: pattis@WASHINGTON.ARPA
Subject: Re: Modula-2 problem



PROCEDURE CompareStrings...

Not only will this procedure produce wrong results, but eventually the
program using this procedure will crash due to a runtime error (index i
can get larger than Min(HIGH(S1), HIGH(S2)) and so the procedure attempts
to access a non-existing array element. A proper modula-2 implementation
will crash there (unless run-time checks have been turned off).

A faster procedure:

  TYPE
    Comparison = (less, equal, greater);

  PROCEDURE CompareStrings(VAR s1, s2 : ARRAY OF CHAR) : Comparison;
  VAR i, high : CARDINAL;
  BEGIN
    i := 0;
    high := Min(HIGH(s1), HIGH(s2));
    WHILE (i < high) & (s1[i] = s2[i]) & (s1[i] # 0C) & (s2[i] # 0C) DO
      INC(i);
    END;
    IF    s1[i] < s2[i] THEN RETURN less;
    ELSIF s1[i] > s2[i] THEN RETURN greater;
    ELSIF HIGH(s1) = HIGH(s2) THEN RETURN equal;
    ELSIF high = HIGH(s1) THEN
      IF s2[i+1] = 0C THEN RETURN equal;
      ELSE RETURN less;
      END;
    ELSE
      IF s1[i+1] = 0C THEN RETURN equal;
      ELSE RETURN greater;
      END;
    END;
  END CompareStrings;

The parameters are declared as VAR parameters because of speed. There is
no need to copy the strings in order to do a comparison. Of course this
is bad programming style.

A still faster version can be obtained by eliminating array access. Great
care has to be taken when implementing this because it is machine
dependant. The machine has to use byte-addressing (the Lilith doesn't):

  TYPE
    Dirty = RECORD
	      CASE : BOOLEAN OF
	      | FALSE : a  : ADDRESS;
	      | TRUE  : ch : POINTER TO CHAR;
	      END;
	    END;


  PROCEDURE CompareStrings(VAR s1, s2 : ARRAY OF CHAR) : Comparison;
  VAR i, high : CARDINAL;
      d1, d2 : Dirty;
  BEGIN
    i := 0;
    d1.a := ADR(s1);
    d2.a := ADR(s2);
    high := Min(HIGH(s1), HIGH(s2));
    WHILE (i < high) & (d1.ch^ = d2.ch^) & (d1.ch^ # 0C) & (d2.ch^ # 0C) DO
      INC(d1.a);
      INC(d2.a);
    END;
    IF    d1.ch^ < d2.ch^ THEN RETURN less;
    ELSIF d1.ch^ > d2.ch^ THEN RETURN greater;
    ELSIF HIGH(s1) = HIGH(s2) THEN RETURN equal;
    ELSIF high = HIGH(s1) THEN
      INC(d2.a);
      IF d2.ch^ = 0C THEN RETURN equal;
      ELSE RETURN less;
      END;
    ELSE
      INC(d1.a);
      IF d1.ch^ = 0C THEN RETURN equal;
      ELSE RETURN greater;
      END;
    END;
  END CompareStrings;


I hope this is of some use to you.

				      Regards

				      Juerg Wanner
				      OWF AG
				      Haldeneggsteig 5
				      CH - 8006 Zuerich
				      Switzerland
 4-Jun-86 17:52:03-PDT,5673;000000000001
Return-Path: <fluke!tikal!bobc@SUN.COM>
Received: from sun.com ([192.5.38.130].#Internet) by WASHINGTON.ARPA with TCP; Wed 4 Jun 86 17:51:55-PDT
Received: from sun.uucp by sun.com (3.2/SMI-3.0)
	id AA24092; Wed, 4 Jun 86 17:47:22 PDT
Received: by sun.uucp (1.1/SMI-3.0)
	id AA18530; Wed, 4 Jun 86 17:51:43 PDT
Received: by vax4.fluke.UUCP (4.12/5.1.1.1); Wed, 4 Jun 86 16:42:53 pdt
Received: by tikal.Teltone (4.12/4.7)
	id AA19132; Wed, 4 Jun 86 16:00:20 pdt
Date: Wed, 4 Jun 86 16:00:20 pdt
From: fluke!tikal!bobc@SUN.COM (Bob Campbell)
Message-Id: <8606042300.AA19132@tikal.Teltone>
To: fluke!sun!ucbvax!WASHINGTON.ARPA!PATTIS@SUN.COM
Subject: Re: A New Modula-2 Text & A Modula-2 Problem
Newsgroups: net.lang.mod2
In-Reply-To: <12210732185.7.PATTIS@WASHINGTON.ARPA>
Organization: Teltone Corp., Kirkland, WA
Cc: 

(1) I assert, that this function computes wrong answers.  In what cases does
    it do so?  Be as general as possible and remember the assumptions above.

    The answer that I have is two strings which are the same but one
    fills up the full string and the second is shorter then a full
    array when this happens the string which fills up the whole array
    will be less then the one which does not.
    
    VAR
	str: ARRAY[0..3] OF CHAR;
    BEGIN
	str := 'Hi!!'; str[3] := 0C;
	IF (CompareStrings('Hi!',y) <> Same) THEN
	    WriteString('Failed')
	END;
    END ....

(2) Describe how we could rewrite this function to correct this deficiency.

    Change IF i > Shortest THEN.....END; to be

    IF i > Shortest THEN
	(* Run out of one string's components. *)
	IF HIGH(S1) = HIGH(S2) THEN (* Both together. *)
	    Result := Same
	ELSIF Shortest = HIGH(S1) THEN (* S1 shorter *)
	    IF (S2[i] = NUL) THEN
		Result := Same
	    ELSE
		Result := Before
	    END
	ELSE	(* S2 shorter *)
	    Result := After
	END;
	ResultKnown := TRUE
    END
    
    In other words check for the null and return Same if the null is
    present.

(3) Given the same assumptions, completely rewrite this function to be as
    simple as possible. It should also execute as quickly as possible.

    PROCEDURE CompareStrings(S1,S2:ARRAY OF CHAR) : Order;
    CONST
	NUL = 0C; (* Terminating Marker *)
    VAR
	i : CARDINAL;
	Shortest : CARDINAL;
    BEGIN
	Shortest := Min(HIGH(S1),HIGH(S2));
	i := 0;
	(* As long as the strings are the same check  and the end of
	 * neither of the strings have been reached increment to the
	 * next.
	 *)
	WHILE i <= Shortest AND S1[i] = S2[i] DO
	    i := i + 1
	END;
	IF (i > Shortest) THEN
	    (* We are at the end of one of the strings *)
	    IF HIGH(S1) = HIGH(S2) THEN
		(* Strings are the Same length *)
		RETURN(Same)
	    ELSIF HIGH(S2) > Shortest THEN
		(* S2 is longer *)
		IF S2[i] = NUL THEN
		    (* Both strings terminate at the same place *)
		    RETURN(Same)
		ELSE
		    RETURN(Before)
		END
	    ELSE
		(* S1 is longer *)
		IF S1[i] = NUL THEN
		    (* Both strings terminate at the same place *)
		    RETURN(Same)
		ELSE
		    RETURN(After)
		END
	    END
	(* ASSERT at this else S1 can't equal S2 *)
	ELSIF S1[i] < S2[i] THEN
	    RETURN(After)
	ELSE
	    RETURN(Before)
	END
    END CompareStrings;

    This could be a little faster by using a loop statement for the
    loop.

    i := 0;
    LOOP
	IF (S1[i] # S2[i]) THEN EXIT END;
	i := i + 1;
	IF (i > Smaller) THEN EXIT END
    END; (* LOOP *)

    This however only saves one comparision (for i = 0), and is not a
    very large savings considering the while statement is a little
    clearer.

(4) Now, remove the padding assumption; instead, assume that only one
    0C is placed at the end of each string that doesn't fill its
    array.  Rewrite this function to work correctly with strings
    following this convention.

    Same code as above but with a check for a null terminator.

    PROCEDURE CompareStrings(S1,S2:ARRAY OF CHAR) : Order;
    CONST
	NUL = 0C; (* Terminating Marker *)
    VAR
	i : CARDINAL;
	Shortest : CARDINAL;
    BEGIN
	Shortest := Min(HIGH(S1),HIGH(S2));
	i := 0;
	(* As long as the strings are the same check  and the end of
	 * neither of the strings have been reached increment to the
	 * next.
	 *)
	WHILE i <= Shortest AND S1[i] = S2[i] AND S1[i] # NUL DO
	    i := i + 1
	END;
	IF (i > Shortest) THEN
	    (* We are at the end of one of the strings *)
	    IF HIGH(S1) = HIGH(S2) THEN
		(* Strings are the Same length *)
		RETURN(Same)
	    ELSIF HIGH(S2) > Shortest THEN
		(* S2 is longer *)
		IF S2[i] = NUL THEN
		    (* Both strings terminate at the same place *)
		    RETURN(Same)
		ELSE
		    RETURN(Before)
		END
	    ELSE
		(* S1 is longer *)
		IF S1[i] = NUL THEN
		    (* Both strings terminate at the same place *)
		    RETURN(Same)
		ELSE
		    RETURN(After)
		END
	    END
	(* ASSERT at this else S1 can't equal S2 or both = NUL *)
	ELSIF S1[i] < S2[i] THEN
	    RETURN(After)
	ELSIF S1[i] > S2[i] THEN
	    RETURN(Before)
	ELSE
	    RETURN(Same)
	END
    END CompareStrings;


(5) What interesting conclusions can you draw from this exercise?

    If in most cases the terminating condition would be the end of the
    string the full padding assumption would make faster code as there
    would only be two comparisions per character until they don't
    match but the null terminated method requires three comparisions
    per character.

Bob Campbell
Teltone Corporation		18520 - 66th AVE NE
P.O. Box 657			Seattle, WA 98155
Kirkland, WA 98033

{amc,dataio,fluke,hplsla,sunup,uw-beaver}!tikal!bobc
-------