[comp.lang.prolog] Seen in a time-tabling program

ok@quintus.UUCP (Richard A. O'Keefe) (02/05/88)
A couple of days ago, I saw the source code of a Prolog program
for constructing school time-tables.  While I have found enough
bad things in it to keep me going for weeks, I should also tell
you that the program apparently solves the problem its author's
interested in, on the machine he has, fast enough to be useful,
and is acceptable to the people who have to use it.

For this message, I'll confine myself to discussing just one of
the predicates.

Each day has 8 periods.  Teaching sessions come in two sizes: a
session containing one period, and a session having two periods
in succession.  The sessions are enumerated in a table

	tim(a,  1).
	tim(b,  1).
	...
	tim(h,  1).
	tim(ab, 2).
	tim(bc, 2).
	...
	tim(gh, 2).

This is not the predicate I'm interested in.  The author had an
extreme fondness for three-letter names, an equally excessive
fondness for one-letter variable names, and an aversion to comments.
As a result, I am still trying to work out what bus_cla_gro/4 does.

One of the more frequent operations in the program is determining
whether two sessions overlap.  The program contains the following
predicate to make this test:

	%   ove(+Session1, +Session2)
	%   is true when Session1 and Session2 (known to be names of
	%   sessions) have a period in common.  This is determined
	%   by checking whether their names have any letter in common.

	ove(A, B) :-
		name(A, AL),
		name(B, BL),
		member(X, AL),
		member(X, BL).

The comment there is mine:  the actual program has no such comment.

Now I don't know about you, but I can't like this.
    --	putting program-visible information in the names of atoms
	is, to say the least of it, odd.
    --	it doesn't generalise to multi-letter period names.
    --	it is inefficient in itself
    --	it causes the creation of structure (the lists AL, BL)
	which are of no real interest
    --	it can only be used to test given atoms, not to enumerate
	overlapping sessions
    --	it leaves a choice-point (actually, two!) behind, thus
	making the rest of the program less efficient.

How else could we do this?

Good relational data-base design suggests that for each entity
type <foo> there should be a relation foo(A_Foo, ~attributes~)
listing all the foos and their attributes.  (I have worded this
carefully.  Check "entity" in a data-base textbook.)
We would thus expect to see a table somewhere like

	session(a, ~something~).
	...
	session(gh, ~something~).

tim/2 is just such a table, except that it hasn't got the information
we need.

The first approach that suggests itself is to use bit-masks.
We obtain a table such as

	session_bits(a,  2'10000000).
	session_bits(b,  2'01000000).
	...
	session_bits(h,  2'00000001).
	session_bits(ab, 2'11000000).
	session_bits(bc, 2'01100000).
	...
	session_bits(gh, 2'00000011).

	overlapping_sessions(Session1, Session2) :-
		session_bits(Session1, Bits1),
		session_bits(Session2, Bits2),
		Bits1 /\ Bits2 =\= 0.

This is fully relational, doesn't leave any extra choice points around,
and is *much* faster.  But there are snags:
    --	some Prologs do not support the "Edinburgh Standard"
	bitwise operators /\ (logand), \/ (logor), \ (lognot).
    --	even those that do often have some smallish limit on integers,
	e.g. 14 bits (PDP-11 Prolog), 16 bits (some PC Prologs),
	18 bits (DEC-10 Prolog), 28 or 29 bits (many modern Prologs),
	so this solution would not work if we tried to treat periods
	in different days as different periods (we'd need 8*5=40 bits).
    --	it is "overkill" to use a bit-mask when the only possibilities
	are 0--010--0 and 0--0110--0.
    --	it is hard to extract the "size" information from bitsets.

In connexion with this last point, here is a function which computes
the number of bits in a bit-mask.  It is word-size independent.

	%   bit_count(+Mask, ?Count)
	%   is true when Mask is instantiated to a non-negative
	%   integer and Count is the number of "1" bits in the
	%   binary representation of Mask.

	bit_count(Mask, Count) :-
		integer(Mask),
		Mask >= 0,
		bit_count(Mask, 0, Count).

	bit_count(Mask, Count0, Count) :-
		(   Mask =:= 0 -> Count is Count0
		;   Next is (Mask-1) /\ Mask,
		    Count1 is Count0+1,
		    bit_count(Next, Count1, Count)
		).

To return to our subject:  teaching sessions aren't just any sort of
set of periods, they are non-empty *intervals* of periods.  So we can
represent a session by a pair of numbers: the first period of the
session and the last period of the session.  We obtain

	%   session(Session, First, Last)
	%   is true when Session is an atom naming a teaching session,
	%   First is an integer representing the first period of the
	%   session, and Last is an integer representing the last
	%   period of the session.  The eight periods are represented
	%   by the numbers 1 to 8.

	session(a,  1, 1).
	session(b,  2, 2).
	...
	session(h,  8, 8).
	session(ab, 1, 2).
	session(bc, 2, 3).
	...
	session(gh, 7, 8).

	%   session_length(Session, Length)
	%   is true when Length is the number of periods in Session.
	%   The original program called this tim/2.

	session_length(Session, Length) :-
		session(Session, First, Last),
		Length is Last-First+1.

	%   overlapping_session(?Session, +First, +Last)
	%   is true when Session is the name of a session which
	%   overlaps the interval [First,Last] of periods.

	overlapping_session(Session, GivenFirst, GivenLast) :-
		session(Session, SessionFirst, SessionLast),
		SessionFirst =< GivenLast,
		GivenFirst =< SessionLast.

	%   overlapping_sessions(Session1, Session2)
	%   is true when Session1 and Session2 are names of sessions
	%   having at least one period in common.

	overlapping_sessions(Session1, Session2) :-
		session(Session1, First, Last),
		overlapping_session(Session2, First, Last).

This will work for any representable number of periods.  Not only that,
but as we just saw with tim/2, other information needed by the program
can be derived quite simply from this table.  As a further test, there
is another variant of tim/2 in the original program:

	tim_ava(D, T, U) :-
		recorded(day, day(D), _),
		tim(T, U),
		\+ fri_123(D, T, U).

	fri_123(f, a, 1).
	fri_123(f, b, 1).
	fri_123(f, c, 1).
	fri_123(f, ab, 2).
	fri_123(f, bc, 2).
	fri_123(f, cd, 2).

I am not making this up!  I couldn't!  day->day/1 is a table of one-
letter day names.  It starts out in some funny order, and keeps
being shuffled.  I have no idea whether days can ever be dropped
from the table.  I can't replace it by a fixed table for this
predicate, because in some parts of the program the order matters,
and there is no indication as to whether this is one of them.
Note that since T determines U, we don't really need to pass U back.
We arrive at

	tim_ava(Day, Session) :-
		recorded(day, day(Day), _),
		day_periods(Day, First, Last),
		overlapping_session(Session, First, Last).

	day_periods(m, 1, 8).
	day_periods(t, 1, 8).
	day_periods(w, 1, 8).
	day_periods(h, 1, 8).
	day_periods(f, 1, 3).

My first draft of this version tested Day against the literal "f"
and FirstS against the literal 3, but that would have been a bad idea.
The day_periods/3 table separates the bounds out cleanly and
declaratively.

Enough of hygiene.  What's the payoff?  In Quintus Prolog, if we
scale the bit-mask version to 1.0,
	the "pure" version is 1.3 times slower
	the "name" version is 19 times slower.

Since the pure version is so much cleaner, and simplifies other
parts of the program, and is only 1.3 times slower, that's the one
I'd pick.