[net.micro] arrgh: repeat again: Zubkoff data compression message.

BILLW@SRI-KL.ARPA (07/01/84)
From:  William Chops Westfield <BILLW@SRI-KL.ARPA>

If this whole thing doesn't make it, I'm going to give up.  You can
FTP the description from SRI-KL::<BILLW>CONCEPT.DATA-COMPRESSION
using anonymous login.  Perhaps INFO-MICRO-REQUEST will FTP it
over to BRL and send it locally...  We should be running new
net software Real Soon Now...
--------------------------------------------------------------------
Date:  5 Aug 1981 2251-EDT
From:  (Leonard N Zubkoff) <Zubkoff at CMU-20C>
Subject: New Concept-100/104 Software

This is a general announcement to the ArpaNet community of an alternate
set of software that may become available for the Concept line of
terminals made by Human Designed Systems (HDS).

For the past several months, I have been engaged in a personal project
to rewrite the Concept software in order to provide a level of
functionality more in keeping with the needs and capabilities of the
Computer Science community.  While the software is not yet completely
written, it has been operational for the last 3 months and is in use in
several terminals here at Carnegie-Mellon University.

Since we at CMU have copies of the original HDS software under a
non-disclosure agreement, I am not at liberty to distribute the software
I have written beyond CMU.  I have been in contact with officials of
HDS, and they have shown interest in making this available to the
Concept user community if there is sufficient demand.

The purpose of this message is two-fold.  First, I want to solicit
comments about the software I've designed so that I can incorporate any
features I may have overlooked.  Second, I need
to determine whether there is a demand in the Concept user community
either to upgrade existing terminals or purchase new ones with my
software as opposed to the standard software supplied by HDS.

Let me begin by giving a brief description of the goals I set for the
software.  The software was designed explicitly for the type of
environment we have now and are developing at CMU.  At present, screen
oriented editing with Tops-20 Emacs/Tops-10 Fine/Unix Emacs is the norm,
and terminals are used both at home over dialup lines and in offices at
1200 baud and 9600 baud.  In the future, Spice (personal) machines will
be the dominant resource in the department and terminals like the
Concept will have little use except as a home terminal with which to
call up one's Spice machine.  Thus the dominant use of the terminal
where sophisticated capabilities are required is in the area of screen
management by an editor.  In addition, the likelihood that most of these
terminals will ultimately be used from home over a 300 or 1200 baud
dialup line has made the issue of efficient screen management extremely
critical.  All office terminals will be supported at 9600 baud in the
near future.  Efficient screen management and efficient data transmission
to the terminal thus emerged as the most critical areas, and they are the
areas in which most commercially available terminals are deficient; the
greater part of my development effort has gone into optimizing the Concept
in these two respects.  Since we already
have a great deal of software that supports Concept terminals, it was
also necessary that my software be able to emulate a Concept with HDS
software to such a degree that the normal Emacs, Fine, BBoard, and other
programs would operate properly without change.

In order to utilize the unique features of my software, and to gain
information about how it is performing, I have written a screen
management program that I use to run Tops-20 Emacs.  This program is
invoked like Emacs but crosspatches the terminal through to an Emacs
running as a subfork.  In general, support for the new terminal software
should be placed in the editor itself, but the nature of the
implementation of Emacs/Teco is such as to preclude doing this directly
without a great deal of work.  Emacs itself provides a very poor screen
management facility; it is not smart about making optimal use of the
primitives available in a terminal to cut down on the number of
characters sent to update the screen.  My program maintains screen
images representing the actual state of the screen and the desired state
of the screen, and attempts to perform a near-optimal transform from
the actual to desired states at appropriate intervals.  In order to
measure the improvement in screen update time, the program keeps counts
of both the number of characters that Emacs sent to the program (which
would have gone to the terminal if it were not for the screen management
algorithms) and the number of characters that the program was required
to send to the terminal in order to effect the same resulting screen
image.  I term the ratio of the first count to the second the compression ratio
achieved by the screen management program.  It represents a very real
measure of the actual speedup in screen redisplay provided by the
program.  In actual use, this program now achieves compression ratios
that are typically in the range from 2.5 to 4.0.  In paging through a
typical textual file (such as a Scribe manuscript file), the compression
ratio is usually between 2.5 and 3.0.  In paging through some Bliss
programs, I have achieved ratios of 4.5.  Thus in editing Bliss code
from home over a 1200 baud modem, I regularly get an effective
transmission rate to the terminal well in excess of 3600 baud.  In
addition, the new terminal software is written to enable a full screen
update to be performed at 9600 baud with no padding whatsoever, even
when achieving compression ratios of 4 to 1.  Thus the new software may
be used efficiently either locally over a high speed line or remotely
over a lower speed modem without change to the program driving the
terminal.
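The screen-image comparison and the compression ratio described above can be illustrated with a small sketch. It assumes equal-height screen images held as lists of strings, a fixed cost for each cursor-positioning command (the hypothetical CURSOR_MOVE_COST of 3 bytes), and the simple strategy of rewriting each run of differing characters. It is not the algorithm in the actual screen management program, which the text describes only as near-optimal; the function names are likewise made up.

```python
from __future__ import annotations

# Assumed cost, in bytes, of one "move cursor to (row, col)" command.
CURSOR_MOVE_COST = 3


def update_cost(actual: list[str], desired: list[str]) -> int:
    """Bytes needed to turn the actual screen image into the desired one.

    Each line is compared column by column; every maximal run of differing
    characters costs one cursor move plus one byte per character in the run.
    Both images are assumed to have the same number of lines.
    """
    sent = 0
    for old, new in zip(actual, desired):
        width = max(len(old), len(new))
        old, new = old.ljust(width), new.ljust(width)  # pad short lines with blanks
        col = 0
        while col < width:
            if old[col] == new[col]:
                col += 1
                continue
            start = col
            while col < width and old[col] != new[col]:
                col += 1
            sent += CURSOR_MOVE_COST + (col - start)
    return sent


def compression_ratio(chars_from_editor: int, chars_to_terminal: int) -> float:
    """Characters the editor emitted divided by characters actually sent."""
    return chars_from_editor / chars_to_terminal


def effective_baud(line_baud: int, ratio: float) -> float:
    """Apparent transmission rate at a given compression ratio."""
    return line_baud * ratio


if __name__ == "__main__":
    actual = ["hello world", "foo"]
    desired = ["hello there", "foo"]
    repaint = sum(len(line) for line in desired)  # editor repaints everything
    cost = update_cost(actual, desired)
    print(cost, compression_ratio(repaint, cost))  # 8 1.75
```

Under these assumptions, changing "hello world" to "hello there" costs 3 + 5 = 8 bytes against 14 characters for a full repaint, and effective_baud(1200, 3.0) reproduces the 3600 baud figure quoted above.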

The following sections describe orthogonal ideas which are all (to
varying degrees) present in the current terminal: virtual terminals,
screen management support, and data compression.  Unfortunately, the
implementation of these ideas in the current software makes it very
difficult for them to be exploited.

Virtual terminals

The terminal as a whole is composed of one or more virtual terminals,
each possessing the state one would normally associate with a physical
display terminal.  A redisplay process handles the mapping of virtual
terminal screen images to non-overlapping rectangular windows on the
screen whenever the terminal is not otherwise occupied in processing
keyboard or communication line input (the currently running version only
supports a single virtual terminal, but this should change shortly).  A
virtual terminal has a fixed number of lines and columns, independent of
the number of lines and columns actually being displayed on the screen
at any given time; the user may select whether a window narrower than
the virtual terminal it displays should truncate each line or wrap the
logical line onto the next physical line.

Within each virtual terminal, there are four contexts.  Contexts provide the means for
switching quickly between radically different states of terminal
operation, without the high overhead of sending commands to effect all
the individual changes, and allow for a program to use the terminal
without interfering with the user's preferred settings of parameters.
At any time, the input stream is connected to exactly one of these
contexts.  A context contains the information describing where and how a
received character is to be displayed: logical cursor position; mark
position (the mark is a saved cursor position, and is a familiar idea to
EMACS users); character set; video attributes; insert, overwrite, or
overstrike mode; the width of fixed tab-stop settings; and the current
region top and region line count (a region is a horizontal band the full
width of the virtual terminal to which all operations are limited; it is
identical to the HDS notion of window if the window left and window
columns parameters are restricted to be 0 and 80, respectively).
Switching between contexts may be done either in a push/pop style or by
explicitly naming the context to be connected to.  In addition, one may
connect to a context specifying that the old context is to be used to
initialize the new one.  Thus, for example, a user handling normal typein
to the system in wrap mode (a la ITS wrap mode, but with a blank line kept
between old and new text at all times) can enter my screen management
program, which changes various modes, and then exit it; the terminal is
returned to the previously connected context and is again in wrap mode.
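The context mechanism can be modeled as a small data structure. In this sketch the field names, default values (including an assumed 24-line region), and the method names connect, push, and pop are hypothetical; only the list of per-context state, the count of four contexts, and the three ways of switching (by name, push/pop, and initializing from the old context) come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class Context:
    """State deciding where and how a received character is displayed."""
    cursor: tuple[int, int] = (0, 0)
    mark: tuple[int, int] = (0, 0)   # saved cursor position, as in EMACS
    charset: int = 0
    video: int = 0                   # hypothetical bit mask of video attributes
    mode: str = "overwrite"          # "insert", "overwrite", or "overstrike"
    tab_width: int = 8
    region_top: int = 0
    region_lines: int = 24           # assumed screen height


class VirtualTerminal:
    NUM_CONTEXTS = 4

    def __init__(self) -> None:
        self.contexts = [Context() for _ in range(self.NUM_CONTEXTS)]
        self.current = 0                 # the context the input stream feeds
        self._saved: list[int] = []      # stack used by push/pop

    @property
    def ctx(self) -> Context:
        return self.contexts[self.current]

    def connect(self, n: int, init_from_old: bool = False) -> None:
        """Connect to context n by name, optionally copying the old context."""
        if init_from_old:
            self.contexts[n] = replace(self.ctx)  # copy of every field
        self.current = n

    def push(self, n: int, init_from_old: bool = False) -> None:
        """Remember the current context, then connect to context n."""
        self._saved.append(self.current)
        self.connect(n, init_from_old)

    def pop(self) -> None:
        """Return to the context that was current before the last push."""
        self.current = self._saved.pop()
```

For example, after vt.push(1, init_from_old=True) a program may set vt.ctx.mode = "insert" freely; a later vt.pop() reconnects context 0 with the user's settings in it untouched, which is the "program uses the terminal without interfering" case described above.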

Screen management

Some of the screen management support is inherent in the proper
implementation of virtual terminals.  When moving from one virtual
terminal to another, or one context to another, it is not necessary to
send dozens of bytes of control information to the terminal to establish
the new set of parameters.  However, the compression achieved by this
technique does not apply to the sort of screen management done by screen
editors like EMACS, where one remains in one context in one virtual
terminal during the editing session.

In order to achieve the efficient screen management and data compression
described above, several techniques have been used.  

Eight-bit transmission to the terminal is used so that the most commonly
needed screen management commands may be invoked with a single byte to
specify the type of command followed by whatever parameter bytes are
necessary.  The screen management program was heavily instrumented to
determine the types of operations which were most frequently needed, and
commands that minimize the total number of bytes to be sent to the
terminal have been defined.
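One way to picture the single-byte command scheme: with eight-bit transmission, a byte with the high bit set can be treated as a command rather than 7-bit ASCII text, and the receiver knows from the opcode how many parameter bytes follow. That split, the opcode values, the command names, and the parameter counts below are all assumptions for illustration; the real command list is in the Concept.mss file referred to at the end of the message.

```python
from __future__ import annotations

# Hypothetical command table: name -> (opcode byte, number of parameter bytes).
# Assumed convention: opcodes occupy 0x80-0xFF so they cannot be mistaken
# for 7-bit ASCII text.
OPCODES = {
    "move_cursor": (0x80, 2),    # row, column
    "insert_lines": (0x81, 1),   # count
    "delete_chars": (0x82, 1),   # count
    "erase_eol": (0x83, 0),
}
BY_CODE = {code: (name, nparams) for name, (code, nparams) in OPCODES.items()}


def encode(command: str, *params: int) -> bytes:
    """One opcode byte followed by the command's parameter bytes."""
    code, nparams = OPCODES[command]
    if len(params) != nparams:
        raise ValueError(f"{command} takes {nparams} parameter(s)")
    if any(not 0 <= p <= 0xFF for p in params):
        raise ValueError("each parameter must fit in one byte")
    return bytes([code, *params])


def decode(stream: bytes) -> list:
    """Split a byte stream into text characters and (command, *params) tuples."""
    out: list = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b < 0x80:  # plain text
            out.append(chr(b))
            i += 1
            continue
        name, nparams = BY_CODE[b]
        if i + 1 + nparams > len(stream):
            raise ValueError("truncated command")
        out.append((name, *stream[i + 1:i + 1 + nparams]))
        i += 1 + nparams
    return out
```

Because parameter bytes are consumed by count after their opcode, they may take any value, including values in the text range.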

Commands are provided with built-in repetition counts for two reasons:
it is poor practice to waste precious communication bandwidth sending
five commands when one can send a command with a parameter of five; it
also requires more processing time in the terminal to perform the five
individual operations than the single unified one.
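The bandwidth argument for repetition counts can be sketched by collapsing runs of identical operations into (operation, count) commands and comparing byte totals. This assumes one byte per command and a one-byte count, so counts are capped at a hypothetical MAX_COUNT of 255; the operation names are made up.

```python
from __future__ import annotations

from itertools import groupby

MAX_COUNT = 255  # assumed: the repetition count travels in one byte


def coalesce(ops: list[str]) -> list[tuple[str, int]]:
    """Collapse runs of identical operations into (operation, count) commands."""
    commands = []
    for op, run in groupby(ops):
        n = sum(1 for _ in run)
        while n > 0:  # split runs longer than MAX_COUNT
            chunk = min(n, MAX_COUNT)
            commands.append((op, chunk))
            n -= chunk
    return commands


def cost_individual(ops: list[str], cmd_bytes: int = 1) -> int:
    """Bytes to send every operation as its own command."""
    return len(ops) * cmd_bytes


def cost_counted(commands: list[tuple[str, int]], cmd_bytes: int = 1,
                 count_bytes: int = 1) -> int:
    """Bytes to send each coalesced command together with its count."""
    return len(commands) * (cmd_bytes + count_bytes)
```

Five deletions followed by two cursor-down moves cost 7 bytes individually but 4 bytes as two counted commands. Under these assumptions a run of length one costs two bytes instead of one, so a practical encoder would presumably send isolated operations without a count.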

Data compression

In order to speed display of text and programs still further, two token
dictionaries are present in the terminal.  A token, in this use, is an
all upper case, all lower case, or capitalized sequence of letters.  I
analyzed over 13 million tokens from textual-type files on CMUA and CMUC
and have stored a predefined dictionary of the most common 1024 tokens
in the terminal EPROMs.  When the screen manager determines that it
is about to send one of these, it can instead send a command to the
terminal that specifies which token to display, the case to be used, and
whether to follow the token by a space.  These commands require two
bytes to send, thus saving a great number of characters for most uses of
the terminal.  In addition, the best 32 combinations of token number,
case, and spacing are directly displayable with a single byte command.
For example, "the " may be displayed by sending a single byte to the
terminal.  In order to handle the case of tokens not stored in the
static dictionary, there is a dynamic dictionary as well.  When token
parsing mode is enabled, the terminal will parse tokens out of the input
stream and will place them sequentially in an internal table containing
the last 256 tokens received.  The screen manager recognizes when it is
about to send a token that is already in the dynamic dictionary and can
request its display with a two byte command.  There is no transmission
overhead involved in this process since both the program and the
terminal parse the input stream and the program knows exactly what
tokens are in the terminal at all times.  This token parsing process
(static and dynamic) is a simple state machine and symbol table, but it
is responsible for a large percentage of the speedup in data
transmission rates attainable through the use of my software.
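The token scheme can be sketched as a matched encoder (host side) and decoder (terminal side) that share a static word list and each keep an identical 256-entry dynamic table. Several details are assumptions, not taken from the text: the two-byte packing (a 3-bit prefix 0b101, 2 bits of case, 1 space bit, and a 10-bit index, which exactly fills 16 bits for a 1024-word dictionary), the hypothetical DYNAMIC_OPCODE value 0x90, the rule that only tokens received as literal text enter the dynamic table, and the treatment of mixed-case runs such as "McCarthy" as non-tokens. The 32 single-byte shortcuts are not modeled, and the text is assumed to be plain ASCII.

```python
from __future__ import annotations

import re

LOWER, UPPER, CAPITALIZED = 0, 1, 2
STATIC_PREFIX = 0b101 << 13  # hypothetical top 3 bits of a static-token command
DYNAMIC_OPCODE = 0x90        # hypothetical opcode, followed by a 1-byte slot


def classify(word: str) -> int | None:
    """Case class of a run of letters, or None if it is not a token."""
    if word.islower():
        return LOWER
    if word.isupper():
        return UPPER
    if word[0].isupper() and word[1:].islower():
        return CAPITALIZED
    return None  # mixed case, e.g. "McCarthy"


class DynamicDictionary:
    """The last 256 tokens received as literal text, kept in a ring buffer."""

    def __init__(self, size: int = 256) -> None:
        self.slots: list[str | None] = [None] * size
        self.where: dict[str, int] = {}
        self.next = 0

    def add(self, token: str) -> None:
        old = self.slots[self.next]
        if old is not None and self.where.get(old) == self.next:
            del self.where[old]  # the oldest entry is overwritten
        self.slots[self.next] = token
        self.where[token] = self.next
        self.next = (self.next + 1) % len(self.slots)


def encode(text: str, static_words: list[str], dyn: DynamicDictionary) -> bytes:
    """Host side: replace dictionary tokens in ASCII text with 2-byte commands."""
    index = {w: i for i, w in enumerate(static_words)}  # lower case, < 1024 words
    out, pos = bytearray(), 0
    for m in re.finditer(r"[A-Za-z]+", text):
        out += text[pos:m.start()].encode("ascii")  # non-letters pass through
        word, pos = m.group(), m.end()
        case = classify(word)
        if case is not None and word.lower() in index:
            space = int(text[pos:pos + 1] == " ")  # fold in one trailing space
            pos += space
            value = STATIC_PREFIX | (case << 11) | (space << 10) | index[word.lower()]
            out += value.to_bytes(2, "big")
        elif case is not None and word in dyn.where:
            out += bytes([DYNAMIC_OPCODE, dyn.where[word]])
        else:
            out += word.encode("ascii")
            if case is not None:
                dyn.add(word)  # the terminal will parse this token too
    out += text[pos:].encode("ascii")
    return bytes(out)


def decode(data: bytes, static_words: list[str], dyn: DynamicDictionary) -> str:
    """Terminal side: expand commands and mirror the host's table updates."""
    out: list[str] = []
    literal = ""  # letters of the literal token currently being received

    def flush() -> None:
        nonlocal literal
        if literal and classify(literal) is not None:
            dyn.add(literal)
        literal = ""

    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # literal ASCII character
            ch = chr(b)
            if ch.isalpha():
                literal += ch
            else:
                flush()
            out.append(ch)
            i += 1
            continue
        flush()
        if b == DYNAMIC_OPCODE:
            out.append(dyn.slots[data[i + 1]] or "")
        else:
            value = int.from_bytes(data[i:i + 2], "big")
            case, space = (value >> 11) & 0b11, (value >> 10) & 1
            word = static_words[value & 0x3FF]
            if case == UPPER:
                word = word.upper()
            elif case == CAPITALIZED:
                word = word.capitalize()
            out.append(word + " " * space)
        i += 2
    flush()
    return "".join(out)
```

Because the decoder adds a token to its table exactly when the encoder does, dynamic references need no extra setup traffic, which is the point made above. With the static list ["the", "on"], the 22-character string "the cat sat on THE cat" encodes to 16 bytes in this sketch.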

I apologize for the brevity of the above description, and for the
overall length of this message.  In general, the software has been
designed to provide exactly the features that are most needed by our
type of environment; misfeatures have been avoided (I hope).  No
terminal supplied with a non-test version of my software has ever
crashed (requiring power-cycling, or loss of high voltage), nor can the
terminal be placed in a state where the operator does not have complete
control.  Those desiring to examine an actual list of the commands
implemented to date may ftp and peruse the files
[CMUC]<Zubkoff>Concept.mss and Concept.press.

Now I have several questions, some about the design and some about
logistics.  At this point, little is cast in concrete with regard to some
aspects of the design.  If this software is ever to extend beyond use at
CMU, now is the time when those who will be working with it may affect
the design.

(1) What should function keys do? Should they be programmable to send or
execute variable sequences as now, or should they send fixed character
sequences? In the best of all possible worlds, the operating system
would be capable of performing the translation from a special input code
to a user-defined string.  It does seem ridiculous to send a string to
the terminal just so that it can regularly send it back.  Is this
practical, or should the terminal conform to the world?

(2) Is transmitting part of the screen back useful?  The primitives
available in the terminal are generally dead wrong for any reasonable
notion of text editing.  Should this be provided at all?  Is printing
directly from the screen useful, or should hooks merely be provided to
allow the host to send text directly to an attached printer?

(3) Which pieces of status information are interesting enough to be
displayed on a status line or screen, and which are not of interest
except in bizarre cases?

(4) If a meta key is available, how is it best handled?  Setting the
high order bit of the input byte appears best.  Would this be acceptable
to most systems?

(5) Would transmission of packets with CRC to the terminal be beneficial
to cut the overall error rate over phone lines?

(6) Do you think that you would be interested in upgrading existing
Concept-100 and Concept-104 terminals to this level of functionality, or
would you only consider purchasing new ones so equipped?  If there is
sufficient demand, I expect it would not require a great deal of work to
port my software to the newer hardware in the Concept-108.  Would $300
be a reasonable figure if HDS were to offer an upgrade kit for existing
Concepts?

(7) Assuming an upgrade is being considered, the terminal board must be
jumpered to accommodate 16K of dynamic RAM (newer Concepts have this
already) and two jumpers installed to permit the replacement of the 2716
PROMs with 2732s.  Would you want to purchase a kit to perform this
in-house or would you feel that it was necessary to return the terminal
boards to HDS?

I ask these last two questions because any release of this
software beyond CMU will have to come through HDS, and they must be
convinced that enough people in the computer science world care about
having it.  I shall be happy to receive any comments on the above
questions and will be glad to discuss my Concept software further with
anyone who is interested.  Please address all replies and questions to
Zubkoff @ CMUC.  Unless requested otherwise, I will make responses
publicly available.

					Leonard N Zubkoff
-------

-------