[comp.unix.shell] The space/time/effort triangle

martin@mwtech.UUCP (Martin Weitzel) (05/25/91)

In article <1991May17.211650.2025@ultra.com> marc@mercutio.ultra.com (Marc Kwiatkowski {Host Software-AIX}) writes:
[...]
>If you'll allow me to get on my soapbox, I'd like to campaign
>for sed.  sed's syntax is somewhat obtuse, but it takes far
>less time to load than awk.  Since most of shell-script
>execution time is spent creating pipes and filter-processes, this
>can be a real win.
[...]
In my years of practical programming I often found that there is a
general tradeoff between the following three goals:

	The TIME a program needs to complete its task.

	The SPACE a program occupies in memory while running.

	The EFFORT required to write, debug, and maintain a program.

(I also think that a further aspect can to a large degree be subsummed
under the last goal, that is: How well does the program lend itself as a
basic framework from which other programs may be derived in the future.)

When speaking about "optimization", I often sketch the above three goals
on a sheet of paper as follows:

		TIME  ---------------- SPACE
		    \                  /
	             \                /
	              \              /
	               \            /
	                \          /
	                 \        /
	                  \      /
	                   \    /
	                    \  /
	                   EFFORT

Now you can assume that any reasonable% program can be associated with
some location inside this triangle. "Optimizing" a program generally
means that you can come closer to at most two of the mentioned goals.
E.g. it is well known that you can trade space for time and vice versa;
further there are often more or less tricky algorithms that allow a
reduction of space AND time, but make the program more difficult to code
or later to understand by someone else.

%: If you like, you may even include "poorly written" programs in the
above picture. Those are located outside the triangel, i.e. you can make
them better with respect to ALL of the above goals.

Back to the topic: Using sed instead of awk is surely an option in
in many cases. Further I too sometimes get a little "carried away"
when I tried to write "real programs" (i.e. such that check conditions,
jump around, and exploit the hold area) with sed. But the price is
usually high in that you get fast away from the goal I name EFFORT:

	- You have to think harder to get your algorithm right. (Todays
	  programmers are used to think in "high level" constructs like
	  "if-else" and "while").

	- The program is more cryptic and hence more difficult to
	  debug, understand for someone else, and to reuse later.

	- Future requirements may be beyond the abilities of sed,
	  so you might have to rewrite it in awk later.

All this, of course, is no argument if you have a stable, debuged algorithm
which, when done in awk, dominates the runtime of some time critical or
often executed task. But how much faster this will be if you do it with
sed (compared to other possible restructurings of the total algorithm)
deserves some investigations in advance.

On my system (80386 with ISC UNIX) I've just tried and found that the
difference in startup time is about 15 to 20% more for awk than for sed.
The interesting point here is, that any already running other awk process
will result in about the same speedup, because mapping the running code
segment of some other process is somewhat faster the execing from disk
(or rather out of the disk cache). Of course (and to be fair) I should
say that any concurrently running sed process also speeds up starting
sed, so sed wins again.

Whether some sed or awk program that is NOT dominated by its startup
time is faster as the other, IMHO totally depends on the algorithm,
though I'd say I'd not be astonished if sed were about 50% faster
for simple tasks. (I just tried a few *very* simple things which seem
to support these estimations - at least on my system. Maybe it's the
burden for awk to split every input line into fields which makes this
big difference.)

To summarize:

	If you have a very simple task which you can immedietly code
	for sed without hard thinking => use sed, not awk.

	If your task is slightly more complex and you need some
	sort of "real" algorithm to solve it (but you are sure it
	will not become too complex for sed) => start with whatever
	tool you feel more comfortable with.

	If you expect your task to be extended one day beyond the
	abilities of sed => consider using awk in the first place.

	If your task is non-trivial though your are sure it is not
	too complex for sed and you normally would tend to use awk,
	but you think that execution speed of awk dominates so far
	that you could really benefit from sed, you have two
	options:
		=> use sed from the start
		=> write some prototype with awk but with sed in
		   mind (i.e. no variable, no functions if you have
		   nawk, no complex nesting of control structures)
	           and convert it to sed later.

Soweit meine 3.5 Pfennige (based on todays exchange rate for US $ :-))
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

torek@elf.ee.lbl.gov (Chris Torek) (05/28/91)

In article <1150@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>When speaking about "optimization", I often sketch the above three goals
>on a sheet of paper as follows:
>
>		TIME  ---------------- SPACE
>		    \                  /
>	             \                /
>	              \              /
>	               \            /
>	                \          /
>	                 \        /
>	                  \      /
>	                   \    /
>	                    \  /
>	                   EFFORT
>
>Now you can assume that any reasonable% program can be associated with
>some location inside this triangle. ...

I think this should be a `shape of constant diameter' rather than a
triangle. :-)

(A solid with a constant diameter cross-section will work as a roller.
If you make a manhole [personhole?] cover out of such a shape, the
cover will not fit through its hole.  A solid disc is the most obvious
shape, but you can construct a `triangular' constant-diameter object by
drawing the above as an equilateral triangle, then using a compass to
draw an arc from each pair of adjacent corners, centered on their
opposite corner.  I.e., set the compass point at EFFORT and draw an arc
from SPACE to TIME, then set it at SPACE and draw from TIME to EFFORT,
etc.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov