martin@mwtech.UUCP (Martin Weitzel) (05/25/91)
In article <1991May17.211650.2025@ultra.com> marc@mercutio.ultra.com (Marc Kwiatkowski {Host Software-AIX}) writes: [...] >If you'll allow me to get on my soapbox, I'd like to campaign >for sed. sed's syntax is somewhat obtuse, but it takes far >less time to load than awk. Since most of shell-script >execution time is spent creating pipes and filter-processes, this >can be a real win. [...] In my years of practical programming I often found that there is a general tradeoff between the following three goals: The TIME a program needs to complete its task. The SPACE a program occupies in memory while running. The EFFORT required to write, debug, and maintain a program. (I also think that a further aspect can to a large degree be subsummed under the last goal, that is: How well does the program lend itself as a basic framework from which other programs may be derived in the future.) When speaking about "optimization", I often sketch the above three goals on a sheet of paper as follows: TIME ---------------- SPACE \ / \ / \ / \ / \ / \ / \ / \ / \ / EFFORT Now you can assume that any reasonable% program can be associated with some location inside this triangle. "Optimizing" a program generally means that you can come closer to at most two of the mentioned goals. E.g. it is well known that you can trade space for time and vice versa; further there are often more or less tricky algorithms that allow a reduction of space AND time, but make the program more difficult to code or later to understand by someone else. %: If you like, you may even include "poorly written" programs in the above picture. Those are located outside the triangel, i.e. you can make them better with respect to ALL of the above goals. Back to the topic: Using sed instead of awk is surely an option in in many cases. Further I too sometimes get a little "carried away" when I tried to write "real programs" (i.e. such that check conditions, jump around, and exploit the hold area) with sed. But the price is usually high in that you get fast away from the goal I name EFFORT: - You have to think harder to get your algorithm right. (Todays programmers are used to think in "high level" constructs like "if-else" and "while"). - The program is more cryptic and hence more difficult to debug, understand for someone else, and to reuse later. - Future requirements may be beyond the abilities of sed, so you might have to rewrite it in awk later. All this, of course, is no argument if you have a stable, debuged algorithm which, when done in awk, dominates the runtime of some time critical or often executed task. But how much faster this will be if you do it with sed (compared to other possible restructurings of the total algorithm) deserves some investigations in advance. On my system (80386 with ISC UNIX) I've just tried and found that the difference in startup time is about 15 to 20% more for awk than for sed. The interesting point here is, that any already running other awk process will result in about the same speedup, because mapping the running code segment of some other process is somewhat faster the execing from disk (or rather out of the disk cache). Of course (and to be fair) I should say that any concurrently running sed process also speeds up starting sed, so sed wins again. Whether some sed or awk program that is NOT dominated by its startup time is faster as the other, IMHO totally depends on the algorithm, though I'd say I'd not be astonished if sed were about 50% faster for simple tasks. (I just tried a few *very* simple things which seem to support these estimations - at least on my system. Maybe it's the burden for awk to split every input line into fields which makes this big difference.) To summarize: If you have a very simple task which you can immedietly code for sed without hard thinking => use sed, not awk. If your task is slightly more complex and you need some sort of "real" algorithm to solve it (but you are sure it will not become too complex for sed) => start with whatever tool you feel more comfortable with. If you expect your task to be extended one day beyond the abilities of sed => consider using awk in the first place. If your task is non-trivial though your are sure it is not too complex for sed and you normally would tend to use awk, but you think that execution speed of awk dominates so far that you could really benefit from sed, you have two options: => use sed from the start => write some prototype with awk but with sed in mind (i.e. no variable, no functions if you have nawk, no complex nesting of control structures) and convert it to sed later. Soweit meine 3.5 Pfennige (based on todays exchange rate for US $ :-)) -- Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
torek@elf.ee.lbl.gov (Chris Torek) (05/28/91)
In article <1150@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes: >When speaking about "optimization", I often sketch the above three goals >on a sheet of paper as follows: > > TIME ---------------- SPACE > \ / > \ / > \ / > \ / > \ / > \ / > \ / > \ / > \ / > EFFORT > >Now you can assume that any reasonable% program can be associated with >some location inside this triangle. ... I think this should be a `shape of constant diameter' rather than a triangle. :-) (A solid with a constant diameter cross-section will work as a roller. If you make a manhole [personhole?] cover out of such a shape, the cover will not fit through its hole. A solid disc is the most obvious shape, but you can construct a `triangular' constant-diameter object by drawing the above as an equilateral triangle, then using a compass to draw an arc from each pair of adjacent corners, centered on their opposite corner. I.e., set the compass point at EFFORT and draw an arc from SPACE to TIME, then set it at SPACE and draw from TIME to EFFORT, etc.) -- In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427) Berkeley, CA Domain: torek@ee.lbl.gov