berndmz@unido.UUCP (Zimmermann) (11/07/85)
I have a file where each line contains a keyword, a colon and text. The keyword is a string not containing a colon, and the file is sorted by key. I want to join lines with the same key so that the key and the colon appear only once and the text portions are concatenated somehow.

    key1 : textA                 key1 : textA
    key2 : textB          \      key2 : textB textC textD
    key2 : textC     ======\     key3 : textE textF
    key2 : textD     ======/
    key3 : textE          /
    key3 : textF

I think this problem isn't so exotic. E.g. it may occur when you generate makefiles automatically.
chris@umcp-cs.UUCP (Chris Torek) (11/08/85)
Sounds like a problem for `awk'. If your separator is :, use -F:; or if the first `word' will do, that is unnecessary. Here is an awk script to take <key> <space> <value> lines and join together all the <value>s:

    # $1 is the key, $2 through $NF are the values
    NF > 0 {
        if (NF == 1)
            key[$1] = key[$1] "";
        else
            for (i = 2; i <= NF; i++)
                key[$1] = key[$1] " " $i;
    }
    END {
        for (i in key)
            print i key[i];
    }

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
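As a sketch of the array approach with the colon as separator: awk does not define the order in which `for (i in key)` visits keys, so this variant also records the order in which keys first appear. The function name `join_by_key` and the arrays `text` and `order` are made up for illustration, and a POSIX awk (one that supports the `in` test) is assumed.

```shell
#!/bin/sh
# join_by_key: hypothetical helper name, not from the thread.
# Reads "key : text" lines on stdin; equal keys may appear anywhere.
join_by_key() {
    awk -F: '
    NF > 1 {
        k = $1
        sub(/[ \t]+$/, "", k)             # drop blanks before the colon
        v = substr($0, length($1) + 2)    # everything after the first colon
        sub(/^[ \t]+/, "", v)             # drop blanks after the colon
        if (!(k in text)) order[++n] = k  # remember first-seen order
        text[k] = text[k] " " v
    }
    END {
        for (j = 1; j <= n; j++)
            print order[j] " :" text[order[j]]
    }'
}

printf '%s\n' 'key2 : textB' 'key1 : textA' 'key2 : textC' | join_by_key
# prints:
#   key2 : textB textC
#   key1 : textA
```

Like any array-based version, this holds the whole joined output in memory until END.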
augart@h-sc1.UUCP (Steven Augart) (11/08/85)
In article <2155@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Sounds like a problem for `awk'.

Yes, but this script isn't quite what's required... Here's what I wrote for it ...

    awk '{
        if ($1 != lastfield) {
            if (lastfield != "")
                printf "\n";
            printf "%s :", lastfield = $1
        }
        printf " %s", $3
    }
    END { printf "\n" }' <yourfilename>

This will write the requested output to stdout, and has the additional advantage that you can't run out of memory, whereas an internal array may overflow on a non-virtual-memory machine. This assumes that the file format given (with a space on both sides of the colon) is accurate.

-- 
Steven Augart
swa%tardis@harvard.harvard.edu
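For input whose text portion may contain several words, or whose spacing around the colon varies, the same streaming idea can be sketched by splitting each line at its first colon instead of relying on field 3. The function name `join_sorted` and the variables `prev` and `started` are made up for illustration; as in the script above, equal keys are assumed to be adjacent.

```shell
#!/bin/sh
# join_sorted: hypothetical helper name, not from the thread.
# Assumes lines with the same key are adjacent (e.g. the file is sorted).
join_sorted() {
    awk '
    {
        p = index($0, ":")                # position of the first colon
        if (p == 0) next                  # skip lines without a key
        k = substr($0, 1, p - 1)
        v = substr($0, p + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim blanks around the key
        gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim blanks around the text
        if (!started || k != prev) {
            if (started) printf "\n"      # finish the previous key
            printf "%s :", k
            prev = k
            started = 1
        }
        printf " %s", v
    }
    END { if (started) printf "\n" }'
}

printf '%s\n' 'key1 : textA' 'key2 : textB' 'key2 : textC textD' | join_sorted
# prints:
#   key1 : textA
#   key2 : textB textC textD
```

Only the current key is kept between lines, so memory use stays constant regardless of file size.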
chris@umcp-cs.UUCP (Chris Torek) (11/09/85)
The original example in <429@unido.UUCP> had, if I recall correctly, repeated keys that were not in sequence. If all your keys are in sequence then the associative array feature of awk is unnecessary and is indeed wasteful, as you point out.

For another example, here is the last part of my current `generic makefile'. It runs through a set of programs, each assumed to consist of one `.c' file, and finds what they `#include'. The output of `cc -M', which of course consists of `key: text' lines, is joined into lines no more than 78 columns long. This is then inserted into the makefile itself, so that it will have an accurate dependency list---including <sys/foo.h> files, which have here a tendency toward rapid change. The -M option is available only in late 4.2 and 4.3 `cc's, unfortunately.

depend:
	for i in ${SUBDIR}; do (cd $$i; make ${MFLAGS} depend); done
	for i in ${STD} ${NSTD} ${KMEM} ${SETUID}; do \
	    cc -M $$i.c | sed -e 's/\.o//' | awk '{ if ($$1 != prev) { \
		if (rec != "") print rec; rec = $$0; prev = $$1; } \
		else { if (length(rec " " $$2) > 78) { print rec; rec = $$0; } \
		else rec = rec " " $$2 } } \
		END { print rec }'; done >makedep
	echo '/^# DO NOT DELETE THIS LINE/+2,$$d' >eddep
	echo '$$r makedep' >>eddep
	echo 'w' >>eddep
	cp Makefile Makefile.bak
	ed - Makefile < eddep
	rm eddep makedep
	echo '# DEPENDENCIES MUST END AT END OF FILE' >>Makefile
	echo '# IF YOU PUT STUFF HERE IT WILL GO AWAY' >>Makefile
	echo '# see make depend above' >>Makefile

# Files listed in ${NSTD} have explicit make lines given below.

# DO NOT DELETE THIS LINE -- make depend uses it

# DEPENDENCIES MUST END AT END OF FILE
# IF YOU PUT STUFF HERE IT WILL GO AWAY
# see make depend above

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
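The joining step is easier to try outside of make, where the `$$` escaping is not needed. The following is a minimal sketch of the same wrap-at-a-width logic as a shell function; the name `wrap_deps` and the optional width argument are made up for illustration, and the input is assumed to be `target: dependency' lines with equal targets adjacent, as `cc -M' produces.

```shell
#!/bin/sh
# wrap_deps: hypothetical helper name, not from the thread.
# Usage: wrap_deps [width]   (width defaults to 78)
wrap_deps() {
    awk -v width="${1:-78}" '
    {
        if ($1 != prev) {                 # new target: flush the old line
            if (rec != "") print rec
            rec = $0
            prev = $1
        } else if (length(rec " " $2) > width) {
            print rec                     # line is full: start another
            rec = $0
        } else {
            rec = rec " " $2              # still room: append dependency
        }
    }
    END { if (rec != "") print rec }'
}

printf '%s\n' 'a: a.c' 'a: /usr/x.h' 'a: /usr/yy.h' 'b: b.c' | wrap_deps 20
# prints:
#   a: a.c /usr/x.h
#   a: /usr/yy.h
#   b: b.c
```

A continuation line repeats the target, which make accepts because dependency lines for the same target accumulate.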