berndmz@unido.UUCP (Zimmermann) (11/07/85)
I have a file where each line contains a keyword, a colon and text.
The keyword is a string not containing a colon, and the file is
sorted by key. I want to join lines with the same key so that the key
and the colon appear only once and the text portions are
concatenated somehow.

key1 : textA                  key1 : textA
key2 : textB         \        key2 : textB textC textD
key2 : textC    ======\       key3 : textE textF
key2 : textD    ======/
key3 : textE         /
key3 : textF

I think this problem isn't so exotic. E.g. it may occur when you
generate makefiles automatically.
chris@umcp-cs.UUCP (Chris Torek) (11/08/85)
Sounds like a problem for `awk'. If your separator is :, use -F:;
or if the first `word' will do, that is unnecessary. Here is an
awk script to take <key> <space> <value> lines and join together
all the <value>s:
# $1 is the key, $2 through $NF are the values
NF > 0 {
	if (NF == 1)
		key[$1] = key[$1] "";
	else
		for (i = 2; i <= NF; i++)
			key[$1] = key[$1] " " $i;
}
END {
	for (i in key)
		print i key[i];
}
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
augart@h-sc1.UUCP (Steven Augart) (11/08/85)
In article <2155@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Sounds like a problem for `awk'.

Yes, but this script isn't quite what's required... Here's what I
wrote for it ...

awk '{ if ($1 != lastfield) {
		if (lastfield != "") printf "\n";
		printf "%s :", lastfield = $1
	}
	printf " %s", $3
}
END { printf "\n" }' <yourfilename>

This will write the requested output to stdout, and has the
additional advantage that you can't run out of memory, whereas an
internal array may overflow on a non-virtual-memory machine. This
assumes that the file format given (with a space on both sides of
the colon) is accurate.
--
Steven Augart
swa%tardis@harvard.harvard.edu
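
The streaming script above takes the text from $3, so it relies on
spaces around the colon and on the text being a single word. As a
hedged sketch only, assuming a POSIX awk and input in which equal keys
are adjacent, the same constant-memory idea can be written to split at
the first colon and keep multi-word text. The names join_sorted, prev
and started are invented for this example.

```shell
# join_sorted: hypothetical helper, not from the original posts.
# Assumes lines with the same key are adjacent (e.g. sorted input).
# Only the previous key is kept, so memory use does not grow.
join_sorted() {
    awk '
    {
        p = index($0, ":")            # split at the first colon only
        if (p == 0) next              # skip lines without a colon
        k = substr($0, 1, p - 1); sub(/[ \t]+$/, "", k)
        v = substr($0, p + 1);    sub(/^[ \t]+/, "", v)
        if (!started || k != prev) {
            if (started) printf "\n"  # finish the previous key
            printf "%s :", k
            prev = k; started = 1
        }
        printf " %s", v
    }
    END { if (started) printf "\n" }'
}
```

If the keys are not adjacent, each run of a repeated key produces its
own output line, so the input would have to be sorted first.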
chris@umcp-cs.UUCP (Chris Torek) (11/09/85)
The original example in <429@unido.UUCP> had, if I recall correctly,
repeated keys that were not in sequence. If all your keys are in
sequence then the associative array feature of awk is unnecessary
and is indeed wasteful, as you point out.
For another example, here is the last part of my current `generic
makefile'. It runs through a set of programs, each assumed to
consist of one `.c' file, and finds what they `#include'. The
output of `cc -M', which of course consists of `key: text' lines,
is joined into lines no more than 78 columns long. This is then
inserted into the makefile itself, so that it will have an accurate
dependency list---including <sys/foo.h> files, which have here a
tendency toward rapid change. The -M option is available only in
late 4.2 and 4.3 `cc's, unfortunately.

depend:
	for i in ${SUBDIR}; do (cd $$i; make ${MFLAGS} depend); done
	for i in ${STD} ${NSTD} ${KMEM} ${SETUID}; do \
	    cc -M $$i.c | sed -e 's/\.o//' | awk '{ if ($$1 != prev) { \
		if (rec != "") print rec; rec = $$0; prev = $$1; } \
		else { if (length(rec $$2) > 78) { print rec; rec = $$0; } \
		else rec = rec " " $$2 } } \
		END { print rec }'; done >makedep
	echo '/^# DO NOT DELETE THIS LINE/+2,$$d' >eddep
	echo '$$r makedep' >>eddep
	echo 'w' >>eddep
	cp Makefile Makefile.bak
	ed - Makefile < eddep
	rm eddep makedep
	echo '# DEPENDENCIES MUST END AT END OF FILE' >>Makefile
	echo '# IF YOU PUT STUFF HERE IT WILL GO AWAY' >>Makefile
	echo '# see make depend above' >>Makefile

# Files listed in ${NSTD} have explicit make lines given below.

# DO NOT DELETE THIS LINE -- make depend uses it

# DEPENDENCIES MUST END AT END OF FILE
# IF YOU PUT STUFF HERE IT WILL GO AWAY
# see make depend above
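
The line-packing step inside the `depend' rule is easier to read
outside of make, where `$$' is simply `$'. What follows is a rough
standalone sketch of that step, not the author's code. It assumes a
POSIX awk (for -v) and input of the form `target: dependency' with one
dependency per line. Unlike the rule above, it counts the joining
space when checking the width, and it skips empty lines. The name
wrap_deps and the width argument are made up for this example.

```shell
# wrap_deps: hypothetical helper, not from the original makefile.
# Packs "target: dependency" lines into lines of at most $1 columns
# (default 78), starting a new line whenever the target changes.
wrap_deps() {
    awk -v width="${1:-78}" '
    NF == 0 { next }                  # ignore empty lines
    $1 != prev {                      # new target: flush and restart
        if (rec != "") print rec
        rec = $0; prev = $1
        next
    }
    {
        # the + 1 accounts for the space used to join the fields
        if (length(rec) + 1 + length($2) > width) {
            print rec
            rec = $0
        } else
            rec = rec " " $2
    }
    END { if (rec != "") print rec }'
}
```

For instance, `cc -M foo.c | sed -e 's/\.o//' | wrap_deps 78' would
mirror the pipeline in the rule, given a `cc' that supports -M.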