[net.unix] Joining Textlines containing "Key: Text."

berndmz@unido.UUCP (Zimmermann) (11/07/85)

I have a file where each line contains a keyword, a colon and text. The keyword
would be a string not containing a colon and the file is sorted by key.
I want to join lines with the same key so that the key and the colon appear
only once and the text portions are concatenated somehow.


key1 : textA				key1 : textA
key2 : textB		     \		key2 : textB textC textD
key2 : textC		======\		key3 : textE textF
key2 : textD		======/
key3 : textE		     /
key3 : textF

I think this problem isn't so exotic. E.g. it may occur when you generate
makefiles automatically.

chris@umcp-cs.UUCP (Chris Torek) (11/08/85)

Sounds like a problem for `awk'.  If your separator is :, use -F:;
or if the first `word' will do, that is unnecessary.  Here is an
awk script to take <key> <space> <value> lines and join together
all the <value>s:

# $1 is the key, $2 through $NF are the values
NF > 0 {
	if (NF == 1)
		key[$1] = key[$1] "";
	else
		for (i = 2; i <= NF; i++)
			key[$1] = key[$1] " " $i;
}

END {
	for (i in key)
		print i key[i];
}
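
For the `key : text' layout of the original question, the -F: route
mentioned above might look like the following sketch.  The function
name join_keys and the array names are made up for illustration, and
it assumes an awk that supports the `in' operator.  Because the order
of `for (i in key)' is not defined, the sketch records keys in the
order they are first seen and prints them in that order.

```shell
# join_keys: hypothetical helper name, not from the thread.
# Reads "key : text" lines and prints one line per key, in first-seen
# order, with the text portions joined by single blanks.
join_keys() {
	awk -F: '
	NF > 0 {
		k = $1
		sub(/[ \t]+$/, "", k)		# trim blanks before the colon
		v = substr($0, length($1) + 2)	# everything after the first colon
		sub(/^[ \t]+/, "", v)		# trim blanks around the text
		sub(/[ \t]+$/, "", v)
		if (!(k in text)) {		# first time this key is seen
			order[++n] = k
			text[k] = v
		} else if (v != "") {
			if (text[k] == "")
				text[k] = v
			else
				text[k] = text[k] " " v
		}
	}
	END {
		for (j = 1; j <= n; j++)
			print order[j] " : " text[order[j]]
	}' "$@"
}
```

Called as `join_keys yourfile' or at the end of a pipeline, this keeps
the whole file's text in memory, just as the script above does.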
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

augart@h-sc1.UUCP (Steven Augart) (11/08/85)

In article <2155@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Sounds like a problem for `awk'.  
Yes, but this script isn't quite what's required...
Here's what I wrote for it ...
awk '{ if ($1 != lastfield) { if (lastfield != "") printf "\n"; printf "%s :", lastfield = $1 } printf " %s", $3 } END { printf "\n" }' <yourfilename>
This will write the requested output to stdout, and has the additional
advantage that you can't run out of memory, whereas an internal array
may overflow on a non-virtual-memory machine.
This assumes that the file format given (with a space on both sides
of the colon) is accurate, and that the text is a single word, since
only $3 is printed.
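
If the text may run to several words, the same streaming idea can be
sketched as below.  The function name join_sorted and the `started'
flag are made up for illustration.  It still assumes sorted input
with a blank on each side of the colon, so that the text starts at $3.

```shell
# join_sorted: hypothetical helper name, not from the thread.
# Streams sorted "key : text" lines and joins the text of equal keys.
# Only the previous key is remembered, so memory use stays constant.
join_sorted() {
	awk '
	NF > 0 {
		if (!started || $1 != last) {
			if (started) printf "\n"	# finish the previous key
			printf "%s :", $1
			last = $1 ""			# force a string comparison
			started = 1
		}
		for (i = 3; i <= NF; i++)		# $2 is the colon itself
			printf " %s", $i
	}
	END { if (started) printf "\n" }' "$@"
}
```

Empty input produces no output at all, and blank lines are skipped.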
-- 
Steven Augart
swa%tardis@harvard.harvard.edu

chris@umcp-cs.UUCP (Chris Torek) (11/09/85)

The original example in <429@unido.UUCP> had, if I recall correctly,
repeated keys that were not in sequence.  If all your keys are in
sequence then the associative array feature of awk is unnecessary
and is indeed wasteful, as you point out.

For another example, here is the last part of my current `generic
makefile'.  It runs through a set of programs, each assumed to
consist of one `.c' file, and finds what they `#include'.  The
output of `cc -M', which of course consists of `key: text' lines,
is joined into lines no more than 78 columns long.  This is then
inserted into the makefile itself, so that it will have an accurate
dependency list---including <sys/foo.h> files, which have here a
tendency toward rapid change.  The -M option is available only in
late 4.2 and 4.3 `cc's, unfortunately.

depend:
	for i in ${SUBDIR}; do (cd $$i; make ${MFLAGS} depend); done
	for i in ${STD} ${NSTD} ${KMEM} ${SETUID}; do \
	    cc -M $$i.c | sed -e 's/\.o//' | awk '{ if ($$1 != prev) { \
		if (rec != "") print rec; rec = $$0; prev = $$1; } \
		else { if (length(rec " " $$2) > 78) { print rec; rec = $$0; } \
		else rec = rec " " $$2 } } \
		END { print rec }'; done >makedep
	echo '/^# DO NOT DELETE THIS LINE/+2,$$d' >eddep
	echo '$$r makedep' >>eddep
	echo 'w' >>eddep
	cp Makefile Makefile.bak
	ed - Makefile < eddep
	rm eddep makedep
	echo '# DEPENDENCIES MUST END AT END OF FILE' >>Makefile
	echo '# IF YOU PUT STUFF HERE IT WILL GO AWAY' >>Makefile
	echo '# see make depend above' >>Makefile

# Files listed in ${NSTD} have explicit make lines given below.

# DO NOT DELETE THIS LINE -- make depend uses it

# DEPENDENCIES MUST END AT END OF FILE
# IF YOU PUT STUFF HERE IT WILL GO AWAY
# see make depend above
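
Outside of make, where `$' need not be doubled, the joining step can
be tried on its own with a sketch like this one.  The function name
wrap_deps and its optional width argument are made up for
illustration.  It assumes one `target: dependency' pair per input
line, with lines for the same target adjacent, and it counts the
joining blank when testing the width.

```shell
# wrap_deps: hypothetical helper name, not from the thread.
# Reads "target: dep" lines on standard input and joins the
# dependencies of each target into lines of at most max columns
# (default 78), repeating the target on each continuation line.
wrap_deps() {
	awk -v max="${1:-78}" '
	NF > 0 {
		if (rec == "" || $1 != prev) {
			if (rec != "") print rec	# flush the previous target
			rec = $0
			prev = $1 ""
		} else if (length(rec) + 1 + length($2) > max + 0) {
			print rec			# line is full, start a new one
			rec = $0
		} else {
			rec = rec " " $2
		}
	}
	END { if (rec != "") print rec }'
}
```

For example, `cc -M prog.c | sed -e 's/\.o//' | wrap_deps' mirrors
the pipeline in the makefile, and `wrap_deps 40' picks a narrower
width.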