[comp.unix.questions] file manipulation

jpd@tardis.cl.msu.edu (Joe P. DeCello) (06/18/91)

What is the best (easiest) way in which to get the first word
of each line in a text file to a single line in a new text file?

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Joseph P. DeCello III                e-mail:  jpd@cad.msu.edu
Michigan State University             phone:  (517) 353-3027
Specialized Computing Support Services

rouben@math9.math.umbc.edu (Rouben Rostamian) (06/18/91)

In article <1991Jun17.200748.19324@msuinfo.cl.msu.edu> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
>
>What is the best (easiest) way in which to get the first word
>of each line in a text file to a single line in a new text file?

Depends on what you mean by "word".  If a "word" is taken to mean any
contiguous sequence of non-blank characters, then the following should do:

sed 's/[ T]*\([^ T]*\).*/\1/'  infile >outfile

where I have typed a "T" to show a tab character.  You should type a
tab where you see a "T".
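
A minimal sketch of the same idea that avoids typing a literal tab, and
that also joins the results onto one line as the question asks.  The
sample input and the use of paste(1) for the join are assumptions added
for illustration; they are not part of the command above.

```shell
# Put a real tab character in a variable so it need not be typed by hand.
tab=$(printf '\t')

# Hypothetical sample input: leading blanks, a leading tab, and a bare word.
printf '  alpha beta\n\tgamma delta\nepsilon\n' > infile

# Same sed expression as above, with $tab standing in for each "T".
# paste -s then joins the resulting lines, separated by single spaces.
sed "s/[ $tab]*\([^ $tab]*\).*/\1/" infile | paste -s -d ' ' - > outfile

cat outfile   # alpha gamma epsilon
```

The sed script is in double quotes here so that the shell expands $tab
before sed sees it.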

--
Rouben Rostamian                          Telephone: (301) 455-2458
Department of Mathematics and Statistics  e-mail:
University of Maryland Baltimore County   bitnet: rostamian@umbc.bitnet
Baltimore, MD 21228,  U.S.A.              internet: rouben@math9.math.umbc.edu

jit@cellbio.duke.edu (Jit Keong Tan) (06/18/91)

In article <1991Jun17.200748.19324@msuinfo.cl.msu.edu> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
>
>What is the best (easiest) way in which to get the first word
>of each line in a text file to a single line in a new text file?

I ran this in an emacs/src dir:

[63] jit@slic % ls -l t* | nawk '{ printf "%s %s === ", $5,$9}' -
770 temacs.opt === 33296 term.c === 14111 termcap.c === 2202 termchar.h === 1714
 termhooks.h === 1717 terminfo.c === 1475 termopts.h === 177 testemacs.com === 6
 966 tparam.c === [64] jit@slic % 

(the output got wrapped by the screen)
-- 

--------------------------------------------------------
Jit Keong Tan     | internet: jit@slic.cellbio.duke.edu
(919) 684-8098    | bitnet  : tan00001@dukemc.bitnet

jpd@tardis.cl.msu.edu (Joe P. DeCello) (06/18/91)

To rephrase my previous question:

Suppose I have a file containing several lines of text.
Each line is an entry for a database (or whatever) and
let's say each line contains 8 fields.  The fields are
separated by colons.  I would like to be able to output
the first field of each line into a new file.  I would 
like these fields to be on one line in the new file and
separated by commas.  The best response to my previous
posting was this:

awk '{printf "%s ", $1' < infile > outfile

I changed this to :

awk '{printf "%s, ", $1' < infile > outfile

to get separation by commas, but now I need to break off
the first field of each line from the infile at the colon.

Thanks to those who have replied to the original question
and to those who reply to this one.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Joseph P. DeCello III                e-mail:  jpd@cad.msu.edu
Michigan State University             phone:  (517) 353-3027
Specialized Computing Support Services

lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) (06/18/91)

In article <1991Jun18.014539.22085@msuinfo.cl.msu.edu> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
>
>To rephrase my previous question:
>
. . .
>The best response to my previous posting was this:
>
>awk '{printf "%s ", $1' < infile > outfile
>
>I changed this to :
>
>awk '{printf "%s, ", $1' < infile > outfile
>
Your solution simply needs another line.  Also, note the addition of the
closing brace:
	awk '
		{printf "%s, ", $1}
		END {printf "\n"}
	' < infile > outfile

weimer@garden.ssd.kodak.com (Gary Weimer (253-7796)) (06/18/91)

In article <1991Jun18.151428.22784@colorado.edu>,
lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) writes:
|> In article <1991Jun18.014539.22085@msuinfo.cl.msu.edu> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
|> >
|> >To rephrase my previous question:
|> >
|> . . .
|> >The best response to my previous posting was this:
|> >
|> >awk '{printf "%s ", $1' < infile > outfile
|> >
|> >I changed this to :
|> >
|> >awk '{printf "%s, ", $1' < infile > outfile
|> >
|>  your solution simply needs another line.  Also, note addition of closing
|>  brace:
|> 	awk '
|> 		{printf "%s, ", $1}
|> 		END {printf "\n"}
|> 	' < infile > outfile

Except you left out the part that said: "but now I need to break off
the first field of each line from the infile at the colon."

What he really wants is:

awk -F: '{if (NR==1) {printf "%s", $1} else {printf ", %s", $1}}' < in > out

NOTES:
-F:             -- tells awk to use ':' as the field separator
if (NR==1) etc. -- eliminates the trailing comma
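
A short worked example of the command above.  The file names and the
sample records are made up for illustration; only the awk program is
taken from this post.

```shell
# Hypothetical sample records, 8 colon-separated fields per line.
cat > records.txt <<'EOF'
smith:x:101:20:John Smith:/home/smith:/bin/sh:staff
jones:x:102:20:Ann Jones:/home/jones:/bin/csh:staff
brown:x:103:20:Al Brown:/home/brown:/bin/sh:guest
EOF

# Print the first field bare for record 1, and ", field" for every later one.
awk -F: '{if (NR==1) {printf "%s", $1} else {printf ", %s", $1}}' < records.txt > names.txt

# The awk program never prints a newline, so add one for display only.
cat names.txt; echo   # smith, jones, brown
```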

weimer@ssd.kodak.com ( Gary Weimer )

jba@gorm.ruc.dk (Jan B. Andersen) (06/19/91)

jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:


>To rephrase my previous question:

>Suppose I have a file containing several lines of text.
>Each line is an entry for a database (or whatever) and
>let's say each line contains 8 fields.  The fields are
>separated by colons.

Very similar to /etc/passwd then.

>I would like to be able to output
>the first field of each line into a new file.  I would 
>like these fields to be on one line in the new file and
>separated by commas.

Easy. We'll use cut(1) to select field no. 1 using ':' as the delimiter,
and will then use tr(1) to translate the newlines into commas:

  $ cat OLDFILE | cut -d: -f1 | tr "\012" "," > NEWFILE

The only problem is what to do with the last comma?
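
One possible way around the last comma, as a sketch only: let paste(1)
do the joining instead of tr(1).  This assumes a paste that supports
the -s and -d options; the sample contents of OLDFILE are invented for
the example.

```shell
# Hypothetical sample data in the /etc/passwd style.
printf 'root:x:0:0\ndaemon:x:1:1\nbin:x:2:2\n' > OLDFILE

# cut selects field 1; paste -s joins all lines into one, using ","
# between them, so no comma is written after the last field.
cut -d: -f1 OLDFILE | paste -s -d, - > NEWFILE

cat NEWFILE   # root,daemon,bin
```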

-- 
      /|  / Jan B. Andersen                        /^^^\     .----------------.
     / | /  RUC, Hus 19,1     jba@dat.ruc.dk      { o_o }    | SIMULA does it |
    /--|/   Postbox 260       DG-passer@ruc.dk     \ o / --> | with CLASS     |
`--'   '    DK-4000 Roskilde  Postmaster@ruc.dk --mm---mm--  `----------------'

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/20/91)

In article <1991Jun18.195126.9916@gorm.ruc.dk>, jba@gorm.ruc.dk (Jan B. Andersen) writes:
> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
>> I would like to be able to output the first field of each line into
>> a new file.  I would like these fields to be on one line in the new
>> file and separated by commas.

> Easy. We'll use cut(1) to select field no. 1 using ':' as the
> delimiter, and will then use tr(1) to translate the newlines into
> commas:

>   $ cat OLDFILE | cut -d: -f1 | tr "\012" "," > NEWFILE

> The only problem is what to do with the last comma?

Not the only problem; you're also missing the trailing newline.

< OLDFILE awk -F: 'BEGIN { pref = ""; } { printf("%s%s",pref,$1); pref = ","; } END { printf("\n"); }' > NEWFILE
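
To see both problems side by side, here is a small check, offered as a
sketch.  The sample data and the output file names tr_out and awk_out
are assumptions made up for the comparison.

```shell
# Hypothetical sample data in the /etc/passwd style.
printf 'root:x:0:0\ndaemon:x:1:1\nbin:x:2:2\n' > OLDFILE

# The cut/tr pipeline: every newline, including the last, becomes a comma.
cut -d: -f1 OLDFILE | tr "\012" "," > tr_out

# The awk version: the separator is printed before each field except the
# first, and a single newline is printed at the end.
< OLDFILE awk -F: 'BEGIN { pref = ""; } { printf("%s%s",pref,$1); pref = ","; } END { printf("\n"); }' > awk_out

cat tr_out; echo   # root,daemon,bin,
cat awk_out        # root,daemon,bin

# wc -l counts newline characters: none in tr_out, one in awk_out.
wc -l < tr_out
wc -l < awk_out
```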

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

jos@and.nl (J. Horsmeier) (06/24/91)

In article <1991Jun17.200748.19324@msuinfo.cl.msu.edu> jpd@tardis.cl.msu.edu (Joe P. DeCello) writes:
>
>What is the best (easiest) way in which to get the first word
>of each line in a text file to a single line in a new text file?

Hi there, 

use awk(1), as follows: 

	awk '{ print $1 }' file_in > file_out


Jos


+----------------------------------------------------------------------+
|O   J.A. Horsmeier AND Software B.V.        phone : +31 10 4367100   O|
|O                  Westersingel 106/108     fax   : +31 10 4367110   O|
|O                  3015 LD  Rotterdam NL    e-mail: jos@and.nl       O|
|O--------------------------------------------------------------------O|
|O               I am a Hamburger (F. Zappa 1974)                     O|
+----------------------------------------------------------------------+