[comp.unix.questions] sorting and reversing lines of a file

lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) (01/26/89)

I need utilities to do two things:
(1) reverse the order of lines in a file
    but leave the lines themselves intact.
    The Unix utility does just the opposite of this.

    E.g., if the file "f" contains
       line1 
       line2
       line3
    I want to produce
       line 3
       line 2
       line 1
    I have an awk program to do this,
    but I'm sure some clever soul out there can do much better.

(2) sort a file by length of input lines.
    Again, I have a script to do this which uses awk, sort, and sed,
    but I'm sure it can be done better.

I'd prefer no C programs!
Many thanks.
----------------------------------------------------------------------------
Francois-Michel Lang
Paoli Research Center, Unisys Corporation lang@prc.unisys.com (215) 648-7256
Dept of Comp & Info Science, U of PA      lang@cis.upenn.edu  (215) 898-9511

jlo@elan.UUCP (Jeff Lo) (01/27/89)

In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
>I need utilities to do two things:
>(1) reverse the order of lines in a file
>    but leave the lines themselves intact.
>    The Unix utility does just the opposite of this.
>
>    E.g., if the file "f" contains
>       line1 
>       line2
>       line3
>    I want to produce
>       line 3
>       line 2
>       line 1
	    ^ spaces?

>    I have an awk program to do this,
>    but I'm sure some clever soul out there can do much better.

I assume you meant for
	line1
	line2
	line3
to become
	line3
	line2
	line1

Assuming this is the case, you can use a shell script like this:

#!/bin/sh

for file in "$@" ; do
	ed - "$file" << __EOF__
g/^/.m0
w
q
__EOF__
done

This will use "ed" to reverse the order of the lines in each file
passed as an argument to the script. This and many other neat "ed"
tricks are in the quiz program. Try: quiz function ed-command
-- 
Jeff Lo
..!{ames,hplabs,uunet}!elan!jlo
Elan Computer Group, Inc.
(415) 322-2450

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/27/89)

In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
>(1) reverse the order of lines in a file
>    but leave the lines themselves intact.
>    The Unix utility does just the opposite of this.

You mean it doesn't leave the lines intact or it doesn't reverse their order?
Which utility?

Anyway, check for the existence of a "rev" utility on your system.
It's standard on all Research UNIXes I know of, including 4BSD, but
not on UNIX System V.

On a SVID-conformant system, make yourself a shell function or script
that does:

	pr -n9 -t | sort +0nr -1 | cut -f2-

Sorting isn't the most efficient way to do this task, but it's easy
to implement this way.  If you have a LOT of file reversing to do,
then probably a real software engineering task should be undertaken.
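
The "+0nr -1" key notation is the old form; POSIX sort spells the same
key "-k1,1nr", and not every current sort still accepts the "+pos" form.
Below is a minimal sketch of the same number, sort, strip idea. It assumes
a POSIX awk, sort and cut. The function name revlines is made up for
illustration, and awk stands in for pr only to make the tab separator
explicit.

```shell
#!/bin/sh
# revlines: hypothetical name; reverse the order of lines read from stdin.
revlines() {
    # 1. awk prefixes each line with its line number and a tab.
    # 2. sort orders on that first field only, numerically, descending.
    # 3. cut drops the number and keeps everything after the first tab.
    awk '{ printf "%d\t%s\n", NR, $0 }' |
        sort -k1,1nr |
        cut -f2-
}

# Example use:
printf 'line1\nline2\nline3\n' | revlines
```

Because every key is a distinct number, the result does not depend on how
sort breaks ties.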

guy@auspex.UUCP (Guy Harris) (01/27/89)

>>(1) reverse the order of lines in a file
>>    but leave the lines themselves intact.
>>    The Unix utility does just the opposite of this.
>
>You mean it doesn't leave the lines intact or it doesn't reverse their
>order?

Probably that it does neither; it leaves the lines' order intact, but
doesn't leave the lines themselves intact, because...

>Which utility?

...he probably meant "rev", and...

>Anyway, check for the existence of a "rev" utility on your system.
>It's standard on all Research UNIXes I know of, including 4BSD, but
>not on UNIX System V.

...the "rev" on my Sun, which is probably the one from 4BSD, reverses
the *characters* in each *line* of the file, but doesn't reverse the
lines themselves.  The 4.3BSD source includes a version of "rev"
that does that.  I don't have V7 source or documentation handy, so I
can't check whether the V7 version reversed the characters of lines and
left the lines in the same order or reversed the order of lines but left
the lines intact (but I seem to remember that the documentation may have
claimed that it did the latter).

The 4.3BSD source has an SCCS ID from which I infer that it was written
at Berkeley; "rev" may have been one of those utilities mentioned in the
man page but not distributed (I seem to remember "speak", or whatever
the Votrax-driving utility was, being one of them), or one of the ones
distributed, but not in source form (such as "ppt" or the "ching"
programs), so somebody at Berkeley may have written their own, but (if
the intent was to reverse the lines) gotten it wrong.

leo@philmds.UUCP (Leo de Wit) (01/27/89)

In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
|
|I need utilities to do two things:
|(1) reverse the order of lines in a file
|    but leave the lines themselves intact.
|    The Unix utility does just the opposite of this.
|
|    E.g., if the file "f" contains
|       line1 
|       line2
|       line3
|    I want to produce
|       line 3
|       line 2
|       line 1
|    I have an awk program to do this,
|    but I'm sure some clever soul out there can do much better.

tail -r f

|(2) sort a file by length of input lines.
|    Again, I have a script to do this which uses awk, sort, and sed,
|    but I'm sure it can be done better.

Don't know if this is better; at least it is different (8-):

#! /bin/sh
# Usage: lensort [files]

sed '
h
s/./Z/g
G
s/\n/A/' "$@" |
sort|
sed 's/^Z*A//'

The trick used here is to prefix each line with a copy of itself in
which every character is replaced by a 'Z', terminated by an 'A'; a
lexical sort will thus order the lines by length (e.g. ZZZApqr comes
before ZZZZAabcd). Afterwards the Z*A prefixes are removed by sed.
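
A variant of the same decorate, sort, strip pattern uses a numeric length
key instead of a run of Zs, so the prefix stays short even for long lines.
This is only a sketch under assumptions: the name lensort_n is invented
here, awk's length() counts characters in the current locale (bytes under
LC_ALL=C), and lines of equal length come out in sort's own tie-break
order.

```shell
#!/bin/sh
# lensort_n: hypothetical name; sort lines by length, shortest first.
# Usage: lensort_n [files]   (reads stdin when no files are given)
lensort_n() {
    # Prefix each line with its length and a tab, sort numerically on
    # that first field, then strip the prefix again.
    awk '{ printf "%d\t%s\n", length($0), $0 }' "$@" |
        sort -k1,1n |
        cut -f2-
}

# Example use:
printf 'ccc\na\nbb\n' | lensort_n
```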

|I'd prefer no C programs!
Sorry about tail -f (8-).

|Many thanks.
Hope it helped,
                 Leo.

leo@philmds.UUCP (Leo de Wit) (01/27/89)

In article <936@philmds.UUCP> leo@philmds.UUCP (that's me) writes:
|In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
||
||I need utilities to do two things:
||(1) reverse the order of lines in a file
|
|tail -r f

If the intention was to SORT the lines reversed (this is not clear,
even from the example), this should be

sort -r f

   []
|Sorry about tail -f (8-).
Of course I meant tail -r here.

                 Leo.

gph@hpsemc.HP.COM (Paul Houtz) (01/28/89)

gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:

>	pr -n9 -t | sort +0nr -1 | cut -f2-
>
>Sorting isn't the most efficient way to do this task, but it's easy
>to implement this way.  If you have a LOT of file reversing to do,
>then probably a real software engineering task should be undertaken.
----------

Umm, please explain why sorting isn't the most efficient way to do
this.

Is there a rapid way to begin reading at the end of a file in Unix?
If not, then wouldn't a sort be just as fast as any other way?

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/28/89)

In article <899@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>...the "rev" on my Sun, which is probably the one from 4BSD, reverses
>the *characters* in each *line* of the file, but doesn't reverse the
>lines themselves.

Oops, sorry -- I should have checked before posting.
Maybe "rev" combined with a "sideways" utility ... :-)

Anyway, the pr|sort|cut approach I suggested should suffice.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/28/89)

In article <810031@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes:
>Umm, please explain why sorting isn't the most efficient way to do this.

Because the general sorting utility does more work than is
necessary to simply reverse the order of the records (lines).
It's of order N*logN whereas only 2*N is required for this task.

The best approach for a specialized implementation would be to
make a preliminary sequential scan to find the positions of the
start of each line (and probably the line length), then in the
second phase seek to the appropriate start places and copy out
the lines.  You should also implement a smart buffering scheme
to make this approach maximally efficient.
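
The two phases can be sketched in shell, purely to show the shape of the
idea. Everything here is an assumption for illustration: the name revseek
is made up, the input must be a regular text file ending in a newline,
and dd with bs=1 is deliberately naive (no buffering scheme at all), so
this is far slower than a real implementation would be.

```shell
#!/bin/sh
# revseek FILE: hypothetical two-pass line reversal (index, then seek).
revseek() {
    file=$1
    # Pass 1: one sequential scan records the byte offset and length
    # (newline included) of every line, then emits that index last
    # line first. LC_ALL=C makes length() count bytes.
    LC_ALL=C awk '
        BEGIN { off = 0 }
        { start[NR] = off; len[NR] = length($0) + 1; off += len[NR] }
        END { for (i = NR; i >= 1; i--) print start[i], len[i] }
    ' "$file" |
    # Pass 2: seek to each recorded offset and copy that line out.
    while read -r start len; do
        dd if="$file" bs=1 skip="$start" count="$len" 2>/dev/null
    done
}
```

Only the index of offsets is held in memory, not the lines themselves,
which is the point of the approach for large files.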

jlw@lznv.ATT.COM (J.L.WOOD) (01/30/89)

In article <9056@burdvax.PRC.Unisys.COM>, lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
> 
> I need utilities to do two things:
> (1) reverse the order of lines in a file
>     but leave the lines themselves intact.
>     The Unix utility does just the opposite of this.
> 
>     E.g., if the file "f" contains
>        line1 
>        line2
>        line3
>     I want to produce
>        line 3
>        line 2
>        line 1

Try good ol' ed:

ed a
a
line1
line2
line3
.
,p
line1
line2
line3
g/^/m0
,p
line3
line2
line1
Q

Could anything be easier?  The operative line is g/^/m0.
I.e., first mark all lines and then, one at a time, move them
to follow line zero.
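
The same command can be fed to ed from a script and the result printed
rather than written, which leaves the file alone. This is a minimal
sketch, assuming a POSIX ed that accepts -s (the newer spelling of the
bare - flag) and a non-empty file; the function name edrev is made up
here.

```shell
#!/bin/sh
# edrev FILE: hypothetical name; print FILE with its lines reversed.
edrev() {
    # g/^/m0 : mark every line, then move each marked line in turn to
    #          just after line 0, so later lines end up above earlier ones.
    # ,p     : print the whole (now reversed) buffer.
    # Q      : quit without writing, so the file on disk is unchanged.
    printf '%s\n' 'g/^/m0' ',p' 'Q' | ed -s "$1"
}
```

Traced by hand on line1, line2, line3: moving line1 changes nothing,
moving line2 gives line2, line1, line3, and moving line3 gives
line3, line2, line1.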

Joe Wood
jlw@lznv.ATT.COM

fmr@cwi.nl (Frank Rahmani) (01/30/89)

> 
> In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
>>I need utilities to do two things:
>>(1) reverse the order of lines in a file
>>    but leave the lines themselves intact.
you might consider:
	#! /bin/sh
	exec cat "$@" | exec tail -r
I use this all the time to reverse my .newsrc as I read nearly all
groups and would never see the last groups if I didn't start from
the end once in a while.
fmr@cwi.nl
-- 
It is better never to have been born. But who among us has such luck?
Maintainer's Motto:
	If we can't fix it, it ain't broke.
These opinions are solely mine and in no way reflect those of my employer.  

jon@jonlab.UUCP (Jon H. LaBadie) (01/30/89)

In article <9056@burdvax.PRC.Unisys.COM>, lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
> 
> I need utilities to do two things:
> (1) reverse the order of lines in a file
>     but leave the lines themselves intact.
>     E.g., if the file "f" contains
>        line1 
>        line2
>        line3
>     I want to produce
>        line 3
>        line 2
>        line 1
> 
Try the global command of ed(1), or ex/vi(1).  If vi is your interactive
editor, just type (from anywhere in the file):

	:g/^/m0

This globally finds every line with a beginning, and one at a time,
moves them to the beginning of the file.  Because of the speed, I'm
sure the data is not reorganized until it is written back to the file.

If you need to do this non-interactively, try a shell script invoking
ed with a "here document".  Something like this should work.

	ed - ${1:?"Need a file name"} <<-!
		g/^/m0
		w
		q
	!

Put it in a file called flip, and flip those lines.
-- 
Jon LaBadie
{att, ulysses, princeton, bcr}!jonlab!jon

lorensen@dwaskill.stars.flab.Fujitsu.JUNET (Bill Lorensen) (01/30/89)

How about:
	grep -n '^' file | sort -t: -k1,1nr | sed -e 's/^[0-9]*://'

Of course, as leo@philmds.UUCP (Leo de Wit) said:
	tail -r file
does the trick nicely.
--
Bill Lorensen
	US Mail:GE Corporate Research and Development
		P.O. Box 8
		Bldg KW Room C207A
		Schenectady, NY 12301
	Office: (518) 387-6744 or 8*833-3874
	Fax:	(518) 387-6560 or 8*833-6560
	E-Mail: lorensen@ge-crd.arpa
		lorensen@crd.steinmetz.ge.com

david@hcr.UUCP (David Fiander) (02/01/89)

In article <9056@burdvax.PRC.Unisys.COM> lang@pearl.UUCP writes:
>
>I need utilities to do two things:
>(1) reverse the order of lines in a file
>    but leave the lines themselves intact.
>    The Unix utility does just the opposite of this.

<Sanity check: I'm about to think of something Doug Gwyn hasn't>

If you don't need the file reversal to act as a filter (which in practice it
won't behave as anyway), then ed will do just what you want:

	Script started on Tue Jan 31 12:47:35 1989
	$ cat demo
	line 1
	line 2
	line 3
	line 4
	line 5
	$ ed demo <<EOF
	> g/^/m0
	> w
	> q
	> EOF
	35
	35
	$ cat demo
	line 5
	line 4
	line 3
	line 2
	line 1
	$
	script done on Tue Jan 31 12:48:56 1989

Pretty nifty, eh?
--------
David Fiander (...!hcr!david)

"Jolt:  All the sugar and twice the caffeine of the leading colas"

hansen@pegasus.ATT.COM (Tony L. Hansen) (02/11/89)

>>(1) reverse the order of lines in a file
>>    but leave the lines themselves intact.
>>    The Unix utility does just the opposite of this.

This sounds like a job for "tac" which has been posted at least a couple of
times to the net. Here's the man page.

    NAME
	tac - concatenate and print files in reverse

    SYNOPSIS
	tac file ...

    DESCRIPTION
	Tac reads each file in sequence and writes it on the standard
	output with the lines in reverse order. Thus:

		tac file

	prints the file in reverse, and:

		tac file1 file2 >file3

	concatenates the first two files, reversed, and places the result
	on the third.

		tac file | tac

	will reproduce the contents of the file. If no input file is given,
	or if the argument - is encountered, tac reads from the standard
	input.

    WARNING
	Command formats such as

		tac file1 file2 >file1

	will cause the original data in file1 to be lost, therefore, take
	care when using shell special characters.

    SEE ALSO
	cat(1).

					Tony Hansen
				att!pegasus!hansen, attmail!tony

scm@datlog.co.uk ( Steve Mawer ) (02/21/89)

In article <2585@pegasus.ATT.COM> hansen@pegasus.ATT.COM (Tony L. Hansen) writes:
>    WARNING
>	Command formats such as
>
>		tac file1 file2 >file1
>
>	will cause the original data in file1 to be lost, therefore, take
>	care when using shell special characters.
>
>    SEE ALSO
>	cat(1).

Note also that formats such as 

	   cat file1 file2 > file2

will cause your disk to become *very* full (as well as losing the original
contents of file2).

I can't speak for tac, since it won't do a simple first-to-last read of the
files.

-- 
Steve C. Mawer        <scm@datlog.co.uk> or < {backbone}!ukc!datlog!scm >
                       Voice:  +44 1 863 0383 (x2153)

leo@philmds.UUCP (Leo de Wit) (02/26/89)

In article <1774@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
   []
|Note also that formats such as 
|
|	   cat file1 file2 > file2
|
|will cause your disk to become *very* full (as well as losing the original
|contents of file2).

Not necessarily so:

Script started on Sun Feb 26 12:38:24 1989
philmds> cat /etc/motd
Ultrix V2.0-1 System #2: Mon Oct 26 15:31:26 MET 1987
philmds> ls -l file[12]
-rw-r-----  1 leo           177 Feb 26 12:37 file1
-rw-r-----  1 leo           273 Feb 26 12:37 file2
philmds> cat file1 file2 >file2
cat: input file2 is output
philmds> 

script done on Sun Feb 26 12:39:11 1989

All you need is a smart cat ( :-); this one will probably do an
fstat(2) (on 1), which not all Unixes (Unices ?) support.

	 Leo.

P.S. In situations like this one could consider using the 'overwrite'
program, as presented in Brian W. Kernighan and Rob Pike's 'The Unix
Programming Environment':

$ overwrite file2 cat file1 file2

This situation also arises in cases like:

$ sed -f sedfile datafile >/tmp/sed$$; cp /tmp/sed$$ datafile; rm /tmp/sed$$

which can be written like

$ overwrite datafile sed -f sedfile datafile

'Overwrite' also takes care of various exception conditions: a sed
command that failed, an interrupt etc. which would otherwise mess up
the output and/or leave temporary files.
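
To make the idea concrete, here is a rough sketch in the same spirit. It
is not the script from the book: the name overwrite_sketch is made up,
mktemp is assumed to be available, and the interrupt handling mentioned
above is left out.

```shell
#!/bin/sh
# overwrite_sketch FILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, then replace FILE with its output,
# but only if COMMAND succeeded.
overwrite_sketch() {
    target=$1
    shift
    tmp=$(mktemp) || return 1
    # Output goes to a temporary file, so the target is still intact
    # while the command reads it.
    "$@" > "$tmp"
    status=$?
    if [ "$status" -eq 0 ]; then
        # Copy back only on success; writing into the existing file
        # keeps its permissions and links.
        cat "$tmp" > "$target" || status=$?
    else
        echo "overwrite_sketch: command failed, $target left unchanged" >&2
    fi
    rm -f "$tmp"
    return "$status"
}
```

Used as "overwrite_sketch datafile sed -f sedfile datafile", it follows
the same calling convention as the examples above.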

guy@auspex.UUCP (Guy Harris) (02/27/89)

>All you need is a smart cat ( :-); this one will probably do an
>fstat(2) (on 1), which not all Unixes (Unices ?) support.

What is it that not all UNIXes support?  All the ones I know of support
"fstat", and both 4.xBSD and S5 (and, as I remember, all versions going
back to V7) supported "cat" checking whether its output was the same
file as any of its inputs.

morrell@hpsal2.HP.COM (Michael Morrell) (03/01/89)

/ hpsal2:comp.unix.questions / leo@philmds.UUCP (Leo de Wit) /  4:00 am  Feb 26, 1989 /
In article <1774@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
   []
|Note also that formats such as 
|
|	   cat file1 file2 > file2
|
|will cause your disk to become *very* full (as well as losing the original
|contents of file2).

Not necessarily so:

Script started on Sun Feb 26 12:38:24 1989
philmds> cat /etc/motd
Ultrix V2.0-1 System #2: Mon Oct 26 15:31:26 MET 1987
philmds> ls -l file[12]
-rw-r-----  1 leo           177 Feb 26 12:37 file1
-rw-r-----  1 leo           273 Feb 26 12:37 file2
philmds> cat file1 file2 >file2
cat: input file2 is output
philmds> 

script done on Sun Feb 26 12:39:11 1989
----------

Interesting.  I never noticed that "cat" would complain if one of its input
files was the same as its output.  Unfortunately, although this feature
prevents the disk from filling up, it still causes the original contents of
file2 to be lost, resulting in file2 being a copy of file1 (at least, this is
the behavior on HP-UX).  Since the shell is truncating file2 before cat is
invoked, you need a smarter shell, not a smarter cat, to avoid this problem.

   Michael

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/02/89)

In article <14660007@hpsal2.HP.COM> morrell@hpsal2.HP.COM (Michael Morrell) writes:
>...  Since the shell is truncating file2 before cat is
>invoked, you need a smarter shell, not a smarter cat, to avoid this problem.

"Smartness" has nothing to do with it; "> file2" is DEFINED to truncate
(i.e. overwrite) an existing file.  The shell is doing exactly what it's
told.  The problem lies in the user's brain somewhere...
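
The order of events can be seen in a small experiment. This is a sketch
only: the scratch directory and the file names f and g are made up, and
mktemp is assumed to exist. The second half uses the shell's noclobber
option (set -C in POSIX sh), which makes ">" refuse to touch an existing
file.

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
printf 'b\na\n' > "$demo/f"

# The shell opens and truncates f for output before sort starts,
# so sort finds an empty input file and f stays empty.
sort "$demo/f" > "$demo/f"
wc -c < "$demo/f"

# With noclobber set, the shell refuses to redirect onto an existing
# file, the command is never run, and g keeps its contents.
printf 'b\na\n' > "$demo/g"
( set -C; sort "$demo/g" > "$demo/g" ) 2>/dev/null ||
    echo "redirection refused, g is untouched"
cat "$demo/g"
```

The subshell keeps the noclobber setting from leaking into the rest of
the session.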

rbj@nav.icst.nbs.gov (Root Boy Jim) (03/15/89)

? From: Francois-Michel Lang <lang@pearl.prc.unisys.com>

? I need utilities to do two things:
? (1) reverse the order of lines in a file
?     but leave the lines themselves intact.
?     The Unix utility does just the opposite of this.

You don't want to do this, but it does work!

sed -e '1{;h;d;}' -e '$!{;G;h;d;}' -e '$G'

Well, up to a point anyway. An `ls -l /etc | wc -l' produces 140 lines
on our system. Piping the ls to the sed command produced only about 70
lines, giving a total of 4000 characters. That seems to be the limit.

Don't forget to quote the `!' if using csh.  The semicolons are an
undocumented feature.  An awk script would be a better way to do this.
Tail -r seems to be the best suggestion unless you are reversing
really BIG files. In that case you probably do want to write a C
program. There's just no escaping the buffering problem.
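
A minimal sketch of the awk route, for comparison. The function name
awkrev is made up here. Like the sed version, it holds the entire input
in memory, so the buffering problem is still there; any size limit just
moves from sed's hold space to whatever the local awk allows for arrays.

```shell
#!/bin/sh
# awkrev [files]: hypothetical name; print the input lines in reverse order.
awkrev() {
    # Store every line under its line number, then walk the array
    # backwards once the end of input is reached.
    awk '{ line[NR] = $0 }
         END { for (i = NR; i >= 1; i--) print line[i] }' "$@"
}

# Example use:
printf 'line1\nline2\nline3\n' | awkrev
```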

	Catman Rshd <rbj@nav.icst.nbs.gov>
	Author of "The Daemonic Versions"