[net.unix] sort problems

naiman@pegasus.UUCP (Ephrayim J. Naiman) (01/20/85)

<munch, munch>

Okay you sort experts why do these two sorts not work ?

sort << !
 Abe
Aba
!

sort << !
Abe
$Aba
!

With the first one, no matter what combination of sort options I use,
I can't get Aba to come out first (except of course with -r).
In the second one, the '$' seems to screw something up and
that line comes back empty.

		Any ideas ?

Thanx

-- 
==> Ephrayim J. Naiman @ AT&T Information Systems Laboratories (201) 576-6259
Paths: [ihnp4, allegra, ahuta, maxvax, cbosgd, lzmi, ...]!pegasus!naiman

mzal@pegasus.UUCP (Mike Zaleski) (01/21/85)

   From: naiman@pegasus.UUCP (Ephrayim J. Naiman)
   Okay you sort experts why do these two sorts not work ?

   sort << !
    Abe
   Aba
   !

This output is correct.  Space is less than 'A' in the ASCII
sequence, which is how sort sorts by default.  The -b option
should cause leading blanks to be ignored.  When I tried it with
this input, it did not work.

   sort << !
   Abe
   $Aba
   !

Try escaping the $ with a backslash when entering this to the shell.
By the way, the ASCII value of '$' is also less than that of 'A'.

-- Mike^Z   [allegra!, ihnp4!] pegasus!mzal   Zaleski@Rutgers

avolio@grendel.UUCP (Frederick M. Avolio) (01/21/85)

> Okay you sort experts why do these two sorts not work ?
> sort << !
>  Abe
> Aba
> !
> 
> sort << !
> Abe
> $Aba
> !
> 

Well, they do work. " Abe" *is* less than "Aba".  Blank is 040  and  A
is 0101.
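
The ordering described above can be checked with a short sketch.  It
assumes a POSIX shell and od(1), and it sets LC_ALL=C, which is not
mentioned in the thread: on current systems sort may follow the locale's
collation rules rather than raw ASCII order, so the setting is there to
reproduce the byte-order behaviour being discussed.

```shell
# Force plain byte-order (ASCII) comparison regardless of the user's locale.
# (Assumption: a locale-aware sort; the 1985 sort had no such setting.)
LC_ALL=C
export LC_ALL

# Print the octal codes of space, 'A' and '$', one byte each.
printf ' A$' | od -An -b

# The line starting with a space sorts ahead of the one starting with 'A'.
sorted=$(printf 'Aba\n Abe\n' | sort)
printf '%s\n' "$sorted"
```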

In the second example the shell is reading the input and passing it to
sort.  $Aba  is interpreted as a shell variable, which probably had no
value.  Hence, the line is blank. (In fact, the C shell tosses it  out
immediately as undefined.  The Bourne shell doesn't care.)
-- 
Fred Avolio
301/731-4100 x4227
UUCP:  {seismo,decvax}!grendel!avolio
ARPA:  grendel!avolio@seismo.ARPA

ka@hou3c.UUCP (Kenneth Almquist) (01/22/85)

> Okay you sort experts why do these two sorts not work ?
> 
> sort << !
>  Abe
> Aba
> !
>
> No matter what combination of sort options I use,
> I can't get Aba to come out first (except of course -r).

Try "sort -b +0".  The +0 causes it to sort on the first field, and
the -b causes it to ignore leading blanks in determining the start
of the first field.
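
A minimal sketch of the same idea for a current POSIX sort, where the
old "+0" position is written "-k1" (a key starting at field 1).  The
-k spelling and the LC_ALL=C setting are substitutions made here and do
not appear in the thread; many newer sorts no longer accept "+0".

```shell
# -b:  skip leading blanks when locating the start of the key.
# -k1: the key starts at field 1 and runs to the end of the line
#      (the modern spelling of the old "+0").
result=$(printf ' Abe\nAba\n' | LC_ALL=C sort -b -k1)
printf '%s\n' "$result"
```

With the blanks skipped, the keys compared are "Abe" and "Aba", so the
Aba line comes out first.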

> sort << !
> Abe
> $Aba
> !
> 
> The '$' seems to screw something up and it comes back with that line empty.

This has nothing to do with sort.  The shell will normally expand
shell variables inside text included by "<<".  Since $Aba has not
been set, it is replaced with the null string.  To suppress this
behavior, substitute "<<\!" for "<<!".  See sh(1) for more details.
					Kenneth Almquist

ken@turtlevax.UUCP (Ken Turkowski) (01/22/85)

In article <2041@pegasus.UUCP> mzal@pegasus.UUCP (Mike Zaleski) writes:
>
>   From: naiman@pegasus.UUCP (Ephrayim J. Naiman)
>   Okay you sort experts why do these two sorts not work ?
>
>   ...
>
>   sort << !
>   Abe
>   $Aba
>   !
>
>Try escaping the $ with a backslash when entering this to the shell.
>By the way, the ASCII value of '$' is also less than that of 'A'.
>
>-- Mike^Z   [allegra!, ihnp4!] pegasus!mzal   Zaleski@Rutgers

Even better, put quotes around the !:

sort << '!'
Abe
$Aba
!

Which produces the output:

$Aba
Abe

To quote from the Bourne shell documentation:

<< word
	The shell input is read up to a line the same as word, or end of
	file.  The resulting document becomes the standard input.  If
	any character of word is quoted, no interpretation is placed
	upon the characters of the document; otherwise parameter and
	command substitution occurs, \newline is ignored, and \ is used
	to quote the characters \ $ ` and the first character of word.
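
The rule quoted above can be tried side by side in a small sketch,
assuming a POSIX shell.  The delimiter END is used instead of "!" only
to stay clear of history expansion in some interactive shells, and the
variable names expanded and literal are made up for the illustration.

```shell
# Make sure the variable is unset, as it presumably was in the thread.
unset Aba

# Unquoted delimiter: the shell expands $Aba (to nothing) before sort
# ever sees the text, so sort receives "Abe" and an empty line.
expanded=$(LC_ALL=C sort << END
Abe
$Aba
END
)

# Quoted delimiter: no interpretation, so sort receives "$Aba" as typed.
literal=$(LC_ALL=C sort << 'END'
Abe
$Aba
END
)

printf 'unquoted delimiter:\n%s\n' "$expanded"
printf 'quoted delimiter:\n%s\n' "$literal"
```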

The C shell has analogous properties.
-- 

Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

naiman@pegasus.UUCP (Ephrayim J. Naiman) (01/23/85)

<munch, munch>

Thanx for all the mail regarding the sort problems.
As it turned out the -b option to ignore leading blanks
only works in field compares.  Thanx again for all your help.
-- 

==> Ephrayim J. Naiman @ AT&T Information Systems Laboratories (201) 576-6259
Paths: [ihnp4, allegra, ahuta, maxvax, cbosgd, lzmi, ...]!pegasus!naiman

liberte@uiucdcs.UUCP (01/25/85)

On our 4.2bsd, the effect of the -b flag was reversed.  That is, instead
of ignoring leading blanks, it ignores trailing blanks.  

Why?  Possibly because the default action (in the 4.1 distribution) is
screwy.  Instead of specifying field numbers, you specify how many fields to
skip, and then count the leading spaces as part of the next field after
those skipped.

Possibly because the documentation (7th edition) may be misleading.
"The notation +pos1 -pos2 restricts a sort key to a field beginning at pos1
and ending just BEFORE pos2."  Later it says that "...fields are nonempty
nonblank strings separated by blanks."  In any event, it is confusing.
While the b flag on pos1 ignores leading blanks, what is the intended effect
of the b flag on pos2?  The actual effect is to INCLUDE the blanks before
pos2 in the sort key.  So watch out for the global -b.
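
For comparison, here is a sketch of how blanks in front of a field
behave in a current POSIX sort, using the -k notation that replaced
+pos1 -pos2.  It shows today's documented behaviour under LC_ALL=C, not
necessarily that of the 4.1 or 4.2bsd sort described here; the sample
lines and variable names are made up for the illustration.

```shell
LC_ALL=C
export LC_ALL

input='x  b
x a'

# Key is field 2 only.  Without b, the blanks in front of the field belong
# to it, so "  b" (two spaces) sorts ahead of " a" (one space).
plain=$(printf '%s\n' "$input" | sort -k2,2)

# With b on the start position the leading blanks are skipped,
# so the keys are "b" and "a" and the "x a" line comes first.
skipped=$(printf '%s\n' "$input" | sort -k2b,2)

printf 'without b:\n%s\n' "$plain"
printf 'with b:\n%s\n' "$skipped"
```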

Partly because of these problems, and partly because I was confused when I
did it, I changed our version of sort to always, really ignore blanks unless
a tab char is specified.  I should have also had the b flag reverse this
effect.  This is what most people want.  But I now think it was a mistake to
change sort at all, even by Berkeley.  There is value in the old way.

Daniel LaLiberte
ihnp4!uiucdcs!liberte
liberte@uiucdcs.Uiuc.ARPA
U of Illinois, Urbana-Champaign, 
Dept of Computer Science
(217) 333-8426

rwl@uvacs.UUCP (Ray Lubinsky) (01/28/85)

------------------------------------------------------------------------------

> sort << !
>  Abe
> Aba
> !

   The explanation for the first is simple: by default sort(1) compares lines
character by character by ASCII value, and here the very first characters
differ.  The first character of the line " Abe" is the space character (octal
040), whereas the first character of the line "Aba" is 'A' (octal 101).  The
lines are sorted in ascending numerical order, hence "Aba" comes out second,
since 040 < 101.

> 
> sort << !
> Abe
> $Aba
> !
> 

   This is a little more obscure.  If you had just done this from standard
input (just typing "sort", then entering lines until you enter the end-of-file
character) you would have found the "$Aba" line would have come first (octal
044 comes before octal 101).  Unfortunately, the way you did it left each line
open to interpretation by the Bourne shell, which thought that "$Aba" was a
shell variable and replaced the line with its value (nothing).

   If you really need to alphabetically sort lines which may be preceded by
tabs or spaces, use the stream editor (sed(1)) to modify the lines on the
fly, e.g.:  sed 's/^[ @]*//' | sort  (where '@' should be replaced by a
literal tab character).
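
The sed approach above might look like this on a system whose sed
understands POSIX character classes, so that no literal tab has to be
typed.  The [[:blank:]] class, the function name strip_and_sort and the
LC_ALL=C setting are assumptions made here, not part of the suggestion
as posted.

```shell
# Strip leading spaces and tabs from every line, then sort what is left.
# Note that the stripped lines are what gets printed, not the originals.
strip_and_sort() {
    sed 's/^[[:blank:]]*//' | LC_ALL=C sort
}

out=$(printf ' Abe\n\tAbc\nAba\n' | strip_and_sort)
printf '%s\n' "$out"
```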

------------------------------------------------------------------------------

Ray Lubinsky		     University of Virginia, Dept. of Computer Science
			     uucp: decvax!mcnc!ncsu!uvacs!rwl