tes@whuts.UUCP (STERKEL) (07/21/87)
<*> I am not certain if this is a neophyte question (comp.unix.question) or a request for software (comp.sources.wanted). If you respond with sources, please, please, please make certain that it is fully compatible with vanilla SysV. Thanks. <*> Now for the problem: I need a multiple field sort that maintains sub-field order. Let me draw a picture: File contents field 0 field 1 field x field 2 field y field 3 field z ------- ------- ------- ------- ------- ------- ------ ... aaa ... ... ... ... ... ... bbb ... ... ... ... ... ... bba ... aaa ... ... ... ... bba ... aab ... axa ... ... bba ... aab ... aya ... ... bba ... aac ... xxx ... ... bba ... aac ... xxx ... ... ccc ... ... ... ... ... ... dcc ... ... ... ... ... ... ecc ... ... ... ... ... ... fcc ... aaa ... ... ... ... fcc ... bbb ... ... ... ... fcc ... ccc ... aaa ... ... fcc ... ccc ... aab ... etc note: "..." denotes that field contents do not affect sort order. In narrative form: Multiple Field Sorts sort on the first field, then on the second field if the first field has two or more records with identical contents, then on the third field if the second field has two or more records with identical contents, then on the fourth...etc. An inefficient implementation of this has been: cat file | sort on field3 | sort on field2 | sort on field1 > sorted BUT, this only for sorts that are "bubble" and/or "shell". Using Sort(1) "scrambles" the previous sorts on each pass, leaving me with no easy way to use sort(1) to do multiple field sorts. Any hints? -- Terry Sterkel {clyde|harvard|cbosgd|allegra|ulysses|ihnp4}!whuts!tes [opinions are obviously only my own]
boykin@custom.UUCP (Joseph Boykin) (07/22/87)
In article <2459@whuts.UUCP>, tes@whuts.UUCP (STERKEL) writes: > <*> > I am not certain if this is a neophyte question (comp.unix.question) > or a request for software (comp.sources.wanted). If you respond with > sources, please, please, please make certain that it is fully > compatible with vanilla SysV. Thanks. > <*> > Now for the problem: > I need a multiple field sort that maintains sub-field order. Let > me draw a picture: ..... > In narrative form: Multiple Field Sorts sort on the first field, > then on the second field if the first field has two or more records > with identical contents, then on the third field if the second field > has two or more records with identical contents, then on the > fourth...etc. UNIX Sort (as well as our own PC/SORT) allows you to do exactly what you want. If you specify more than one sort key on the command line the file is sorted by the first key, if there is more than one line with the same information, the second key is used. There are a maximum of ten sort keys which may be specified. If all keys are used and they call compare equally, the entire line is used for a final check. I believe this is exactly what you are looking for. The format of doing this is as follows sort +pos1 [-pos2 [+pos3] [-pos4]]] file... +pos1 and -pos2 specify the first key with the field starting at position 1 and ending right before position 2. If the ending position is not specified, the end of the physical line is used. There are a bunch of options which help delimit what a field looks like, sorting order, etc. Check sort(1) for details. Joe Boykin Custom Software Systems ...necntc!custom!boykin -- Joe Boykin Custom Software Systems ...necntc!custom!boykin
avr@hou2d.UUCP (Adam V. Reed) (07/22/87)
In article <2459@whuts.UUCP>, tes@whuts.UUCP (STERKEL) writes: > I need a multiple field sort that maintains sub-field order. > Using Sort(1) "scrambles" the previous sorts on each pass, > leaving me with no easy way to use sort(1) to do multiple > field sorts. > > Any hints? Yes. Use cut(1) to merge the relevant fields in the proper order into a single field, paste(1) the result in front of the original, sort(1) the result on the merged field, and then cut(1) the merged field away. This is a classic UNIX one-liner pipeline problem. Adam Reed
root@hobbes.UUCP (John Plocher) (07/22/87)
+---- (STERKEL) writes the following in article <2459@whuts.UUCP> ---- | I need a multiple field sort that maintains sub-field order. Let | | An inefficient implementation of this has been: | cat file | sort on field3 | sort on field2 | sort on field1 > sorted | | BUT, this only for sorts that are "bubble" and/or "shell". | Using Sort(1) "scrambles" the previous sorts on each pass, | leaving me with no easy way to use sort(1) to do multiple | field sorts. +---- I use the following command to sort a "rolodex"(tm) type file into post office acceptable order (by zip then by route then by name): (The file is of the form "name","addr",...,"route","zip","phone"; thus the need for 8.1 and 5.1 to skip the quotes for sorting numeric fields) sort -t, -y +8.1 -8.7 -n +5.1 -6.0 -d +0 -1 -o names.4 names.4 ie: field 8 field 5 field 0 (zip) (Rural Route/Box) (name) Sort(1) does the stable sorts for you. This is using Sys5r2. - John ps. Yes I do backup the database before doing the in-place sort! -- John Plocher uwvax!geowhiz!uwspan!plocher plocher%uwspan.UUCP@uwvax.CS.WISC.EDU
chris@mimsy.UUCP (Chris Torek) (07/23/87)
In article <758@custom.UUCP> boykin@custom.UUCP (Joseph Boykin) writes: >UNIX Sort (as well as our own PC/SORT) allows you to do exactly what you >want. ... unless you need a numeric sort on one field but an alphabetical sorts on another, or if any of the other global sort flags conflict for two fields. It should not be too hard to write a stable sort for such applications, especially if you let /usr/bin/sort do the hard work. (Note the merge option!) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) Domain: chris@mimsy.umd.edu Path: seismo!mimsy!chris
roy@phri.UUCP (Roy Smith) (07/23/87)
In article <2459@whuts.UUCP> tes@whuts.UUCP (STERKEL) writes: > I need a multiple field sort that maintains sub-field order. > [...] > An inefficient implementation of this has been: > cat file | sort on field3 | sort on field2 | sort on field1 > sorted > > BUT, this only for sorts that are "bubble" and/or "shell". Using Sort(1) > "scrambles" the previous sorts on each pass, leaving me with no easy way > to use sort(1) to do multiple field sorts. What you want to do is "sort +2 +1 +0 file > sorted". The +N arguments mean to skip N fields. This is a bit counter-intuitive, but it boils down to "+2" meaning skip 2 fields, i.e. use the third field. The following "+1" and +0" arguments mean use the second and first field as secondary and tertiary sort keys; exactly what you want (if I understood your question right). The sort(1) man page says that if all keys compare equal, it sorts on the whole input line. This is why it "scrambles" your file when you do a multi-pass sort. It would be nice if sort had a "stable" option (i.e. make two lines which compare equal on all specified keys stay in the original input order; exactly what you need to do the multi-pass sort you describe) but as far as I know, it doesn't. Not being a real sorting guru, I don't know what impact that option would have on performance. But I can tell you that the "sort +2 +1 +0" is a lot faster than "sort +2 | sort +1 | sort +0", unless you happen to be on a Sequent, and maybe not even then. You wanted a SysV answer. I'll admit that this is a 4.N answer, but since I don't think sort has changed much (on the outside) since v6, I'm pretty confident that it works on SysV too. The Unix sort(1) utility is a pretty amazingly powerful tool, especially with the "+N.M" stuff to skip fields and characters within fields. Unfortunately, the sort man page is about as obtruse as man pages get. I've been reading that man page on and off for about 10 years and I'm still not sure I understand all the details. -- Roy Smith, {allegra,cmcl2,philabs}!phri!roy System Administrator, Public Health Research Institute 455 First Avenue, New York, NY 10016
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/23/87)
In article <2459@whuts.UUCP> tes@whuts.UUCP (STERKEL) writes: > I need a multiple field sort that maintains sub-field order. The standard UNIX "sort" utility can be asked to compare multiple keys, which will probably do what you want. If the original record order must be retained even when ALL keys match, then add an additional key field consisting of sequence numbers, sort on the desired multiple keys followed by the sequence number key, and finally strip off the sequence numbers. Sorts that inherently preserve original record order when keys (all) match are called "stable". As you discovered, the n-way merge sorting used by the UNIX "sort" utility is not stable. My favorite in-core stable sorting technique is "list merge" as described in Knuth Vol. 3, which is a good place to start if you need to learn more about sorting techniques.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/23/87)
In article <7651@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: -In article <758@custom.UUCP> boykin@custom.UUCP (Joseph Boykin) writes: ->UNIX Sort (as well as our own PC/SORT) allows you to do exactly what you ->want. - ... unless you need a numeric sort on one field but an alphabetical -sorts on another, or if any of the other global sort flags conflict -for two fields. I don't see what the problem is. The bdinr flags are specifiable per-field as well as globally.
kimcm@olamb.UUCP (Kim Chr. Madsen) (07/23/87)
In article <2459@whuts.UUCP>, tes@whuts.UUCP (STERKEL) writes: > > An inefficient implementation of this has been: > cat file | sort on field3 | sort on field2 | sort on field1 > sorted > > BUT, this only for sorts that are "bubble" and/or "shell". > Using Sort(1) "scrambles" the previous sorts on each pass, > leaving me with no easy way to use sort(1) to do multiple > field sorts. You can do it in one pass through standard sort(1), by specifying the fields in order. sort +1 -2 +3 -4 +2 -3 datafile should sort "datafile" by primary key field 2 secondary key field 4 tertiary key field 3 and ignoring cointents of field 1. Kim Chr. Madsen.
kdw1@sphinx.uchicago.edu (Keith Waclena) (07/23/87)
In article <7651@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: !In article <758@custom.UUCP> boykin@custom.UUCP (Joseph Boykin) writes: !!UNIX Sort (as well as our own PC/SORT) allows you to do exactly what you !!want. ! ! ... unless you need a numeric sort on one field but an alphabetical !sorts on another, or if any of the other global sort flags conflict !for two fields. Hmm. My sort(1) man page says that you can apply the -n option (and -dbfir) to individual sort fields, as in: $ sort +2n -3 +5df -6 +7n file I wonder: is this System V specific (we run SysVr2.1)? Do you BSD users have these options? Keith -- Keith Waclena University of Chicago Graduate Library School 1100 E. 57th Street, Chicago, Illinois 60637 ...ihnp4!gargoyle!sphinx!kdw1 #include <disclaimer.h> kdw1@sphinx.UChicago.{EDU,BITNET,MAILNET,CSNET}
root@hobbes.UUCP (John Plocher) (07/23/87)
+---- Chris Torek writes the following in article <7651@mimsy.UUCP> ---- | >UNIX Sort allows you to do exactly what you want. | ... unless you need a numeric sort on one field but an alphabetical | sorts on another, or if any of the other global sort flags conflict | for two fields. +---- Oops! You must have missed the fine print buried in the middle of the man page :-) The manual page for Sys5r2 sort(1) states: (after giving the options for dictionary order, upper/lower case folding, ASCII only, "month", and numeric compares) : "When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys. When attached to a specific sort key (described below), the specified ordering options override all global ordering options for that key. "The notation +pos1 -pos2 restricts a sort key to one beginning at pos1 and ending at pos2. ... In the 7th edition manual which "reflects the state of the Berkeley system, December 1979" it states: "The notation +pos1 -pos2 restricts a sort key to one beginning at pos1 and ending at pos2. Pos1 and pos2 each have the form m.n, optionally followed by one or more of the flags *bdfinr*, where m tells a number of fields to skip from the beginning of the line and n tells a number of chars to skip further. If any flags are present they override all global ordering options for this key. ... I hope that the behavior of sort(1) has not changed in 4.3BSD! - John -- John Plocher uwvax!geowhiz!uwspan!plocher plocher%uwspan.UUCP@uwvax.CS.WISC.EDU
chris@mimsy.UUCP (Chris Torek) (07/23/87)
>In article <7651@mimsy.UUCP> chris@mimsy.UUCP I wrote: >... unless ... global sort flags conflict for two fields. In article <6153@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn) replies: >The bdinr flags are specifiable per-field as well as globally. Yes, one can indeed specify flags for individual fields. I missed that when I scanned the manual; I saw `global sort flags' and alarm bells went off. I do think the manual entry could be a bit clearer about this (although the 4.3BSD entry does use an example with `+2n'). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) Domain: chris@mimsy.umd.edu Path: seismo!mimsy!chris
guy%gorodish@Sun.COM (Guy Harris) (07/23/87)
> Hmm. My sort(1) man page says that you can apply the -n option (and > -dbfir) to individual sort fields, as in: > > $ sort +2n -3 +5df -6 +7n file > > I wonder: is this System V specific (we run SysVr2.1)? Do you BSD users > have these options? *V7* users had those options, which is why both BSD and S5 users have them. From the 4.2BSD manual page: The notation "+'pos1' -'pos2'" restricts a sort key to a field beginning at "pos1" and ending just before "pos2". "Pos1" and "pos2" each have the form "'m'.'n'", optionally followed by one or more of the flags "bdfinr", where "m" tells a number of fields to skip from the beginning of the line and "n" tells a number of characters to skip further. *If any flags are present they override all the global ordering options for this key.* If the "b" option is in effect "n" is counted from the first nonblank in the field; "b" is attached independently to "pos2". Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guyoups:uptions,h
boykin@custom.UUCP (Joseph Boykin) (07/24/87)
In article <7651@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes: > In article <758@custom.UUCP> boykin@custom.UUCP (Joseph Boykin) writes: > >UNIX Sort (as well as our own PC/SORT) allows you to do exactly what you > >want. > > ... unless you need a numeric sort on one field but an alphabetical > sorts on another, or if any of the other global sort flags conflict > for two fields. It should not be too hard to write a stable sort > for such applications, especially if you let /usr/bin/sort do the > hard work. (Note the merge option!) > -- > In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) > Domain: chris@mimsy.umd.edu Path: seismo!mimsy!chris As I said in my original posting, UNIX Sort will do exactly what he wants, this includes the possibility of doing a numeric sort on one field and alphabetical on another. Sort has global options which effect how the sort is done, i.e. do a numeric sort on all the specified fields. Each field may specify it's own sub-options, each of which override the global options. Specifically a sort key has the form m.no where 'm' is the 'm'th field (0 relative), 'n' is the number of characters from the start of that field and 'o' is zero or more of the flags b d f i n or r. If any of these are present they override the global ordering options (but not the non-ordering options). Joe Boykin Custom Software Systems ...necntc!custom!boykin
m5@bobkat.UUCP (Mike McNally ) (07/24/87)
In article <2459@whuts.UUCP> tes@whuts.UUCP (STERKEL) writes: ><*> > I need a multiple field sort that maintains sub-field order. Let > me draw a picture: Try the sort(1) program. It can be given more than one field on which to sort. -- Mike McNally, mercifully employed at Digital Lynx --- Where Plano Road the Mighty Flood of Forest Lane doth meet, And Garland fair, whose perfumed air flows soft about my feet... uucp: {texsun,killer,infotel}!pollux!bobkat!m5 (214) 238-7474
allbery@ncoast.UUCP (Brandon Allbery) (07/29/87)
As quoted from <6153@brl-smoke.ARPA> by gwyn@brl-smoke.ARPA (Doug Gwyn ): +--------------- | I don't see what the problem is. The bdinr flags are specifiable | per-field as well as globally. +--------------- Yes, sort can do it all. ...but its man page does nothing. Will someone who knows all of what sort can do *please* write and post a PD man page for it? Something more readable than the trash that AT&T wrote? -- Brandon S. Allbery, moderator of comp.sources.misc and comp.binaries.ibm.pc {{harvard,mit-eddie}!necntc,well!hoptoad,sun!cwruecmp!hal}!ncoast!allbery ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/31/87)
In article <3679@ncoast.UUCP> allbery@ncoast.UUCP (Brandon Allbery) writes: >Will someone who >knows all of what sort can do *please* write and post a PD man page for it? >Something more readable than the trash that AT&T wrote? Actually, AT&T rewrote the "sort" manual entry for SVR2. Unfortunately, in the process of clarifying things, they broke some of the specs. The SVR3 manual corrected this. It now seems no worse than most manual entries; i.e. the specification is unambiguous (but is not a tutorial).