tcianflo@nugipsy.UUCP (05/13/87)
Take a typical name and address file with many different names, but some common names, such as:

    Name1, J
    Smith, A  |
    Smith, C  | - family #1
    Smith, E  |
    Name2, R
    Smith, B  |
    Smith, D  | - family #2
    Smith, F  |
    Name3, P

Using the sort utility, and sorting alphabetically by last name, you get family #1 and family #2 interleaved, of course, by virtue of their same last names and first initials.

How would I set up the file so that I could get the sort utility to:

1) keep family members together without mixing up names with other families of the same last name
2) force the listing of a family group to be in a specified order, for example, head-of-house followed by children.

Is there some scheme using additional fields in the file, or perhaps some other utility I should be using?

Thanks in advance for any help you can send my way.

--
=> Regards, Tom Cianflone @ Gould Computer Systems Division <=
=> ...!{seismo,sun,pur-ee,brl-smoke}!gould!tcianflone <=
=> ...!ihnp4!{codas,allegra}!novavax!gould!tcianflone <=
=> NOTE: Disregard header info. Email to above paths only. <=
boykin@custom.UUCP (Joseph Boykin) (05/15/87)
In article <303@nugipsy.UUCP>, tcianflo@nugipsy.UUCP (Tom Cianflone) writes:
> Using the sort utility, and sorting alphabetically by last name,
> you get family #1 and family #2 interleaved, of course, by virtue
> of their same last names and first initials.
> How would I set up the file so that I could get the sort
> utility to:
> 1) keep family members together without mixing up names
> with other families of the same last name
> 2) force the listing of a family group to be in a specified
> order, for example, head-of-house followed by children.

You would, obviously, have to place some other information in the file. However, sort does allow you to specify multiple sort fields. It sorts on the first key, then orders lines that compare equal on that key by the subsequent sort keys. That is, you could have a file which has:

    Last First X

where 'X' is A for head of household, B for spouse, C for kids, (D for pets!), etc. To sort by family, and have it sorted by "rank" within family, issue the following command:

    sort +0 -1 +2 -3 filename

This can be expanded to provide the information you were looking for such as keeping people within the same family together, etc.

Joe Boykin
Custom Software Systems
...{necntc, frog}!custom!boykin
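For readers on a current system, here is a rough sketch of the multi-key idea in the POSIX `-k` notation (the `+pos -pos` form is obsolescent and may not be accepted by a modern sort). The family-number column, the field order, the sample names, and the function name `sort_families` are all assumptions added for illustration; they do not come from the posts.

```shell
# Hypothetical record layout (an assumption, not from the thread):
#   Last FamilyNo Rank First
# Rank: A = head of household, B = spouse, C = child.
sort_families() {
    # key 1: last name; key 2: family number, compared numerically;
    # key 3: rank within the family
    LC_ALL=C sort -k1,1 -k2,2n -k3,3
}

printf '%s\n' \
    'Smith 2 C Ted' \
    'Smith 1 C Bill' \
    'Smith 2 A Bob' \
    'Jones 1 A Pat' \
    'Smith 1 A Alice' \
    'Smith 2 B Doris' | sort_families
```

With an explicit family number, two Smith families stay apart even though the last names match, and the rank letter orders each family internally.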
maciolek@gecrd1.UUCP (05/15/87)
In article <651@custom.UUCP> boykin@custom.UUCP (Joseph Boykin) writes:
>In article <303@nugipsy.UUCP>, tcianflo@nugipsy.UUCP (Tom Cianflone) writes:
>> How would I set up the file so that I could get the sort
>> utility to:
>> 1) keep family members together without mixing up names
>> with other families of the same last name
>> 2) force the listing of a family group to be in a specified
>> order, for example, head-of-house followed by children.
>
>You would, obviously, have to place some other information in the
>file. However, sort does allow you to specify multiple sort fields.
>It will sort using the first key, then sort lines which
>are common by the subsequent sort keys. That is, you could have a
>file which has:
> Last First X
>
>where 'X' is A for head of household, B for spouse, C for kids,
>(D for pets!), etc. To sort by family, and have it sorted by
>"rank" within family, issue the following command:
>
> sort +0 -1 +2 -3 filename
>
>This can be expanded to provide the information you were looking
>for such as keeping people within the same family together, etc.

Well, this won't work either. You'll wind up with something like

    Smith Alice A
    Smith Bob A
    Smith Carol A
    Smith Doris B
    Smith Ted C
    Smith Bill C

and won't know whether Ted Smith is a child of Alice, Bob or Carol. To make sure that families get grouped according to head-of-household, you will have to be a little fancier. Here are a couple of ideas...

First - all family names are prefaced by the head-of-household for that family. This is quite redundant and space-consuming, but it gets the job done...unless two families have heads-of-household with the same name. Thus, a file could be entered:

    Smith Alice
    Smith Alice C Smith Bill
    Smith Bob
    Smith Bob B Smith Doris
    Smith Bob C Smith Ted
    Smith Carol

As I said, this makes for a lot of redundancy, especially in families with many spouses :-) or children, since all family-members who are not heads-of-household are prefixed by the name of their head-of-household.

Another solution that seems more feasible would be to group all family members onto the same LINE as the head-of-household. This is significant, of course, because sort(1) has this notion of "lines" which are delimited by newline characters. Everything between a pair of newlines is treated as a unit. SO...all you have to do is write a little filter which reads a file like this:

    Smith Bob A
    Smith Doris B
    Smith Ted C
    Smith Carol A
    Smith Alice A
    Smith Bill C

and produces a file like this:

    Smith Bob A ^MSmith Doris B ^MSmith Ted C
    Smith Carol A
    Smith Alice A ^MSmith Bill C

by joining all the lines which follow an 'A' (head-of-household) line UNTIL another head-of-household is encountered. I use control-M as a separator here, though any character which would not appear in the text would be fine.

Now, when you run this file through sort(1), the output will be sorted by head-of-household, with identically-named heads of household sorted next by spouse, then by child. Take the output and run it through an inverse of your original filter which converts ^M's back to newlines:

    filter1 <infile | sort | filter2 >outfile

An advantage is that you don't have to look up the syntax for specifying the key field positions to sort(1). The caveat here is that in the initial file, the head of household always has to precede spouse and child entries, AND the spouse and child entries will not be sorted under head-of-household.

Having now beaten this subject to death, I hope I don't see umpty-zillion followups by nit-pickers. And thank you for your support.
-- Mike Maciolek seismo!rochester!pt.cs.cmu.edu!cadre!pitt!gecrd1!maciolek -consulting for- General Electric "Epoxy can be cured."
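The join, sort, split pipeline described in this post could be sketched roughly as follows. The function names `join_families` and `split_families` are hypothetical stand-ins for filter1 and filter2, and `|` is used as the separator instead of control-M purely for readability; both are assumptions, and the sketch relies on `|` never appearing in the data and on every family starting with its 'A' line.

```shell
# Stand-in for "filter1": glue each family onto its head-of-household line.
# Assumes the third field is the rank and that 'A' marks a head of household.
join_families() {
    awk -v sep='|' '
        $3 == "A" { if (NR > 1) printf "\n"; printf "%s", $0; next }
                  { printf "%s%s", sep, $0 }
        END       { if (NR > 0) printf "\n" }
    '
}

# Stand-in for "filter2": turn the separators back into newlines.
split_families() {
    tr '|' '\n'
}

printf '%s\n' \
    'Smith Bob A' 'Smith Doris B' 'Smith Ted C' \
    'Smith Carol A' \
    'Smith Alice A' 'Smith Bill C' |
    join_families | LC_ALL=C sort | split_families
```

As the post notes, the members inside a family keep their input order; only the joined family lines are sorted against each other.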
dan@hrc.UUCP (Dan Troxel) (05/04/89)
How can I use the 'sort' command, to sort by two fields?
Example:

    _4 test _E               _1 test _E
    _2 test _E   should be   _2 test _E
    _3 test _F               _4 test _E
    _1 test _E               _3 test _F
     ^       ^
     ^       ^
      sorted
        by

--
Dan Troxel @ Handwriting Research Corporation   WK 1-602-957-8870
Camelback Corporate Center   2821 E. Camelback Road Suite 600
Phoenix, AZ 85016
ncar!noao!asuvax!hrc!dan   hrc!dan@asuvax.asu.edu
mchinni@pica.army.mil (Michael J. Chinni, SMCAR-CCS-E) (05/04/89)
> From: Dan Troxel <dan@hrc.uucp>
> Subject: sort question
> Date: 3 May 89 20:15:01 GMT
> To: info-unix@sem.brl.mil
>
> How can I use the 'sort' command, to sort by two fields?
> Example:
>
> _4 test _E               _1 test _E
> _2 test _E   should be   _2 test _E
> _3 test _F               _4 test _E
> _1 test _E               _3 test _F
>

Try:

    sort +10 -11 +2 -3 filename

This tells sort to sort first on a field starting in col. 10 and ending just before col. 11, and secondly a field starting in col. 2 and ending just before col. 3.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
Michael J. Chinni
Chief Scientist, Simulation Techniques and Workplace Automation Team
US Army Armament Research, Development, and Engineering Center
Picatinny Arsenal, New Jersey
ARPA: mchinni@pica.army.mil
UUCP: ...!uunet!pica.army.mil!mchinni
User to skeleton sitting at cobweb and dust covered terminal and desk: "System been down long?"
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
davidsen@sungod.steinmetz (William Davidsen) (05/04/89)
In article <199448@hrc.UUCP> dan@hrc.UUCP (Dan Troxel) writes:
| How can I use the 'sort' command, to sort by two fields?
| Example:
|
| _4 test _E               _1 test _E
| _2 test _E   should be   _2 test _E
| _3 test _F               _4 test _E
| _1 test _E               _3 test _F

Try "sort +2"

Look at the +n.m and -n.m stuff. No you *don't* need "+2 +0", read the man page...

bill davidsen (davidsen@crdos1.crd.GE.COM)
{uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
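In the POSIX `-k` notation, one way to get the ordering asked for in the question is to sort on the third field first and break ties on the first field. This is only a sketch against the four sample lines from the question; the function name `sort_two_fields` is made up for illustration.

```shell
# Primary key: field 3 (_E before _F); secondary key: field 1 (_1, _2, ...).
# -k counts fields from 1, unlike the older zero-based +pos notation.
sort_two_fields() {
    LC_ALL=C sort -k3,3 -k1,1
}

printf '%s\n' '_4 test _E' '_2 test _E' '_3 test _F' '_1 test _E' |
    sort_two_fields
```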
jckfield@ihlpb.ATT.COM (Kelvin Fielding) (05/05/89)
In article <199448@hrc.UUCP>, dan@hrc.UUCP (Dan Troxel) writes:
> How can I use the 'sort' command, to sort by two fields?
> Example:
>
> _4 test _E               _1 test _E
> _2 test _E   should be   _2 test _E
> _3 test _F               _4 test _E
> _1 test _E               _3 test _F
>  ^       ^
>  ^       ^
>   sorted
>     by
> --

    sort +0 -0 +2 <filename>

will do the trick.

Explanation: sort on field 1 +0 and ending on field 1 -0 then sort on field 1 +2.
decot@hpisod2.HP.COM (Dave Decot) (05/05/89)
> How can I use the 'sort' command, to sort by two fields?
> Example:
>
> _4 test _E               _1 test _E
> _2 test _E   should be   _2 test _E
> _3 test _F               _4 test _E
> _1 test _E               _3 test _F
>  ^       ^
>  ^       ^
>   sorted
>     by

    sort +0 -1 +2
guy@auspex.auspex.com (Guy Harris) (05/06/89)
>Try: sort +10 -11 +2 -3 filename
>
>This tells sort to sort first on a field starting in col. 10 and ending just
>before col. 11,

No, it doesn't, at least not if you're talking about the standard UNIX versions of "sort". It tells "sort" to sort first on the 10th *field* (that is, the 11th field on the line - after all, it's written in C :-)). Fields are normally separated by white space; with the "-t" flag, they're separated by tab columns.
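To make the field-versus-column point concrete, here is a small sketch in which keys address fields and `-t` names the single character that separates them. The colon-separated sample records and the function name `sort_by_second_field` are invented for illustration.

```shell
# Sort colon-separated records on their second field.
# -t ':' sets the field separator character; -k2,2 is the 1-based
# POSIX spelling of the older zero-based "+1 -2".
sort_by_second_field() {
    LC_ALL=C sort -t ':' -k2,2
}

printf '%s\n' 'x:zeta:1' 'y:alpha:2' 'z:mid:3' | sort_by_second_field
```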
gph@hpsemc.HP.COM (Paul Houtz) (05/09/89)
guy@auspex.auspex.com (Guy Harris) writes:
>Try: sort +10 -11 +2 -3 filename
>
>This tells sort to sort first on a field starting in col. 10 and ending just
>before col. 11,

No, it doesn't, at least not if you're talking about the standard UNIX versions of "sort". It tells "sort" to sort first on the 10th *field* (that is, the 11th field on the line - after all, it's written in C :-)). Fields are normally separated by white space; with the "-t" flag, they're separated by tab columns.
----------

Right. There is no way to do a true column sort using this utility as you can on IBM or MPE systems and here is why: Sort requires a FIELD DELIMITER character. That means that there is SOME character that will never be sorted.

If you need to sort data from a foreign file about which you have little information except column numbers to sort on, you are (as far as I know) out of luck with 'sort'.

If, however, you do know of some character which will NEVER appear as data in a file, then you can do a column sort. For instance, say you know that the character '^' never appears in your data file. Then you can simply say:

    sort -t^ +0.0 -0.3

This command assumes that there is only ONE field in the file, and that is the entire record. It then sorts the 0+0th column of the 0+0th field thru the 0+4th column of the 0+0th field, actually accomplishing a sort of column 1 thru 4.

Here is the worst one I have seen on Unix. I converted this myself from a sort done on an IBM System/34. This is a good example of a COMMON type of sort done in the commercial world which you never see on Unix:

    sort -dt'\012' +0.6 -0.8 +0.13 -0.15 +0.15r -0.17r +0.8 -0.13 DISK-SUMARY >SUM1

This guy sorts the summary file using the newline character as a field delimiter (i.e., no fields), and you can tell what column ranges are being sorted by subtracting 1 from the 'x' field of the 0.x parms.

    It sorts the 5 thru 7 columns in ascending order,
    the 12 thru 14 columns in ascending order,
    the 14 thru 16 columns in DESCENDING order, (the "r" after the column)
    then the 7 thru 12 columns in ascending order.

If anyone ever tries to tell you that UNIX is user friendly, you can now barf on them.

Paul Houtz
HP Technology Access Center
10670 N. Tantau Avenue
Cupertino, Ca 95014
(408) 725-3864
hplabs!hpda!hpsemc!gph
gph%hpsemc@hplabs.HP.COM
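A minimal sketch of the same "one field per record" trick in the POSIX `-k` notation, where character positions inside a field count from 1. It assumes, as the post does, that `^` never occurs in the data; the sample records and the function name `column_sort` are made up.

```shell
# Sort on character columns 5 through 7 (1-based, inclusive).
# -t '^' names a separator assumed absent from the data, so each whole
# line is field 1 and -k 1.5,1.7 addresses plain character columns.
column_sort() {
    LC_ALL=C sort -t '^' -k 1.5,1.7
}

printf '%s\n' 'rec1zzz' 'rec2aaa' 'rec3mmm' | column_sort
```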
chris@mimsy.UUCP (Chris Torek) (05/10/89)
In article <810050@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >Right. There is no way to do a true column sort using this utility as you >can on IBM or MPE systems and here is why: Sort requires a FIELD DELIMITER >character. That means that there is SOME character that will never be >sorted. But (as you yourself point out) you can set the field delimiter to newline, effectively making it vanish, then use the 0.n format to specify column n. >Here is the worst one I have seen on Unix. I converted this myself from a >sort done on an IBM System/34. This is a good example of a COMMON type of >sort done in the commercial world which you never see on Unix: > >sort -dt'\012' +0.6 -0.8 +0.13 -0.15 +0.15r -0.17r +0.8 -0.13 ... > >This guy sorts the summary file using the newline character as a field >delimiter (i.e., no fields), and you can tell what column ranges are >being sorted by subtracting 1 from the 'x' field of the 0.x parms. Or simply think of columns as numbered from 0 (if you count from 0 to 1023 on your fingers, as I do :-) ). >If anyone ever tries to tell you that UNIX is user friendly, you can now >barf on them. Why? (Actually, Unix is not *meant* to be `user friendly'---if that means `taking occasional users by the hand and leading them from each little stepping stone on to the next'. It is meant to get the job done simply, tersely, without back-talk. If you use a system every day, you can get tired of wading through six levels of menus. And if Unix looks a little old and creaky in the user-interface area, well, it *was* designed around printing terminals and dumb CRTs. But then again, that is all I have at home. [An H19 with the Heath ROMs can hardly be called clever :-) .]) If this sort of thing is not done often in Unix, why not? Perhaps because it is a bad idea not to delimit fields. (I believe that any fixed limit---this includes fixed field widths---is always too small. 
I sometimes wonder what IBMers do about people with last names like `de Martinesquez y de la Capillostraglio'; I *know* what they do with people who, like me, put down an initial for a first name and a name for a middle initial.) But if, like Houtz, you are forced to make the best of a bad design, and you dislike all the `+0.x -0.y', instead of sulking, you *could* write a small shell script to convert whatever column format you prefer into what sort requires. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
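The kind of small wrapper suggested here might look roughly like the sketch below. The name `colsort` and its argument format (START,END with an optional trailing `r` for descending order, columns 1-based and inclusive) are inventions for illustration, and `^` is again assumed never to appear in the data.

```shell
# colsort: hypothetical wrapper that turns column ranges into sort keys.
# Usage: colsort START,END[r] ...   (reads stdin, writes stdout)
colsort() {
    n=$#
    while [ "$n" -gt 0 ]; do
        spec=$1
        shift
        n=$((n - 1))
        flag=
        case $spec in
            *r) flag=r; spec=${spec%r} ;;   # trailing r: descending
        esac
        start=${spec%,*}
        end=${spec#*,}
        # Append the translated key behind the still-unprocessed arguments.
        set -- "$@" -k "1.${start},1.${end}${flag}"
    done
    # '^' is assumed absent from the data, so each line is a single field.
    LC_ALL=C sort -t '^' "$@"
}

# Column 1 ascending, then column 2 descending.
printf '%s\n' 'b2' 'a1' 'a2' 'b1' | colsort 1,1 2,2r
```

The loop rotates the argument list in place: each original range is shifted off the front and its `-k` form is appended at the back, so after all ranges are consumed only sort options remain.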
gph@hpsemc.HP.COM (Paul Houtz) (05/17/89)
chris@mimsy.UUCP (Chris Torek) writes:
>In article <810050@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes:
>>Right. There is no way to do a true column sort using this utility as you
>>can on IBM or MPE systems and here is why: Sort requires a FIELD DELIMITER
>>character. That means that there is SOME character that will never be
>>sorted.
>
>But (as you yourself point out) you can set the field delimiter to
>newline, effectively making it vanish, then use the 0.n format to
>specify column n.

Wouldn't it be nice if the whole world spoke Unix. It would make any minor user Unfriendliness seem like such a minor issue. Oh well. Unfortunately there will probably always be systems out there that need to talk to non-unix systems.

The problem with having to set the field delimiter is that you have to decide what to set it TO. Now, if you are reading from a file that has binary data in it, then it is possible that a newline character could appear in the binary data. This seems to me like it might be a problem. Sort would think it found the end of line.

I can write a sort program that will do this column sorting for me, but what a pain. It's too bad there isn't one for unix, like there is for all the other major operating systems.

(On the other hand, I'll bet that some third party out there has already written a true column sort for unix. I just haven't found it yet. Any takers?)
gph@hpsemc.HP.COM (Paul Houtz) (05/17/89)
I wrote:
> Here is the worst one I have seen on Unix. I converted this myself from a
>sort done on an IBM System/34. This is a good example of a COMMON type of
>sort done in the commercial world which you never see on Unix:
>
>sort -dt'\012' +0.6 -0.8 +0.13 -0.15 +0.15r -0.17r +0.8 -0.13 DISK-SUMARY >SUM1
>
>This guy sorts the summary file using the newline character as a field
>delimiter (i.e., no fields), and you can tell what column ranges are
>being sorted by subtracting 1 from the 'x' field of the 0.x parms.
>
> It sorts the 5 thru 7 columns in ascending order,
> the 12 thru 14 columns in ascending order,
> the 14 thru 16 columns in DESCENDING order, (the "r" after the column)
> then the 7 thru 12 columns in ascending order.

Dr. T. Andrews, Systems, CompuData, Inc. DeLand, writes:

Ah, yes, that's an ugly command. Now, what is the command to run the general "sort" program on the "friendly" op sys where you would prefer, performing the same sort?

Okay, in MPE, you do this:

    sort
    input DISK-SUMARY
    output SUM1
    key 5,7;12,14;14,16,DESC;7,12
    end

The "key" parm says to sort column 5 thru 7 ascending, 12 thru 14 ascending, 14 thru 16 descending, 7 thru 12 ascending. That seems much clearer to me.
chris@mimsy.UUCP (Chris Torek) (05/17/89)
In article <810054@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >... if you are reading from a file that has binary data in it, then it >is possible that a newline character could appear in the binary data. >This seems to me like it might be a problem. Sort would think it >found the end of line. If your file is of binary data, you have more of a problem than that. Sort(1) sorts ASCII text files, not binary files. (Numeric sorts are done by conversion to and from numeric values.) (Somehow this argument seems rather like saying that quicksort is bad because if you sort nearly-sorted lists, it runs $O(n^2)$. Indeed it does, but that just means you use a different algorithm [Shell sorts work well; or if it is almost completely sorted, a bubble sort may outperform anything else!].) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
gph@hpsemc.HP.COM (Paul Houtz) (05/17/89)
chris@mimsy.UUCP (Chris Torek) writes: >In article <810054@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >>... if you are reading from a file that has binary data in it, then it >>is possible that a newline character could appear in the binary data. >>This seems to me like it might be a problem. Sort would think it >>found the end of line. > >If your file is of binary data, you have more of a problem than that. >Sort(1) sorts ASCII text files, not binary files. (Numeric sorts are >done by conversion to and from numeric values.) What unix sort do you use if you have a data file with mixed binary and ascii fields? IBM has a number of sort utilities that will do this. The MPE and MPE XL sort utility handles this case fine. The VMS sort utility handles this too. Unless there is a sort utility on Unix that I haven't heard of, I don't think unix does this. (Please don't tell me that it isn't a good idea to mix ascii and binary data in the same file).
morrell@hpsal2.HP.COM (Michael Morrell) (05/18/89)
/ hpsal2:comp.unix.questions / gph@hpsemc.HP.COM (Paul Houtz) / 10:04 am May 16, 1989 /

The problem with having to set the field delimiter is that you have to decide what to set it TO. Now, if you are reading from a file that has binary data in it, then it is possible that a newline character could appear in the binary data. This seems to me like it might be a problem. Sort would think it found the end of line.
----------

Maybe I'm confused, but how do you do a column sort on binary data? Aren't column numbers defined within each line, implying some "newline" char?

Michael
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/18/89)
In article <810056@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >What unix sort do you use if you have a data file with mixed binary >and ascii fields? Your first problem is to define exactly what constitutes a "record" for such a case. The other systems you mention support (require?) file attributes such as record size; on UNIX files are just structureless byte arrays. The new-line terminator convention allows text files to be dealt with on a line-oriented basis, but there is no widespread UNIX convention for binary file structures.
chris@mimsy.UUCP (Chris Torek) (05/18/89)
In article <810056@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: > What unix sort do you use if you have a data file with mixed binary >and ascii fields? ... (Please don't tell me that it isn't >a good idea to mix ascii and binary data in the same file). Oh well, answer number zero down the tubes. :-) Anyway, I would write a little filter to de-binarify the file (and if necessary, another to re-binate it, or perhaps the same one to do both). That seems easier than writing a special sorter for it. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
andrew@root.co.uk (Andrew Dingwall) (05/23/89)
In article <810054@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >>In article <810050@hpsemc.HP.COM> gph@hpsemc.HP.COM (Paul Houtz) writes: >>>Right. There is no way to do a true column sort using this utility as you >>>can on IBM or MPE systems and here is why: Sort requires a FIELD DELIMITER >>>character. That means that there is SOME character that will never be >>>sorted. >> >>But (as you yourself point out) you can set the field delimiter to >>newline, effectively making it vanish, then use the 0.n format to >>specify column n. > No, in a previous job, I made extensive use of sort +0.x -0.y ... to do column sorts on newline-delimited records without needing to specify a field delimiter. Admittedly, that was on unix V7 (and a long time ago!), but I have tried the same on a System V system and it still seems to work. The only thing that I found necessary to make the scheme work is the newline at the end of the record and no nulls or non-ascii characters in the record body. > The problem with having to set the field delimiter is that you have to >decide what to set it TO. Now, if you are reading from a file that has >binary data in it, then it is possible that a newline character could appear >in the binary data. This seems to me like it might be a problem. Sort >would think it found the end of line. > > I can write a sort program that will do this column sorting for me, but >what a pain. It's too bad there isn't one for unix, like there is for >all the other major operating systems. > > (On the other hand, I'll be that some third party out there has already >written a true column sort for unix. I just haven't found it yet. Any >takers?) Yes, we have written a binary sort called binsort. It works like the unix sort except that it works on fixed-length binary records and sorts by column position. 
It understands all the usual unix data types (char, int, float, double etc), together with data types more usually found in the commercial world (cobol COMP (natural byte order signed binary) and COMP-3 (bcd)). The command-line interface is similar to the unix sort (+m.n -m.n etc) and appropriate options are supported (-d -f -u -c -r -m -o -T). I'm not sure under what circumstances it might be made available but, as UniSoft are a commercial organisation, it would probably cost money! Andrew Dingwall andrew@root.co.uk