d3e608%pnli.pnl.gov@pnlg.pnl.gov (02/15/91)
I'm trying to write a csh script to extract a random multiline record from a file. Awk does a nice job of getting the records. Now how do I choose one randomly? Thanks, Jeff Nye, K6-10 email: d3e608%pnli.pnl.gov@pnlg.pnl.gov Battelle PNL voice: (509) 376-4571, days, PDT PO Box 999 Richland, WA 99352
jik@athena.mit.edu (Jonathan I. Kamens) (04/24/91)
In article <26019@adm.brl.mil>, d3e608%pnli.pnl.gov@pnlg.pnl.gov writes: |> I'm trying to write a csh script to extract a random multiline |> record from a file. Awk does a nice job of getting the records. Now how |> do I choose one randomly? The standard awk has no built-in way of getting a random number. However, the FSF version of awk, gawk, has a built-in rand() function that returns a random number between 0 and 1. You can multiply the return value by the number of records to pick one of them randomly. If installing and using gawk is not an option, then another possibility is to use the jot(1) program to generate a random number, if your system has jot. You would call "jot -r 1 max" where "max" is the number of records from which you are selecting. Either call this in your script and then pass the result into awk, or call it inside awk if you've got a version of awk that'll let you call a program and pipe the results into getline. If you don't have jot, you can get it from a number of different ftp sites. I found the source to it in /help/dist/jot.c on lilac.berkeley.edu, and the man page in /pub/Library/Computer/doc.4.3/ucs/jot.1 on ocf.berkeley.edu, using archie (if you don't know how to use archie, see the article mentioned in the instructions appended to the end of this message). Finally, if you've got access to a version of date(1) that will output the date as a number of seconds since the beginning of Unix time, then you can just take that number of seconds as the seed for a random number generator written inside awk. Your generator can be as symbol for taking the time modulus the number of records from which you're choosing, or as complex as one of the good random number generators mentioned in Knuth. There are probably other ways to get a random number, and other people will point them out :-). Of course, if you were using perl instead of awk, you wouldn't need to ask this question at all, since perl has a built-in function for getting random numbers, and will have that function on any machine that supports perl. -- Jonathan Kamens USnail: MIT Project Athena 11 Ashford Terrace jik@Athena.MIT.EDU Allston, MA 02134 Office: 617-253-8085 Home: 617-782-0710 -- Subject: Information about finding sources (READ THIS BEFORE POSTING) Newsgroups: comp.sources.wanted,alt.sources.wanted Available via anonymous ftp from pit-manager.mit.edu (18.72.1.58) in the file /pub/usenet/comp.sources.wanted/Information_about_finding_sources_(READ_THIS_BEFORE_POSTING) Available from mail-server@pit-manager.mit.edu by sending a message containing send usenet/comp.sources.wanted/Information_about_finding_sources_(READ_THIS_BEFORE_POSTING) Send a message containing "help" to get general information about the mail server.
jik@athena.mit.edu (Jonathan I. Kamens) (04/24/91)
In article <1991Apr24.041134.14519@athena.mit.edu>, I write: |> You would call "jot -r 1 max" where "max" is the number of records from which |> you are selecting. I should have said "jot -r 1 1 max", rather than "jot -r 1 max". The former tells jot to produce 1 number between 1 and max; the latter tells jot to produce 1 number between max and 100, which is the default maximum value. -- Jonathan Kamens USnail: MIT Project Athena 11 Ashford Terrace jik@Athena.MIT.EDU Allston, MA 02134 Office: 617-253-8085 Home: 617-782-0710
urban@cbnewsl.att.com (john.urban) (04/24/91)
> article <26019@adm.brl.mil>, d3e608%pnli.pnl.gov@pnlg.pnl.gov writes: > I'm trying to write a csh script to extract a random multiline > record from a file. Awk does a nice job of getting the records. Now how Several version of UNIX System V have nawk (new awk). Regular awk does not have the rand() function however nawk does. To see if you version of UNIX has nawk, type in: $ type nawk If you've got nawk, then you can do something like: nawk 'BEGIN {print rand()}' < /dev/null to get a random number on the screen UNIX System V/386 Release 3.2 (and higher) also have the random(1) command [Part of the XENIX System V and UNIX System V/386 merged product] So you could use: random(1) although it doesn't appear to be total random. I tried: $ i=100 $ while [ $i -gt 0 ] > do > random 10 > i=`expr $i - 1` > done | sort -n | uniq -c 25 0 10 4 38 5 3 6 24 10 Sincerely, John Urban
oz@yunexus.yorku.ca (Ozan Yigit) (04/24/91)
In article <1991Apr24.041134.14519@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes: > The standard awk has no built-in way of getting a random number. You mean the "old" awk. The standard awk (sometimes known as nawk, and is a part of most up-to-date UN*X distributions these days) as documented in TheBook, has rand(). Accept no substitutes. ;-) > There are probably other ways to get a random number, and other people will >point them out :-). Sure, like one can program a good random number generator, even in old awk. Here is the integer minimal standard random number generator[1] in old awk. It has been tested for correctness. BEGIN { a = 16807 m = 2147483647 q = 127773 # m div a r = 2836 # m mod a seed = 123 # use awk -f ... seed=<whatever> } { test = a * (seed % q) - r * int(seed / q); if (test > 0) seed = test; else seed = test + m; print rand = seed } > Of course, if you were using perl instead of awk ... But, perl [or c, icon, python ...] is *not* awk. ;-) oz --- [1] Park, Stephen K. and Keith W. Miller, ``Random Number Generators: Good Ones are Hard to Find'', Communications of the ACM 31 (10), 1988 pp. 1192-1201 --- A pencil is so far the most effective software | internet: oz@nexus.yorku.ca design tool I have found... -- Amanda Walker | uucp: utai/utzoo!yunexus!oz
nelson@berlioz.nsc.com (Taed Nelson) (04/25/91)
> The standard awk has no built-in way of getting a random number. > > However, the FSF version of awk, gawk, has a built-in rand() function that The current version (1986) of AWK has rand() as well. On the SunOS, it has been renamed to be NAWK (New AWK).
guy@auspex.auspex.com (Guy Harris) (04/27/91)
>The current version (1986) of AWK has rand() as well. On the SunOS, it has >been renamed to be NAWK (New AWK). SunOS didn't rename it; that name came from System V Release 3.1, which offers the "old AWK" as "awk" and "oawk", and the "new AWK" as "nawk". The theory was, I guess, that people with scripts that depended on the "old AWK" should change them to use "oawk", so that in the future "awk" could become the "new AWK". However, according to the S5R4 documentation, "awk" is still the old AWK....
gwyn@smoke.brl.mil (Doug Gwyn) (04/29/91)
In article <7453@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >The theory was, I guess, that people with scripts that depended on the >"old AWK" should change them to use "oawk", so that in the future "awk" >could become the "new AWK". However, according to the S5R4 >documentation, "awk" is still the old AWK.... We did this sort of thing with C compilers, too. The idea is: PHASE 0 (existing practice): All applications use "awk". PHASE 1: New awk installed as "nawk", old awk remains as "awk", "oawk" linked to "awk". PHASE 2 (transition): Applications that require the new behavior invoke "nawk". Old applications are tested against "nawk" and if they really do require the old version, they are changed to invoke "oawk"; otherwise they continue to invoke "awk". PHASE 3: "awk" is changed to be a link to "nawk". PHASE 4 (catch-up): Applications that rely on "oawk"-specific behavior are updated to work with regular (new) "awk".
martin@mwtech.UUCP (Martin Weitzel) (05/09/91)
In article <1991Apr24.041134.14519@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes: > > [...] another possibility is >to use the jot(1) program to generate a random number, if your system has jot. SysV User's should note that they have a similar program, called random. It's range of output values is restricted to 0..255 (the upper limit must be specified as a calling argument). If you need larger numbers, you can, of course, call random twice, multiply the result of the first call by 256, add the result of the second call, and take the whole thing modulo the largest number you want. -- Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
suhre@shark.dsd.trw.com (Maurice E. Suhre) (05/14/91)
In article <1136@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes: +In article <1991Apr24.041134.14519@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes: +> +> [...] another possibility is +>to use the jot(1) program to generate a random number, if your system has jot. + +SysV User's should note that they have a similar program, called random. +It's range of output values is restricted to 0..255 (the upper limit must +be specified as a calling argument). If you need larger numbers, you +can, of course, call random twice, multiply the result of the first +call by 256, add the result of the second call, and take the whole thing +modulo the largest number you want. +-- +Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83 Be advised that the operations mentioned (multiplication, modulo) will, in general, change the distribution of the random numbers. That is, if you started with uniformly distributed random numbers, then what you will come out with will not be uniformly distributed. -- Maurice Suhre suhre@trwrb.dsd.trw.com
guy@auspex.auspex.com (Guy Harris) (05/15/91)
>SysV User's should note that they have a similar program, called random. Well, *some* S5 users, anyway. As I remember, it was in "/usr/games" in S3 or in early S5; the games appear to have gone away in a later S5 release, alas. >It's range of output values is restricted to 0..255 (the upper limit must >be specified as a calling argument). If you're thinking of the same "random" I'm thinking of (I'm thinking of the one I picked up from some S5 release and put into SunOS), that's only the case if you use the "-e" flag to get it to return a random exit status. Its "normal" mode is to act as a filter, randomly choosing which lines of the standard input to write to the standard output....
gwyn@smoke.brl.mil (Doug Gwyn) (05/15/91)
In article <1991May13.221822.27731@gumby.dsd.TRW.COM> suhre@shark.dsd.trw.com.UUCP (Maurice E. Suhre) writes: -In article <1136@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes: -+If you need larger numbers, you can, of course, call random twice, multiply -+the result of the first call by 256, add the result of the second call, and -+take the whole thing modulo the largest number you want. -Be advised that the operations mentioned (multiplication, modulo) will, -in general, change the distribution of the random numbers. That is, if you -started with uniformly distributed random numbers, then what you will -come out with will not be uniformly distributed. Be advised that the procedure Martin suggested does preserve uniformity.
suhre@ondine.dsd.trw.com (Maurice E. Suhre) (05/16/91)
In article <16154@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes: +In article <1991May13.221822.27731@gumby.dsd.TRW.COM> suhre@shark.dsd.trw.com.UUCP (Maurice E. Suhre) writes: +-In article <1136@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes: +-+If you need larger numbers, you can, of course, call random twice, multiply +-+the result of the first call by 256, add the result of the second call, and +-+take the whole thing modulo the largest number you want. +-Be advised that the operations mentioned (multiplication, modulo) will, +-in general, change the distribution of the random numbers. That is, if you +-started with uniformly distributed random numbers, then what you will +-come out with will not be uniformly distributed. + +Be advised that the procedure Martin suggested does preserve uniformity. There seems to have been a shortage of correct analysis on my part, and perhaps elsewhere. Yes, the multiplication as described above does retain the property of uniformity of distribution (assuming that it was uniform going in). No, the modulo strategy does not preserve uniformity. An example for those who are getting confused or have misunderstood: Suppose that you have a uniform random number generator which generates the integers 0, 1, 2,...,9. Then we have p(0) = p(1) ... p(9) = 0.1 Now, suppose we take these numbers modulo 7. Then the numbers 7, 8, and 9 are mapped into 0, 1, and 2 respectively. The new probabilities become p(0) = p(1) = p(2) = 0.2, and p(3) ... p(6) = 0.1 as before. Any modulus operation other than by 2 will result in a non-uniform distribution. The solution is to scale all numbers by the same factor, either compressing or expanding the range. If integers are needed, then you could toss out any random numbers which are outside the range of interest. -- Maurice Suhre suhre@trwrb.dsd.trw.com
dfenyes@thesis1.med.uth.tmc.edu (David Fenyes) (05/23/91)
In article <1991May13.221822.27731@gumby.dsd.TRW.COM> suhre@shark.dsd.trw.com.UUCP (Maurice E. Suhre) writes: >In article <1136@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes: >+SysV User's should note that they have a similar program, called random. >+It's range of output values is restricted to 0..255 (the upper limit must >+be specified as a calling argument). If you need larger numbers, you >+can, of course, call random twice, multiply the result of the first >+call by 256, add the result of the second call, and take the whole thing >+modulo the largest number you want. >Be advised that the operations mentioned (multiplication, modulo) will, >in general, change the distribution of the random numbers. That is, if you >started with uniformly distributed random numbers, then what you will >come out with will not be uniformly distributed. The dist. will be changed-- from uniform on 0...255 to uniform on 0...65535. The multiplication 'spreads' the pdf of the first call form 1/256 in each 'bin' from 0...255 to 1/256 in every 256'th bin from 0 to 65280. Addition of the second number convolves its pdf (0..255) with the first, 'filling out' the summed pdf to a unform distribution of 1/65536 from 0...65535. If the modulo is a power of 2, then the distribution of the pdf below the mod value will be uniform. (The distribution above the value will be perfectly mapped to the dist. below so that each 'bin' gets the same amount of added probability). If the mod value is NOT a power of 2, so that 633536 MOD n != 0, then the dist will be: from 0 to 65536 % n - 1: ([65536 / n] + 1) / 65536 from 65536 % n up: ([63336 / n]) / 65536 where [x] means the greatest integer <= x. Regards, David --------- David Fenyes dfenyes@thesis1.med.uth.tmc.edu University of Texas Medical School Houston, Texas