dmr@alice.att.com (Dennis Ritchie) (11/17/90)
This group has turned into a sewer, as bad as talk.origins (even the people I might politically agree with are acting dumb). But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, each containing 10,000,000 100-byte binary records without any substructure. I need to get these records sorted. Can someone quote me a cost and expected delivery time? This is just the start of a burgeoning business; next week I expect a full trainload of 100 such boxcars. What will this cost and how long will it take your company to handle it? Since I understand that this will run into big bucks, I trust your company can give a brief account of its technical approach before I write the purchase orders. Dennis Ritchie
schow@bcarh185.bnr.ca (Stanley T.H. Chow) (11/20/90)
In article <11628@alice.att.com> dmr@alice.att.com (Dennis Ritchie) writes: >This group has turned into a sewer, as bad as talk.origins >(even the people I might politically agree with are acting dumb). > >But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, >each containing 10,000,000 100-byte binary records >without any substructure. I need to get these records sorted. >Can someone quote me a cost and expected delivery time? Ordinarily, I avoid topics like this. Seeing that it is Dennis Ritchie, just my be I can get rich on this after all, so here is my proposal. :-) Assumptions: Input - you hand me a boxcar of tapes, (figuratively speaking) Output - I throw you 10,000 tapes back; in sequence, one tape at a time (hope you have a good catching glove). (Each output tape has 10,000,000 records exactly). Sort Key - is the entire 100-byte record, duplicates are allowed. Observations: 1) Key space is 2 ** 800 (roughly 6 * 10 **240) 2) total "file" size = 10,000 * 10,000,000 * 100 bytes = 10 ** 13 bytes. 3) Companies like SyncSort Inc. (and others) make their money soving problems like this. I will give you two quotes: 1) Minimum turn-around time. I will need a bunch of tape drives along with a bunch of disk drives connected by a network. Stuf an input tape into each tape drive, read the records and throw each record into a disk. Unload the input tapes, load blank output tapes, and copy 10,000,000 records onto each tapes. Elapse time: approximately two times R/W time of a single tape. Cost: option a: you provide the hardware I charge 1% of the hardware cost for processing. option b: I provide the hardware Sorry, I want to get paid in my life time, so I can't wait for the delivery of all that hardware. 2) Minimum hardware cost. I will buy a really cheap CPU (Commodore 64 comes to mind), hook up a single tape drive. [I will skip the gory details of how. If you really want to know how, look up any sort program on the mainframes - SyncSort is a good example. They will tell you how to do it will three drives, cutting down to a single drive is simple, but tedious] Elapse time: approximately the expected life time of the universe. Cost: option a: Total payment up front - $10K (negotiable) option b: pay as you go: nothing down, $10 a month for the duration. (negotiatable). > >This is just the start of a burgeoning business; next week >I expect a full trainload of 100 such boxcars. >What will this cost and how long will it take your company >to handle it? By a remarkable coincidence (and extreme lazyness), my quotes for this problem are exactly the same as the previous smaller problem. This conclusively demonstrates the advantage of dealing with a service-oriented compnay like us - "The same price for all your problems". Stanley Chow BitNet: schow@BNR.CA BNR UUCP: ..!uunet!bnrgate!bcarh185!schow (613) 763-2831 ..!psuvax1!BNR.CA.bitnet!schow Me? Represent other people? Don't make them laugh so hard.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)
In article <11628@alice.att.com> dmr@alice.att.com (Dennis Ritchie) writes: [ real words from a Real Programmer ] > But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, > each containing 10,000,000 100-byte binary records > without any substructure. I need to get these records sorted. > Can someone quote me a cost and expected delivery time? I'm afraid I'm not in a position to contract out NYU's computer time to AT&T, but I'll try to answer your question. Let me assume a mainframe with 100M memory usable for data, internal speed about ten times a Sun 4, and a 10-terabyte 5000-disk farm. To sort as much data as will fit into memory will take 20 seconds with my current software, for a total CPU time of 23 days. The final merge shuttles a total of 170 terabytes to and from disk, using approximately 5 days of CPU time and relatively little I/O time. If we can read (by some miracle) 100K a second on a hundred separate tape drives, reading all your data takes 11 days. Add another 11 days for writing back to tape (though we suggest a better storage method). Total: Only as much time as Noah spent in his ark. We'll give you a guaranteed time of twelve weeks, expected nine. The disk farm is the biggest cost: including controllers and an IBM to handle it all, the disks could cost up to $10M initially. Assuming we do this all the time and already have disks, you just pay something more than maintenance costs plus floor space plus personnel plus CPU time for two months. So (without any experience with how much a company might actually charge for a project of such a size) I'd estimate nine weeks and $1,000,000. > This is just the start of a burgeoning business; next week > I expect a full trainload of 100 such boxcars. > What will this cost and how long will it take your company > to handle it? Approximately 100 times the cost and approximately 100 times the time. However, a 500000-disk farm is much more than 100 times less realistic than a 5000-disk farm; and I don't think anyone's going to embark on a project taking a couple of decades to complete. ---Dan
rh@smds.UUCP (Richard Harter) (11/20/90)
In article <11628@alice.att.com>, dmr@alice.att.com (Dennis Ritchie) writes: > This group has turned into a sewer, as bad as talk.origins > (even the people I might politically agree with are acting dumb). .... Snicker .... > But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, > each containing 10,000,000 100-byte binary records > without any substructure. I need to get these records sorted. > Can someone quote me a cost and expected delivery time? > This is just the start of a burgeoning business; next week > I expect a full trainload of 100 such boxcars. > What will this cost and how long will it take your company > to handle it? > Since I understand that this will run into big bucks, I trust > your company can give a brief account of its technical approach > before I write the purchase orders. Are you ever in luck! Trust me, I've got the answer to your problem. My cousin Sid, who is a part time sales executive for the Digicomputronics division of Nocturnal Aviation and Screen Door Corporation will be in touch with you very shortly. As you may know Digicomputronics holds the patent on the inverted-Q-list linear boxcar sort algorithm. Sid will be with you shortly. All you will need to do is to sign a non-disclosure agreement and give Sid a certified non-refundable check for $10,000 and you will be in business. [If you don't have the ten G's handy, get him drunk (easy to do) and offer him a bottle of Chiva Regal.] Oh, by the way. You do have a computer with 2*100 bytes of fast mnemory and ten thousand tape drives, don't you? -- Richard Harter, Software Maintenance and Development Systems, Inc. Net address: jjmhome!smds!rh Phone: 508-369-7398 US Mail: SMDS Inc., PO Box 555, Concord MA 01742 This sentence no verb. This sentence short. This signature done.
tneff@bfmny0.BFM.COM (Tom Neff) (11/21/90)
In article <11628@alice.att.com> dmr@alice.att.com (Dennis Ritchie) writes: >But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, >each containing 10,000,000 100-byte binary records >without any substructure. I need to get these records sorted. >Can someone quote me a cost and expected delivery time? Dennis, it'll take some work to gauge the problem, then we'll be on track. We're not one of those outfits that tries to railroad you into one solution, only to switch you later on. Our engineers will go to work decoupling the problem, as we train them to do, and I'm not just blowing smoke when I tell you that the solution we tender will really make the grade. By the way, have you got a dozen or so empty rolling stock we can use for tempfiles?? :-) -- When considering "victim's rights," remember || vv Tom Neff that an innocent defendant is also a victim. ^^ || tneff@bfmny0.BFM.COM
gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (11/24/90)
In article <16051@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
#Dennis, it'll take some work to gauge the problem, then we'll be on
#track.
Are you trying to guage the reaction that you get?
--
-Greg Hennessy, University of Virginia
USPS Mail: Astronomy Department, Charlottesville, VA 22903-2475 USA
Internet: gsh7w@virginia.edu
UUCP: ...!uunet!virginia!gsh7w
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/26/90)
In article <11628@alice.att.com>, dmr@alice.att.com (Dennis Ritchie) writes: > But I do have a serious question. I've got a boxcar with 10,000 8-mm tapes, > each containing 10,000,000 100-byte binary records > without any substructure. I need to get these records sorted. > Can someone quote me a cost and expected delivery time? This is a very interesting question, because the answer isn't entirely a computer science one. This is clearly outside the scope of internal sorting. I'd go straight to Knuth V3, and of course to the specs of the 8-mm tape drives. However, the absolute *minimum* is obviously that each tape must be located, put in a drive, removed from the drive, and restored to its place. Say 5--10 minutes of handling time. We're talking about an absolute minimum of somewhere around 25 man-weeks of "operator" time just to read each record once. The cost is thus going to depend on the cost of labour. For the 100-boxcar version, it might even be worth building special tape-handling machinery. > Since I understand that this will run into big bucks, I trust > your company can give a brief account of its technical approach > before I write the purchase orders. How will "trade secret" do (:-)? Non-"technical" factors such as "we're going to ship it to the Far East" might easily dominate technical ones for a problem this size. I trust that any company approached for real with this problem would start by saying "why do you need to have these things sorted; can we help you change your set-up so they're generated in the right order; ...". The fastest way to do anything is after all to ensure that it doesn't _need_ doing. -- I am not now and never have been a member of Mensa. -- Ariadne.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/27/90)
In article <4373@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: > However, the absolute *minimum* is obviously that each > tape must be located, put in a drive, removed from the drive, and restored > to its place. Say 5--10 minutes of handling time. We're talking about an > absolute minimum of somewhere around 25 man-weeks of "operator" time just > to read each record once. Yes. In my analysis I assumed a reasonable number of operators to load the tape farm at the right times. This was a significant fraction of the $1000000 cost---after all, the real cost for one person is about $100K a year. ---Dan