thad@cup.portal.com (Thad P Floryan) (02/01/90)
ken@cs.rochester.edu (Ken Yap) in <1990Jan30.051455.7500@cs.rochester.edu> writes:
> Oh come on, what happened to reusing Unix tools?
> Here's a cheap bsplit, done in sh (hooray for redirection on builtins,
> fooey to csh on this point).
> for i in 1 2 3 4 5
> do
> dd bs=10k count=1 of=part$i
> done < foo
> Edit as appropriate.
Nothing's "wrong" with the above; thanks for the example and posting!
But, per "Edit as appropriate", one has to know beforehand how many parts
the original will be split into, and, here's the clinker to the above, your
example does NOT preserve sequential order if there are more than 9 parts,
so one cannot simply do (later, when repacking after a uucp) "zcat part* | .."
because "part10" collates after "part1" but BEFORE "part2". In other words,
an "ls part*" would list part1, part10, part2, part3, ..., part9, which
is the incorrect order.
The output of bsplit (and my xsplit) keeps the names in collating order through
the suffixes partaa, partab, partac, ..., partzz, so wildcard expansion returns
the parts in split order.
The benefit of "bsplit" is evident when uucp'ing the 40-, 60-, or 90-part
distributions from, say, osu-cis. Consider just GNU gcc: it's a 20+ part
archive on osu-cis, and the split sequence is maintained (for zcat and UNIX
wildcarding) by the "aa", "ab", ..., "bh" suffixes on the filenames.
Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
ken@cs.rochester.edu (Ken Yap) (02/02/90)
> for i in 1 2 3 4 5
> do
> dd bs=10k count=1 of=part$i
> done < foo
> Edit as appropriate.
>
> Nothing's "wrong" with the above; thanks for the example and posting!
> But, per "Edit as appropriate", one has to know beforehand how many parts
> the original will be split into, and, here's the clinker to the above, your
> example does NOT preserve sequential order if there are more than 9 parts
> such that one could do (later, when repacking after a uucp) "zcat part* | .."
> because "part10" collates after "part1" but BEFORE "part2". In other words,
> an "ls part*" would sequence part1, part10, part2, part3, ... , part9 which
> is the incorrect order.

True. When I say "edit as appropriate", I really mean "add the bells and whistles as needed". It would take very little shell hacking to add the features you want. Probably something along these lines: get the size from ls; loop, keeping track of the bytes written so far, with a units counter and a tens counter (and a hundreds counter, if you're greedy); increment as appropriate, and exit the loop when the size of the file has been reached.

Another approach would be to precompute the number of parts needed and generate fixed-width numbers by prepadding with zeros, then trimming to the final width with sed. I can see some people retching in the aisles now, but hey, sh scripts are easy to get working. :-)

I'm not against a C bsplit in any way. I'd probably get that myself if I needed that function. I just wanted to point out that sh and many other Unix tools have lots of underused features* and that sometimes writing a shell script is faster than hacking C. No doubt somebody will suggest a perl version next. :-)

* Here's another example: "<foo exec" in a shell script will cause the rest of the script to read from file foo. Similarly, ">bar exec" sends the rest of the script's standard output to bar.
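A minimal sketch of the "precompute the number of parts" approach, under stated assumptions: the function name `psplit` is hypothetical, the size comes from `wc -c` rather than ls, and printf's zero-padding stands in for the prepad-then-sed-trim step, so this is a swapped-in variant rather than the exact script described above. The 10 KB block size matches the original loop.

```shell
#!/bin/sh
# psplit FILE [PREFIX]: hypothetical helper that splits FILE into 10 KB
# parts named PREFIX00, PREFIX01, ... with every number padded to the
# same width, so "cat PREFIX*" reassembles the parts in order.
psplit() {
    file=$1
    prefix=${2:-part}
    bs=10240
    # Some wc implementations print leading blanks; keep only the digits.
    size=$(wc -c < "$file" | tr -dc '0-9')
    parts=$(( (size + bs - 1) / bs ))   # number of parts, rounded up
    width=${#parts}                     # digits needed for the part count
    i=0
    while [ "$i" -lt "$parts" ]; do
        # Zero-pad the part number to the common width.
        n=$(printf "%0${width}d" "$i")
        # skip= with if= assumes FILE is a regular file, not a pipe.
        dd if="$file" of="$prefix$n" bs="$bs" skip="$i" count=1 2>/dev/null
        i=$((i + 1))
    done
}
```

For a file needing 16 parts, the names run part00 through part15, so part10 sorts after part09 as intended.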
les@chinet.chi.il.us (Leslie Mikesell) (02/03/90)
In article <1990Feb1.232040.26182@cs.rochester.edu> ken@cs.rochester.edu writes:
>> for i in 1 2 3 4 5
>> do
>> dd bs=10k count=1 of=part$i
>> done < foo
>
>Another approach would be to precompute the number of parts
>needed and generate fixed width numbers by prepadding with zeros, then
>trimming to the final width with sed.

The fixed-width numbers are easy with something like this (3 digits):

    case $i in
    ?)  i=00$i ;;
    ??) i=0$i ;;
    esac

The real problem, though, is that you can't feed the script from a
pipe. dd is almost unique among the Unix tools in that it
uses read() rather than fread() and will fail to read the
requested amount if the input pipeline cannot stay ahead.

>No doubt somebody will suggest a perl version next. :-)

Good idea...

Les Mikesell   les@chinet.chi.il.us
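A minimal sketch that combines the 3-digit case statement with the redirected dd loop, so the number of parts need not be known in advance: the loop stops as soon as dd produces an empty part. The function name `split3` is hypothetical. Following the pipe caveat in this post, it assumes the input is a regular file; on a pipe, a short read would yield an undersized part.

```shell
#!/bin/sh
# split3 FILE: hypothetical helper that splits FILE into 10 KB parts
# named part000, part001, ... without counting the parts first.
split3() {
    i=0
    while :; do
        # Pad the part number to 3 digits (ordering holds up to 1000 parts).
        case $i in
            ?)  n=00$i ;;
            ??) n=0$i ;;
            *)  n=$i ;;
        esac
        # Each dd continues from where the previous one stopped, because
        # they all share the loop's redirected standard input.
        dd bs=10k count=1 of="part$n" 2>/dev/null
        # An empty part means the input is exhausted.
        if [ ! -s "part$n" ]; then
            rm -f "part$n"
            break
        fi
        i=$((i + 1))
    done < "$1"
}
```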
thad@cup.portal.com (Thad P Floryan) (02/03/90)
jbm@uncle.UUCP (John B. Milton) in <679@uncle.UUCP> mentions:
> What about this:
> uucp -r osu-cis!~/gnu/bsplit.c /usr/spool/uucppublic

Thanks! That file (bsplit) is NOT listed in the "GNU.how-to-get" file; one must either peruse "ls-lR.Z" or roam osu-cis' directories via ftp or telnet.

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
ken@cs.rochester.edu (Ken Yap) (02/04/90)
|The real problem, though, is that you can't feed the script from a
|pipe. dd is almost unique among the unix tools in that it
|uses read() rather than fread() and will fail to read the
|requested amount if the input pipeline cannot stay ahead.

There is a good reason for that: the semantics of reading large blocks from tape have to be preserved. If stdio were used, the block size would be whatever stdio happened to use. No doubt dd could be taught to tell the difference between tapes and pipes, but nobody wants to mess with nostalgia. :-)
res@cbnews.ATT.COM (Robert E. Stampfli) (02/06/90)
>The real problem, though, is that you can't feed the script from a
>pipe. dd is almost unique among the unix tools in that it
>uses read() rather than fread() and will fail to read the
>requested amount if the input pipeline cannot stay ahead.

Yes. One way of dealing with this, although it may not be the most efficient, is to change instances of

    ... | dd -args | ...

to

    ... | dd bs=whatever | dd -args | ...

This has worked in a pinch for me several times.
--
Rob Stampfli / att.com!stampfli (uucp@work) / kd8wk@w8cqk (packet radio)
614-864-9377 / osu-cis.cis.ohio-state.edu!kd8wk!res (uucp@home)
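A small demonstration of the short-read problem and of reblocking with an extra dd, offered as a sketch. One caution when reusing this trick: POSIX specifies that when bs= is given (with no conversions other than sync, noerror, or notrunc), each input block is copied out as a separate output block, short or not; input is collected into full-sized output blocks only when bs= is absent. The sketch therefore sets obs= on the intermediate dd. The `producer` function and the 8-byte block size are arbitrary choices for the demo.

```shell
#!/bin/sh
# A slow producer: writes 4 bytes, pauses, then writes 4 more, so a
# single read() on the pipe normally returns only the first 4 bytes.
producer() {
    printf 'aaaa'
    sleep 1
    printf 'bbbb'
}

# Reading the pipe directly: dd's single read() gets a short block.
short=$(producer 2>/dev/null | dd bs=8 count=1 2>/dev/null)

# Reblocking first: with obs= set (and no bs=), the first dd collects
# its input into full 8-byte output blocks before writing them.
full=$(producer 2>/dev/null | dd obs=8 2>/dev/null | dd bs=8 count=1 2>/dev/null)

printf 'direct:    %s\n' "$short"
printf 'reblocked: %s\n' "$full"
```

On a typical system the first line shows only aaaa and the second shows aaaabbbb; the one-second pause makes the short read very likely, but it remains a timing assumption.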