beaulieu@netcom.UUCP (Bob Beaulieu) (09/16/89)
I have a text file that is very large (26,000+ lines) and would like
to break it down to 5-6 smaller files. Is there an easy way to handle
this? I have tried vi, but it seems to hold 5000 lines in its buffer.
The same goes for ed and ex. Thanks for any help.
--
Bob Beaulieu
277-b Tyrella Avenue
Mountain View, CA 94043
(415) 967-4678
cpcahil@virtech.UUCP (Conor P. Cahill) (09/16/89)
In article <2388@netcom.UUCP>, beaulieu@netcom.UUCP (Bob Beaulieu) writes:
> I have a text file that is very large (26,000+ lines) and would like
> to break it down to 5-6 smaller files. Is there an easy way to handle
> this?

Try split(1) which allows you to split the file into different segments
by # of lines. If you want to have some form of logical split of the
data, use split(1) to break the file into manageable parts and then piece
the parts you want together.
--
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      703-430-9247           !
| Virtual Technologies Inc.,  P. O. Box 876,  Sterling, VA 22170        |
+-----------------------------------------------------------------------+
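A minimal sketch of the split-then-reassemble approach described above. The file names (big.txt, part., first-10000.txt) and the generated sample data are assumptions for illustration. `split -l N` is the POSIX spelling of the line-count option; some older systems may accept only the `split -N` form.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Stand-in for the real file: 26,123 numbered lines (made-up data).
awk 'BEGIN { for (i = 1; i <= 26123; i++) print "line", i }' > big.txt

# Break it into 5,000-line pieces named part.aa, part.ab, ...
split -l 5000 big.txt part.

# Piece chosen parts back together, e.g. the first two into one file.
cat part.aa part.ab > first-10000.txt

# Show the line count of every piece and of the reassembled file.
wc -l part.* first-10000.txt
```

Because the pieces are plain files, any logical grouping can be rebuilt afterwards with cat in whatever combination is wanted.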
ok@cs.mu.oz.au (Richard O'Keefe) (09/17/89)
In article <2388@netcom.UUCP>, beaulieu@netcom.UUCP (Bob Beaulieu) writes:
> I have a text file that is very large (26,000+ lines) and would like
> to break it down to 5-6 smaller files.

If you have split, it may do what you want:

    split -5000 foobaz

where foobaz contains 26,123 lines, will create

    xaa    # lines      1- 5,000
    xab    # lines  5,001-10,000
    xac    # lines 10,001-15,000
    xad    # lines 15,001-20,000
    xae    # lines 20,001-25,000
    xaf    # lines 25,001-26,123

If you want 'zabbo' used as the prefix instead of 'x', say

    split -5000 foobaz zabbo

and you'll get zabbo{aa,ab,ac,ad,ae,af} produced instead.
If you haven't got split, I can mail a version which is rather sexier.

Of course you could always do this with 'awk', use

    awk -f split-5000-by-6.awk foobaz

where the file split-5000-by-6.awk contains these lines:

        1 <= NR && NR <=  5000    { print $0 > "xaa" }
     5001 <= NR && NR <= 10000    { print $0 > "xab" }
    10001 <= NR && NR <= 15000    { print $0 > "xac" }
    15001 <= NR && NR <= 20000    { print $0 > "xad" }
    20001 <= NR && NR <= 25000    { print $0 > "xae" }
    25001 <= NR && NR <= 30000    { print $0 > "xaf" }

There Is Always Another Way...
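The six hand-written ranges can also be computed from NR. This is only a sketch, not the poster's script: it assumes an awk that supports -v (POSIX awk) and that no more than 26*26 pieces are needed. The variable names size, prefix, out, and letters are illustrative, and foobaz is regenerated here as sample data.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input standing in for foobaz: 26,123 lines (made-up data).
awk 'BEGIN { for (i = 1; i <= 26123; i++) print "line", i }' > foobaz

awk -v size=5000 -v prefix=x '
    # On the first line of each chunk, switch to the next output file.
    (NR - 1) % size == 0 {
        if (out != "") close(out)
        n = int((NR - 1) / size)            # chunk index: 0, 1, 2, ...
        letters = "abcdefghijklmnopqrstuvwxyz"
        out = prefix substr(letters, int(n / 26) + 1, 1) substr(letters, n % 26 + 1, 1)
    }
    # Every line goes to whichever file is current.
    { print > out }
' foobaz

# Show the line count of each piece (xaa, xab, ...).
wc -l x??
```

Changing the piece size or the prefix is then a matter of changing the -v arguments rather than editing six patterns.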
fischer@iesd.auc.dk (Lars P. Fischer) (09/17/89)
In article <2388@netcom.UUCP> beaulieu@netcom.UUCP (Bob Beaulieu) writes:
>I have a text file that is very large (26,000+ lines) and would like
>to break it down to 5-6 smaller files. Is there an easy way to handle
>this? I have tried vi, but it seems to hold 5000 lines in its buffer.
>The same goes for ed and ex.

Try emacs(1). Handles files with up to 2^31 characters.

/Lars
--
Copyright 1989 Lars Fischer; you can redistribute only if your recipients can.
Lars Fischer, fischer@iesd.auc.dk, {...}!mcvax!iesd!fischer
Department of Computer Science, University of Aalborg, DENMARK.

Our audience is programmers, because the UNIX environment was designed
fundamentally for programming. -- Kernighan & Pike
max@lgc.UUCP (Max Heffler @ Landmark Graphics) (09/18/89)
In article <2121@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes:
> If you have split, it may do what you want:
> Of course you could always do this with 'awk', use
> awk -f split-5000-by-6.awk foobaz
> There Is Always Another Way...

I forgot about split and just used dd...
--
Max Heffler                  uucp: ..!uunet!lgc!max
Landmark Graphics Corp.      phone: (713) 579-4751
333 Cypress Run, Suite 100
Houston, Texas 77094
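The post does not say how dd was invoked, so the following is only a guess at the general idea. Unlike split, dd cuts at byte offsets, so a piece can end in the middle of a line. The block size, block counts, and file names below are assumptions, and foobaz is regenerated as sample data.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input standing in for foobaz: 26,123 lines (made-up data).
awk 'BEGIN { for (i = 1; i <= 26123; i++) print "line", i }' > foobaz

# First piece: the first 2 blocks of 65536 bytes each.
dd if=foobaz of=piece.0 bs=65536 count=2 2>/dev/null
# Second piece: skip those 2 blocks and copy the rest.
dd if=foobaz of=piece.1 bs=65536 skip=2 2>/dev/null

# The pieces concatenate back to the original byte for byte.
cat piece.0 piece.1 | cmp - foobaz && echo "pieces match original"
```

If the pieces have to end on line boundaries, a line-oriented tool such as split is the safer choice.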
meissner@tiktok.dg.com (Michael Meissner) (09/19/89)
In article <FISCHER.89Sep17141429@rosser.iesd.auc.dk> fischer@iesd.auc.dk (Lars P. Fischer) writes:
| In article <2388@netcom.UUCP> beaulieu@netcom.UUCP (Bob Beaulieu) writes:
| >I have a text file that is very large (26,000+ lines) and would like
| >to break it down to 5-6 smaller files. Is there an easy way to handle
| >this? I have tried vi, but it seems to hold 5000 lines in its buffer.
| >The same goes for ed and ex.
|
| Try emacs(1). Handles files with up to 2^31 characters.

That really depends on the emacs implementation. GNU emacs, for
example, requires that all text, global data, and buffer space fit
within 2^24 bytes. This is because the upper 8 bits are used to
encode the type and are also used for garbage collection.
--
Michael Meissner, Data General.
Uucp:         ...!mcnc!rti!xyzzy!meissner           If compiles were much
Internet:     meissner@dg-rtp.DG.COM                faster, when would we
Old Internet: meissner%dg-rtp.DG.COM@relay.cs.net   have time for netnews?
stein-c@acsu.Buffalo.EDU (Craig Steinberger) (09/19/89)
In article <2388@netcom.UUCP> beaulieu@netcom.UUCP (Bob Beaulieu) writes:
>I have a text file that is very large (26,000+ lines) and would like
>to break it down to 5-6 smaller files. Is there an easy way to handle
>this? I have tried vi, but it seems to hold 5000 lines in its buffer.
>The same goes for ed and ex.

There is a program called csplit that should do the trick.
guy@auspex.auspex.com (Guy Harris) (09/22/89)
> >I have a text file that is very large (26,000+ lines) and would like
> >to break it down to 5-6 smaller files. Is there an easy way to handle
> >this? I have tried vi, but it seems to hold 5000 lines in its buffer.
> >The same goes for ed and ex.
>
>There is a program called csplit that should do the trick.

There is a program called "csplit" in some, but not all, versions of
UNIX that might do the trick; it splits based on "context" (which is
presumably what the "c" in "csplit" stands for). From the SunOS 4.0 man
page:

DESCRIPTION
     csplit reads the file whose name is filename and separates it
     into n+1 sections, defined by the arguments argument1 through
     argumentn. If the filename argument is a `-', the standard
     input is used. By default the sections are placed in files
     named xx00 through xxn. n may not be greater than 99. These
     sections receive the following portions of the file:

     xx00      From the start of filename up to (but not including)
               the line indicated by argument1 (see OPTIONS below
               for an explanation of these arguments.)

     xx01:     From the line indicated by argument1 up to the line
               indicated by argument2.

     xxn:      From the line referenced by argumentn to the end of
               filename.

However, it is, as noted, not present in all versions of UNIX; it
doesn't come with 4.xBSD, for instance. "split", which splits based on
line count, is present in all versions of UNIX AT&T has shipped, and is,
as such, more likely to be present in any given version of UNIX (it is
in 4.xBSD).
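A short sketch of the two kinds of csplit argument, assuming a csplit that follows the POSIX synopsis (-s to suppress the size report, -f to set the output prefix). The file names, prefixes, and sample data are made up for illustration and do not come from the man page quoted above.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Split by line number: each argument names the first line of the next piece.
awk 'BEGIN { for (i = 1; i <= 26123; i++) print "line", i }' > foobaz
csplit -s -f part. foobaz 5001 10001 15001 20001 25001
wc -l part.*

# Split by context: start a new piece at each line matching a pattern.
awk 'BEGIN {
    for (s = 1; s <= 3; s++) {
        print "SECTION", s
        for (i = 1; i <= 4; i++) print "text", s "." i
    }
}' > doc.txt
csplit -s -f sec. doc.txt '/^SECTION 2/' '/^SECTION 3/'
wc -l sec.*
```

The first call writes part.00 through part.05; the second writes sec.00 through sec.02, each beginning at its SECTION line.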
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/22/89)
You can use sed to break it if you don't have any fancy tools.

    sed -n "1,1000p" big.file >part.1

Obviously you will want to pick the breakpoints by content.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
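Extending the one-liner to cover a whole file might look like the sketch below. The breakpoints here are arbitrary line numbers chosen for illustration, not content-based ones, and big.file is generated as 2,500 lines of sample data.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input standing in for big.file (made-up data).
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "line", i }' > big.file

# One sed pass per piece; -n suppresses default output, p prints the range.
sed -n '1,1000p'    big.file > part.1
sed -n '1001,2000p' big.file > part.2
# $ means the last line; single quotes keep the shell from expanding it.
sed -n '2001,$p'    big.file > part.3

# Show the line count of each piece.
wc -l part.1 part.2 part.3
```

Each piece costs a full read of the input, which is acceptable for a handful of pieces but slower than split for many.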
fischer@iesd.auc.dk (Lars P. Fischer) (09/25/89)
In article <1226@xyzzy.UUCP> meissner@tiktok.dg.com (Michael Meissner) writes:
>| Try emacs(1). Handles files with up to 2^31 characters.
>
>That really depends on the emacs implementation. GNU emacs, for
>example, requires that all text, global data, and buffer space fit
>within 2^24 bytes. This is because the upper 8 bits are used to
>encode the type and are also used for garbage collection.

OK, so I blew it. Sorry. If you need to edit files with more than 200k
lines (80 chars/line), don't use emacs. In all other cases, do :-).

(Only 16M chars per session? You mean I can't say "emacs /dev/xy0c"?
Does anybody out there have a *real* editor?? :-).

/Lars