mario@wjvax.UUCP (Mario Dona) (10/05/90)
Does anyone know of a way of extracting information from a text file which contains variable length fields, and outputting it in a different format? For example, I have a text file which contains names and addresses as shown below.

John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
John Smith         345 N. First Street      Oakland, CA 94356 more stuff --->
Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389 more stuff--->
^                  ^                        ^
|                  |                        |
|                  |                        |_____ column 45
|                  |______________________________ column 20
|_________________________________________________ column 1

I want to extract the names and addresses and output it in the following format:

John Doe
1563 Meadow Lane
San Jose, CA 94325

John Smith
345 N. First Street
Oakland, CA 94356

Henry & Martha Coe
4567 Poplar Ave.
Newark, NJ 78389

Can this be done using AWK, and if so how? Or is there some other way?

Mario Dona
...!{ !decwrl!qubix, ames!oliveb!tymix, pyramid}!wjvax!mario
The above opinions are mine alone and not, in any way, those of WJ.
jlp@cbnewsh.att.com (jon.peticolas) (10/05/90)
From article <1661@wjvax.UUCP>, by mario@wjvax.UUCP (Mario Dona):
> Does anyone know of a way of extracting information from a text file
> which contains variable length fields, and outputting it in a different
> format? For example, I have a text file which contains names and
> addresses as shown below.
> John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
> John Smith         345 N. First Street      Oakland, CA 94356 more stuff --->
> Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389 more stuff--->
> ^                  ^                        ^
> |                  |                        |
> |                  |                        |_____ column 45
> |                  |______________________________ column 20
> |_________________________________________________ column 1
> I want to extract the names and addresses and output it in the following
> format:
> John Doe
> 1563 Meadow Lane
> San Jose, CA 94325
> John Smith
> 345 N. First Street
> Oakland, CA 94356
> Henry & Martha Coe
> 4567 Poplar Ave.
> Newark, NJ 78389
> Can this be done using AWK, and if so how? Or is there some other way?

Yes:

{ print substr($0,1,19);
  print substr($0,20,24);
  print substr($0,45); }

This works provided that all of the whitespace is made up of space characters (no tabs).

-Jon
There's a time and place for spontaneity.
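For anyone without awk handy, Jon's substr() recipe can be sketched in Python. This is an illustration only. It assumes the column layout from the original post: fields start at columns 1, 20 and 45, and the padding is spaces, not tabs. The names split_fixed and reformat are made up for this example. Python slices count from 0 and exclude the end index, so awk's substr(s, m, n) corresponds to s[m-1:m-1+n].

```python
# Assumed layout (from the original post): fields start at columns 1, 20, 45,
# counted from 1 as awk does, and are padded with spaces (no tabs).
NAME_COL, STREET_COL, CITY_COL = 1, 20, 45


def split_fixed(line):
    """Slice one fixed-width record into (name, street, rest).

    awk's substr(s, m, n) is s[m-1 : m-1+n] in Python. Like substr($0,45),
    the last field runs to the end of the line, "more stuff" included.
    """
    line = line.rstrip("\n")
    name = line[NAME_COL - 1 : STREET_COL - 1]
    street = line[STREET_COL - 1 : CITY_COL - 1]
    rest = line[CITY_COL - 1 :]
    # rstrip() removes the space padding that fills out each column.
    return name.rstrip(), street.rstrip(), rest.rstrip()


def reformat(lines):
    """Yield each non-blank record as three lines, each ending in a newline."""
    for line in lines:
        if line.strip():
            yield "%s\n%s\n%s\n" % split_fixed(line)


# ljust() pads each field out to its column width, reproducing the layout.
sample = [
    "John Doe".ljust(19) + "1563 Meadow Lane".ljust(25)
    + "San Jose, CA 94325 more stuff --->\n",
    "Henry & Martha Coe".ljust(19) + "4567 Poplar Ave.".ljust(25)
    + "Newark, NJ 78389 more stuff--->\n",
]
for block in reformat(sample):
    print(block)  # print() adds the blank line that separates records
```

To run it on a real file, pass an open file object (for example open("infile")) to reformat() in place of the sample list.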
bob@wyse.wyse.com (Bob McGowen x4312 dept208) (10/06/90)
In article <1661@wjvax.UUCP> mario@wjvax.UUCP (Mario Dona) writes:
>
>Does anyone know of a way of extracting information from a text file
>which contains variable length fields, and outputting it in a different
>format? For example, I have a text file which contains names and
>addresses as shown below.
>
>John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
...
>^                  ^                        ^
>|                  |                        |_____ column 45
>|                  |______________________________ column 20
>|_________________________________________________ column 1
>
>I want to extract the names and addresses and output it in the following
>format:
>
>John Doe
>1563 Meadow Lane
>San Jose, CA 94325
.....
>Can this be done using AWK, and if so how? Or is there some other way?
>

The answer re awk is that it depends. If all the fields are separated by spaces, in varying numbers, then you have problems using awk's field splitting. If the file has (or can be recreated with) tabs (or some other separator character like a colon or |) between each set of fields, then it is relatively easy. (I.e., if the white space marked below with the carets were a tab in each instance.)

John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
        ^^^^^^^^^^^                ^^^^^^^^^                  ^

You could then use the following awk code to process your file:

awk -F'->' '{printf "%s\n%s\n%s\n",$1,$2,$3}' file_to_process

Note that the -> is used to represent a literal tab. If you use some other character to separate the fields, then substitute it.

If the file has only spaces between the visible printing characters, your primary problem will be fields that contain a varying number of words. For instance, San Jose and New Orleans vs. Denver and Oakland, or Meadow Lane vs. Meadow Lane Court.

E-mail me if you would like to discuss this in more detail.

Bob McGowan  (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com
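Bob's warning about spaces can be demonstrated in a few lines. The Python sketch below is only an analogy, not awk itself. With no argument, str.split() treats any run of blanks as one separator, much like awk's default field splitting. By contrast, split("\t") breaks only at single tabs, like awk -F with a tab. The helper name fields is made up for the example.

```python
def fields(line, sep=None):
    """Split a record into fields.

    sep=None splits on runs of whitespace (like awk's default FS);
    a single separator character such as a tab or ':' splits only on
    that character (like awk -F).
    """
    return line.rstrip("\n").split(sep)


spaced = "John Doe".ljust(19) + "1563 Meadow Lane".ljust(25) + "San Jose, CA 94325"
tabbed = "John Doe\t1563 Meadow Lane\tSan Jose, CA 94325\tmore stuff --->"

# Blank-separated: multi-word values are torn apart.
print(fields(spaced))
# -> ['John', 'Doe', '1563', 'Meadow', 'Lane', 'San', 'Jose,', 'CA', '94325']

# Tab-separated: each field stays whole, as with awk -F and a tab.
name, street, city = fields(tabbed, "\t")[:3]
print("%s\n%s\n%s" % (name, street, city))
```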
2113av@gmuvax2.gmu.edu (John Porter) (10/06/90)
Mario:
Couldn't be easier, using awk. Use the substr() function:
{ print substr($0,1,19)
  print substr($0,20,25)
  print substr($0,45,length($0)-44)
  print ""
}
or some such (adjust the column numbers to suit your data).
Good luck! john.porter
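In awk the string-length function is length(). The count length($0)-44 is exactly the number of characters from column 45 to the end of the line: length - 45 + 1. So substr($0,45,length($0)-44) returns the same text as the two-argument substr($0,45) used in the other replies. Below is a rough Python model of awk's substr() for checking that arithmetic. It approximates the POSIX description for whole-number arguments with m >= 1 only; real awks also round fractional arguments. The name awk_substr is made up.

```python
def awk_substr(s, m, n=None):
    """Rough model of awk's substr(s, m[, n]) for whole numbers m >= 1.

    Positions count from 1. Without n, the substring runs to the end of s;
    with n, at most n characters are returned, stopping at the end of s.
    """
    if n is None:
        return s[m - 1 :]
    # Python slicing already stops at the end of s; a negative n yields "".
    return s[m - 1 : m - 1 + max(n, 0)]


line = "John Doe".ljust(19) + "1563 Meadow Lane".ljust(25) + "San Jose, CA 94325"

# length($0)-44 counts the characters in columns 45..length($0),
# so both calls return the same text.
tail_3arg = awk_substr(line, 45, len(line) - 44)
tail_2arg = awk_substr(line, 45)
print(repr(tail_3arg), tail_3arg == tail_2arg)
# -> 'San Jose, CA 94325' True
```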
glee@tigris.uucp (Godfrey Lee) (10/08/90)
In article <1661@wjvax.UUCP> mario@wjvax.UUCP (Mario Dona) writes:
>Does anyone know of a way of extracting information from a text file
>which contains variable length fields, and outputting it in a different
>format?

Looks more like fixed length fields below.

>For example, I have a text file which contains names and
>addresses as shown below.
>
>John Doe           1563 Meadow Lane         San Jose, CA 94325
>John Smith         345 N. First Street      Oakland, CA 94356
>Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389
>
>^                  ^                        ^
>|                  |                        |_____ column 45
>|                  |______________________________ column 20
>|_________________________________________________ column 1
>
>I want to extract the names and addresses and output it in the following
>format:
>
>John Doe
>1563 Meadow Lane
>San Jose, CA 94325
>
>John Smith
>345 N. First Street
>Oakland, CA 94356
>
>Henry & Martha Coe
>4567 Poplar Ave.
>Newark, NJ 78389
>
>Can this be done using AWK, and if so how? Or is there some other way?

Simple, for example:

awk -f address.awk <infile >outfile

-------------- address.awk ----------------------------
{
    name = substr ( $0, 1, 19 )
    add1 = substr ( $0, 20, 25 )
    add2 = substr ( $0, 45 )
    printf ( "%s\n%s\n%s\n\n", name, add1, add2 )
}
---------end of address.awk ----------------------------

Of course, it would be simpler if you separated the fields with a single character such as ':', e.g.

John Doe:1563 Meadow Lane:San Jose, CA 94325:more stuff --->
John Smith:345 N. First Street:Oakland, CA 94356:more stuff --->
Henry & Martha Coe:4567 Poplar Ave.:Newark, NJ 78389:more stuff--->

then all you need is:

awk -F: '{ printf ("%s\n%s\n%s\n\n", $1, $2, $3 ) }' <infile >outfile
    ^^^
     |
     +---- defines field separator to be ':'

Good luck.
--
Godfrey Lee
cognos!alzabo!tigris!glee
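Godfrey's closing suggestion assumes the file can be rewritten with a separator character. Below is one way to make that one-time conversion, sketched in Python under the thread's column assumptions (fields at columns 1, 20 and 45). The function names are invented for the example. There are two caveats. First, the layout gives no column where "more stuff" begins, so it stays inside the third field rather than becoming a fourth. Second, any ':' already present in the data would be mistaken for a separator.

```python
# Assumed column starts (1-based, as in the thread): name, street, city.
STARTS = (1, 20, 45)


def fixed_to_delimited(line, starts=STARTS, sep=":"):
    """Rewrite one fixed-width record as sep-separated fields, padding removed."""
    line = line.rstrip("\n")
    # Each field runs from its start column up to the next field's start;
    # the last one runs to the end of the line.
    ends = list(starts[1:]) + [len(line) + 1]
    return sep.join(line[a - 1 : b - 1].strip() for a, b in zip(starts, ends))


def delimited_to_block(record, sep=":"):
    r"""Rough analogue of awk -F: '{ printf("%s\n%s\n%s\n\n", $1, $2, $3) }'."""
    # Pad with empty strings so short records print empty lines, as awk does
    # for a missing $2 or $3.
    f = (record.rstrip("\n").split(sep) + ["", "", ""])[:3]
    return "%s\n%s\n%s\n\n" % tuple(f)


record = fixed_to_delimited(
    "John Doe".ljust(19) + "1563 Meadow Lane".ljust(25) + "San Jose, CA 94325\n"
)
print(record)  # -> John Doe:1563 Meadow Lane:San Jose, CA 94325
print(delimited_to_block(record), end="")
```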