parsnips@dasys1.UUCP (David Parsons) (08/31/89)
I would like to use a Bourne shell script to extract records from a simple database of fixed-length fields terminated with a new-line character. I've tried R'ing TFM to no avail. The problem... the database consists of addresses... positions 99 and 100 in each record contain a two-position abbreviation for the state. It's easy to get cut to read those two characters, and grep to identify the state I want to extract, but how the ^#$&! do you then copy the ENTIRE record thus identified to another file??? Using grep alone is no good because the abbreviation appears in various other places in the record... I KNOW there's a simple answer to this, but it's making me crazy (we gotta buy a database, I know, I know...) Any help would be appreciated! Dave David Parsons ..cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips -- David Parsons Big Electric Cat Public UNIX ..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips
davej@mrsvr.UUCP (David Johnson x4-6506) (08/31/89)
From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons): > The problem... the database consists of addresses... positions 99 and 100 > in each record contain a two-position abbreviation for the state. It's easy > to get cut to read those two characters, and grep to identify the state I > want to extract, but how the ^#$&! do you then copy the ENTIRE record > thus identified to another file??? Using grep alone is no good because > the abbreviation appears in various other places in the record... > > David Parsons > Big Electric Cat Public UNIX I tried e-mail and it bounced and bounced and . . . Assume your "database" (file) is called database. Assume your state abbreviation is in a shell vbl called $abbr Try this: cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo This should print all of the matching lines on stdout. Good Luck. -- David J. Johnson - Computer People Unlimited, Inc. @ GE Medical Systems gemed!python!davej@crd.ge.com - OR - sun!sunbird!gemed!python!davej "What a terrible thing it is to lose one's mind." - Dan Quayle
moran@tron.UUCP (Harvey R Moran) (09/01/89)
In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes: >I would like to use a Bourne shell script to extract records from a simple >database of fixed-length fields terminated with a new-line character. > >I've tried R'ing TFM to no avail. > >The problem... the database consists of addresses... positions 99 and 100 >in each record contain a two-position abbreviation for the state. It's easy >to get cut to read those two characters, and grep to identify the state I >want to extract, but how the ^#$&! do you then copy the ENTIRE record >thus identified to another file??? Using grep alone is no good because >the abbreviation appears in various other places in the record... > >-- >David Parsons >Big Electric Cat Public UNIX >..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips Mail from here to to dasys1 bounces. Well, someone else might find it useful, so ... # -------- snip -------- snip -------- snip -------- snip -------- #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of shell archive." # Contents: agrep.sh agrep.c test.data # Wrapped by moran@tron on Fri Sep 1 07:11:18 1989 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'agrep.sh' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'agrep.sh'\" else echo shar: Extracting \"'agrep.sh'\" \(1850 characters\) sed "s/^X//" >'agrep.sh' <<'END_OF_FILE' X#!/bin/sh X# agrep.sh -- anchored grep X# X# Usage: agrep beginning column match_pattern [file_name_or_list] X# X# Intended usage is searching fixed length ascii records for matches at a X# particular column position. Written in response to a posted example X# desire to search for a match with 2 character state designators X# in columns 99 and 100 of records containing (USA) mailing addresses. X# X# This uses brute force to generate a match pattern anchored X# at the column of interest. If the records get long enough, this technique X# will probably die a horrible death by exceeding a pattern buffer length X# allocation in grep. X# X# There are also potential pitfalls associated with full regular X# expressions because the calculation of APAT below has "$2" in it X# which exposes $2 to possible unintended wildcard expansion by the shell. X# If *I* needed this capability, I would re-write "agrep" in a few lines X# of equivalent C code. A (garbiginous) sample is in this shar as "agrep.c" X# X# Harvey Moran moran@tron.UUCP or moran@tron.UUCP@umbc3.UMBC.EDU 9/1/89 X Xcase $# in X0|1|2) echo "Usage: $0 beginning_column match_pattern [file_name_or_list]" X echo "Acceptable range of beginning_column is 1 to 300" X exit 1 X ;; Xesac XDOTS50=".................................................." # 50 dots in a row XDOTS300="${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}" X Xif [ $1 -lt 1 -o $1 -gt 300 ] Xthen X echo "Usage: $0 beginning_column match_pattern file_name_or_list" X echo "Acceptable range of beginning_column is 1 to 300" X exit 1 Xfi XAPAT="^"`echo "${DOTS300}" | cut -c1-$1`"$2" Xshift Xshift X X# NEWEXP is now the original match_pattern prefixed with a left anchor X# and a number of dots (match any character) to position X# the original match_pattern at the column of interest. X Xecho grep "\"${APAT}\"" "$@" END_OF_FILE if test 1850 -ne `wc -c <'agrep.sh'`; then echo shar: \"'agrep.sh'\" unpacked with wrong size! fi chmod +x 'agrep.sh' # end of 'agrep.sh' fi if test -f 'agrep.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'agrep.c'\" else echo shar: Extracting \"'agrep.c'\" \(785 characters\) sed "s/^X//" >'agrep.c' <<'END_OF_FILE' X X/* X * agrep.c -- garbiginous version of "anchored grep" X * Yes, I know this is not portable, but it is supposed to be a clue X * rather than a program and it "works" X * (given correct arguments) under Ultrix 3.1 X * X * Making this trash into a program is left as an exercise to "the student". X * X * Usage: X * agrep anchor_column_number pattern [files] X */ X Xmain(ac, av, envp) /* Yeah, I know envp is not portable */ Xint ac; Xchar *av[], *envp[]; X{ X static char *dots50 = ".................................................."; X static char dots300[1024] = "^"; X int i; X for (i = 0; i < 6; ++i ) X (void) strcat(dots300, dots50); X dots300[atoi(av[1])] = '\0'; X strcat(dots300, av[2]); X av[1] = "agrep"; X av[2] = dots300; X execve("/bin/grep", av+1, envp); X} END_OF_FILE if test 785 -ne `wc -c <'agrep.c'`; then echo shar: \"'agrep.c'\" unpacked with wrong size! fi # end of 'agrep.c' fi if test -f 'test.data' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'test.data'\" else echo shar: Extracting \"'test.data'\" \(244 characters\) sed "s/^X//" >'test.data' <<'END_OF_FILE' X-------------------- test as: agrep 10 MD test.data, and variations thereof X123456789VA234567890 XabcdefghiMD2zyxwvuts XabcdefghiNH2zyxwvuts X12MD56789NJ2zyxwvuts XabcdefghiMD2zyxwvuts X123MD6789NH2zyxwvuts XabcdefghiCA23MD67890 XMD3456789FL2zyxwvuts END_OF_FILE if test 244 -ne `wc -c <'test.data'`; then echo shar: \"'test.data'\" unpacked with wrong size! fi # end of 'test.data' fi echo shar: End of shell archive. exit 0 -- # Harvey Moran # moran%tron.UUCP@umbc3.UMBC.EDU # # Westinghouse Electric Corp. # ...!netsys!tron!moran # # Electronic Systems Group # ...!{wb3ffv,netsys}!hrmhpc!harvey # # Baltimore, Md.
merlyn@iwarp.intel.com (Randal Schwartz) (09/01/89)
In article <956@mrsvr.UUCP>, davej@mrsvr (David Johnson x4-6506) writes: | From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons): | | > The problem... the database consists of addresses... positions 99 and 100 | > in each record contain a two-position abbreviation for the state. It's easy | > to get cut to read those two characters, and grep to identify the state I | > want to extract, but how the ^#$&! do you then copy the ENTIRE record | > thus identified to another file??? Using grep alone is no good because | > the abbreviation appears in various other places in the record... | > | > David Parsons | > Big Electric Cat Public UNIX | | I tried e-mail and it bounced and bounced and . . . | | Assume your "database" (file) is called database. | Assume your state abbreviation is in a shell vbl called $abbr | | Try this: | | cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo | | This should print all of the matching lines on stdout. Hmmppph. I saw a guy removing a screw with the claw of a claw hammer the other day. You remind me of him. Wrong set of tools, dude! I can cut those lines in one process: awk 'substring($0,99,2) == "'"$abbr"'"' database (I might be off by one on the substring function...) Just another UNIX toolsmith (who can tell a claw hammer from a straight screwdriver...), -- /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\ | on contract to Intel, Hillsboro, Oregon, USA | | merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn | \== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
davej@mrsvr.UUCP (David Johnson x4-6506) (09/05/89)
From article <4885@omepd.UUCP>, by merlyn@iwarp.intel.com (Randal Schwartz):
= In article <956@mrsvr.UUCP>, davej@mrsvr (David Johnson x4-6506) writes:
= | From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons):
= |
= | > The problem... the database consists of addresses... positions 99 and 100
= | > in each record contain a two-position abbreviation for the state. It's easy
= | > to get cut to read those two characters, and grep to identify the state I
= | > want to extract, but how the ^#$&! do you then copy the ENTIRE record
= | > thus identified to another file??? Using grep alone is no good because
= | > the abbreviation appears in various other places in the record...
= | >
= | > David Parsons
= |
= | Try this:
= |
= | cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo
=
= Hmmppph. I saw a guy removing a screw with the claw of a claw hammer
= the other day. You remind me of him. Wrong set of tools, dude!
I must take issue with this. The question was phrased in terms of cut and
grep; my answer responded at the level of the question. I think my solution
is easier to understand for the "casual" shell user (who doesn't know his
way around awk). Suppose further that his requirements now changed to
"delete the ENTIRE record thus identified . . .". The solution above is
easy to modify for the "casual" user (i.e. one who understands ed).
= I can cut those lines in one process:
=
= awk 'substring($0,99,2) == "'"$abbr"'"' database
=
= Just another UNIX toolsmith (who can tell a claw hammer from a straight
= screwdriver...),
But who can't RTFM ;-) (try using "substr" instead).
=
= /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
--
David J. Johnson - Computer People Unlimited, Inc. @ GE Medical Systems
gemed!python!davej@crd.ge.com - OR - sun!sunbird!gemed!python!davej
"What a terrible thing it is to lose one's mind." - Dan Quayle
jpr@dasys1.UUCP (Jean-Pierre Radley) (09/07/89)
In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes: >I would like to use a Bourne shell script to extract records from a simple >database of fixed-length fields terminated with a new-line character. >The problem... the database consists of addresses... positions 99 and 100 >in each record contain a two-position abbreviation for the state. It's easy >to get cut to read those two characters, and grep to identify the state I >want to extract, but how the ^#$&! do you then copy the ENTIRE record >thus identified to another file??? Using grep alone is no good because >the abbreviation appears in various other places in the record... This is a one-liner for awk: :statefinder awk ' { if ("'$1'" == substr($0, 99, 2) ) print } ' $2 Thus if characters 99 and 100 of your line contain the state, then typing statefinder NY datafile should do what you want. -- Jean-Pierre Radley jpr@jpradley.uucp New York, NY 72160.1341@compuserve.com