[comp.unix.questions] Shell Database Management

parsnips@dasys1.UUCP (David Parsons) (08/31/89)

I would like to use a Bourne shell script to extract records from a simple
database of fixed-length fields terminated with a new-line character.

I've tried R'ing TFM to no avail.  

The problem... the database consists of addresses... positions 99 and 100
in each record contain a two-position abbreviation for the state.  It's easy
to get cut to read those two characters, and grep to identify the state I
want to extract, but how the ^#$&! do you then copy the ENTIRE record 
thus identified to another file???  Using grep alone is no good because 
the abbreviation appears in various other places in the record...

I KNOW there's a simple answer to this, but it's making me crazy (we gotta
buy a database, I know, I know...)

Any help would be appreciated! 

Dave

David Parsons
..cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips

-- 
David Parsons
Big Electric Cat Public UNIX
..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips

davej@mrsvr.UUCP (David Johnson x4-6506) (08/31/89)

From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons):

> The problem... the database consists of addresses... positions 99 and 100
> in each record contain a two-position abbreviation for the state.  It's easy
> to get cut to read those two characters, and grep to identify the state I
> want to extract, but how the ^#$&! do you then copy the ENTIRE record 
> thus identified to another file???  Using grep alone is no good because 
> the abbreviation appears in various other places in the record...
> 
> David Parsons
> Big Electric Cat Public UNIX

I tried e-mail and it bounced and bounced and . . .

Assume your "database" (file) is called database.
Assume your state abbreviation is in a shell vbl called $abbr

Try this:

cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo

This should print all of the matching lines on stdout.

Good Luck.

--
David J. Johnson - Computer People Unlimited, Inc. @ GE Medical Systems
gemed!python!davej@crd.ge.com  - OR - sun!sunbird!gemed!python!davej
  "What a terrible thing it is to lose one's mind." - Dan Quayle

moran@tron.UUCP (Harvey R Moran) (09/01/89)

In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes:
>I would like to use a Bourne shell script to extract records from a simple
>database of fixed-length fields terminated with a new-line character.
>
>I've tried R'ing TFM to no avail.  
>
>The problem... the database consists of addresses... positions 99 and 100
>in each record contain a two-position abbreviation for the state.  It's easy
>to get cut to read those two characters, and grep to identify the state I
>want to extract, but how the ^#$&! do you then copy the ENTIRE record 
>thus identified to another file???  Using grep alone is no good because 
>the abbreviation appears in various other places in the record...
>
>-- 
>David Parsons
>Big Electric Cat Public UNIX
>..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips

Mail from here to to dasys1 bounces.  Well, someone else might find it
useful, so ...
# -------- snip -------- snip -------- snip -------- snip --------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  agrep.sh agrep.c test.data
# Wrapped by moran@tron on Fri Sep  1 07:11:18 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'agrep.sh' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'agrep.sh'\"
else
echo shar: Extracting \"'agrep.sh'\" \(1850 characters\)
sed "s/^X//" >'agrep.sh' <<'END_OF_FILE'
X#!/bin/sh
X# agrep.sh -- anchored grep
X# 
X# Usage:    agrep beginning column match_pattern [file_name_or_list]
X#
X# Intended usage is searching fixed length ascii records for matches at a
X# particular column position.  Written in response to a posted example
X# desire to search for a match with 2 character state designators
X# in columns 99 and 100 of records containing (USA) mailing addresses.
X#
X# This uses brute force to generate a match pattern anchored
X# at the column of interest.  If the records get long enough, this technique
X# will probably die a horrible death by exceeding a pattern buffer length
X# allocation in grep.
X#
X# There are also potential pitfalls associated with full regular
X# expressions because the calculation of APAT below has "$2" in it
X# which exposes $2 to possible unintended wildcard expansion by the shell.
X# If *I* needed this capability, I would re-write "agrep" in a few lines
X# of equivalent C code.  A (garbiginous) sample is in this shar as "agrep.c"
X#
X# Harvey Moran  moran@tron.UUCP or moran@tron.UUCP@umbc3.UMBC.EDU 9/1/89
X
Xcase $# in
X0|1|2) echo "Usage:  $0 beginning_column match_pattern [file_name_or_list]"
X       echo "Acceptable range of beginning_column is 1 to 300"
X       exit 1
X       ;;
Xesac
XDOTS50=".................................................." # 50 dots in a row
XDOTS300="${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}"
X
Xif [ $1 -lt 1 -o $1 -gt 300 ]
Xthen
X       echo "Usage:  $0 beginning_column match_pattern file_name_or_list"
X       echo "Acceptable range of beginning_column is 1 to 300"
X       exit 1
Xfi
XAPAT="^"`echo "${DOTS300}" | cut -c1-$1`"$2"
Xshift
Xshift
X
X# NEWEXP is now the original match_pattern prefixed with a left anchor
X# and a number of dots (match any character) to position
X# the original match_pattern at the column of interest.
X
Xecho grep "\"${APAT}\"" "$@"
END_OF_FILE
if test 1850 -ne `wc -c <'agrep.sh'`; then
    echo shar: \"'agrep.sh'\" unpacked with wrong size!
fi
chmod +x 'agrep.sh'
# end of 'agrep.sh'
fi
if test -f 'agrep.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'agrep.c'\"
else
echo shar: Extracting \"'agrep.c'\" \(785 characters\)
sed "s/^X//" >'agrep.c' <<'END_OF_FILE'
X
X/*
X * agrep.c -- garbiginous version of "anchored grep"
X * Yes, I know this is not portable, but it is supposed to be a clue
X * rather than a program and it "works"
X * (given correct arguments) under Ultrix 3.1
X *
X * Making this trash into a program is left as an exercise to "the student".
X *
X * Usage:
X *    agrep anchor_column_number pattern [files]
X */
X
Xmain(ac, av, envp)	/* Yeah, I know envp is not portable */
Xint ac;
Xchar *av[], *envp[];
X{
X   static char *dots50 = "..................................................";
X    static char dots300[1024] = "^";
X    int i;
X    for (i = 0; i < 6; ++i )
X       (void) strcat(dots300, dots50);
X	  dots300[atoi(av[1])] = '\0';
X    strcat(dots300, av[2]);
X    av[1] = "agrep";
X    av[2] = dots300;
X    execve("/bin/grep", av+1, envp);
X}
END_OF_FILE
if test 785 -ne `wc -c <'agrep.c'`; then
    echo shar: \"'agrep.c'\" unpacked with wrong size!
fi
# end of 'agrep.c'
fi
if test -f 'test.data' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'test.data'\"
else
echo shar: Extracting \"'test.data'\" \(244 characters\)
sed "s/^X//" >'test.data' <<'END_OF_FILE'
X-------------------- test as: agrep 10 MD test.data, and variations thereof
X123456789VA234567890
XabcdefghiMD2zyxwvuts
XabcdefghiNH2zyxwvuts
X12MD56789NJ2zyxwvuts
XabcdefghiMD2zyxwvuts
X123MD6789NH2zyxwvuts
XabcdefghiCA23MD67890
XMD3456789FL2zyxwvuts
END_OF_FILE
if test 244 -ne `wc -c <'test.data'`; then
    echo shar: \"'test.data'\" unpacked with wrong size!
fi
# end of 'test.data'
fi
echo shar: End of shell archive.
exit 0

-- 
# Harvey Moran                  #    moran%tron.UUCP@umbc3.UMBC.EDU    #
# Westinghouse Electric Corp.   #    ...!netsys!tron!moran             #
# Electronic Systems Group      #    ...!{wb3ffv,netsys}!hrmhpc!harvey #
# Baltimore, Md.

merlyn@iwarp.intel.com (Randal Schwartz) (09/01/89)

In article <956@mrsvr.UUCP>, davej@mrsvr (David Johnson x4-6506) writes:
| From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons):
| 
| > The problem... the database consists of addresses... positions 99 and 100
| > in each record contain a two-position abbreviation for the state.  It's easy
| > to get cut to read those two characters, and grep to identify the state I
| > want to extract, but how the ^#$&! do you then copy the ENTIRE record 
| > thus identified to another file???  Using grep alone is no good because 
| > the abbreviation appears in various other places in the record...
| > 
| > David Parsons
| > Big Electric Cat Public UNIX
| 
| I tried e-mail and it bounced and bounced and . . .
| 
| Assume your "database" (file) is called database.
| Assume your state abbreviation is in a shell vbl called $abbr
| 
| Try this:
| 
| cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo
| 
| This should print all of the matching lines on stdout.

Hmmppph.  I saw a guy removing a screw with the claw of a claw hammer
the other day.  You remind me of him.  Wrong set of tools, dude!

I can cut those lines in one process:

awk 'substring($0,99,2) == "'"$abbr"'"' database

(I might be off by one on the substring function...)

Just another UNIX toolsmith (who can tell a claw hammer from a straight
screwdriver...),
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel, Hillsboro, Oregon, USA                           |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/

davej@mrsvr.UUCP (David Johnson x4-6506) (09/05/89)

From article <4885@omepd.UUCP>, by merlyn@iwarp.intel.com (Randal Schwartz):
= In article <956@mrsvr.UUCP>, davej@mrsvr (David Johnson x4-6506) writes:
= | From article <10596@dasys1.UUCP>, by parsnips@dasys1.UUCP (David Parsons):
= | 
= | > The problem... the database consists of addresses... positions 99 and 100
= | > in each record contain a two-position abbreviation for the state.  It's easy
= | > to get cut to read those two characters, and grep to identify the state I
= | > want to extract, but how the ^#$&! do you then copy the ENTIRE record 
= | > thus identified to another file???  Using grep alone is no good because 
= | > the abbreviation appears in various other places in the record...
= | > 
= | > David Parsons
= | 
= | Try this:
= | 
= | cut -c99,100 database | grep -n "$abbr" | sed 's/:..*$/p/' | ed - foo
= 
= Hmmppph.  I saw a guy removing a screw with the claw of a claw hammer
= the other day.  You remind me of him.  Wrong set of tools, dude!

I must take issue with this.  The question was phrased in terms of cut and
grep; my answer responded at the level of the question.  I think my solution
is easier to understand for the "casual" shell user (who doesn't know his
way around awk).  Suppose further that his requirements now changed to 
"delete the ENTIRE record thus identified . . .".  The solution above is
easy to modify for the "casual" user (i.e. one who understands ed).

= I can cut those lines in one process:
= 
= awk 'substring($0,99,2) == "'"$abbr"'"' database
= 
= Just another UNIX toolsmith (who can tell a claw hammer from a straight
= screwdriver...),

But who can't RTFM ;-) (try using "substr" instead).

=
= /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
--
David J. Johnson - Computer People Unlimited, Inc. @ GE Medical Systems
gemed!python!davej@crd.ge.com  - OR - sun!sunbird!gemed!python!davej
  "What a terrible thing it is to lose one's mind." - Dan Quayle

jpr@dasys1.UUCP (Jean-Pierre Radley) (09/07/89)

In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes:
>I would like to use a Bourne shell script to extract records from a simple
>database of fixed-length fields terminated with a new-line character.
>The problem... the database consists of addresses... positions 99 and 100
>in each record contain a two-position abbreviation for the state.  It's easy
>to get cut to read those two characters, and grep to identify the state I
>want to extract, but how the ^#$&! do you then copy the ENTIRE record 
>thus identified to another file???  Using grep alone is no good because 
>the abbreviation appears in various other places in the record...

This is a one-liner for awk:

:statefinder
awk '
	{  if ("'$1'" == substr($0, 99, 2) )  print }
' $2


Thus if characters 99 and 100 of your line contain the state,
then typing
	
	statefinder NY datafile

should do what you want.
-- 
Jean-Pierre Radley					      jpr@jpradley.uucp
New York, NY					      72160.1341@compuserve.com