[comp.unix.wizards] "dd conv=unblock cbs=80 - really grep replacement"

jad@insyte.UUCP (Jill Diewald) (06/30/88)

In article <10104@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>>> Can't you just do: "dd conv=unblock cbs=80 (or whatever)" to
>>> convert the file to standard Unix \n-terminated lines?  Hasn't this been
>>> part of Unix since at least v6?
>
>> Apparently not:  neither System III nor System V r3.1 supports it.  (I used
>> "strings" on both systems, to make sure it wasn't merely undocumented).
>
>Either your "strings" is busted or you have a crippled V3.1.  The
>vanilla AT&T 3.1 source tape includes a dd.c that implements this
>command.

This is a reply to my original request for a way to use grep on
files with very long fixed record lengths.

Two things: first, I am using HP-UX, which does not document
'conv=block', so it's probably not implemented either.

More importantly, this is NOT a good solution.  (I could easily write
a C program to reformat the file if I wanted to.)  The files that we
deal with are very, very large data files, and making a reformatted
copy of one is not a good answer.  I could just as easily (in fact
this is what I usually do) start emacs, go to lunch while it loads
the file, and then use emacs to do the searches - which is easier
than grep anyway.  Making a reformatted copy 1) uses up too much
disk space, and 2) takes too long.

Ideally, grep could be passed a record size which it would use instead
of newlines, and/or grep could be told to search only specified columns
of every record.  This would let me deal with these files easily,
without tying up either memory or disk space.
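
Such a tool is only a page of C.  A minimal sketch (the name
"recgrep", its argument order, and its 1-based columns are invented
here for illustration; it searches for a literal string rather than
a full regular expression):

/*
 * recgrep - search fixed-length, newline-less records for a literal
 * string, restricted to a column range.  Reads one record at a time,
 * so it needs neither a reformatted copy of the file nor enough
 * memory to hold the whole file.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        long reclen, lo, hi, recno = 0;
        char *rec, *win, *pat;

        if (argc != 5) {
                fprintf(stderr,
                    "usage: recgrep reclen firstcol lastcol string < datafile\n");
                return 2;
        }
        reclen = atol(argv[1]);
        lo = atol(argv[2]) - 1;         /* columns are 1-based */
        hi = atol(argv[3]);
        pat = argv[4];
        if (reclen <= 0 || lo < 0 || hi <= lo || hi > reclen) {
                fprintf(stderr, "recgrep: bad record or column numbers\n");
                return 2;
        }
        rec = malloc(reclen);
        win = malloc(hi - lo + 1);
        if (rec == NULL || win == NULL)
                return 2;

        /* one fixed-length record per read; no temporary file */
        while (fread(rec, 1, reclen, stdin) == (size_t)reclen) {
                recno++;
                memcpy(win, rec + lo, hi - lo);
                win[hi - lo] = '\0';    /* assumes text data, no NUL bytes */
                if (strstr(win, pat) != NULL) {
                        printf("%ld: ", recno);
                        fwrite(rec, 1, reclen, stdout);
                        putchar('\n');
                }
        }
        return 0;
}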

Jill Diewald
Innovative Systems 
Newton, MA
... harvard!axiom!insyte!jad

guy@gorodish.Sun.COM (Guy Harris) (07/01/88)

> >Either your "strings" is busted or you have a crippled V3.1.  The
> >vanilla AT&T 3.1 source tape includes a dd.c that implements this
> >command.
> 
> Two things: first, I am using HP-UX, which does not document
> 'conv=block', so it's probably not implemented either.

Bad assumption.  UNIX systems have lots of features that aren't documented.
S5R3's documentation doesn't mention "conv=block", or any of the other V7 or
4BSD "dd" features, but they're all there.

> More importantly, this is NOT a good solution.  (I could easily write
> a C program to reformat the file if I wanted to.)  The files that we
> deal with are very, very large data files, and making a reformatted
> copy of one is not a good answer.

How about piping the output of "dd" to "grep"?  This obviates the need to make
a reformatted copy.

> Ideally, grep could be passed a record size which it would use instead
> of newlines, and/or grep could be told to search only specified columns
> of every record.  This would let me deal with these files easily,
> without tying up either memory or disk space.

I'm sure there are zillions of options that could be added to "grep" to
make somebody's life easier.  However, if the same job can be done with a
pipeline, it's not clear that "grep" *should* have all those options added.
"grep" is intended to work on UNIX text files; extending it to work on
various other random file formats might be nice, but it might also mean
adding a lot of rather specialized options, most of which are good at
*some* jobs but not at other closely related ones.

I think the demand for this particular capability is sufficiently low that
you're not likely to see it provided in "gre" proper; however, I seem to
remember reading that the guts of "gre" would be provided as library routines,
so if one has a specialized task such as the one described, and the existing
UNIX tools either couldn't do it or were too slow, one could write a
specialized program to do it.
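
As an illustration, such a specialized program is short.  A sketch
(the "gre" library isn't available, so this assumes the regcomp()/
regexec() regular-expression interface from <regex.h> instead; the
name "fgrec" and its arguments are invented):

/*
 * fgrec - treat each fixed-length record as one "line" and match
 * a regular expression against it.
 */
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        regex_t re;
        long reclen, recno = 0;
        char *rec;

        if (argc != 3) {
                fprintf(stderr, "usage: fgrec reclen regexp < datafile\n");
                return 2;
        }
        reclen = atol(argv[1]);
        if (reclen <= 0 || regcomp(&re, argv[2], REG_NOSUB) != 0) {
                fprintf(stderr, "fgrec: bad record length or pattern\n");
                return 2;
        }
        if ((rec = malloc(reclen + 1)) == NULL)
                return 2;

        while (fread(rec, 1, reclen, stdin) == (size_t)reclen) {
                recno++;
                rec[reclen] = '\0';     /* assumes text records, no NULs */
                if (regexec(&re, rec, 0, NULL, 0) == 0)
                        printf("%ld: %s\n", recno, rec);
        }
        regfree(&re);
        return 0;
}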

chris@mimsy.UUCP (Chris Torek) (07/01/88)

In article <145@insyte.UUCP> jad@insyte.UUCP (Jill Diewald) writes:
>Two things: first, I am using HP-UX, which does not document
>'conv=block', so it's probably not implemented either.

(silly of them :-) )

>More importantly, this is NOT a good solution. ... The files that we
>deal with are very, very large data files, and making a reformatted
>copy of one is not a good answer.

Think `tools'.  Think `pipes'.

	dd if=big_data_file cbs=<recordsize> conv=unblock | grep <regexp>
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

andrew@alice.UUCP (07/05/88)

the job of grep is to search text files, which have a conventional
structure of \n-terminated lines (note that current greps may or may
not print a matching last line that lacks a trailing \n).  what should
grep do when it finds no such newline within shouting distance of a
match?  obviously (to me) it should complain that the line is too long.
but can it still produce useful output?  in general, yes.  my feeling
is that it should print some window around the match (say 256? bytes)
so that users can use -b (hopefully meaning byte offset) to find out
where the match really is.  this way, normal input is not affected
(either semantically or in performance), and people with non-newline
(BUT text) input can put together a script to do what they want.
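
in code, the windowing might look something like this (the 256-byte
window, the output format, and the name are guesses, not the behaviour
of any shipped grep; it matches a literal string to keep the sketch
short):

/*
 * wgrep - scan newline-less input for a literal string and, for
 * each match, print the byte offset (what a -b flag might report)
 * plus a window of context around the match.
 */
#include <stdio.h>
#include <string.h>

#define WINDOW 256              /* context bytes printed per match */
#define CHUNK  65536

int main(int argc, char **argv)
{
        static char buf[CHUNK + WINDOW + 1];
        size_t patlen, keep = 0, n, len;
        long base = 0;          /* byte offset of buf[0] in the input */
        char *pat, *p, *q;

        if (argc != 2 || (patlen = strlen(argv[1])) == 0 ||
            patlen > WINDOW) {
                fprintf(stderr, "usage: wgrep string < file\n");
                return 2;
        }
        pat = argv[1];

        while ((n = fread(buf + keep, 1, CHUNK, stdin)) > 0) {
                len = keep + n;
                buf[len] = '\0';        /* assumes text, no NUL bytes */
                for (p = buf; (q = strstr(p, pat)) != NULL; p = q + 1) {
                        size_t pos = q - buf;
                        size_t lo = pos > WINDOW / 2 ? pos - WINDOW / 2 : 0;
                        size_t w = len - lo > WINDOW ? WINDOW : len - lo;

                        printf("%ld: %.*s\n", base + (long)pos, (int)w,
                            buf + lo);
                }
                /* carry patlen-1 bytes so a match can straddle chunks */
                keep = patlen - 1 < len ? patlen - 1 : len;
                memmove(buf, buf + len - keep, keep);
                base += (long)(len - keep);
        }
        return 0;
}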

allbery@ncoast.UUCP (Brandon S. Allbery) (07/09/88)

As quoted from <145@insyte.UUCP> by jad@insyte.UUCP (Jill Diewald):
+---------------
| Ideally, grep could be passed a record size which it would use instead
| of newlines, and/or grep could be told to search only specified columns
| of every record.  This would let me deal with these files easily,
| without tying up either memory or disk space.
+---------------

	dd if=... conv=unblock cbs=... | cut -c... | grep ...

(or replace "dd" with an unblocker program)

Cro-magnons win out over Neanderthals again.  ;-)

-> If you don't have "cut", that should tell you something.  What it tells
you depends on which variant of Unix you have.
-- 
Brandon S. Allbery, uunet!marque!ncoast!allbery			DELPHI: ALLBERY
	    For comp.sources.misc send mail to ncoast!sources-misc

jad@insyte.UUCP (Jill Diewald) (07/12/88)

>>More importantly, this is NOT a good solution. ... The files that we
>>deal with are very, very large data files, and making a reformatted
>>copy of one is not a good answer.
>
>Think `tools'.  Think `pipes'.
>
>	dd if=big_data_file cbs=<recordsize> conv=unblock | grep <regexp>
>-- 
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
>Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris


Yes, I know.  However, we really need to be able to "look around" in
these files, so several greps might be necessary.  Reformatting the
entire file before each grep is slow - and a temporary file takes up
too much disk space.  There are many ways to do this, but none of them
is very good.

Jill Diewald
INSYTE
Newton, MA