[comp.unix.shell] ok, i've got a question...

calvin@sequent.UUCP (Calvin Goodrich) (09/25/90)

...for the unix.gods out there. i have a file that has a whole mess of
null characters in it ('bout 1/2 a meg). is there any way (preferably
a shell script) to strip them off?

thanx,

calvin.
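Before stripping anything, it can help to confirm how many NUL bytes the file really holds. The trick is to delete every byte that is *not* a NUL and count what is left. This is a minimal sketch, assuming a POSIX-style "tr" with the -c and -d flags. The helper name count_nuls is hypothetical and does not come from the thread:

```shell
#!/bin/sh
# count_nuls FILE: print how many NUL (octal 000) bytes FILE contains.
# Hypothetical helper name; assumes tr supports -c (complement) and -d.
count_nuls() {
    # -cd '\000' deletes every byte that is NOT a NUL, so only the NULs
    # reach wc; the final tr drops the padding some wc versions print.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

Usage: "count_nuls foo". Running it again on the cleaned copy should print 0.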

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/26/90)

In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:
: ...for the unix.gods out there. i have a file that has a whole mess of
: null characters in it ('bout 1/2 a meg). is there any way (preferably
: a shell script) to strip them off?

If your tr works like mine, you can just say

	tr '' '' <foo >bar

Other possibilities:

	sed '' <foo >bar
	perl -pe 's/\0//g' <foo >bar

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

karl_kleinpaste@cis.ohio-state.edu (09/26/90)

calvin@sequent.uucp writes:
   i have a file that has a whole mess of
   null characters in it ('bout 1/2 a meg). is there any way (preferably
   a shell script) to strip them off?

tr -d '\0'
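Karl's "tr -d" version deletes NULs on purpose rather than as a side effect. Here is a minimal sketch of it in use, wrapped in a function whose name (strip_nuls) is only illustrative. It assumes a POSIX "tr":

```shell
#!/bin/sh
# strip_nuls: copy standard input to standard output with every NUL deleted.
# Hypothetical wrapper name; '\000' is NUL in octal (POSIX tr also takes '\0').
strip_nuls() {
    tr -d '\000'
}

# Three NULs between "foo" and "bar" vanish; prints "foobar".
printf 'foo\000\000\000bar\n' | strip_nuls
```

To clean a file, use "strip_nuls <foo >bar", the same way as the other one-liners in this thread.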

ted@nmsu.edu (Ted Dunning) (09/26/90)

In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:


   ...for the unix.gods out there. i have a file that has a whole mess of
   null characters in it ('bout 1/2 a meg). is there any way (preferably
   a shell script) to strip them off?

tr without any options will do it.

(it's a bug ... it's a feature ...)


--
ted@nmsu.edu					+---------+
						| In this |
						|  style  |
						|__10/6___|

jik@athena.mit.edu (Jonathan I. Kamens) (09/26/90)

In article <42900@sequent.UUCP>, calvin@sequent.UUCP (Calvin Goodrich) writes:
|> ...for the unix.gods out there. i have a file that has a whole mess of
|> null characters in it ('bout 1/2 a meg). is there any way (preferably
|> a shell script) to strip them off?

  Well, since "tr" deletes NULLs from its input, you could do
"tr '' '' < filename > filename.nonulls".

  Then again, "sed" apparently also deletes NULLs, so you could do something
similar with it: "sed -n p < filename > filename.nonulls".

  Both of those solutions are pretty much just hacks that rely on the fact
that tr and sed delete NULLs.  There's probably a more correct (i.e. it's
doing what it's doing explicitly, rather than relying on a fluke in a program)
solution in perl, but I'm religiously against posting perl scripts to the net,
since so many other people do it so much better than I :-).

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

brister@decwrl.dec.com (James Brister) (09/26/90)

On 25 Sep 90 18:59:26 GMT, lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) said:

> In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:
> : ...for the unix.gods out there. i have a file that has a whole mess of
> : null characters in it ('bout 1/2 a meg). is there any way (preferably
> : a shell script) to strip them off?

> Other possibilities:

> 	sed '' <foo >bar

I try to avoid this unless I know there'll be a newline fairly frequently
(which in a file full of nulls is *usually* unlikely). sed (at least my
version of it) has a line length limit that can cause problems here.
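A byte-oriented filter sidesteps the line-length problem entirely. The sketch below builds a half-megabyte test blob like the one in the original question, with no newline anywhere. It assumes "dd" and /dev/zero are available, and the name make_nul_blob is made up:

```shell
#!/bin/sh
# make_nul_blob: emit 512 KB (512 blocks of 1024 bytes) of NULs and no newline,
# the worst case for line-oriented tools such as sed or awk.
make_nul_blob() {
    dd if=/dev/zero bs=1024 count=512 2>/dev/null
}

# tr works byte by byte with no notion of lines, so the whole blob is
# deleted; wc reports a byte count of 0.
make_nul_blob | tr -d '\000' | wc -c
```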

> 	perl -pe 's/\0//g' <foo >bar

Of course :-)

James
--
James Brister                                           brister@decwrl.dec.com
DEC Western Software Lab., Palo Alto, CA    {uunet,sun,pyramid}!decwrl!brister

lugnut@sequent.UUCP (Don Bolton) (09/26/90)

In article <9651@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:
>: ...for the unix.gods out there. i have a file that has a whole mess of
>: null characters in it ('bout 1/2 a meg). is there any way (preferably
>: a shell script) to strip them off?
>
>If your tr works like mine, you can just say
>
>	tr '' '' <foo >bar
>
>Other possibilities:
>
>	sed '' <foo >bar
>	perl -pe 's/\0//g' <foo >bar

AWK AWK ACKKKK :-)

awk -f filebelow <oldlist >newlist

{ for (i = 1; i <= NF; i = i + 1)
     { if (i >= NF)
	  printf("%s",$i)
     else
	  printf("%s ", $i) 
     }
printf("\n")
}

course I assume the "null" characters are just blanks here
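You can check what this program does to a line by running it on some spaced-out input. Here is a sketch that wraps Don's program, unchanged apart from layout, in a shell function. The name squeeze_fields is illustrative:

```shell
#!/bin/sh
# squeeze_fields: Don's awk program, unchanged apart from layout.
# awk's default field splitting treats any run of blanks or tabs as one
# separator, and the fields are printed back with single spaces.
squeeze_fields() {
    awk '{ for (i = 1; i <= NF; i = i + 1)
             { if (i >= NF)
                   printf("%s", $i)
               else
                   printf("%s ", $i)
             }
           printf("\n")
         }'
}

# Prints "foo bar": the ten spaces collapse to one.
printf 'foo          bar\n' | squeeze_fields
```

So the program squeezes blank runs and trims leading and trailing blanks. That is a different job from deleting NUL bytes.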

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/27/90)

In article <42947@sequent.UUCP> lugnut@sequent.UUCP (Don Bolton) writes:
: In article <9651@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
: >In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:
: >: ...for the unix.gods out there. i have a file that has a whole mess of
: >: null characters in it ('bout 1/2 a meg). is there any way (preferably
: >: a shell script) to strip them off?
: >
: >If your tr works like mine, you can just say
: >
: >	tr '' '' <foo >bar
: >
: >Other possibilities:
: >
: >	sed '' <foo >bar
: >	perl -pe 's/\0//g' <foo >bar
: 
: AWK AWK ACKKKK :-)
: 
: awk -f filebelow <oldlist >newlist
: 
: { for (i = 1; i <= NF; i = i + 1)
:      { if (i >= NF)
: 	  printf("%s",$i)
:      else
: 	  printf("%s ", $i) 
:      }
: printf("\n")
: }

ACKKKK is right.

This simply dumps core on my machine.  Probably line length limitation.

The sed solution apparently works because nulls are weeded out on input
and never put into the pattern buffer.  No source handy, alas...

Larry
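With no source to hand, you can at least check the behaviour empirically. The sketch below is a probe with a hypothetical function name. It feeds one NUL through "sed" and counts the bytes that come out. Newer seds such as GNU sed generally pass NULs through unchanged, so the old trick should not be relied on without checking:

```shell
#!/bin/sh
# probe_sed_nuls: report whether this system's sed passes NUL bytes through.
# Hypothetical helper name. The input "foo<NUL>bar<newline>" is 8 bytes.
probe_sed_nuls() {
    n=$(printf 'foo\000bar\n' | sed '' | wc -c | tr -d ' ')
    case $n in
        8) echo "sed keeps NULs ($n bytes out)" ;;
        7) echo "sed strips NULs ($n bytes out)" ;;
        *) echo "sed does something else ($n bytes out)" ;;
    esac
}

probe_sed_nuls
```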

lugnut@sequent.UUCP (Don Bolton) (09/27/90)

In article <9677@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <42947@sequent.UUCP> lugnut@sequent.UUCP (Don Bolton) writes:
>: In article <9651@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>: >In article <42900@sequent.UUCP> calvin@sequent.UUCP (Calvin Goodrich) writes:
>: >: ...for the unix.gods out there. i have a file that has a whole mess of
>: >: null characters in it ('bout 1/2 a meg). is there any way (preferably
>: >: a shell script) to strip them off?
>: >
>: >If your tr works like mine, you can just say
>: >
>: >	tr '' '' <foo >bar
>: >
>: >Other possibilities:
>: >
>: >	sed '' <foo >bar
>: >	perl -pe 's/\0//g' <foo >bar
>: 
>: AWK AWK ACKKKK :-)
>: 
>: awk -f filebelow <oldlist >newlist
>: 
>: { for (i = 1; i <= NF; i = i + 1)
>:      { if (i >= NF)
>: 	  printf("%s",$i)
>:      else
>: 	  printf("%s ", $i) 
>:      }
>: printf("\n")
>: }
>
>ACKKKK is right.
>
>This simply dumps core on my machine.  Probably line length limitation.

Hmmmm.. did you try cat oldlist | awk -f filebelow > newlist ? That's
the way I've been running it. Also, on line 2 the >= NF can be changed to
== NF (this was my first venture into deeper awk actions). That shouldn't
be the cause of the core dump, though.
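The point about line 2 is easy to confirm. Inside the loop, i never exceeds NF, so ">=" and "==" select the same field. The sketch below runs the loop both ways on the same line and compares the results. The rejoin wrapper is hypothetical:

```shell
#!/bin/sh
# rejoin OP: Don's field-rejoining loop with the comparison operator OP
# spliced into the awk text, e.g. rejoin '>=' or rejoin '=='.
rejoin() {
    awk '{ for (i = 1; i <= NF; i = i + 1)
             { if (i '"$1"' NF)
                   printf("%s", $i)
               else
                   printf("%s ", $i)
             }
           printf("\n")
         }'
}

input='one   two three'
a=$(printf '%s\n' "$input" | rejoin '>=')
b=$(printf '%s\n' "$input" | rejoin '==')
# Both versions should agree; prints "same output: one two three".
[ "$a" = "$b" ] && echo "same output: $a"
```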

>The sed solution apparently works because nulls are weeded out on input
>and never put into the pattern buffer.  No source handy, alas...
>
>Larry

jik@athena.mit.edu (Jonathan I. Kamens) (09/28/90)

In article <42947@sequent.UUCP>, lugnut@sequent.UUCP (Don Bolton) writes:
|> awk -f filebelow <oldlist >newlist
|> 
|> { for (i = 1; i <= NF; i = i + 1)
|>      { if (i >= NF)
|> 	  printf("%s",$i)
|>      else
|> 	  printf("%s ", $i) 
|>      }
|> printf("\n")
|> }
|> 
|> course I assume the "null" characters are just blanks here

  First of all, the assumption that the nulls are supposed to represent blanks
in the text is faulty, and is (as far as I can tell) in no way a valid
assumption given the data that was provided by the original poster. 
Furthermore, there is no reason to make that assumption, since other posters
have posted solutions which do not.

  Note that the original poster did not say that he wanted to replace the
nulls with spaces (which is what your solution does), he said that he wanted
to remove them altogether.

  Second, as Larry Wall already pointed out, your solution will coredump on a
lot of systems.

  Third, your solution deletes extra space between words.  If I have a line
which appears as "foo          bar" in the input, it will appear as "foo bar"
in the output.

  Fifth, the awk on my system (4.3BSD) loses anything on the line after the
first null.  Therefore, "foo^@^@^@bar" turns into "foo".  Presumably, your
version doesn't do this, else you wouldn't have posted your solution, so you
have portability concerns.  There are still other versions of awk (e.g. GNU
awk) that keep nulls intact.

  Sixth, the awk code you posted is suboptimal in at least three different
ways.  For example, if your loop runs from 1 to NF, how can i ever be greater
than NF inside the body of the loop?  Here's a piece of code that does the
same thing (although, like I've said, I don't think it's the right thing to
do):

   {
      for (i = 1; i < NF; i++)
         printf("%s ", $i)
      printf("%s\n", $NF)
   }
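This rewrite drops the per-field test by printing the last field outside the loop. The sketch below runs it from the shell under a hypothetical wrapper name, join_fields. The input includes an empty line, where NF is 0 and $NF is therefore $0:

```shell
#!/bin/sh
# join_fields: print fields 1..NF-1 each followed by a space, then the last
# field and a newline. On an empty line NF is 0 and $NF is $0, so just the
# (empty) line and its newline are printed.
join_fields() {
    awk '{
        for (i = 1; i < NF; i++)
            printf("%s ", $i)
        printf("%s\n", $NF)
    }'
}

# Prints "foo bar", an empty line, then "baz".
printf 'foo          bar\n\n  baz  \n' | join_fields
```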

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/28/90)

In article <43048@sequent.UUCP> lugnut@sequent.UUCP (Don Bolton) writes:
: Hmmmm.. did you try cat oldlist | awk -f filebelow > newlist ? Thats
: the way I've been running it.

Still dumps.  As I said, probably a line length limitation, which cat
would have no effect on.  Perhaps you're running gawk?  Dave is doing
a good job with that.  Nawk (the version I have, anyway) complains
about "input record `...' too long".

Tsk, tsk.  Well, at least it checks.  Give it half credit.

Larry
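You can test directly whether a given awk hits this record-length limit. The sketch below builds a single half-megabyte record of "x" characters with no newline. It assumes "dd" and /dev/zero are available, and the name probe_awk_record is invented. An awk that crashes or complains simply produces no length, which the probe reports as a failure:

```shell
#!/bin/sh
# probe_awk_record: feed awk one 524288-byte record (no newline) and report
# whether it copes. Awks with fixed-size record buffers may complain or dump
# core; implementations that grow the buffer print the full length.
probe_awk_record() {
    # 512 * 1024 NULs, translated to 'x' so every byte is an ordinary character.
    len=$(dd if=/dev/zero bs=1024 count=512 2>/dev/null |
          tr '\000' 'x' |
          awk '{ print length($0) }' 2>/dev/null)
    if [ "$len" = 524288 ]; then
        echo "awk handled a 524288-byte record"
    else
        echo "awk failed on a long record"
    fi
}

probe_awk_record
```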

lugnut@sequent.UUCP (Don Bolton) (09/28/90)

In article <1990Sep27.170227.5257@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <42947@sequent.UUCP>, lugnut@sequent.UUCP (Don Bolton) writes:
>|> awk -f filebelow <oldlist >newlist
>|> 
>|> { for (i = 1; i <= NF; i = i + 1)
>|>      { if (i >= NF)
>|> 	  printf("%s",$i)
>|>      else
>|> 	  printf("%s ", $i) 
>|>      }
>|> printf("\n")
>|> }
>|> 
>|> course I assume the "null" characters are just blanks here
>
>  First of all, the assumption that the nulls are supposed to represent blanks
>in the text is faulty, and is (as far as I can tell) in no way a valid
>assumption given the data that was provided by the original poster. 
>Furthermore, there is no reason to make that assumption, since other posters
>have posted solutions which do not.
>
This is true. Alas, I work with RDBMS products such as Oracle and Informix
and am used to seeing nulls represented as blank spaces.

>  Note that the original poster did not say that he wanted to replace the
>nulls with spaces (which is what your solution does), he said that he wanted
>to remove them altogether.
>
Actually, what my program does is strip out runs of multiple blanks and
replace each run with one blank space.

>  Second, as Larry Wall already pointed out, your solution will coredump on a
>lot of systems.
>
This is not a point I would have considered, as it runs fine on my machine
and is really merely a modified example from the awk programming language
book I have.

>  Third, your solution deletes extra space between words.  If I have a line
>which appears as "foo          bar" in the input, it will appear as "foo bar"
>in the output.
>
Which was my intent. (I did do *something* right) :-)

>  Fifth, the awk on my system (4.3BSD) loses anything on the line after the
>first null.  Therefore, "foo^@^@^@bar" turns into "foo".  Presumably, your
>version doesn't do this, else you wouldn't have posted your solution, so you
>have portability concerns.  There are still other versions of awk (e.g. GNU
>awk) that keep nulls intact.
>
Don't know 'bout this one....
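Awk implementations handle NULs differently, as described above. A small probe shows which behaviour a given awk has. This is a sketch with a made-up function name. It measures the length of the line "foo<NUL>bar":

```shell
#!/bin/sh
# probe_awk_nuls: report how the local awk treats an embedded NUL.
# Hypothetical helper name. "foo<NUL>bar" is 7 characters before the newline.
probe_awk_nuls() {
    len=$(printf 'foo\000bar\n' | awk '{ print length($0) }')
    case $len in
        7) echo "awk keeps NULs" ;;
        6) echo "awk strips NULs" ;;
        3) echo "awk truncates at the first NUL" ;;
        *) echo "awk returned length $len" ;;
    esac
}

probe_awk_nuls
```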

>  Sixth, the awk code you posted is suboptimal in at least three different
>ways.  For example, if your loop runs from 1 to NF, how can i ever be greater
>than NF inside the body of the loop?  Here's a piece of code that does the
>same thing (although, like I've said, I don't think it's the right thing to
>do):
>
i cannot be greater than NF. This bozo hosehead here tried to use the
assignment operator (=) as an equality operator (==); in a fit of "what
the fu**", I tossed in the > and bingo, it ran.
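The "=" versus "==" slip is a classic awk trap, because an assignment is a legal condition. The sketch below shows what "if (i = NF)" actually does to the loop. The function names buggy and fixed are illustrative:

```shell
#!/bin/sh
# In awk, "=" assigns and "==" compares. Inside an if, (i = NF) sets i to NF
# and is true whenever NF is non-zero, so the loop jumps straight to the last
# field, prints it, and then exits because i++ takes i past NF.
buggy() {
    awk '{ for (i = 1; i <= NF; i++)
             { if (i = NF) printf("%s", $i); else printf("%s ", $i) }
           printf("\n") }'
}

fixed() {
    awk '{ for (i = 1; i <= NF; i++)
             { if (i == NF) printf("%s", $i); else printf("%s ", $i) }
           printf("\n") }'
}

echo 'a b c' | buggy   # prints "c": only the last field survives
echo 'a b c' | fixed   # prints "a b c"
```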

16 months ago I was a telemonkey (read telemarketer) with <NULL programming
experience. Because of the application generators associated with the RDBMS
packages, I found RDBMS programming to be easy and a LOT more enjoyable than
dialing for dollars. I'm still bumping my way through shell programming;
though not an expert, I can do whatever I need to with it. awk is something
I just recently started playing with, and the program you saw was my first
foray beyond {print "some text", $1} usage..

I'll learn, thanks for the pointers..

>   {
>      for (i = 1; i < NF; i++)
>         printf("%s ", $i)
>      printf("%s\n", $NF)
>   }

Half lug, half nut