[comp.lang.c] string manipulation

chad@lakesys.UUCP (Chad Gibbons) (09/26/88)

        For several applications it would become necessary to remove a small
string from a larger one. There appears to be an easy way to do this, and here
is a sample piece of code I was testing :

    char *squeeze(cs, ct)
        const char cs[];
        const char ct[];
    {
        register int i;
        int j,x;

        j = strcspn(cs, ct);
        for (i = j; cs[i] != '\0'; i++)
            cs[i] = cs[i + x];
        return (char *)s;
    }

It seems the algorithm should work. However, even if nothing else is wrong,
one major problem remains: you cannot overstep the array bounds. If you do,
you get a nifty little core dump. So, I'm looking for a good way to do
this. I tried it with pointers, but I had worse luck than with this. For all
I know there is some obscure library function to do this, but I didn't find
one in our library. Anyone have a better way to do it? Post it, inform us
all. I'm curious to see if this can be done with explicit pointers and not
just with array indexes.

leo@philmds.UUCP (Leo de Wit) (09/27/88)

In article <1061@lakesys.UUCP> chad@lakesys.UUCP (Evil Iggy's illegitimate twin brother) writes:
>
>        For several applications it would become necessary to remove a small
>string from a larger one. There appears to be an easy way to do this, and here
>is a sample piece of code I was testing :
>
>    char *squeeze(cs, ct)
>        const char cs[];
>        const char ct[];
>    {
>        register int i;
>        int j,x;
>
>        j = strcspn(cs, ct);
>        for (i = j; cs[i] != '\0'; i++)
>            cs[i] = cs[i + x];
>        return (char *)s;
>    }

Since you asked to post:

Strcspn does not do what you want it to. It will return the index of
the first character in cs[] that is equal to one of the characters in
ct[].  Unless ct[] is exactly one character long this is not what you
want.  ANSI seems to support a function called strstr(), which might
just return a pointer to a substring within a string. I say 'might'
because I've no documentation on strstr() available, but the analogy
with strchr() makes me think this (can someone confirm this?). If you
don't have strstr(), you can roll your own; note that you generally
don't need to examine all characters of cs[] (ct[]) to recognize a
match; there are very fast algorithms, so check your literature
(keywords: Boyer-Moore).
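
If strstr() does behave that way, a roll-your-own version could look
something like the sketch below. The name find_sub is made up for this
example, and this is the plain brute-force search, not Boyer-Moore, so it
may look at each character of cs[] several times.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* find_sub is a made-up name: naive search for the whole string sub in s.
 * Returns a pointer to the first match, or NULL if there is none.
 * An empty sub matches at the start of s. */
char *find_sub(const char *s, const char *sub)
{
    size_t n;

    assert(s != NULL && sub != NULL);
    n = strlen(sub);

    for (;; s++) {
        if (strncmp(s, sub, n) == 0)
            return (char *)s;       /* all n characters matched here */
        if (*s == '\0')
            return NULL;            /* ran off the end without a match */
    }
}
```

With "hello world" and "lo w", strcspn() stops at index 2 (the first 'l',
because 'l' is one of the characters in "lo w"), while the search above
points at index 3, where the whole string "lo w" starts.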

As for your function parameters, I think the declaration
        const char cs[];
should really be
        char cs[];
since the contents of cs[] (the characters of the array) get changed.
If I understand it well, the substitution is meant to be in place
(don't know where s is coming from, though...). If not, you can of
course malloc room for a copy within the function, or supply a pointer
to a buffer as a parameter to your function, which buffer can then be
filled by the function.
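
For the not-in-place case, the buffer-as-parameter variant might be sketched
as follows. The name squeeze_copy and the outsize parameter are my own
inventions for the example, it assumes strstr() returns a pointer to the
first occurrence of ct[] in cs[] (or a null pointer), and it assumes the
buffer does not overlap cs[]. Here cs[] really can stay const.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* squeeze_copy is a made-up name. Copies cs into out (capacity outsize
 * bytes) with the first occurrence of ct left out; cs is not modified.
 * Returns out, or NULL if ct does not occur in cs or out is too small. */
char *squeeze_copy(char *out, size_t outsize, const char *cs, const char *ct)
{
    const char *t;
    size_t head, skip, tail;

    assert(out != NULL && cs != NULL && ct != NULL);

    t = strstr(cs, ct);
    if (t == NULL)
        return NULL;

    head = (size_t)(t - cs);        /* part before the match */
    skip = strlen(ct);              /* the match itself */
    tail = strlen(t + skip);        /* part after the match */

    if (head + tail + 1 > outsize)
        return NULL;

    memcpy(out, cs, head);
    memcpy(out + head, t + skip, tail + 1);     /* + 1 copies the '\0' */
    return out;
}
```

The malloc variant would be the same thing with out obtained from
malloc(head + tail + 1) inside the function, leaving the free to the caller.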

Supposing the substitution is in place:

#include <string.h>

char *squeeze(cs, ct)
char cs[];
const char ct[];
{
    char *s, *t;

    if ((t = strstr(cs,ct)) == (char *)0) {
        return (char *)0;
    }
    s = t + strlen(ct);
    if (t == cs) {
        return s;
    }
    memmove(t,s,strlen(s) + 1);    /* t and s overlap, so not strcpy() */
    return cs;
}

a) Note this version avoids doing a copy if ct[] is a prefix of cs[].
b) If we name the parts of cs[]: A, B and C where B matches ct[], this
function copies C over B (unless A is empty, in which case a pointer to
C is returned). If C is larger than A it is worthwhile to consider
copying A over B (reverse, aligned on the end). It is questionable whether
determining the sizes is worth the overhead, so I'll give a version
for this case without testing which is larger:

#include <string.h>

char *squeeze(cs, ct)
char cs[];
const char ct[];
{
    char *s, *t;

    if ((t = strstr(cs,ct)) == (char *)0) {
        return (char *)0;
    }
    s = cs + strlen(ct);     /* new start: A must end flush against C */
    memmove(s,cs,t - cs);    /* memmove(dest, src, n): slide A up over B */
    return s;
}

This assumes memmove knows how to deal with overlapping pieces of memory
(now where is that copy of the ANSI Draft...)
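
For completeness, a sketch of the variant that does test which part is
shorter and moves only that one. The name squeeze_min is made up, and it
leans on the same assumptions about strstr() and memmove() as above. The
extra strlen() on C is exactly the overhead in question, so whether this
wins depends on your strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* squeeze_min is a made-up name. Removes the first occurrence of ct from
 * cs in place, moving whichever of the head (A) or tail (C) is shorter.
 * Returns a pointer to the start of the result, which lies inside cs but
 * is not necessarily cs itself, or NULL if ct does not occur in cs. */
char *squeeze_min(char *cs, const char *ct)
{
    char *t, *c;
    size_t alen, blen, clen;

    assert(cs != NULL && ct != NULL);

    t = strstr(cs, ct);
    if (t == NULL)
        return NULL;

    blen = strlen(ct);
    c = t + blen;                   /* start of C */
    alen = (size_t)(t - cs);        /* length of A */
    clen = strlen(c);               /* length of C, without the '\0' */

    if (clen + 1 <= alen) {
        memmove(t, c, clen + 1);    /* C (with its '\0') slides down over B */
        return cs;
    }
    memmove(cs + blen, cs, alen);   /* A slides up over B */
    return cs + blen;
}
```

As with the version above, the caller has to use the returned pointer and
not cs, since the result may start blen characters into the array.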

                      Leo.