[comp.unix.wizards] of course!

maart@cs.vu.nl (Maarten Litmaath) (11/14/89)

In article <17264@rpp386.cactus.org> jfh@rpp386.cactus.org (John F. Haugh II) writes:
\...
\	isadir (char *path)
\	{
\		char dir[PATH_MAX];
\
\		if (access (path, 0))
\			return 0;
\
\		strcpy (dir, path);
\		strcat (dir, "/x");
\
\		errno = 0;
\		access (dir, 0);
\
\		return errno == 0 || errno == ENOENT;
\	}
\
\We know all of the initial path exists because of the first access()
\call.

Can you say `race condition'?

\And with the second access() call we can discover if the
\last component of `path' isn't a directory since errno would be ENOTDIR
\rather than ENOENT.
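A minimal sketch of the ENOTDIR-versus-ENOENT probe described in the quoted text, assuming a POSIX access(). The name isadir_probe() is hypothetical. The buffer is sized with strlen() rather than PATH_MAX, so the append cannot overflow. The sketch keeps the race the reply objects to: the path can change between the two calls.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append "/x" and look at errno.
   Returns 1 for a directory, 0 for anything else, -1 if malloc() fails.
   Racy, like the original: path may change between the two access() calls. */
int isadir_probe(const char *path)
{
    size_t len = strlen(path);
    char *probe;
    int result;

    if (access(path, F_OK) != 0)
        return 0;                       /* path itself does not exist */

    probe = malloc(len + sizeof "/x");  /* path + "/x" + '\0' */
    if (probe == NULL)
        return -1;
    memcpy(probe, path, len);
    memcpy(probe + len, "/x", sizeof "/x");   /* also copies the '\0' */

    errno = 0;
    if (access(probe, F_OK) == 0)
        result = 1;                     /* path/x exists, so path is a directory */
    else
        result = (errno == ENOENT);     /* ENOTDIR: path is not a directory */

    free(probe);                        /* errno was already read above */
    return result;
}
```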

	isadir(char *path)
	{
		char dir[PATH_MAX];

		strcpy(dir, path);
		strcat(dir, "/.");

		return access(dir, 0);
	}
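Note that access() returns 0 on success and -1 on failure, so the version above returns 0 (false) for a directory. The following sketch keeps the single access() on "path/." but inspects errno to separate "not a directory" from "does not exist". The enum and classify_path() are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for one access() on "path/.". */
enum path_kind { PATH_DIR, PATH_NOTDIR, PATH_MISSING, PATH_ERROR };

enum path_kind classify_path(const char *path)
{
    size_t len = strlen(path);
    char *dir = malloc(len + sizeof "/.");   /* path + "/." + '\0' */
    enum path_kind kind;

    if (dir == NULL)
        return PATH_ERROR;
    memcpy(dir, path, len);
    memcpy(dir + len, "/.", sizeof "/.");    /* also copies the '\0' */

    if (access(dir, F_OK) == 0)
        kind = PATH_DIR;        /* "." resolved, so path is a directory */
    else if (errno == ENOTDIR)
        kind = PATH_NOTDIR;     /* a component used as a directory is not one */
    else if (errno == ENOENT)
        kind = PATH_MISSING;
    else
        kind = PATH_ERROR;      /* EACCES, ENAMETOOLONG, ... */

    free(dir);                  /* errno was read before this call */
    return kind;
}
```

ENOTDIR can also come from an earlier component of path that is not a directory, which still means path cannot name a directory.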
-- 
"Richard Sexton is actually an AI program (or Construct, if you will) running
on some AT&T (R) 3B" (Richard Brosseau) | maart@cs.vu.nl, mcsun!botter!maart

jfh@rpp386.cactus.org (John F. Haugh II) (11/14/89)

In article <4526@ski.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <17264@rpp386.cactus.org> jfh@rpp386.cactus.org (John F. Haugh II) writes:
>\We know all of the initial path exists because of the first access()
>\call.
>
>Can you say `race condition'?

Any number of suitably paranoid conditions exist.

>	isadir(char *path)
>	{
>		char dir[PATH_MAX];
>
>		strcpy(dir, path);
>		strcat(dir, "/.");
>
>		return access(dir, 0);
>	}

On the other hand, not all file systems are going to contain `.' and
`..'.  However, all file systems in the UNIX domain do include files
with names like `x' inside of subdirectories ;-)
-- 
John F. Haugh II                        +-Things you didn't want to know:------
VoiceNet: (512) 832-8832   Data: -8835  | The real meaning of EMACS is ...
InterNet: jfh@rpp386.cactus.org         |   ... EMACS makes a computer slow.
UUCPNet:  {texbell|bigtex}!rpp386!jfh   +--<><--<><--<><--<><--<><--<><--<><---

gwc@root.co.uk (Geoff Clare) (11/20/89)

In article <17303@rpp386.cactus.org> jfh@rpp386.cactus.org (John F. Haugh II) writes:
>>	isadir(char *path)
>>	{
>>		char dir[PATH_MAX];
>>
>>		strcpy(dir, path);
>>		strcat(dir, "/.");
>>
>>		return access(dir, 0);
>>	}
>
>On the other hand, not all file systems are going to contain `.' and `..'.

They don't have to *contain* `.' and `..' for this to work, they only
have to interpret them correctly in pathnames.  The pathname resolution
rules in POSIX.1 guarantee this behaviour.

The fact that the strcat() may write past the end of dir[] is more of
a problem.  Another is that PATH_MAX might not be defined (it should
always be obtained via pathconf() in portable applications).  Anyway,
using a maximum length array is rather wasteful - malloc(strlen(path)+3)
would be much better all round.
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600

jik@athena.mit.edu (Jonathan I. Kamens) (11/23/89)

In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
=>>	isadir(char *path)
=>>	{
=>>		char dir[PATH_MAX];
=>>
=>>		strcpy(dir, path);
=>>		strcat(dir, "/.");
=>>
=>>		return access(dir, 0);
=>>	}
=The fact that the strcat() may write past the end of dir[] is more of
=a problem.

  Yes, this is definitely a problem.  There should be a check on the
length of path before strcat'ing onto the end of it.
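Here is a sketch of that length check, assuming the fixed-size automatic array is kept. If <limits.h> does not define PATH_MAX, the fallback of 1024 below is an arbitrary assumption, not a POSIX value. isadir_checked() is a hypothetical name.

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 1024   /* assumed fallback, not taken from any standard */
#endif

/* Returns 1 for a directory, 0 if not, -1 if path is too long for dir[]. */
int isadir_checked(const char *path)
{
    char dir[PATH_MAX];
    size_t len = strlen(path);

    if (len > sizeof dir - sizeof "/.")     /* need len + 2 + 1 bytes */
        return -1;
    memcpy(dir, path, len);
    memcpy(dir + len, "/.", sizeof "/.");   /* also copies the '\0' */
    return access(dir, F_OK) == 0;
}
```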

=            Another is that PATH_MAX might not be defined (it should
=always be obtained via pathconf() in portable applications).

  Not convinced this is a major problem.  I've yet to run into a Unix
programming environment that doesn't somewhere in a header file give
you some indication of the maximum path length, although that may be
just my limited Unix experience talking.  I also don't know about
non-Unix C libraries....

=                                                              Anyway,
=using a maximum length array is rather wasteful - malloc(strlen(path)+3)
=would be much better all round.

  This, I must completely disagree with.

  An automatic variable of a procedure is placed on the stack (with
the rest of the procedure's stack frame) when the procedure is called,
and is removed from the stack when the procedure exits.  Unless the
program is using *lots* of memory and stack space, there is very
little likelihood of overflowing the stack.  Furthermore, all of this
allocation is done down at the machine instruction level, not up in
user-written code.

  Malloc, on the other hand, is user code.  It allocates memory that
is part of the data area of the program, and it takes more time than a
procedure call would take to do this *because* it is user code.
Furthermore, if the program's data area isn't big enough to hold the
malloc'd memory, malloc has to increase the datasize, and the data
size will NOT decrease, even after free has been called on the string.
Furthermore, a malloc is far more likely (in my experience) to cause a
program to run out of memory than a procedure call is.

  Therefore, malloc is slower than just putting it in an automatic
variable, it is more likely to cause the program to run out of memory,
and it might even cause the program's data size to grow unnecessarily.

  I therefore fail to see why malloc is a better choice than an
automatic variable.

  Then again, perhaps I'm just confused :-)

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

brnstnd@stealth.acf.nyu.edu (Dan Bernstein) (11/23/89)

In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
> In article <17303@rpp386.cactus.org> jfh@rpp386.cactus.org (John F. Haugh II) writes:
> >		char dir[PATH_MAX];
> >		strcpy(dir, path);
> >             strcat(dir, "/.");
> Anyway,
> using a maximum length array is rather wasteful - malloc(strlen(path)+3)
> would be much better all round.

Ummm, no. PATH_MAX is much smaller than the page size on most machines,
so it almost certainly won't change the actual memory use of the process.
In contrast, malloc(strlen()) is noticeably slower, a bit less readable,
and prone to failure.

Unless a lot of memory is tied up in such arrays, malloc() isn't worth
it. Don't waste dynamic memory for static uses.

Followups to comp.lang.c so I don't have to read them.

---Dan

martin@mwtech.UUCP (Martin Weitzel) (11/23/89)

In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
[first part deleted]
>The fact that the strcat() may write past the end of dir[] is more of
>a problem.  Another is that PATH_MAX might not be defined (it should
>always be obtained via pathconf() in portable applications).  Anyway,
>using a maximum length array is rather wasteful - malloc(strlen(path)+3)
>would be much better all round.

But then you should not forget to check the result (in a portable approach
you cannot assume the huge amounts of memory some people are used to
having, or ULIMIT may be set low in the environment the program runs in).
And of course: don't forget to do a 'free'. And what should the program
do if there is no memory from malloc() - abort? The one who uses the
directory-check function might not like this ...
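One way to answer the "abort?" question is to report the failure to the caller instead of deciding for it. The sketch below uses the hypothetical name isadir_alloc(). It mallocs exactly strlen(path) + 3 bytes, frees the buffer on every path after a successful malloc, and returns -1 with errno set when it cannot tell.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: 1 = directory, 0 = not a directory (or missing),
   -1 = could not tell; errno says why (e.g. ENOMEM, EACCES). Never aborts. */
int isadir_alloc(const char *path)
{
    size_t len = strlen(path);
    char *dir = malloc(len + 3);    /* strlen(path) + "/." + '\0' */
    int res, saved;

    if (dir == NULL) {
        errno = ENOMEM;             /* ISO C does not require malloc to set it */
        return -1;
    }
    memcpy(dir, path, len);
    memcpy(dir + len, "/.", 3);

    res = access(dir, F_OK);
    saved = errno;                  /* free() might disturb errno */
    free(dir);

    if (res == 0)
        return 1;
    if (saved == ENOENT || saved == ENOTDIR)
        return 0;
    errno = saved;
    return -1;
}
```

The caller can then choose for itself, for example `if (isadir_alloc(p) < 0) perror(p);`, or retry later, or give up quietly.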

Ehhm, what was the original question? Something faster than 'access()'.
Seems to be the time to stop this now ....

MW

gwyn@smoke.BRL.MIL (Doug Gwyn) (11/24/89)

In article <1989Nov22.224209.28911@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>Unless the program is using *lots* of memory and stack space, there is very
>little likelihood of overflowing the stack.

Many implementations have severe constraints on stack size.  For example,
on Gould PowerNode series running UTX-32 (based on 4.3BSD), the stack
size is fixed at link time, typically only a few kilobytes.  Allocating
large auto arrays can easily cause the stack limit to be exceeded.

The three major alternatives are:
	auto arrays, with the problem just mentioned
	static arrays, which waste program data space
	malloc()ed storage, which costs more time.

In most situations I would opt for the latter.

mark@jhereg.Minnetech.MN.ORG (Mark H. Colburn) (11/25/89)

In article <1989Nov22.224209.28911@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>=            Another is that PATH_MAX might not be defined (it should
>=always be obtained via pathconf() in portable applications).

_POSIX_PATH_MAX will always be defined in limits.h and will always have a
value of 255. (ss2.9.2)  However, some systems have virtually unlimited 
pathname limits.  In these cases PATH_MAX will not be defined and the 
application should be smart enough to do the right thing. (ss2.9.4)

Checking pathconf() for the PATH_MAX value may also yield an indeterminate
result (-1 with errno unchanged).  In this case, the application should assume that pathname lengths
are unlimited, and again, do the right thing.  (ss5.7.1.3)
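To tell "no limit" apart from an error, you have to clear errno first, because POSIX has pathconf() return -1 *without changing errno* when a variable has no limit. A sketch follows. path_max_for() is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 and stores the limit in *limit
   (or -1 there when the system reports no limit); returns -1 on error. */
int path_max_for(const char *path, long *limit)
{
    long v;

    errno = 0;                      /* must be cleared to detect "no limit" */
    v = pathconf(path, _PC_PATH_MAX);
    if (v == -1) {
        if (errno != 0)
            return -1;              /* a real error, errno says which */
        *limit = -1;                /* indeterminate: treat as unlimited */
        return 0;
    }
    *limit = v;
    return 0;
}
```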

-- 
Mark H. Colburn                       mark@Minnetech.MN.ORG
Open Systems Architects, Inc.

richard@aiai.ed.ac.uk (Richard Tobin) (11/26/89)

In article <11674@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>Many implementations have severe constraints on stack size.

Some at least.

>For example, on Gould PowerNode series running UTX-32 (based on 4.3BSD),
>the stack size is fixed at link time, typically only a few kilobytes.  
 ...
>The three major alternatives are:

(4) Don't buy such machines.  We all know that all the world is not a vax,
    and that we mustn't dereference null pointers, but some machines just
    aren't worth the pain, given that there are plenty of sane systems
    available.  In my opinion, Goulds are among them.

-- Richard

-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

peter@ficc.uu.net (Peter da Silva) (11/29/89)

In article <1989Nov22.224209.28911@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
> Furthermore, a malloc is far more likely (in my experience) to cause a
> program to run out of memory than a procedure call is.

I take it you have never worked on a machine with a fixed-size stack.

And if your stack size is variable, then your big allocation can still
increase your stack segment, and it's still never going to decrease. So you
still lose there.

If you're *real* worried about this, the best solution is a stack-like
memory allocator. Here's a totally bare-bones one that falls back on
malloc. It's *NOT* an alloca replacement, if anyone's thinking of that.

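------allot.c:
#include "allot.h"

char lotsaram[BIG];		/* the arena */
char *here = lotsaram;		/* next free byte in the arena */

------allot.h:
#include <stdlib.h>

#ifndef BIG
#define BIG 8192		/* arena size; pick to taste */
#endif

extern char lotsaram[BIG];
extern char *here;

/* Take n bytes off the arena, or fall back on malloc() if they don't fit.
   Yields the START of the block.  n is evaluated more than once. */
#define ALLOT(n) ( ((size_t)(n) > (size_t)(&lotsaram[BIG] - here)) \
		? (char *)malloc(n) : ((here += (n)) - (n)) )

/* Release p and everything ALLOTted after it (strictly LIFO). */
#define FORGET(p) ( ((p) >= lotsaram && (p) < &lotsaram[BIG]) \
		? (void)(here = (p)) : free(p) )

------isadir.c:
#include <string.h>
#include <unistd.h>
#include "allot.h"

/* 1 if string names a directory, 0 if not, -1 if out of memory */
int
isadir(const char *string)
{
	size_t len;
	char *copy;
	int res;

	len = strlen(string);
	copy = ALLOT(len + 3);		/* room for "/." and the '\0' */

	if (!copy) return -1;

	strcpy(copy, string);
	strcpy(&copy[len], "/.");

	res = access(copy, F_OK);

	FORGET(copy);
	return res != -1;
}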
-- 
`-_-' Peter da Silva <peter@ficc.uu.net> <peter@sugar.lonestar.org>.
 'U`  --------------  +1 713 274 5180.
"The basic notion underlying USENET is the flame."
	-- Chuq Von Rospach, chuq@Apple.COM

jfh@rpp386.cactus.org (John F. Haugh II) (11/29/89)

In article <7132@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>I take it you have never worked on a machine with a fixed-size stack.

and you've never seen a heap/stack collision.

>And if your stack size is variable, then your big allocation can still
>increase your stack segment, and it's still never going to decrease. So you
>still lose there.

it also doesn't take care to properly align the return pointers and
a few other nasties.
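One way to address the alignment objection is to round every request up to the strictest fundamental alignment and to align the arena itself. The sketch below does that with C11's max_align_t, _Alignof and _Alignas, all of which postdate this thread. arena_alloc(), align_up() and ARENA_SIZE are hypothetical names. The malloc() fallback and FORGET() are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE  4096
#define ARENA_ALIGN _Alignof(max_align_t)

static _Alignas(max_align_t) char arena[ARENA_SIZE];  /* aligned start */
static size_t arena_used;                              /* bytes handed out */

/* Round n up to a multiple of align; align must be a power of two. */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Returns a block aligned for any fundamental type, or NULL if it won't fit. */
void *arena_alloc(size_t n)
{
    size_t size = align_up(n, ARENA_ALIGN);
    void *p;

    if (size < n || size > ARENA_SIZE - arena_used)    /* overflow or full */
        return NULL;
    p = arena + arena_used;
    arena_used += size;                 /* keeps arena_used a multiple of ARENA_ALIGN */
    return p;
}
```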

the solution is to only use the right amount of memory in the first
place.  whether it comes off the heap or the stack seems immaterial
if they both share the same small address space.
-- 
John F. Haugh II                        +-Things you didn't want to know:------
VoiceNet: (512) 832-8832   Data: -8835  | In Ham lingo DEC is rot-13 for "Low
InterNet: jfh@rpp386.cactus.org         | Power".  "CPU?"  "QRP Vax-11."
UUCPNet:  {texbell|bigtex}!rpp386!jfh   +--------------------------------------

gwc@root.co.uk (Geoff Clare) (11/29/89)

In article <1989Nov22.224209.28911@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>=            Another is that PATH_MAX might not be defined (it should
>=always be obtained via pathconf() in portable applications).
>
>  Not convinced this is a major problem.  I've yet to run into a Unix
>programming environment that doesn't somewhere in a header file give
>you some indication of the maximum path length,

I don't know of any current systems like this, either, but it's entirely
possible there will be systems like this in the near future.  That's
because POSIX.1 says that PATH_MAX is optional and provides pathconf()
to obtain its value at runtime.

Even if there is a PATH_MAX in <limits.h> it should not be used to
dimension an array which is assumed to be able to hold any path name,
because the actual maximum is allowed to be higher.  E.g. an NFS
mounted file system might allow longer names than local file systems.

>  I therefore fail to see why malloc is a better choice than an
>automatic variable.

Given that the length of an array to hold all possible path names is
not known until runtime, you have no choice but to use malloc().  So
why not just malloc only what you need?

Your objections on performance grounds are unfounded.  Only the first
call to the routine will be slower.  After that malloc() will just
adjust a few pointers.
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600

lm@snafu.Sun.COM (Larry McVoy) (12/05/89)

In article <1989Nov22.224209.28911@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <1051@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>=>>	isadir(char *path)
>=>>	{
>=>>		char dir[PATH_MAX];
>=>>
>=>>		strcpy(dir, path);
>=>>		strcat(dir, "/.");
>=>>
>=>>		return access(dir, 0);
>=>>	}
>=The fact that the strcat() may write past the end of dir[] is more of
>=a problem.
>
>  Yes, this is definitely a problem.  There should be a check on the
>length of path before strcat'ing onto the end of it.
>
>=            Another is that PATH_MAX might not be defined (it should
>=always be obtained via pathconf() in portable applications).
>
>  Not convinced this is a major problem.  I've yet to run into a Unix
>programming environment that doesn't somewhere in a header file give
>you some indication of the maximum path length, although that may be
>just my limited Unix experience talking.  I also don't know about
>non-Unix C libraries....

The standard programming practice that I've seen is

isadir(char *path)
{
	char	dir[_POSIX_PATH_MAX];

	/* etc */
}

I don't think this is a good thing to do.  The reason is that the 
_POSIX_* values are intended to be the minimum allowable values.
That means you can always count on *at least* _POSIX_PATH_MAX space
in a pathname, but it may be more.  The better model, albeit slower, is

isadir(char *path)
{
	char	*dir;

	dir = malloc(pathconf(path, _PC_PATH_MAX));

	/* etc */
}
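As written, malloc(pathconf(path, _PC_PATH_MAX)) passes -1 to malloc() whenever pathconf() reports no limit or fails, and -1 converts to an enormous size_t. The sketch below guards against that case. alloc_path_buffer() is a hypothetical name, and the extra byte for the '\0' is a defensive assumption about how the limit is counted. The sketch uses the reported limit only when it is positive, and it never allocates less than the string actually needs.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: returns a malloc'd "path/." in a buffer at least as large
   as the reported _PC_PATH_MAX (plus one), or NULL if malloc() fails. */
char *alloc_path_buffer(const char *path)
{
    size_t len = strlen(path);
    size_t size = len + sizeof "/.";            /* path + "/." + '\0' */
    long limit = pathconf(path, _PC_PATH_MAX);  /* -1: no limit, or an error */
    char *dir;

    /* Only trust a positive limit; never pass -1 on to malloc(). */
    if (limit > 0 && (size_t)limit >= size)
        size = (size_t)limit + 1;               /* a limit-length name plus '\0' */
    dir = malloc(size);
    if (dir != NULL) {
        memcpy(dir, path, len);
        memcpy(dir + len, "/.", sizeof "/.");   /* also copies the '\0' */
    }
    return dir;
}
```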

The whole point is that this value could change.  Consider a diskless
machine with NFS, RFS, or God knows what mounted up.  One remote filesystem
might be a System V (14 char names); another might be DOS.  So you really
need to ask.  As far as I know there is no constant that will work all
the time.  I don't like it either, but what is a better solution?  I can't
see any way to provide a fixed answer in the face of remote file systems.

	 What I say is my opinion.  I am not paid to speak for Sun.

Larry McVoy, Sun Microsystems                          ...!sun!lm or lm@sun.com

joey@tessi.UUCP (Joe Pruett) (12/08/89)

I know this is a GNU-ism, but local variable sized arrays seem
like the best of all worlds.  For example:

isadir(path)
char *path;
{
	char tpath[strlen(path) + 3];

	strcpy(tpath, path);
	strcat(tpath, "/.");
	.
	.
	.
}

It is fairly quick, only uses as much memory as is necessary, and
the storage automatically goes away at the end of the function.
This is one of the best additions to C that I've seen.  It's intuitive,
lets the compiler do the yucky work (alloca is a hack), and adds
a very useful feature.

Yes, 3 should be calculated instead of a constant, but I gotta
have something for people to complain about... :-)
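C99 later standardized variable length arrays. C11 made them optional, and implementations without them define __STDC_NO_VLA__. Here is a sketch of the same function in standard C, with the 3 computed as sizeof "/.". isadir_vla() is a hypothetical name.

```c
#include <string.h>
#include <unistd.h>

/* Returns 1 if path names a directory, 0 otherwise. */
int isadir_vla(const char *path)
{
    size_t len = strlen(path);
    char tpath[len + sizeof "/."];          /* path + "/." + '\0' */

    memcpy(tpath, path, len);
    memcpy(tpath + len, "/.", sizeof "/.");  /* also copies the '\0' */
    return access(tpath, F_OK) == 0;
}
```

The stack-size caveat raised earlier in the thread still applies if path can be very long.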

bri@boulder.Colorado.EDU (Brian Ellis) (12/10/89)

In article <563@balthmus.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes:
>isadir(path)
>char *path;
>{
>	char tpath[strlen(path) + 3];
[deleted]

	this declaration is not legal, though. You must give the size at
	*compile* time. You'll get the error "constant expected"

	-brian ellis (bri@boulder.Colorado.EDU)

blm@6sceng.UUCP (Brian Matthews) (12/10/89)

In article <14738@boulder.Colorado.EDU> bri@boulder.Colorado.EDU (Brian Ellis) writes:
|In article <563@balthmus.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes:
|>isadir(path)
|>char *path;
|>{
|>	char tpath[strlen(path) + 3];
|[deleted]
|	this declaration is not legal, though. You must give the size at
|	*compile* time. You'll get the error "constant expected"

Go back and read <563@balthmus.tessi.UUCP> again.  The declaration is
one of gcc's extensions to C (all of which can be turned off, of course).
It's equivalent to

	char *tpath = alloca (strlen (path) + 3);
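The two are close but not identical. A VLA's storage is released when its enclosing block ends, while alloca() storage lasts until the function returns. Also, alloca() is in neither ISO C nor POSIX. The sketch below assumes a system that provides <alloca.h> (glibc and musl do). isadir_alloca() is a hypothetical name.

```c
#include <alloca.h>   /* non-standard header; assumed available */
#include <string.h>
#include <unistd.h>

/* Returns 1 if path names a directory, 0 otherwise. */
int isadir_alloca(const char *path)
{
    size_t len = strlen(path);
    char *tpath = alloca(len + sizeof "/.");   /* freed when this function returns */

    memcpy(tpath, path, len);
    memcpy(tpath + len, "/.", sizeof "/.");    /* also copies the '\0' */
    return access(tpath, F_OK) == 0;
}
```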
-- 
Brian L. Matthews	blm@6sceng.UUCP