[comp.sys.ibm.pc] Unix-like global expansion in DOS

mike2@lcuxa.UUCP (M S Slomin) (11/04/88)

Have you wished to be able to specify wildcards in DOS the way you
can in Unix?  I certainly have.  The brain-damaged * (which matches
the remainder of the string, and not portions that are followed by
a specification, such as *a to match all filenames ending with the
letter a) and lack of range specifications (like [a-f]*, etc.) have
driven me to distraction, as has the requirement that the filename
and extension be matched separately with a period between them.

Last month, Alan Strassberg posted a public domain 'gmatch'
function that implements string matching capabilities similar to
those implemented by the Bourne shell.  This stimulated me to see
if it could be used in DOS.  What follows is the result of pasting
together three existing sources of code, and adding my own code to
fill the interstices.  The result seems to work -- probably
independently of memory model in the TurboC version, but maybe with
memory model problems in the MSC version because of the funny
pointer conversions used to implement findfirst/findnext/setdta/getdta.  
I've compiled and run the code in the large model under MSC4.0 and
MSC5.0 without problem, but nevertheless I'm not convinced that it
is bulletproof in other than the small model.  Perhaps others might
improve it; I'm satisfied for the time being.

Operation: 
	1.  The first operative line of code in a C program after
	    the declarations should be: 
			argv = exparg(&argc, argv);
	    having first been preceded by the declaration:
			extern char **exparg();

	2.  If so, arguments to the C program will be expanded,
	    and will populate a replacement set of argv[1],
	    argv[2],... argv[argc-1] strings, and argc will be
	    readjusted.  Thereafter, the program can be written 
	    to access argv[]/argc as if the expanded versions
	    had been placed there by the operating system.

	3.  * will match zero or more of any character; ? will match
	    a single character (but not zero occurrences); [a-d] will 
	    match a single character in the range 'a' through 'd'; 
	    [!a-d] will match any single character except a character 
	    in the range 'a' through 'd'.

	4.  The period between the filename root and its extension
	    need not be stated explicitly.  Thus, the pattern
				a*e
	    will match 'abacus.exe' as well as 'axyz.e' and
	    'apple'.
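
To make rules 3 and 4 concrete, here is a small stand-alone sketch of
the matching behaviour described above.  It is NOT the gmatch code
posted below: the names wild_match and in_class are made up for this
illustration, and the sketch leaves out gmatch's quoting and high-bit
masking.  It only shows how *, ?, [a-d] and [!a-d] are expected to
behave on a whole filename, period included.

```c
#include <assert.h>
#include <stddef.h>

/* in_class: hypothetical helper.  *pp points at '['.  Returns 1 if c is
   accepted by the class, 0 if it is rejected or the class is unterminated.
   On success *pp is left on the closing ']'. */
static int in_class(const char **pp, int c)
{
    const char *p = *pp + 1;            /* skip '[' */
    int negate = 0, found = 0;

    if (*p == '!') {
        negate = 1;
        p++;
    }
    while (*p != '\0' && *p != ']') {
        int lo = (unsigned char)*p, hi = lo;

        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            hi = (unsigned char)p[2];   /* range such as a-d */
            p += 2;
        }
        if (lo <= c && c <= hi)
            found = 1;
        p++;
    }
    if (*p != ']')
        return 0;
    *pp = p;
    return found != negate;
}

/* wild_match: hypothetical, simplified matcher (illustration only).
   Returns 1 if the whole string s matches pattern p, else 0. */
int wild_match(const char *s, const char *p)
{
    assert(s != NULL && p != NULL);
    for (; *p != '\0'; p++) {
        if (*p == '*') {
            /* try every tail of s, including the empty one */
            do {
                if (wild_match(s, p + 1))
                    return 1;
            } while (*s++ != '\0');
            return 0;
        }
        if (*s == '\0')
            return 0;       /* ?, [..] and literals all need a character */
        if (*p == '[') {
            if (!in_class(&p, (unsigned char)*s))
                return 0;
        } else if (*p != '?' && *p != *s) {
            return 0;
        }
        s++;
    }
    return *s == '\0';
}
```

With this sketch, wild_match("abacus.exe", "a*e") succeeds because the
period is just another character for * to swallow, whereas the native
DOS wildcard would need the root and the extension matched separately.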

	
Size:

The following code size differences resulted when a simple-minded
test program was compiled with and without the exparg code:

MSC4.0		  1670 bytes added (small model)
MSC5.0 		  1670 bytes added (small model)
TURBOC1.5	  1446 bytes added (small model)

(Not bad for what it gives you!)
===================================================================
                               CODE
===================================================================
                                 
/*  Compilation options:
*/

/* #define MSC		/*  for Microsoft C */
/* #define TURBOC	/*  for TurboC 1.0 or 1.5 */
#define ATTRIB 1	/*  search only for normal files, including
				read-only ones */
/* #define ATTRIB 0	/*  search only for normal writable files  */
/* #define ATTRIB 0x3f	/*  search for all file names, including hidden
			    and system ones, directory names and . and .. */
/* #define TEST		/*  see it work  */



/*  Credits:

1. 	The first/next/getdta/setdta routines are based
	(loosely) on a public domain "Sample wildcard processor 
	in Lattice C" by Alan Losoff,  Milwaukee WI

2.	The exparg code is a modification of wildcard expansion
	code originally written for TurboC 1.0 by:
			Richard Hargrove
			Texas Instruments, Inc.
			P.O. Box 869305, m/s 8473
			Plano, Texas 75086
			214/575-4128
	and posted to USENET in Sept., 1987.

3.	The gmatch code was posted to USENET by Alan Strassberg,
	Lockheed, Santa Cruz, CA in Oct., 1988.  His posting
	indicated that it was derived from a posting to 
	comp.os.minix.

4.	The remainder, such as it may be, is mine, Mike Slomin, 
	bellcore!lcuxa!mike2, and may be used for any purpose.
*/

#include <stdio.h>
#include <ctype.h>
#include <dos.h>
#include <string.h>

#ifndef TURBOC
#include <direct.h>
#include <malloc.h>
#else
#include <dir.h>
#include <alloc.h>
#endif /* TURBOC */

#define DOS_GETFAT  0x3600
#define DOS_SETDTA  0x1A00
#define DOS_GETDTA  0x2F00
#define DOS_FFIRST  0x4E00
#define DOS_FNEXT   0x4F00
#define CARRY_FLAG  0x0001

#define	MAXARGS		100	/* maximum number of entries the new argv */
				/* array can contain			  */
#define MAXPATH		80
#define MAXDIR		66
#define MAXDRIVE	3
#define MAXFILE 	9
#define MAXEXT		5

#define TRUE		1
#define FALSE		0
#define NIL(type)	((type *) NULL)

typedef int BOOLEAN;

struct	DIRS			/* dos directory entry */
{
	char	for_dos[21];
	char	attr;
	struct	ftime
	{
		unsigned hour : 5;
		unsigned minute : 6;
		unsigned twosec : 5;
	}	time;
	struct	fdate
	{
		unsigned year : 7;
		unsigned month : 4;
		unsigned day : 5;
	}	date;
	long	size;
	char	name[13];
	char	fill[85];
};


static	union	REGS	reg;
static	struct	DIRS	dta;
static  struct  SREGS   segregs;
static	char	path[80];
static	int	pathend;

/*  Not all of the following are really needed for MSC5.0+, which
does have functions such as _dos_findfirst/_dos_findnext.  However,
since the code works on MSC4.0 and is upwardly compatible, it seemed
easier simply to stick with it rather than to migrate it.  */

#ifdef MSC
char *
getdta()
{
	reg.x.ax = DOS_GETDTA;
	reg.x.bx = 0;
	reg.x.cx = 0;
	reg.x.dx = 0;
	intdos(&reg, &reg);
	return ((char *) reg.x.bx);	/* offset only; segment is not returned */
}

setdta(dta)
char *dta;
{
	reg.x.ax = DOS_SETDTA;
	reg.x.bx = 0;
	reg.x.cx = 0;
	reg.x.dx = (unsigned int) dta;
	intdos(&reg, &reg);
}

char *
strlwr(s)
register char *s;
{
	register char *os;
	os = s;
	while(*s){
		*s = tolower(*s);
		s++;
	}
	return(os);
}

char *
stpcpy(s1,s2)
char *s1, *s2;
{
	strcpy(s1,s2);		/* copy first: operand order of + is unspecified */
	return(s1 + strlen(s1));
}

int
first(name, blk, attrib)
char *name, *blk;
int attrib;
{
	setdta(&dta);
	reg.x.ax = DOS_FFIRST;
	reg.x.bx = 0;
	reg.x.cx = attrib;	/* search attribute; was 0, so ATTRIB was ignored */
	reg.x.dx = (unsigned int) name;
	intdos(&reg, &reg);
	if (reg.x.cflag & CARRY_FLAG)
		return(-1);
	return(0);
}

int
next(blk)				       /* find next directory entry */
char *blk;
{
	setdta(&dta);
	reg.x.ax = DOS_FNEXT;
	reg.x.bx = 0;
	reg.x.cx = 0;
	reg.x.dx = 0;
	intdos(&reg, &reg);
	if (reg.x.cflag & CARRY_FLAG)
		return(-1);
	return(0);
}
#endif /* MSC */

#ifdef TURBOC
first(name, dta, attrib)
char *name, *dta;
int attrib;
{
	return (findfirst(name, dta, attrib));
}

next(dta)
char *dta;
{
	return (findnext(dta));
}

#endif /* TURBOC */

char *
pathsplit(fpath, drive_dir)
char *fpath, *drive_dir;
{

	/* separate drive and directory from input name */
	strcpy(drive_dir, fpath);
	pathend = strlen(drive_dir);
	while(pathend && drive_dir[pathend-1] != ':' 
	&& drive_dir[pathend-1] != '\\')
		pathend--;
	drive_dir[pathend] = '\0';
	return(drive_dir);
}


/******************************************************************************/
/*  The following is an adaptation of Richard Hargrove's 'exparg.c' code
    which he wrote to do wild card expansion for the initial release of
    TurboC (TurboC 1.0).  It keeps track of dynamic memory allocation
    efficiently, and codes well.  Besides, who wants to reinvent the
    wheel?

    As originally written, the code invoked TurboC's findfirst/findnext
    routines to expand each argv[] argument, and replaced the original array 
    of argv[] strings with an expanded one.  It also appropriately replaced
    argc.  Thus, so long as the first operative line after main() in
    a program was 
			argv = exparg(&argc,argv);
    from that point onward the program would operate as if the operating
    system, and not the program, had already expanded the arguments.

    To bring in pdgmatch, the game is:
	a) to use Mr. Hargrove's findfirst/findnext code with the argument
	   "*.*", to get a list of all of the files in the selected
	   path;
	b) apply pdgmatch to each original argv[] and the list of all
	   files; and
	c) use the result of the pdgmatch(s) to populate the replacement
           argv[] strings.
    Also, the results of DOS' findfirst/findnext are converted to lower 
    case before they are sent to pdgmatch, since it would be annoying 
    to have to type patterns in upper case.  Note that the ATTRIB
    definition will determine whether only conventional files will be
    matched (ATTRIB=0) or whether hidden and system files, and
    directories, will also be matched (ATTRIB=0x3f).
*/

char **exparg (pargc, argv)
int *pargc;
char **argv;
{
	static char *newargv[MAXARGS];	
	char pathi[MAXPATH];
	char patho[MAXPATH];
	char drive[MAXDRIVE];
	char dir[MAXDIR];
	char drive_dir[MAXDRIVE + MAXDIR];

	char *olddta;
	int args = 0;
	int newargc = 0;
	BOOLEAN err = FALSE;
	
   olddta = getdta();
   newargv[newargc++] = argv[args++];


   while (!err && args < *pargc) {
	patho[0]='\0';
	pathsplit(argv[args],drive_dir);
	stpcpy(stpcpy(patho, drive_dir), "*.*");
    
    	if (!first(patho, &dta,ATTRIB)) {
	    do {
	      char *localcptr = (char *)malloc (
		(unsigned)(stpcpy(stpcpy(pathi,drive_dir),dta.name) - pathi) + 1);
#ifdef TURBOC
	      if (localcptr == NIL(char)){
#else
	      if (localcptr == NULL){
#endif  /* TURBOC */
		fputs("\n_exparg error : no memory for filenames\n",stderr);
		exit(1);
	      } 
	      if  (gmatch(strlwr(pathi), argv[args])) {
		newargv [newargc++] = strcpy (localcptr, pathi);
	      }
	      else {
		free(localcptr);	/* no match: do not leak the buffer */
	      }
	    } while ((newargc < MAXARGS) && !next (&dta));
	 }
	 else {
	    newargv [newargc++] = argv [args];
	 }
	 err = (newargc == MAXARGS);
	 args++;
   }

   if (err) fputs ("\n_exparg error : too many filenames\n", stderr);
   setdta (olddta);
   *pargc = newargc;
   return (&newargv [0]);
}

/***************************************************************************/
/*
 * int gmatch(string, pattern)
 * char *string, *pattern;
 *
 * Match a pattern as in sh(1).
 */

#ifndef NULL			/* <stdio.h> above normally defines it */
#define	NULL	0
#endif
#define	CMASK	0377
#define	QUOTE	0200
#define	QMASK	(CMASK&~QUOTE)
#define	NOT	'!'	/* might use ^ */

static	char	*cclass();

int
gmatch(s, p)
register char *s, *p;
{
	register int sc, pc;

	if (s == NULL || p == NULL)
		return(0);
	while ((pc = *p++ & CMASK) != '\0') {
		sc = *s++ & QMASK;
		switch (pc) {
		case '[':
			/* a class must consume a real character, never the NUL */
			if (sc == 0 || (p = cclass(p, sc)) == NULL)
				return(0);
			break;

		case '?':
			if (sc == 0)
				return(0);
			break;

		case '*':
			s--;
			do {
				if (*p == '\0' || gmatch(s, p))
					return(1);
			} while (*s++ != '\0');
			return(0);

		default:
			if (sc != (pc&~QUOTE))
				return(0);
		}
	}
	return(*s == 0);
}

static char *
cclass(p, sub)
register char *p;
register int sub;
{
	register int c, d, not, found;

	if ((not = *p == NOT) != 0)
		p++;
	found = not;
	do {
		if (*p == '\0')
			return(NULL);
		c = *p & CMASK;
		if (p[1] == '-' && p[2] != ']') {
			d = p[2] & CMASK;
			p++;
		} else
			d = c;
		if (c == sub || c <= sub && sub <= d)
			found = !not;
	} while (*++p != ']');
	return(found? p+1: NULL);
}
/******************************************************************************/
#ifdef TEST

main (argc,argv)
int argc;
char **argv;
{
/*  Normally, when using exparg, you should precede the exparg() call
    with the declaration:
		extern char **exparg();
    and the first line of non-declaration code after main should
    be:
		argv = exparg (&argc, argv);
    However, to show how it works, we will first print the original
    command line parameters in the following test code.  And, since
    exparg() has already been declared, we will not bother to do so
    here.
*/
	 
	int i = 0;
	
	printf ("original command line parameters : argc: %d\n", argc);
	for (; i < argc; i++) {
	  printf ("%s\n", argv [i]);
	}

	argv = exparg (&argc, argv);

	printf ("new command line parameters : argc: %d\n", argc);
	for (i = 0; i < argc; i++) {
	  printf ("%s\n", argv [i]);
	}
}
#endif
===============================END OF CODE===========================
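
For anyone who wants to try the idea without a DOS box handy, steps
(a) to (c) from the comment above exparg can be sketched portably as
below.  This is only an illustration under stated assumptions: the
directory listing is a plain array instead of findfirst/findnext
output, and filter_listing, has_suffix, match_fn and DOS_NAME_LEN are
names invented for the sketch.  has_suffix is a trivial stand-in for
the real matcher.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DOS_NAME_LEN 13     /* 8.3 name plus the terminating NUL */

typedef int (*match_fn)(const char *name, const char *pattern);

/* Stand-in predicate for this sketch only: true if name ends with suffix. */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), m = strlen(suffix);

    return m <= n && strcmp(name + n - m, suffix) == 0;
}

/* Walk a listing (step a), fold each name to lower case and test it
   (step b), and collect the names that pass (step c).
   Returns the number of names stored in out. */
int filter_listing(const char *const *listing, int count,
                   const char *pattern, match_fn match,
                   char out[][DOS_NAME_LEN], int max_out)
{
    int i, kept = 0;

    assert(listing != NULL && match != NULL && out != NULL);
    for (i = 0; i < count && kept < max_out; i++) {
        char lower[DOS_NAME_LEN];
        size_t j;

        /* DOS hands back upper case; patterns are typed in lower case */
        for (j = 0; j + 1 < sizeof lower && listing[i][j] != '\0'; j++)
            lower[j] = (char)tolower((unsigned char)listing[i][j]);
        lower[j] = '\0';

        if (match(lower, pattern))
            strcpy(out[kept++], lower);
    }
    return kept;
}
```

Passing the matcher in as a function pointer keeps the enumeration
separate from the pattern language, which is roughly the division of
labour between exparg and gmatch in the posted code.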

No warranties whatsoever.  You get
what you pay for!
					Mike Slomin
					bellcore!lcuxa!mike2

link@stew.ssl.berkeley.edu (Richard Link) (11/04/88)

In article <215@lcuxa.UUCP> mike2@lcuxa.UUCP (M S Slomin) writes:
>
>Have you wished to be able to specify wildcards in DOS the way you
>can in Unix?


Personally, I wish both UNIX and MS-DOS could expand wildcards like VMS.
I *hate* the useless destination restrictions in UNIX.

Dr. Richard Link
University of California, Berkeley
link@ssl.berkeley.edu

naughton%wind@Sun.COM (Patrick Naughton) (11/04/88)

I like the idea of having unix shell style wildcard expansion, but
having it be a C client function which has to be called before argv[]
processing kind of limits its usefulness.  The solution I would like to
see would be a COMMAND.COM shell enhancement (replacement?) which
handled the usual stuff like file completion, history substitution, AND
wildcard expansion.  This way the client program/command such as masm or
link (which always has long command lines under Unix) would already have
the correct arg[cv] values on startup.  This also has the advantage of
having only one copy of exparg() lying around rather than one copy per
client. 

This is a small matter of programming and I would certainly have already
done it if it were not for the point of this posting: The DOS command
line (last time I checked) had an upper limit of 128 characters.  Thus
any wildcard expansion which expanded out to more than 128 characters
would fail.  Does anyone know if this is hardwired, or where the hack
would have to go in to change it? I am guessing that it is because old
".COM" files put argv[] in the PSP which is only 256 bytes long.  It
seems that for an ".EXE" file, the process loader or the offset fixup-er
could allocate some space for the arg list no matter how long and point
argv[] at it.
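
For reference, the layout behind this limit: the command tail lives in
the second half of the 256-byte PSP, with a length byte at offset 80h
and the text starting at 81h, normally ended by a carriage return.  The
sketch below reads the tail out of a PSP image handed in as an ordinary
byte buffer, so it can be tried on any machine.  The function name
psp_command_tail and the macro names are invented for the
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PSP_SIZE       256
#define PSP_TAIL_COUNT 0x80     /* length byte of the command tail */
#define PSP_TAIL_TEXT  0x81     /* first character of the command tail */

/* Copy the command tail from a PSP image into buf as a C string.
   Returns the number of characters copied (terminator not counted). */
size_t psp_command_tail(const unsigned char *psp, char *buf, size_t bufsize)
{
    size_t len;
    size_t room = PSP_SIZE - PSP_TAIL_TEXT;    /* bytes left in the PSP */

    assert(psp != NULL && buf != NULL && bufsize > 0);
    len = psp[PSP_TAIL_COUNT];
    if (len > room)
        len = room;             /* never read past the end of the PSP */
    if (len > bufsize - 1)
        len = bufsize - 1;      /* leave space for the NUL */
    memcpy(buf, psp + PSP_TAIL_TEXT, len);
    buf[len] = '\0';
    return len;
}
```

Whatever a shell expands has to squeeze through that one length byte
and the bytes after it, which is why expansion inside COMMAND.COM runs
out of room so quickly.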

This shortcoming of DOS/COMMAND.COM causes standard Unix Makefiles to be
useless since most link lines get to be several hundred characters, if
you have any reasonable number of libraries or object files.  DOS has
the @file convention for reading a command's arguments out of a file,
but it makes Makefile management a nightmare and portability to Unix a
moot point.

Any comments or suggestions?

-Patrick
    ______________________________________________________________________
    Patrick J. Naughton				    ARPA: naughton@Sun.COM
    Window Systems Group			    UUCP: ...!sun!naughton
    Sun Microsystems, Inc.			    AT&T: (415) 336 - 1080

dhesi@bsu-cs.UUCP (Rahul Dhesi) (11/05/88)

In article <76192@sun.uucp> naughton@sun.com (Patrick J. Naughton) writes:
>The DOS command
>line (last time I checked) had an upper limit of 128 characters....
>Does anyone know if this is hardwired, or where the hack
>would have to go in to change it?
...
>This shortcoming of DOS/COMMAND.COM causes standard Unix Makefiles to be
>useless since most link lines get to be several hundred characters,...

The command line limit is unfortunately hard-coded because of the
limited size of the PSP.  (I think this problem is inherited from
CP/M.)  For C programs that use argv[], this is not a problem at all
-- the C runtime library can still expand command line arguments and
malloc() space for them, and in fact both Microsoft and Borland
supply functions that will do this for you and call main() with
argv[] containing expanded filenames.  (It's true that various
versions of these have various peculiarities, often choking on forward
slashes in pathnames.)

This command line shortcoming is not a problem with makefiles if you use
Don Kneller's ndmake program.  It accepts highly UNIX-compatible
makefiles, and recognizes the word "link" and feeds the linker a
response file with the @ command.  It's shareware for $35, and a steal
at that price.

(P.S.  They just fixed my PS/2 and gave it back to me, and I'm using it
for the first time and finding the placement of the keys quite
irritating.  Does anybody have a simple way of exchanging the caps-lock
and ctrl keys on the keyboard?  I wish these big companies such as DEC,
AT&T, and IBM would stop trying to improve keyboards by constantly
changing the keys around, inserting backslashes at odd places, making
escape keys vanish, etc.)
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

sullivan@marge.math.binghamton.edu (fred sullivan) (11/07/88)

In article <76192@sun.uucp> naughton@sun.com (Patrick J. Naughton) writes:

>I like the idea of having unix shell style wildcard expansion, but
>having it be a C client function which has to be called before argv[]
>processing kind of limits its usefulness.  The solution I would like to
>see would be a COMMAND.COM shell enhancement (replacement?) which
>handled the usual stuff like file completion, history substitution, AND
>wildcard expansion.
>
>This is a small matter of programming and I would certainly have already
>done it if it were not for the point of this posting: The DOS command
>line (last time I checked) had an upper limit of 128 characters.  Thus
>any wildcard expansion which expanded out to more than 128 characters
>would fail.

A similar problem exists for the Atari ST, and one solution? which has been
used (I think by Mark Williams C) is to pass arguments which go past 128 bytes
in an environment variable.  The problem with any solution like this is, of
course, that it simply will not work with existing software.  For software
which one writes for oneself, it is easy enough to have the program itself
expand the wildcards (with Turbo C one links in a routine called setargv,
which is called on startup by the runtime system -- I assume other compilers
have something similar).  It was an extremely stupid decision to embed the
command line arguments in a 256 byte PSP, but then it's things like this that
make DOS DOS.

Fred Sullivan				SUNY at Binghamton
Dept. Math. Sciences			Binghamton, NY 13903
					sullivan@marge.math.binghamton.edu
First you make a roux!

swh@hpsmtc1.HP.COM (Steve Harrold) (11/07/88)

Re: Long DOS command lines in makefiles

As stated earlier in this notestring, UNIX makefiles are hard to manage
under MSDOS because the long command lines often encountered in UNIX
get truncated by DOS to 128 characters.  The work-around is to use manually
created "response" files, but these are hard to keep in sync with various
make macros such as the commonly used $(OBJS).

There is a DOS make product named OPUS Professional Make that handles
this automatically.  When a command line to invoke "link", "lib" or any other
user-designated program is longer than 128 characters (after macro
substitution), the OPUS program creates a temporary response file and feeds
THAT to the intended program.  There is no special user involvement in 
using this "workaround"; he simply manages his makefiles as if they had
infinitely long command lines.
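
The decision described above can be sketched roughly as follows.  This
is a guess at the general technique, not OPUS Make's actual code:
build_command and DOS_CMD_LIMIT are invented names, and real linkers
and librarians each have their own response-file syntax, which the
sketch ignores by simply writing the arguments out on one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DOS_CMD_LIMIT 128   /* the limit quoted in this thread */

/* Build the command to run in out.  If "prog args" fits within the limit
   it is used as is; otherwise args are written to the file rspname and the
   command becomes "prog @rspname".
   Returns 0 (direct), 1 (response file used) or -1 (error). */
int build_command(const char *prog, const char *args,
                  const char *rspname, char *out, size_t outsize)
{
    FILE *fp;
    size_t direct_len;

    assert(prog != NULL && args != NULL && rspname != NULL && out != NULL);
    direct_len = strlen(prog) + 1 + strlen(args);

    if (direct_len <= DOS_CMD_LIMIT) {
        if (direct_len + 1 > outsize)
            return -1;
        sprintf(out, "%s %s", prog, args);
        return 0;
    }

    if (strlen(prog) + 2 + strlen(rspname) + 1 > outsize)
        return -1;
    fp = fopen(rspname, "w");
    if (fp == NULL)
        return -1;
    fputs(args, fp);
    fputc('\n', fp);
    if (fclose(fp) != 0)
        return -1;
    sprintf(out, "%s @%s", prog, rspname);
    return 1;
}
```

The makefile author never sees the response file; the make program
would call something like this after macro substitution and delete the
temporary file once the command has run.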

Another bonus is that a "mkmf" program is included that allows you to 
automatically generate the dependencies in the makefile, directly from
the source files.

Their address is: OPUS Software
		  1468 8th Ave.
		  San Francisco, CA 94122

UUCP:	...ucbvax!ucsfcgl!kneller

My only association with the product is that of a satisfied user.
--
---------------------
Steve Harrold			...hplabs!hpsmtc1!swh
				HPG200/13
				(408) 447-5580
---------------------

zu@ethz.UUCP (Urs Zurbuchen) (11/09/88)

In article <76192@sun.uucp> naughton@sun.com (Patrick J. Naughton) writes:

>I like the idea of having unix shell style wildcard expansion, but
>having it be a C client function which has to be called before argv[]
>processing kind of limits its usefulness.

If anyone is interested in a replacement for the Microsoft C startup
wildcard expansion routine drop me a note. I adjusted some subroutines
which do UN*X style wildcard expansion and included them into the C
libraries. For all the C programs you compile you get this feature for
free. I didn't write all the code myself. Credits are given in the
sources.

>...  The solution I would like to
>see would be a COMMAND.COM shell enhancement (replacement?) which
>handled the usual stuff like file completion, history substitution, AND
>wildcard expansion.
>
This solution is preferable of course. But as I have the sources to
almost all the utilities I use it isn't of high necessity to me. I would
suggest that instead of hooking something to Command.Com it could be
easier and better to write a replacement. Anybody volunteering ? (Don't
tell me about the MKS toolkit's korn shell. I heard of that.)

		Have a nice day,
		    ...urs

james@bigtex.cactus.org (James Van Artsdalen) (11/12/88)

In <11470038@hpsmtc1.HP.COM>, swh@hpsmtc1.HP.COM (Steve Harrold) wrote:
> Re: Long DOS command lines in makefiles  [...]

> There is a DOS make product named OPUS Professional Make that handles
> this automatically.  [...]

I'll strongly second this recommendation of Opus make.  It is
essentially the equal or better of any unix make.  It works correctly,
unlike that wretched abomination Microsoft has.  Borland's make is
good, but I could not make Borland's make work well with large
projects spanning multiple directories, whereas Opus make worked
perfectly.

There is also an OS/2 version, for those programming in OS/2 and
needing a working "make".
-- 
James R. Van Artsdalen      james@bigtex.cactus.org      "Live Free or Die"
Home: 512-346-2444 Work: 338-8789       9505 Arboretum Blvd Austin TX 78759

simpsong@ncoast.UUCP (Gregory R. Simpson) (11/12/88)

In article <16481@agate.BERKELEY.EDU> link@stew.ssl.berkeley.edu (Richard Link) writes:
>In article <215@lcuxa.UUCP> mike2@lcuxa.UUCP (M S Slomin) writes:
>>
>>Have you wished to be able to specify wildcards in DOS the way you
>>can in Unix?
>
>Personally, I wish both UNIX and MS-DOS could expand wildcards like VMS.
>I *hate* the useless destination restrictions in UNIX.
>
>Dr. Richard Link
>University of California, Berkeley
>link@ssl.berkeley.edu

What VMS Wildcards?  I can't even use the most simple wildcards, like
cc *.c !!!! That is considered wildcard expansion???  Follow-up to
comp.sys.vms... this doesn't really need to be in comp.sys.ibm.pc...

Greg
-- 
---
      Gregory R. Simpson       

Preferred Internet: SIMPSONG%LTD2.decnet@ge-crd.arpa
UUCP: uunet!steinmetz!ltd2.decnet!simpsong
UUCP: <BACKBONE>!cbosgd!ncoast!simpsong

allbery@ncoast.UUCP (Brandon S. Allbery) (11/14/88)

As quoted from <1554@bingvaxu.cc.binghamton.edu> by sullivan@marge.math.binghamton.edu (fred sullivan):
+---------------
| In article <76192@sun.uucp> naughton@sun.com (Patrick J. Naughton) writes:
| 
| >This is a small matter of programming and I would certainly have already
| >done it if it were not for the point of this posting: The DOS command
| >line (last time I checked) had an upper limit of 128 characters.  Thus
| >any wildcard expansion which expanded out to more than 128 characters
| >would fail.
| 
| A similar problem exists for the Atari ST, and one solution? which has been
| used (I think by Mark Williams C) is to pass arguments which go past 128 bytes
| in an environment variable.  The problem with any solution like this is, of
| course, that it simply will not work with existing software.  For software
+---------------

Perhaps someone should consider an extension to DOS (TOS).  Compliant
command interpreters would place as much of the argument list as possible
in the command line and *also* place the full argument list into another
memory segment or the like, which would be passed to the program.  The
extended segment could be 1024 bytes or so.  Programs which aren't compliant would
continue to have problems, but compliant programs could use the parameter
segment to get the full argument list.  If enough programs used the
extension, it would eventually become a standard.
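
A rough sketch of how such a scheme might look, with both sides reduced
to plain string handling.  Everything here is hypothetical: no such
DOS or TOS extension exists in this form, and pass_args,
effective_args, TAIL_MAX and EXT_MAX are names made up for the
illustration (EXT_MAX takes the 1024 bytes suggested above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAIL_MAX 127    /* what fits in the classic command tail */
#define EXT_MAX  1024   /* size of the proposed extended segment */

/* Shell side: put the full argument string in ext and as much of it as
   fits in tail.  Returns 1 if tail was truncated, 0 if not, and -1 if
   the arguments do not fit in ext either. */
int pass_args(const char *args, char tail[TAIL_MAX + 1], char ext[EXT_MAX])
{
    size_t len;

    assert(args != NULL && tail != NULL && ext != NULL);
    len = strlen(args);
    if (len >= EXT_MAX)
        return -1;
    memcpy(ext, args, len + 1);         /* full list, NUL included */
    if (len > TAIL_MAX) {
        memcpy(tail, args, TAIL_MAX);   /* old programs see a cut-off list */
        tail[TAIL_MAX] = '\0';
        return 1;
    }
    memcpy(tail, args, len + 1);
    return 0;
}

/* Program side: a compliant program prefers the extended list when the
   shell supplied one and falls back to the command tail otherwise. */
const char *effective_args(const char *tail, const char *ext)
{
    assert(tail != NULL);
    return ext != NULL ? ext : tail;
}
```

Non-compliant programs keep reading the truncated tail exactly as they
do today, so nothing breaks; they just do not gain anything.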

++Brandon
-- 
Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X
uunet!hal.cwru.edu!ncoast!allbery  <PREFERRED!>	    ncoast!allbery@hal.cwru.edu
allberyb@skybridge.sdi.cwru.edu	      <ALSO>		   allbery@uunet.uu.net
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>.