[comp.sys.amiga.tech] Tale of Wildcarding in an OS.

gsarff@sarek.UUCP (Gary Sarff) (03/22/90)
In article <1213@lpami.wimsey.bc.ca>, lphillips@lpami.wimsey.bc.ca (Larry Phillips) writes:
>In <682@xdos.UUCP>, doug@xdos.UUCP (Doug Merritt) writes:
>>The problems with that are:
>>	1) Pervasiveness/universality: if it's not in the shell, that
>>	   generally means that feature is not universally available.
>>	   So I have to remember which commands allow it, which do not,
>>	   which support one flavor of wildcarding, which allow another,
>>	   which use both, which have Super Wildcard Expansions, etc.
>>	   So this is also a "consistency/documentation" argument.
>>	   And this is precisely the problem I currently have.
>
>As you point out, it is a 'consistency/documentation' problem, not,
>as you mention earlier, a 'not in the shell' problem.  As you well
>know, there are commands for which shell expansion of wildcards is
>inappropriate, and because of this, it is sometimes desirable to
>allow the program to determine the way to handle expansion of the
>wildcards.

Witness the countless Unix shell scripts that are made almost visually
incomprehensible by having to use \ to escape every wildcard character that
the shell might process before the utility you are trying to run ever sees
it: "grep" command lines, sed command lines, awk command lines, etc.

>
>>	2) No, you don't need "unix-like commands". Essentially all
>>	   reasonable utilities support a command line list of files.
>>	   Even if they support wildcards, they will not object to
>>	   what appears to be a wildcard-free list of filenames typed
>>	   by hand.
>
> [...] excised.
>
>Now there's the ticket. A simple call to a system library that
>will expand the wildcards, callable by the program _when
>appropriate_. Make it easy to use and cheap in code size, and it
>will be taken into consideration and used by programmers. In the
>example above, the 'mv' command would expand the first argument
>with a call to the system routine, and would either handle the
>second argument itself, or pass the second argument to the
>wildcard expansion, along with the first argument as the target of
>the expansion, rather than whatever currently exists in the file
>system.
>  [...] excised
>
>We have a machine in which the wildcard expansion philosophy is not
>set in stone. Let's not make the mistake of accepting either of
>the 'extremes' of philosophy. Let's look at how we can have the
>best features of both philosophies.

I think from a shell user's point of view at least that having the utilities
do wildcard expansion is preferred to the UNIX shell method, for some of the
reasons listed above, especially the mv *.foo *.bar example.

The OS I develop at work does this, and in my experience (I also manage
some of the Unix systems we sell) it is far easier to _use_.  I am often
scavenging a Unix system's disks for space, going to some directory, typing
the innocuous "ls -l *" or some variant, and being told that there are "too
many files" (how useful) or that the command line is too long.  Having to
fall back on "find" and write a shell script at the terminal is certainly
not shell-user friendly.  This is the wonderful environment that all
developers love?

I'll just mention, by way of example, how the OS I develop does this.
There is a library routine that _all_ utilities provided with
the OS use to do filelist expansion (wildcard expansion).  It recognizes the
syntax:

   *     - Matches zero or more arbitrary characters.
   =     - Matches a single arbitrary character.                             
   []    - Matches any one of a set of characters.  Within the brackets you  
           specify which characters are included in the set and which are    
           excluded.  Use ^ to exclude a character or range of characters.
      [^f-h]* - Matches strings that begin with any letter except f,g, or h
   ()    - Matches a numeric field in the name.  Within the parentheses you  
           specify a list of numeric ranges (see ranges).                    
      *(1-30)* - Matches strings that contain any number between 1 and 30.    

A Filelist is a comma separated list of wildcard specifications, such as
   *.c,*.s,*.o
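
To make the syntax above concrete, here is a rough Python sketch of a matcher
for it.  It is not the library's code: the function names are made up, a
leading ^ is assumed to negate the whole bracket set, and a () field is
assumed to match one complete run of digits whose value lies in a listed range.

```python
DIGITS = "0123456789"

def parse_ranges(spec):
    """Parse '1-30,40,50-' into (low, high) pairs; high is None if open-ended."""
    ranges = []
    for part in spec.split(","):
        low, sep, high = part.strip().partition("-")
        if not sep:
            ranges.append((int(low), int(low)))
        else:
            ranges.append((int(low) if low else 0, int(high) if high else None))
    return ranges

def in_ranges(value, ranges):
    return any(lo <= value and (hi is None or value <= hi) for lo, hi in ranges)

def in_set(ch, body):
    """Test ch against a bracket body such as 'abc', 'a-f' or '^f-h'."""
    negate = body.startswith("^")      # assumption: leading ^ negates the set
    if negate:
        body = body[1:]
    hit = False
    k = 0
    while k < len(body):
        if k + 2 < len(body) and body[k + 1] == "-":
            hit = hit or body[k] <= ch <= body[k + 2]
            k += 3
        else:
            hit = hit or ch == body[k]
            k += 1
    return hit != negate

def wild_match(pattern, name, p=0, n=0):
    """Backtracking match of one wildcard specification against one name."""
    if p == len(pattern):
        return n == len(name)
    c = pattern[p]
    if c == "*":                       # zero or more arbitrary characters
        return any(wild_match(pattern, name, p + 1, k)
                   for k in range(n, len(name) + 1))
    if c == "=":                       # exactly one arbitrary character
        return n < len(name) and wild_match(pattern, name, p + 1, n + 1)
    if c == "[":                       # one character from a set
        end = pattern.index("]", p)
        return (n < len(name) and in_set(name[n], pattern[p + 1:end])
                and wild_match(pattern, name, end + 1, n + 1))
    if c == "(":                       # a numeric field within given ranges
        end = pattern.index(")", p)
        if n > 0 and name[n - 1] in DIGITS:
            return False               # assumption: field is a whole digit run
        stop = n
        while stop < len(name) and name[stop] in DIGITS:
            stop += 1
        return (stop > n
                and in_ranges(int(name[n:stop]), parse_ranges(pattern[p + 1:end]))
                and wild_match(pattern, name, end + 1, stop))
    return n < len(name) and name[n] == c and wild_match(pattern, name, p + 1, n + 1)

def match_filelist(filelist, name):
    """Naive comma split; does not cope with commas inside a () field."""
    return any(wild_match(pat, name) for pat in filelist.split(","))
```

With these assumptions, wild_match("*(1-30)*", "rev12.txt") succeeds while
"rev123.txt" does not, because 123 is taken as a single number.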

The library (and thus _all_ the utilities) also recognizes the switches:
 
   :before=       - Type a date and time.  Files with creation dates (or     
                    modification dates if the :mod switch is specified) that 
                    are earlier than the time given with this switch will be 
                    included in the list of files returned. (see dates)      
                    The default for this switch is all files.                
   :builddir      - If the destination directory, (where applicable), does
                    not exist, create it before performing the specified
                    operation.
   :class=        - Type a list of device classes separated by commas.       
                    Only files that reside on the class(es) of devices given 
                    will be included in the list of files returned.  This is 
                    especially useful in doing network wide wildcarding.     
                    (see devlist)  The default for this is all classes.      
   :exclude=      - Type a list of file designations.  Files that match these
                    criteria (wildcards are allowed) will be explicitly
                    excluded from the list of files returned.  The default   
                    for this switch is no files excluded.                    
   :filesize=     - Type a numeric range of file sizes in K.  Files that     
                    fall within the specified size range will be included in 
                    the list of files returned.  The logical size of the     
                    file is rounded up to the next 1K boundary (i.e.,        
                    :filesize=1 will find files that have a logical size of  
                    1-1024 bytes).  The default for this switch is all files 
                    (range spec 0-).                                         
   :mod           - Use file modification date instead of the file creation  
                    date for comparison.                                     
   :since=        - Type a date and time.  Files with creation dates (or     
                    modification dates if the :mod switch is specified) that 
                    are later than the time given with this switch will be   
                    included in the list of files returned. (see dates)      
                    The default for this switch is all files.                
   :typeselect=   - Type a filetype list or range.  Files that match the set 
                    of filetypes given will be included in the list of files 
                    returned.  (see filetypes)
                    Example filetypes are: Directory, Image, Archive, etc.
   :uic=          - Select based upon the owner of the file.
                    Type a list of uics or usernames.  Usernames are first   
                    converted to their equivalent uics, then files that match
                    the uic specification will be included in the list of    
                    files returned. (see uiclist)                            
                                                                             
   :sort=         - Type DATE, EXTENSION, FILESIZE, FILETYPE, or UIC.  This  
                    switch controls the order in which the files are returned.
                    Only one of the values may be specified.  The default    
                    order is alphabetical based on the filename.  If DATE is 
                    given, the order is ascending dates and times. If        
                    EXTENSION is given, the order is alphabetical based on   
                    the extension.  If FILESIZE, FILETYPE, or UIC are given  
                    the order is ascending numerical values.  Whenever any   
                    of the sort values are equal, the secondary sort field   
                    is the file name.                                        
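
As an illustration only, the selection and ordering rules described for
:since=, :before=, :mod, :filesize= and :sort= could be modelled like this in
Python.  The FileInfo record, the select() function and its parameter names
are invented for the sketch, and sorting FILESIZE by byte count rather than
by rounded K is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FileInfo:                      # hypothetical directory entry record
    name: str
    size_bytes: int
    created: datetime
    modified: datetime

def size_in_k(size_bytes):
    """Round the logical size up to the next 1K boundary (1..1024 bytes -> 1)."""
    return (size_bytes + 1023) // 1024

def select(files, since=None, before=None, use_mod=False,
           min_k=0, max_k=None, sort="NAME"):
    def stamp(f):
        # :mod switches the comparison from creation to modification date
        return f.modified if use_mod else f.created

    chosen = [
        f for f in files
        if (since is None or stamp(f) > since)
        and (before is None or stamp(f) < before)
        and size_in_k(f.size_bytes) >= min_k
        and (max_k is None or size_in_k(f.size_bytes) <= max_k)
    ]
    primary = {
        "NAME": lambda f: f.name,
        "DATE": stamp,
        "FILESIZE": lambda f: f.size_bytes,
        "EXTENSION": lambda f: f.name.rpartition(".")[2] if "." in f.name else "",
    }[sort]
    # The file name is the secondary sort field whenever primary keys are equal.
    return sorted(chosen, key=lambda f: (primary(f), f.name))
```

Here size_in_k(1) and size_in_k(1024) both give 1, matching the :filesize=1
description, and size_in_k(1025) gives 2.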

   
And we also have the :EDIT= switch.
   The :EDIT= switch allows you to specify a list of transformations to be   
   performed on the destination file designation.  The transformations are   
   specified as a series of SEARCH and REPLACE operations.  There may be     
   several transformations specified, with each transformation separated from
   the next by a comma.  The transformations occur one at a time as follows: 
       - The file designation is searched starting at the left for the first 
         occurrence of the SEARCH string.  If the SEARCH string is found, it 
         is replaced with the REPLACE string.                                
       - For the next transformation (if any), the result is searched again,
         starting at the left, for the first occurrence of its SEARCH string.
         If that SEARCH string is found, it is replaced with its REPLACE string.
*  There is nothing to prevent a subsequent transformation from editing the
   result of a previous transformation.
   The SEARCH string is separated from the REPLACE string by a colon (:). A  
   NULL SEARCH string (":REPLACE") means to replace the entire source string 
   with the REPLACE string.  A NULL REPLACE string ("SEARCH:") means to      
   delete the SEARCH string.  An ampersand (&) on the REPLACE side of the    
   transformation means to substitute the entire SEARCH string in its place. 
                                                                             
   You can use accept sequences to include colons (:), commas (,), or        
   ampersands (&) in the SEARCH and/or REPLACE strings.

*  There is nothing that prohibits you from specifying a transformation
   that results in an illegal file designation.
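
The :EDIT= rules above are simple enough to sketch in a few lines of Python.
This is only a model of the description, not the real routine: apply_edits()
is a made-up name, and the accept sequences for literal colons, commas and
ampersands are left out.

```python
def apply_edits(designation, edit_spec):
    """Apply a comma-separated list of SEARCH:REPLACE transformations in order."""
    for transform in edit_spec.split(","):
        search, _, replace = transform.partition(":")
        # An ampersand on the REPLACE side stands for the whole SEARCH string.
        replace = replace.replace("&", search)
        if search == "":
            # A NULL SEARCH string replaces the entire designation.
            designation = replace
        else:
            # Only the first occurrence, scanning from the left, is replaced;
            # a NULL REPLACE string therefore deletes the SEARCH string.
            designation = designation.replace(search, replace, 1)
    return designation
```

So the "mv *.foo *.bar" case becomes apply_edits("report.foo", ".foo:.bar"),
giving "report.bar", and ".c:&.bak" turns "main.c" into "main.c.bak".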
   
Most of the above was pulled from the online help files for wildcarding.
Also, all the switches can be abbreviated as long as the abbreviations are
unambiguous; :since= could be :s= if no other switch started with "s".
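
A minimal Python sketch of that abbreviation rule, assuming the switch names
listed in this post are the whole set and that matching ignores case (both
are guesses; resolve_switch() is a hypothetical name):

```python
SWITCHES = ["before", "builddir", "class", "exclude", "filesize", "mod",
            "since", "typeselect", "uic", "sort", "edit"]

def resolve_switch(abbrev, switches=SWITCHES):
    """Expand an abbreviated switch name if it identifies exactly one switch."""
    key = abbrev.lower()
    if key in switches:                # an exact name always wins
        return key
    matches = [s for s in switches if s.startswith(key)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown switch: " + abbrev)
    raise ValueError("ambiguous switch %s: could be %s"
                     % (abbrev, ", ".join(matches)))
```

With this list "sin" resolves to "since", while "s" is rejected because it
could mean either "since" or "sort".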

      The upshot of this is that I can use the same syntax in COPY,
   RENAME, DELETE, ARCH (to make archive files), BACKUP, RESTORE (to restore
   from backup tapes), DIR, etc., and not have to worry about whether
   some utility will work properly or not.
   I like being able to say things like

  DIR /.*/*.c :sin=03-mar-1990_12:00:00 :before=05-mar-1990_16:00:00 \
          :filesize=20- :sort=DATE
  
  and see all my *.c files in all my subdirectories that were created between
  noon March 3, and 4pm March 5, and are 20K or greater in size, sorted by
  their creation date.

  :since and :before also recognize delta times, e.g., +4 (four days from now)
  and -02:00 (two hours ago), the keywords YESTERDAY, TODAY, and TOMORROW, and,
  for the time, CURRENT; thus TODAY_CURRENT means right now.
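
One possible reading of these date forms, sketched in Python.  The function
name parse_when() is invented, and several details are assumptions rather
than documented behaviour: a bare number is a count of days, an hh:mm delta
is hours and minutes, and a date given without a time means midnight.

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def parse_when(text, now):
    """Turn a :since=/:before= value into a datetime, relative to 'now'."""
    text = text.upper()
    if text[0] in "+-":                          # delta time
        sign = 1 if text[0] == "+" else -1
        body = text[1:]
        if ":" in body:                          # e.g. -02:00, two hours ago
            hours, minutes = body.split(":")
            return now + sign * timedelta(hours=int(hours), minutes=int(minutes))
        return now + sign * timedelta(days=int(body))   # e.g. +4, four days on

    date_part, _, time_part = text.partition("_")
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    offsets = {"YESTERDAY": -1, "TODAY": 0, "TOMORROW": 1}
    if date_part in offsets:
        day = midnight + timedelta(days=offsets[date_part])
    else:                                        # e.g. 03-mar-1990
        d, mon, y = date_part.split("-")
        day = datetime(int(y), MONTHS.index(mon) + 1, int(d))

    if time_part == "":
        return day                               # assumption: midnight
    if time_part == "CURRENT":
        return day.replace(hour=now.hour, minute=now.minute, second=now.second)
    h, m, s = (int(x) for x in time_part.split(":"))
    return day.replace(hour=h, minute=m, second=s)
```

Passing "now" in explicitly keeps the sketch deterministic; a real utility
would presumably read the system clock.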

To the programmer this is done with two library calls: initwild(), which
takes the filespec list, edit list, since, before, filetype, sort, etc.
values (or null if not specified) as arguments and initializes the wildcard
routines, and getnextfile(), which returns the next file that matches the
user's specification.  Pretty easy, actually; you don't even have to open any
directories or files yourself.
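
The shape of that two-call interface can be imitated in Python as below.
Only the names initwild() and getnextfile() come from the description above;
the argument list is cut down to a directory, a filelist and a sort key, and
Python's fnmatch (Unix-style patterns) stands in for the real wildcard syntax.

```python
import fnmatch
import os

def initwild(directory, filespecs, sort=None):
    """Stand-in for initwild(): gather the matching names and return a state."""
    patterns = filespecs.split(",")
    with os.scandir(directory) as entries:
        names = [e.name for e in entries
                 if e.is_file()
                 and any(fnmatch.fnmatchcase(e.name, p) for p in patterns)]
    names.sort()                      # default order: alphabetical by name
    if sort == "FILESIZE":
        # list.sort is stable, so equal sizes stay in file name order
        names.sort(key=lambda n: os.path.getsize(os.path.join(directory, n)))
    return iter(names)

def getnextfile(state):
    """Stand-in for getnextfile(): next matching name, or None when done."""
    return next(state, None)

def list_matches(directory, filespecs, sort=None):
    """What a utility's main loop might look like; it never opens a directory."""
    state = initwild(directory, filespecs, sort)
    found = []
    name = getnextfile(state)
    while name is not None:
        found.append(name)
        name = getnextfile(state)
    return found
```

The point of the design is visible in list_matches(): the utility only loops
over getnextfile() and all directory handling stays inside the library.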

Important note: command line switches are "keywords", so they can come
anywhere on the command line, intermingled with filelists, _in any order_.  

  for example  
       DIR :SINCE=YESTERDAY *.c,*.pas >tmplist :SORT=FILESIZE :MOD
(List *.c and *.pas files modified since yesterday, sorted by their filesize,
 and send the output to a file called tmplist.)

I/O redirection can also appear anywhere, using < for input, > for output,
and ^ to redirect Standard Error.  The shell provides for the I/O
redirection.  The utilities can get their command lines, nicely parsed with
switches disambiguated, etc., by calling one library routine, cmdline(), and
getting back an array containing all kinds of useful information about the
command line in a standard form, useful for passing to other library routines
such as the wildcarding routines.
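
To show why keyword switches can sit anywhere on the line, here is a small
Python sketch of such a parse.  It borrows the name cmdline() but the result
layout is invented, the redirection target is assumed to be attached to its
operator as in the example, and on the real system the shell, not the
utility, handles the redirection.

```python
def cmdline(line):
    """Sort the tokens of a command line into switches, filelists and redirects."""
    parsed = {"command": None, "filelists": [], "switches": {},
              "stdin": None, "stdout": None, "stderr": None}
    redirect = {"<": "stdin", ">": "stdout", "^": "stderr"}
    tokens = line.split()
    parsed["command"] = tokens[0]
    for tok in tokens[1:]:
        if tok[0] == ":":
            # :NAME=value or a bare :NAME flag; order on the line is irrelevant
            name, sep, value = tok[1:].partition("=")
            parsed["switches"][name.lower()] = value if sep else True
        elif tok[0] in redirect and len(tok) > 1:
            parsed[redirect[tok[0]]] = tok[1:]
        else:
            parsed["filelists"].append(tok)
    return parsed
```

Run on the DIR example above, this yields the filelist "*.c,*.pas", the
switches since, sort and mod, and tmplist as the output target, whatever
order the pieces were typed in.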

Now, I may be biased, 1) since I develop the OS, and 2) since I have used
it so long that this is what I am used to, but I much prefer this command
line behaviour to the shell expansion facilities, at least as they are
provided by most UNIX shells.  Yes, users of our OS can on their own systems
choose not to link the command line parsing and/or wildcard library into
their code and thus confuse their users, but hopefully the users will then
ask said programmer to use the library so they won't be confused by his
program.  And I have never seen one of our utilities say "too many files"
like UNIX "ls" does, even recently when I was testing a new 1.2 Gigabyte SCSI
drive and copied 25,000+ files into one directory and did a DIR on it,
telling it to sort them by filesize.  It took a couple of minutes, but it
did do it.

The bottom line is consistency for the user, which may fairly often be in
opposition to ease of development for the programmer of the utility or library
(these libraries were not easy to write).  After all, users (as in the
movie TRON) should be thought of as our gods; we do want _someone_ to use our
programs, don't we?  We are here to serve them in a sense; they are
not here to serve us.  Unix wildcarding (but not the utilities' command line
switches, unfortunately) is consistent too; not optimal in my opinion, but it
is consistent.  Consistency for the user, whatever wildcard method, syntax,
etc. is chosen, should be top priority for programmers.

----------------------------------------------------------------------------
                       I _DON'T_ live for the leap!