[comp.lang.c++] Filesystem classes: how to?

gilstrap@swbatl.sbc.com (3929) (06/07/90)

Okay, anyone want to give a stab at a set of classes to implement a Unix file
system?  My initial thoughts were:

                              FileSystemNode
                                   |
               +------------+--------------+-------+---------+
               |            |              |       |         |
           Directory  CharacterFile   BlockFile  Link  SymbolicLink
                            |
                  +-------------------+
             RegularFile     CharacterSpecialFile

However, since C++ won't let you cast down from a base class to a derived
class, I can't open up a directory (which would ideally contain a set of
FileSystemNodes), examine each entry, and work with it as the kind-of object
it is.  That is, if I'm wanting to traverse the directories and print out
the name of each FileSystemNode and then recursively traverse all directories,
how do I treat Directory objects as directory objects if directories contain
FileSystemNodes (which means I can't get into Directories since they are only
recognized as being a FileSystemNode).

I suppose the Directory class could keep lists of CharacterFile's and
BlockFile's and ...  However, this means that the Directory class has to change
each time a new derived class of FileSystemNode is created.  Bad news.

Anyone have any suggestions?

Brian R. Gilstrap
uucibg@swbatl.uucp OR ...!{ texbell, uunet }!swbatl!uucibg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"You spend your whole life just piling it up there.  You got stacks and stacks
and stacks.  Then Gabriel comes and taps you on the shoulder, but you don't see
no hearses with luggage racks."
                             --- Don Henley
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Copyright (c) 1990 by Brian R. Gilstrap.  You may redistribute (my portions of)
this material for free if and only if your recipients may also do so.

dld@F.GP.CS.CMU.EDU (David Detlefs) (06/08/90)

Brian Gilstrap presents an a problem in object oriented design.  I
think it is an excellent example, and want to take a shot at answering
his concerns.  He wants to organize a Unix file system in an
object-oriented manner.  He presents the following class structure:

>
>                              FileSystemNode
>                                   |
>               +------------+--------------+-------+---------+
>               |            |              |       |         |
>           Directory  CharacterFile   BlockFile  Link  SymbolicLink
>                            |
>                  +-------------------+
>             RegularFile     CharacterSpecialFile

He then asks:

> However, since C++ won't let you cast down from a base class to a derived
> class, I can't open up a directory (which would ideally contain a set of
> FileSystemNodes), examine each entry, and work with it as the kind-of object
> it is.  That is, if I'm wanting to traverse the directories and print out
> the name of each FileSystemNode and then recursively traverse all directories,
> how do I treat Directory objects as directory objects if directories contain
> FileSystemNodes (which means I can't get into Directories since they are only
> recognized as being a FileSystemNode).
> 
> I suppose the Directory class could keep lists of CharacterFile's and
> BlockFile's and ...  However, this means that the Directory class has to change
> each time a new derived class of FileSystemNode is created.  Bad news.

It would be bad indeed if the Directory class had to change each time a new
derived class were created.  Avoiding this, after all, is the purpose
of inheritance.

Let's examine the specific question Brian poses: how to print the
names of all the files "in" a filesystem node.  I'll assume that
everything other than a directory is a single file that has a single
name.

The answer is to use virtual functions.  Define a virtual "print_names"
function on FileSystemNode.  This will do different things depending
on the actual type of the object it's invoked upon.  Here is some
partial code.

class FileSystemNode {
  private:
    char* name;
    // ...
  public:
    virtual void print_names(int indent =0) {
	cout << str("", indent) << name << "\n";
    }
};

class Directory : virtual public FileSystemNode {
  private:
    Set<FileSystemNode> entries;
  public:
    void print_names(int indent =0);
};

void Directory::print_names(int indent) {
   FileSystemNode::print_names(indent);
   SetIttr<FileSystemNode> iter(entries);
   for (FileSystemNode fsn = iter.get(); !iter.done(); iter++)
      fsn.print_names(indent+4);	
}

Notes:
I'm assuming the existence of Templates here, to implement the
parameterized Set type.  The SetIttr type provides an iterator that
allows one to iterate over all elements of a set.  It provides "get",
"done" and "++" operations, with the obvious interpretations.

I added indentation so it prints out in a tree style.

I assumed that every FileSystemNode has a string name, so I store this
data in the private data of that class.  Note the mileage one gets out
of providing a default implementation of print_names() in the base
class.  Only Directory need redefine print_names().


More generally, you'd like to be able to traverse the file system and
do things other than print files.  To extend this solution to that
concept, you'd have to provide a FileSystemNode virtual function for
every other thing you'd like to do in a traversal.  It would be nice
if these all had the same signature.  Then you could provide a virtual
"traverse_and_apply" function that took a pointer to member function
as its argument.  For everything except directories,
traverse_and_apply would simply invoke it's argument on "this."  For
directories, it would invoke its argument on "this," and then call
traverse_and_apply with the same arguement for each FileSystemNode in
the directory.

I hope this helps, and invite comments and discussion.

Dave


--
Dave Detlefs			Any correlation between my employer's opinion
Carnegie-Mellon CS		and my own is statistical rather than causal,
dld@cs.cmu.edu			except in those cases where I have helped to
				form my employer's opinion.  (Null disclaimer.)

roman@sparc7.hri.com (Roman) (06/08/90)

In article <1567@swbatl.sbc.com>, gilstrap@swbatl.sbc.com (3929) writes:
> Okay, anyone want to give a stab at a set of classes to implement a Unix file
> system?  My initial thoughts were:
> 
>                               FileSystemNode
>                                    |
>                +------------+--------------+-------+---------+
>                |            |              |       |         |
>            Directory  CharacterFile   BlockFile  Link  SymbolicLink
>                             |
>                   +-------------------+
>              RegularFile     CharacterSpecialFile
> 
[deleted]
> Anyone have any suggestions?
> 
> Brian R. Gilstrap
> uucibg@swbatl.uucp OR ...!{ texbell, uunet }!swbatl!uucibg
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~
> "You spend your whole life just piling it up there.  You got stacks
and stacks
> and stacks.  Then Gabriel comes and taps you on the shoulder, but you
don't see
> no hearses with luggage racks."
>                              --- Don Henley
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~
> Copyright (c) 1990 by Brian R. Gilstrap.  You may redistribute (my
portions of)
> this material for free if and only if your recipients may also do so.

You have to design everything into FileSystemNode, that is create
virtuals for all possible subclasses, so you can access them through the
base pointer. You also have to somehow know the type in run time ( or
have the base class virtuals respond appropriately).
Not very pretty, but the only way I found. I guess it's OK when you
design the whole system. Hard to extend tho.

williams@umaxc.weeg.uiowa.edu (Kent Williams) (06/08/90)

The Choices Operating System from the University of Ill. at Champaign
implements a hierarchy of different file systems.  Look at what they did.
I don't have the docs in front of me (or the source for that matter) but
the Champaign guys do look at this news group.

I don't know how you could get around the cast-down-the-inheritance-tree
problem.  Ideas?


--
Kent Williams                    'Look, I can understand "teenage mutant ninja 
williams@umaxc.weeg.uiowa.edu    turtles", but I can't understand "mutually 
williams@herky.cs.uiowa.edu      recursive inline functions".' - Paul Chisholm

dld@F.GP.CS.CMU.EDU (David Detlefs) (06/09/90)

Dave
Brian Gilstrap sends another scenario, and asks for comments:

If you absolutely, positively, can't use virtual functions to do
whatever you need to do (and it's a different way of thinking, I've
run in to quite a number of situations were I think "shoot, there's no
*way* I'm gonna be able to use virtual functions!" but I was
eventually able to), well, If you've really gotta be able to get a
CharacterFile out of something your program is thinking of as a
FileSystemNode, then I'd do something like this:

class FileSystemNode {
  // ...
  public:
    virtual CharacterFile* toCharacterFile() { return NULL; }
}

class CharacterFile : virtual public FileSystemNode {
  public:
    CharacterFile* toCharacterFile() { return this; }
}

Given a FileSystemNode object, I can then invoke toCharacterFile and
get back either NULL or a pointer to the object (safely) considered as a
CharacterFile.  Note that if there is ever a *really* good C++ compiler
it may be able to optimize the invocations of these virtual functions
away entirely, since they are inline.

Drawbacks:

By symmetry, you'd want a toX function in FileSystemNode for every
subclass X.  Therefore, when you add a new subclass Y to FileSystemNode,
you need to add a new toY function to the definition of
FileSystemNode.  This can require a lot of recompilation (virtual
table size changes, for one thing), but no reprogramming -- previous
code stays correct.

For what it's worth...

--
Dave Detlefs			Any correlation between my employer's opinion
Carnegie-Mellon CS		and my own is statistical rather than causal,
dld@cs.cmu.edu			except in those cases where I have helped to
				form my employer's opinion.  (Null disclaimer.)

greg@cheers.uucp (Greg Onufer) (06/10/90)

gilstrap@swbatl.sbc.com (3929) writes:
>Okay, anyone want to give a stab at a set of classes to implement a Unix file
>system?  My initial thoughts were:

>                              FileSystemNode
>                                   |
>               +------------+--------------+-------+---------+
>               |            |              |       |         |
>           Directory  CharacterFile   BlockFile  Link  SymbolicLink
>                            |
>                  +-------------------+
>             RegularFile     CharacterSpecialFile

Take a look at the SunOS vnode system.  It provides for a number of
operations that can be performed on a vnode.  Now, if you were to
implement a base class that had virtual functions to mimic some of the
vnode functions, then derived classes could implement them in any way
they desired.  I believe there was a paper at a Usenix conference not
too long ago about the vnode interface.  It may provide you with some
ideas. 

Cheers!greg

roger@procase.UUCP (Roger H. Scott) (06/12/90)

In article <1567@swbatl.sbc.com> gilstrap@swbatl.UUCP (Brian Gilstrap - UCI - 5-3929) writes:
>However, since C++ won't let you cast down from a base class to a derived
>class,
1.  I don't know what you mean by this statement - it is patently untrue as
I read it.
2.  There is a way of doing this that doesn't require any casts at all (I
actually saw this a few weeks ago in this news group, but I can't remember
whose idea it was):

    #define redefine
    class Base;
    class Der1;
    class Der2;

    class Base {
    public:
	virtual Dir1 *asDir1() {return NULL;}
	virtual Dir2 *asDir2() {return NULL;}
    };

    class Der1 : public Base {
    public:
	redefine Dir1 *asDir1() {return this;}
    };

    class Der2 : public Base {
    public:
	redefine Dir2 *asDir2() {return this;}
    };

    ...
	Base *b = ...;
	Dir1 *d1 = b->asDir1();
	if (d1 != NULL) {
	    // do Dir1-specific stuff
	    ...
	}

Some people object to this approach on the grounds that it requires class Base
to know about all of the subclasses such as Dir1 and Dir2.  This is not true.
Base only needs to know about those subclasses that it is able to "become".
Since this type transformation is an capability of class Base I see no reason
why it should not know about (at least the existence of) these subclasses.  Note
that Base only really needs to know about its immediate subclasses - its
subclasses can take care of any further type conversion.

hardin@hpindda.HP.COM (John Hardin) (06/15/90)

roger@procase.UUCP (Roger H. Scott) writes

> Some people object to this approach on the grounds that it requires class Base
> to know about all of the subclasses such as Dir1 and Dir2.  This is not true.
> Base only needs to know about those subclasses that it is able to "become".
> Since this type transformation is an capability of class Base I see no reason
> why it should not know about (at least the existence of) these subclasses.  Note
> that Base only really needs to know about its immediate subclasses - its
> subclasses can take care of any further type conversion.
----------

I think you can understand the objection to this better when you think of
it in terms of reusing code from a library of classes in general use around
your lab/company/industry.  If you need to add a Dir3 to this scheme and
Base is from a library of classes not controlled by you, then you have a
problem.  What if you were told that each change to a class from the 
library required a written justification to be submitted to the lab 
librarian and consideration by a review board of representatives from
functional areas in your organization that use the library and depend
upon its quality and stability?  Would you feel any differently about a 
scheme that required base class modification each time you expanded the
number of subclasses that base class could 'become'?

I think a lot of the disagreements on the 'cleanness' of one or another
technique arise from a different view of the constraints under which
we will be working in the future.  Personally, I look forward to a day
when we will all use standard classes that implement the more mundane
chores of programming.  (I can't even guess how many times I've 
implemented a linked list of something or other in my career.)

John Hardin
hardin@hpindgh.hp.com