philip@amdcad.AMD.COM (Philip Freidin) (08/18/88)
Many moons ago I posed the problem of a sort package that I had, that had suddenly stopped working, and had run out of file handles. In my article, I gave a VERY complete and exhaustive description of all the things that I had tried to resolve the problem. The responses were entertaining/irritating, as most of them were irrelevant, as would have been obvious if my original description of what I had tried had been read carefully. People seem to read up to the point that they can form an opinion, and then post. This mode seems to permeate the Net, so there you are. I will treat it as an epidemic of: Deafness of Eye-balls.

So anyway, so as not to bore you all to death with my ramblings, here is a synopsis of the problem, and the surprising resolution. (P.S. it turns out that I lied in my original posting, with regard to "I didn't change Nuthin, an' now it don't work".)

Synopsis of problem:

Bigsort is a program I wrote that does a poly-phase quicksort/mergesort, while pretending that the disk is multiple tape drives. This works well because disks, when read sequentially, transfer data quite fast. The program opens about 8 scratch files, and several others, such as input, index, and report. The input file is about 700K of ASCII. The program now crashes, and reports that it can't open all its files. I tried games involving TSRs, AUTOEXEC.BAT, and CONFIG.SYS. The most misleading thing I did was to boot off a virgin distribution floppy, and even that didn't work. The only thing that worked was to bump up the FILES=xxx in the config.sys file, but why was it needed? Everything used to work.

The solution:

Turns out I had changed autoexec.bat to make it quieter: an ECHO OFF at the beginning, and for each TSR that was loaded, I shut them up as well, like this:

    c:>MARK ALL >nul:
                ^^^^^

What was happening was that each of the TSRs was holding onto the file handle associated with the output redirection, regardless of the fact that it never used it again.
The solution was to either increase files=xxx, or put up with a noisy autoexec.bat. I chose the latter.

Discussion:

The default files=xxx is 8. So when I booted from the floppy, without any config.sys, it failed. My normal config.sys had it set to 20, so my redirects in the autoexec.bat were eating up about 6 of them. Increasing the value in config.sys fixed the problem, as did removing the redirects to nul:. The increase of files=xxx was unacceptable as I couldn't afford the memory. I tracked down the problem by writing a program that reports how many file handles are left. Placing this into the autoexec.bat file at multiple strategic places revealed the problem.

Discovery:

When msdos starts a program, 5 handles are allocated, no matter what you do: STDIN, STDERR, STDOUT, STDAUX:, and STDPRN:. (I am talking about programs written in Microsoft C versions 4.0, 5.0, and 5.1. I am not aware what happens with just run of the mill programs.)

BIG SURPRISE: although you can get at these handles, if you do a close on them, the handle is not released for other uses. Msdos allows 20 file handles max per program, and these are allocated from the pool defined by files=xxx. Therefore programs are in big trouble if they need more than 15 open files at one time. You can have a pool bigger than 20 file handles, but only 20 per program. There are certainly kludges around this but I am not interested.

I hope this is of some interest to someone, since I spent way too long isolating the problem, let alone typing in this monologue.

Philip Freidin @ AMD SUNNYVALE on {favorite path!amdcad!philip)
Section Manager of Product Planning for Microprogrammable Processors
(you know.... all that 2900 stuff...)
"We Plan Products; not lunches" (a quote from a group that has been standing around for an hour trying to decide where to go for lunch)
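The article above mentions a program that reports how many file handles are left, but does not show it. What follows is only a rough sketch of the idea in portable C, not the author's program: keep opening a file until the open fails, count the successes, then close everything again. The function name count_free_handles and its path and cap parameters are made up for illustration. Note that the C runtime may impose its own limit on open streams, so the number reported can be lower than what the operating system would allow.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): count how many more
 * files this process can open right now, up to `cap`.
 * It opens `path` for reading over and over until fopen() fails or `cap`
 * is reached, then closes every stream it opened, so the probe leaves
 * the handle tables as it found them.
 * Returns the count, or -1 if the bookkeeping array cannot be allocated. */
int count_free_handles(const char *path, int cap)
{
    FILE **opened;
    int n = 0;
    int i;

    assert(path != NULL);
    if (cap <= 0)
        return 0;

    opened = malloc((size_t)cap * sizeof *opened);
    if (opened == NULL)
        return -1;

    while (n < cap) {
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            break;              /* out of handles, or path cannot be opened */
        opened[n++] = fp;
    }

    for (i = 0; i < n; i++)     /* give everything back */
        fclose(opened[i]);
    free(opened);
    return n;
}
```

Under DOS one would presumably pass a device name such as "NUL" as the path and print the result from a small main(), then run that program at several points in autoexec.bat; on a Unix-like system "/dev/null" plays the same role.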
dixon@control.steinmetz (walt dixon) (08/18/88)
There are really two separate issues here. Each program segment prefix (PSP) contains the address of a data structure known as the Job File Table (JFT). The default JFT is itself part of the PSP, but it can be moved. [DOS 3.3 function int 21h ah=67h does this. In older DOS versions one could alter the JFT address within the PSP; this change causes problems when the PSP is cloned.] Each JFT entry is one byte long and contains either a System File Number (SFN) or 0FFh. The SFN is an index into another DOS data structure known as the System File Table (SFT). The handle returned by open and create services is an index into the JFT; a JFT value of 0FFh indicates the corresponding handle is unused. Each program has its own PSP, but there is only one copy of the SFT. [There is a separate SFT for FCB access.]

The SFT is the focal point for device independent I/O. Each SFT entry contains the file name (the path portion of the name is removed), file position, owner, reference count, flags, and device driver/device control block (DCB) address. [The DCB is an intermediate data structure for block devices. The DCB contains the address of the block device driver.] Each SFT entry is 35h bytes long. DOS will expand the SFT up to the files= value from config.sys. SFT entries are allocated in groups, but basically form a linked list. The initial block of SFT entries can be found using the undocumented int 21h ah=52h service. ES:[BX+4] contains the address of the first SFT block.

When one opens/creates a file, DOS allocates a handle by scanning the JFT until it finds an unused entry, and then allocates an SFT entry. DOS records the current PSP in the SFT owner field and initializes the reference count to 1. If the "no inherit" bit is set, DOS sets the SFT flags field appropriately. After locating the file/device, DOS examines previously used SFT entries looking for duplicate entries.
If an SFT entry already exists, the newly allocated SFT entry is released and the reference count of the existing entry is incremented. Note that only one SFT entry exists for an open file or device no matter how many times it has been opened.

When a file/device is closed, DOS uses the handle to get to the SFN, which in turn locates the SFT entry. The reference count is decremented. When this count goes to zero, the SFT entry is deallocated. Only the file owner can cause the SFT entry to be deallocated.

When you run a program, COMMAND.COM uses the int 21h ah=4bh load service. The load service loads the program from disk, clones the parent (in this case COMMAND.COM) PSP, makes the new program's PSP the current PSP, and sets up a termination address. Any files opened by COMMAND.COM are propagated to the child program. Normally COMMAND.COM opens stdin, stdout, stderr, stdaux, and stdprn. When a program terminates, DOS scans the PSP, closes all its files, and deallocates any memory blocks owned by the PSP. Closing the files inherited from COMMAND.COM merely decreases the SFT reference count. When a TSR terminates and stays resident, its open files are not closed.

COMMAND.COM and any resident TSRs take up SFT entries. Since an SFT entry takes up a relatively small amount of memory, it is normally not a problem to set files= to a reasonable number. If you like living dangerously, a program which needs an abnormally large number of SFT entries can increase the SFT table size itself and decrease it when it terminates (watch out for TSRs which open files).

A more complete description of these data structures can be found in Chapter 4(?) of the revised MS DOS Developer's Guide which will be published sometime soon (Howard Sams is the publisher). [Although I am the author of this chapter, I get no royalties. Just citing a good reference.]

Walt Dixon
{ARPA:    dixon@ge-crd.arpa }
{US Mail: GE Corp. R&D }
{         PO Box 8 }
{         Schenectady, NY 12345 }
{Phone:   518-387-5798 }
Standard disclaimers apply.
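The bookkeeping described in this article can be sketched as a toy model in portable C. This is an illustration under simplifying assumptions, not DOS code: the type and function names (Sft, Psp, model_open and so on) are invented here, an SFT entry is reduced to its reference count, every open takes a fresh SFT entry (the duplicate-entry check and the owner rule are not modeled), and SFT_MAX is an arbitrary bound for the model.

```c
#include <assert.h>
#include <string.h>

#define JFT_SIZE 20     /* per-program handle table size, per the posts   */
#define SFT_MAX  64     /* arbitrary bound for this model only            */
#define UNUSED   0xFF   /* JFT marker for a free handle                   */

typedef struct { int refcount[SFT_MAX]; int files; } Sft; /* files = FILES= */
typedef struct { unsigned char jft[JFT_SIZE]; } Psp;

void sft_init(Sft *s, int files)
{
    assert(files >= 0 && files <= SFT_MAX);
    memset(s, 0, sizeof *s);
    s->files = files;
}

void psp_init(Psp *p) { memset(p->jft, UNUSED, JFT_SIZE); }

/* Open: find a free JFT slot, then a free SFT entry. Returns the handle
 * (the JFT index) or -1 if either table is full. */
int model_open(Sft *s, Psp *p)
{
    int h, n;
    for (h = 0; h < JFT_SIZE && p->jft[h] != UNUSED; h++) ;
    if (h == JFT_SIZE) return -1;
    for (n = 0; n < s->files && s->refcount[n] != 0; n++) ;
    if (n == s->files) return -1;
    s->refcount[n] = 1;
    p->jft[h] = (unsigned char)n;
    return h;
}

/* Close: handle -> SFN -> SFT entry; the entry is free again at count 0. */
int model_close(Sft *s, Psp *p, int h)
{
    if (h < 0 || h >= JFT_SIZE || p->jft[h] == UNUSED) return -1;
    s->refcount[p->jft[h]]--;
    p->jft[h] = UNUSED;
    return 0;
}

/* Load a child: clone the parent's JFT and bump each inherited count. */
void model_spawn(Sft *s, const Psp *parent, Psp *child)
{
    int h;
    *child = *parent;
    for (h = 0; h < JFT_SIZE; h++)
        if (child->jft[h] != UNUSED) s->refcount[child->jft[h]]++;
}

/* Normal termination closes every handle. A TSR skips this step. */
void model_exit(Sft *s, Psp *p)
{
    int h;
    for (h = 0; h < JFT_SIZE; h++) model_close(s, p, h);
}

/* Number of SFT entries still available in the pool. */
int sft_free(const Sft *s)
{
    int n, count = 0;
    for (n = 0; n < s->files; n++)
        if (s->refcount[n] == 0) count++;
    return count;
}
```

In this model, a program that opens one extra file and then never calls model_exit (standing in for a TSR loaded with its output redirected) leaves that SFT entry pinned for every program that runs afterwards, which is one way to picture the behavior reported in the first article.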