std-unix@ut-sally.UUCP (Moderator, John Quarterman) (12/11/85)
Date: Wed, 11 Dec 85 00:32:33 PST
From: l5!gnu@LLL-CRG.ARPA (John Gilmore)

Section 10.1.1 introduces the terms "interpret" and "translate" for "load" and "dump". Can we just use the familiar terms? I have trouble remembering which is which.

[ I think using either of the words "dump" or "restore" (the latter actually used in the section) is a mistake, since they also connote a completely different set of programs than those usually associated with the format in question. -mod ]

This section also says:

> "The format-interpreting utility is defined such that if it is not a
> privileged program, when data is read into the system from the
> transportable media, all protection information is ignored. Instead the
> user ownership and group ownership are set to that of the process context
> which is running the utility. All access protection information can be
> set to be no more liberal than that of the process that is running the
> utility. A privileged version of the utility must have as a minimum, an
> option that obeys the protection information stored on the transportable
> media, such that this format and the corresponding utility can be used as
> a save/restore mechanism."

First, this is self-contradictory: it says all protection information is ignored, then says it can be set "no more liberal" than the process.

[ The utility is not prevented from reading anything in the data because of any protections associated with it. However, once the utility converts the data into files in the file system, there *is* protection associated with it, as with any file: the utility must set the appropriate protection bits. -mod ]

(I would assume the OS takes care of not letting your process set protection more liberal than its own, else there is no security.) I think what it means is that it's not legal for System V "tar" to always chown away the files, which you then can't get back.

[ Cpio actually does that under System V.
Such chowning is a major security problem (to borrow your phrase, "there is no security"), since the numeric user ids on the "tape" may have completely different meanings on the system where it is being read than on the one where it was written. This problem has been addressed in several other places in the standard as well as here. -mod ]

Was there some other reason for this paragraph? If not, can we replace the text with something like:

    "The format-loading utility must not set access protections that
    cannot be revoked by the user running the utility (whether the user
    is privileged or not). If it can be run as a privileged utility, an
    option (or default behaviour) must exist which obeys all the loaded
    protection information, so it can be used for system backups."

---

Also, section 10.1.2 uses confusing terminology with regard to blocks and records. In the data processing world, a block is a big thing and one or more records fit in it (roughly speaking). For example, you might write 100 records of 80 characters each in an 8000-byte block on tape. Has anybody checked the ANSI standard for tape format to see what they call 'em? The Unix standard uses "block" for the small records, "group" for the large things, and also mentions that a "group" might turn into a single tape "record".

I also don't see the need for two records of zeros on the end. One should be fine, and it won't break compatibility with the Unix tar program, which quits as soon as it sees the first one.

Tar should really use EOF rather than this funny end-of-tape record; this would solve two or three minor problems with it, but would break compatibility with existing Unix "tar".
(The problems: the tape is positioned wrong after reading a tar archive from a multi-file tape, since the tape mark has not yet been read; you can't just concatenate tar archives to combine their contents, which would also make multi-volume tar handling somewhat easier; and extra data is written, which makes it uneconomical to use a large, tape-efficient block size, such as a megabyte on streaming cartridge tapes, since this will waste up to a megabyte of space on the tape.)

What I suggest is that ANSI standard tar programs should be required to work correctly when reading an archive terminated by EOF (a short last block, then a zero-length result from read()). Suggested wording:

    An archive tape or file contains a series of records. Each record is
    of size TRECORDSIZE (see below). Although this format may be thought
    of as being on magnetic tape, this does not exclude the use of other
    media. Each file archived is represented by a header record which
    describes the file, followed by zero or more records which give the
    contents of the file. At the end of the archive file there may be a
    record filled with binary zeros as an end-of-file indicator. A
    conforming system must write a record of zeros at the end, but must
    not assume that an end-of-file record exists when reading an archive.

    The records may be blocked for physical I/O operations. Each block of
    n records (where n is set by the application program creating the
    archive file) may be written with a single write() operation. On
    magnetic tapes, the result of such a write is a single tape record.
    When writing an archive, the last block of records shall be written
    at the full size, with records after the zero record containing
    undefined data. When reading an archive, a conforming system shall
    properly handle an archive whose last block is shorter than the rest.
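The two halves of the suggested wording (a reader that stops at a zero record or at plain EOF, whichever comes first, and a writer that always writes one zero record and then fills out the last block) might be sketched in C as follows. This is an illustration, not text from the draft: the function names are hypothetical, and TRECORDSIZE is assumed to be 512 bytes, the V7 tar record size. For simplicity the reader fetches one record at a time, which suits a disk file or pipe; a tape reader would instead read a whole block per read() call and split it into records, but the end-of-archive test would be the same.

```c
#include <stddef.h>
#include <unistd.h>

#define TRECORDSIZE 512  /* assumed: the V7 tar record size */

/* Returns 1 if rec consists entirely of binary zeros, else 0. */
int is_zero_record(const unsigned char *rec)
{
    for (size_t i = 0; i < TRECORDSIZE; i++)
        if (rec[i] != 0)
            return 0;
    return 1;
}

/*
 * Fill rec with the next TRECORDSIZE bytes from fd.  Loops because read()
 * may return fewer bytes than requested.  Returns 1 if a full record was
 * read, 0 on a clean EOF at a record boundary, -1 on a read error or if
 * EOF falls in the middle of a record.
 */
int read_record(int fd, unsigned char *rec)
{
    size_t got = 0;

    while (got < TRECORDSIZE) {
        ssize_t n = read(fd, rec + got, TRECORDSIZE - got);
        if (n < 0)
            return -1;
        if (n == 0)
            return got == 0 ? 0 : -1;
        got += (size_t)n;
    }
    return 1;
}

/*
 * Read the record where a file header is expected.  Returns 1 for a
 * header, 0 at end of archive (a zero record OR plain EOF, so the reader
 * never assumes the zero record exists), -1 on error.  The zero test is
 * applied only here: a data record that happens to be all zeros is
 * ordinary file contents and should be read with read_record() instead.
 */
int next_header(int fd, unsigned char *rec)
{
    int r = read_record(fd, rec);

    if (r != 1)
        return r;
    return is_zero_record(rec) ? 0 : 1;
}

/*
 * Writer side: given the number of records already written and the
 * blocking factor n (which must be at least 1), return how many records
 * to append: one zero record, plus enough filler to complete the last
 * block at full size.  The suggested wording leaves the filler undefined;
 * writing zeros is one safe choice.
 */
size_t trailer_records(size_t records_written, size_t n)
{
    size_t total = records_written + 1;  /* counting the zero record */

    return 1 + (n - total % n) % n;
}
```

Under the suggested wording, a reader built this way accepts both old-style archives ending in a zero record and archives that simply end at EOF.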
This wording allows a system to provide an option to write more modern archives, which will be readable by all P1003 conforming systems, but requires that the default be compatible (readable with V7 Unix 'tar').

---

> /* Values used in typeflag field */
> #define REGTYPE  '0'   /* Regular file */
> #define AREGTYPE '\0'  /* Regular file */
> #define LNKTYPE  '1'   /* Link */
> #define SYMTYPE  '2'   /* Reserved */
> #define CHRTYPE  '3'   /* Char. special */
> #define BLKTYPE  '4'   /* Block special */
> #define DIRTYPE  '5'   /* Directory */
> #define FIFOTYPE '6'   /* FIFO special */
> #define CONTTYPE '7'   /* Reserved */

In the header file, less generic names than e.g. "REGTYPE" should be used. How about "TF_REGULAR" (typeflag = regular file)? This avoids the well-known problem that a #define is a joy (or a pain) forever, especially when some other header file wants to use the same name:

    /* The typeflag defines the type of file */
    #define TF_OLDNORMAL '\0'  /* Normal disk file, compat */
    #define TF_NORMAL    '0'   /* Normal disk file */
    #define TF_LINK      '1'   /* Link to dumped file */
    #define TF_SYMLINK   '2'   /* Symbolic link */
    #define TF_CHR       '3'   /* Character special file */
    #define TF_BLK       '4'   /* Block special file */
    #define TF_DIR       '5'   /* Directory */
    #define TF_FIFO      '6'   /* FIFO special file */
    #define TF_CONTIG    '7'   /* Contiguous file */
    /*
     * All other type values except A-Z are reserved for future standardization
     * and may not be used.  A-Z may be used for implementation-dependent
     * record types.
     */

The mode fields should use a prefix like "TM_" rather than just "T". Also, TSVTX (the sticky bit) cannot be "reserved", otherwise implementations cannot write archives that have it turned on. Call it implementation-defined, if you must.

> All characters are represented in ASCII, using 8-bit characters without
> parity. Each field within the structure is contiguous; that is, there is
> no padding used within the structure. Each character on the archive media
> is stored contiguously.
You'd better be more specific. USASCII, with the 7-bit character in the low-order 7 bits and the high-order bit cleared? What about foreign sites with funny characters in their file names?

> The fields name, linkname, magic, uname and gname are null-terminated
> character strings.

Does this mean that when writing an archive you MUST put in the null, or, if the value exactly fills the field, is it OK not to have a null there? In other words, caveat writer or caveat reader? Here again, a prudent course would be to require the writer to do it right, and require the reader to accept it either way.

> The mtime field is the modification time of the file at the time it was
> archived. It is the ASCII representation of the octal value of the
> modification time obtained from the stat() call.

This should be spelled out in detail, so the definition of the archive format can stand alone.

> ASCII digit `2' is reserved.
> ASCII digit `7' is reserved.
> ASCII letters `A' through `Z' are reserved for custom implementations.
> All other values are reserved for specification in future revisions of the
> standard.

As I understand standards, something that is reserved canNOT be used by an implementation to extend the standard. That is not the intention here, since I presume compatibility with BSD systems (which use 2 for symlinks) is desired. I'm not sure why we don't just standardize symlinks here; after all, not all systems have FIFOs or contiguous files either...

[ They were in there at one point. I wonder what happened to them. -mod ]

> The encoding of the header is designed to be portable across machines.

This sentence can go...

> 10.1.3 Notes
> ...
> Implementors should be aware that the previous file format did not include
> a mechanism to archive directory type files. For this reason, the
> convention of using a file name which ended with a slash (/) was adopted
> to specify the archiving of a directory.

But ANSI standard systems are not required to read such a tape?
I think they should be required to read it but not write it.

An additional point: the standard does not specify which fields are defined in which record types. For example, is it OK to have garbage in the linkname in record type 0 (normal files)? Is it OK to put zeros in the uid/gid fields if you have filled in the uname/gname/magic fields (say, if your system does not have numeric uids)? What about the bytes in the header records that are not defined by the structure? Or the bytes beyond the end of a file, in its last record? I'd suggest that we require these fields to be nulls on writing, and require them to be ignored on reading, again for prudence.

Volume-Number: Volume 4, Number 9
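The prudent writer/reader rules argued for in this article (include the null and zero-fill unused bytes when writing; accept a string field with or without its null when reading) might be sketched in C as below, together with a parser for octal numeric fields such as mtime. This is an editorial illustration, not part of the draft: the function names are hypothetical, field widths are passed in rather than taken from the draft's header layout, and the octal parser assumes a field holds optional leading spaces, ASCII octal digits, then a space or null, as V7 tar writes them. The draft itself says only "the ASCII representation of the octal value", which is exactly the vagueness complained about above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Reader side: copy a fixed-width string field (name, linkname, uname,
 * ...) into dst, which must hold width + 1 bytes.  Accepts the field with
 * or without a terminating null: the copy stops at the first null, or
 * after width bytes if the value exactly fills the field.
 */
void get_string_field(char *dst, const char *field, size_t width)
{
    const char *nul = memchr(field, '\0', width);
    size_t len = nul ? (size_t)(nul - field) : width;

    memcpy(dst, field, len);
    dst[len] = '\0';
}

/*
 * Writer side: store src in a fixed-width field, always leaving room for
 * the null and zero-filling every unused byte so no stray data is written.
 * Returns -1 (field untouched) if src is too long to fit with its null.
 */
int put_string_field(char *field, size_t width, const char *src)
{
    size_t len = strlen(src);

    if (len >= width)
        return -1;
    memset(field, 0, width);
    memcpy(field, src, len);
    return 0;
}

/*
 * Parse a numeric field such as mode or mtime written as ASCII octal.
 * Assumes optional leading spaces, then octal digits, then a space, a
 * null, or the end of the field.  No overflow check.  Returns 0 and
 * stores the value in *out, or -1 if the field is malformed.
 */
int parse_octal(const char *field, size_t width, unsigned long *out)
{
    size_t i = 0;
    unsigned long v = 0;
    int digits = 0;

    while (i < width && field[i] == ' ')
        i++;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; i++, digits++)
        v = v * 8 + (unsigned long)(field[i] - '0');
    if (digits == 0 || (i < width && field[i] != ' ' && field[i] != '\0'))
        return -1;
    *out = v;
    return 0;
}
```

Pairing a strict writer with a tolerant reader means archives written this way are unambiguous, while archives from older or sloppier writers can still be read.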