jwindley@matt.ksu.ksu.edu (Jay Windley) (11/18/90)
tcurrey@x102a.ess.harris.com (currey tom 76327) writes:
>> How do you find the # of and locations of all links to a file?

chuck@trantor.harris-atd.com (Chuck Musciano) writes:
> This is an easy one.  You cannot.
>
> Well, sort of.  You cannot determine which hard links to a file exist
>without examining all the directories in a given file system, looking for
>the specific inode of the file in question.  Does anyone know of a tool to
>do this?

SunOS% find /foo -inum <num> -print

where /foo is the mount point of the filesystem and <num> is the inode
number will display the paths of all hard links to an inode.

> Symbolic links are tougher.  Since sym-links can span file systems and
>NFS, you are not guaranteed to ever find all of them, only the ones in
>file systems you have access to.  You need to use find to find all
>symbolic links, and then examine the link to see if it points to the file
>in question.  This can be tough, since some links are quite circuitous
>and not at all obvious.

If you really want to, from a csh executing with root permissions enter
the following command:

SunOS# find / -type l -exec file {} \; | egrep <fname> > find.out

where <fname> is any hard link to the file in question.  This will bog
your machine down significantly, so use at your own risk.  Upon
completion, find.out will contain a list of symbolic links to the file.

> Easiest way: remove the file in question.  Wait for the phone to ring.

Well, I suppose this would work too, unless the file in question belongs
to your boss :-).
--
Jay Windley - CIS Dept. - Kansas State University
NET: jwindley@matt.ksu.ksu.edu
VOICE: (913) 532-5968   FAX: (913) 532-6722
USnail: 323 Seaton Hall, Kansas State Univ., Manhattan, KS 66506
Obligatory quote: "" -- /dev/null
jfh@rpp386.cactus.org (John F. Haugh II) (11/18/90)
In article <1990Nov17.203012.28052@maverick.ksu.ksu.edu> jwindley@matt.ksu.ksu.edu (Jay Windley) writes:
>tcurrey@x102a.ess.harris.com (currey tom 76327) writes:
>>> How do you find the # of and locations of all links to a file?
>chuck@trantor.harris-atd.com (Chuck Musciano) writes:
>> This is an easy one.  You cannot.
>>
>> Well, sort of.  You cannot determine which hard links to a file exist
>>without examining all the directories in a given file system, looking for
>>the specific inode of the file in question.  Does anyone know of a tool to
>>do this?
>
>SunOS% find /foo -inum <num> -print
>
>where /foo is the mount point of the filesystem and <num> is the inode
>number will display the paths of all hard links to an inode.

Lest I be accused of somehow "breaking" find(1), the above command will
not work if there are any directories mounted on "/foo" which contain a
file with the same i-number.  Since "/" is a directory which is
frequently mounted on (;-), I think this is a real problem.  The -xdev
option can be used to keep find on the same file system, but it is not
a "standard" option.
--
John F. Haugh II        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832 Domain: jfh@rpp386.cactus.org
"SCCS, the source motel!  Programs check in and never check out!"
        -- Ken Thompson
bzs@world.std.com (Barry Shein) (11/22/90)
>> How do you find the # of and locations of all links to a file?
>
> This is an easy one.  You cannot.
>
> Well, sort of.  You cannot determine which hard links to a file exist
>without examining all the directories in a given file system, looking for
>the specific inode of the file in question.  Does anyone know of a tool to
>do this?

% ls -i foo
4924 foo
% find /mount-point -inum 4924 -print
--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
spaf@cs.purdue.EDU (Gene Spafford) (11/26/90)
In article <BZS.90Nov21184414@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>>> How do you find the # of and locations of all links to a file?
>>
> % ls -i foo
> 4924 foo
> % find /mount-point -inum 4924 -print

Somewhat faster, but requiring root access, you can use the "ncheck"
command if your system has it:

# ncheck -i 4924 /dev/rsd1c
--
Gene Spafford
NSF/Purdue/U of Florida Software Engineering Research Center,
Dept. of Computer Sciences, Purdue University, W. Lafayette IN 47907-2004
Internet: spaf@cs.purdue.edu    uucp: ...!{decwrl,gatech,ucbvax}!purdue!spaf
bzs@world.std.com (Barry Shein) (11/26/90)
>Somewhat faster, but requiring root access, you can use the "ncheck"
>command if your system has it:
>
># ncheck -i 4924 /dev/rsd1c
>
>--
>Gene Spafford

Three cheers for V6 commands...I just used ncheck tonight to unscrew a
file system (well, to figure out what fsck was complaining about with
some dup'd inodes; ncheck told me the exact paths so I could jot them
down and make sure the files were on backup tape just in case, before
clearing them.  It certainly lessened the confusion.)

If your system has them, I heartily recommend that everyone read the
manual pages for ncheck, dcheck and icheck before you need them.  Oft
overlooked tools to use before you start saying yes to fsck.
--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
jquinn@uk.oracle.com (John Quinn) (11/26/90)
bzs@world.std.com (Barry Shein) writes:
>>> How do you find the # of and locations of all links to a file?
>>
>> This is an easy one.  You cannot.
>>
>> Well, sort of.  You cannot determine which hard links to a file exist
>>without examining all the directories in a given file system, looking for
>>the specific inode of the file in question.  Does anyone know of a tool to
>>do this?

> % ls -i foo
> 4924 foo
> % find /mount-point -inum 4924 -print
>--
> -Barry Shein
>Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
>Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

ncheck is the tool for the job.

John D. Quinn.
jonb@specialix.co.uk (Jon Brawn) (11/26/90)
jquinn@uk.oracle.com (John Quinn) writes:
>bzs@world.std.com (Barry Shein) writes:
>>>> How do you find the # of and locations of all links to a file?
>>> This is an easy one.  You cannot.
>>> Well, sort of.  You cannot determine which hard links to a file exist
>>>without examining all the directories in a given file system, looking for
>>>the specific inode of the file in question.  Does anyone know of a tool to
>>>do this?
>> % ls -i foo
>> 4924 foo
>> % find /mount-point -inum 4924 -print
>> -Barry Shein
>ncheck is the tool for the job.
>John D. Quinn.

OK, so that's found all the real links.  Reading the subject line, how
do you find all the *symbolic* links?  How do I access a symbolic link
file to find out that it IS a symbolic link (I mean from within C; I
assume stat( char *filename ) is going to stat the pointed-to file)?
Does 'find' have a wonderful flag for finding symlinks?  Do any of
ncheck/icheck/dcheck/fsck/fsdb/anything understand them?

Also, what about old crusties like 'tar' and 'cpio'?  What do they do?
I assume they use good old fashioned stat (not having src I can't
check), so aren't they going to be conned into making actual files
instead of symbolic links??????  (Oh mummy!  I don't like the sound of
that!)

Just curious this time....
--
Jon?
--
jonb@specialix.co.uk
"Never be sorry for a might have been."
mjr@hussar.dco.dec.com (Marcus J. Ranum) (11/27/90)
jonb@specialix.co.uk (Jon Brawn) writes:
>[...] How do I access a symbolic link
>file to find out that it IS a symbolic link (I mean from within C, I
>assume stat( char *filename ) is going to stat the pointed too file?)

        Use lstat(2), of course.

>Does 'find' have a wonderful flag for finding symlinks?

        Read the manual page on find(1).

>Also, what about old crusties like 'tar' and 'cpio'?  What do they do?

        tar and cpio [assuming on cpio - I don't use it] use lstat(2)
instead of stat(2), as they should.  tar also keeps a table of the inode
#s of files it has already dumped, and makes a note to make a hard link
to the file instead of just storing 2 copies.

mjr.
--
        Good software will grow smaller and faster as time goes by and
the code is improved and features that proved to be less useful are
weeded out.     [from the programming notebooks of a heretic, 1990]
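[Editor's aside: the inode bookkeeping described above can be sketched in a
few lines of C.  This is a hedged illustration only, not tar's actual
source; the names seen_before and MAXSEEN are made up for the sketch.]

```c
#include <sys/types.h>

#define MAXSEEN 1024            /* arbitrary table size for this sketch */

static struct { dev_t dev; ino_t ino; } seen[MAXSEEN];
static int nseen;

/* Return 1 if this (device, inode) pair was recorded before, i.e. the
 * path is a hard link to something already dumped; otherwise record the
 * pair and return 0.  An archiver would call this with the st_dev and
 * st_ino fields from lstat(2) before writing each file's data. */
int seen_before(dev_t dev, ino_t ino)
{
    int i;

    for (i = 0; i < nseen; i++)
        if (seen[i].dev == dev && seen[i].ino == ino)
            return 1;
    if (nseen < MAXSEEN) {      /* a real archiver would grow the table */
        seen[nseen].dev = dev;
        seen[nseen].ino = ino;
        nseen++;
    }
    return 0;
}
```

The device number matters as much as the inode number: two files on
different file systems can share an i-number without being links to each
other.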
jik@athena.mit.edu (Jonathan I. Kamens) (11/27/90)
(This is bordering on belonging in comp.unix.programmer rather than
comp.unix.internals, but it's close enough to the edge that I see no
reason to attempt to shift the discussion into c.u.p, especially since
any such attempt will invariably fail miserably. :-)

In article <1990Nov26.150716.7268@specialix.co.uk>, jonb@specialix.co.uk (Jon Brawn) writes:
|> OK, so thats found all the real links. Reading the subject line, how
|> do you find all the *symbolic* links? How do I access a symbolic link
|> file to find out that it IS a symbolic link (I mean from within C, I
|> assume stat( char *filename ) is going to stat the pointed too file?)

The lstat(2) system call (which, at least on my system, appears on the
same man page as stat(2), so it shouldn't have been that difficult for
you to find by RTMing {Wow, RTM as a verb!  My English teacher would
roll over in her grave if she were dead, but she's not.}) "is like stat
except in the case where the named file is a symbolic link, in which
case lstat returns information about the link, while stat returns
information about the file the link references."  Furthermore,
readlink(2) will read the contents of a symbolic link so that you can
find out where it actually points.

|> Does 'find' have a wonderful flag for finding symlinks?

Well, my version of find (4.3 BSD) allows you to use "-type l" to
search for files that are symbolic links.  It does not, however,
provide any facility for reading the contents of the link.  It also
doesn't follow links -- they are treated as files, and where they point
is irrelevant.  I believe that there are other versions of find that do
things differently.

|> Do any of ncheck/icheck/dcheck/fsck/fsdb/anything understand them?

For ncheck, icheck, dcheck, and fsck, the answer is almost certainly
no: a symbolic link is just treated as a file with an extra bit set in
the inode, and that bit is ignored for the purpose of filesystem
consistency checks and path tracing of the type that ncheck, icheck,
dcheck, and fsck do.  I would suspect that this is also the case for
fsdb, although I've never used it so I don't know for sure.

|> Also, what about old crusties like 'tar' and 'cpio'? What do they do?
|> I assume they use good old fashioned stat (not having src I can't check),
|> so aren't they going to be conned into making actual files instead of
|> symbolic links?????? (Oh mummy! I don't like the sound of that!)

*Smart* versions of tar and cpio will use lstat instead of stat so that
they can detect and correctly archive symlinks, the same way that tar
detects hard links and recreates them when extracting an archive.  When
a vendor adds symbolic links to their filesystem and then forgets to
update things like tar and cpio to understand them, they are just being
stupid.  I believe that most systems that have symlinks also have a
version of tar that supports them, but there are some systems that have
symlinks but don't have a version of cpio that understands them.
--
Jonathan Kamens                         USnail:
MIT Project Athena                      11 Ashford Terrace
jik@Athena.MIT.EDU                      Allston, MA  02134
Office: 617-253-8085                    Home: 617-782-0710
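[Editor's aside: the lstat/readlink combination described above looks
roughly like this in C.  A hedged sketch: link_contents is a hypothetical
helper, and S_ISLNK is spelled as in modern headers; very old systems
compared (st_mode & S_IFMT) against S_IFLNK by hand.]

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* If path is a symbolic link, copy its contents (the pathname it points
 * at) into buf and return 1; otherwise return 0. */
int link_contents(const char *path, char *buf, size_t len)
{
    struct stat st;
    ssize_t n;

    /* lstat reports on the link itself; stat would follow it. */
    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
        return 0;               /* missing, or not a symlink */

    /* readlink does not NUL-terminate, so do that ourselves. */
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return 0;
    buf[n] = '\0';
    return 1;
}
```

Note that readlink only returns the stored pathname; whether that path
currently resolves to anything is a separate question.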
bhoughto@cmdnfs.intel.com (Blair P. Houghton) (11/27/90)
In article <1990Nov26.201925.1251@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>find by RTMing {Wow, RTM as a verb! My English teacher would roll over in her

"RTFM" has always been a verb.  A second-person imperative, at that.

                                --Blair
                                  "I can feel the vorticial winds from
                                   your rapidly rotating English teacher
                                   even now..."
mchinni@pica.army.mil (Michael J. Chinni, SMCAR-CCS-E) (11/27/90)
In article <BZS.90Nov21184414@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>>> How do you find the # of and locations of all links to a file?
>>
> % ls -i foo
> 4924 foo
> % find /mount-point -inum 4924 -print

Did I miss something here?  I thought the original poster wanted to
find all links (hard AND symbolic).  The method shown above, I always
thought, only finds hard links, since symbolic links have different
inode numbers.  Is this right -- does this only find hard links?  If
so, how would you find all symbolic links to the file?

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
                        Michael J. Chinni
        US Army ARDEC, Picatinny Arsenal, New Jersey
        ARPA: mchinni@pica.army.mil
        UUCP: ...!uunet!pica.army.mil!mchinni
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
bzs@world.std.com (Barry Shein) (11/27/90)
From: jonb@specialix.co.uk (Jon Brawn)
>OK, so thats found all the real links. Reading the subject line, how
>do you find all the *symbolic* links? How do I access a symbolic link
>file to find out that it IS a symbolic link (I mean from within C, I
>assume stat( char *filename ) is going to stat the pointed too file?)

All the questions you ask are adequately explained in the manual pages
(at least they are in my manual pages, stock SunOS 4.0.3).  Not sure
what would be the use of reading them back to you, but if perchance you
don't have access to the appropriate man pages, send me mail and I'd be
glad to send you the relevant excerpts.
--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
spaf@cs.purdue.EDU (Gene Spafford) (11/27/90)
Okay, the question is, how do you find all links to a file?

Hard links are easy.  Barry & I have shown two ways -- using find and
ncheck -- to do it.  Hard links, by their nature, must always reside in
the same disk partition as the file system entry they describe (it
doesn't have to be a file -- links can be to directories, FIFOs,
devices, etc...anything describable by an i-node).

Anyhow, the question was then posed, as posed before, how to find all
the symbolic links.  The simple answer given earlier is "You can't."
I'll give the same answer, but also try to explain it.

A symbolic link contains a pathname to the object it describes.
Therefore, the symbolic link can exist on another disk partition and
still be valid.  That means that there could be a link to your file
that is not "active" right now.  For instance, if the partition on
which the link exists is not currently mounted, the link isn't there,
but as soon as the partition is mounted it is a link to your file.

Or consider this.  If I mount a partition on /mnt and create a symbolic
link to the ../etc/passwd file, that works until I unmount and remount
the file system on /usr/tmp, in which case the link no longer works.
Other examples come to mind, but the same thing happens: there could be
lots of links to your file, but they currently aren't available.

Now let's refine the problem some and ask how to find all currently
available symbolic links (mounted in proper places, etc) that point to
a specific file system entry.  That's not simple, but it is possible;
however, I don't know of a way to do this with standard commands.  You
need to compare (major dev, minor dev, i-node) triples to make an exact
determination, and I can't think of a standard utility that reports the
device numbers.  You can't just compare contents, because it is
possible that a copy of a file could be made on another partition and
end up with the same i-node number.  Unlikely, but possible.

The way to do it is to write a program that stats your object and gets
the device numbers and i-node number.  Then scan the file system (using
find, for instance) looking for symbolic links.  For each one found,
fetch the device and i-node numbers and compare against your saved
trio.  Voila.  Not simple, but it will work.
--
Gene Spafford
NSF/Purdue/U of Florida Software Engineering Research Center,
Dept. of Computer Sciences, Purdue University, W. Lafayette IN 47907-2004
Internet: spaf@cs.purdue.edu    uucp: ...!{decwrl,gatech,ucbvax}!purdue!spaf
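[Editor's aside: the comparison step Gene describes can be sketched as a
small C function.  This is a hedged sketch, not a posted program; the
name same_object is hypothetical.]

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if the two paths resolve to the same file system object,
 * comparing the (st_dev, st_ino) pair rather than the i-number alone;
 * return 0 if they differ or either path cannot be resolved (e.g. a
 * dangling symlink).  stat() follows symbolic links, so calling it on
 * a candidate link yields the dev/inode of whatever it points to. */
int same_object(const char *target, const char *candidate)
{
    struct stat t, c;

    if (stat(target, &t) != 0 || stat(candidate, &c) != 0)
        return 0;
    return t.st_dev == c.st_dev && t.st_ino == c.st_ino;
}
```

Wrapped in a small main that prints the candidate path on a match, such
a program could be driven over every symlink on the system with
something like: find / -type l -exec samefile /etc/passwd '{}' \;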
tif@doorstop.austin.ibm.com (Paul Chamberlain) (11/27/90)
In article <12575@medusa.cs.purdue.edu> spaf@cs.purdue.edu (Gene Spafford) writes:
>Okay, the question is, how do you find all links to a file?
>Hard links are easy. ... symbolic links ... "You can't."

This isn't the answer you're looking for, but there is a program that
should be in the archives called "ll" to list links.  It is a simple
but fast tree walk.  If it doesn't already support symbolic links, it
seems like it would be easy to add.  In fact, it seems that all you'd
have to do is make sure it doesn't know about symbolic links, and it
would, by definition of stat with respect to symbolic links, find them.

Paul Chamberlain | I do NOT represent IBM.     tif@doorstop, sc30661 at ausvm6
512/838-7008     | ...!cs.utexas.edu!ibmchs!auschs!doorstop.austin.ibm.com!tif
jonb@specialix.co.uk (Jon Brawn) (11/27/90)
mjr@hussar.dco.dec.com (Marcus J. Ranum) writes:
>jonb@specialix.co.uk (Jon Brawn) writes:
>>[...] How do I access a symbolic link
>>file to find out that it IS a symbolic link (I mean from within C, I
>>assume stat( char *filename ) is going to stat the pointed too file?)
> use lstat(2), of course.

Thanks.

>>Does 'find' have a wonderful flag for finding symlinks?
> read the manual page on find(1).

Hmm.  And?  SCO Unix doesn't appear to document it - hang on a mo, I'll
look at Interactive...  ...nope.  So, having RTFM, I find nothing
useful.  The question remains unanswered.

>>Also, what about old crusties like 'tar' and 'cpio'?  What do they do?
> tar and cpio [assuming on cpio - I don't use it] use lstat(2)
>instead of stat(2), as they should.  tar also keeps a table of the inode
>#s of files it has already dumped, and makes a note to make a hard link
>to the file instead of just storing 2 copies.

I know how tar and cpio handle regular files and device nodes.  I want
to know about symbolic links.

I would hope that tar would copy the contents of the file.  That would
be so much more useful than trying to do a restore and discovering that
what you thought you had backed up as /usr/data_base/main_data_file was
in actual fact just a sixty-four character pathname to an obscure
corner of the file system, and that you had, in fact, lost everything
in the last crash...  ...but I guess tar will probably just back up the
symbolic path name anyway.

>mjr.
>--
> Good software will grow smaller and faster as time goes by and
>the code is improved and features that proved to be less useful are
>weeded out.  [from the programming notebooks of a heretic, 1990]

So, in summary:

I asked (quite nicely) how you played with symbolic links at a fairly
nuts'n'bolts level, and got told to RTFM.  Now, TFMs don't mention it
because symbolic links are not in the ``currently popular'' releases.
I tell people to RTFM on a regular basis.  I don't (usually) need to be
told to do it myself.

When something new comes along, I like to be able to ask those more
privileged than myself for enlightenment, so they can look in their
manuals and say, 'There's this new O/S call that is dead ace; it's just
like stat, but different: you give it a file name, and it gives you the
*real* info on the file'.

So, what's the real truth about find, cpio & tar?  How do they behave?
--
Jon?
--
jonb@specialix.co.uk
"Never be sorry for a might have been."
bzs@world.std.com (Barry Shein) (11/28/90)
From: mchinni@pica.army.mil (Michael J. Chinni, SMCAR-CCS-E)
>If so, how would you find all symbolic links to the file ?

In general, it's very difficult.  Consider that valid symlinks can
point across NFS mounts.  I have just created the following:

% ln -s /pica.army.mil/root/etc/termcap ./termcap

That's a perfectly valid symlink from my machine to your termcap file.
I can't resolve it right now because I can't NFS mount your root
directory, but it's still a symlink to your termcap file, no?  How
would you find it?

In fact, to be more realistic, NFS mounts are not even commutative
(some workstation at your site can mount and point thru a file system
on your system, but you can't necessarily see that valid symlink.)

But, if we narrow the question to "how can I find all the symlinks I
can find?" the find command can locate every symlink (-type l) and they
can be resolved and their i-node numbers tested for equality with a
target with an only slightly convoluted shell command (or trivially by
writing a 20 line C program which can be -exec'd by find with the path
in question and the inode number desired.)

% find /mount-point -type l -a -exec testinode '{}' #inum ';'

where testinode is just something like:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

main(argc, argv)
int argc;
char **argv;
{
        struct stat st;

        if (!stat(argv[1], &st) && (st.st_ino == atoi(argv[2])))
                printf("%s\n", argv[1]);
        exit(0);
}

(you could add argv checks)
--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
mjr@hussar.dco.dec.com (Marcus J. Ranum) (11/28/90)
jonb@specialix.co.uk (Jon Brawn) writes:
>>>Does 'find' have a wonderful flag for finding symlinks?
>> read the manual page on find(1).
>Hmm. And? SCO Unix doesn't appear to document it - hang on a mo I'll
>look at Interactive...
>...nope. So, having RTFM, I find nothing useful. The question remains
>unanswered.

        Use "find dir -type l -print" to print the names of symlinks.

Sorry I reflexively RTFM'd you - it's surprising that a vendor would
sell a UNIX with symlinks and *NOT* have readlink(2), lstat(2), the
"-type l" option to find, etc, etc, etc [or some similar and documented
features].  In fact, I would not purchase such a UNIX, if I knew of
one.  Since your version(s?) of UNIX (SCO/Interactive?) appear to be
missing stuff, according to you, all bets are off - so I'll tell you
how ULTRIX/BSD handle these things.

>I know how tar and cpio handle regular files and device nodes. I want to
>know about symbolic links.

        tar with the -h flag copies the contents of the file the
symbolic link points to; otherwise it defaults to just storing
information about the link so it can recreate the link.

>[...] When something new comes along, I
>like to be able to ask those more priviledged than myself for
>enlightenment, so they can look in their manuals and say, 'Theres
>this new O/S call [...]

        That's why I think Barry and I RTFM'd you - it's *NOT* new
stuff; it's been around for several years, and that's forever at the
rate new stuff gets kluged into UNIX these days. :)  Plus, your
original postings somehow led me to think you had symlinks on your
machine(s?) - I'm surprised anyone would buy a UNIX that had symlinks
but no decent ways to examine them.

        If you do an "ls -l" (or -L?) does "ls" display the links?  If
so, your machine probably does have lstat(2) - try to write a program
that calls it (just like stat) and if it's not in TFM, then complain to
the vendor.

mjr.
--
        Good software will grow smaller and faster as time goes by and
the code is improved and features that proved to be less useful are
weeded out.     [from the programming notebooks of a heretic, 1990]
ddean@rain.andrew.cmu.edu (Drew Dean) (11/28/90)
There seems to be a simple problem here.  Symbolic links come from an
old BSD release (sorry, I forget my Un*x history; was it 4.2 or 4.1 or
earlier?  It's not in _The Design and Implementation of the 4.3BSD UNIX
Operating System_), and the poster is trying to use them on System V.
Now, it looks like Sys V is broken (what's new :-)), at least with
respect to things like man pages.  Since a great deal of Usenet
(especially the portion on the Internet) runs a BSD-derived Unix, the
proper answer for BSD is RTFM, because it's all there.

For the record, tar (at least on the Unices I've used, which include
Mach, 4.3 BSD, SunOS 3.{2,5} & 4.0.3) does NOT follow symlinks; it
merely copies them, unless given the h option.  I quote from the SunOS
4.0.3 man page tar(1), under options:

     h    Follow symbolic links as if they were normal files or
          directories.  Normally, tar does not follow symbolic links.

Under BUGS we have the following:

     There is no way selectively to follow symbolic links.

Does this answer all the questions?

Drew Dean
Drew_Dean@rain.andrew.cmu.edu
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/28/90)
In article <1990Nov27.155015.24837@specialix.co.uk> jonb@specialix.co.uk (Jon Brawn) writes:
[ on symbolic links ]
> I would hope that tar would copy the contents of the file.

tar must not do this by default.  ``These files now take up seventeen
times as much space, because we moved them from another disk via tar.''
Ungood.

---Dan
lyndon@cs.athabascau.ca (Lyndon Nerenberg) (11/29/90)
jonb@specialix.co.uk (Jon Brawn) writes:
>I would hope that tar would copy the contents of the file. That would be
>do much more useful than trying to do a restore, and discovering that what
>you thought you had backed up as /usr/data_base/main_data_file was in
>actual fact just a sixty four character pathname to an obscure corner of the
>file system, and that you had, in fact, lost everything in the last crash...
>...but I guess tar will probably just backup the symbolic path name anyway.

I don't think you can blame tar for sloppy administration practices.
If you don't back up all your file systems, you will have this type of
problem regardless of what software you use for your archives.

What about the case where you have the following symlinks:

/nfs/user0/lyndon/bin/share/mailq --> /usr/local/smail/mailq
/usr/local/smail/mailq --> /usr/local/bin/smail
/usr/local/bin/smail --> /usr/lib/sendmail

where /nfs/user0, /usr, and /usr/local are separate filesystems.  How
should tar deal with this?
--
Lyndon Nerenberg  VE6BBM / Computing Services / Athabasca University
{alberta,cbmvax,mips}!atha!lyndon || lyndon@cs.athabascau.ca
Packet: ve6bbm@ve6mc [.ab.can.na]
The only thing open about OSF is their mouth.  --Chuck Musciano
dnichols@uunet.uu.net (DoN Nichols) (11/29/90)
"Jon Brawn says:" > [...] > I know how tar and cpio handle regular files and device nodes. I want to > know about symbolic links. > > I would hope that tar would copy the contents of the file. That would be > do much more useful than trying to do a restore, and discovering that what > you thought you had backed up as /usr/data_base/main_data_file was in > actual fact just a sixty four character pathname to an obscure corner of the > file system, and that you had, in fact, lost everything in the last crash... > ...but I guess tar will probably just backup the symbolic path name anyway. From man tar on my Tektronix 6130 (running a 4.2BSD derivative called UTek) h Force tar to follow symbolic links as if they were normal files or directories. Normally, tar does not follow symbolic links. (It normally just records that the entry is a Symbolic Link to a specific file, whose name is shown.) Gnu tar, on systems with symbolic links, can be compiled with similar options, although I don't know at present whether the same option selector is used. My sources are all on backup media, since I only have 67MB on this AT&T Unix-pc. (Till the new drives arrive :-) [...] > So, in summary: > > I asked (quite nicely) how you played with symbolic links at a fairly > nuts'n'bolts level, and got told to RTFM. Now, TFMs don't mention it > because symbolc links are not in the ``currently popular'' releases. I think that some warning should be posted that frequently RTFM actually is RMFM, where the first M is *MY*. Everyone assumes that all TFM's are identical. Some systems don't even come with a machine-readable FM, so grep-assisted scanning is not practical, and you are left with four linear feet of manuals to juggle on your lap or desktop. I have found cases of options which are documented in others copies of TFM, but not in mine, are still valid when tested. [..] > So, whats the real truth about find, cpio & tar? how do they behave? 
From the same Tektronix 6130, the man page for cpio says: Cpio does not know about symbolic links, but since it is usually used with find, there is little danger of getting into loops. Also, instead of archiving or copying symbolic links, cpio copies the files pointed to by the links, if they exist. Only the superuser can copy special files. Which all seems to be fairly resonable behavior. (Of course this is on the OS that started symbolic links, so you would hope that they would do it right. > -- > Jon? > -- > jonb@specialix.co.uk > "Never be sorry for a might have been." > -- Donald Nichols (DoN.) | Voice (Days): (703) 664-1585 D&D Data | Voice (Eves): (703) 938-4564 Disclaimer: from here - None | Email: <dnichols@ceilidh.beartrack.com> --- Black Holes are where God is dividing by zero ---
andyc@bucky.intel.com (Andy Crump) (11/29/90)
>>>>> On 26 Nov 90 15:07:16 GMT, jonb@specialix.co.uk (Jon Brawn) said:
Jon> OK, so thats found all the real links. Reading the subject line, how
Jon> do you find all the *symbolic* links? How do I access a symbolic link
Jon> file to find out that it IS a symbolic link (I mean from within C, I
Jon> assume stat( char *filename ) is going to stat the pointed too file?)
Jon> Does 'find' have a wonderful flag for finding symlinks?
Jon> Do any of ncheck/icheck/dcheck/fsck/fsdb/anything understand them?
In SVR4, find accepts 'l' as the argument to the -type option.  Thus
find <dir> -type l -print will print all files that are symbolic
links.  Though this doesn't tell you where they point.
--
-- Andy Crump
...!tektronix!reed!littlei!andyc | andyc@littlei.intel.com
...!uunet!littlei!andyc | andyc@littlei.uu.net
Disclaimer: Any opinions expressed here are my own and
not representative of Intel Corporation.
andyc@bucky.intel.com (Andy Crump) (11/29/90)
>>>>> On 27 Nov 90 20:57:42 GMT, ddean@rain.andrew.cmu.edu (Drew Dean) said:
Drew> There seems to be a simple problem here. Symbolic links come from an old BSD
Drew> release (sorry, I forget my Un*x history, was it 4.2 or 4.1 or earlier; it's
Drew> not in _The Design and Implementation of the 4.3BSD UNIX Operating System_),
Drew> and the poster is trying to use them on System V. Now, it looks like Sys V is
Drew> broken (what's new :-)), at least with respect to things like man pages. Since
Drew> a great deal of Usenet (especially the portion on the Internet) runs a
Drew> BSD-derived Unix, the proper answer for BSD is RTFM, because it's all there.
In SVR4, tar has the option L for following symlinks, and by default
does not follow. Quote from the SVR4 manpage for tar.
L Follow symbolic links. This causes symbolic links
to be followed. By default, symbolic links are not
followed.
Cpio also has the same option in SVR4:
-L Follow symbolic links. The default is not to follow
symbolic links.
And find has -type l for symlink types and -follow to determine
whether to follow symlinks or not:
-type c True if the type of the file is c, where c
is b, c, d, l, p, or f for block special
file, character special file, directory,
symbolic link, fifo (named pipe), or plain
file, respectively.
-follow Always true; causes symbolic links to be
followed. When following symbolic links,
find keeps track of the directories visited
so that it can detect infinite loops; for
example, such a loop would occur if a
symbolic link pointed to an ancestor. This
expression should not be used with the -type
l expression.
--- FYI ---
--
-- Andy Crump
...!tektronix!reed!littlei!andyc | andyc@littlei.intel.com
...!uunet!littlei!andyc | andyc@littlei.uu.net
Disclaimer: Any opinions expressed here are my own and
not representative of Intel Corporation.
jonb@specialix.co.uk (Jon Brawn) (11/30/90)
ddean@rain.andrew.cmu.edu (Drew Dean) writes:
>There seems to be a simple problem here. Symbolic links come from an old BSD
>release (sorry, I forget my Un*x history, was it 4.2 or 4.1 or earlier; it's
>not in _The Design and Implementation of the 4.3BSD UNIX Operating System_),
>and the poster is trying to use them on System V. Now, it looks like Sys V is
>broken (what's new :-)), at least with respect to things like man pages. Since
>a great deal of Usenet (especially the portion on the Internet) runs a
>BSD-derived Unix, the proper answer for BSD is RTFM, because it's all there.

Before I get shot out of the water by AT&T, SCO and/or (&|?)
Interactive: SVR3 DOES NOT GENERALLY SUPPORT SYMBOLIC LINKS!  Various
variations on it may; I don't know.  BUT, as they aren't supported in
the general case, they aren't in the manual.  The manuals aren't
broken!
--
Jon?
--
jonb@specialix.co.uk
"Never be sorry for a might have been."
cpcahil@virtech.uucp (Conor P. Cahill) (12/02/90)
In article <1990Nov27.155015.24837@specialix.co.uk> jonb@specialix.co.uk (Jon Brawn) writes:
>>>Does 'find' have a wonderful flag for finding symlinks?
>> read the manual page on find(1).
>Hmm. And? SCO Unix doesn't appear to document it - hang on a mo I'll
>look at Interactive...
>...nope. So, having RTFM, I find nothing useful. The question remains
>unanswered.

The reason that SCO Unix and Interactive UNIX do not document a flag for
finding symlinks is that neither OS supports symbolic links.

--
Conor P. Cahill           (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                          46030 Manekin Plaza, Suite 160
                                               Sterling, VA 22170
rembo@unisoft.UUCP (Tony Rems) (12/03/90)
In article <ANDYC.90Nov29110617@bucky.intel.com> andyc@bucky.intel.com (Andy Crump) writes:
>>>>>> On 27 Nov 90 20:57:42 GMT, ddean@rain.andrew.cmu.edu (Drew Dean) said:
>
>Drew> There seems to be a simple problem here. Symbolic links come from an old BSD
>Drew> release (sorry, I forget my Un*x history, was it 4.2 or 4.1 or earlier; it's
>Drew> not in _The Design and Implementation of the 4.3BSD UNIX Operating System_),

Sorry to nitpick here, but symbolic links are covered on pages 190-191 of
"The Design and Implementation of the 4.3 BSD UNIX Operating System"

It's a great book, btw, for the uninititated.

-Tony
rembo@unisoft.UUCP (Tony Rems) (12/03/90)
>>
>>Drew> There seems to be a simple problem here. Symbolic links come from an old BSD
	...stuff deleted...
> "The Design and Implementation of the 4.3 BSD UNIX Operating System"
>
>It's a great book, btw, for the uninititated.

Hmm, it must be late ----------/\
make that "uninitiated". Don't need any mail on that one... :-}.

-Tony
seanf@sco.COM (Sean Eric Fagan) (12/04/90)
In article <1990Nov27.191633.28103@decuac.dec.com> mjr@hussar.dco.dec.com (Marcus J. Ranum) writes:
>Sorry I reflexively RTFM'd you - it's surprising that a vendor would
>sell a UNIX with symlinks and *NOT* have readlink(2), lstat(2), the
>-l option to find, etc, etc, etc [or some similar and documented
>features]. In fact, I would not purchase such a UNIX, if I knew of one.

Uhm, SCO doesn't have symbolic links. I don't believe Interactive does,
either. If they're not supported, why are you surprised that it's not in
the manual?

--
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door: ring the bell and
seanf@sco.COM    |  run away! Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
zwicky@erg.sri.com (Elizabeth Zwicky) (12/05/90)
> So, whats the real truth about find, cpio & tar? how do they behave?
There is no real truth about find, cpio, and tar, all of which are
relatively notorious for behaving differently on different versions of
UNIX (and, for all I know, may be available for things like MS-DOS,
where all bets are off). What you want to know, presumably, is what
the real truth is about *your* find, cpio, and tar; if the manual
doesn't tell you, I suggest experimentation. (If it does tell you, I
still suggest experimentation if you care a whole lot - often the
manual lies.) At least specify exactly what OS, and yes, the release
number matters.
Just to spread gloom among the populace, let me point out that those
of you who rest secure in the belief that your tar correctly handles
symbolic links, and thus will not erroneously expand your file system,
are unjustifiably optimistic. I know of at least two ways in which a
supposedly well-behaved tar can turn your 10 meg of data from your 16
meg partition into a tape-sucking nightmare, only one of which I
comprehend. There is this cute little concept of files with holes in
them; the file system can omit blocks full of nulls, and essentially
replace them with little signs that say "and when you get here, there
should be 20 meg of nulls, OK?" This is a useful trick, since sparsely
allocated tables can be useful things to write to disk - both core
dumps and dbm databases make use of it lavishly. Unfortunately, you
have to get pretty intimate with the disk to tell that the 20 meg of
nulls aren't there (well, it's not that unfortunate, since it is
rather the point of the exercise, but it is at times inconvenient).
tar in many versions, as a matter of principle, cuddles up to the
filesystem like a user program, and doesn't romp through disk blocks.
Thus, it happily fills in all the holes. I have a friend who
discovered this little trick when he tried to restore a 16 meg root
partition on a machine with 32 meg of VM. There was a core file in it.
The 40-odd meg of data on his tape did not fit well into the 16 meg
partition. Luckily, it was a test restore, and I'm sure you can all
figure out the moral to *that* story.
The other way to turn tar into a raging monster is frankly unlikely to
happen to anyone else; we tarred off an NFS mounted partition from a
VMS machine which had a pathological nameless directory on it.
Something caused the UNIX side to believe that the nameless directory
was some sort of link to ., and tar recursed its little heart out.
There are a lot of reasons why this should not have happened, but it
did, and the only reason that it didn't run until the tape filled is
that it ran out of path name space first, a few recursions down. This
is the only good thing I have to say about the concept of having a 128
character path-name limit in a program, when the OS does not have the
same limit...
Elizabeth Zwicky
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/05/90)
In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
> Unfortunately, you
> have to get pretty intimate with the disk to tell that the 20 meg of
> nulls aren't there

Hardly. You just look at the file size. Other than the file size, there
is no way a portable program can tell the difference between a hole and
an allocated block of zeros. If an archiver knows the block size and
sees that a file has N holes, it can just squish the first N holes it
finds, and write explicit zeros in the remaining zero-filled blocks.

---Dan
jonb@specialix.co.uk (Jon Brawn) (12/06/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
>> Unfortunately, you
>> have to get pretty intimate with the disk to tell that the 20 meg of
>> nulls aren't there
>Hardly. You just look at the file size. Other than the file size, there
>is no way a portable program can tell the difference between a hole and
>an allocated block of zeros. If an archiver knows the block size and
>sees that a file has N holes, it can just squish the first N holes it
>finds, and write explicit zeros in the remaining zero-filled blocks.

Umm? really? I wrote this program:

#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
** This program was written on SCO Unix System V release 3.2.something.
*/

char buffer[1024];

main()
{
	int fd;			/* file descriptor */
	int block;		/* a block number */
	long offset;		/* offset at which to write above block */
	struct stat statb;	/* stat structure used to read the file size */

	/*
	** set the buffer to a known data pattern:
	*/
	memset(buffer,42,sizeof(buffer));

	/*
	** create a new file
	*/
	if ( (fd = creat("hole_file",0666))==-1 ) {
		perror("cant creat hole_file");
		exit(1);
	}

	/*
	** write ten (sparse) blocks to it
	*/
	for ( block=0; block<10; block++ ) {
		/*
		** blocks are at 10K intervals in the file
		*/
		offset = block * 10240;
		/*
		** seek...
		*/
		if ( lseek(fd,offset,0) != offset ) {
			perror("cant seek into hole_file");
			exit(1);
		}
		/*
		** ...write
		*/
		if ( write(fd,buffer,sizeof(buffer)) != sizeof(buffer) ) {
			perror("cant write hole_file");
			exit(1);
		}
	}

	/*
	** close the file
	*/
	if ( close(fd) == -1 ) {
		perror("trouble closing hole_file");
	}

	/*
	** ask the OS how big the file is
	*/
	if ( stat("hole_file",&statb)==-1 ) {
		perror("cant stat hole_file");
		exit(1);
	}
	printf("stat information for hole_file:\n");
	printf("st_size %ld\n",statb.st_size);
	system("ls -ils hole_file");
}

And ran it, producing this output:

stat information for hole_file:
st_size 93184
16314    184 -rw-rw----   1 jonb     soft       93184 Dec  5 18:23 hole_file

inode   size   mode        num   user    group     size    date  time  name
      (blocks)            links  name    name     (bytes)

The size of the file is indeed 9*10240+1024. Now, please demonstrate to
your audience where the holes can be detected?

>---Dan

Jonb
--
``Let the myth be expelled. I stand here before you. Can you not see me?
  Do you not hear my voice?''
jgreely@morganucodon.cis.ohio-state.edu (J Greely) (12/06/90)
In article <10960:Dec507:07:4190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Hardly. You just look at the file size. Other than the file size, there >is no way a portable program can tell the difference between a hole and >an allocated block of zeros. That only works if 1) stat returns the number of blocks, and 2) statfs (or its equivalent) returns the correct block size. -- J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
goudreau@larrybud.rtp.dg.com (Bob Goudreau) (12/06/90)
In article <10960:Dec507:07:4190@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > > Unfortunately, you > > have to get pretty intimate with the disk to tell that the 20 meg of > > nulls aren't there > > Hardly. You just look at the file size. Other than the file size, > there is no way a portable program can tell the difference between > a hole and an allocated block of zeros. If an archiver knows the > block size and sees that a file has N holes, it can just squish the > first N holes it finds, and write explicit zeros in the remaining > zero-filled blocks. By "file size" and "portable", I assume that you are talking about the st_size field of the POSIX.1-defined struct stat. This number means only "the file size in bytes"; it says nothing about how many bytes (or blocks) the file occupies on disk. Some UNIXes with BSD-derived file systems also define a field called st_blocks that reports the number of blocks occupied by the file, but this isn't much help. For one thing, it isn't portable over all UNIXes; for another, it tells you nothing about the number and location of holes in the file. A truly portable method must use only standard functions (such as the ones defined in POSIX.1) and must assume nothing at all about block sizes or any other aspects of the file system structure. The obvious way to do this is to have the archiver program read() all <st_size> bytes of the file while keeping an eye out for long stretches of 0-valued bytes so that it can store them in a special space-saving manner in its archive. The unarchiving step must then perform an lseek() over each such stretch in order to avoid write()ing out potentially space-consuming null bytes. Unfortunately, while such an approach is portable, its performance will leave something to be desired on files with truly tremendous holes in them; much time will be wasted on read()ing the holes. 
That's why competent archiver utilities such as dump(1M) do in fact get pretty intimate with the system (the file system format, not the disk). By snooping around in the file system that contains the file, dump can quickly locate all holes in the file and avoid reading useless data. ---------------------------------------------------------------------- Bob Goudreau +1 919 248 6231 Data General Corporation goudreau@dg-rtp.dg.com 62 Alexander Drive ...!mcnc!rti!xyzzy!goudreau Research Triangle Park, NC 27709, USA
tchrist@convex.COM (Tom Christiansen) (12/06/90)
In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes: >Unfortunately, you >have to get pretty intimate with the disk to tell that the 20 meg of >nulls aren't there (well, it's not that unfortunate, since it is >rather the point of the exercise, but it is at times inconvenient). Naw, it's not that hard. If you really want to, you could compare the size fields with the blocks field returns by stat(2) and by this derive holey ideas. Or (as I prefer) for applications that are trying to minimize space used (cp should have a flag for this; -z on a few systems), you don't care whether a block of zeroes was or wasn't there: you want to make it a whole to save space. Check each block before you write it, and just lseek ahead if it's all null. This works on a disk for copying files, but won't do you much good on a tape, where some other scheme would have to worked out. I've heard that GNU tar does the right thing here. You want the option because you may judge this too much overhead for the default operation. Of course, having vector compare instructions to check that it's all zeroes speeds this up a bit. :-) --tom -- Tom Christiansen tchrist@convex.com convex!tchrist "With a kernel dive, all things are possible, but it sure makes it hard to look at yourself in the mirror the next morning." -me
zwicky@erg.sri.com (Elizabeth Zwicky) (12/06/90)
In article <109886@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes: >This [making holes in files if you can by using lseek, whether >or not they were there to begin with] works on a disk for copying >files, but won't do you much good on a tape, where some other scheme >would have to worked out. I've heard that GNU tar does the right >thing here. I've heard that it doesn't, reliably; or, rather, that modern versions of gnutar appear to reliably write files to tape with holes in them, avoiding the "How can it take 8 tapes to dump a 16 meg filesystem?" problem, but have an unfortunate tendency to die attempting to restore the files. But this is hearsay. J, you want to provide some honest-to-God facts here? I have also heard speculations about the existence of programs that do cuddle up to the raw disk and know where the holes in their files are, and get upset if you move them around. Certainly this is theoretically possible, although behaviour to be forcefully deprecated. Does anybody know of programs that actually do this? Elizabeth Zwicky
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/07/90)
In article <JGREELY.90Dec5134716@morganucodon.cis.ohio-state.edu> J Greely <jgreely@cis.ohio-state.edu> writes: > In article <10960:Dec507:07:4190@kramden.acf.nyu.edu> > brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >Hardly. You just look at the file size. Other than the file size, there > >is no way a portable program can tell the difference between a hole and > >an allocated block of zeros. > That only works if 1) stat returns the number of blocks, That's what st_blocks is for. > and 2) statfs > (or its equivalent) returns the correct block size. Which is what statfs is for, but who cares? The point is that an application can depend on st_blocks for information. An archiver should preserve that information. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/07/90)
In article <1990Dec5.183223.28304@specialix.co.uk> jonb@specialix.co.uk (Jon Brawn) writes: > brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes: > >> Unfortunately, you > >> have to get pretty intimate with the disk to tell that the 20 meg of > >> nulls aren't there > >Hardly. You just look at the file size. Other than the file size, there > >is no way a portable program can tell the difference between a hole and > >an allocated block of zeros. If an archiver knows the block size and > >sees that a file has N holes, it can just squish the first N holes it > >finds, and write explicit zeros in the remaining zero-filled blocks. > Umm? really? I wrote this program: [ ... ] > printf("st_size %d\n",statb.st_size); That is the logical size. The actual size on disk is st_blocks. > Now, please demonstrate to your audience where the holes can be detected? In a previous article. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/07/90)
In article <1990Dec5.190610.5612@dg-rtp.dg.com> goudreau@larrybud.rtp.dg.com (Bob Goudreau) writes: > In article <10960:Dec507:07:4190@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > > Unfortunately, you > > > have to get pretty intimate with the disk to tell that the 20 meg of > > > nulls aren't there > > Hardly. You just look at the file size. Other than the file size, > > there is no way a portable program can tell the difference between > > a hole and an allocated block of zeros. If an archiver knows the > > block size and sees that a file has N holes, it can just squish the > > first N holes it finds, and write explicit zeros in the remaining > > zero-filled blocks. > By "file size" and "portable", I assume that you are talking about > the st_size field of the POSIX.1-defined struct stat. No, I was referring to BSD st_blocks. (Is there really no equivalent under System V?) As you say about it: > it tells you nothing about the number and location of holes > in the file. That's quite correct. In the article you're responding to, I wrote ``it can just squish the first N holes it finds, and write explicit zeros in the remaining zero-filled blocks.'' One might infer from this that there is no way to detect the locations of the holes. So what? > A truly portable method must use only standard functions (such as the > ones defined in POSIX.1) and must assume nothing at all about block > sizes or any other aspects of the file system structure. Well, if a POSIX system doesn't have st_blocks, then obviously a portable program can't figure out that a file has holes, so there's no point to figuring out how many holes there are. But every POSIX-based system I've seen does have st_blocks. > The obvious > way to do this is to have the archiver program read() all <st_size> > bytes of the file while keeping an eye out for long stretches of > 0-valued bytes so that it can store them in a special space-saving > manner in its archive. 
This is only slow on files that do have holes, and then only on long stretches of zeros. > Unfortunately, while such > an approach is portable, its performance will leave something to be > desired on files with truly tremendous holes in them; much time will > be wasted on read()ing the holes. No, there won't be any read() time wasted. There will be CPU time wasted. (Tom points out in another article that vectorization helps here.) Yes, it would be nice to have a way to see where the holes are. ---Dan
jonb@specialix.co.uk (Jon Brawn) (12/07/90)
zwicky@erg.sri.com (Elizabeth Zwicky) writes:
>I have also heard speculations about the existence of programs that do
>cuddle up to the raw disk and know where the holes in their files are,
>and get upset if you move them around. Certainly this is theoretically
>possible, although behaviour to be forcefully deprecated. Does anybody
>know of programs that actually do this?

I worked on one such beast while I was with Systime: a software suite
called 'Cobra'. The idea here was to be *so* close to the file system
that you only backed up disk blocks that were useful (i.e. not free
blocks) and had been changed. It used all sorts of exciting buffering
and shared memory tricks to whizz data onto streamer tapes and the such,
and had (most of) a wonderful system for telling you which tapes you
needed to restore files.

It never saw the light of day. I would like to finish it one day....

> Elizabeth Zwicky
--
"These opinions were made up on the spur of the moment, to a formula
kept secret from prying eyes for hundreds of years, and bear no
relationship to my actual beliefs, let alone those of Specialix
International"
Jon Brawn, jonb@specialix.co.uk
"I didn't do it. I wasn't there."
jacob@gore.com (Jacob Gore) (12/07/90)
/ comp.unix.internals / brnstnd@kramden.acf.nyu.edu (Dan Bernstein) / Dec 6'90/ > > it tells you nothing about the number and location of holes > > in the file. > > That's quite correct. In the article you're responding to, I wrote ``it > can just squish the first N holes it finds, and write explicit zeros in > the remaining zero-filled blocks.'' One might infer from this that there > is no way to detect the locations of the holes. So what? What's the point then? What do you do when you restore the file to disk? You can't assume that any one of those first N "holes" weren't really zero-filled blocks, so you have to write zero-filled blocks to disk everywhere. If you're only going for space savings in the archive, you may as well cut out all zero-filled blocks, not just the first N. During extraction, they all have to be filled out... Jacob -- Jacob Gore Jacob@Gore.Com boulder!gore!jacob
goudreau@larrybud.rtp.dg.com (Bob Goudreau) (12/08/90)
In article <6647:Dec619:11:3690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > > it [st_blocks] tells you nothing about the number and location of > > holes in the file. > > That's quite correct. In the article you're responding to, I wrote > ``it can just squish the first N holes it finds, and write explicit > zeros in the remaining zero-filled blocks.'' One might infer from > this that there is no way to detect the locations of the holes. So > what? First you say that the archiver should perform certain actions on "the holes it finds", then you admit that "there is no way to detect the locations of the holes". So how, pray tell, is it supposed to find them? The only portable way is to examine the file data looking for stretches of nulls; but as I mentioned, this makes your program slower than it has to be. > > A truly portable method must use only standard functions (such as the > > ones defined in POSIX.1) and must assume nothing at all about block > > sizes or any other aspects of the file system structure. > > Well, if a POSIX system doesn't have st_blocks, then obviously a > portable program can't figure out that a file has holes, so there's no > point to figuring out how many holes there are. But every POSIX-based > system I've seen does have st_blocks. Broaden your horizons a little. A vast number of UNIX systems in the world are not BSD-based and do not have st_blocks. Since POSIX.1 also does not require it, any software that relies on st_blocks' presence will be seriously limiting its claims of portability. But even that's beside the point; the real issue is that st_blocks alone gives you very little useful information. Given a file's st_blocks and st_size counts, you can't say for certain that the file doesn't have any holes unless you also have some knowledge of the underlying file system format and its allocation mechanism. 
(Remember that st_blocks also counts things like indirect blocks and any blocks that may be allocated past the end of the file.) And even if you could determine the number and size of any holes in the file, st_blocks doesn't tell you where they are, so you still have to examine the file data anyway. Since st_blocks doesn't win much for us unless accompanied by other information acquired by non-portable means, we might as well forget about portability and have the archiver munge through the file system structures directly (a la dump(1M)). > > The obvious > > way to do this is to have the archiver program read() all <st_size> > > bytes of the file while keeping an eye out for long stretches of > > 0-valued bytes so that it can store them in a special space-saving > > manner in its archive. > > This is only slow on files that do have holes, and then only on long > stretches of zeros. Er, yes, that's the point, isn't it? We're discussing how to make an archiver that wastes neither time nor tape. > > Unfortunately, while such > > an approach is portable, its performance will leave something to be > > desired on files with truly tremendous holes in them; much time will > > be wasted on read()ing the holes. > > No, there won't be any read() time wasted. There will be CPU time > wasted. (Tom points out in another article that vectorization helps > here.) Yes, there will be read() time wasted; the archiver must read() the entire file a chunk at a time and then check each chunk for zeros. For holes, the read()s shouldn't translate into many actual disk reads (except for the indirect blocks), but you're still making a lot of read() calls that would be totally unnecessary if you avoided the holes entirely. ---------------------------------------------------------------------- Bob Goudreau +1 919 248 6231 Data General Corporation goudreau@dg-rtp.dg.com 62 Alexander Drive ...!mcnc!rti!xyzzy!goudreau Research Triangle Park, NC 27709, USA
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/10/90)
Elizabeth said that ``you have to get pretty intimate with the disk'' to tell that a file has holes, or something like that. She concluded that an archiver can with good conscience restore files with as many holes as possible, hence saving as much space as possible. Under System V this is true. Under BSD it is not. On BSD systems st_blocks exists. I don't care what information you get from it; the fact is that a portable BSD application may make some use of st_blocks. (It might checksum some fields in the inode, for example.) So an archiver that does not properly restore st_blocks on a BSD system is broken. That's all I was trying to say. My other comments were just a description of some tricks an archiver might use to restore st_blocks. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/10/90)
In article <1990Dec7.192441.24778@dg-rtp.dg.com> goudreau@larrybud.rtp.dg.com (Bob Goudreau) writes: > In article <6647:Dec619:11:3690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > > it [st_blocks] tells you nothing about the number and location of > > > holes in the file. > > That's quite correct. In the article you're responding to, I wrote > > ``it can just squish the first N holes it finds, and write explicit > > zeros in the remaining zero-filled blocks.'' One might infer from > > this that there is no way to detect the locations of the holes. So > > what? > First you say that the archiver should perform certain actions on "the > holes it finds", then you admit that "there is no way to detect the > locations of the holes". So how, pray tell, is it supposed to find > them? Sorry. What I meant was that the archiver can just squish the first N zero-filled blocks it finds into holes. Then it writes zeros into the remaining zero-filled blocks. This suffices to restore st_blocks. It may not restore the locations of the holes, but there is no (portable BSD) way to detect those locations, so this isn't a problem. Do you understand this? The point is to restore as much information as possible. st_blocks is perfectly portable within BSD, so a BSD archiver should make every effort to restore it. On the other hand, there is no portable way within BSD to locate where the holes are, so the archiver does not need to restore the holes into their original spots. It just has to get the right number of them. > The only portable way is to examine the file data looking for > stretches of nulls; but as I mentioned, this makes your program slower > than it has to be. Yes, it makes it slower. It does not make it significantly slower. > > > A truly portable method must use only standard functions (such as the > > > ones defined in POSIX.1) and must assume nothing at all about block > > > sizes or any other aspects of the file system structure. 
> > Well, if a POSIX system doesn't have st_blocks, then obviously a > > portable program can't figure out that a file has holes, so there's no > > point to figuring out how many holes there are. But every POSIX-based > > system I've seen does have st_blocks. > Broaden your horizons a little. A vast number of UNIX systems in the > world are not BSD-based and do not have st_blocks. I'm aware of that. I just haven't seen a POSIX system that doesn't have a BSD-derived filesystem, where struct stat includes st_blocks. And (once again---no offense, but I feel like I'm talking to a wall) it is only important to restore the same number of holes IF there is a way for a program, portable within the environment in question, to figure out the number of holes. It is not important to restore the number of holes if there is no equivalent to st_blocks. (This is what I said starting with ``Well'' above.) In other words, I am talking about a problem specific to a certain environment, so why can't I talk about portability within that environment? > Since POSIX.1 also > does not require it, any software that relies on st_blocks' presence > will be seriously limiting its claims of portability. Once you use some BSD features, you've already limited your claims of portability. What's wrong with also taking advantage of st_blocks? > > This is only slow on files that do have holes, and then only on long > > stretches of zeros. > Er, yes, that's the point, isn't it? We're discussing how to make an > archiver that wastes neither time nor tape. Er, yes, but sometimes you have to pay time for space. As I said before, it would be better to have full information about the locations of holes, but we have to work within the information provided by current systems. > > > Unfortunately, while such > > > an approach is portable, its performance will leave something to be > > > desired on files with truly tremendous holes in them; much time will > > > be wasted on read()ing the holes. 
> > No, there won't be any read() time wasted. There will be CPU time > > wasted. (Tom points out in another article that vectorization helps > > here.) > Yes, there will be read() time wasted; the archiver must read() the > entire file a chunk at a time and then check each chunk for zeros. It has to read the entire file anyway, if it is going to write() it onto tape. Where are your extra read()s? ---Dan
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (12/10/90)
In <2469:Dec1001:13:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >...the fact is that a portable BSD application may make some use >of st_blocks. Er...a *nonportable* BSD application may make some use of st_blocks. -- Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com> UUCP: oliveb!cirrusl!dhesi
zwicky@erg.sri.com (Elizabeth Zwicky) (12/11/90)
In article <2469:Dec1001:13:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Elizabeth said that ``you have to get pretty intimate with the disk'' to >tell that a file has holes, or something like that. She concluded that >an archiver can with good conscience restore files with as many holes as >possible, hence saving as much space as possible. No, actually, Elizabeth didn't say either of those things. And doesn't believe the latter at all and requested counter examples. What I did say is that you cannot tell the difference between a hole and an equivalent number of nulls without reading raw blocks. st_blocks at best tells you how many holes there are; it doesn't tell you *where*. Just as programs may, conceivably, care what st_blocks is (care to name one that does?), they may also care where the holes are (I have no examples of this one either, but it's equally imaginable). I conclude from this that good archivers are not portable. One can arguably conclude that if you want a portable program, you can in good conscience restore files with as many holes as possible, since you can't get it right. Elizabeth Zwicky
goudreau@larrybud.rtp.dg.com (Bob Goudreau) (12/11/90)
In article <2707:Dec1001:26:4290@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > Sorry. What I meant was that the archiver can just squish the first N > zero-filled blocks it finds into holes. Then it writes zeros into the > remaining zero-filled blocks. OK. I understand your point now. > > The only portable way is to examine the file data looking for > > stretches of nulls; but as I mentioned, this makes your program slower > > than it has to be. > > Yes, it makes it slower. It does not make it significantly slower. I guess it depends on how many holey files you have, and how big their holes are. Most of the read()s over the holes sould come fairly cheap, but you now also have the additional step of examining the input data looking for stretches of zeros. > > > > Unfortunately, while such > > > > an approach is portable, its performance will leave something to be > > > > desired on files with truly tremendous holes in them; much time will > > > > be wasted on read()ing the holes. > > > No, there won't be any read() time wasted. There will be CPU time > > > wasted. (Tom points out in another article that vectorization helps > > > here.) > > Yes, there will be read() time wasted; the archiver must read() the > > entire file a chunk at a time and then check each chunk for zeros. > > It has to read the entire file anyway, if it is going to write() it onto > tape. Where are your extra read()s? No, my point is that dump(1M) *doesn't* have to read() the entire file; by examining the file system directly, it can determine in advance exactly where the holes are and thus avoid read()ing through them. The only data it need read are the actual allocated data blocks. Whereas the more portable & straightforward archiving approach must naively read() through (say) a gigabyte of hole, and also analyze all the null bytes thus read in order to verify that they indeed form a hole. 
---------------------------------------------------------------------- Bob Goudreau +1 919 248 6231 Data General Corporation goudreau@dg-rtp.dg.com 62 Alexander Drive ...!mcnc!rti!xyzzy!goudreau Research Triangle Park, NC 27709, USA
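[The zero-checking step Goudreau mentions need not be expensive. One common trick, sketched here (this is an editorial illustration, not code from the thread), is to memcmp() each chunk against a static block of zeros, letting the C library use whatever word-at-a-time or vectorized comparison it has.]

```c
#include <assert.h>
#include <string.h>

#define CHUNK 8192	/* assumed chunk size for the archiver's reads */

/* Is this chunk entirely zero?  Comparing against a static zero
 * block hands the scan to memcmp(), which is typically far faster
 * than a byte-at-a-time loop.  Assumes n <= CHUNK. */
int allzero_chunk(const char *buf, size_t n)
{
    static const char zeros[CHUNK];

    return n <= CHUNK && memcmp(buf, zeros, n) == 0;
}
```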
bzs@world.std.com (Barry Shein) (12/11/90)
Actually, under BSD, you can write a fairly portable program to identify
holes without getting intimate with the disk, tho I'm not entirely
certain if there are any, um, holes in it, probably.

This approach *is* "destructive" of the file (it destroys the holes) tho
when you're done you can reconstruct the file with the holes put back.
Obviously this won't work if you don't have room for the expanded file
(tho some shenanigans might only require you to have the size of the
unexpanded file plus one block available, I dunno, would take some
thought.)

The basic idea goes like this:

1. Holes always read back as a block of zeros, so only blocks that
   appear to be filled with zeros are interesting.

2. If you rewrite a real hole with all zeros (still with me?) the number
   of blocks in the file will change, a stat() will indicate this.

Here's a basic program (which could be improved in various ways, but
illustrates the idea) which prints out which blocks in the file are
holes, have fun picking holes in it (at least grant me that I said it
was BSD-only)!

--------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>

main(argc,argv)
int argc; char **argv;
{
	int i;
	int fd;
	char *buf;
	struct stat st, nst;

	if(argc != 2) {
		fprintf(stderr,"usage: %s file\n",argv[0]);
		exit(1);
	}
	if((fd = open(argv[1],O_RDWR)) < 0) {
		perror("open");
		exit(1);
	}
	if(fstat(fd,&st) < 0) {
		perror("stat");
		exit(1);
	}
	if((buf = (char *)malloc(st.st_blksize)) == NULL) {
		perror("malloc");
		exit(1);
	}
	for(i=0;;i++) {
		if(read(fd,buf,st.st_blksize) != st.st_blksize)
			exit(0);
		if(allzeros(buf,st.st_blksize)) {
			/* rewrite the zeros in place; if this block was
			 * a hole, the write allocates it and st_blocks
			 * grows */
			lseek(fd,-st.st_blksize,L_INCR);
			write(fd,buf,st.st_blksize);
			fstat(fd,&nst);
			if(nst.st_blocks > st.st_blocks)
				printf("block %d is a hole\n",i);
			st = nst;
		}
	}
}

allzeros(p,n)
register char *p;
register int n;
{
	while(n--)
		if(*p++ != '\0')
			return(0);
	return(1);
}
--------------------
--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
rbj@uunet.UU.NET (Root Boy Jim) (12/11/90)
In article <BZS.90Nov27120756@world.std.com> bzs@world.std.com (Barry Shein) writes:
>From: mchinni@pica.army.mil (Michael J. Chinni, SMCAR-CCS-E)
>>If so, how would you find all symbolic links to the file ?

The hard way. By looking thru the entire filesystem, examining every
file, and determining whether it points to the target.

>In general, it's very difficult. Consider that valid symlinks can
>point across NFS mounts.

Hmmm, I hadn't thought of that. I suppose it all depends on what stat
returns for st_dev and st_ino.

>	% find /mount-point -type l -a -exec testinode '{}' #inum ';'

Gee, Barry, using -exec is so passe :-) XARGS is the way to go.

Your program testinode should accept multiple arguments. The first arg
is special and contains the name of the target file. If the st_dev and
st_ino match, print the sucker.

>        -Barry Shein
>
>Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
>Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
--
	Root Boy Jim Cottrell <rbj@uunet.uu.net>
	Close the gap of the dark year in between
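[The heart of the testinode helper described above is one comparison; a sketch (the interface is just what Jim describes, not a posted program). A driver would stat() the first argument, then stat() -- not lstat(), so each symlink is followed -- every remaining argument, printing the names for which this holds; stat() failing on a dangling link simply skips it.]

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Two stat buffers name the same underlying file iff both the device
 * and the inode number match; st_ino alone is ambiguous, since inode
 * numbers repeat across filesystems. */
int same_inode(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```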
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/12/90)
In article <1990Dec10.191522.2757@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
> In article <2469:Dec1001:13:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Elizabeth said that ``you have to get pretty intimate with the disk'' to
> >tell that a file has holes, or something like that. She concluded that
> >an archiver can with good conscience restore files with as many holes as
> >possible, hence saving as much space as possible.
> No, actually, Elizabeth didn't say either of those things.

Well, sorry, I thought it was Elizabeth who said ``you have to get
pretty intimate with the disk to tell that the 20 meg of nulls aren't
there'' in <1990Dec5.052124.28435@erg.sri.com>. And who agreed in a
later article with Tom's conclusions. But this is beside the point.

Does anyone else understand the importance of restoring as much stat
information as possible? It's an archiver's duty to do as good a job as
it can. Now Elizabeth's position has been that an archiver cannot do
this without going beyond the stat information and reading the raw disk.
Other people have agreed that you don't need raw access, but claim that
dumps become a lot slower. I'm more of an optimist:

 1. On a system without st_blocks, an archiver can lseek past every
    0-filled region. The system will automatically use holes wherever
    possible.
    (A) This doesn't require raw disk access.
    (B) Since stat doesn't care about holes, this doesn't destroy any
        information.
    (C) This wastes only restore time, not dump time.

 2. On a system with st_blocks, an archiver can lseek past the first N
    0-filled regions, enough to restore st_blocks; and then it can
    write explicit zeros in the rest. Even if it doesn't know the block
    size, it can use trial and error to get the right st_blocks, as
    Barry illustrated in a previous article; since most files in
    practice do not have holes, this will rarely be necessary.
    (A) This does not require raw disk access.
    (B) st_blocks is restored as we want.
    (C) This wastes only restore time, not dump time; and it only
        wastes restore time on files that actually do have holes.

 3. On a system with full information about the locations of holes, an
    archiver can trivially record the locations and lseek appropriately
    on restore.
    (A) This does not require raw disk access.
    (B) All stat information is restored as we want.
    (C) This doesn't waste any time.

 4. On a system... well, I've never seen any systems that don't fall
    under #1 or #2, and hopefully future systems will be under #3.

People talking about ``portability'' simply don't understand what's
going on here. An archiver ON SYSTEM X is responsible for restoring stat
information as returned BY SYSTEM X. It is incredibly asinine to say
``#2 is wrong on an AT&T system''---#2 is not *meant* for an AT&T
system!

> What I did say is that you cannot tell the difference between a hole
> and an equivalent number of nulls without reading raw blocks.
> st_blocks at best tells you how many holes there are; it doesn't tell
> you *where*.

Right! So on a system with st_blocks, the archiver's responsibility is
to restore the right number of holes. It can do this by making the
first N zero-filled blocks into holes, with no regard to the original
positions. This does *not* require access to the raw disk blocks.

> Just as programs may, conceivably, care what st_blocks is
> (care to name one that does?), they may also care where the holes are
> (I have no examples of this one either, but it's equally imaginable).

Yes, it is conceivable that a vendor would have a system returning
different stat information. Here's the most important point I'm trying
to make: On *that* system it is the archiver's responsibility to restore
that stat information returned by *that* system. Do you understand this?
It is even conceivable that a vendor will provide stat information that
can't be restored properly without raw disk access. In your December 5
article you were trying to cast ``gloom'' on archivers for exactly this
reason. But that's simply not true for System V or for standard BSD.

> I conclude from this that good archivers are not portable. One can
> arguably conclude that if you want a portable program, you can in good
> conscience restore files with as many holes as possible, since you
> can't get it right.

No! This is what Tom said, and it is entirely wrong. On a BSD system the
right strategy is #2: do what's necessary to restore st_blocks. A
program can reasonably depend on that information, so an archiver that
doesn't restore st_blocks is buggy.

---Dan
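[The inner loop of strategy #2 is small. A sketch (mine, not Dan's): holes_left would come from comparing st_size against st_blocks at dump time, and the restorer holes out exactly that many zero-filled blocks, writing the rest for real. If the file ends in a hole, the restorer must finish with an ftruncate() or a final real write so the length comes out right.]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAXBLK 8192		/* assumed upper bound on the block size */

/* Write one block of restored data.  While *holes_left is positive, a
 * zero-filled block becomes a hole (lseek past it); after that,
 * zero-filled blocks are written out for real, so the restored file
 * ends up with the same st_blocks the original reported. */
int restore_block(int fd, const char *buf, size_t bsize, int *holes_left)
{
    static const char zeros[MAXBLK];

    if (*holes_left > 0 && bsize <= MAXBLK &&
        memcmp(buf, zeros, bsize) == 0) {
        --*holes_left;
        return lseek(fd, (off_t)bsize, SEEK_CUR) == (off_t)-1 ? -1 : 0;
    }
    return write(fd, buf, bsize) == (ssize_t)bsize ? 0 : -1;
}
```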
tchrist@convex.COM (Tom Christiansen) (12/12/90)
In article <2993:Dec1202:37:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
:> I conclude from this that good archivers are not portable. One can
:> arguably conclude that if you want a portable program, you can in good
:> conscience restore files with as many holes as possible, since you
:> can't get it right.
:No! This is what Tom said, and it is entirely wrong. On a BSD system the
:right strategy is #2: do what's necessary to restore st_blocks. A
:program can reasonably depend on that information, so an archiver that
:doesn't restore st_blocks is buggy.
Taking my name in vain again, I see. :-)
I'll boldly state that any *application* (dump programs don't count) that
relies on whether or not a block of nulls is or isn't really allocated on
the disk is broken, and therefore it doesn't matter if you put in more
holes than were there. Can anyone show me a case where it would break
something?
--tom
--
Tom Christiansen tchrist@convex.com convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning." -me
jonb@specialix.co.uk (Jon Brawn) (12/13/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>No! This is what Tom said, and it is entirely wrong. On a BSD system the
>right strategy is #2: do what's necessary to restore st_blocks. A
>program can reasonably depend on that information, so an archiver that
>doesn't restore st_blocks is buggy.

(This isn't a flame, I'm quite serious now!)

Can anyone think up a good use for looking at the st_blocks field?
--
"These opinions were made up on the spur of the moment, to a formula
kept secret from prying eyes for hundreds of years, and bear no
relationship to my actual beliefs, let alone those of Specialix
International"
	Jon Brawn, jonb@specialix.co.uk "I didn't do it. I wasn't there."
jgreely@morganucodon.cis.ohio-state.edu (J Greely) (12/13/90)
In article <2993:Dec1202:37:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Does anyone else understand the importance of restoring as much stat
>information as possible?

Everyone does, Dan. I'd even wager that there are people who (say it
isn't so!) understand it better than you.

>Now Elizabeth's position has been that an archiver cannot do this
>without going beyond the stat information and reading the raw disk.

Not quite. Her original article said "get pretty intimate with", not
necessarily "read the raw disk". To find out the actual space used by a
file, you need the number of allocated blocks and the block size. The
former is only available on systems that support st_blocks; the latter
varies from system to system, and if your archiver has to cope with NFS,
from filesystem to filesystem. The block size can usually be found with
the statfs(3) call, unless the filesystem is mounted from a sun (in
which case statfs returns the wrong answer). If you don't have statfs,
you can compute the block size from a known hole-less file (a directory
is a good choice on a local file system). Then, of course, you have to
scan every block in the file to find out which ones are potential holes.
Compared to what tar and cpio do, that looks like "pretty intimate" to
me.

> 1. On a system without st_blocks, an archiver can lseek past every
>    0-filled region. The system will automatically use holes wherever
>    possible.
>    (C) This wastes only restore time, not dump time.

It wastes bunches of dump time; every file has to be scanned for
zero-filled blocks, or you waste shitloads of tape.

> 2. On a system with st_blocks, an archiver can lseek past the first
>    N 0-filled regions, enough to restore st_blocks; and then it can
>    write explicit zeros in the rest.
>    (C) This wastes only restore time, not dump time; and it only
>        wastes restore time on files that actually do have holes.

If you don't know the block size, this effectively degenerates to #1
above. Even if you do, it still wastes dump time finding the zero-filled
blocks (but only in files that are known to have holes). Note that if
you find the holes while dumping, you don't waste any time while
restoring.

>Right! So on a system with st_blocks, the archiver's responsibility is
>to restore the right number of holes.

Wrong! Its first responsibility is to not copy the contents of the right
number of holes into the archive. Its *second* responsibility is to not
restore the contents of the holes. Who cares about the value of
st_blocks on the original filesystem?
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
boyd@necisa.ho.necisa.oz (Boyd Roberts) (12/13/90)
In article <2993:Dec1202:37:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>
>Right! So on a system with st_blocks, the archiver's responsibility is
>to restore the right number of holes. It can do this by making the first
>N zero-filled blocks into holes, with no regard to the original
>positions. This does *not* require access to the raw disk blocks.
>

Sounds like just the argument why you don't want st_blocks. What a
nightmare to maintain from an archiver's point of view. Sure it can
detect the holes and diddle about, but that's revolting. All you really
want is the data in the file.

Using general-purpose archivers on special-purpose files (holed files)
is just a no-no. Programs that create such files should also have
programs to dump out the data in the files, which can then be used to
recreate them.

Dump and restor are special cases. They archive file-systems and it is
their responsibility that the file-system structure is preserved
correctly. Why System V deprecated them I'll never know.

I'm equally puzzled/appalled by those who advocate tar or cpio to back
up their file-systems. Those things take an image of a snapshot in time.
They don't handle incrementals or file deletions properly. They are used
to archive -- not backup.


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''
zwicky@erg.sri.com (Elizabeth Zwicky) (12/13/90)
In article <110689@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes: >I'll boldly state that any *application* (dump programs don't count) that >relies on whether or not a block of nulls is or isn't really allocated on >the disk is broken, and therefore it doesn't matter if you put in more >holes than were there. Can anyone show me a case where it would break >something? Actually, I can produce a very strained example where an otherwise working program that does not know anything about which blocks are filled on disks and which aren't nevertheless breaks if holes are restored in the wrong places. Postulate a program that has data which includes stretches of nulls that will always be nulls, and stretches that may change to other things. It then pre-allocates a file for the data, lseeking over the always null parts, and writing nulls for the parts that may change. Note that it does not care what the results of this are; the OS can make holes for the lseeks or not, as it pleases. However, that program can now assume that it has enough space to run in. If you restore it with all holes, you have freed space that something else may eat before the program wants it. If you restore it with the same number of holes it had before, but in different places, the file is no longer guaranteed to be a static length; when the known-to-be-changeable nulls change, the program may have to fill in a hole, and once again may not have the space it needs. I must admit that I can't produce any live examples of programs that do this, any more than I can produce any live examples of programs that care where their holes are for other reasons, or that care what st_blocks is. But this seems to me to be a defensibly non-broken behaviour. Elizabeth Zwicky
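[Elizabeth's hypothetical preallocator is easy to sketch (this is an editorial illustration of the pattern she describes, not code from her post): lseek over the permanently-null stretch, then write() nulls over the changeable stretch, because write(), unlike lseek(), forces those blocks to be allocated. An archiver that rebuilds the file with the holes in different places silently discards the space guarantee the program arranged for.]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Skip `skip` bytes that may stay a hole, then reserve `reserve`
 * bytes of real disk space by writing nulls over them.  Returns 0
 * on success, -1 (with errno set by the failing call) otherwise. */
int prealloc(int fd, off_t skip, size_t reserve)
{
    char zeros[4096];
    size_t n;

    memset(zeros, 0, sizeof zeros);
    if (lseek(fd, skip, SEEK_CUR) == (off_t)-1)
        return -1;
    while (reserve > 0) {
        n = reserve < sizeof zeros ? reserve : sizeof zeros;
        if (write(fd, zeros, n) != (ssize_t)n)
            return -1;
        reserve -= n;
    }
    return 0;
}
```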
jik@athena.mit.edu (Jonathan I. Kamens) (12/14/90)
In article <1990Dec12.174807.12868@specialix.co.uk>, jonb@specialix.co.uk (Jon Brawn) writes:
|> (This isn't a flame, I'm quite serious now!)
|> Can anyone think up a good use for looking at the st_blocks field?

I'm not sure whether you mean any use in general, or a use specifically
related to backup and restore, since that's what's being discussed here.

If you mean the former, then I can give you one way st_blocks is used.
The expunge program in my undel2 package (available at a
comp.sources.unix archive site near you, in volume22, make sure to also
get "et" in the same volume :-) has options to report the amount of
space made free by the expunging of each file, and the total amount of
space freed after all appropriate files are expunged. It does this by
multiplying the number of blocks (st_blocks) for each file by DEV_BSIZE
(from <sys/param.h>) and dividing by 1024 to get kilobytes instead of
bytes (Yes, I should probably be calling statfs on the filesystem
instead of using DEV_BSIZE, but it hasn't been a problem up to this
point :-).

The point of this is that when someone expunges files, they want to know
how much space was actually freed, not what the total size of all the
files in the filesystem is. Those two values would differ in the case of
files with holes in them, and in the much more common case of files
which are not exactly a multiple of the block size in length, since the
space in the partially unfilled block at the end of a file is wasted.
--
Jonathan Kamens			      USnail:
MIT Project Athena			11 Ashford Terrace
jik@Athena.MIT.EDU			Allston, MA  02134
Office: 617-253-8085		      Home: 617-782-0710
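[The arithmetic Jonathan describes is one line. A sketch, with DEV_BSIZE defaulted to 512 (BSD's historical value) on systems that don't define it:]

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512		/* assumed; normally from <sys/param.h> */
#endif

/* Kilobytes actually freed by removing this file: the allocated
 * blocks times the block unit, not st_size.  For a holey file this
 * is smaller than st_size / 1024 would suggest; for an ordinary file
 * it is usually a little larger, because of the partial last block. */
long freed_kbytes(const struct stat *st)
{
    return (long)st->st_blocks * DEV_BSIZE / 1024;
}
```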
mcr@Latour.Sandelman.OCUnix.On.Ca (Michael Richardson) (12/14/90)
In article <2993:Dec1202:37:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Does anyone else understand the importance of restoring as much stat
>information as possible? It's an archiver's duty to do as good a job as
>it can.

In theory, yes.

> 1. On a system without st_blocks, an archiver can lseek past every
>    0-filled region. The system will automatically use holes wherever
>    possible. (A) This doesn't require raw disk access. (B) Since stat

This sounds like the best solution overall. If the backup format
provides a method to store long sequences of zeros (either because it
supports holes, or because it does compression), it seems like a good
idea to use it. Most backing up is I/O bound (usually the tape); the
common cptree alias is not a form of backup, so I don't feel badly about
making that take a little longer.

> 2. On a system with st_blocks, an archiver can lseek past the first
>    N 0-filled regions, enough to restore st_blocks; and then it can
>    write explicit zeros in the rest. Even if it doesn't know the block

While I understand the desire to restore the file exactly the way it
was, I'm curious to hear an example of an application that cares about
st_blocks, that would mind having long sequences of zeros turned into
holes. Other than such things as swap files or other files that one
might prefer to be contiguous, I can't see a reason. (And if your OS
supports contiguous files [e.g. RTU] then your backup utilities had
better understand them...)

"Just because I can't think of a reason doesn't mean ..."
--
   :!mcr!:		| The postmaster never	| So much mail,
   Michael Richardson	|  resolves twice.	| so few cycles.
 mcr@julie.UUCP/michael@fts1.UUCP/mcr@doe.carleton.ca
 - Domain address - Pay attention only to _MY_ opinions. - registration in progress.
bzs@world.std.com (Barry Shein) (12/14/90)
> While I understand the desire to restore the file exactly the >way it was, I curious to hear an example of an application that cares >about st_blocks Both ls and du care about st_blocks, to name two which are used fairly often. But I assume they don't count. -- -Barry Shein Software Tool & Die | {xylogics,uunet}!world!bzs | bzs@world.std.com Purveyors to the Trade | Voice: 617-739-0202 | Login: 617-739-WRLD
rbj@uunet.UU.NET (Root Boy Jim) (12/14/90)
In article <1990Dec12.235535.29083@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
>In article <110689@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>>I'll boldly state that any *application* (dump programs don't count) that
>>relies on whether or not a block of nulls is or isn't really allocated on
>>the disk is broken, and therefore it doesn't matter if you put in more
>>holes than were there. Can anyone show me a case where it would break
>>something?

You are correct. Holes and zero-filled blocks both read back as blocks
of zeros, so how could you tell the difference? Most programs won't have
access to the raw (or block) disks, and there are fewer and fewer
hackers who could make any sense of the (especially BSD) filesystem
anyway. The operating system is free to substitute one for the other at
any time, and the user or program would never know the difference.

>... Postulate a program that has data which... [does weird stuff]
>However, that program can now assume that it has enough space to run in.

What are you talking about? The only thing I can think of is quotas.
Yes, it's possible that a program may die because of lack of space, but
I consider this an administrative or environmental problem.

>I must admit that I can't produce any live examples of programs ...

Nuff Said.

>	Elizabeth Zwicky

Oh, BTW to those who supposedly care how big the block size is, Don't
Worry About It. Just lseek when you see a zero and let the kernel worry
about what's big enuf to be a hole.

	WHILE <read a byte>
	DO	IF <nonzero>
		THEN	<write the byte>
		ELSE	<seek one byte>
		FI
	DONE

Making this buffered is left to the student as an exercise. People have
posted blessed (make holey) copy routines before. It should probably be
an option to cp, but it isn't that useful that often, and most database
software has rebuild programs anyway.
--
	Root Boy Jim Cottrell <rbj@uunet.uu.net>
	Close the gap of the dark year in between
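[One buffered answer to the exercise (a fresh sketch, not one of the previously posted routines): seek over zero-filled chunks and finish with ftruncate(), since a hole at end of file would otherwise be lost. Whether the kernel actually leaves holes for sub-block seeks is its business, which is exactly Jim's point.]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK 8192

/* Copy descriptor `in` to `out`, lseek()ing over zero-filled chunks
 * so the kernel can leave holes.  The final ftruncate() sets the
 * length in case the file ends in a hole.  Assumes `out` starts at
 * offset 0 of a fresh file.  Returns 0 on success. */
int holey_copy(int in, int out)
{
    static const char zeros[CHUNK];
    char buf[CHUNK];
    ssize_t n;
    off_t len = 0;

    while ((n = read(in, buf, CHUNK)) > 0) {
        if (memcmp(buf, zeros, n) == 0) {
            if (lseek(out, (off_t)n, SEEK_CUR) == (off_t)-1)
                return -1;
        } else if (write(out, buf, n) != n)
            return -1;
        len += n;
    }
    return n < 0 ? -1 : ftruncate(out, len);
}
```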
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/17/90)
In article <114453@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes: [ holes versus allocated blocks of zeros ] > The operating system is free to substitute one for the other at any time, > and the user or program would never know the difference. Bad idea, if only for the reason Elizabeth pointed out: A program should be able to allocate space on disk and know that it is allocated. This is important for applications that don't want to deal with EDQUOT or ENOSPC on every write() (or close() :-)), or that need to write asynchronously to disk as fast as possible. Many such applications exist; this isn't just a theoretical problem. If anything, the system should lean in the direction of more allocation information, not less. There's no harm in providing stable information. > Yes, it's possible that a program may die because of lack of space, > but I consider this an administrative or environmental problem. I don't care what you call it. I want to (e.g.) tell my sound recorder ``Get ready for fifteen seconds of music.'' It had better reserve space on disk before doing anything else. You're saying it shouldn't be allowed to do that. ``But no!'' you say. ``It can just write blocks full of 1s. We're only talking about 0s.'' Oh? Today you're making allocation untenable for 0-filled blocks. How do we know that tomorrow you won't do the same thing to 1-filled blocks? And every block the next day? It's not the particular bit of information hiding that I'm worried about; it's this general philosophy of hiding details that applications need to see. > Making this buffered is left to the student as an exercise. People > have posted blessed (make holey) copy routines before. It should > probably be an option to cp, but it isn't that useful that often, On the contrary. It's ridiculous that, e.g., core files should explode on Suns just because you forgot to specify, say, -z. ---Dan
rbj@uunet.UU.NET (Root Boy Jim) (12/19/90)
I agree that a system call to allocate a big chunk of disk would be
nice. Your example about a music program is well taken. It would also be
nice if truncate would deallocate space in the middle of a file.

The fact that I termed it an environmental or administrative problem
doesn't mean that it's not a problem. One solution is to just have lots
of space. We stop batching news when our uucp spool directory gets below
10M. Since articles are rarely that big, we don't have a problem, even
with alt.sex.pictures :-)

Why would you want to copy a core file? Just debug it in place, or
symlink it. And just how big can a core file be anyway? A few meg?
Possibly 10? If you are running big applications, you need lots of room,
and then somewhere between 10% and 20% extra. If you know exactly what
you need, then preallocation by writing nonzero data might be good.

OK, enuf speculation on what should be. Here's what is:

Don't ever count on holes being there, figure on having to store it all.
Exception: use experience, "Well, this database looks like 100M, but is
really a 17M DBM file".

Holes are not very well supported. Only lseek can create them, and write
never does. Someone has to decide when to look thru a buffer and make a
hole or write real data. Since it's not all that hard to do, the few
users who care must shoulder the burden.

Holes are fragile. If data is moved in any way, they are likely to be
filled. Archivers *should* preserve them, but may not. If you care about
making holey files, you will need a program to recreate them. And you
need space for both files for awhile.

Holes are not useful all that often. A database here and there and/or
image data or whatever. Programs use bss to store zeros.

If your application is heavily based on one of the above, perhaps you
would be wise to use a raw disk instead and manage space yourself.
--
	Root Boy Jim Cottrell <rbj@uunet.uu.net>
	Close the gap of the dark year in between
gordon@sneaky.UUCP (Gordon Burditt) (12/19/90)
>Sorry. What I meant was that the archiver can just squish the first N >zero-filled blocks it finds into holes. Then it writes zeros into the >remaining zero-filled blocks. There seems to be a prevailing theory that this method is portable. You take the value of st_blocks, the st_size of the file, any other portable fields of the stat structure you want, and the somehow-obtained block size (I'll grant that there may be a portable way of figuring this out, if we limit this to systems that have st_blocks), and <handwave> <mumble> <black magic> obtain this mysterious value of N by way of another mysterious value of "number of blocks of holes we need to leave". Is there a portable way of computing this, without going to the raw disk and without write permission on the original file? You may NOT assume that the number of blocks in an indirect block is known, how many block numbers an indirect block holds, how many blocks are required to record the presence of holes ("excavation licenses"), nor that said numbers of blocks are independent of the order in which the file was filled in. Gordon L. Burditt sneaky.lonestar.org!gordon
goudreau@larrybud.rtp.dg.com (Bob Goudreau) (12/20/90)
In article <114818@uunet.UU.NET>, rbj@uunet.UU.NET (Root Boy Jim) writes:
>
> I agree that a system call to allocate a big chunk of disk would be
> nice. Your example about a music program is well taken. It would
> also be nice if truncate would deallocate space in the middle of a
> file.

Indeed, the latter is already available in V.4 via the F_FREESP command
to fcntl(), which allows you to punch holes of arbitrary size at
arbitrary locations in a file. Undocumented and so far unimplemented,
but hinted at in the code, is the corresponding F_ALLOCSP command that
would allow the opposite. Neither F_FREESP nor F_ALLOCSP appears yet in
the SVID or any other standard, but we can hope that they (or equivalent
functionality) will eventually be standardized.

----------------------------------------------------------------------
Bob Goudreau				+1 919 248 6231
Data General Corporation		goudreau@dg-rtp.dg.com
62 Alexander Drive			...!mcnc!rti!xyzzy!goudreau
Research Triangle Park, NC  27709, USA
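[For the record, the V.4 call takes a struct flock describing the region, with l_len == 0 meaning "through end of file". A guarded sketch (the wrapper name is mine; F_FREESP exists only on SVR4-derived systems, so elsewhere this compiles to a failing stub):]

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Punch a hole of `len` bytes at offset `off` (len == 0 frees space
 * through end of file).  Only SVR4-derived systems define F_FREESP;
 * elsewhere this stub just fails with EINVAL. */
int punch_hole(int fd, off_t off, off_t len)
{
#ifdef F_FREESP
    struct flock fl;

    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = len;
    return fcntl(fd, F_FREESP, &fl);
#else
    (void)fd; (void)off; (void)len;
    errno = EINVAL;
    return -1;
#endif
}
```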