peter@baylor.UUCP (Peter da Silva) (05/14/86)
> Wait a sec... A Seek() call on the Amiga HAS to read every sector. > > Each sector on the amiga contains a (byte, word, long, whatever) indicating > how much of the sector is used. So, if you want to read byte 600, you can't > just go to the second record-- you have to go to the first, find out how > much space it uses, go to the second, find out how much, etc. If this is the case, and if it is also the case that the Amiga directories don't contain the file names (what do they contain? a hash table?), then C= Amiga has some serious redesigning to do before I or anybody I know actually go out and buy this thing. Incidentally, despite the poor design of the files a seek() does not have to read every sector... a mistake often made by library writers is to try to make seek offsets simple integers. According to the library, the argument to an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only needs to be the returned value from a tell() call: it may indeed be a magic cookie like a sector/offset pair (and in fact "magic cookie" is the way it's described in the manual). It is under RSX/11M and on the ATARI 800. This error is not restricted to relative newcomers: there's an IBM mainframe implementation of 'C' that copies all files into fixed record length files when you open them just so you can use UNIX-like seeks. If you want to do a UNIX-like seek, build UNIX-like files (either one long "record" or a bunch of maximum length records) so your offset calculations work. It's not meaningful to seek to an unknown depth in a text file or other weird file anyway. The Lattice 'C' runtime library on the IBM-PC has an obscure bug related to this, by the way, so I'm not surprised they screwed up their Amiga library too... -- -- Peter da Silva -- UUCP: ...!shell!{baylor,graffiti}!peter; MCI: PDASILVA; CIS: 70216,1076
chris@umcp-cs.UUCP (Chris Torek) (05/16/86)
In article <645@baylor.UUCP> peter@baylor.UUCP (Peter da Silva) writes: >a mistake often made by library writers is to try to make seek >offsets simple integers. According to the library, the argument to >an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only >needs to be the returned value from a tell() call: it may indeed >be a magic cookie like a sector/offset pair (and in fact "magic >cookie" is the way it's described in the manual). It is under >RSX/11M and on the ATARI 800. Add `f' in front of the names (fseek, ftell) and you are correct; Unix systems, however---or at least 4.3beta---claim that lseek offsets are in `bytes'. (There is no `tell' system call.) Actually, those who write libraries such that fseek takes a byte index either have missed that point, or are very clever: Because PDP-11 and Vax Unix systems have always had fseek and ftell return byte indicies, there is no doubt quite a bit of code that depends on that behaviour. These programs are technically broken, but it is hard to keep a customer happy by informing it [note non-sexist word :-)] of this fact. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
zben@umd5.UUCP (Ben Cranston) (05/17/86)
In article <645@baylor.UUCP> peter@baylor.UUCP (Peter da Silva) writes: >Incidentally, despite the poor design of the files a seek() does not have to >read every sector... a mistake often made by library writers is to try to >make seek offsets simple integers. According to the library, the argument >to an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only needs >to be the returned value from a tell() call: it may indeed be a magic cookie >like a sector/offset pair (and in fact "magic cookie" is the way it's described >in the manual). It is under RSX/11M and on the ATARI 800. >This error is not restricted to relative newcomers: there's an IBM mainframe >implementation of 'C' that copies all files into fixed record length files >when you open them just so you can use UNIX-like seeks. If you want to do >a UNIX-like seek, build UNIX-like files (either one long "record" or a bunch >of maximum length records) so your offset calculations work. It's not >meaningful to seek to an unknown depth in a text file or other weird file >anyway. The Software Tools NOTE/SEEK design uses two Fortran integers to store SEEK addresses. The predominant text data format on the Sperry 1100 system is a variable length record, with the record length in a four byte header area. My implementation of the Tools for the Sperry uses the first of the two Fortran integers as the "character address within file" (i.e. 4 X wordaddr) and the second Fortran integer as "character number within this record", that is, how many characters back to go to get to the record header. The code uses this value to get "back in sync" after a random seek. This has the advantage that the first word of the address appears to be a normally-incrementing address, with 4-7 spaces between records. It would be possible to optimize NOTE address storage: if one knew that positions stored would always be at the beginning of record and the file was always ASCII one could keep just the first integer and supply "4" for the second. Oh, and if the character code is "Fieldata" (tm) rather than ASCII then the second word is negative. For historical reasons only... -- "We're taught to cherish what we have | Ben Cranston by what we have no longer..." | zben@umd2.umd.edu ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben
campbell@maynard.UUCP (Larry Campbell) (05/17/86)
> Incidentally, despite the poor design of the files a seek() does not have to > read every sector... a mistake often made by library writers is to try to > make seek offsets simple integers. According to the library, the argument > to an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only needs > to be the returned value from a tell() call: it may indeed be a magic cookie > like a sector/offset pair (and in fact "magic cookie" is the way it's described > in the manual). It is under RSX/11M and on the ATARI 800. > -- Peter da Silva > -- UUCP: ...!shell!{baylor,graffiti}!peter; MCI: PDASILVA; CIS: 70216,1076 Wrongo. First, the manual describes the offset as a long (NOT a "simple integer" nor a "magic cookie"). Second, if the offsets AREN'T integers, then almost any database library or package won't work (like dbm(3)), because they often hash keys into offsets in the index file. -- Larry Campbell The Boston Software Works, Inc. ARPA: maynard.UUCP:campbell@harvard.ARPA 120 Fulton Street UUCP: {harvard,cbosgd}!wjh12!maynard!campbell Boston MA 02109
breuel@h-sc1.UUCP (thomas breuel) (05/17/86)
||Wait a sec... A Seek() call on the Amiga HAS to read every sector. || ||Each sector on the amiga contains a (byte, word, long, whatever) indicating ||how much of the sector is used. So, if you want to read byte 600, you can't ||just go to the second record-- you have to go to the first, find out how ||much space it uses, go to the second, find out how much, etc. | |If this is the case, and if it is also the case that the Amiga directories |don't contain the file names (what do they contain? a hash table?), then C= |Amiga has some serious redesigning to do before I or anybody I know actually |go out and buy this thing. No, this is not true. A Seek() call has to read the block allocation list, a linked list of pointers to blocks allocated to a file. Each block, except for the last one, is filled (or at least considered to be filled for the purpose of Seek's). This means that for every 72 blocks in the file (36k) there is another block in the block allocation list. Therefore, for large files (720k), a seek may have to go through as many as 20 blocks in the block allocation list, which can be quite slow. This was the point of my original complaint. Supposedly, something is being done about this in 1.2, although nobody from C/A has said anything specific as to *what* is being done. My proposals are: (1) change the file system to use allocation trees (2) buffer all allocation pointers for a file in memory (as a backwards compatible fix; on a two drive system, this can mean at most 20k of memory and gives the fastest seeks possible). |Incidentally, despite the poor design of the files a seek() does not have to |read every sector... a mistake often made by library writers is to try to |make seek offsets simple integers. According to the library, the argument |to an absolute seek() (lseek(fd, off, 0) or lseek(fd, off, 2)) only needs |to be the returned value from a tell() call: it may indeed be a magic cookie |like a sector/offset pair (and in fact "magic cookie" is the way it's described |in the manual). It is under RSX/11M and on the ATARI 800. Both lseek and fseek are guaranteed to seek to byte offsets under UN*X (2.9). There may be systems with other behaviours, but I would not want to be forced to use such a disaster... |It's not |meaningful to seek to an unknown depth in a text file or other weird file |anyway. Sure it is. You can do a binary search (and gain by doing it) even if you don't have a fixed record length. |The Lattice 'C' runtime library on the IBM-PC has an obscure bug related to |this, by the way, so I'm not surprised they screwed up their Amiga library |too... I'm not sure what you mean by this. The Lattice runtime library has (fortunately) nothing to do with the Amiga file system. Thomas.
jpl@allegra.UUCP (John P. Linderman) (05/19/86)
With a little help from Chris Torek, Peter da Silva noted >>According to the library, the argument to [fseek] needs to be the >>returned value from [an ftell()] call: it may indeed be a magic cookie >>like a sector/offset pair (and in fact "magic cookie" is the way it's >>described in the manual). It is under RSX/11M and on the ATARI 800. and Chris goes on to add >Actually, those who write libraries such that fseek takes a byte >index either have missed that point, or are very clever: Because >PDP-11 and Vax Unix systems have always had fseek and ftell return >byte indicies, there is no doubt quite a bit of code that depends >on that behaviour. These programs are technically broken, but it >is hard to keep a customer happy by informing it [note non-sexist >word :-)] of this fact. This points up the typical bind faced by those who wish to write programs that are portable but not pathetic. I wanted to write a sort program that checkpointed itself... after sorting and writing a coreload of data, the data is (kind of) safe, and, if the system crashes, it isn't necessary to sort it again, if we can just seek past it in the input file. In general, we don't know how many complete records will fit in a coreload until we are part way through a record that is too big to fit. If seek addresses are byte offsets, no big deal. Just keep track of how many bytes have been read after each complete record. But if seek addresses are magic cookies, bad news. We can use ftell to find the find the address of the middle of the current, incomplete, record, but that does us no good at all, since it is the address of the START of the record that we need, and you cannot do cookie arithmetic. Or, we can do an ftell after every complete record, just in case the next record won't fit. Ughleee! That ftell boils down to a system call, and after stdio goes to all the bother of minimizing system calls, we come along and add one after every record. I have tried this, and it results in performance that is anywhere from 10 to 100 times as bad! Completely unacceptable. In the particular case, there is a way out. Since restarting from a checkpoint is rare, and since input may be coming from a pipe anyway, it is both tolerable and preferable to get to byte offset N by doing N getc's (I like to call this a getceek.) But the solution is peculiar to this problem, and I would have chosen to ``technically break'' the program if I had been forced to choose between complete portability and acceptable performance. John P. Linderman Seek and Ye Shall Grind allegra!jpl