ross@spam.ua.oz.au (Ross Williams) (06/21/91)
From: jloup@nocturne.chorus.fr (Jean-Loup Gailly)

CRIT_004: CONTIGUOUS MEMORY AND TYPING
--------------------------------------

>| 2.2 Parameters
>| --------------
>| 2.2.1 A conforming procedure must have a parameter list that conveys
>| no more and no less information than that conveyed by the following
>| "model" parameter list.
>|
>| [...]
>| INOUT memory - A block of memory for use by the algorithm.
>| [...]
>
>How do you deal with segmented architectures such as the 8086 and 286
>which impose a small limit (such as 64K) on the size of a segment?
>(There are only a few million machines still using this architecture :-)
>You could say that the MEMORY parameter is in fact a small array of
>pointers to distinct segments, but how does the caller choose the
>size of each segment? Systematically allocating 64K for every segment
>except possibly the last would generally waste memory. (Take an
>algorithm requiring two segments of 40K each.)

I wasn't thinking of segmented architectures when I framed the
standard, as I have not used them much. The only time I did program a
PC, I used a Microsoft C compiler which seemed to hide all the
nastiness behind a large memory model... Can't we just make all the
PCs go away?

More seriously, I think the memory issues raised should be divided
into two:

   1) Input and output blocks.
   2) Working memory.

Although Jean didn't seem to raise the problem of input and output
blocks on segmented architectures, I think it should be dispatched now
so there is no confusion; I don't think there is a problem. Objects in
different segments can be fed into the algorithm as different blocks
within the same context block. No problem.

For the memory parameter, I agree that the standard as it stands has a
serious problem on segmented architectures.

>Even on non-segmented architectures, it may be cumbersome to force all
>memory used by the algorithm to be contiguous. The data structures used
>by a compression algorithm are usually much more complex than a single
>linear array, so the algorithm has to somehow map these data structures
>onto this linear sequence of bytes. This may be difficult with some
>programming languages.

In low-level languages such as C this is not so hard; one just writes
a simple allocator that carves pieces out of the block. In high-level
languages there are typing and representation problems.
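For concreteness, a minimal sketch of such an allocator (the names and
the alignment policy are illustrative only, not part of the standard):

    #include <stddef.h>

    /* Carves pieces out of the single contiguous MEMORY block handed
       to the procedure. All bookkeeping lives in the arena record, so
       the algorithm can never use more than the block it was given. */

    typedef struct {
       char   *base;    /* start of the MEMORY block */
       size_t  size;    /* total bytes available     */
       size_t  used;    /* bytes handed out so far   */
    } arena;

    static void arena_init(arena *a, void *memory, size_t size)
    {
       a->base = (char *) memory;
       a->size = size;
       a->used = 0;
    }

    static void *arena_alloc(arena *a, size_t n)
    {
       void *p;
       n = (n + 7) & ~(size_t) 7;    /* keep allocations 8-byte aligned */
       if (n > a->size - a->used)
          return NULL;               /* the sandpit has brick walls     */
       p = a->base + a->used;
       a->used += n;
       return p;
    }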
One advantage of using contiguous memory is that it absolutely nails
down the amount of memory the algorithm is using. If one is allocating
bits and pieces of various types all over the place, it is very easy
to lose track of how much memory is actually being used (e.g. hidden
bytes for alignment, hidden heap-management pointers). By giving the
programmer a sandpit with brick walls, there are no excuses. This
makes comparing the memory consumption of algorithms easier.

>It is possible instead to add an INIT action which would let the
>compression algorithm allocate the memory in an optimal fashion and
>possibly return a failure boolean (or an Ada exception). Of course you
>also need a CLOSE operation to deallocate this memory.

One problem here is that many algorithms allocate heap items
incrementally as they build their data model. It seems silly for an
algorithm that is told that it can have three megabytes to make 100000
allocation calls, only to find out later that it only has to process
1K of data! If the standard has to be changed so that the algorithm
allocates its own memory, then I am in favour of allowing it to do so
incrementally, but of ensuring that it does not use more than it has
been allowed. The main argument against this is the possibility that
the algorithm could fail midstream.

>Another objection to the MEMORY parameter as proposed is that it is by
>definition typeless. Even specific implementations in a strongly typed
>programming language would have to use a general type which is not
>suited to the algorithm. So the algorithm is forced to use type
>conversions (casts in C, unchecked conversions in Ada) which are
>generally not portable. For example, some implementations of Ada store
>an array descriptor before the array data or in a separate location.
>(Good implementations avoid the descriptor when the bounds are known
>at compile time, but this is not required by the language.) Such
>descriptors must be constructed by the compiler. The implementer of
>the compression algorithm has no way to magically transform a raw
>sequence of bytes into a properly structured array together with its
>descriptor.
>
>If you let the algorithm allocate the data, then the standard memory
>allocation routine provided by the language can be used. The resulting
>pointer(s) (access values in Ada terminology) are opaque objects which
>can be stored in the MEMORY parameter. There is also at this point a
>necessary type conversion, but it is much less troublesome. The chances
>of getting back a valid pointer after the reverse conversion are much
>higher. (Note that an Ada access value may be quite different from a
>machine address on some implementations.) In languages supporting
>opaque types such as Ada and C++ it would be preferable to get rid of
>all the unsafe type conversions completely and use a different type
>for the MEMORY parameter of each compression algorithm. But again this
>requires the compression algorithm to export a primitive to allocate
>this opaque object, since the caller of the compression algorithm no
>longer knows how to allocate it.
>
>In short, I suggest that to avoid problems with segmented architectures
>and/or strongly typed languages, the memory used by a compression
>algorithm be allocated by the algorithm itself. The IDENTITY action
>would still determine the maximum amount of memory that the algorithm
>is allowed to allocate.
>
>Jean-loup Gailly

I'm convinced. Although there are lots of advantages in handing over a
block of memory to algorithms, I agree that it is too unpleasant in
civilized programming languages. I think I made this error because I
had C on the brain.

When I was writing the standard I had some vague idea that conforming
procedures in high-level languages would somehow be able to turn the
memory given to them into something strongly typed (e.g. hand the
memory over to be managed by a strongly typed heap manager). I assumed
that those writing the child standards for each language would sort
out these difficulties. However, I now realize that this was a mistake
and that this is a reasonably deep problem that needs to be resolved.

I agree with your analysis and propose the following:

1. The identity record specifies the maximum amount of memory to be
   used.

2. This memory may be allocated from the heap only. (However, the
   procedure can still use up to 1024 bytes of other memory.)

3. The fresh flag will be eliminated and two more actions added:

      a. Start new context block - Initialize data structures.
      b. End new context block   - Destroy data structures.

4. The memory parameter will become an INOUT generic pointer in which
   the algorithm should place a reference to the root of its data
   structure during an init call.

5. The procedure will convey a status at the end of all calls. This
   can be done with exceptions in Ada, with an extra OUT parameter, or
   in C as the value of the function. Error conditions will be:

      memory_allocation_error
      bad_input_block
      bad_data_structures

   and anything else I can think of when I revise the standard.

I think that it is a shame to move away from the simplicity of the
block-of-memory parameter. However, I agree that it is necessary.
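To make the shape of this concrete, here is a rough C rendering of the
interface the proposal implies. All names are illustrative only
(nothing here is from the standard itself), and the input/output block
parameters are elided:

    /* Illustrative sketch only - not part of the standard. */

    typedef enum {
       IDENTITY,          /* report name, memory bound, etc. */
       START_CONTEXT,     /* initialize data structures      */
       PROCESS_BLOCK,     /* compress/decompress one block   */
       END_CONTEXT        /* destroy data structures         */
    } action;

    typedef enum {
       OK,
       MEMORY_ALLOCATION_ERROR,
       BAD_INPUT_BLOCK,
       BAD_DATA_STRUCTURES
    } status;

    /* In C the status is the value of the function (point 5). The
       INOUT generic pointer of point 4 is "memory": the procedure
       stores the root of its heap-allocated data structure there
       during START_CONTEXT and reads it back on later calls. */

    status conforming_procedure(action a, void **memory
                                /* , block parameters elided */);

In Ada the status would instead be conveyed by an exception or an
extra OUT parameter, as noted in point 5.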
Jean: Thanks for your comments. Do you agree with the proposal as
given above?

Ross Williams
ross@spam.ua.oz.au

PS: Please don't think I'm a typeless troglodyte. My SAKDC algorithm
is written in Ada and is full of rock-solid, highly typed abstractions
using great big heap-based data structures. It's just that my brain
has been warped by coding in C recently.
jloup@nocturne.chorus.fr (Jean-Loup Gailly) (06/22/91)
In article <866@spam.ua.oz>, ross@spam.ua.oz.au (Ross Williams) writes
in response to my earlier comments:

> Can't we just make all the PCs go away?

Agreed. All but one (mine).

> One problem here is that many algorithms allocate heap items
> incrementally as they build their data model. It seems silly for an
> algorithm that is told that it can have three megabytes to make
> 100000 allocation calls, only to find out later that it only has to
> process 1K of data! If the standard has to be changed so that the
> algorithm allocates its own memory, then I am in favour of allowing
> it to do so incrementally, but of ensuring that it does not use more
> than it has been allowed.

I also agree that incremental allocation is important for some
algorithms. But I don't think you need bullet-proof protection against
algorithms which do not follow the rules. If you cannot prove from the
source of your algorithm that the amount of required memory is bounded
by a certain value, you are in trouble anyway. Even in your original
proposal you could not guard against malicious use of the stack or
heap without checking the source code.

> I agree with your analysis and propose the following:
>
> 1. The identity record specifies the maximum amount of memory to be
>    used.
>
> 2. This memory may be allocated from the heap only. (However, the
>    procedure can still use up to 1024 bytes of other memory.)

I think you mean that the memory must not be allocated before the
"Initialize data structures" call and must be deallocated no later
than the "Destroy data structures" call. This is sufficient to
disallow static data. There is no need to make an explicit reference
to "heap".

I am not convinced that you must disallow stack usage. Many
architectures do not have small limits on the stack size, and stack
space is cheaper than heap space. Of course the part of the data
structures which must live across compression calls must be in the
heap and accessed through the MEMORY parameter, but temporary data
could be on the stack. (For example, LZW decompression requires a
stack to reverse the bytes of a code string.)

> 4. The memory parameter will become an INOUT generic pointer in
>    which the algorithm should place a reference to the root of its
>    data structure during an init call.

Please allow this pointer to have a meaningful type for strongly typed
languages. In other words, the type may be different for different
algorithms. In C terminology, allow (struct foo *) instead of
(void *).

> Jean: Thanks for your comments. Do you agree with the proposal as
> given above?

Yes. In particular it is nice to allow incremental allocation.

Jean-loup
jloup@chorus.fr

PS: My only strong disagreement is with being called Jean :-)
Hyphenated French names must not be split.

> PS: Please don't think I'm a typeless troglodyte. My SAKDC algorithm
> is written in Ada and is full of rock-solid, highly typed
> abstractions using great big heap-based data structures. It's just
> that my brain has been warped by coding in C recently.

I have myself fallen from the Ada language design team down to earth
(C++) :-) But I'd like to get your SAKDC.
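For concreteness, a minimal C sketch of the typed MEMORY parameter
suggested above (hypothetical names throughout; "foo" stands for one
particular compression algorithm):

    /* Hypothetical sketch: each algorithm exports its own state type.
       The representation of foo_state is visible only inside foo.c,
       so no unsafe conversions are needed anywhere. */

    struct foo_state;                   /* opaque incomplete type */

    typedef enum { FOO_OK, FOO_MEMORY_ALLOCATION_ERROR,
                   FOO_BAD_INPUT_BLOCK, FOO_BAD_DATA_STRUCTURES
    } foo_status;

    typedef enum { FOO_IDENTITY, FOO_START_CONTEXT,
                   FOO_PROCESS_BLOCK, FOO_END_CONTEXT
    } foo_action;

    /* Same shape as the generic procedure, but (struct foo_state **)
       replaces (void **); the compiler now checks the root pointer. */
    foo_status foo(foo_action action, struct foo_state **memory
                   /* , input/output block parameters elided */);

The caller can declare and pass a (struct foo_state *) without ever
seeing its representation, so the unsafe conversions disappear.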