daveh@marob.masa.com (Dave Hammond) (12/08/89)
Machine/OS: 286-compatible/MSDOS
Compiler: Turbo-C 1.5

One method I use to avoid overflow while filling a buffer is to set a
pointer to the address of the last element (&buffer[last]), then compare
the current buffer position to the end pointer, testing that "current"
is never greater than "end".  An example fragment might be:

some_function(char *buffer, int len)
{
	char *p = buffer;
	char *e = &buffer[len];

	while ((*p++ = getchar()) != EOF && p < e) {
	...
	}

	...
}

This method has never bitten me on Unix/Xenix/*nix systems of any
flavor, on machines ranging from 286's to 68K's to Sparcs.  Now, in an
attempt to port to MSDOS/Turbo-C, this method breaks.

The problem occurs when the address resulting from &buffer[len] exceeds
65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
becomes 99, making `while (p < e)' immediately false.

I was under the impression that (for n > 0) buffer[n] should not yield
an address lower than buffer[0].  Is the pointer comparison I am doing
non-portable, or otherwise ill-advised?

BTW, I am well aware of alternate methods of checking for buffer
overflow, so please do not suggest alternatives.  I am interested in
why *this* method fails.

Respond via e-mail if you feel this is of little net-wide interest.
Post if you feel others might benefit.  Thanks in advance.

--
Dave Hammond
daveh@marob.masa.com
c9h@psuecl.bitnet (12/08/89)
In article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave Hammond) writes:
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
>
> I was under the impression that (for n > 0) buffer[n] should not yield
> an address lower than buffer[0].  Is the pointer comparison I am doing
> non-portable, or otherwise ill-advised?

This will *only* occur if you are accessing an element outside of the
array's boundaries.

--
- Charles Martin Hannum II       "Klein bottle for sale ... inquire within."
    (That's Charles to you!)     "To life immortal!"
  c9h@psuecl.{bitnet,psu.edu}    "No noozzzz izzz netzzzsnoozzzzz..."
  cmh117@psuvm.{bitnet,psu.edu}  "Mem'ry, all alone in the moonlight ..."
gs940971@longs.LANCE.ColoState.Edu (glen sunada f84) (12/08/89)
In article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave Hammond) writes:
> Machine/OS: 286-compatible/MSDOS
> Compiler: Turbo-C 1.5
> [ stuff deleted in concern of bandwidth ]
> some_function(char *buffer, int len)
> {
>	char *p = buffer;
>	char *e = &buffer[len];
>
>	while ((*p++ = getchar()) != EOF && p < e) {
>	...
>	}
>
>	...
> }
> [ more stuff deleted ]
>
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535. For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
>
> I was under the impression that (for n > 0) buffer[n] should not yield
> an address lower than buffer[0]. Is the pointer comparison I am doing
> non-portable, or otherwise ill-advised?
>
> Thanks in advance.
>
> --
> Dave Hammond
> daveh@marob.masa.com

This question is probably one that a lot of people on the net have
gotten bitten by or will get bitten by, so I will answer over the net.

Yes, the value of the pointer to the end of the array is supposed to be
larger than the pointer to the beginning of the array.  What is causing
the problems with the comparison in the MS-DOS environment is that the
8086 family of microprocessors has a segmented architecture.  The net
result is that in comparing pointers you need to compare both the
segment and the offset to determine the placement in memory.

Because of the segmented architecture, compilers for MS-DOS machines
support many memory models.  These models are listed below:

TINY    - one segment - all pointers are a 16-bit offset
SMALL   - one segment for code and one for data - all pointers are a
          16-bit offset, but code is in a different segment than data
          (i.e. pointers to functions have a different segment than
          pointers to data) - usual default
          NOTE: the example you give should result in a stack
          corruption problem
MEDIUM  - any number of code segments, one data segment - pointers are
          non-normalized (i.e. there are many segment:offset pairs that
          resolve to the same linear address; segments are defined on
          16-byte boundaries with a size of 64K)
COMPACT - one code segment, any number of data segments -
          non-normalized pointers
LARGE   - any number of code segments, any number of data segments -
          non-normalized pointers
HUGE    - any number of code segments, any number of data segments -
          normalized pointers

There is also another difference between non-normalized and normalized
pointers.  Non-normalized pointers wrap around at 64K (65536 -> 0) and
do not adjust the segment.  Therefore, in the code fragment you posted,
when p is incremented past 64K it points to offset zero in the same
segment as the start of the array, not the segment starting 64K above
the one that holds the start of the array.  A normalized pointer,
however, has its offset automatically reset to keep the offset less
than 16; to do this the segment is adjusted.  This means that using
huge pointers in the example above should (CMA - Cover My A**) fix the
problem with wraparound.  There is a disadvantage to working with huge
pointers - they take more time because of the normalization.

I hope this quick overview of the architecture of the IBM-PC helps and
is not just a waste of network bandwidth.

BTW - this segmentation is not a problem with *NIX on a PC because *NIX
automatically uses the HUGE memory model for an 8088, 8086, 80188,
80186, and 80286, and either the HUGE or the protected-mode flat memory
model for the 80386 (unfortunately this model is not available to DOS
on an 80386 because DOS runs in real mode, not protected mode).

Glen U. Sunada
gs940971@longs.LANCE.ColoState.EDU
sdawalt@wright.EDU (Shane Dawalt) (12/10/89)
in article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave Hammond) says:
>
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
>

That same problem also haunted me: could a block be allocated such that
the block was split in two by the segment boundary?  I was inclined to
believe not.  I was correct for once.  The question was asked in the
CompuServe Borland forum (BPROGB).  Both Borland techs and forum
members assured me this cannot happen unless I attempt to access
outside of the allocated block.

  Shane
  sdawalt@cs.wright.edu
ejp@bohra.cpg.oz (Esmond Pitt) (12/11/89)
In article <257ECDFD.CDD@marob.masa.com> daveh@marob.masa.com (Dave Hammond) writes:
>
>One method I use to avoid overflow while filling a buffer is to set a
>pointer to the address of the last element (&buffer[last]), then compare
>the current buffer position to the end pointer, testing that "current"
>is never greater than "end" ...
>
>This method has never bitten me on Unix/Xenix/*nix systems of any
>flavor, on machines ranging from 286's to 68K's to Sparcs. Now, in an
>attempt to port to MSDOS/Turbo-C, this method breaks.

That's because the last element is not &buffer[last] but
&buffer[last-1], and so you should test for <= &buffer[last-1], not
< &buffer[last].  You are incrementing a pointer to point outside the
object, and this is not guaranteed to work under _any_ implementation
of C.  Your code should read:

some_function(char *buffer, int len)
{
	char *p = buffer;
	char *e = &buffer[len-1];

	while ((*p++ = getchar()) != EOF && p <= e) {
	...
	}

	...
}

--
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz
bill@twwells.com (T. William Wells) (12/11/89)
In article <232@bohra.cpg.oz> ejp@bohra.UUCP (Esmond Pitt) writes:
: That's because the last element is not &buffer[last] but &buffer[last-1],
: and so you should test for <= &buffer[last-1], not < &buffer[last].
: You are incrementing a pointer to point outside the object, and this is
: not guaranteed to work under _any_ implementation of C.
Not only is this going to work in most implementations, it is
*required* to work by the ANSI standard.
My guess is that his object is exactly 64K bytes long and he is
running into the silly segmentation problems of the 8086. There
are two fixes: use an object of no more than 64K-1 bytes or use
whatever model it is that does the right thing with objects of
size 64K and larger.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/12/89)
In article <232@bohra.cpg.oz> ejp@bohra.UUCP (Esmond Pitt) writes:
| That's because the last element is not &buffer[last] but &buffer[last-1],
| and so you should test for <= &buffer[last-1], not < &buffer[last].
| You are incrementing a pointer to point outside the object, and this is
| not guaranteed to work under _any_ implementation of C. Your code
| should read:

There seems no reason why "< &buf[last]" should fail.  ANSI C allows
you to take the address of the element past the end of the array,
although it doesn't require that you be allowed to dereference it.

--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
rob@cs.mu.oz.au (Robert Wallen) (12/12/89)
In article <232@bohra.cpg.oz> ejp@bohra.UUCP (Esmond Pitt) writes:
|In article <257ECDFD.CDD@marob.masa.com> daveh@marob.masa.com (Dave Hammond) writes:
|>
|>One method I use to avoid overflow while filling a buffer is to set a
|>pointer to the address of the last element (&buffer[last]), then compare
|>the current buffer position to the end pointer, testing that "current"
|>is never greater than "end" ...

You don't work on GNU by any chance, do you?  They seem to have this
annoying habit.  Annoying because it's hard to port to most DOS
compilers, where memory is broken up into stupid little chunks with
segment/offset pairs.

|>This method has never bitten me on Unix/Xenix/*nix systems of any
|>flavor, on machines ranging from 286's to 68K's to Sparcs. Now, in an
|>attempt to port to MSDOS/Turbo-C, this method breaks.

Although K&R sez that you should be able to do p < q where p and q
point to members of the same array, Turbo C hangs an extra requirement
on you: that the pointers are normalized as well.  Either use the HUGE
model or normalize the pointers before comparison.  This has made my
porting of gnu-egrep real tedious, to the point of not bothering...

Personally, I favor wasting a whole integer and keeping track of the
number of elements that I have used already; integer comparison is not
only guaranteed to work, it should be faster given
sizeof(int) < sizeof(int *).

Now a beef.  Why check for (p <= pend)?  Why not just (p == pend)?
After all, it's not as though your code might accidentally skip over
the last entry!!
chris@mimsy.umd.edu (Chris Torek) (12/13/89)
In article <232@bohra.cpg.oz> ejp@bohra.cpg.oz (Esmond Pitt) writes:
>... you should test for <= &buffer[last-1], not < &buffer[last].

It makes no difference:

>	char *p = buffer;
>	char *e = &buffer[len-1];
>
>	while ((*p++ = getchar()) != EOF && p <= e) {

(For the moment, let us ignore the fact that *p==EOF might well never
be true.)

Chances are that if the loop

	for (p = base; p <= &base[SIZE - 1]; p++)

works, the loop

	for (p = base; p < &base[SIZE]; p++)

will also work.  In particular, the latter loop tends to fail when
base+SIZE `wraps around', as it were, to 0.  That is, &base[SIZE] is
very small, and hence (p < &base[SIZE]) is always false.

Suppose this is the case.  Now let p be &buf[SIZE - 1], and assume we
are working on the former loop (p <= &base[SIZE - 1]).  Then, if
arithmetic works `as expected', p++ will cause p to wrap around to
zero, and the next iteration will test a very small `p' against a
large address and the loop will continue.  That is, the loop will run
forever, rather than not at all.

The only time something useful happens is when &base[SIZE] is computed
in a different way than (&base[SIZE - 1]) + 1.  Any system that does
this is not compiling C code correctly.

The situation is almost exactly the same as the case of

	for i := 0 to maxint do ... od;

which must be coded in C as

	for (i = 0;; i++) {
		...
		if (i == INT_MAX)
			break;
	}

(or any equivalent).  The variable i must not be incremented when it
is INT_MAX, as the result can either be overflow or INT_MIN (e.g.,
from 32767 to -32768, or from 2147483647 to -2147483648).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
bright@Data-IO.COM (Walter Bright) (12/14/89)
In article <2908@murtoa.cs.mu.oz.au> rob@murtoa.UUCP (Robert Wallen) writes:
<Now a beef. Why check for (p <= pend)? Why not just ( p == pend) ?
<After all,
<its not as though your code might accidentally skip over the last entry !!
(p <= pend) generates more efficient code (on PC's) than (p == pend), because
on the former only the offset portions of the pointer need to be compared,
while on the latter both the segment and the offset must be compared.
karl@haddock.ima.isc.com (Karl Heuer) (12/14/89)
In article <2908@murtoa.cs.mu.oz.au> rob@murtoa.UUCP (Robert Wallen) writes:
>Although K&R sez that you should be able to do p < q where p and q point to
>members of the same array, Turbo C hangs an extra requirement on you. That
>is, that the pointers are normalized as well.

If the pointers point into the same array, which is smaller than 64K,
why shouldn't they already be normalized?

>Now a beef. Why check for (p <= pend)? Why not just (p == pend) ?

I believe this one is in The Elements of Programming Style.  To
roughly paraphrase from memory (it's been a decade since I read it):
when either of two comparisons is correct (because the case where they
differ "can't happen"), it's considered better to use the one that
would stop the loop in the event that the "impossible" happens.

Also, in a segmented implementation (p == q) is generally slower.  For
equality compares, the compiler has to ensure that the pointers are
normalized, but with non-huge use of (p < q) it can assume that the
pointers will already be in the same segment.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
peter@ficc.uu.net (Peter da Silva) (12/14/89)
> <Now a beef. Why check for (p <= pend)? Why not just ( p == pend) ?

In article <2241@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
> (p <= pend) generates more efficient code (on PC's) than (p == pend), because
> on the former only the offset portions of the pointer need to be compared,
> while on the latter both the segment and the offset must be compared.

Implementation-dependent micro-optimisation isn't a good reason.  A
good one is that `p <= pend' is a broader termination condition, and
thus safer.
--
`-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
 'U`  Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.
"Scientific progress is predicated on name-calling and intolerance,
not to mention asking lots of stupid questions."  -- Mark S Madsen
fogel@dvinci.usask.ca (Earl Fogel) (12/15/89)
daveh@marob.masa.com (Dave Hammond) writes:
>Machine/OS: 286-compatible/MSDOS
>Compiler: Turbo-C 1.5
>
>The problem occurs when the address resulting from &buffer[len] exceeds
>65535. For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
>becomes 99, making `while (p < e)' immediately false.
>
>I was under the impression that (for n > 0) buffer[n] should not yield
>an address lower than buffer[0]. Is the pointer comparison I am doing
>non-portable, or otherwise ill-advised?
>

The problem, I believe, is in the memory model you are using in Turbo
C.  If your data does not all fit into one 64K (65536-byte) segment,
then you have to declare pointers to be 'huge' for them to properly
address your buffer.