keie@cs.vu.nl (Ed Keizer) (11/29/89)
This article is posted as a reaction to a debate in comp.os.minix about the definitions of the limits of the values of the integral types. This concerns the definitions of UCHAR_MAX, USHRT_MAX, UINT_MAX, ULONG_MAX, LONG_MIN and LONG_MAX.

While writing this article I had to change my opinion on the meaning of the proposed ANSI C standard in this regard. My personal opinion was, and still is, that these constants should be specified as integers, in the mathematical sense of integer. I always thought this was reflected in the Standard. It is not.

1. THE WAY IT IS

The relevant part of the standard is section 2.2.4.2:

    The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integral promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

The intention of this section is that these constants behave, in expressions where `int's are allowed, as would objects of the corresponding type. In expressions the value of such objects becomes `int' or `unsigned int', or stays `long' or `unsigned long'. The relevant text on these conversions, called the `integral promotions', is in section 3.2:

    A char, a short int, or an int bit-field, or their signed or unsigned varieties, or an object that has enumeration type, may be used in an expression wherever an int or unsigned int may be used. If an int can represent all values of the original type, the value is converted to an int; otherwise it is converted to an unsigned int. These are called the integral promotions.

In other words:

    If UCHAR_MAX <= INT_MAX then UCHAR_MAX has type `int',
    else UCHAR_MAX has type `unsigned int'.
    If USHRT_MAX <= INT_MAX then USHRT_MAX has type `int',
    else USHRT_MAX has type `unsigned int'.
    UINT_MAX has type `unsigned int'.
    ULONG_MAX has type `unsigned long'.

If the type `char' behaves as `unsigned char' in an implementation, then that implementation should follow the same rules for CHAR_MAX as for UCHAR_MAX.

The `unsigned'ness of UINT_MAX and possibly other constants can be reached in two ways:

    1- using the U or u suffix, as in
           #define UINT_MAX 65535U
    2- using octal or hexadecimal constants, as in
           #define UINT_MAX 0xffff

As an aside I would like to remark that section 2.2.4.2 specifies that the constant LONG_MIN should be replaced by something that has type `long'. The following definition of LONG_MIN on a 32-bit two's complement machine with sizeof(int)==sizeof(long) is incorrect:

    #define LONG_MIN (-2147483648)

because `2147483648' does not fit in an `int' or a `long' and will therefore be interpreted as an `unsigned long' according to section 3.1.3.2. The correct definition is :-)

    #define LONG_MIN (-2147483647-1)

The same kind of definition should be used for INT_MIN.

2. SOME ARGUMENTS ABOUT POSSIBLE CHOICES

The basic choice open to the C standard committee was:

    A- do we define these limits as integer constants, with integer as close to the mathematical meaning as possible, or
    B- do we define these constants as representatives of their type.

Both choices cause problems in expressions:

A- In function calls where actual parameters are not converted to the type of the corresponding formal parameter, as in functions without prototypes and functions declared with the ellipsis notation. (And K&R C.) Example:

    insert_uint(x)
    unsigned int x ;

    insert_uint(UINT_MAX) ;

Under choice A the function call will pass a long whenever the value of UINT_MAX does not fit in an int. These problems can be caught with programs like `lint'.

B- In implicit conversions in expressions. An example on a machine where sizeof(int)==sizeof(short):

    int x ;
    x= -1 ;
    if ( x<USHRT_MAX/2 ) .....

Most users would expect that the controlling expression would evaluate to true. Alas, it does not. The type of `USHRT_MAX', and thus of `USHRT_MAX/2', is `unsigned int'. Thus, x will be converted to an `unsigned int' by adding UINT_MAX+1, which yields UINT_MAX.
The controlling expression will reduce to UINT_MAX<UINT_MAX/2 (the value -1 converts to UINT_MAX), which is false.

It is for the very same reason that the standard fixes the types of the constants in 2.2.4.2. An implementor adding a `U' to the definition of INT_MAX would cause problems in situations similar to the example above:

    int x ;
    x= -1 ;
    if ( x<INT_MAX/2 ) .....

It is not possible to fully enforce choice B. A good example is the definition of UCHAR_MAX. According to choice B one would like that to have type `unsigned char'.

    1- One cannot use `255U', because this is a constant with type `unsigned int' according to the typing rules for integer constants from paragraph 3.1.3.2, given below.
    2- Neither can one use `(unsigned char) 255', because this violates the requirement in paragraph 2.2.4.2 that the expression must be suitable for the #if preprocessing directive. `unsigned' is just another identifier for the preprocessor. It will replace
           #if UCHAR_MAX>0
       by
           #if (0 0) 255>0
       which is definitely not a valid expression. Paragraph 3.8.1 expressly forbids casts in controlling expressions for conditional inclusion.

Consequences of choices:

    A-1 Causes `unexpected' behavior when the limits are used as arguments in certain function calls.
    A-2 The limits can be compared with any value using the relational operators <, <=, >, >=, and used with the bitwise shift operators << and >>, without `unexpected' behavior.
    B-1 The limits can be used everywhere as representatives of their corresponding types.
    B-2 Using the relational operators and the bitwise shift operators might give `unexpected' results.

Opinion:

Either choice causes problems: choice A when passing the values as parameters, choice B in expressions. The problems caused by choice B are much more unexpected and much harder to find. Besides, in my view the limits are there to indicate which interval of the `mathematical' integers is allowed for a certain type. Forcing these constants into an iron cast of types will lead to problems.
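
To make the choice B surprise concrete, here is a minimal sketch in C. It uses UINT_MAX instead of USHRT_MAX, because UINT_MAX has type `unsigned int' on every implementation, so the effect does not depend on sizeof(short). The function names are mine and purely illustrative; they do not come from the standard or from any header.

```c
#include <assert.h>
#include <limits.h>

/* Naive test. UINT_MAX / 2 has type unsigned int, so x is implicitly
   converted to unsigned int before the comparison takes place. */
int naive_below_half(int x)
{
    return x < UINT_MAX / 2;
}

/* Hypothetical helper that compares as `mathematical' integers:
   every negative x is below the (non-negative) limit. */
int math_below_half(int x)
{
    return x < 0 || (unsigned int)x < UINT_MAX / 2;
}

/* Runs the case discussed in the text; aborts if either check fails. */
void check_below_half(void)
{
    assert(naive_below_half(-1) == 0);  /* -1 became UINT_MAX */
    assert(math_below_half(-1) == 1);
}
```

With x equal to -1 the naive version yields 0, because -1 is first converted to UINT_MAX. The second version yields 1, which is what the `mathematical' reading of the limit would suggest.
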
As indicated above: the C standard committee X3J11 decided to standardize a version of choice B.

Last remark:

The rationale does not seem to be in accordance with the draft standard in this respect. Page 18 states that:

    The limits for the maxima and minima of unsigned types are specified as unsigned constants (e.g., 65535u) to avoid surprising widenings of expressions involving these extrema.

Ed Keizer
Vrije Universiteit
Amsterdam

Member of ISO/IEC JTC1/SC22/WG14-C
The opinions stated here are mine and not those of WG14.

Excerpt from the pANS C Standard

3.1.3.2

    The type of an integer constant is the first of the corresponding list in which its value can be represented.

        Unsuffixed decimal: int, long int, unsigned long int;
        unsuffixed octal or hexadecimal: int, unsigned int, long int, unsigned long int;
        suffixed by the letter u or U: unsigned int, unsigned long int;
        suffixed by the letter l or L: long int, unsigned long int;
        suffixed by both the letters u or U and l or L: unsigned long int.
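
The typing rules of 3.1.3.2 are the reason the LONG_MIN definition has to be written with a trailing `-1'. As a sketch, and assuming a two's complement machine, the same idea can be expressed in terms of LONG_MAX without spelling out the digits. SKETCH_LONG_MIN and the function names are hypothetical; they are not part of any header.

```c
#include <assert.h>
#include <limits.h>

/* Every operand fits in a long, so the whole expression has type long.
   It contains no cast, which keeps it usable in #if directives. */
#define SKETCH_LONG_MIN (-LONG_MAX - 1)

#if SKETCH_LONG_MIN >= 0
#error "SKETCH_LONG_MIN should be negative"
#endif

/* Returns 1 when the sketch agrees with the implementation's own
   LONG_MIN, which is the case on two's complement machines. */
int sketch_matches_long_min(void)
{
    return SKETCH_LONG_MIN == LONG_MIN;
}

/* Returns 1 when the sketch is exactly one below -LONG_MAX.
   SKETCH_LONG_MIN + 1 equals -LONG_MAX, so the negation cannot overflow. */
int sketch_is_one_below_neg_max(void)
{
    return -(SKETCH_LONG_MIN + 1) == LONG_MAX;
}

/* Aborts if the arithmetic identity above does not hold. */
void check_sketch_long_min(void)
{
    assert(sketch_is_one_below_neg_max());
}
```

Writing the most negative value as a single decimal constant with a minus sign in front would instead type the constant before the negation is applied, which is exactly the trap the list above sets.
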