dant@tekla.TEK.COM (Dan Tilque;1893;92-789;LP=A;60aC) (01/09/88)
A question occurred to me about hex escapes. Where does the padding nybble go when an odd number of hex digits are in the escaped string? For example:

    "\x1A2B3 example"

The escaped constant has 5 hex digits, which fit into 2.5 bytes. Some byte has to be padded with (I assume) null bits. Which byte is it: the initial or the trailing? Does the proposed standard say?

This was not needed with octal escapes, since they always fit into one byte. Perhaps the standard should be changed to require an even number of hex digits in an escaped string.

--- Dan Tilque
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/11/88)
In article <2938@zeus.TEK.COM> dant@tekla (Dan Tilque) writes:
>A question occurred to me about hex escapes. Where does the padding nybble
>go when an odd number of hex digits are in the escaped string? For example:
> "\x1A2B3 example"
>The escaped constant has 5 hex digits which fit into 2.5 bytes.

No! First, it is not specified how big a char (byte) must be, except that it must be AT LEAST 8 bits. It could be much larger, although 16 bits is the largest I expect to find in any C implementation in the near future.

Next, the escape \x1A2B3 represents a SINGLE char, not a sequence of chars. If the number is too large to fit in a single char, as it would be for chars through 16 bits in size, then how it is interpreted (still as a SINGLE char) is implementation-defined. Generally, excess high-order bits are discarded, although that is not required.

The corresponding character literal '\x1A2B3' is, as always, an int (NOT a char, no matter how small the value). Again, if the value doesn't fit within a SINGLE char, the interpretation is up to the implementation. Generally, the overflow bits are used to assemble additional char subfields within the int, but again that is not required. Portable C programming requires that one not use such over-long character literals.

Note that long hex escapes are intended for non-portable usage, primarily in multi-byte character set environments, although they are useful on unusual architectures having chars > 8 bits.
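The greedy behaviour described above can be sketched as a small parser. This is only an illustration, not what any particular compiler does: `parse_hex_escape` is a hypothetical helper, and its choice to discard excess high-order bits is an assumption, since the post says that behaviour is common but not required.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parse the hex digits that follow "\x" in src.
 * It consumes EVERY consecutive hex digit (the escape has no length
 * limit), stores the number of digits consumed in *ndigits, and returns
 * the value reduced to the width of unsigned char.
 * ASSUMPTION: excess high-order bits are discarded. The draft leaves
 * the out-of-range case implementation-defined. */
unsigned char parse_hex_escape(const char *src, size_t *ndigits)
{
    unsigned long value = 0;
    size_t n = 0;

    while (isxdigit((unsigned char)src[n])) {
        int c = tolower((unsigned char)src[n]);
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;

        /* Unsigned shift: bits pushed off the top are simply lost. */
        value = (value << 4) | (unsigned long)digit;
        n++;
    }

    *ndigits = n;
    return (unsigned char)(value & UCHAR_MAX);
}
```

Under these assumptions, and with 8-bit chars, the input `1A2B3 example` consumes all five digits and yields the single value 0xB3. No padding nybble is involved because only one char is produced.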
ray@micomvax.UUCP (Ray Dunn) (01/19/88)
In article <7021@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>
>Note that long hex escapes are intended for non-portable usage,
>primarily in multi-byte character set environments, although they
>are useful on unusual architectures having chars > 8 bits.
>

I was going to try to make this response witty, but it's too late in the day..

This, Doug, seems to me either idiocy or arrogance; someone please tell me which. Or is the above statement included in the Semantics section of the description of hex escapes in string constants, so I can ensure I'm using them as the committee intended?

The unfortunate thing is, string constants are not just used for messages etc., but as arrays of 8-bit data. These often contain hex constants. These hex constants are often followed by hex digits.

So, not only do we have a major problem of breaking existing code, but also another example of the verbosity being added to C (having to concatenate strings to avoid the hex problem).

We can write

    "ABC\x12H...

but must remember to write

    "ABC\x12""F...

Great! What is the expression about being committee'd to death?

Ray Dunn. ..philabs!micomvax!ray
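The two spellings Ray contrasts can be checked directly. This is a minimal sketch, assuming a compiler that converts escape sequences before it joins adjacent string literals (the order ISO C later specified). The array names are made up for illustration.

```c
/* 'H' is not a hex digit, so the escape ends after "12" by itself. */
const char followed_by_h[] = "ABC\x12H";

/* 'F' IS a hex digit. Written as one literal, "ABC\x12F" would be read
 * as the single escape \x12F, which does not fit an 8-bit char.
 * Splitting the literal ends the escape at "12"; the pieces are joined
 * only after each escape has been converted. */
const char followed_by_f[] = "ABC\x12" "F";
```

Both arrays hold six chars including the terminating NUL: 'A', 'B', 'C', the value 0x12, then 'H' or 'F'. That is the 8-bit data layout Ray wants, at the cost of the extra pair of quotes.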
rbutterworth@watmath.waterloo.edu (Ray Butterworth) (02/02/88)
I too think that allowing arbitrarily long hex strings is not a good idea, though not for the same reason.

Something that I've always felt should be in the standard printf functions is allowing the "#" qualifier for %c and %s formats. "%#c" would print the character as an appropriate escape sequence if it wasn't a printing character (hex vs. octal would be implementation-dependent), e.g. \n, \f, \001, \xff.

This would make error messages a lot easier to read. Typically they now say either "Illegal character '%c'\n" or "Illegal character %#o\n". The first is very ugly and unreadable for non-printing characters; the second is unreadable for printing characters if you haven't memorized the ASCII codes for '*', '}', etc. Using "%#c" would give the correct version every time.

Similarly, "%#s" would perform this same expansion on strings.

    printf("%#s\n", "one\ntwo");

would produce a single 8-character line of output: "one\ntwo".

If ANSI allows arbitrarily long hex sequences, this scheme cannot be implemented, since there is no way to indicate the end of a hex sequence, and so "%#s" output could be ambiguous. The previously mentioned suggestion of a null-escape, e.g. "\z", that did nothing, would solve the problem.
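Since "%#s" is only a proposal and no such printf flag exists, the idea can be approximated with an ordinary function. The sketch below is hypothetical: `escape_string` and its return convention are invented for illustration, and it deliberately picks three-digit octal for non-printing characters. That choice sidesteps the ambiguity described above, because an octal escape stops after at most three digits while a hex escape has no such limit.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the proposed "%#s": writes an escaped copy
 * of src into dst (capacity dstsize). Returns the number of characters
 * written, not counting the NUL, or -1 if dst is too small. */
int escape_string(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    if (dstsize == 0)
        return -1;
    dst[0] = '\0';

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char piece[5];              /* longest piece: "\ooo" plus NUL */

        switch (c) {
        case '\n': strcpy(piece, "\\n");  break;
        case '\t': strcpy(piece, "\\t");  break;
        case '\f': strcpy(piece, "\\f");  break;
        case '\\': strcpy(piece, "\\\\"); break;
        default:
            if (isprint(c)) {
                piece[0] = (char)c;
                piece[1] = '\0';
            } else {
                /* ASSUMPTION: always exactly three octal digits, so a
                 * following digit can never be absorbed into the escape.
                 * The mask keeps the value within three digits. */
                snprintf(piece, sizeof piece, "\\%03o", (unsigned)(c & 0377));
            }
            break;
        }

        if (used + strlen(piece) + 1 > dstsize)
            return -1;
        strcpy(dst + used, piece);
        used += strlen(piece);
    }
    return (int)used;
}
```

Called on "one\ntwo", this writes the eight characters one\ntwo into the buffer, matching the example in the post. A hex-based variant would need some terminator, such as the suggested "\z", whenever the next character happened to be a hex digit.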