TLIMONCE%DREW.BITNET@CUNYVM.CUNY.EDU (03/13/88)
(This discussion is looking for a good pun... like "lawn chairs"? No... I said GOOD pun.)

The "short char vs char" problem can't be solved very easily. Why not a "long char"? That wouldn't break much code now, would it? Now I'm not demanding that it go into v1.0 of the standard, but maybe we can look at this for the next "congress".

For now, if you want to make some progress, try to get one of the biggies (like MS) to add it as an extension. You can tell them that they'll hit the "multi-nation/multi-language vendor market" with it.

Of course, in my programming I don't have a use for it, but if you do, try

	typedef short LONG_CHAR;
or
	typedef char LONG_CHAR[2];

(Hmmm... I like the former) and then you can implement an lstrcmp() and an lstrcpy() and an assortment of routines like that. Then when you're done, those can be re-used in all your programs. When it gets suggested to ANSI C II (or whatever it'll be called) you'll be there to warn us about implementation difficulties and ideas. And when it gets passed you can do a search-and-replace from "LONG_CHAR" to "long char".

"And there was much rejoicing" -- Monty Python

Tom Limoncelli | Drew U/Box 1060/Madison NJ 07940 | tlimonce@drew.BITNET
Disclaimer: These are my views, not my employer's or Drew University's.
--------------------------------------------------------------------------
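[Tom's suggested lstrcmp()/lstrcpy() can be sketched in a few lines. This is a minimal, hypothetical version, assuming the "typedef short" representation and 0-terminated strings of long characters; none of it is from an actual vendor library.]

```c
typedef unsigned short lchar;  /* the suggested 16-bit "long char"; the width is an assumption */

/* Compare two 0-terminated lchar strings, strcmp-style. */
int lstrcmp(const lchar *a, const lchar *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (int)*a - (int)*b;
}

/* Copy a 0-terminated lchar string, including the terminator. */
lchar *lstrcpy(lchar *dst, const lchar *src)
{
    lchar *d = dst;
    while ((*d++ = *src++) != 0)
        ;
    return dst;
}
```

[As Tom says, once routines like these exist, the search-and-replace to a real "long char" type is mechanical.]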
gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/13/88)
In article <12341@brl-adm.ARPA> TLIMONCE%DREW.BITNET@CUNYVM.CUNY.EDU writes:
>The "short char vs char" problem can't be solved very easily. Why not a
>"long char".

This was basically what the Japanese originally requested. The main drawback is that any code that handles text characters (i.e. most applications!) would have to be changed to use long chars in order to work in an international environment, and there would have to be long-char versions of the usual string-handling functions. The short-char proposal does not suffer from this drawback, because a char is already the right size to hold a text unit. Its only problem is that a fair amount of code has been written dependent on the assumption that sizeof(char)==1, although some programmers have been careful not to assume that all along.
kennedy@tolerant.UUCP (Bill Kennedy) (03/15/88)
In article <12341@brl-adm.ARPA> TLIMONCE%DREW.BITNET@CUNYVM.CUNY.EDU writes:
>[ pun reference omitted ]
>
>The "short char vs char" problem can't be solved very easily. Why not a
>"long char". That wouldn't break much code now, would it? Now I'm not
>demanding that it goes into v1.0 of the standard but maybe we can look at
>this for the next "congress".

There are already specifications for it; AT&T has one, and I think I read something from HP about it as well. It can be solved rather easily, and it need not break much code if the code is well written. The same old dragon that breathed up the pointer/int thing just rears its ugly head again for characters.

>For now, if you want to make some progress, try to get one of the biggies
>(like MS) to add it as an extension. You can tell them that they'll hit
>on the "multi-nation/multi-language vendor market" with it.

I disagree. I am using long characters for a specific purpose, and adding the baggage to domestic computing wouldn't serve any useful purpose. I don't think that you will get a software vendor to weave it in if it costs performance at compile or run time (which they do, both...). The hardware manufacturers will implement it themselves if they want to penetrate farther into the overseas markets. Remember, it's not just a world of 7 or 15 bit characters; variations on the Roman alphabet are handled, e.g. by Europeans, with the eighth bit (which has its own problems too, not pertinent here). I don't think that you will get any momentum at all from software houses, but I have first-hand knowledge :-) that the computer manufacturers get pretty interested.

>Of course, in my programming I don't have a use for it, but if you do, try
>
>typedef short LONG_CHAR;
>or
>typedef char LONG_CHAR[2];
>(Hmmm... I like the former)

No offense intended, but I wholeheartedly agree with "don't have a use..." and I would suggest it read "haven't had any experience with...". I'm also not scolding you; I work with the things every day, and there are some very real traps. If you just make it a typedef you'll get your storage sizes right (for the most part), but you can't manipulate either of your examples very well. I use lchar because it's easier to type than LONG_CHAR. You need a further refinement so that you can look at each byte and the bits within each byte; I use a structure and a union within that.

>and then you can implement a lstrcmp() and a lstrcpy() and an assortment
>of routines like that. Then when you're done, those can be re-used in all
>your programs.

You also need routines to convert into and out of strings containing long characters, and some way to insulate yourself from cases and while(c) things that make assumptions about character size and content.

To qualify the long character structure/union approach: vi, the shell, and I'm sure other programs use the MSbit of a character for their own purposes. Many Asian terminals set the MSbit of a byte as a flag that another byte is coming with the rest of the character. In some European countries it's quite normal for the MSbit to be set for a special character native to their alphabet but absent from ASCII. So here you see but three uses of the MSbit that are darned near mutually exclusive and require further inspection of the byte stream.

>When it get's suggested to ANSI C II (or whatever it'll be called) you'll
>be there to warn us about implementation difficulties and ideas. And when
>it gets passed you can do a search-and-replace from "LONG_CHAR" to "long
>char"

I'm not convinced that it belongs in the language specification, because it is so implementation specific. In fact I'm not sure that it even needs to exist for hardware destined for a technical audience. Those professionals have learned to read ASCII like some of us did APL :-) When you start to bring in commercial applications, where you want to drive down the level of skill required to operate a program, that's where you need the additional capability/overhead.

You made a good start and now I have overkilled it for you... These are my opinions and observations; Tolerant is nice enough to let me use their equipment, so don't blame me on them.

Bill Kennedy  {rutgers,cbosgd,killer}!ssbn!bill  or  bill@ssbn.WLK.COM
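[Bill doesn't show his structure/union, but the idea he describes can be sketched as below. The field names are invented for illustration, and the byte order is of course machine-dependent.]

```c
/* A sketch (hypothetical names) of a union that lets you view a 16-bit
 * character either as one value or byte by byte, so you can inspect the
 * MSbit flags Bill describes.  Byte order is machine-dependent. */
typedef union {
    unsigned short whole;    /* the character as a single 16-bit value */
    unsigned char  byte[2];  /* the same storage, one byte at a time   */
} lchar;

/* Test the most significant bit of a byte -- the bit that vi, Asian
 * terminals, and European 8-bit alphabets all overload differently. */
#define MSBIT(b) (((b) & 0x80) != 0)
```

[With this, a byte-stream scanner can check MSBIT() on each byte before deciding whether another byte belongs to the same character.]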
karl@haddock.ISC.COM (Karl Heuer) (03/15/88)
In article <12341@brl-adm.ARPA> TLIMONCE%DREW.BITNET@CUNYVM.CUNY.EDU writes:
>[Until "long char" gets added, probably in the 2nd standard, try]
>typedef short LONG_CHAR; [and add a bunch of library functions].

Something equivalent is in fact in the current standard; they called it wchar_t (wide character type). I've only just gotten hold of a dpANS recent enough to include this, and haven't finished reading it, but my impression is that only the type, the corresponding constants (L'x' for wchar_t, L"x" for wchar_t[]), and a few utility functions are being added for this standard.

The real problem is that "char" has been inappropriately overloaded. We need to distinguish between a text character (wchar_t or long char), a small integer (short short int), and a quantum of memory (byte_t or short char). Ideally, all three of these should have names other than "char", and the type "char" should be deprecated. Unfortunately, there's so much inertia to overcome that this will probably never be fixed in C. Fix it in "D"...

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
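[For reference, the wide-character pieces Karl describes look like this in use; wchar_t itself comes from <stddef.h> in the dpANS.]

```c
#include <stddef.h>  /* the dpANS defines wchar_t here */

/* L'x' is a wide-character constant of type wchar_t;
 * L"x" is a string of wchar_t, terminated by a wide 0. */
wchar_t wc = L'A';
const wchar_t ws[] = L"AB";
```

[The utility functions Karl mentions, for converting between multibyte strings and wchar_t strings, are the "few" additions he refers to; the rest of the library still traffics in plain char.]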
jay@splut.UUCP (Jay Maynard) (03/22/88)
From article <7447@brl-smoke.ARPA>, by gwyn@brl-smoke.ARPA (Doug Gwyn ):
> [...] Its only problem is that a fair amount of code has been
> written dependent on the assumption that sizeof(char)==1, although
> some programmers have been careful not to assume that all along.

My initial knee-jerk reaction to this was "Hey, wait a minute! sizeof(char) is defined to be 1!" Before I made a fool of myself on the net (wouldn't be the first time...:-), though, I picked up the copy of K&R that my recent desk-cleaning revealed. Sure enough, section 7.2, at the top of page 188 in my copy, says:

"A _byte_ is undefined in the language except in terms of the value of sizeof. However, IN ALL EXISTING IMPLEMENTATIONS a byte is the space required to hold a char." (Emphasis added.)

I don't know how much existing code this would break (though I'd bet there would be quite a bit of it). It does mean that I, too, will be careful not to make that assumption...
--
Jay Maynard, EMT-P, K5ZC...>splut!< | GEnie: JAYMAYNARD  CI$: 71036,1603
uucp: {uunet!nuchat,academ!uhnix1,{ihnp4,bellcore,killer}!tness1}!splut!jay
Never ascribe to malice that which can adequately be explained by stupidity.
The opinions herein are shared by none of my cats, much less anyone else.
flaps@dgp.toronto.edu (Alan J Rosenthal) (03/25/88)
*sigh*... it took me a full year from the start of my C career to decide finally that sizeof(char) really was guaranteed to be 1, due to the constraint that all objects are made up of chars (i.e. a char * can traverse any object), recently formalized by ANSI and previously informally established by the existence of memcpy() / bcopy() and friends.

Why do you need to make sizeof(char) == 2 just to make chars 16 bits? Make chars 16 bits, keep sizeof(char) == 1, also make sizeof(int) == 1 and sizeof(long) == 2, etc. If ANSI requires plain char to be signed in all implementations in which sizeof(char) == sizeof(int), we're all set.

ajr
--
If you had eternal life, would you be able to say all the integers?
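[The constraint Alan leans on (a char * can traverse any object) is what makes a byte-wise copy of an arbitrary object legal; a minimal sketch, with an invented function name:]

```c
#include <stddef.h>

/* Copy any object byte by byte through char pointers.  This is legal
 * precisely because every object is made up of chars -- the same
 * guarantee that forces sizeof(char) == 1. */
void copybytes(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    while (n-- > 0)
        *d++ = *s++;
}
```

[This is essentially what memcpy() does, which is why its mere existence already implied the guarantee informally.]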
gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/26/88)
In article <8803250401.AA01184@champlain.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>Why do you need to make sizeof(char) == 2 just to make chars 16 bits?
>Make chars 16 bits, keep sizeof(char) == 1, ...

The idea is that you not only need to handle fat chars, you also have applications that need to handle smaller objects (bytes, or bits). Therefore there would have to be some object type smaller than a char (e.g. a "short char").
hermit@shockeye.UUCP (Mark Buda) (03/26/88)
In article <439@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>"A _byte_ is undefined in the language except in terms of the value of
>sizeof. However, IN ALL EXISTING IMPLEMENTATIONS a byte is the space
>required to hold a char." (Emphasis added.)

Therefore, if your compiler says that sizeof(char) != 1, it clearly does not exist.
--
Mark Buda / Smart UUCP: hermit@chessene.uucp
Dumb UUCP: ...{rutgers,ihnp4,cbosgd}!bpa!vu-vlsi!devon!chessene!hermit
"One look at you, sir, is proof that anything is possible."
nevin1@ihlpf.ATT.COM (00704a-Liber) (03/30/88)
In article <7546@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <8803250401.AA01184@champlain.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>>Why do you need to make sizeof(char) == 2 just to make chars 16 bits?
>>Make chars 16 bits, keep sizeof(char) == 1, ...
>The idea is that you not only need to handle fat chars, you also
>have applications that need to handle smaller objects (bytes, or
>bits). Therefore there would have to be some object type smaller
>than a char (e.g. a "short char").

This makes me think that way back when K&R defined C, they should have called the 'char' type a 'byte' type instead. Because of existing practice (whether it be good or bad, it is common), I feel that sizeof(char) == 1 should stay. 70% of the time that I use char, I use it for doing byte-type operations (reading in from a file, etc.). There is a need for having a fundamental type (call it foo) such that sizeof(foo) == 1 can be guaranteed in *ALL* implementations. Due to existing practice, I would like that type to be called char. Just add things like 'long char' to accommodate the people who need them.
--
 _  __                NEVIN J. LIBER   ..!ihnp4!ihlpf!nevin1   (312) 510-6194
' ) )                 "The secret compartment of my ring I fill
 / / _ , __o  ____     with an Underdog super-energy pill."
/ (_</_\/ <__/ / <_   These are solely MY opinions, not AT&T's, blah blah blah
gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/31/88)
In article <4191@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>There is a need for having a fundamental type (call it foo) such that
>sizeof(foo) == 1 can be guaranteed in *ALL* implementations. Due to
>existing practice, I would like that type to be called char. Just add
>things like 'long char' to accommodate the people who need them.

sizeof(bit)==1 can be guaranteed universally. If you mean an addressable object, there is no single size universally supported by computer hardware.

The problem with preempting "char" for small objects is that most C code thinks that a "char" is big enough to hold a primitive unit of text. This is plainly wrong in some environments unless "char" is made pretty large. (It needs to be 16 bits for Imagen's GASCII, for example.) "char" cannot play both roles at once, and "long char" is contrary to the current use of "char" in the majority of existing code (as well as requiring a whole slew of lstr*() library functions).
karl@haddock.ISC.COM (Karl Heuer) (03/31/88)
In article <4191@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>There is a need for having a fundamental type (call it foo) such that
>sizeof(foo) == 1 can be guaranteed in *ALL* implementations. Due to
>existing practice, I would like that type to be called char. Just add
>things like 'long char' to accommodate the people who need them.

The problem is that there are three distinct types of objects (small integers, allocation quanta, and characters), all of which have traditionally been called "char". We can't keep existing practice on all three and still have useful programs in large-alphabet environments. The current dpANS still equates the first two, but has created wchar_t for the third.

I'm seriously considering adopting a convention that eschews all use of the word "char" (much as some people avoid "int") in favor of a good set of typedefs. (Certainly I'd change this for "D".)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
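[Karl doesn't give his typedef set, but one invented-for-illustration version that separates the three roles might look like this; the names and underlying types are assumptions, not Karl's.]

```c
/* Hypothetical typedefs separating the three roles of "char".
 * The names and underlying widths are illustrative only. */
typedef signed char    tiny;    /* a small integer               */
typedef unsigned char  byte_t;  /* an allocation quantum         */
typedef unsigned short text_t;  /* a text character (wide, like wchar_t) */
```

[Code written against names like these can later be retargeted by changing the typedefs, without touching every declaration.]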
nevin1@ihlpf.ATT.COM (00704a-Liber) (03/31/88)
In article <7586@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>The problem with preempting "char" for small objects is that most C
>code thinks that a "char" is big enough to hold a primitive unit of
>text. This is plainly wrong in some environments unless "char" is
>made pretty large.

C code *should* think that a "char" is big enough to hold a primitive unit of text. That's because K&R (1st edition) said (section 4, paragraph 4 of the C reference manual): "Objects declared as characters (char) are large enough to store any member of the implementation's character set, ..." Currently (pre-dpANS), if this is not true, then the language being implemented is not K&R C (although, I'll admit, it's probably pretty close :-)).

I do agree with you that right now "char" has too many uses and there is no easy way to separate them, due to the volume of existing code that uses "char"s in different ways (assuming that I am not mis-paraphrasing you; if I am, I'm sorry).
--
 _  __                NEVIN J. LIBER   ..!ihnp4!ihlpf!nevin1   (312) 510-6194
' ) )                 "The secret compartment of my ring I fill
 / / _ , __o  ____     with an Underdog super-energy pill."
/ (_</_\/ <__/ / <_   These are solely MY opinions, not AT&T's, blah blah blah
karl@haddock.ISC.COM (Karl Heuer) (04/01/88)
In article <4216@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
|In article <7586@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
|>The problem with preempting "char" for small objects is that most C
|>code thinks that a "char" is big enough to hold a primitive unit of
|>text. This is plainly wrong in some environments unless "char" is
|>made pretty large.
|
|[But K&R says] "Objects declared as characters (char) are large enough to
|store any member of the implementation's character set, ..."

Ah, but a "primitive unit of text" need not be in "the implementation's character set". In particular, the latter can be an 8-bit superset of ASCII which implements some natural-language characters with two-byte codes.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
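[Karl's point can be made concrete with a sketch of such an encoding. The particular packing (MSbit set marks the lead byte of a two-byte code) is an assumption chosen only to illustrate the idea, not any specific national standard.]

```c
/* Decode one "primitive unit of text" from a byte stream in which a
 * byte with the MSbit clear is plain ASCII and a byte with the MSbit
 * set is the first of a two-byte code.  The packing is hypothetical. */
unsigned textchar(const unsigned char **p)
{
    unsigned c = *(*p)++;
    if (c & 0x80)              /* lead byte: fold in the second byte */
        c = (c << 8) | *(*p)++;
    return c;
}
```

[Note that each decoded unit may not fit in an 8-bit char even though every byte of the stream does, which is exactly the overload Karl describes.]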
flaps@dgp.toronto.edu (Alan J Rosenthal) (04/05/88)
I, flaps@dgp.toronto.edu (Alan J Rosenthal) wrote:
>>Why do you need to make sizeof(char) == 2 just to make chars 16 bits?
>>Make chars 16 bits, keep sizeof(char) == 1, make sizeof(int) == 1, ...

gwyn@brl.arpa (Doug Gwyn) responded:
>The idea is that you not only need to handle fat chars, you also
>have applications that need to handle smaller objects (bytes, or
>bits). Therefore there would have to be some object type smaller
>than a char (e.g. a "short char").

I now respond: First of all, why would you possibly want to access bytes? Bytes are machine-dependent things with no high-level analogue. You certainly might want to access some object which is small enough to use for traversing an arbitrary object; 16-bit chars would still have this property so long as all objects were a multiple of 16 bits long. As for being able to access bits, sizeof(char) would have to be 8 or 16 for that, not just 2.

Also, creating objects smaller than chars would cause a lot of other problems, such as requiring either introducing the concept of alignment into the language (e.g. arguments to memcpy must be char-aligned) or making the arguments to routines like memcpy be pointers to this smaller object, at the expense that that incurs.

ajr
--
"Comment, Spock?" "Very bad poetry, Captain."