jtc@motcad.portal.com (J.T. Conklin) (04/08/91)
Are there any freely redistributable tools, perl scripts, etc. for maintaining
message catalogs? I'm thinking of something that would automagically
create a *.h to be included by a program and *.msg files for each locale
from one master text file.

Are there any freely redistributable message catalog libraries for old
systems which don't have them? If not, are there any that can be licensed?

	--jtc

--
J.T. Conklin
jtc@motcad.portal.com, ...!portal!motcad!jtc
nazgul@alphalpha.com (Kee Hinckley) (04/09/91)
In article <1991Apr7.190119.24825@motcad.portal.com> jtc@motcad.portal.com (J.T. Conklin) writes:
>Are there any freely redistributable tools, perl scripts, etc. for maintaining
>message catalogs? I'm thinking of something that would automagically
>create a *.h to be included by a program and *.msg files for each locale
>from one master text file.
>
>Are there any freely redistributable message catalog libraries for old
>systems which don't have them? If not, are there any that can be licensed?

Here's the README from a package I wrote for our product. We're shipping
our product at the end of this month and I plan on cleaning the message
catalog stuff up and shipping it out to comp.sources.xxx after that, but
if you want a copy now just let me know. This software has been run and
tested on Apollo, Sparc, MIPS, SCO, DECStation and other platforms. The
implementation is fast and uses a minimum of memory, yet the on-disk
structure is the same as the in-memory one. The gencat code is a bit
crufty and could use new error handling routines (not to mention using
the msgcatalog itself!), but in general it's pretty clean.

------

First a note on the copyright. This is the same one used by the
X Consortium (fortunately no one has started copyrighting copyrights),
so if your lawyers don't mind that one, they shouldn't mind this one.
Simply put, you can do what you want with this, although if you are so
inclined we'd appreciate it if you sent us back any enhancements, bug
fixes, or other related material, so we can make it available to
everyone else. But if you don't want to, you don't have to. So be it.

So. What's here? It's an implementation of the Message Catalog System,
as described in the X/Open Portability Guide (XSI Supplementary
Definitions, X/Open Company, Ltd, Prentice Hall, Englewood Cliffs,
New Jersey 07632, ISBN: 0-13-685850-3). Included is a version of gencat,
to generate message catalogs, as well as the routines catgets, catopen,
and catclose.
There are also the beginnings of an X/Open compliant set of print
routines, but we'll talk about those later.

I haven't done a man page yet (sorry, but I've got a product to get out
the door, the pretty stuff has to come later). However you can use the
definitions in the X/Open docs and it should all work. I have, however,
added a series of pretty significant enhancements, particularly to
gencat. As follows:

Use: gencat [-new] [-or] [-lang C|C++|ANSIC]
            catfile msgfile [-h <header-file>]...

This version of gencat accepts a number of flags.

-new    Erase the msg catalog and start a new one.
        The default behavior is to update the catalog with the
        specified msgfile(s). This will instead cause the old one to
        be deleted and a whole new one started.

-h <hfile>
        Output identifiers to the specified header files.
        This creates a header file with all of the appropriate
        #define's in it. Without this it would be up to you to ensure
        that you keep your code in sync with the catalog file. The
        header file is created from all of the previous msgfiles on
        the command line, so the order of the command line is
        important. This means that if you just put it at the end of
        the command line, all the defines will go in one file:

            gencat foo.m bar.m zap.m -h all.h

        If you prefer to keep your dependencies down you can specify
        one after each message file, and each .h file will receive
        only the identifiers from the previous message file:

            gencat foo.m -h foo.h bar.m -h bar.h zap.m -h zap.h

        As an added bonus, if you run the following sequence:

            gencat foo.m -h foo.h
            gencat foo.m -h foo.h

        the file foo.h will NOT be modified the second time. gencat
        checks to see if the contents have changed before modifying
        things. This means that you won't get spurious rebuilds of
        your source every time you change a message.
        You can thus use a Makefile rule such as:

            MSGSRC=foo.m bar.m
            GENFLAGS=-or -lang C
            GENCAT=gencat
            NLSLIB=nlslib/OM/C

            $(NLSLIB): $(MSGSRC)
            	@for i in $?; do cmd="$(GENCAT) $(GENFLAGS) $@ $$i -h `basename $$i .m`.h"; echo $$cmd; $$cmd; done

            foo.o: foo.h

        The for-loop isn't too pretty, but it works. For each .m file
        that has changed we run gencat on it. foo.o depends on the
        result of that gencat (foo.h) but foo.h won't actually be
        modified unless we changed the order of (or added new members
        to) foo.m. (I hope this is clear, I'm in a bit of a rush.)

-lang <l>
        This governs the form of the include file.
        Currently supported are C, C++ and ANSIC. The latter two are
        identical in output. This argument is position dependent; you
        can switch the language back and forth in between include
        files if you care to.

-or     This is a hack, but a real useful one.
        MessageIds are ints, and it's not likely that you are going to
        go too high there if you generate them sequentially. catgets
        takes a msgId and a setId, since you can have multiple sets in
        a catalog. What -or does is shift the setId up to the high end
        of a long, and put the msgId in the low half. Assuming you
        don't go over half a long (usually 2 bytes nowadays) in either
        your set or msg ids, this will work great. Along with this are
        generated several macros for extracting ids and putting them
        back together. You can then easily define a macro for catgets
        which uses this single number instead of the two.

Note that the form of the generated constants is somewhat different
here. Take the file aboutMsgs.m:

    $ aboutMsgs.m
    $ OmegaMail User Agent About Box Messages
    $
    $set 4 #OmAbout
    $ About Box message and copyrights
    $ #Message
    # Welcome to OmegaMail(tm)
    $ #Copyright
    # Copyright (c) 1990 by Alphalpha Software, Inc.
    $ #CreatedBy
    # Created by:
    $ #About
    # About...
    # A
    #
    #
    $ #FaceBitmaps
    # /usr/lib/alphalpha/bitmaps/%s

Here is the output from: gencat foo aboutMsgs.m -h foo.h

    #define OmAboutSet          0x4
    #define OmAboutMessage      0x1
    #define OmAboutCopyright    0x2
    #define OmAboutCreatedBy    0x3
    #define OmAboutAbout        0x4
    #define OmAboutFaceBitmaps  0x8

and now from: gencat -or foo aboutMsgs.m -h foo.h

    /* Use these Macros to compose and decompose setId's and msgId's */
    #ifndef MCMakeId
    # define MCMakeId(s,m) (unsigned long)(((unsigned short)s<<(sizeof(short)*8))\
                                            |(unsigned short)m)
    # define MCSetId(id) (unsigned int) (id >> (sizeof(short) * 8))
    # define MCMsgId(id) (unsigned int) ((id << (sizeof(short) * 8))\
                                            >> (sizeof(short) * 8))
    #endif

    #define OmAboutSet          0x4
    #define OmAboutMessage      0x40001
    #define OmAboutCopyright    0x40002
    #define OmAboutCreatedBy    0x40003
    #define OmAboutAbout        0x40004
    #define OmAboutFaceBitmaps  0x40008

Okay, by now, if you've read the X/Open docs, you'll see I've made a
bunch of other extensions to the format of the msg catalog as well.
Note that you don't have to use any of these and, with one exception,
they are all compatible with the standard format.

$set 4 #OmAbout
        In the standard the third argument is a comment. Here if the
        comment begins with a # then it is used to generate the setId
        constant (with the word "Set" appended). This constant is also
        prepended onto all of the msgId constants for this set.
        Anything after the first token is treated as a comment.

$ #Message
        As with set, I've modified the comment to indicate an
        identifier. There are cleaner ways to do this, but I was
        trying to retain a modicum of compatibility. The identifier
        after # will be retained and used as the identifier for the
        next message (unless overridden before we get there).
        If a message has no previous identifier then no identifier is
        generated in the include file (I use this quite a bit myself,
        the first identifier is a Menu item, the next three are
        accelerator, accelerator-text and mnemonic - I don't need
        identifiers for them, I just add 1, 2 and 3).

# Welcome to OmegaMail(tm)
        Finally the one incompatible extension. If a line begins with
        # a msgId number is automatically generated for it by adding
        one to the previous msgId. This wouldn't have been useful in
        the standard, since it didn't generate include files, but it's
        wonderful for this version. It makes it easy to reorder the
        message file to put things where they belong and not have to
        worry about renumbering anything (although of course you'll
        have to recompile).

That's about all for that.

Now, what about the print routines? These are embarrassing. They are a
first pass. They support only %[dxo] and %[s], although they support
*all* of the modifiers on those arguments (I had no idea there were so
many!). They also, most importantly, support the positional arguments
that allow you to reference arguments out of order. There's a terrible
hack macro to handle varargs which I wrote because I wasn't sure if it
was okay to pass the address of the stack to a subroutine. I've since
seen supposedly portable code that in fact does this, so I guess it's
okay. If that's the case the code could become a lot simpler. I
welcome anyone who would like to fix it up. I just don't know when
I'll get the chance; it works, it's just ugly.

One last comment. You probably want to know how reliable it is. I've
tested the print routines pretty well. I've used the msgcat routines
intensely, but I haven't exercised all of the options in the message
catalog file (like all of the \ characters) although I have
implemented them all. I'm pretty confident that all the basic stuff
works, beyond that it's possible that there are bugs. As for
portability, I've run it under BSD4.3 (Apollo) and SYSV-hybrid (SCO).
(And I never want to see the words "System V with BSD extensions"
again in my life.) I don't believe that there are any heavy
dependencies on Unix, although using another system would probably
require #ifdef's.

I apologize for the state of the documentation, the lack of comments,
the lack of testing, and all of the other things. This project is
subsidiary to my primary goal (Graphical Email for Unix) and I'm
afraid I haven't been able to spend the time on it that I would have
liked. However I'll happily answer any questions that you may have,
and will be glad to serve as a distribution point for future
revisions. So if you make any changes or add more X/Open functions,
send me a copy and I'll redistribute them.

Best of luck!

                                        Kee Hinckley
                                        September 12, 1990

--
Alfalfa Software, Inc.          |       Poste:  The EMail for Unix
nazgul@alfalfa.com              |       Send Anything... Anywhere
617/646-7703 (voice/fax)        |       info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
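A note on the -or macros in the README above: MCMsgId isolates the
low half by shifting left and then right, which only works when an
unsigned long is exactly twice the width of a short, and the macro
arguments are not parenthesized. The following is a minimal sketch of
the same set/message packing using fixed-width types. The MC_* names,
the 16/16-bit layout and the mc_gets wrapper are assumptions made for
illustration and are not part of the posted package; only catgets and
nl_catd come from the X/Open interface.

```c
#include <nl_types.h>
#include <stdint.h>

/* Hypothetical fixed-width variants of the README's MCMakeId family:
 * set id in the high 16 bits, message id in the low 16 bits. */
#define MC_MAKE_ID(s, m) \
    ((((uint32_t)(s) & 0xFFFFu) << 16) | ((uint32_t)(m) & 0xFFFFu))
#define MC_SET_ID(id) ((int)(((uint32_t)(id) >> 16) & 0xFFFFu))
#define MC_MSG_ID(id) ((int)((uint32_t)(id) & 0xFFFFu))

/* Hypothetical wrapper: look a message up by its combined id.
 * dflt is returned by catgets when the lookup fails. */
static char *mc_gets(nl_catd catd, uint32_t id, const char *dflt)
{
    return catgets(catd, MC_SET_ID(id), MC_MSG_ID(id), dflt);
}
```

With this layout MC_MAKE_ID(4, 1) gives 0x40001, matching the
OmAboutMessage constant in the generated header shown above.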
eliot@chutney.rtp.dg.com (Topher Eliot) (04/10/91)
On more than one occasion I have seen references to tools that accept as
input a message catalog that contains symbolic names (rather than numbers)
for the messages, and produce as output new message catalogs containing
numbers (or perhaps compiled message catalogs) and .h files containing the
appropriate #define lines mapping the symbolic names to the numbers.

I am here to argue that these are a Bad Thing.

At first glance, they seem great. Who wants to keep track of a bunch of
small integers, when they can use symbolic identifiers instead? I mean, we
figured that out back when the first assembler was written.

The problem is the context in which such a tool is being used. One of the
main points of symbolic names (especially #defined "constants") is that they
allow one to change the numeric value of the "constant" without having to
edit all the source files. Thus, for example, one could add a new message
to the middle of a message catalog, rebuild everything, and it would all be
in sync. Or so it would appear.

But what's the point of message catalogs? The point is that you don't just
have one, you have lots, in all different languages. Creating new versions
of those translated catalogs is NOT just a matter of rebuilding. They have
to be sent off to translators, and then reincorporated into the product
distribution after being translated. They may arrive at different times
(getting something translated into French can probably be done locally;
Serbo-Croatian is more of a challenge). Depending on how you distribute them
to customers, they may or may not arrive in sync with the new executable
code. Customers may or may not load all the new message catalogs. On and on.
All in all, keeping message catalogs synchronized with programs that use
them is a real bitch.

The moral of this is that ONE SHOULDN'T DO THINGS THAT REQUIRE MAINTAINING
SYNCHRONIZATION BETWEEN THE APPLICATION AND THE MESSAGE CATALOG, like
inserting a new message into the middle of an existing message catalog.
One should only add new messages to the end of a catalog. If a message is
no longer required, its place should be filled with a zero-length message
(or just not be used, depending on whether you are using AT&T or X/Open
message facilities). That slot (message number) should not be re-used for
a different message.

Given these guidelines, the usefulness of the tool I described above is much
less than one might initially think. In fact, I argue that such a tool
tempts one to break the guidelines, or perhaps I should say makes it easy
to break the guidelines without realizing that one has done so. Without the
tool, one writes the message number right into the application program, and
leaves that value there forever. Which is exactly what one should do.
Presumably if one types in the wrong number, this error will be discovered
early on in testing. (You do, after all, test each possible message usage,
don't you? :-)

To reiterate: when one is writing an application, every time one creates a
new message, it should be added to the message catalog, a message number
should be created for it, that message number should be hard-coded into the
application source code, and then it should stay that way until doomsday.
You should never WANT automatic numbering of your messages.

Some people may point out that using symbolic identifiers for messages allows
a reader of the source code to figure out easily what the message is, rather
than having to flip back and forth through a message catalog. I would
counter that the source code is supposed to have a compiled-in default
message anyway, to cover those occasions when the message catalog is for
some reason unavailable. Given the default message, a symbolic message
identifier doesn't add much.

Whew.
And so early in the morning, too.

Have I made my point clear? Would anyone care to point out flaws in my logic?
Does anyone still think that a tool to create a .h file out of a message
catalog is useful?

--
Topher Eliot
Data General DG/UX Internationalization         (919) 248-6371
62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com                    {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
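The "compiled-in default message" mentioned in the post above is the
last argument of the X/Open catgets call. A minimal sketch of that
pattern follows, assuming a hypothetical helper name
(message_or_default) and the hard-coded numbering style argued for in
the post; catgets and nl_catd are the standard interface, everything
else is illustrative.

```c
#include <nl_types.h>

/* Hypothetical helper: return the catalog text for (set_id, msg_id),
 * or the compiled-in default when the catalog could not be opened
 * or the message is missing from it. */
static const char *message_or_default(nl_catd catd, int set_id, int msg_id,
                                      const char *default_text)
{
    /* catopen reports failure by returning (nl_catd)-1. */
    if (catd == (nl_catd)-1)
        return default_text;

    /* catgets hands back its last argument when the lookup fails. */
    return catgets(catd, set_id, msg_id, default_text);
}
```

A caller would write something like
message_or_default(catd, 1, 7, "File not found"), so the program
still prints readable text when no catalog is installed.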
worley@compass.com (Dale Worley) (04/10/91)
   From: nazgul@alphalpha.com (Kee Hinckley)

   First a note on the copyright. This is the same one used by the
   X Consortium (fortunately no one has started copyrighting copyrights),

Well, the GPL has a copyright -- you can reuse it but you can't change
it:

                    GNU EMACS GENERAL PUBLIC LICENSE
                        (Clarified 11 Feb 1988)

     Copyright (C) 1985, 1987, 1988 Richard M. Stallman
     Everyone is permitted to copy and distribute verbatim copies
     of this license, but changing it is not allowed. You can also
     use this wording to make the terms for other programs.

Dale

Dale Worley             Compass, Inc.                   worley@compass.com
--
Klein bottle for sale ... inquire within.
composer@chem.bu.edu (Jeff Kellem) (04/11/91)
In article <1991Apr10.145552.7955@uvaarpa.Virginia.EDU> worley@compass.com (Dale Worley) writes:
>    From: nazgul@alphalpha.com (Kee Hinckley)
>
>    First a note on the copyright. This is the same one used by the
>    X Consortium (fortunately no one has started copyrighting copyrights),
>
> Well, the GPL has a copyright -- you can reuse it but you can't change
> it:

But, the GPL is a license, not a copyright. ;-)

                        -jeff

Jeff Kellem
Internet: composer@chem.bu.edu
nazgul@alphalpha.com (Kee Hinckley) (04/11/91)
In article <1991Apr10.122642.3991@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>Serbo-Croatian is more of a challenge). Depending on how you distribute them
>to customers, they may or may not arrive in sync with the new executable
>code. Customers may or may not load all the new message catalogs. On and on.
>All in all, keeping message catalogs synchronized with programs that use
>them is a real bitch.
...
>Have I made my point clear? Would anyone care to point out flaws in my logic?
>Does anyone still think that a tool to create a .h file out of a message
>catalog is useful?

Absolutely. First of all you're trying to protect me from myself.
I'd rather establish conventions to do that than require it in the
code. While you could argue that the code readability doesn't change,
your mechanism makes it more likely that people will enter the wrong
number in the code, and thus get the wrong error message. Furthermore
it makes it impossible to reorganize the message catalog. I have
some very large catalogs where I like to organize things (like having
all of the "File" menu items in one place). If the numbers are hardcoded
into my program I can never do this, even when I know it's safe. So
the readability of the message catalog gets much worse over time.

I'm clearly on the other side of the fence from you on this one.
Not only does my gencat create .h files, it also allows me to
create a catalog that doesn't specify any numbers at all - it generates
them automatically.

I agree that there is potential for screw-ups here, particularly since
people get real annoyed every time they modify the catalog, get new
header files and have to rebuild (that's the reason I make my gencat
compare the new and old header files and only update if the actual
values/names have changed). I don't think, however, that the danger
here outweighs the advantages. In addition there is a way to, if not
prevent the problem, at least spot it.
Simply have a convention, as a user
of message catalogs, that messageId #1 is a version number. Every
time you make an incompatible change to the catalog, change the version
number. Have your application check the version number and complain
if it doesn't match.

All that said, I don't think there is so much a "flaw" in your arguments,
as a matter of preference and tradeoffs. Even if gencat does create
header files, you are certainly under no obligation to use them, so it
seems to me we can both coexist happily (so long as I never have to
use any of your code :-).

--
Alfalfa Software, Inc.          |       Poste:  The EMail for Unix
nazgul@alfalfa.com              |       Send Anything... Anywhere
617/646-7703 (voice/fax)        |       info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
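The version-number convention described in the post above could look
roughly like the sketch below. It assumes, purely for illustration,
that message 1 of a given set holds a version string; the names
VERSION_MSG_ID, version_strings_match and catalog_is_compatible are
hypothetical, while catgets and nl_catd are the X/Open interface.

```c
#include <nl_types.h>
#include <string.h>

/* Hypothetical convention: message 1 of the set is a version string. */
#define VERSION_MSG_ID 1

/* Pure comparison, kept separate so it can be checked without a catalog. */
static int version_strings_match(const char *found, const char *expected)
{
    return found != NULL && expected != NULL && strcmp(found, expected) == 0;
}

/* Returns 1 when the catalog's version message equals the version the
 * program was built against, 0 otherwise (including a missing catalog). */
static int catalog_is_compatible(nl_catd catd, int set_id,
                                 const char *expected)
{
    if (catd == (nl_catd)-1)
        return 0;
    /* An empty default means "no version message found". */
    return version_strings_match(catgets(catd, set_id, VERSION_MSG_ID, ""),
                                 expected);
}
```

An application would call catalog_is_compatible once after catopen
and complain, or fall back to its compiled-in defaults, when it
returns 0.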
eliot@chutney.rtp.dg.com (Topher Eliot) (04/12/91)
In article <1991Apr11.084924.1951@alphalpha.com>, nazgul@alphalpha.com (Kee Hinckley) writes:
|> Absolutely. First of all you're trying to protect me from myself.

No, I'm trying to protect other people who aren't experienced in the pains of
distributing message catalogs from the mistakes your* tools might tempt them
to make. Obviously anybody with enough experience with internationalization
to create new tools for it isn't going to make foolish mistakes. But others
may well.

*It was you who posted the note about the new tools, right?

|> I'd rather establish conventions to do that than require it in the
|> code. While you could argue that the code readability doesn't change,
|> your mechanism makes it more likely that people will enter the wrong
|> number in the code, and thus get the wrong error message.

Yes, they may be more likely to get it wrong once, during development, where
it will be found during an initial pass of testing. But then it will be
gotten right, and it will then stay right. Your system makes it easier for
the numbers to become wrong later on, in ways that are harder to detect and
keep track of.

|> Furthermore
|> it makes it impossible to reorganize the message catalog.

Nothing is impossible. My approach makes it appropriately difficult. You
SHOULD be reluctant to reorganize your catalog.

|> I have
|> some very large catalogs where I like to organize things (like having
|> all of the "File" menu items in one place).

I would have thought that message sets would fill this need very nicely.
Note that everywhere that I said "add new messages only to the end of the
catalog", I really should have said "add new messages only to the ends of
sets, and add new sets only to the end of the catalog".

...
|> Not only does my gencat create .h files, it also allows me to
|> create a catalog that doesn't specify any numbers at all - it generates
|> them automatically.

So instead of having to type in numbers, you have to type in symbolic names.
Why is this any easier? Symbolic names are an advantage when you want to be
able to change the underlying value later on. I claim that with message
numbers, you shouldn't change those values! These numbers should be
CONSTANTS!

|> I agree that there is potential for screw ups here, particularly since
|> people get real annoyed every time they modify the catalog, get new
|> header files and have to rebuild ...

You won't hear any complaints from me about rebuilding. That isn't my point.
But your "rebuild the .h file only if necessary" feature is a nice touch.

...
|> In addition there is a way to, if not prevent
|> the problem, at least spot it. Simply have a convention, as a user
|> of message catalogs, that messageId #1 is a version number. Every
|> time you make an incompatible change to the catalog, change the version
|> number. Have your application check the version number and complain
|> if it doesn't match.

In other words, protect yourself from yourself. Isn't this what you were
complaining about? Why build a tool that makes a certain class of errors
more likely, and then invent a convention to try to head off those errors?

Do you really think that the class of programmers that are likely to screw
up message catalogs is the same class of programmers that will diligently
put this checking code into their applications? I don't.

Moreover, such a convention would make it so that if the developer added
one message to the end of a catalog, and distributed new executables and
new English-language catalogs on January 1st, and the Serbo-Croatian
catalog didn't get distributed until March 1st, the Serbo-Croatians would
not get to use the new application and their own language catalog for two
months. With my approach, they would only see the one new message in English
for those two months. The old Serbo-Croatian catalog would serve just fine
for all the other messages.

|> All that said, I don't think there is so much a "flaw" in your arguments,
|> as a matter of preference and tradeoffs. Even if gencat does create
|> header files, you are certainly under no obligation to use them, so it
|> seems to me we can both coexist happily (so long as I never have to
|> use any of your code :-).

Ah, but who will maintain the code that you have written this way? I assert
that code developed using my approach will be much less of a headache to
maintain and support than will be code developed and maintained using your
tools. Yes, the first time around, during development, your approach is
easier to use. Once that first translated catalog goes out to customers,
my approach is much more robust.

Meanwhile, erik@srava.sra.co.jp (Erik M. van der Poel) says:
|> Kee Hinckley writes:
|> > While you could argue that the code readability doesn't change,
|> > your mechanism makes it more likely that people will enter the wrong
|> > number in the code, and thus get the wrong error message.
|>
|> Instead of treating the symptoms, we should try to cure the disease.
|> Using numbers for the message ids was a bad idea in the first place.
|> (Thank goodness XPG3 and AT&T's specs are not International
|> Standards.)
|>
|> Wouldn't it be possible to create a reasonably efficient
|> implementation using hashing and caching with symbolic names instead
|> of numeric ids? Then we can add/delete/modify messages at will. We
|> should leave numbering and counting to the computer.

This sounds even better to me. After I posted my note, I heard that
Uniforum has advanced a proposal exactly along these lines. Does anyone
have any specifics on it?

--
Topher Eliot
Data General DG/UX Internationalization         (919) 248-6371
62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com                    {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
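The hashing idea quoted above (symbolic names as lookup keys instead
of numeric ids) can be illustrated with a small in-memory table. This
is only a sketch under stated assumptions: a fixed-capacity
open-addressing table keyed by name, with an FNV-1a hash chosen here
for convenience. It is not the Uniforum proposal and not any shipped
interface; every name in it (sym_catalog, sym_put, sym_get) is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYM_SLOTS 64            /* hypothetical capacity, power of two */

struct sym_entry { const char *name; const char *text; };
struct sym_catalog { struct sym_entry slots[SYM_SLOTS]; };

/* 32-bit FNV-1a hash of a symbolic message name. */
static uint32_t sym_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or replace a message; returns 0 on success, -1 if the table
 * is full. Linear probing resolves collisions. */
static int sym_put(struct sym_catalog *cat, const char *name, const char *text)
{
    size_t i = sym_hash(name) & (SYM_SLOTS - 1);
    for (size_t n = 0; n < SYM_SLOTS; n++, i = (i + 1) & (SYM_SLOTS - 1)) {
        if (cat->slots[i].name == NULL) {
            cat->slots[i].name = name;
            cat->slots[i].text = text;
            return 0;
        }
        if (strcmp(cat->slots[i].name, name) == 0) {
            cat->slots[i].text = text;
            return 0;
        }
    }
    return -1;
}

/* Look a message up by name; dflt plays the role of catgets' default. */
static const char *sym_get(const struct sym_catalog *cat, const char *name,
                           const char *dflt)
{
    size_t i = sym_hash(name) & (SYM_SLOTS - 1);
    for (size_t n = 0; n < SYM_SLOTS; n++, i = (i + 1) & (SYM_SLOTS - 1)) {
        if (cat->slots[i].name == NULL)
            return dflt;
        if (strcmp(cat->slots[i].name, name) == 0)
            return cat->slots[i].text;
    }
    return dflt;
}
```

Because lookups go by name, adding or reordering messages in one
language's catalog does not shift anything another catalog depends
on; a name missing from an older translation simply yields the
default text.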
peter@ficc.ferranti.com (Peter da Silva) (04/13/91)
In article <1991Apr10.122642.3991@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
> Some people may point out that using symbolic identifiers for messages allows
> a reader of the source code to figure out easily what the message is, rather
> than having to flip back and forth through a message catalog.

It also lets him more easily verify that the right message is being generated
at each point.

> Have I made my point clear? Would anyone care to point out flaws in my logic?
> Does anyone still think that a tool to create a .h file out of a message
> catalog is useful?

Sure, as long as it's only run on the master message catalog that's kept in
the programmer's native language, and that new messages are only added at
the end. Translations are done on processed copies with fixed message numbers.

It's a tool. Like any, it can be abused. That doesn't mean it's not useful.

--
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"
preece@urbana.mcd.mot.com (Scott E. Preece) (04/14/91)
In article <1991Apr10.122642.3991@dg-rtp.dg.com> eliot@chutney.rtp.dg.com (Topher Eliot) writes:
| To reiterate: when one is writing an application, every time one creates a
| new message, it should be added to the message catalog, a message number should
| be created for it, that message number should be hard-coded into the
| application source code, and then it should stay that way until doomsday.
| You should never WANT automatic numbering of your messages.
|...
| Have I made my point clear? Would anyone care to point out flaws in my logic?
| Does anyone still think that a tool to create a .h file out of a message
| catalog is useful?
---
Well, actually, I still think the use of symbolic names makes code
reading easier (and code reading is critical to delivered quality). I
also think that use of symbolic names is in no way counter to the
principle of never changing existing message number assignments; just
don't do it. Finally, an automatic tool would be useful for the
important case of the first version of a program, even if subsequent
versions need to be maintained manually (though, actually, I think the
synchronization problem would be better addressed in other ways, like
keeping the source for the message catalog in a development environment
that linked it to the code and generated the new catalog as part of the
normal release process; I tend to think that the safest way to keep
things synchronized is to always reissue them together (surely you're
going to test the whole message catalog when you release a new version,
anyway, right? :-)).

scott

--
scott preece
motorola/mcg urbana design center       1101 e. university, urbana, il 61801
uucp:   uunet!uiucuxc!udc!preece,       arpa:   preece@urbana.mcd.mot.com
phone:  217-384-8589                    fax:    217-384-8550
nazgul@alphalpha.com (Kee Hinckley) (04/15/91)
In article <1991Apr12.122701.9545@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>*It was you who posted the note about the new tools, right?

I plead guilty :-).

>|> all of the "File" menu items in one place).
>I would have thought that message sets would fill this need very nicely.

Quite right. I hadn't considered using them at that level of granularity.

>|> of message catalogs, that messageId #1 is a version number. Every
>|> time you make an incompatible change to the catalog, change the version
>|> number. Have your application check the version number and complain
>|> if it doesn't match.
>In other words, protect yourself from yourself. Isn't this what you were
>complaining about? Why build a tool that makes a certain class of errors

Not really. The difference is that this is a restriction I can institute
if I feel it is needed, as opposed to one forced on me by the implementation
of the tools. But if this were the only issue I wouldn't mind.

>Do you really think that the class of programmers that are likely to screw
>up message catalogs is the same class of programmers that will diligently
>put this checking code into their applications? I don't.

I can't argue with this.

>Moreover, such a convention would make it so that if the developer added
>one message to the end of a catalog, and distributed new executables and
>new English-language catalogs on January 1st, and the Serbo-Croatian
>catalog didn't get distributed until March 1st, the Serbo-Croatians would
>not get to use the new application and their own language catalog for two
>months. With my approach, they would only see the one new message in English
>for those two months. The old Serbo-Croatian catalog would serve just fine
>for all the other messages.

This is an interesting point, because it makes me realize another way that
people can misuse symbolic numbers. You see it turns out that they are
terrifically convenient to use as identifiers for other things.
For instance
I use them to identify my menubutton objects. Based on the identifier I
execute the appropriate action. If I had to do a string compare I wouldn't
do it that way. But furthermore, I use the identifier (plus a fixed count)
to find the accelerator, mnemonic and string-form of the accelerator in the
catalog. This leads to an unfortunate side effect, namely, there are no
fallback strings for those values. Without the catalog the program is
usable, but not full-function.

So yes, you are right; numeric identifiers can be abused. I won't know
whether the tradeoffs I'm making are worth it until we start shipping lots
of different language versions - but it's definitely something to think
about.

>Ah, but who will maintain the code that you have written this way? I assert
>that code developed using my approach will be much less of a headache to
>maintain and support than will be code developed and maintained using your

I'm not sure, I still think it's too easy to get burned by the runtime
typing. How do you verify that all of your strings in fact correspond
to message catalog symbols? That issue and the speed/memory issues
are my major concerns.

--
Alfalfa Software, Inc.          |       Poste:  The EMail for Unix
nazgul@alfalfa.com              |       Send Anything... Anywhere
617/646-7703 (voice/fax)        |       info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
eliot@chutney.rtp.dg.com (Topher Eliot) (04/16/91)
In article <PREECE.91Apr13223807@etude.urbana.mcd.mot.com>, preece@urbana.mcd.mot.com (Scott E. Preece) writes:
|> ... Finally, an automatic tool would be useful for the
|> important case of the first version of a program, even if subsequent
|> versions need to be maintained manually (though, actually, I think the
|> synchronization problem would be better addressed in other ways, like
|> keeping the source for the message catalog in a development environment
|> that linked it to the code and generated the new catalog as part of the
|> normal release process; I tend to think that the safest way to keep
|> things synchronized is to always reissue them together

Sure, in an ideal world I would like to have one huge build process that
starts with my source archives for everything, and burps out a tape at the
far end that I can ship to any customer, anywhere in the world, and have it
work correctly, no matter what earlier versions of software they have on
their machine. I have yet to see any company that actually implements such
a system, and I've worked at some of the largest computer manufacturers in
the world. The realities of getting things translated into other languages
are horrendous.

Suppose you have a new executable and new catalog that you have built using
one of these tools that generates message numbers automatically. You HOPE
that the numbers on existing messages haven't been accidentally changed, but
unless you build another tool to verify that, you don't know. You're all
set to ship, complete with translated catalogs in seven of the eight
languages you support, when civil war breaks out in upper Lithuania, and
your Lithuanian translator patriotically returns to defend the motherland
:-). You're left one language short in your catalogs. What are your choices?
1) Hold all your shipments until you have a complete set of message catalogs;
2) Ship to all customers except those that use Lithuanian, and ship those later (glad I'm not in charge of THAT operation :-);
3) Ship to everyone and hope that the old Lithuanian catalog will work ok with the new executable;
4) Heave a sigh, and say "boy, dealing with crufty old numeric message identifiers sure has been tacky all this time, but now we can ship this tape even to the Lithuanians, and still sleep tonight knowing that it's very unlikely that we've screwed up the numbers of any old messages. Good thing we followed Topher Eliot's advice and didn't use tools that automatically renumber them. Let's hire him as a high-priced consultant" :-)

I grant that you actually suggested only using the tool during the initial development, and then switching to manual numbering. However, I haven't seen the tools being promoted that way, and I don't know of any tool that will generate a .c file that is a hard-numbered version of your source, for use after the first release. They all assume you will continue to use the tool during each build.

|> (surely you're
|> going to test the whole message catalog when you release a new version,
|> anyway, right? :-)).

In Norwegian? And Portuguese? and, and? Surely you're kidding. I mean, think about it -- to do such testing, you need to know both the program being tested, and the language. I don't know anybody who will be able to do a really good job of testing their application in all the languages for which message catalogs will be provided (unless, of course, "all" is just English :-). Doing it just once is a big task; doing it all over again every time you release an update would be outrageously expensive. You may say "well, have a developer and a translator sit side by side and run through the test suite". What if the translator is an ocean away?
You may say "well, everyone really should have automated regression tests anyway", but those automated tests will need catalogs listing the expected output from the program, and we're right back where we started, i.e. trying to keep message catalogs in sync with the executable.

So I'll say it again: making sure that an application program and all the different language catalogs available for it correspond correctly is a very error-prone process, particularly when dealing with updating things in the field. We should do everything we can to make sure they don't get out of sync, including gritting our teeth and using numbers instead of those oh-so-nice automatic numbering tools.

Something just occurred to me: how about if the automatic numbering tool knew enough about the source archiving system (SCCS, RCS, or whatever) so that it could compare the latest version of the catalog against all previous versions, to make sure that no incompatibilities were being introduced? This might require some special flagging of messages to indicate that you really did intend to change them, but the tool could catch egregious errors, such as bumping all the message numbers by one. This would go a long way towards keeping everyone happy, wouldn't it?
--
Topher Eliot Data General DG/UX Internationalization
(919) 248-6371 62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
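The compatibility check floated at the end of this post (compare the newest catalog against an earlier release and flag renumbered messages) might look roughly like the C sketch below. It is only an illustration under stated assumptions: the `cat_entry` struct and the `count_incompatible` function are hypothetical names, the catalogs are assumed to be already parsed into (number, symbol) pairs, and nothing here talks to SCCS or RCS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one catalog line: hard number plus symbol. */
typedef struct {
    int id;
    const char *symbol;
} cat_entry;

/* Return the number bound to `symbol` in `cat`, or -1 if it is absent. */
static int find_id(const cat_entry *cat, size_t n, const char *symbol)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cat[i].symbol, symbol) == 0)
            return cat[i].id;
    return -1;
}

/*
 * Count messages of the old release that were renumbered or dropped in the
 * new catalog. Messages that exist only in the new catalog are accepted.
 */
int count_incompatible(const cat_entry *old_cat, size_t n_old,
                       const cat_entry *new_cat, size_t n_new)
{
    int bad = 0;
    for (size_t i = 0; i < n_old; i++) {
        int now = find_id(new_cat, n_new, old_cat[i].symbol);
        if (now != old_cat[i].id) {
            fprintf(stderr, "%s: was %d, now %d\n",
                    old_cat[i].symbol, old_cat[i].id, now);
            bad++;
        }
    }
    return bad;
}
```

A nonzero result would be the cue to stop the build or issue a warning. Appending a message leaves the count at zero, while inserting one at the front and bumping every later number by one is reported once per shifted message.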
peter@ficc.ferranti.com (Peter da Silva) (04/16/91)
In article <1991Apr12.122701.9545@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
> Why is this any easier? Symbolic names are an advantage when you want to be
> able to change the underlying value later on. I claim that with message
> numbers, you shouldn't change those values! These numbers should be CONSTANTS!

Like these constants?

#define PI 3.141592653589 /* values from memory... apologies if */
#define E 2.718281828459 /* they're incorrect */
...

It's pointless making them symbolics, because they're not going to change.

Quick, what's the numeric value for ENOMEM? SIGPWR? TIOCSETC?

Symbolic names are an advantage to the person writing and debugging the program, because they reduce the number of meaningless magic numbers they need to track.
--
Peter da Silva. `-_-' peter@ferranti.com
+1 713 274 5180. 'U` "Have you hugged your wolf today?"
hansm@cs.kun.nl (Hans Mulder) (04/16/91)
In <1991Apr15.170901.18836@dg-rtp.dg.com> eliot@chutney.rtp.dg.com (Topher Eliot) writes:
>Sure, in an ideal world I would like to have one huge build process that
>starts with my source archives for everything, and burps out a tape at the
>far end that I can ship to any customer, anywhere in the world, and have it
>work correctly, no matter what earlier versions of software they have on their
>machine. I have yet to see any company that actually implements such
>a system, and I've worked at some of the largest computer manufacturers in
>the world. The realities of getting things translated into other languages
>are horrendous.

You don't really expect a company to delay the release of their latest product until they are able to bundle with it free message catalogs in a dozen major languages, do you?

In the real world the product is shipped as soon as the English version of the product is ready. Simultaneously the message catalog (but not the executable) is sent to the translator. Non-English versions of the product hit the market 6 months later than the English version, at the earliest. And they are sold as separate products.

And combining the correct version of the executable with a translated message catalog is really trivial, compared to the problem of getting the catalog translated in the first place.
--
Hans Mulder hansm@cs.kun.nl
rschwartz@OFFICE.WANG.COM (R. Schwartz@Wang R&D Net) (04/16/91)
eliot@chutney.rtp.dg.com (Topher Eliot) writes:
> (much omitted)
>
> The moral of this is that ONE SHOULDN'T DO THINGS THAT REQUIRE MAINTAINING
> SYNCHRONIZATION BETWEEN THE APPLICATION AND THE MESSAGE CATALOG, like
> inserting a new message into the middle of an existing message catalog.
>
> (more omitted)
>
> You should never WANT automatic numbering of your messages.
>
> (still more omitted)
>
> Have I made my point clear? Would anyone care to point out flaws in my logic?
> Does anyone still think that a tool to create a .h file out of a message
> catalog is useful?

YES!!! Your point is clear. YES!!! I absolutely insist that generating .h files is required.

The flaws are not in your logic. The flaws are in your assumptions about the tools that should be used to synchronize code and messages when they reside in separate files. I.e., you presume that there are no such tools, and I grant that it is normal for there to be none. The dangers that you point out are completely valid, and your point that these dangers are exacerbated by the logistics involved in sending materials hither and yon for translation is well taken. But the solution isn't to make a bad software engineering decision. Invent the right tools instead!

The use of mnemonic names in message catalogs is an absolute necessity in any application other than trivial toys. Most of the benefits are too obvious to mention. One that bears special attention is the ability to re-organize multiple catalogs without re-numbering. If the run-time organization of code changes from one release to the next, it may make perfectly good sense to divide or merge message catalogs, or to re-locate individual messages. Mnemonic labels can minimize the code impact of such changes. I might even suggest going to enough lengths to remove the code impact completely by adding a level of indirection so that code is unaware which message catalog a given message comes from.
Another point that strongly supports the use of such a tool is that it helps translators to identify their mistakes. Comparison of the .h file generated with the translated catalog against the version from the release is a sure way to detect inadvertently deleted messages and a host of other errors. I haven't met a translator who wouldn't love to have a way to check for such editing errors.

Something to help us developers, too: tracking down obsolete messages is a snap if you use a cross-referencer to find unused #defines in your generated .h files. Maybe it's really obsolete and should be gotten rid of since translating obsolete messages to a dozen or so languages can cost big bucks, pounds, marks, yen, etc. Maybe you added an error message to the catalog because you knew you'd need it, but you forgot to code that else clause!

Am I reaching? Am I stretching my logic to make a point? Yup! But does anyone still think that a tool to create a .h file out of a message catalog is useless? :-)

erik@srava.sra.co.jp (Erik M. van der Poel) writes:
> Using numbers for the message ids was a bad idea in the first place.
> (Thank goodness XPG3 and AT&T's specs are not International
> Standards.)

Once compiled into an executable, no one need care what the representation of a message id is. Nobody says that the #define in the generated header ultimately has to resolve to an integer. It merely has to resolve to whatever the functional interface requires, and if that changes you just change the .h generation tool. Information hiding strikes again!

> Wouldn't it be possible to create a reasonably efficient
> implementation using hashing and caching with symbolic names instead
> of numeric ids? Then we can add/delete/modify messages at will. We
> should leave numbering and counting to the computer.

Yes it is possible, but why bother? The organization of the run-time store of messages can be changed for efficiency without any impact on the functional interface.
As an example, I have implemented a (non-unix based) system that compiles the (equivalent of the) message catalog into assembler code for a function that retrieves the messages from (again the equivalent of) the text segment of a shared runtime archive. The performance is frighteningly good, and I don't do any fancy indexing or hashing. I could add it, but for a large-scale multi-user application the big bang for the buck was in reducing paging by using non-modifiable shared memory instead of data space. Yes, it just uses integer ids, and yes, it generates the headers.

nazgul@alphalpha.com (Kee Hinckley) writes:
> In addition there is a way to, if not prevent
> the problem, at least spot it. Simply have a convention, as a user
> of message catalogs, that messageId #1 is a version number. Every
> time you make an incompatible change to the catalog, change the version
> number. Have your application check the version number and complain
> if it doesn't match.

More than that, have it check for the last and one-past-the-last message to verify that the catalog has exactly the right number of entries. Don't tolerate any errors in the message configuration -- they're just as critical as errors in configuration of executables. Just don't take a checksum! :-)

If you want real safety, make the versioning mechanism automatic. Have your make file bump it after any change that affected the .h file, and drop the new version number into both the message cat and the .h. Have your code do its version check comparing the run-time version against a symbolic constant from the very same include file! A re-compile of the code that includes the .h is forced anyhow, so the code is always in step with the message catalog version. Now, provide a modified version of the make file for your translators that does the same checking but instead of triggering a bump in version and re-compile (you don't give them source anyhow) it simply triggers an error.
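The startup check described above (message 1 carries a version string, the last message must exist, and the one past it must not) can be sketched in C as follows. This is a minimal illustration, not anyone's shipped code: `msg_entry`, `lookup`, `catalog_in_step` and the three `MSG_*` constants are hypothetical, and `lookup` is an in-memory stand-in that imitates the `catgets()` convention of handing back the caller's default string when a message is missing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constants, as they might appear in a generated .h file. */
#define MSG_VERSION_ID        1     /* convention: message 1 is the version */
#define MSG_LAST_ID           3     /* highest message number the code knows */
#define MSG_EXPECTED_VERSION  "7"   /* version the executable was built against */

typedef struct {
    int id;
    const char *text;
} msg_entry;

/* Stand-in for catgets(): returns `fallback` itself when the id is absent. */
static const char *lookup(const msg_entry *cat, size_t n, int id,
                          const char *fallback)
{
    for (size_t i = 0; i < n; i++)
        if (cat[i].id == id)
            return cat[i].text;
    return fallback;
}

/* Return 1 if the catalog matches what the executable expects, else 0. */
int catalog_in_step(const msg_entry *cat, size_t n)
{
    static const char missing[] = "";

    /* The version message must exist and carry the expected string. */
    if (strcmp(lookup(cat, n, MSG_VERSION_ID, missing),
               MSG_EXPECTED_VERSION) != 0)
        return 0;
    /* The last known message must be present... */
    if (lookup(cat, n, MSG_LAST_ID, missing) == missing)
        return 0;
    /* ...and nothing may follow it. */
    if (lookup(cat, n, MSG_LAST_ID + 1, missing) != missing)
        return 0;
    return 1;
}
```

With a real catalog the same three lookups would presumably go through `catgets()` on a descriptor from `catopen()`, and the pointer comparison against the default string would play the role of the `missing` test.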
A final comment: The main reason that I am concerned about this is that internationalization of code must not violate developers' sense of what is right. The only people I have run into who are more fanatic than non-English speakers who (rightly) flame against non-translatable code, are developers who (rightly) flame against un-readable code. There is finally real recognition of the need for designing internationalization in applications from Day One, and this has been a hard-fought victory. Let's not make the software so ugly that everyone will go back to the old attitude of "we'll worry about international in release 2".

rich schwartz (All views expressed are my own, and not Wang Labs, Inc.'s.).
rschwartz@office.wang.com VOICE (508) 967 5027 FAX (508) 967 0947m.
Wang Labs, Inc., M/S 019-58A, 1 Industrial Ave., Lowell, MA 01851
nazgul@alphalpha.com (Kee Hinckley) (04/17/91)
In article <1991Apr15.170901.18836@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>are horrendous. Suppose you have a new executable and new catalog that you
>have built using one of these tools that generates message numbers
>automatically. You HOPE that the numbers on existing messages haven't been
>accidentally changed, but unless you build another tool to verify that, you
>don't know. You're all set to ship, complete with translated catalogs in

How about if I add a feature to the gencat which checks to see if the message numbers have changed incompatibly? And if so it issues a warning? I think it's doable, and I've pretty much been convinced it's useful.

>Something just occurred to me: how about if the automatic numbering tool
>knew enough about the source archiving system (SCCS, RCS, or whatever) so
>that it could compare the latest version of the catalog against all previous
>versions, to make sure that no incompatibilities were being introduced? This

This'll get me for replying before reading everything. Anyway, you could do it there, but I think doing it with the message catalog itself would be sufficient (and certainly easier).
--
Alfalfa Software, Inc. | Poste: The EMail for Unix
nazgul@alfalfa.com | Send Anything... Anywhere
617/646-7703 (voice/fax) | info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept responsibility for their own actions, or that they are so eager to regulate everyone else's.
ch@dce.ie (Charles Bryant) (04/19/91)
In article <1991Apr10.122642.3991@dg-rtp.dg.com> eliot@dg-rtp.dg.com
advocates the use of integer constants to identify messages in a message
catalogue. Many others argue that symbolic names are better.
I have never used such a thing, but I assume the choice is between
MSG_NUMTOOBIG "Number too big"
which is processed into:
#define MSG_NUMTOOBIG 1
which then gets used in the program instead of the string, and
1 "Number too big"
and then `1' gets used in the source.
I hope everyone would agree that:
a) symbolic names allow messages to get out of sync with the
program (e.g. swap two lines in the message file, or add a new
one in the wrong place)
b) it is easy to forget or get confused over which number
corresponds to each message
Why not get the benefits of both? Have the input be:
1 MSG_NUMTOOBIG "Number too big"
which produces the #defines as before. The programmer can now use a
meaningful symbolic name, and cannot renumber without making the same
change to the message file as would be necessary if no symbolic name is
used.
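A small C sketch of the generation step for this combined format: it reads one input line of the form `1 MSG_NUMTOOBIG "Number too big"` and writes the matching `#define`. The function name `emit_define` and the 63-character limit on symbol names are assumptions made for the illustration, and the message text itself is ignored here.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Convert `1 MSG_NUMTOOBIG "Number too big"` into `#define MSG_NUMTOOBIG 1`.
 * Returns 0 on success, -1 if the line lacks a number or a symbolic name,
 * or if the result does not fit in `out`.
 */
int emit_define(const char *line, char *out, size_t out_size)
{
    int id;
    char name[64];

    /* The first field is the hard number, the second the symbolic name. */
    if (sscanf(line, "%d %63s", &id, name) != 2)
        return -1;
    /* Reject lines where the text follows the number directly. */
    if (id < 1 || !(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return -1;

    int len = snprintf(out, out_size, "#define %s %d", name, id);
    return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
}
```

Because the number is read from the file rather than assigned by the tool, swapping two lines or inserting a new one in the middle changes nothing unless someone edits the numbers themselves.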
--
Charles Bryant (ch@dce.ie)
--
If you like the opinions expressed in this message, they may be available
for rent - contact your local sales office. Low interest deals available.
eliot@chutney.rtp.dg.com (Topher Eliot) (04/19/91)
In article <1991Apr17.053943.6263@alphalpha.com>, nazgul@alphalpha.com (Kee Hinckley) writes:
|> How about if I add a feature to the gencat which checks to see if the
|> message numbers have changed incompatibly? And if so it issues a warning?
|> I think it's doable, and I've pretty much been convinced it's useful.
|>
|> >Something just occurred to me: how about if the automatic numbering tool
|> >knew enough about the source archiving system (SCCS, RCS, or whatever) so
|> >that it could compare the latest version of the catalog against all previous
|> >versions, to make sure that no incompatibilities were being introduced? This
|> This'll get me for replying before reading everything. Anyway, you could
|> do it there, but I think doing it with the message catalog itself would
|> be sufficient (and certainly easier).

Easier to implement, absolutely. Sufficient? I guess it depends on how you handle your builds, etc. Such a feature would definitely be good, but one could still lose track of an incompatible change if one were careless in the development process.

Someone sent me some mail with a suggestion that I thought was good. I was waiting to see it posted, but I'll go ahead and do it. The suggestion was that the input catalog should look like:

$set 1 BASEMSGS
1 ERRMSG "Error in application foo:"
2 WARNMSG "Warning:"

and so on. From this one could generate a .h file, and allow the .c file to use the symbolic values (BASEMSGS, ERRMSG, WARNMSG, etc), and yet still avoid the danger of accidental renumbering. Of course if you WANT automatic renumbering then this approach isn't for you. But with this approach manual renumbering is much easier than with my original "NO SYMBOLIC IDENTIFIERS" gospel. Renumbering would be a 5-minute editing job, all in one file. This idea seems pretty straightforward to implement, and more robust than the approach of trying to automatically detect incompatible changes and issue warning messages.
So far, of all the ideas I've seen, I like this one best. Does anyone see anything wrong with it?

Various people have said things here and in private mail that, in my mind, essentially pooh-pooh the difficulties of keeping executables and translated message catalogs in sync on customers' machines. Well, what can I say. _I_ think it's a hard problem.

At the other end of the spectrum, some people have suggested very strong mechanisms to rigidly enforce such coordination -- if you don't have the right message catalog, you can't run. This is certainly the safest approach to solving this particular set of problems, but my feeling is that it would be overruled the first time a fatal bug was found in an application, which had to be fixed immediately, but translated message catalogs weren't available yet. I try to use policies that are flexible enough to bend when they need to, yet will still do you some good when bent. The absolute-synchronization rule is too all-or-nothing for my taste.
--
Topher Eliot Data General DG/UX Internationalization
(919) 248-6371 62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.
nazgul@alphalpha.com (Kee Hinckley) (04/21/91)
In article <1991Apr19.130632.17861@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>Someone sent me some mail with a suggestion that I thought was good. I was
>waiting to see it posted, but I'll go ahead and do it. The suggestion was
>that the input catalog should look like:
>
>$set 1 BASEMSGS
>1 ERRMSG "Error in application foo:"
>2 WARNMSG "Warning:"

Unless I misunderstand you, this is essentially what I do, except that I tried to remain compatible with the standard. The spec says that anything after "$set n" is a comment, and anything after "$ " is a comment, so I just made comments that begin with "#" special.

$set 1 #Foo
$ #ErrMsg
1 "Error in application foo:"
$ #WarnMsg
2 "Warning"

This generates

#define FooSet 0x1
#define FooErrMsg 0x1
#define FooWarnMsg 0x2

Unless you use the '-or' option, which is useful if you want to simplify things down to a single set and msgid number.

/* Use these Macros to compose and decompose setId's and msgId's */
#ifndef MCMakeId
# define MCMakeId(s,m) (unsigned long)(((unsigned short)s<<(sizeof(short)*8))\
 |(unsigned short)m)
# define MCSetId(id) (unsigned int) (id >> (sizeof(short) * 8))
# define MCMsgId(id) (unsigned int) ((id << (sizeof(short) * 8))\
 >> (sizeof(short) * 8))
#endif
#define FooSet 0x1
#define FooErrMsg 0x10001
#define FooWarnMsg 0x10002

>and so on. From this one could generate a .h file, and allow the .c file to
>use the symbolic values (BASEMSGS, ERRMSG, WARNMSG, etc), and yet still avoid
>the danger of accidental renumbering. Of course if you WANT automatic
>renumbering then this approach isn't for you. But with this approach manual

Right. The automatic number I simply do by replacing the initial number with '#', which is the main thing I believe you disagree with. But it's optional.
--
Alfalfa Software, Inc. | Poste: The EMail for Unix
nazgul@alfalfa.com | Send Anything... Anywhere
617/646-7703 (voice/fax) | info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept responsibility for their own actions, or that they are so eager to regulate everyone else's.
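The packing that the '-or' output relies on (set number in the upper 16 bits, message number in the lower 16) can also be written with fixed-width types. The sketch below is not part of the posted package: the `mc_*` function names are hypothetical, and it assumes set and message numbers each fit in 16 bits, as the `0x10001` and `0x10002` values above suggest. The posted macros size their shifts from `sizeof(short)` and appear to assume an `unsigned long` exactly twice as wide as a `short`; fixed-width types avoid depending on that.

```c
#include <stdint.h>

/* Pack a set number and a message number into one 32-bit identifier. */
static inline uint32_t mc_make_id(uint16_t set_id, uint16_t msg_id)
{
    return ((uint32_t)set_id << 16) | msg_id;
}

/* Recover the set number from the upper 16 bits. */
static inline uint16_t mc_set_id(uint32_t id)
{
    return (uint16_t)(id >> 16);
}

/* Recover the message number from the lower 16 bits. */
static inline uint16_t mc_msg_id(uint32_t id)
{
    return (uint16_t)(id & 0xFFFFu);
}
```

With these, `mc_make_id(1, 1)` and `mc_make_id(1, 2)` give the same values as `FooErrMsg` and `FooWarnMsg` in the '-or' output above.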
nazgul@alphalpha.com (Kee Hinckley) (04/21/91)
In article <1991Apr19.130632.17861@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
>$set 1 BASEMSGS
>1 ERRMSG "Error in application foo:"
>2 WARNMSG "Warning:"

I should note, I prefer this syntax to mine, I just wanted to do something that could be run through strictly conforming gencats as well.
--
Alfalfa Software, Inc. | Poste: The EMail for Unix
nazgul@alfalfa.com | Send Anything... Anywhere
617/646-7703 (voice/fax) | info@alfalfa.com

I'm not sure which upsets me more: that people are so unwilling to accept responsibility for their own actions, or that they are so eager to regulate everyone else's.
peter@ficc.ferranti.com (Peter da Silva) (04/23/91)
In article <1991Apr19.103905.486@dce.ie> ch@dce.ie (Charles Bryant) writes:
> a) symbolic names allow messages to get out of sync with the
> program (e.g. swap two lines in the message file, or add a new
> one in the wrong place)

Well, actually, they let newer message files get out of sync with older ones. But you need to check this anyway, to handle accidental deletions or improper changes in messages... as well as improper use of messages in the source (as, for example, the classic case where a constant with the initial value of "10" was used as a numeric base in base conversions, and when the constant (which had nothing to do with base conversion) changed everything went higgledy-piggledy).

> b) it is easy to forget or get confused over which number
> corresponds to each message

Why should you have to know?

> Why not get the benefits of both? Have the input be:
> 1 MSG_NUMTOOBIG "Number too big"

Sounds good. You have to watch out for stuff like:

15698 MSG_EMACS "Editor too big"
...
15968 MSG_SWAPPER "Out of swap space in message file"
--
Peter da Silva. `-_-' peter@ferranti.com
+1 713 274 5180. 'U` "Have you hugged your wolf today?"
eliot@chutney.rtp.dg.com (Topher Eliot) (04/26/91)
In article <1991Apr21.043742.28994@alphalpha.com>, nazgul@alphalpha.com (Kee Hinckley) writes:
|> In article <1991Apr19.130632.17861@dg-rtp.dg.com> eliot@dg-rtp.dg.com writes:
|> >Someone sent me some mail with a suggestion that I thought was good. I was
|> >waiting to see it posted, but I'll go ahead and do it. The suggestion was
|> >that the input catalog should look like:
|> >
|> >$set 1 BASEMSGS
|> >1 ERRMSG "Error in application foo:"
|> >2 WARNMSG "Warning:"
|>
|> Unless I misunderstand you, this is essentially what I do, except that I
|> tried to remain compatible with the standard. The spec says that anything
|> after "$set n" is a comment, and anything after "$ " is a comment, so
|> I just made comments that begin with "#" special.
|>
|> $set 1 #Foo
|> $ #ErrMsg
|> 1 "Error in application foo:"
|> $ #WarnMsg
|> 2 "Warning"

I have to admit jumping to an unwarranted conclusion. I'm not sure if I misread your original posting, or what, but I saw "automatic numbering" somewhere, and immediately what came to mind was a different implementation, which did not offer what you describe here (i.e. in that implementation, the original source message files could not contain both a number and a symbol). So I was really protesting that earlier design, not yours. Sorry.

|> Right. The automatic number I simply do by replacing the initial number
|> with '#', which is the main thing I believe you disagree with. But it's
|> optional.

I guess it would be fair to say you could use your design in the way that I think is good, and in a way that I think is dangerous.
--
Topher Eliot Data General DG/UX Internationalization
(919) 248-6371 62 T. W. Alexander Dr., Research Triangle Park, NC 27709
eliot@dg-rtp.dg.com {backbone}!mcnc!rti!dg-rtp!eliot
Obviously, I speak for myself, not for DG.