arndt@zyx.ZYX.SE (Arndt Jonasson) (02/19/91)
We all have seen and, in various degrees, been irritated by texts such as:
    1 files were copied
when it should have been:
    1 file was copied
I'd like to know: when programming, how do you avoid such errors? Are
there features in the programming language you use (or other languages
you know) that make plural handling especially easy or difficult? Are
there features in your native language that make plural handling
especially difficult?
All opinions and information are welcome (though I'm more interested
in how people actually produce a correct text, rather than their
excuses for not doing so). Reply to me by email, and I'll post a
summary.
[I cross-post this to comp.lang.c, because I think I will reach a
larger audience that way.]
--
Arndt Jonasson, ZYX AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
email address: arndt@zyx.SE or <backbone>!mcsun!sunic!zyx!arndt
net@opal.cs.tu-berlin.de (Oliver Laumann) (02/19/91)
In article <1991Feb19.104810.549@ZYX.SE> arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
> We all have seen and, in various degrees, been irritated by texts such as:
>
>     1 files were copied
>
> when it should have been:
>
>     1 file was copied
>
> I'd like to know: when programming, how do you avoid such errors? Are
> there features in the programming language you use (or other languages
> you know) that make plural handling especially easy or difficult?
The Common Lisp function "format" has a mechanism to automatically
pluralize a word by appending an `s' when appropriate. This is no
surprise, considering that "format" even has formatting requests to print
a number in Roman numerals or as English words (e.g. "twenty-four files
copied")...
In C, a common idiom (well, at least in my programs) is
    printf("%d file%s copied.\n", nfiles, "s"+(nfiles==1));
--
Oliver Laumann    net@tub.cs.tu-berlin.de    net@tub.UUCP    net@pogo.ai.mit.edu
nick@cs.edinburgh.ac.uk (Nick Rothwell) (02/20/91)
I just lifted some SML code from my Make system which does this kind
of thing. PrintCount takes a word split into stem, singular and plural.
It handles nasty words like "dependencies" (what does Common Lisp come
up with: "dependencys"?)
    fun PrintCount(1, (word, single, _)) =
          Busy.print("1 " ^ word ^ single)
      | PrintCount(n, (word, _, plural)) =
          Busy.print(makestring n ^ " " ^ word ^ plural)
    ...
    fun PrintCounts() =
          (Busy.print "Tag information: ";
           PrintCount(!DepCount, ("dependenc", "y", "ies"));
           Busy.print " found involving ";
           PrintCount(!TagCount, ("tag", "", "s"));
           Busy.print " in ";
           PrintCount(!FileCount, ("file", "", "s"));
           Busy.println ""
          )
--
Nick Rothwell, Laboratory for Foundations of Computer Science, Edinburgh.
               nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~ Captain Waldorf has analogue filters. You do not. ~~ ~~ ~~ ~~
~~ ~~ ~~ ~~ Do not try to imitate them or any of their actions. ~~ ~~ ~~ ~~
jerry@TALOS.UUCP (Jerry Gitomer) (02/20/91)
arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
:We all have seen and, in various degrees, been irritated by texts such
:as:
: 1 files were copied
:when it should have been:
: 1 file was copied
:I'd like to know: when programming, how do you avoid such errors? Are
:there features in the programming language you use (or other languages
:you know) that make plural handling especially easy or difficult? Are
:there features in your native language that make plural handling
:especially difficult?
:All opinions and information are welcome (though I'm more interested
:in how people actually produce a correct text, rather than their
:excuses for not doing so). Reply to me by email, and I'll post a
:summary.
The easiest solution is to avoid the problem through
rewording your messages. For example:
Number of files copied = 1
--
Jerry Gitomer at National Political Resources Inc, Alexandria, VA USA
I am apolitical, have no resources, and speak only for myself.
Ma Bell (703)683-9090 (UUCP: ...{uupsi,vrdxhq}!pbs!npri6!jerry
barmar@think.com (Barry Margolin) (02/20/91)
In article <6464@skye.cs.ed.ac.uk> nick@lfcs.ed.ac.uk writes:
>It handles nasty words like "dependencies" (what does Common Lisp come
>up with: "dependencys"?)
No, it comes up with "dependencies". The ~P construct is replaced by a
null string or "s" depending on the value of the argument; the ~@P
construct is replaced by "y" or "ies" depending on the value. So, one
writes:
    (format t "You have ~D dependenc~:@P." n-dep)
The : modifier causes it to back up the argument list, so the count
consumed by ~D is reused.
--
Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
nick@cs.edinburgh.ac.uk (Nick Rothwell) (02/20/91)
In article <1991Feb20.001242.9592@Think.COM>, barmar@think.com (Barry Margolin) writes:
> So, one writes:
>
>     (format t "You have ~D dependenc~:@P." n-dep)
Yucko, I think I preferred doing it my way... (he says, being otherwise
quite fond of the huge amounts of chrome adorning C's printf() lib call).
--
Nick Rothwell, Laboratory for Foundations of Computer Science, Edinburgh.
               nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~ Captain Waldorf has analogue filters. You do not. ~~ ~~ ~~ ~~
~~ ~~ ~~ ~~ Do not try to imitate them or any of their actions. ~~ ~~ ~~ ~~
tchrist@convex.COM (Tom Christiansen) (02/26/91)
From the keyboard of browns@iccgcc.decnet.ab.com (Stan Brown):
:Several persons emailed to point out one or more of the obvious errors
:in the first one. I would correct it to
:    printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
    printf("%d file%s copied\n", nfiles, nfiles==1?" was":"s were");
Hardly close to correct; operator precedence dictates an implicit
parenthesization of:
    printf("%d file%s copied\n", nfiles, nfiles=(1?" was":"s were"));
The "or" branch of the conditional will therefore never be taken.
You surely want `==' where you have `=' right now.
--tom
--
"UNIX was not designed to stop you from doing stupid things, because
that would also stop you from doing clever things." -- Doug Gwyn
Tom Christiansen    tchrist@convex.com    convex!tchrist
rcd@ico.isc.com (Dick Dunn) (02/26/91)
browns@iccgcc.decnet.ab.com (Stan Brown) writes:
> > arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
[about annoying program output]
> >> 1 files were copied
> >> when it should have been:
> >> 1 file was copied
> printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
> This assumes that "0 files were copied" is correct. For that reason, and
> because it's less susceptible to bonehead errors like mine, I prefer the
> second form, which gives "files copied: 0", "files copied: 1",...
Your second form ("files copied: n") is better for another reason: It
will prevent much tearing of hair if anyone attempts to translate the
messages to another natural language. The ?: singular/plural hacks
calling printf are cute for programs which don't travel much, but they
don't translate. There's generally no hope for any sort of automated
rework, either.
--
Dick Dunn    rcd@ico.isc.com -or- ico!rcd    Boulder, CO    (303)449-2870
   ...But is it art?
avg@hq.demos.su (Vadim Antonov) (02/27/91)
In <1991Feb26.012135.6029@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>browns@iccgcc.decnet.ab.com (Stan Brown) writes:
>> > arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
>[about annoying program output]
>> >> 1 files were copied
>> >> when it should have been:
>> >> 1 file was copied
>> printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
Think about us poor Russians: I have to write something like:
    {
        char *p = "failov skopirovano";
        if( n % 100 < 10 || n % 100 > 20 )
            switch( n % 10 ) {
            case 1:
                p = "fail skopirovan";
                break;
            case 2:
            case 3:
            case 4:
                p = "faila skopirovano";
                break;
            }
        printf("%d %s\n", n, p);
    }
Got the point? I've spent a hell of a lot of time making a bilingual
release of Unix (Russian and English), and finally I think it would be
better to have a standard function for producing plural forms,
something like:
    printf("%d file%s copied\n", n, plural(n, ":s", "english"));
    printf("%d fail%s skopirovan%s\n", n, plural(n, "ov::a", "russian"),
           plural(n, "o::o", "russian"));
I have some ideas about how to handle plural forms and multilingual
diagnostics on binary installations, but it requires a lot of changes
in compilers, languages, link editors and kernels. It's a long story,
but it is possible.
Vadim Antonov
DEMOS, Moscow, USSR
rjohnson@shell.com (Roy Johnson) (02/27/91)
In article <3404.27c905aa@iccgcc.decnet.ab.com> browns@iccgcc.decnet.ab.com (Stan Brown) writes:
>> In article <1991Feb19.104810.549@ZYX.SE>, arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
>>> We all have seen and, in various degrees, been irritated by texts such
>>> as:
>>>     1 files were copied
>>> when it should have been:
>>>     1 file was copied
>>>
>>> I'd like to know: when programming, how do you avoid such errors?
>In article <3331.27c23984@iccgcc.decnet.ab.com>,
>I stuck my coding pencil in my ear:
>>     printf("%d error%s copied\n", nfiles, nfiles?"s were":" was");
>> or
>>     printf("files copied: %d\n", nfiles);
>Several persons emailed to point out one or more of the obvious errors
>in the first one. I would correct it to
printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
-------------------------------------------^
you mean == 8^)
--
======= !{sun,psuvax1,bcm,rice,decwrl,cs.utexas.edu}!shell!rjohnson =======
Feel free to correct me, but don't preface your correction with "BZZT!"
Roy Johnson, Shell Development Company
dbrooks@osf.org (David Brooks) (02/27/91)
browns@iccgcc.decnet.ab.com (Stan Brown) writes:
|> printf("%d error%s copied\n", nfiles, nfiles?"s were":" was");
Here you must assume your program is never going to be altered so as to
be usable by a non-English speaker. Otherwise, you've given them the job
of not only finding all embedded strings and directly translating them,
but also recoding to allow for different grammatical constructions.
Of course, if you prepared for internationalization you'd use other
techniques anyway.
|> printf("files copied: %d\n", nfiles);
Another reason to prefer this.
--
David Brooks                    dbrooks@osf.org
Systems Engineering, OSF        uunet!osf.org!dbrooks
"It's not easy, but it is simple."
daw@cbnewsh.att.com (David Wolverton) (02/27/91)
In article <3433.27ca3674@iccgcc.decnet.ab.com>, browns@iccgcc.decnet.ab.com (Stan Brown) writes:
> >> We all have seen and, in various degrees, been irritated by texts such
> >> as:
> >> 1 files were copied
> >> when it should have been:
> >> 1 file was copied
> >>
> >> I'd like to know: when programming, how do you avoid such errors?
> ...
> Which all goes to show: simpler is better. The "files copied: %d" version
> has been error free since the beginning--as far as I know.
As an addendum to this discussion, you might consider how these sorts
of problems occur when a program is internationalized (that is, converted
into another language) (that is, a human language other than English).
A truly "world class" (pun intended) programmer will consider such
issues during development.
As an example of the kind of problem that you can get into, consider
a printf() like this:
    printf("Your account balance on %s is %s\n", date, balance);
Assuming that you pull the 1st argument out into a #define in a header
or a text file somewhere, at least the following other issues come up:
    1. How to print the correct currency symbol?
    2. How to format values > 999?
    3. How to format the date?
    4. Whether, in a language other than English, would
       the message read more sensibly with the second and
       third printf() arguments reversed?
The ANSI C standard library (in a hosted implementation) addresses the
first three points [Doug, no need to respond on those points unless I've
screwed something up], but not the last. HP's printf() family contains
some extensions for this purpose. I'm not aware of any other vendor that
has attacked this issue.
Dave Wolverton
David.Wolverton@att.com
...!att!honshu!daw
john@mingus.mitre.org (John D. Burger) (02/28/91)
daw@cbnewsh.att.com (David Wolverton) writes:
>> [discussion of pluralization deleted]
>As an addendum to this discussion, you might consider how these sorts
>of problems occur when a program is internationalized (that is, converted
>into another language) (that is, a human language other than English).
>A truly "world class" (pun intended) programmer will consider such
>issues during development.
>Assuming that you pull the 1st argument out into a #define in a header
>or a text file somewhere, at least the following other issues come up:
>    1. How to print the correct currency symbol?
>    2. How to format values > 999?
>    3. How to format the date?
>    4. Whether, in a language other than English, would
>       the message read more sensibly with the second and
>       third printf() arguments reversed?
Acting as a Lisp gadfly, my response would be "Too bad you don't
program in a language with a real object system, in which the solution
would be to have dates, currency amounts and so forth each know how to
form their own printed representation. One would then have subtypes
of DATE such as EUROPEAN-DATE, and subtypes of CURRENCY-AMOUNT such as
RUBLE, DEUTSCH-MARK and DOLLAR."
Such an implementation has other wins as well. For instance, if
currency amounts are represented as numbers, it gets very inconvenient
to represent $19.99.
Of course, one often wants more light-weight representations of such
things, in which case the appropriate thing to do is program in an
object-oriented manner even if you don't use explicit objects.
--
John Burger                                               john@mitre.org
"You ever think about .signature files? I mean, do we really need them?"
  - alt.andy.rooney
bright@nazgul.UUCP (Walter Bright) (02/28/91)
In article <1991Feb19.104810.549@ZYX.SE> arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
/We all have seen and, in various degrees, been irritated by texts such as:
/ 1 files were copied
/when it should have been:
/ 1 file was copied
/I'd like to know: when programming, how do you avoid such errors? Are
I've never discovered any magic way, other than brute force:
printf((n == 1) ? "%d file was copied\n" : "%d files were copied\n",n);
or:
printf("%d file%s copied\n",n,(n == 1) ? " was" : "s were");
karl@ima.isc.com (Karl Heuer) (02/28/91)
In article <1991Feb26.224900.2283@cbnewsh.att.com> daw@cbnewsh.att.com (David Wolverton) writes:
>[The one point not addressed by the ANSI C standard library is]
>    4. Whether, in a language other than English, would
>       the message read more sensibly with the second and
>       third printf() arguments reversed?
ANSI didn't address it, but X/Open did. ISC has this feature in the
XPG3-conforming release of Interactive Unix. (I wrote the code for this one.)
Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
gudeman@cs.arizona.edu (David Gudeman) (02/28/91)
In article <1991Feb27.161925.8675@linus.mitre.org> John D. Burger writes:
]
]Acting as a Lisp gadfly, my response would be "Too bad you don't
]program in a language with a real object system, in which the solution
]would be to have dates, currency amounts and so forth each know how to
]form their own printed representation.
What does this have to do with having a "real object system"? You can
write different printing routines for different data types in almost
any language.
--
David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman
john@ethel.mitre.org (Ralph Marshall 617 271-8784) (03/01/91)
gudeman@cs.arizona.edu (David Gudeman) writes:
> [I wrote]
>>Acting as a Lisp gadfly, my response would be "Too bad you don't
>>program in a language with a real object system, in which the solution
>>would be to have dates, currency amounts and so forth each know how to
>>form their own printed representation.
>
>What does this have to do with having a "real object system"? You can
>write different printing routines for different data types in almost
>any language.
Maybe I don't understand you, but how do you do this in C?
--
John Burger                                               john@mitre.org
"You ever think about .signature files? I mean, do we really need them?"
  - alt.andy.rooney
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/01/91)
In article <6479@skye.cs.ed.ac.uk>, nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
> In article <1991Feb20.001242.9592@Think.COM>, barmar@think.com (Barry Margolin) writes:
> >     (format t "You have ~D dependenc~:@P." n-dep)
> Yucko, I think I preferred doing it my way... (he says, being otherwise
> quite fond of the huge amounts of chrome adorning C's printf() lib call).
The DEC-10 Prolog library has a command writef(Format, [X1,...,Xn])
which has as one of its features "%<n>j" where <n> is an integer
defining the language rules to use. Why "j"? Well, it's the obvious
choice for Esperanto...
Why is there no Common Lisp format that optionally adds "ob", the
Quechua plural suffix? Why is the distinction only between singular
and plural -- what happened to the dual number? For English, how about
the words borrowed from French that add an "x"?
I think that it is unrealistic to expect a formatting sublanguage to
handle the complexities of human languages. The best I've been able
to come up with is
    #include <stdio.h>
    #include <string.h>

    void print_variant(int number, const char *stem, const char *singular,
                       const char *dual, const char *plural)
    {
        printf("%s%s", stem,
               number == 1 ? singular : number == 2 ? dual : plural);
    }
This can be packed, using TAB to separate
    <stem> [TAB <plural> [TAB <singular> [TAB <dual>]]]
    -- default plural: empty             "sheep"
    -- default singular: empty           "cow\ts"
    -- default dual: same as plural      "m\ten\tan"

    void print_variant(int number, const char *string)
    {
        const char *p, *q;

        p = strchr(string, '\t');
        if (p == NULL) {                  /* stem only */
            printf("%s", string);
            return;
        }
        printf("%.*s", (int)(p - string), string);   /* the stem */
        string = p + 1;
        p = strchr(string, '\t');
        q = p == NULL ? NULL : strchr(p + 1, '\t');
        /*    plural p>TAB singular q>TAB dual
           or plural p>TAB singular NUL; q==NULL
           or plural NUL; p==q==NULL
        */
        if (number == 1) {
            if (p == NULL) return;
            if (q == NULL) printf("%s", p + 1);
            else printf("%.*s", (int)(q - p - 1), p + 1);
        } else if (number == 2 && q != NULL) {
            printf("%s", q + 1);
        } else {
            if (p == NULL) printf("%s", string);
            else printf("%.*s", (int)(p - string), string);
        }
    }
So
    printf("%d", NFiles);
    print_variant(NFiles, " file\ts were\t was");
    printf(" deleted.\n");
One could perhaps add this to something like C, where %<n>j would
take n as the number, and the string from the argument list, getting
    xprintf("%d %*j deleted.\n", NFiles, NFiles, "file\ts were\t was");
--
The purpose of advertising is to destroy the freedom of the market.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/02/91)
In article <1991Feb26.161256.14202@hq.demos.su> avg@hq.demos.su (Vadim Antonov) writes:
> Got the point? I've spent a hell lot of time making bilingual release
> of Unix (Russian and English) and finally I think it would be better
> to have a standard function for producing plural forms, something like:
> printf("%d file%s copied\n", n, plural(n, ":s", "english"));
> printf("%d fail%s skopirovan%s\n", n, plural(n, "ov::a", "russian"),
> plural(n, "o::o", "russian"));
In my forthcoming error-message library:
    %1 file%{%1%(e-plural)} copied%
    %1 fail%{%1%(r1-plural)} skopirovan%{%1%(r2-plural)}%
where e-plural is ``%_1%_%=%?s:'' and r{1,2}-plural are something like
``%_%d1%_%-10%_%/10%_%*%-%d1%_%=%?ov:%!%?{5%_%<%?a:}:''
Okay, okay, so the interface still needs quite a bit of work, but this
sort of internationalization can certainly be done within configuration
files.
---Dan
davis@grenelles.ilog.fr (Harley Davis) (03/05/91)
In article <1991Feb26.224900.2283@cbnewsh.att.com> daw@cbnewsh.att.com (David Wolverton) writes:
[Messages in programs in multiple languages]
> I'm not aware of any other vendor that has attacked this issue.
Because of its international market and origins in a (ahem)
non-standard CS language, ILOG, Europe's largest Lisp vendor, has had
to address this issue in Le-Lisp. The solution is simple, but so far
has been effective. Instead of putting constant strings in messages,
a message object is referenced. The message object provides
translations of the message into several languages. At runtime,
several languages can be simultaneously loaded, and a current one
selected. Alternatively, languages need not be loaded into particular
applications; this decision can be made at link or runtime.
eg:
(defun foobar (x)
(assert (and (fixp x) (>= x 0))
(error #M:argument-not-natural x))
...)
(defmessage :argument-not-natural
(english "Argument not a natural")
(french "L'argument n'est pas un entier positif")
(german "Ich nicht haben ein idea how to sprechen this in deutsch"))
More complicated permutations can be made by writing specialized
functions which are conditionalized on the current language.
-- Harley Davis
--
------------------------------------------------------------------------------
nom: Harley Davis ILOG S.A.
net: davis@ilog.fr 2 Avenue Galliéni, BP 85
tel: (33 1) 46 63 66 66 94253 Gentilly Cedex, France