[comp.lang.misc] Printing plural forms.

arndt@zyx.ZYX.SE (Arndt Jonasson) (02/19/91)

We all have seen and, in various degrees, been irritated by texts such
as:

	1 files were copied

when it should have been:

	1 file was copied

I'd like to know: when programming, how do you avoid such errors? Are
there features in the programming language you use (or other languages
you know) that make plural handling especially easy or difficult? Are
there features in your native language that make plural handling
especially difficult?

All opinions and information are welcome (though I'm more interested
in how people actually produce a correct text, rather than their
excuses for not doing so). Reply to me by email, and I'll post a
summary.

[I cross-post this to comp.lang.c, because I think I will reach a
larger audience that way.]

-- 
Arndt Jonasson, ZYX AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
email address:   arndt@zyx.SE   or      <backbone>!mcsun!sunic!zyx!arndt

net@opal.cs.tu-berlin.de (Oliver Laumann) (02/19/91)

In article <1991Feb19.104810.549@ZYX.SE> arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
> We all have seen and, in various degrees, been irritated by texts such as:
> 
> 	1 files were copied
> 
> when it should have been:
> 
> 	1 file was copied
> 
> I'd like to know: when programming, how do you avoid such errors? Are
> there features in the programming language you use (or other languages
> you know) that make plural handling especially easy or difficult?

The Common Lisp function "format" has a mechanism to automatically
pluralize a word by appending an `s' when appropriate.  This is no
wonder considering that "format" even has formatting requests to
print a number with roman numerals or as english words (e.g.
"twentyfour files copied")...

In C, a common idiom (well, at least in my programs) is

   printf("%d file%s copied.\n", nfiles, "s"+(nfiles==1));
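
[The idiom relies on pointer arithmetic: "s" is the array {'s', '\0'},
so adding 1 yields a pointer to the terminating NUL, i.e. the empty
string.  A minimal sketch follows, assuming two hypothetical helpers,
plural_s() and format_copied(), that are not part of the post.  It
spells the expression as &"s"[n == 1], which is equivalent to
"s"+(n==1) but avoids the warning some compilers give for adding an
integer to a string literal.]

```c
#include <stdio.h>

/* Hypothetical helper (not from the post): returns "" when n is 1 and
 * "s" otherwise.  Index 0 of "s" is the letter, index 1 is the NUL. */
const char *plural_s(int n)
{
    return &"s"[n == 1];
}

/* Formats the message into buf instead of printing it, so that the
 * result can be inspected; returns whatever snprintf returns. */
int format_copied(char *buf, size_t size, int nfiles)
{
    return snprintf(buf, size, "%d file%s copied.", nfiles, plural_s(nfiles));
}
```

Calling format_copied(buf, sizeof buf, 1) leaves "1 file copied." in
buf; every other count, including 0, gets "files".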

--
Oliver Laumann    net@tub.cs.tu-berlin.de  net@tub.UUCP  net@pogo.ai.mit.edu

nick@cs.edinburgh.ac.uk (Nick Rothwell) (02/20/91)

I just lifted some SML code from my Make system which does this kind
of thing. PrintCount takes a word split into stem, singular and plural.
It handles nasty words like "dependencies" (what does Common Lisp come
up with: "dependencys"?)

            fun PrintCount(1, (word, single, _)) =
                   Busy.print("1 " ^ word ^ single)   |
                PrintCount(n, (word, _, plural)) =
                   Busy.print(makestring n ^ " " ^ word ^ plural)
...
            fun PrintCounts() =
               (Busy.print "Tag information: ";
                PrintCount(!DepCount, ("dependenc", "y", "ies"));
                Busy.print " found involving ";
                PrintCount(!TagCount, ("tag", "", "s"));
                Busy.print " in ";
                PrintCount(!FileCount, ("file", "", "s"));
                Busy.println ""
               )
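
[For comparison with the C postings in this thread, the same
stem/singular/plural split might look roughly like this in C.  This is
only a sketch: format_count() is a hypothetical name, and it writes
into a caller-supplied buffer rather than calling Busy.print.]

```c
#include <stdio.h>

/* Hypothetical C counterpart of PrintCount: writes "<n> <stem><suffix>"
 * into buf, using the singular suffix only when n is exactly 1. */
int format_count(char *buf, size_t size, int n,
                 const char *stem, const char *single, const char *plural)
{
    return snprintf(buf, size, "%d %s%s", n, stem, n == 1 ? single : plural);
}
```

With ("dependenc", "y", "ies") this gives "1 dependency" and
"2 dependencies"; with ("tag", "", "s") it gives "1 tag" and "3 tags".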

-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~  Captain Waldorf has analogue filters. You do not.  ~~ ~~ ~~ ~~
~~ ~~ ~~ ~~ Do not try to imitate them or any of their actions. ~~ ~~ ~~ ~~

jerry@TALOS.UUCP (Jerry Gitomer) (02/20/91)

arndt@zyx.ZYX.SE (Arndt Jonasson) writes:


:We all have seen and, in various degrees, been irritated by texts such
:as:

:	1 files were copied

:when it should have been:

:	1 file was copied

:I'd like to know: when programming, how do you avoid such errors? Are
:there features in the programming language you use (or other languages
:you know) that make plural handling especially easy or difficult? Are
:there features in your native language that make plural handling
:especially difficult?

:All opinions and information are welcome (though I'm more interested
:in how people actually produce a correct text, rather than their
:excuses for not doing so). Reply to me by email, and I'll post a
:summary.

The easiest solution is to avoid the problem through
rewording your messages.  For example:

	Number of files copied = 1


-- 
Jerry Gitomer at National Political Resources Inc, Alexandria, VA USA
I am apolitical, have no resources, and speak only for myself.
Ma Bell (703)683-9090      (UUCP:  ...{uupsi,vrdxhq}!pbs!npri6!jerry 

barmar@think.com (Barry Margolin) (02/20/91)

In article <6464@skye.cs.ed.ac.uk> nick@lfcs.ed.ac.uk writes:
>It handles nasty words like "dependencies" (what does Common Lisp come
>up with: "dependencys"?)

No, it comes up with "dependencies".  The ~P construct is replaced by a
null string or "s" depending on the value of the argument; the ~@P
construct is replaced by "y" or "ies" depending on the value.  So, one
writes:

(format t "You have ~D dependenc~:@P." n-dep)

The : modifier causes it to back up the argument list.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

nick@cs.edinburgh.ac.uk (Nick Rothwell) (02/20/91)

In article <1991Feb20.001242.9592@Think.COM>, barmar@think.com (Barry Margolin) writes:
> So, one writes:
> 
> (format t "You have ~D dependenc~:@P." n-dep)

Yucko, I think I preferred doing it my way... (he says, being otherwise
quite fond of the huge amounts of chrome adorning C's printf() lib call).

-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~  Captain Waldorf has analogue filters. You do not.  ~~ ~~ ~~ ~~
~~ ~~ ~~ ~~ Do not try to imitate them or any of their actions. ~~ ~~ ~~ ~~

tchrist@convex.COM (Tom Christiansen) (02/26/91)

From the keyboard of browns@iccgcc.decnet.ab.com (Stan Brown):
:Several persons emailed to point out one or more of the obvious errors
:in the first one.  I would correct it to
:       printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
       printf("%d file%s copied\n", nfiles, nfiles==1?" was":"s were");

Hardly close to correct; operator precedence dictates an implicit
parenthesization of:

       printf("%d file%s copied\n", nfiles, nfiles=(1?" was":"s were"));

The "or" branch of the conditional will therefore never be taken.  
You surely want `==' where you have `=' right now.

--tom
-- 
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist

rcd@ico.isc.com (Dick Dunn) (02/26/91)

browns@iccgcc.decnet.ab.com (Stan Brown) writes:
> > arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
[about annoying program output]
> >> 	1 files were copied
> >> when it should have been:
> >> 	1 file was copied

>        printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
> This assumes that "0 files were copied" is correct.  For that reason, and
> because it's less susceptible to bonehead errors like mine, I prefer the
> second form, which gives "files copied: 0", "files copied: 1",...

Your second form ("files copied: n") is better for another reason: It will
prevent much tearing of hair if anyone attempts to translate the messages
to another natural language.

The ?: singular/plural hacks calling printf are cute for programs which
don't travel much, but they don't translate.  There's generally no hope for
any sort of automated rework, either.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...But is it art?

avg@hq.demos.su (Vadim Antonov) (02/27/91)

In <1991Feb26.012135.6029@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:

>browns@iccgcc.decnet.ab.com (Stan Brown) writes:
>> > arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
>[about annoying program output]
>> >> 	1 files were copied
>> >> when it should have been:
>> >> 	1 file was copied

>>        printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");

Think about us, poor Russians, I have to write something like:

	{
		char *p = "failov skopirovano";

		if( n % 100 < 10 || n % 100 > 20 )
			switch( n % 10 ) {
			    case 1:
				p = "fail skopirovan";
				break;
			    case 2: case 3: case 4:
				p = "faila skopirovano";
				break;
			}
		printf("%d %s\n", n, p);
	}

Got the point? I've spent a hell lot of time making bilingual release
of Unix (Russian and English) and finally I think it would be better
to have a standard function for producing plural forms, something like:

	printf("%d file%s copied\n", n, plural(n, ":s", "english"));

	printf("%d fail%s skopirovan%s\n", n, plural(n, "ov::a", "russian"),
					      plural(n, "o::o", "russian"));
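
[One way such a function could work, as a sketch only.  The post does
not define the encoding of the form strings, so the reading used here
is a guess chosen to fit its examples: forms are separated by ':', in
the order "one:other" for English and "many:one:few" for Russian.
Unlike the proposed three-argument plural(), this version takes a
caller-supplied output buffer, and it assumes n is not negative.]

```c
#include <stdio.h>
#include <string.h>

/* Picks which colon-separated form to use.  The orderings are
 * assumptions: English "one:other", Russian "many:one:few". */
static int form_index(int n, const char *lang)
{
    int m100, m10;

    if (strcmp(lang, "russian") != 0)
        return n == 1 ? 0 : 1;
    m100 = n % 100;
    m10 = n % 10;
    if (m100 >= 11 && m100 <= 14) return 0;   /* 11..14, 111..114: many */
    if (m10 == 1) return 1;                   /* 1, 21, 31, ...: one    */
    if (m10 >= 2 && m10 <= 4) return 2;       /* 2..4, 22..24, ...: few */
    return 0;                                 /* everything else: many  */
}

/* Copies the selected form of `forms` into out and returns out. */
const char *plural(int n, const char *forms, const char *lang,
                   char *out, size_t size)
{
    int want = form_index(n, lang);
    const char *start = forms;
    const char *end;
    size_t len;

    if (size == 0)
        return out;
    while (want-- > 0) {
        const char *colon = strchr(start, ':');
        if (colon == NULL) { start = ""; break; }  /* missing form: empty */
        start = colon + 1;
    }
    end = strchr(start, ':');
    len = end ? (size_t)(end - start) : strlen(start);
    if (len >= size)
        len = size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

Under these assumptions plural(21, "ov::a", "russian", ...) yields the
empty string, plural(3, ...) yields "a", and plural(111, ...) yields
"ov".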

I have some ideas about how to handle plural forms and multilingual
diagnostics on binary installations but it requires a lot of changes
in compilers, languages, link editors and kernels. It's a long
story but it is possible.

Vadim Antonov
DEMOS, Moscow, USSR

rjohnson@shell.com (Roy Johnson) (02/27/91)

In article <3404.27c905aa@iccgcc.decnet.ab.com> browns@iccgcc.decnet.ab.com (Stan Brown) writes:
>> In article <1991Feb19.104810.549@ZYX.SE>, arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
>>> We all have seen and, in various degrees, been irritated by texts such
>>> as:
>>> 	1 files were copied
>>> when it should have been:
>>> 	1 file was copied
>>> 
>>> I'd like to know: when programming, how do you avoid such errors? 

>In article <3331.27c23984@iccgcc.decnet.ab.com>, 
>I stuck my coding pencil in my ear:
>>      printf("%d error%s copied\n", nfiles, nfiles?"s were":" was");
>> or
>>      printf("files copied: %d\n", nfiles);

>Several persons emailed to point out one or more of the obvious errors
>in the first one.  I would correct it to
	  printf("%d file%s copied\n", nfiles, nfiles=1?" was":"s were");
-----------------------------------------------------^
you mean ==
8^)
--
======= !{sun,psuvax1,bcm,rice,decwrl,cs.utexas.edu}!shell!rjohnson =======
Feel free to correct me, but don't preface your correction with "BZZT!"
Roy Johnson, Shell Development Company

dbrooks@osf.org (David Brooks) (02/27/91)

browns@iccgcc.decnet.ab.com (Stan Brown) writes:
|>      printf("%d error%s copied\n", nfiles, nfiles?"s were":" was");

Here you must assume your program is never going to be altered so as
to be usable by a non-English speaker.  Otherwise, you've given them
the job of not only finding all embedded strings and directly
translating them, but also recoding to allow for different grammatical
constructions.

Of course, if you prepared for internationalization you'd use other
techniques anyway.

|>      printf("files copied: %d\n", nfiles);

Another reason to prefer this.

-- 
David Brooks				dbrooks@osf.org
Systems Engineering, OSF		uunet!osf.org!dbrooks
"It's not easy, but it is simple."

daw@cbnewsh.att.com (David Wolverton) (02/27/91)

In article <3433.27ca3674@iccgcc.decnet.ab.com>, browns@iccgcc.decnet.ab.com (Stan Brown) writes:
> >> We all have seen and, in various degrees, been irritated by texts such
> >> as:
> >> 	1 files were copied
> >> when it should have been:
> >> 	1 file was copied
> >> 
> >> I'd like to know: when programming, how do you avoid such errors? 
>  ...
>  Which all goes to show: simpler is better.  The "files copied: %d" version
>  has been error free since the beginning--as far as I know.

As an addendum to this discussion, you might consider how these sorts
of problems occur when a program is internationalized (that is, converted
into another language) (that is, a human language other than English).
A truly "world class" (pun intended) programmer will consider such
issues during development.

As an example of the kind of problem that you can get into,
consider a printf() like this:
	printf("Your account balance on %s is %s\n", date, balance);

Assuming that you pull the 1st argument out into a #define in a header
or a text file somewhere, at least the following other issues come up:
        1. How to print the correct currency symbol?
        2. How to format values > 999?
        3. How to format the date?
        4. Whether, in a language other than English, would
	the message read more sensibly with the second and
	third printf() arguments reversed?	

The ANSI C standard library (in a hosted implementation)
addresses the first three points [Doug, no need to respond
on those points unless I've screwed something up], but not the last.
HP's printf() family contains some extensions for this purpose.
I'm not aware of any other vendor that has attacked this issue.

Dave Wolverton
David.Wolverton@att.com
...!att!honshu!daw

john@mingus.mitre.org (John D. Burger) (02/28/91)

daw@cbnewsh.att.com (David Wolverton) writes:

>> [discussion of pluralization deleted]

>As an addendum to this discussion, you might consider how these sorts
>of problems occur when a program is internationalized (that is, converted
>into another language) (that is, a human language other than English).
>A truly "world class" (pun intended) programmer will consider such
>issues during development.

>Assuming that you pull the 1st argument out into a #define in a header
>or a text file somewhere, at least the following other issues come up:
>        1. How to print the correct currency symbol?
>        2. How to format values > 999?
>        3. How to format the date?
>        4. Whether, in a language other than English, would
>	the message read more sensibly with the second and
>	third printf() arguments reversed?	

Acting as a Lisp gadfly, my response would be "Too bad you don't
program in a language with a real object system, in which the solution
would be to have dates, currency amounts and so forth each know how to
form their own printed representation.  One would then have subtypes
of DATE such as EUROPEAN-DATE, and subtypes of CURRENCY-AMOUNT such as
RUBLE, DEUTSCHE-MARK and DOLLAR."

Such an implementation has other wins as well.  For instance, if
currency amounts are represented as numbers, it gets very inconvenient
to represent $19.99.

Of course, one often wants more light-weight representations of such
things, in which case the appropriate thing to do is program in an
object-oriented manner even if you don't use explicit objects.
--
John Burger                                               john@mitre.org

"You ever think about .signature files? I mean, do we really need them?"
  - alt.andy.rooney

bright@nazgul.UUCP (Walter Bright) (02/28/91)

In article <1991Feb19.104810.549@ZYX.SE> arndt@zyx.ZYX.SE (Arndt Jonasson) writes:
/We all have seen and, in various degrees, been irritated by texts such as:
/	1 files were copied
/when it should have been:
/	1 file was copied
/I'd like to know: when programming, how do you avoid such errors? Are

I've never discovered any magic way, other than brute force:

	printf((n == 1) ? "%d file was copied\n" : "%d files were copied\n",n);
or:
	printf("%d file%s copied\n",n,(n == 1) ? " was" : "s were");

karl@ima.isc.com (Karl Heuer) (02/28/91)

In article <1991Feb26.224900.2283@cbnewsh.att.com> daw@cbnewsh.att.com (David Wolverton) writes:
>[The one point not addressed by the ANSI C standard library is]
>        4. Whether, in a language other than English, would
>	the message read more sensibly with the second and
>	third printf() arguments reversed?

ANSI didn't address it, but X/Open did.  ISC has this feature in the XPG3-
conforming release of Interactive Unix.  (I wrote the code for this one.)
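
[A sketch of argument reordering using the numbered "%n$" conversions
that POSIX specifies for printf().  Whether this is exactly the
feature described above is an assumption, and ISO C alone does not
guarantee it.  format_balance() and the second, reordered sentence are
made up for illustration; the latter stands in for a real translation.]

```c
#include <stdio.h>

/* Two wordings of one message.  Both consume the same arguments in the
 * same order (date, balance); only the format string decides where
 * each one appears. */
static const char *const balance_fmt[] = {
    "Your account balance on %1$s is %2$s",
    "%2$s is your account balance on %1$s"
};

/* `which` selects the wording: 0 or 1. */
int format_balance(char *buf, size_t size, int which,
                   const char *date, const char *balance)
{
    return snprintf(buf, size, balance_fmt[which], date, balance);
}
```

The calling code never changes; a translator only edits the format
string, which can live in a message file outside the program.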

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

gudeman@cs.arizona.edu (David Gudeman) (02/28/91)

In article  <1991Feb27.161925.8675@linus.mitre.org> John D. Burger writes:
]
]Acting as a Lisp gadfly, my response would be "Too bad you don't
]program in a language with a real object system, in which the solution
]would be to have dates, currency amounts and so forth each know how to
]form their own printed representation.

What does this have to do with having a "real object system"?  You can
write different printing routines for different data types in almost
any language.
--
					David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman

john@ethel.mitre.org (Ralph Marshall 617 271-8784) (03/01/91)

gudeman@cs.arizona.edu (David Gudeman) writes:
> [I wrote]
>>Acting as a Lisp gadfly, my response would be "Too bad you don't
>>program in a language with a real object system, in which the solution
>>would be to have dates, currency amounts and so forth each know how to
>>form their own printed representation.
>
>What does this have to do with having a "real object system"?  You can
>write different printing routines for different data types in almost
>any language.

Maybe I don't understand you, but how do you do this in C?
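
[One possible answer, as a sketch only: carry a function pointer with
the data, so each value "knows" how to print itself.  All names here
are hypothetical, and packing a date into a long as yyyymmdd is just a
shortcut for the example.  Money is held as a count of cents, which
also sidesteps the $19.99 problem mentioned earlier.]

```c
#include <stdio.h>

/* Hypothetical "object": a value plus the routine that renders it. */
struct printable {
    int (*format)(const struct printable *self, char *buf, size_t size);
    long value;          /* cents for money, yyyymmdd for dates */
};

/* Renders a count of cents as dollars, e.g. 1999 -> "$19.99". */
int format_dollars(const struct printable *self, char *buf, size_t size)
{
    return snprintf(buf, size, "$%ld.%02ld",
                    self->value / 100, self->value % 100);
}

/* Renders yyyymmdd in day.month.year order. */
int format_european_date(const struct printable *self, char *buf, size_t size)
{
    long y = self->value / 10000;
    long m = self->value / 100 % 100;
    long d = self->value % 100;
    return snprintf(buf, size, "%02ld.%02ld.%04ld", d, m, y);
}

/* Callers use this without knowing which kind of value they hold. */
int format_any(const struct printable *p, char *buf, size_t size)
{
    return p->format(p, buf, size);
}
```

A value declared as { format_dollars, 1999 } prints as "$19.99", and
one declared as { format_european_date, 19910301 } as "01.03.1991".
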
--
John Burger                                               john@mitre.org

"You ever think about .signature files? I mean, do we really need them?"
  - alt.andy.rooney

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/01/91)

In article <6479@skye.cs.ed.ac.uk>, nick@cs.edinburgh.ac.uk (Nick Rothwell) writes:
> In article <1991Feb20.001242.9592@Think.COM>, barmar@think.com (Barry Margolin) writes:
> > (format t "You have ~D dependenc~:@P." n-dep)

> Yucko, I think I preferred doing it my way... (he says, being otherwise
> quite fond of the huge amounts of chrome adorning C's printf() lib call).

The DEC-10 Prolog library has a command writef(Format, [X1,...,Xn]) which
has as one of its features "% <n> j" where <n> is an integer defining the
language rules to use.  Why "j"?  Well, it's the obvious choice for
Esperanto...  Why is there no Common Lisp format that optionally adds
"ob", the Quechua plural suffix?  Why is the distinction only between
singular and plural -- what happened to the dual number?  For English,
how about the words borrowed from French that add an "x"?

I think that it is unrealistic to expect a formatting sublanguage to
handle the complexities of human languages.  The best I've been able
to come up with is
	void print_variant(number, stem, singular, dual, plural)
	    int number;
	    char *stem, *singular, *dual, *plural;
	    {
		printf("%s%s", stem,
		    ( number == 1 ? singular
		    : number == 2 ? dual : plural ));
	    }
This can be packed, using TAB to separate 
	<stem> [TAB <plural> [TAB <singular> [TAB <dual>]]]
	-- default plural: empty	"sheep"
	-- default singular: empty	"cow	s"
	-- default dual: same as plural	"m	en	an"

	void print_variant(number, string)
	    int number;
	    char *string;
	    {
		char *p, *q;
	
		p = strchr(string, '\t');
		if (p == NULL) { printf("%s", string); return; }
		printf("%.*s", (int)(p-string), string);
		string = p+1;
		p = strchr(string, '\t');
		q = p == NULL ? NULL : strchr(p+1, '\t');
		/* plural p>TAB singular q>TAB dual
		or plural p>TAB singular NUL; q==NULL
		or plural NUL; p==q==NULL
		*/
		if (number == 1) {
		    if (p == NULL) return;
		    if (q == NULL) printf("%s", p+1);
		    else printf("%.*s", (int)(q-p-1), p+1);
		} else
		if (number == 2 && q != NULL) {
		    printf("%s", q+1);
		} else {
		    if (p == NULL) printf("%s", string);
		    else printf("%.*s", (int)(p-string), string);
		}
	    }

So	printf("%d", NFiles);
	print_variant(NFiles, " file\ts were\t was");
	printf(" deleted.\n");

One could perhaps add this to something like C, where %<n>j would
take n as the number, and the string from the argument list, getting
	xprintf("%d %*j deleted.\n",
		NFiles, NFiles, "file\ts were\t was");

-- 
The purpose of advertising is to destroy the freedom of the market.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/02/91)

In article <1991Feb26.161256.14202@hq.demos.su> avg@hq.demos.su (Vadim Antonov) writes:
> Got the point? I've spent a hell lot of time making bilingual release
> of Unix (Russian and English) and finally I think it would be better
> to have a standard function for producing plural forms, something like:
> 	printf("%d file%s copied\n", n, plural(n, ":s", "english"));
> 	printf("%d fail%s skopirovan%s\n", n, plural(n, "ov::a", "russian"),
> 					      plural(n, "o::o", "russian"));

In my forthcoming error-message library:

  %1 file%{%1%(e-plural)} copied%
  %1 fail%{%1%(r1-plural)} skopirovan%{%1%(r2-plural)}%

where e-plural is ``%_1%_%=%?s:'' and r{1,2}-plural are something like
``%_%d1%_%-10%_%/10%_%*%-%d1%_%=%?ov:%!%?{5%_%<%?a:}:'' Okay, okay, so
the interface still needs quite a bit of work, but this sort of
internationalization can certainly be done within configuration files.

---Dan

davis@grenelles.ilog.fr (Harley Davis) (03/05/91)

In article <1991Feb26.224900.2283@cbnewsh.att.com> daw@cbnewsh.att.com (David Wolverton) writes:

   [Messages in programs in multiple languages]

   I'm not aware of any other vendor that has attacked this issue.

Because of its international market and origins in a (ahem)
non-standard CS language, ILOG, Europe's largest Lisp vendor, has had
to address this issue in Le-Lisp.  The solution is simple, but so far
has been effective.  Instead of putting constant strings in messages,
a message object is referenced.  The message object provides
translations of the object into several languages.  At runtime,
several languages can be simultaneously loaded, and a current one
selected.  Alternatively, languages need not be loaded into particular
applications; this decision can be made at link or runtime.

eg:

(defun foobar (x)
  (assert (and (fixp x) (>= x 0))
    (error #M:argument-not-natural x))
  ...)

(defmessage :argument-not-natural
  (english "Argument not a natural")
  (french "L'argument n'est pas un entier positif")
  (german "Ich nicht haben ein idea how to sprechen this in deutsch"))

More complicated permutations can be made by writing specialized
functions which are conditionalized on the current language.
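
[For readers following the C side of this thread, the same idea can be
approximated with a table indexed by message and by language.  This is
a sketch under assumptions, not how Le-Lisp implements it: every
identifier below is made up, and only the two message texts are taken
from the example above.]

```c
#include <stdio.h>

enum language { ENGLISH, FRENCH, NUM_LANGUAGES };
enum message_id { ARGUMENT_NOT_NATURAL, NUM_MESSAGES };

/* One row per message, one column per loaded language. */
static const char *const messages[NUM_MESSAGES][NUM_LANGUAGES] = {
    /* ARGUMENT_NOT_NATURAL */
    { "Argument not a natural", "L'argument n'est pas un entier positif" }
};

static enum language current_language = ENGLISH;

/* Selects the language used by later calls to message(). */
void select_language(enum language lang)
{
    current_language = lang;
}

/* Code refers to a message by identifier, never by its text. */
const char *message(enum message_id id)
{
    return messages[id][current_language];
}
```

After select_language(FRENCH), message(ARGUMENT_NOT_NATURAL) returns
the French text; the calling code is unchanged.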

-- Harley Davis
--
------------------------------------------------------------------------------
nom: Harley Davis			ILOG S.A.
net: davis@ilog.fr			2 Avenue Gallie'ni, BP 85
tel: (33 1) 46 63 66 66			94253 Gentilly Cedex, France