[comp.protocols.iso.x400] Dutch names in X.400 and/or RFC 1148

Erik.Huizer%SURFNET.NL@cunyvm.cuny.EDU (06/07/90)

One of the main problems I am facing, now that I'm introducing 'real' users
to X.400 in SURFnet, is a peculiarity in Dutch surnames that I do not know
how to handle in X.400 and/or RFC 1148 (987).

As the problem is easily recognisable for Dutch people, who are used to
this peculiarity, I would like some advice from 'foreign' experts on how to
handle this.

A surprisingly large part of the Dutch population has a surname consisting
of one or two adjectives and the real surname, e.g.:

Jan van der Steen   (givenName=Jan, surname=Steen, adjectives=van der)
Jan de Vries
Jan van Keulen

Names like these can be found in directories like the PTT telephone white
pages directory under the first letter of the surname. So you'd find Jan
van der Steen under the S.

If Jan van der Steen registers for a conference abroad, he has a random
chance to be found in the list of participants either under the S or under
the V or even maybe under the D.

My problem is of course where do I put the adjectives in an X.400
OR-address? In the Surname or in the GivenName?
My personal view would be to put the adjectives in the Surname.

Then how does an RFC987/1148 gateway handle this? What comes out will be
something like (RFC1148 par 4.2.1 and 3.4):
GivenName = Jan
Surname = van der Steen

mapping into: Jan.van der Steen@domain (RFC822)
Which is of course an unusable RFC-822 address.

The nicest RFC-822 address would of course be:
Jan.van.der.Steen@domain, but the algorithm proposed in RFC1148 (par.
4.3.4.) does not handle this.

Of course I made lots of mistakes in interpreting the standards, so please
correct me. Any suggestions, opinions etc. on this subject are welcome,
except of course the suggestion that all the Dutch inhabitants should
Americanise (or Belgiumanise) their names to something like: Vandersteen


                           _   _   _   _
                          |S| |U| |R| |F|
___________________________|___|___|___|_________________________
                             |   |   |
                            (n) (e) (t)
Erik Huizer                                    tel: +31 30 310290
Network development                            fax: +31 30 340903
SURFnet b.v.                            E-mail: Huizer@SURFnet.nl
P.O.box 19035
3501 DA Utrecht
The Netherlands
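
The telephone-directory ordering described in the post above (Jan van der
Steen filed under the S) can be sketched as a sort key. This is only an
illustration: the prefix list and the function names are assumptions made
for the example, the list is far from complete, and nothing here comes from
X.400 or RFC 1148.

```python
# Hypothetical, incomplete list of Dutch surname prefixes ("adjectives").
PREFIXES = {"van", "de", "der", "den", "het", "ter", "ten", "te"}


def split_surname(surname):
    """Split 'van der Steen' into ('van der', 'Steen')."""
    words = surname.split()
    i = 0
    # Always keep at least the last word as the main surname.
    while i < len(words) - 1 and words[i].lower() in PREFIXES:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])


def directory_key(given, surname):
    """Sort on the main surname first, then the prefixes, then the given name."""
    prefix, main = split_surname(surname)
    return (main.lower(), prefix.lower(), given.lower())


people = [("Jan", "van der Steen"), ("Jan", "de Vries"), ("Jan", "van Keulen")]
for given, surname in sorted(people, key=lambda p: directory_key(*p)):
    print(surname + ", " + given)
```

The full surname, prefixes included, is still what gets stored; only the
ordering ignores the prefixes.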

grimm@darmstadt.gmd.dbp.de (Ruediger Grimm) (06/07/90)

As far as I understand the state of the art:

Jan van der Steen   ->    X.400: S=van der Steen;G=Jan;OU=...;C=nl
		   <->   RFC822: Jan.van(b)der(b)Steen@somewhere.in.nl

Not nice. But Ludwig van Beethoven wouldn't look much better.
Greetings --- Ruediger

mark@cbmark.cbcc.att.COM (Mark Horton) (06/08/90)

The problem is not unique to Dutch names - O'Brien causes similar
problems.

We have a similar problem in AT&T, where everyone is in a name
database.  It seems to be handled by ignoring the blanks.  If I
ask for people named "de vries", I get 6 people, 3 as "de vries"
and 3 different people as "devries".  If I ask for "devries" I get
the same 6 people.

If I ask for "obrien" I get many "obrien", many "o brien" and many
"o'brien".  So it's ignoring apostrophes too.

I propose that you map Jan van der Steen into S="van der Steen"
which in 822 would become Jan.vanderSteen@domain , and have all
your lookup algorithms squeeze out blanks, apostrophes, and other
characters that cause trouble, such as hyphens.

	Mark
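
A minimal sketch of the loose lookup proposed above, assuming the rule is
simply "compare names after dropping everything that is not a letter or a
digit". The function names and the sample directory are made up for the
example; this is not how the AT&T database is known to work.

```python
def loose_key(name):
    """Lower-case the name and squeeze out blanks, apostrophes, hyphens, etc."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def lookup(query, directory):
    """Return every stored name whose loose key matches the query's."""
    wanted = loose_key(query)
    return [entry for entry in directory if loose_key(entry) == wanted]


# Hypothetical directory contents.
directory = ["de Vries", "DeVries", "O'Brien", "O Brien", "OBrien",
             "van der Steen"]

print(lookup("devries", directory))
print(lookup("o'brien", directory))
```

Such a key is only suitable for finding candidates. It deliberately treats
"van der Steen" and "vander Steen" as the same name, so it cannot serve as
the stored form of an address.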

pv@Eng.Sun.COM (Peter Vanderbilt) (06/08/90)

>  As far as I understand the state of the art:
>
>  Jan van der Steen   ->    X.400: S=van der Steen;G=Jan;OU=...;C=nl
>  		   <->   RFC822: Jan.van(b)der(b)Steen@somewhere.in.nl

Actually, I think the 822 form should be (according to RFC 987/1148)

     Jan.van_der_Steen@somewhere.in.nl

Pete

jaap@uunet.uu.NET (Jaap Akkerhuis) (06/08/90)

In article <9006071947.AA05364@cbmark.cbcc.att.com> mark@cbmark.cbcc.att.COM (Mark Horton) writes:
 > The problem is not unique to Dutch names - O'Brien causes similar
 > problems.
 >
 > We have a similar problem in AT&T, where everyone is in a name
 > database.  It seems to be handled by ignoring the blanks.  If I
 > ask for people named "de vries", I get 6 people, 3 as "de vries"
 > and 3 different people as "devries".  If I ask for "devries" I get
 > the same 6 people.

So it is time for AT&T to fix their database.

Note that the original requester was dealing with ``real world
problems''. For Dutch people, the name Jan van der Steen denotes
a different person than Jan vander Steen or Jan van Dersteen, etc.

 >
 > If I ask for "obrien" I get many "obrien", many "o brien" and many
 > "o'brien".  So it's ignoring apostrophes too.

But Mr. O'Brien might not!

 >
 > I propose that you map Jan van der Steen into S="van der Steen"
 > which in 822 would become Jan.vanderSteen@domain , and have all
 > your lookup algorithms squeeze out blanks, apostrophes, and other
 > characters that cause trouble, such as hyphens.
 >

People dealing with computers on a regular basis might find this
acceptable, but it isn't for ordinary people. Should we also drop
diacritical marks?  That might be an insult. The list of problems continues....

	jaap

piet@cwi.nl (Piet Beertema) (06/08/90)

	As the problem is easily recognisable for Dutch people, being used
	to this peculiarity
Americans should know about it too, since many Dutch names
are known there. Usually they are "americanized" though;
a famous example these days is vanGogh or VanGogh. I don't
presume this contraction was done with X.400 in mind... :-)

	My problem is of course where do I put the adjectives in an X.400
	OR-address? In the Surname or in the GivenName?
	My personal view would be to put the adjectives in the Surname.
Logically speaking that's the place where they belong:
adjectives definitely are part of a person's surname.

	Then how does a RFC987/1148 gateway handle this? What comes out will
	be something like (RFC1148 par 4.2.1 and 3.4):
	GivenName = Jan
	Surname = van der Steen
	mapping into: Jan.van der Steen@domain (RFC822)
	Which is of course an unusable RFC-822 address.
	The nicest RFC-822 address would of course be:
	Jan.van.der.Steen@domain
Not only "nice", but correct too for most mailers nowadays.

	Any suggestions, opinions etc. on this subject are welcome, except of
	course the suggestion that all the Dutch inhabitants should Americanise
	(or Belgiumanise) their names to something like: Vandersteen
An alternative might be to Frenchise them: van-der-Steen
(or van_der_Steen for computer addicts...). ;-)


	Piet

Christian.Huitema@mirsa.inria.fr (Christian Huitema) (06/08/90)

>>  As far as I understand the state of the art:
>>
>>  Jan van der Steen   ->    X.400: S=van der Steen;G=Jan;OU=...;C=nl
>>  		   <->   RFC822: Jan.van(b)der(b)Steen@somewhere.in.nl
>
>Actually, I think the 822 form should be (according to RFC 987/1148)
>
>     Jan.van_der_Steen@somewhere.in.nl
>
>Pete
Pete,

I do agree with you, and this is in fact what we have implemented in
our X.400 gateway (M.PLUS). Blanks are useless in RFC-822 addresses,
and the translation from "S=van der Steen" to "van_der_Steen", and
back, is very natural.

However, you are not quite correct when you state that
this transform is ``according to RFC 987/1148''; due to some bizarre
influences (English influences, indeed), RFC-1148 states that
"van_der_Steen@somewhere.in.nl" should be converted back to
"S=van(u)der(u)Steen". The mapping between space and underline,
which was defined in the appendix A of RFC-987 has been removed from
RFC-1148...

If you were to completely follow RFC-1148 on
that point (which I don't recommend), the mapping of the X.400
address above would become:
	<"/S=van der Steen/G=Jan"@somewhere.in.nl>
Delightful, isn't it?

Christian Huitema
PS.
Note that <Jan.van(b)der(b)Steen@somewhere.in.nl> does not make any
sense in RFC-822. Anything between parentheses is a comment, and is
stripped off by the RFC-822 parser; (b) is the escape for a bang (!)
within an X.400 Printable string.
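
The escape convention described in this message can be sketched as below.
Only the two escapes named in the thread are included, (u) for underscore
and (b) for bang; the real table in RFC 1148 is longer, and the function
names are invented for the example.

```python
# Partial table: only the escapes mentioned in this thread.
ESCAPES = {"_": "(u)", "!": "(b)"}


def rfc822_to_printable(local_part):
    """Escape characters that an X.400 PrintableString cannot carry."""
    return "".join(ESCAPES.get(ch, ch) for ch in local_part)


def printable_to_rfc822(value):
    """Undo the escapes when going back to RFC 822."""
    for char, escape in ESCAPES.items():
        value = value.replace(escape, char)
    return value


print(rfc822_to_printable("van_der_Steen"))
print(printable_to_rfc822("van(u)der(u)Steen"))
```

Under this rule an underscore stays an underscore across the gateway, which
is why "van_der_Steen" no longer comes back as "van der Steen".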

zben@umd5.umd.EDU (Ben Cranston) (06/09/90)

Please excuse a comment from one who is NOT terribly educated in the X.400
way of doing things.  In the Real World telephone companies arrange to have
all these names sort together so a human being can see them all in one
context and have the maximum information available when she makes her choice.
Which exact place is really a secondary question, although there will usually
be good human-interface grounds for making this decision.

If this means "van der" gets ignored in the sorting this is OK.  As stated
by others, the apostrophe in o'brien gets ignored, as does the space in
"de vries" (of course this is going to be language and culture dependent).
Perhaps the Scottish and Irish have to deal with Mc/Mac/Mac<space> as well.

So, we have to figure out how to get all the "equivalent" names onto a screen
for the sender to choose which one she really wants.  This requires a very
"loose" match procedure.  When the choice is made (or in return addresses)
we need a UNIQUE address, which implies a very "tight" matching procedure.

I suggest the appropriateness of X400 be judged on its ability to support
this kind of real-world problem solving rather than try to arbitrarily get
people to change their names to fit an insufficiently general preconception.

Sorry if this is belaboring the obvious...


--

"It's all about Power, it's all about Control
 All the rest is lies for the credulous"
-- Man-in-the-street interview in Romania one week after Ceausescu execution.

houttuin@tis.llnl.GOV (06/09/90)

What is the problem with mapping this kind of address into
e.g.

   jan."van der steen"@somewhere.in.nl

It is a completely valid RFC822 local-part constructed of
2 words, one of which is a quoted string.
Please take care with mapping underscores to spaces and v.v.,
it has been causing us lots of problems lately. People also
use underscores in local-parts. They shouldn't necessarily
be mapped into spaces!

___________________________
 Jeroen Houttuin
 +41 1 2565837
 houttuin@ks.id.ethz.ch
---------------------------
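
A sketch of how a gateway might build such a local-part: each word is left
alone if it is a plain RFC 822 atom and wrapped in a quoted string
otherwise. The function names are assumptions for the example, and the
atom test is a simplified reading of the RFC 822 grammar (printable ASCII,
no space, no specials).

```python
# RFC 822 "specials": characters that may not appear in an unquoted atom.
SPECIALS = set('()<>@,;:\\".[]')


def is_atom(word):
    """True if the word can appear unquoted in a local-part."""
    return bool(word) and all(
        32 < ord(ch) < 127 and ch not in SPECIALS for ch in word
    )


def quote_word(word):
    """Return the word as an atom, or as a quoted string if it must be."""
    if is_atom(word):
        return word
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def local_part(given, surname):
    """Join the two name components as dot-separated RFC 822 words."""
    return ".".join(quote_word(word) for word in (given, surname))


print(local_part("jan", "van der steen") + "@somewhere.in.nl")
```

An underscore is an ordinary atom character here, so a surname such as
"van_der_steen" passes through unquoted and unchanged.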

S.Kille@cs.ucl.ac.UK (Steve Kille) (06/09/90)

 >From:  Ruediger Grimm <grimm@de.dbp.gmd.darmstadt>
 >To:    mhsnews@ch.switch.edu.uci.ics,
	 rare-wg1@ch.switch
 >Subject: Re: Dutch names in X.400 and/or RFC 1148
 >Date:  Fri, 8 Jun 90 12:07:43 +0000


 >What happened to (b) (a) (d) (instead of blank, @, $) ?
 >---Ruediger

 These mappings are not relevant here.   (b) maps to "!" (bang) and
 (d) is not valid in RFC 1148.


 Steve

vcerf@NRI.Reston.VA.US (06/10/90)

If embedded blanks are converted to underscores on entry
into the RFC822 Internet and converted back to embedded
blanks on return into the X.400 world, what is done with
addresses in the X.400 environment which contain
underscores? We ran into the problem recently when we
found some addresses in the MCI Mail system which require
underscores. We had been converting embedded blanks into
underscores (since MCI Mail formal names have embedded
blanks). We automatically converted underscore to blank
on entry into MCI Mail from the RFC822 domain. This broke
when we realized we had to retain some underscored addresses
because that is how they were supposed to look in the
MCI Mail environment.

We found that attempting to use "\" as the quote character
did not work uniformly in the Internet (some mailers clean
up the address strings by removing the "\" characters).

Has the need for preserving underscored addresses while
in the X.400 environment arisen? If so, what has been the
preferred solution?

thanks,

Vint Cerf
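
The failure described above follows from the space/underscore mapping not
being one-to-one. A minimal sketch, with invented function names and a
hypothetical mailbox name "sales_dept":

```python
def to_rfc822(name):
    """Blank -> underscore on entry into the RFC 822 world."""
    return name.replace(" ", "_")


def from_rfc822(name):
    """Underscore -> blank on the way back."""
    return name.replace("_", " ")


for original in ["van der Steen", "sales_dept"]:
    back = from_rfc822(to_rfc822(original))
    status = "ok" if back == original else "CHANGED"
    print(repr(original), "->", repr(back), status)

# Two different names collapse onto one RFC 822 form.
print(to_rfc822("sales dept") == to_rfc822("sales_dept"))
```

Any name that legitimately contains an underscore is altered on the return
trip, and the gateway has no way to tell the two cases apart.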

Christian.Huitema@mirsa.inria.fr (Christian Huitema) (06/11/90)

>If embedded blanks are converted to underscores on entry
>into the RFC822 Internet and converted back to embedded
>blanks on return into the X.400 world, what is done with
>addresses in the X.400 environment which contain
>underscores?

1) Underscore is not a valid ``Printable string'' character.
As far as X.400(84) is concerned, all OR-naming attributes, including
DD attributes, have either a NumericString or PrintableString
syntax. Underscore cannot be legally present in this case.

2) If you plan to use X.400(88), the underscore can
only ``validly'' be found when the syntax of the X.400 attribute is
either ``IA5'' (0x5F) or ``T61''. This could be the case either in
the T.61 representation of a standard attribute, or in a particular
extension. The escape mechanism of RFC-1148 should be applied there,
i.e. represent the T.61 character 0x5F by the escape sequence
"{095}".

In short: underscores cannot be legally present in current X.400
ORNames. Their usage by MCI is a protocol violation. It probably
results from a direct transposition in X.400 attributes of some
internal MCI addresses; that transposition should be done more
cleverly, e.g. by mapping the MCI underscores to X.400 spaces.

It cannot be repeated too loudly that the current RFC-1148, on that
point, stinks. It tries to enforce a mapping (of X.400 spaces to
RFC-822 spaces) which requires the quoting mechanism. As the usage
of quotes is hardly supported by some networks, e.g. UUCP and
DECNET, a companion RFC (1137) has been defined which allows the
conversion of space to underscores. Which means that the same ORName
will be converted to <"van der Steen"@foodam.nl> if it passes
through an RFC-conformant X.400 to SMTP gateway, and to
<van_der_Steen@foodam.nl> if it passes first through an X.400 to
UUCP gateway, to be relayed to SMTP later. Hurrah!

Christian Huitema
PS.
Back to the initial point. If "Piet van der Steen" should be
sorted as "steen" rather than "vandersteen", then the convention
in French libraries would be to enter it as <S=Steen(van der); G=Piet>

solomon@cs.wisc.EDU (Marvin Solomon) (06/11/90)

The whole "common name" syntax of X.400 always struck me as cultural
imperialism of the worst kind.  The vast majority of people in this world
do not have a concept of "surname", "given name" (at least they didn't call
it "Christian name" :-), "generational qualifier", etc.  I'm reminded of
an anecdote by a friend of mine from southern India.  In that part of India,
a person normally has one name that serves the functions of both the
European given name and surname:  It is individually assigned (not inherited),
it is used as the common form of address, it is used formally, and it
is used as the primary key for indexing.  One also has several other
names derived from the names of one's parents, one's village, etc., which are
almost always abbreviated to initials, as in "Mr. A.B.C. Ramanujan".
(As Dr. Satyanarayanan of CMU once put it, "My name is Satya.  The rest is a
checksum.")

My friend was married by a justice of the peace in a small town in central
Pennsylvania.  The poor justice of the peace struggled for some time trying
to pronounce all the names ("Do you, mumble mumble take mumble mumble...").
As my friend tells the story, "By the end of the ceremony, her grandfather
had married my village."

The framers of X.400 would have done much better to define a Common Name
as simply a SEQUENCE OF PrintableString, together with some comments to
the effect that the most significant name should go first.
Thus we'd have
	{ "Solomon", "Marvin", "H" },
	{ "Porter", "Joseph", "KCB", "Sir" },
	{ "Steen", "van", "der", "Piet" },
	{ "Mao", "Tse", "Tung" },
	{ "Ramanujan", "A", "B", "C" },
etc.  Of course, that still imposes an English bias by insisting on an
unadorned subset of the Latin alphabet, but to do otherwise would introduce
a host of much more serious technological problems.
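
The proposal above can be sketched as a plain tuple of strings, most
significant component first, each checked against the PrintableString
character set. The helper name `common_name` is made up for the example;
this is an illustration of the idea, not any X.400 or X.500 definition.

```python
import string

# Characters allowed in an ASN.1 PrintableString.
PRINTABLE = set(string.ascii_letters + string.digits + " '()+,-./:=?")


def common_name(*parts):
    """Build a name as an ordered tuple of PrintableString components."""
    for part in parts:
        if not part or any(ch not in PRINTABLE for ch in part):
            raise ValueError("not a PrintableString: %r" % (part,))
    return tuple(parts)


names = [
    common_name("Solomon", "Marvin", "H"),
    common_name("Porter", "Joseph", "KCB", "Sir"),
    common_name("Steen", "van", "der", "Piet"),
    common_name("Mao", "Tse", "Tung"),
    common_name("Ramanujan", "A", "B", "C"),
]

# Indexing falls out of the ordering: sort on the components as given.
for parts in sorted(names, key=lambda n: tuple(p.lower() for p in n)):
    print(parts)
```

With this layout "Steen" sorts under the S without any knowledge of Dutch
prefixes, at the cost of giving no meaning to the remaining components.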

S.Kille@cs.ucl.ac.UK (Steve Kille) (06/11/90)

Marv,

I wish that I could be as confident as you are about the "right" way to do
things.  You point out a number of very reasonable criticisms of structured
names in X.400.

Interestingly, X.500 moved away from the X.400 approach to an unstructured
Common Name.  This is flexible, and useful for some aspects.  It gives
problems of management, and leads to storage of too much data.  Alternate
values are a very serious problem.  Sometimes, the semantics of the
components would give useful information.

There are compromises between these.  Your approach suggests an ordered
list, without giving semantics to the components.  The recent Xerox approach
has both forms available in parallel.  These seem to have some advantages
and disadvantages relative to the extreme approaches.

I remain to be convinced that there is a single "right" way of solving this
problem.   The different approaches have varying (dis)advantages.


Steve

vcerf@NRI.Reston.VA.US (06/12/90)

Christian,

thanks for the helpful tutorial.

I want to remove a possible misunderstanding about MCI Mail: it
does not use underscores in its X.400 ORnames, but there exist
private mail systems which link to MCI Mail and which do use
the underscore character in their mailbox names. These strings
emerge into the Internet through our relay at NRI (which is
trying to do appropriate conversion where necessary).

Vint Cerf

mark@cbmark.cbcc.att.COM (Mark Horton) (06/13/90)

> > We have a similar problem in AT&T, where everyone is in a name
> > database.  It seems to be handled by ignoring the blanks.  If I
> > ask for people named "de vries", I get 6 people, 3 as "de vries"
> > and 3 different people as "devries".  If I ask for "devries" I get
> > the same 6 people.
>
>So it is time for AT&T to fix their database.
>
>Note that the original requester was dealing with ``real world
>problems''. For Dutch people, the name Jan van der Steen denotes
>a different person than Jan vander Steen or Jan van Dersteen, etc.

Exactly.  In the real world, you have to deal with lots of real world
issues, such as databases full of records that were prepared by
different people using different rules, and bureaucracy that prevents
"fixing" the database.  (For example, our database is the result of
merging several separate payroll databases, and it's extremely unlikely
that the payroll department is going to change the spelling of the names
of some people to match the names of other people.  Just getting these
folks to store email addresses and process updates for them is like
pulling teeth.)

The real issue for us is how to design the algorithms to map among
the various possibilities.  Ignoring certain characters such as blank,
apostrophe, hyphen, etc. will cause a looser match, which is exactly
what we want for a name lookup.  For X.400 to 822 translation, as
someone pointed out, Jan.van_der_Steen@whatever is even better,
although you'd still want to ignore the _'s in the lookup algorithm.

	Mark