[comp.unix.questions] spell bug?????

bs@augusta.UUCP (Burch Seymour) (08/07/87)

If this is an old item, someone let me know by e-mail. There's too much
traffic on this group to follow all of it. Otherwise, we discovered, quite
by accident, that there are patterns of letters which spell accepts
as real words even though they are nonsense. I've tried these on two
manufacturers' Unix systems and gotten the same result, so I don't think
it's a local bug (Sun and Gould are the two if anyone is curious).
Three of the words which are passed through by spell are:

 vfppvdu, plbpvhb, and nbclowd

Is this a known problem? Do these words pass on all unix systems?
--------------------------------------------------------------------------------
    #####              Gould Computer Systems Division
      ###                  in Sunny South Florida
#####   #####                   Burch Seymour
#####    ####     =>  ...!{seismo,sun,pur-ee,brl-bmd}!gould!bseymour
#####   #####     =>  ...!{ihnp4!codas,allegra}!novavax!gould!bseymour
      ##          => NOTE: Disregard header info. Email to above paths only.
    ####          => The opinions expressed are probably not worth disclaiming
================================================================================

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/08/87)

In article <541@augusta.UUCP> bs@augusta.UUCP (Burch Seymour) writes:
>I've tried these on two
>manufacturer's Unix systems and gotten the same result, so I don't think
>it's a local bug (Sun and Gould are the two if anyone is curious).
>Three of the words which are passed through by spell are:
> vfppvdu, plbpvhb, and nbclowd

/usr/5bin/spell on BRL Gould systems rejects these as misspelled.
Possibly your /usr/bin/spell is not hashing correctly.

(Note:  I haven't had requests for updated /usr/5* software from
Gould for a long time.  Maybe their /usr/5bin/spell is also broken.)

bicker@hoqax.UUCP (The Resource, Poet of Quality) (08/08/87)

In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
> Three of the words which are passed through by spell are:
>  vfppvdu, plbpvhb, and nbclowd
> Is this a known problem? Do these words pass on all unix systems?

Not mine.

bk

dhesi@bsu-cs.UUCP (Rahul Dhesi) (08/09/87)

In article <6257@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) 
writes:
>In article <541@augusta.UUCP> bs@augusta.UUCP (Burch Seymour) writes:
>>I've tried these on two
>>manufacturer's Unix systems and gotten the same result, so I don't think
>>it's a local bug (Sun and Gould are the two if anyone is curious).
>>Three of the words which are passed through by spell are:
>> vfppvdu, plbpvhb, and nbclowd
>
>/usr/5bin/spell on BRL Gould systems rejects these as misspelled.
>Possibly your /usr/bin/spell is not hashing correctly.

I gave those three words to 4.3BSD `spell` and it accepted them as
legal.  However, it flagged "burch", "seymour", "doug", "gwyn", and
"gould" as misspellings :-).

Purely speculative possibility:

A good way of preserving a copyright on collections of items that
individually cannot be copyrighted is to include a few red herrings
that could not be there by chance.  Vendors of mailing lists thus
include a few otherwise unknown addresses.  Similarly, I've heard that
dictionaries include a few authentic-sounding nonsense words that were
created by the publisher.  Theft of the collection can then be proven
more easily.
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi

avr@hou2d.UUCP (Adam V. Reed) (08/09/87)

In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
> Three of the words which are passed through by spell are:
>  vfppvdu, plbpvhb, and nbclowd

Spell flags them correctly on our system (Vax running V.3).
						Adam

jph@houxa.UUCP (J.HARKINS) (08/10/87)

In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
> Three of the words which are passed through by spell are:
>  vfppvdu, plbpvhb, and nbclowd
> Is this a known problem? Do these words pass on all unix systems?

Works OK on my system.  Did some joker define a local dictionary and stick these in?
A shell script could be replacing the real spell executable, and running the
real one with a local dictionary.  Just a possibility.
-------
Disclaimer: I hereby disclaim all my debts.
-------
Jack Harkins @ AT&T Bell Labs
Custom Digital Solutions
(201) 949-3618
(201) 561-3370
..!ihnp4!houxf!jph

kathy@bakerst.UUCP (08/11/87)

In article <541@augusta.UUCP> bs@augusta.UUCP (Burch Seymour) writes:
>
>Three of the words which are passed through by spell are:
>
> vfppvdu, plbpvhb, and nbclowd
>
>Is this a known problem? Do these words pass on all unix systems?


My spell program catches those words - UNIX PC, v.3.5


Kathy                                  kathy@bakerst.UUCP
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
         {ihnp4|mtune|ptsfa} _____
       {hplabs|seismo}!kitty _____\__ !bakerst!kathy
           {mcnc|duke}!ethos _____/

randy@umn-cs.UUCP (Randy Orrison) (08/11/87)

In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
> Three of the words which are passed through by spell are:
>  vfppvdu, plbpvhb, and nbclowd
> Is this a known problem? Do these words pass on all unix systems?

They all pass with flying colors on this Vax 11/780 running (I think) vanilla
4.3BSD.  Couldn't it just be a simple matter of a hash collision?
-- 
Randy Orrison, University of Minnesota School of Mathematics
UUCP:	{ihnp4, seismo!rutgers!umnd-cs, sun}!umn-cs!randy
ARPA:	randy@ux.acss.umn.edu		 (Yes, these are three
BITNET:	randy@umnacvx			 different machines)

sns@genghis.UUCP (Sam Southard) (08/11/87)

In article <944@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> A good way of preserving a copyright on collections of items that
> individually cannot be copyrighted is to include a few red herrings
> that could not be there by chance.  Vendors of mailing lists thus
> include a few otherwise unknown addresses.  Similarly, I've heard that
> dictionaries include a few authentic-sounding nonsense words that were
> created by the publisher.  Theft of the collection can then be proven
> more easily.

I have no problem believing this happens with mailing lists, but in a
dictionary?  I really doubt that a dictionary would have to resort to something
like this to preserve a copyright.  If a lot of the definitions are worded
exactly the same, then it would be pretty obvious.  It's not as if a dictionary
manufacturer can claim that someone stole the word from him.  If the new
dictionary has all the same words with different definitions how can it be a 
copyright violation?  If it has all the same definitions (or a very large
percentage of them) then it would be pretty obvious without needing to resort
to making up words.

Just think, the words "be" and "sit" were just made up by a publisher: they
don't really mean anything :).
-- 

Sam Southard, Jr.
{sns@genghis.caltech.edu|sns@genghis.uucp|{backbone}!cit-vax!genghis!sns}

geoff@ncr-sd.SanDiego.NCR.COM (Geoffrey R. Walton) (08/11/87)

In article <727@houxa.UUCP> jph@houxa.UUCP (J.HARKINS) writes:

>In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:

>> Three of the words which are passed through by spell are:
>>  vfppvdu, plbpvhb, and nbclowd
>> Is this a known problem? Do these words pass on all unix systems?
>
>Works OK on my system.  Did some joker define a local
>dictionary and stick these in? A shell script could be
>replacing the real spell executable, and running the
>real one with a local dictionary.  Just a possibility.
>-------

I sensed a trend in the "yes, it does this" and "no, it
doesn't" replies/follow-ups to the original posting.  So I
decided to try a little experiment.

ncr-sd is a Pyramid 90x, running OSx 4.0, so we have both the
SYSVR3 and BSD 4.3 versions of "spell" (plus WWB and a few local
variants).  The strings mentioned in the original posting _DO_
pass unnoticed when the Berkeley speller is used; however, the
ATT version of the command catches them every time, as does WWB
(not surprisingly).

A workable, but kludged, solution -- for sites with only the BSD
"spell" -- is to add these strings, and any other known nonsense
words, to the local stoplist used by the spell-checker.  The
real solution, of course, is to fix the source (for those sites
that have source).  (Is anyone in Berkeley listening? ;^))

#include <all_usual_disclaimers.h>

Geoff Walton
Software Publications
NCR E&M San Diego
geoff@ncr-sd.SanDiego.NCR.COM
or
{wherever}!ucbvax!sdcsvax!ncr-sd!geoff
Even the smallest problem becomes unsolvable if enough
meetings are held to discuss it.

tony@artecon.artecon.UUCP (Anthony D. Parkhurst) (08/11/87)

In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
 [deleted] ... we discovered, quite
> by accident, that there are patterns of letters which spell accepts
> as real words even though they are nonsense. 
  [deleted]
>  vfppvdu, plbpvhb, and nbclowd

> Is this a known problem? Do these words pass on all unix systems?

Many moons ago, when we received an early (pre-release?) version of
HP-UX (based on System III), we played with spell a bit.  First off,
the words file did not come with source, so since spell had an
option which told you if a word was in the spelling list, we
wrote a program which generated all permutations of letters:
	a b c d e ... z aa ab ac ... zzzzzzzz
and ran it thru spell to generate a source list 
(Yes, we did realize that it would take a LONG time to complete, but
it was fun anyway.)
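
To get a feel for the size of that search, the enumeration described
above can be sketched in Python.  This is an illustration only, not the
original program: the function names are made up, and the 8-letter cap
is an assumption read off the "zzzzzzzz" endpoint in the example line.

```python
from itertools import islice, product
from string import ascii_lowercase


def letter_strings(max_len=8):
    """Yield a, b, ..., z, aa, ab, ... up to max_len letters (assumed cap)."""
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def total_candidates(max_len=8):
    """Count how many strings letter_strings(max_len) would produce."""
    return sum(26 ** n for n in range(1, max_len + 1))


# The first string after the 26 single letters is "aa".
print(list(islice(letter_strings(), 26, 30)))  # ['aa', 'ab', 'ac', 'ad']
print(total_candidates())                      # 217180147158
```

With roughly 217 billion candidates, feeding every one of them to spell
was never going to finish quickly, which fits the "LONG time" remark.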

Well, after a couple of hours, we looked at the output and were
appalled at the nonsense that passed spell: many words of 5 or 6
letters with no vowels (which become increasingly rare in English
as word length grows).

We called a support group, then (knowing I would get no response) I called
the division that was working on it. 

Anyway, the answer had to do with the algorithm used by spell.  Somehow
it was designed to catch words that were almost correct, but not
complete nonsense.  They even gave a reference to a published work on
the algorithm, but I have since lost it.

I would assume that since other people say that their speller catches these,
either their word source is quite different (creating different tables),
or they have bugs fixed (if they exist, "that's not a bug, it's a feature"), or
they simply use other algorithms that are more correct (but may be slightly slower).

The answer to your question is:  Yes, it is common to UNIX.
-- 
**************** Insert 'Standard' Disclaimer here:  OOP ACK! *****************
*  Tony Parkhurst -- {hplabs|sdcsvax|ncr-sd|hpfcla|ihnp4}!hp-sdd!artecon!adp  *
*                -OR-      hp-sdd!artecon!adp@nosc.ARPA                       *
*******************************************************************************

ken@cs.rochester.edu (Ken Yap) (08/11/87)

|A workable, but kludged, solution -- for sites with only the BSD
|"spell" -- is to add these strings, and any other known nonsense
|words, to the local stoplist used by the spell-checker.  The
|real solution, of course, is to fix the source (for those sites
|that have source).  (Is anyone in Berkeley listening? ;^))

Nobody has mentioned the possibility that the words in question fell
through spell's probabilistic detection algorithm. Last time I looked
at spell, it hashed each suspect word with 12 (?) independent hash
functions to get 12 bit addresses and checked whether all of those bit
locations were turned on in the bitmap. If they all are, the
word is let through as legitimate.

Maybe the bitmap is too full or the hash doesn't work properly.  Only
in the second case would the source be fixable.
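
For anyone who wants to see the idea in code, here is a minimal Python
sketch of the kind of bitmap check described above.  It is not the spell
source: the table size, the SHA-256 stand-in for the hash functions, and
all the names are assumptions chosen for illustration, and the count of
12 is only the guess given above.

```python
import hashlib

NUM_HASHES = 12        # the "12 (?)" guessed above; not checked against source
TABLE_BITS = 1 << 16   # assumed table size, picked only for this sketch


def bit_positions(word):
    """Yield NUM_HASHES bit addresses for word (stand-in hash functions)."""
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % TABLE_BITS


def build_table(words):
    """Turn on every bit addressed by every word in the dictionary."""
    table = bytearray(TABLE_BITS // 8)
    for word in words:
        for pos in bit_positions(word):
            table[pos // 8] |= 1 << (pos % 8)
    return table


def accepts(table, word):
    """Pass the word if all of its bits are on, whoever turned them on."""
    return all(table[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(word))
```

A table built this way never rejects a word that was put into it, but it
will also pass any string whose bits all happen to be on already, and
that gets more likely as the bitmap fills up.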

	Ken

purdom@rabbit1.UUCP (Chris Purdom) (08/12/87)

BSD 4.1 spell also misses the weird words posted earlier.

nz@hotlg.ATT (Neal Ziring) (08/12/87)

In article <1024@hoqax.UUCP> bicker@hoqax.UUCP (The Resource, Poet of Quality) writes:
 > In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
 > > Three of the words which are passed through by spell are:
 > >  vfppvdu, plbpvhb, and nbclowd
 > > Is this a known problem? Do these words pass on all unix systems?
 > Not mine.

Not mine either.  I am on an AT&T System V 2.0v2 with Bell Labs enhancements
on a VAX 8650.  It is labelled (sp?) as revision 1.7, for whatever good that
does.
-- 
...nz  (Neal Ziring  @  ATT-BL Holmdel, x2354, 3H-437)
	"You can fit an infinite number of wires into this junction box,
	but we usually don't go that far in practice."
					London Electric Co. Worker, 1880s

reeves@decvax.UUCP (Jon Reeves) (08/13/87)

In article <1262@sol.ARPA> ken@cs.rochester.edu (Ken Yap) writes:
>Nobody has mentioned the possibility that the words in question fell
>through spell's probabilistic detection algorithm.

This is, indeed, what happened.  Ken's description was mostly correct
(except the magic number is 11) for BSD systems.  Even with proper
tuning, the algorithm still has a 1-in-2048 chance of generating a false
match.  As an example, "nbclowd" collides with:

chloroplatinate, telephone/vulnerable, nonogenarian/polarography, crocus,
gummy, irremovable, kingpin, lucre/tangential, alma, whirligig, Curran.
(Where two words are separated by a slash, both collide with the same
hash value.)
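
The 1-in-2048 figure is what you get if about half the bits in the table
are on, since a random string then has to land on an "on" bit 11 times
in a row and (1/2)^11 = 1/2048.  The half-full table is an assumption
here (it is presumably what "proper tuning" means), and the function
name below is made up for the sketch.

```python
from fractions import Fraction


def false_match_probability(num_hashes, fill=Fraction(1, 2)):
    """Chance that a random string finds all num_hashes bits already on.

    fill is the fraction of table bits that are on; the default of 1/2
    is an assumption, as is treating the hash functions as independent.
    """
    return fill ** num_hashes


print(false_match_probability(11))                          # 1/2048
# A table that is three-quarters full does far worse:
print(float(false_match_probability(11, Fraction(3, 4))))   # about 0.042
```

So a table that is too full pushes the false-match rate from about 1 in
2048 to about 1 in 24 in this example, without any bug in the hashing.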

On a BSD-derived system, the best way to check for nonsense strings is
to use match (or grep, or comm) against /usr/dict/words.  Spell is
designed for catching typos.
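
As a rough Python equivalent of the grep/comm approach, an exact lookup
against the word list could look like the sketch below.  The function
names are made up, the lowercase folding is an assumption, and the
default path is simply the one named above; on many current systems the
list lives elsewhere (for example /usr/share/dict/words).

```python
def load_words(path="/usr/dict/words"):
    """Read a one-word-per-line list into a set, folded to lowercase."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def not_in_list(candidates, words):
    """Return the candidates that are not literally in the word list."""
    return [w for w in candidates if w.lower() not in words]
```

Calling not_in_list(["vfppvdu", "plbpvhb", "nbclowd"], load_words())
reports every string that is not literally in the list.  Because this is
an exact comparison there are no hash collisions, but there is also no
suffix or prefix handling, so ordinary derived forms may be reported too.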

System V uses a completely different algorithm that can't generate this
kind of false match.
-- 
Jon Reeves	decvax!reeves -or- reeves@decvax.dec.com
"[T]he use of the binary system in the machine is a passing phase ..."
 - Douglas Hartree, University of Cambridge, 1949.

frank@zen.UUCP (Frank Wales) (08/14/87)

In article <727@houxa.UUCP> jph@houxa.UUCP (J.HARKINS) writes:
>In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
>> Three of the words which are passed through by spell are:
>>  vfppvdu, plbpvhb, and nbclowd
>> Is this a known problem? Do these words pass on all unix systems?
>Works OK on my system.  [...]
>Jack Harkins @ AT&T Bell Labs
>..!ihnp4!houxf!jph

Well, I just tried it here (HP-UX 5.15 on HP9000 series 500, SysV-derived),
and our spell didn't complain about any of them either.  Some further info
for anybody who's *really* interested:

        $ what /usr/bin/spell
        /usr/bin/spell:
                27.1   85/07/12
                spell.sh        27.1      85/05/20
        $ what /usr/lib/spell/spell*
        /usr/lib/spell/spellin:
                  26.1
        /usr/lib/spell/spellout:
                  26.1
        /usr/lib/spell/spellprog:
                  26.1

I'd be more interested in finding out how you discovered that spell
accepted such "words".

Orthographised Frank.       [frank@zen.uucp<->seismo!ukc!zen.co.uk!frank]

allbery@ncoast.UUCP (08/15/87)

As quoted from <727@houxa.UUCP> by jph@houxa.UUCP (J.HARKINS):
+---------------
| In article <541@augusta.UUCP>, bs@augusta.UUCP (Burch Seymour) writes:
| > Three of the words which are passed through by spell are:
| >  vfppvdu, plbpvhb, and nbclowd
| > Is this a known problem? Do these words pass on all unix systems?
| 
| Works OK on my system.  Did some joker define a local dictionary and stick these in?
| A shell script could be replacing the real spell executable, and running the
| real one with a local dictionary.  Just a possibility.
+---------------

Nope -- we're running System III, I checked all the spell stuff and it's in
order... but it thinks those "words" above are spelled correctly.  So at least
*some* versions of "spell" are confused.