[comp.dcom.telecom] A Puzzle

Joe Stong <jst@ccnext.ucsf.edu> (02/13/90)

I'm sending this to telecom, because I suspect this sort of data keeping
is liable to be used by the phone companies...

I was asked recently how much disk space would be needed to keep
VISA's bad card list: I was told it had 400 million numbers.  It is my
understanding that VISA numbers have 12 digits, and 4 check digits.
(Is this right?), thus 10^12 or a trillion numbers.  Raw storage of
these numbers would require 40 bits per number, or 5 TeraBytes.

Now, a bitmap of a trillion numbers would take 125 GigaBytes.  Since
the bad numbers are sparse (about 1 in 2500), on the average, the
distance between bad numbers could be kept, on the AVERAGE, as a 12
bit quantity (sometimes more, sometimes less, using a
bit-level-escaping technique), so, assuming 12 bits * 4 million gives
600 megabytes, (maybe double that to account for the encoding
overhead, and indexing to make the lookups fast).

Is this realistic?  Does anyone know of any better techniques for
storing the numbers?  Does the phone company do any compression on the
tables of who has what features in an ESS office, or in the systems
that they use to do billing accounting?

Anyone know of any pointers to material dealing with large number
storage problems?

Send me mail please, or post and send mail, and I'll post a summary of
responses.  I have trouble keeping up with the volume of NetNews.

P.S.:

I didn't promise a summary about the X.25 encapsulation question, but
many replies suggested that vendors are typically using HDLC packets
without an acknowledgement mecanism to carry TCP-IP packets for
transmission on T1 lines.  I have a friend who "thinks the whole
world" is doing what he is doing, which is systems that encapsulate
TCP-IP in X.25 packets on T1's for a mostly DOD network.  It appears
that he doesn't have the big picture.

	Joe Stong	jst@cca.ucsf.edu

wrf@mab.ecse.rpi.edu (Wm Randolph Franklin) (02/23/90)

There is a way to store a sparse list of long numbers or words in less
space than it would take to list them, if you allow some false
positives.  Hence it would be better to store the valid numbers, else
people would be falsely accused of using bad cards.

Let N = number of numbers/words to store.

Set M = 20M (20 is a user parameter)

Allocate B[1:M] a long bit string, initially 0.  So our total storage
req is 20 bits/number, INDEPENDENT of the numbers' lengths, which may
be much less than needed to store the numbers.

Define 10 hash functions from numbers to bit locations: Hi(n) -> l
where i says which hash function, n is the number to be hashed, and l
is the output bit number.

Now, to store number n, set the 10 bits B[H1(n)], ..., B[H10(n)].
Repeat for all the card numbers to be stored.  This will set somewhat
less than half of all the bits in B (because of some bits being set
several times).

To check whether a number is valid, compute the 10 hash functions and
check the 10 bits.  If any bit is 0, that is definitely an invalid
number.  On the other hand if the number is invalid there is less than
a 1/1024 = 0.1% chance that this scheme gives a false positive.  Note
that the error rate is a function of 10 and 20.

This is also an excellent method to store a spelling dictionary.

Does anyone know if this is actuallu used for calling or credit cards?


		   Wm. Randolph Franklin
Internet: wrf@ecse.rpi.edu (or @cs.rpi.edu)    Bitnet: Wrfrankl@Rpitsmts
Telephone: (518) 276-6077;  Telex: 6716050 RPI TROU; Fax: (518) 276-6261
Paper: ECSE Dept., 6026 JEC, Rensselaer Polytechnic Inst, Troy NY, 12180