Joe Stong <jst@ccnext.ucsf.edu> (02/13/90)
I'm sending this to telecom, because I suspect this sort of data keeping is liable to be used by the phone companies... I was asked recently how much disk space would be needed to keep VISA's bad card list: I was told it had 400 million numbers. It is my understanding that VISA numbers have 12 digits, and 4 check digits. (Is this right?), thus 10^12 or a trillion numbers. Raw storage of these numbers would require 40 bits per number, or 5 TeraBytes. Now, a bitmap of a trillion numbers would take 125 GigaBytes. Since the bad numbers are sparse (about 1 in 2500), on the average, the distance between bad numbers could be kept, on the AVERAGE, as a 12 bit quantity (sometimes more, sometimes less, using a bit-level-escaping technique), so, assuming 12 bits * 4 million gives 600 megabytes, (maybe double that to account for the encoding overhead, and indexing to make the lookups fast). Is this realistic? Does anyone know of any better techniques for storing the numbers? Does the phone company do any compression on the tables of who has what features in an ESS office, or in the systems that they use to do billing accounting? Anyone know of any pointers to material dealing with large number storage problems? Send me mail please, or post and send mail, and I'll post a summary of responses. I have trouble keeping up with the volume of NetNews. P.S.: I didn't promise a summary about the X.25 encapsulation question, but many replies suggested that vendors are typically using HDLC packets without an acknowledgement mecanism to carry TCP-IP packets for transmission on T1 lines. I have a friend who "thinks the whole world" is doing what he is doing, which is systems that encapsulate TCP-IP in X.25 packets on T1's for a mostly DOD network. It appears that he doesn't have the big picture. Joe Stong jst@cca.ucsf.edu
wrf@mab.ecse.rpi.edu (Wm Randolph Franklin) (02/23/90)
There is a way to store a sparse list of long numbers or words in less space than it would take to list them, if you allow some false positives. Hence it would be better to store the valid numbers, else people would be falsely accused of using bad cards. Let N = number of numbers/words to store. Set M = 20M (20 is a user parameter) Allocate B[1:M] a long bit string, initially 0. So our total storage req is 20 bits/number, INDEPENDENT of the numbers' lengths, which may be much less than needed to store the numbers. Define 10 hash functions from numbers to bit locations: Hi(n) -> l where i says which hash function, n is the number to be hashed, and l is the output bit number. Now, to store number n, set the 10 bits B[H1(n)], ..., B[H10(n)]. Repeat for all the card numbers to be stored. This will set somewhat less than half of all the bits in B (because of some bits being set several times). To check whether a number is valid, compute the 10 hash functions and check the 10 bits. If any bit is 0, that is definitely an invalid number. On the other hand if the number is invalid there is less than a 1/1024 = 0.1% chance that this scheme gives a false positive. Note that the error rate is a function of 10 and 20. This is also an excellent method to store a spelling dictionary. Does anyone know if this is actuallu used for calling or credit cards? Wm. Randolph Franklin Internet: wrf@ecse.rpi.edu (or @cs.rpi.edu) Bitnet: Wrfrankl@Rpitsmts Telephone: (518) 276-6077; Telex: 6716050 RPI TROU; Fax: (518) 276-6261 Paper: ECSE Dept., 6026 JEC, Rensselaer Polytechnic Inst, Troy NY, 12180