jbn@glacier.STANFORD.EDU (John B. Nagle) (10/13/87)
It's been 10 years since 1977, so let's rough out the design of a brute-force DES cracker, designed to try all 2^56 possible keys for a known-plaintext attack. No cryptoanalytic cleverness here, just current commercial electronics technology. THE CHIP We will need a custom VLSI chip. Our unit will contain a pipelined DES encryptor able to perform one encryption of 64 bits into 64 bits using a 56 bit key, registers to hold the known plaintext and cyphertext, comparators to detect a match between the plaintext and the output of the encryptor, and counter logic to advance the key on each unsuccesful try. The unit halts when a comparison is succesful, indicating that the correct key has been found. The chip thus explores a portion of the keyspace by itself, without any need for communication with the outside world while running. Only during startup, testing, and checking for stops is communication necessary, and the bandwidth required is quite modest. So inter-unit communication is very low. Let's use an 8-bit bus to connect the units. Each unit will have device registers, like an I/O device, and will be controlled by writing values into these device registers and queried by reading from them. Let's assume VLSI technology at the M68020 level; current technology, air-cooled parts, 20Mhz clock. Conservatively we can fit 16 units on a chip; maybe we could do better, but let's assume 16. Let's package the chip in a 28-pin dual-inline package. The wide 28-pin package should have enough room for our chip, and it's a standard size that can be fabricated cheaply; packaging can take place on standard low-cost packaging lines used for RAM chips. We'll give the chip a pinout very similar to that of a standard byte-wide static RAM. Thus, we can treat it electrically like a memory device, and address the device registers as memory. The chip will not generate interrupts, since it looks like a RAM. But it will have a device register which indicates whether any unit has stopped with a match. This register must be polled periodically to see if a solution has been found. So that's our chip. Each chip can try 20Mhz x 16 keys per second, or 320 x 10^6 keys per second. THE BOARD Since our chip looks like a RAM chip, our board will look like a memory board, a large array of densely packed sockets and some modest interface electronics to the rest of the world. Let's assume Eurocard packaging, with 9U-high extra-deep cards, similar to those used in SUN machines. We can fit 128 chips on such a board, and still have room for some interface electronics. For the sake of convenience, let's make our board look like a memory board on a VMEbus, using the 16-bit VMEbus interface. The board is a pure bus slave, and does not generate interrupts; it must be polled to determine when a find has been made. So our board is about 15 inches square, and can try 128 x 320 x 10^6 keys a second, or 4 x 10^10 keys per second. There are about 7.2 x 10^16 possible DES keys, so one board can break a key in about 2 x 10^6 seconds, or about 500 hours. On the average, one would find the key after searching half the keyspace, so 250 hours, or about 10 days, is a reasonable estimate for a one-board system. THE SYSTEM A practical system needs to be a bit larger. A 20-slot Eurocard cage, with 19 slots of custom cards and some standard 68000-based VME processor and memory card in the last slot for control, would be a useful configuration. Equipped with fans and power supplies of suitable size, and a 3 1/2" disk drive to provide some storage for the control machine, the entire system should fit in a 48-inch rack. At least two async ports should be provided, one for a control terminal to sit on top of the rack, and one for connection to a larger system. Software would be quite modest; some simple M68000 operating system without memory protection, such as M-DOS, or even CP/M 68K, would be sufficient. Alternatively, one could use a "PC-on a VME card" board and run MS-DOS, which would simplify software development. One job of the software is to test all the units periodically. Since each unit is independently addressable, and each can be started on a test problem separately, the test system simply loads up each unit with a known test problem which should result in a stop within a few seconds, and checks back after an interval to see if the unit properly found the solution. Any unit which fails is reported and not used again. Failures during operation can be handled gracefully without difficulty. During runs, the system should stop all units every ten minutes or so, save the current key values they are working on, check them against what they were loaded with, mark as dead any units with incorrect values, and run the test sets, again marking any failed units as down. Units still up are reloaded with their key values and restarted. The aborted searches are rerun later on other units of the array. Unit failure is thus not a serious problem. CONCLUSION A DES-breaking machine of modest size and complexity is well within the current state of the art. While it would be necessary to develop one specialized VLSI chip, everything else is available off-the-shelf in commercial-grade electronics. The VLSI chip required is well within the capabilities of many fabricators and would not require any exotic technology. Development costs are difficult to estimate, but should comparable to those of developing a new and complex graphics board for a commercial microcomputer. Total manufacturing costs for a complete system should not exceed $250K in quantities of 10. John Nagle
karn@faline.bellcore.com (Phil R. Karn) (10/15/87)
> So that's our chip. Each chip can try 20Mhz x 16 keys per second, > or 320 x 10^6 keys per second. I do not know how much chip real estate a DES engine takes, but John's figures seem a tad optimistic when compared to existing technology. Typical commercial DES chips, e.g., the AMD 9518/Z8068, do NOT pipeline their calculations. The 9518 has one set of S and P boxes and iterates 16 times. Including overhead, it takes a total of 18 clock cycles to do one complete encryption or decryption. Key loading isn't particularly fast either. The designers probably considered a low pin count to be more important than making the chip especially useful for key cracking. Pipelining the DES engine would clearly increase its area at least 16 times, since not only do you need 16 copies of the S and P boxes but also 16 key registers. You also need a way to extract the proper subkey in parallel at each of the 16 stages. John further hypothesizes that each chip would contain sixteen of these complete DES engines, for a total complexity increase of at least 256x over existing DES chips. I don't know if the RAM comparison is valid, at least inside the chip, since a DES engine is much less regular than a RAM. True, the 9518 is a few years old, implemented in N-MOS technology, and not specifically designed for key cracking. Still, I don't know if a complexity increase of 256x combined with a speed increase of 5x is reasonable in 1987 CMOS. Perhaps somebody can comment on this. Not that any of this really takes away from the John's valid argument that the DES key is too small. An extra factor of x16 or even x256 really isn't all that much protection given inexorable improvements in technology. Cracking DES through brute force is still out of range of a plug-in card for your PC, but certainly not out of the capability of the NSA with custom VLSI and much more computer room floor space than John allowed for his machine. Phil
devine@vianet.UUCP (Bob Devine) (10/16/87)
In article <17195@glacier.STANFORD.EDU>, jbn@glacier.STANFORD.EDU (John B. Nagle) writes: > It's been 10 years since 1977, so let's rough out the design of a > brute-force DES cracker, designed to try all 2^56 possible keys for > a known-plaintext attack. No cryptoanalytic cleverness here, just > current commercial electronics technology. > > Each chip can try 20Mhz x 16 keys per second, or 320 x 10^6 keys per second. > [...] > our board can try 128 x 320 x 10^6 keys a second, or 4 x 10^10 keys per > second. There are about > 7.2 x 10^16 possible DES keys, so one board > can break a key in about 2 x 10^6 seconds, or about 500 hours. (10 days) > [...] > Total manufacturing costs for a system should not exceed $250K in > quantities of 10. I'm no EE, but, I doubt that you could come up with a chip that can calculate a new DES value per clock-cycle even with a lot of pipelining. And putting 16 processing units on a single chip sounds to me like it will be Rev Z before all bugs are worked out. Yes it might be possible but not for the price you gave. [But then, that's why we give the CIA billions of bucks every year...] An off-the-shelf approach would be to take an existing DES chip (our parent company has a 6 MHz chip for under $20) and gang them on a board with some controller and fast memory. Using a large number, say 1K, of processors for a system, it looks like the key search would take: 1. each chip calculates and compares = 7.2 * 10^16 keys / 1K chips ~= 5 * 10^13 keys / chip 2. assume "average" number of searches needed ~= 2 * 10^13 keys/chip 3. each calc/comp iteration takes (reasonable guess?) 100 microseconds [if someone wants, I can get the exact figure] 4. Time until average solution = 2 * 10^13 * 10^-4 seconds = 2 * 10^9 seconds ~= 60 years So, unless someone sees a problem with my calculations, a DES cracker would need about 1000 such systems to bring the time down into the range of months. I'd guess that the cost per system is approximately $20K for the DES chips plus $30K for the "glue" chips for $50K per box. To speed this up an obvious path to choose is a faster DES chip. There may be some on the market now it's just that I had the press release for the WD20C03 chip in my file. Bob Devine [ BTW, 7.2 * 10^16 == 72 quadrillion ]
john@hpcvlo.HP.COM (John Eaton) (10/16/87)
<<<< < It's been 10 years since 1977, so let's rough out the design of a < brute-force DES cracker, designed to try all 2^56 possible keys for < a known-plaintext attack. No cryptoanalytic cleverness here, just < current commercial electronics technology. ---------- Ok, so its possible to break a DES encrypted message for a reasonable amount of money in real time. Your analysis assumes that the message was only encrypted once with a single key and can be detected once it has been broken. The classic NSA response to machines like yours is that if your that paranoid then encrypt it a second time with a different key. You can also do things like compress the text before shifting or rotating the bits in each byte. These may not be very effective by themselves but they greatly increase the complexity of your detection circuits that you must build thousands of. John Eaton !hplabs!hp-pcgh egh
baum@apple.UUCP (Allen J. Baum) (10/17/87)
-------- [] >I do not know how much chip real estate a DES engine takes, but John's >figures seem a tad optimistic when compared to existing technology. > >Pipelining the DES engine would clearly increase its area at least 16 >times, since not only do you need 16 copies of the S and P boxes but >also 16 key registers... I'm not sure that this is true. What happens at each stage is a permutation and an xor. If each stage doesn't have to share the permutation logic, then its simply wires, not lots of wires + a 16 way mux -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
jim@randvax.UUCP (Jim Gillogly) (10/17/87)
In response to John Nagle's DES-cracking proposal Phil R. Karn writes: >> So that's our chip. Each chip can try 20Mhz x 16 keys per second, >> or 320 x 10^6 keys per second. > >Typical commercial DES chips, e.g., the AMD 9518/Z8068, do NOT pipeline >their calculations. The 9518 has one set of S and P boxes and iterates >16 times. Including overhead, it takes a total of 18 clock cycles to do >one complete encryption or decryption. > ... >Not that any of this really takes away from the John's valid argument >that the DES key is too small. Not being a hardware type, I'm not sure about the argument that a 20 MHz clock means 20M decryptions per second, so I'll fall back on the famous USENET "argument by appeal to authority". In Crypto 84 (more formally, Advances in Cryptology: Proceedings of Crypto 84, Ed. Blakley & Chaum, Springer-Verlag 1985) some Belgian scientists (F. Hoornaert, J. Goubert and Y. Desmedt) came up with a proposed hardware architecture for fast DES operation, including pipelining. They assumed a 20 MHz clock rate and gave the following timing for a single encryption: 48 (=16 iterations) + 2 (pipeline delay) + 3 (I/O) = 53 cycles. They expected throughput in ordinary DES operation of about 20 Mbit/sec. They included a description of a configuration of the proposed chip that would deal with about 1.13 x 10^6 keys/sec in a brute force key hacking mode. They proposed a system that would brute force a key in a week or two, within an order of magnitude of John Nagle's proposal. I don't remember the costs that they estimated, but I think they expected the chips themselves to be about $40 each in quantity. The article is "Efficient hardware implementation of DES; also design for exhaustive key search machine", pp 147 ff. Of course, it would be even more satisfying if one of us could find a healthy key-folding algorithm and crack it elegantly... -- Jim Gillogly {hplabs, ihnp4}!sdcrdcf!randvax!jim jim@rand-unix.arpa
karn@faline.bellcore.com (Phil R. Karn) (10/18/87)
> I'm not sure that this is true. What happens at each stage is a permutation > and an xor. If each stage doesn't have to share the permutation logic, then > its simply wires, not lots of wires + a 16 way mux Each round in DES involves an expansion, exclusive-OR, a ROM table lookup (the S-boxes) followed by a permutation (P-boxes) and then another exclusive-OR. Except for the Initial Permutation (IP) and Reverse Initial Permutation (IP-1) which you could factor out of the algorithm and do once externally, *all* of the work in encrypting and decrypting has to be repeated at each round. You also need to generate different sub-keys for each round. So I think an estimate of 16x chip area for a pipelined DES chip over a non pipelined version is reasonable. While wires may be "free" in terms of propagation time, they are not at all free in terms of chip area. Complicated wire crossings like the P boxes in DES may take up a considerable amount of real estate, making the chip much less dense than a regular device like a RAM. Phil
jim@randvax.UUCP (Jim Gillogly) (10/19/87)
In article <1483@faline.bellcore.com> karn@faline.bellcore.com (Phil R. Karn) writes: >Each round in DES involves an expansion, exclusive-OR, a ROM table >lookup (the S-boxes) followed by a permutation (P-boxes) and then >another exclusive-OR. If it helps, the S-box step can be done with Boolean operators: each of the 4 output bits of each S box can be represented as a Boolean function of the 6 input bits. Some of them are simpler than one might expect, as has been pointed out in the literature. I suspect this would be faster in hardware than a ROM lookup, but it would presumably take more chip area and reinforce the bottom line of Phil's argument. -- Jim Gillogly {hplabs, ihnp4}!sdcrdcf!randvax!jim jim@rand-unix.arpa