Archive-name: cryptography-faq/part08
Last-modified: 94/01/25
This is the eighth of ten parts of the sci.crypt FAQ. The parts are mostly independent, but you should read the first part before the rest. We don't have the time to send out missing parts by mail, so don't ask. Notes such as ``[KAH67]'' refer to the reference list in the last part.
The sections of this FAQ are available via anonymous FTP to rtfm.mit.edu as /pub/usenet/news.answers/cryptography-faq/part[xx]. The Cryptography FAQ is posted to the newsgroups sci.crypt, talk.politics.crypto, sci.answers, and news.answers every 21 days.
Contents
8.1. How do I recover from lost passwords in WordPerfect?
8.2. How do I break a Vigenere (repeated-key) cipher?
8.3. How do I send encrypted mail under UNIX? [PGP, RIPEM, PEM, ...]
8.4. Is the UNIX crypt command secure?
8.5. How do I use compression with encryption?
8.6. Is there an unbreakable cipher?
8.7. What does ``random'' mean in cryptography?
8.8. What is the unicity point (a.k.a. unicity distance)?
8.9. What is key management and why is it important?
8.10. Can I use pseudo-random or chaotic numbers as a key stream?
8.11. What is the correct frequency list for English letters?
8.12. What is the Enigma?
8.13. How do I shuffle cards?
8.14. Can I foil S/W pirates by encrypting my CD-ROM?
8.15. Can you do automatic cryptanalysis of simple ciphers?
8.16. What is the coding system used by VCR+?
8.1. How do I recover from lost passwords in WordPerfect?
Chris Galas writes: ``Someone awhile back was looking for a way to decrypt WordPerfect document files and I think I have a solution. There is a software company named: Accessdata (87 East 600 South, Orem, UT 84058), 1-800-658-5199 that has a software package that will decrypt any WordPerfect, Lotus 1-2-3, Quatro-Pro, MS Excel and Paradox files. The cost of the package is $185. Steep prices, but if you think your pw key is less than 10 characters, (or 10 char) give them a call and ask for the free demo disk. The demo disk will decrypt files that have a 10 char or less pw key.'' Bruce Schneier says the phone number for AccessData is 801-224-6970.
8.2. How do I break a Vigenere (repeated-key) cipher?
1. Discover the length of the key by counting coincidences. (See Gaines [GAI44], Sinkov [SIN66].) Trying each displacement of the ciphertext against itself, count those bytes which are equal. If the two ciphertext portions have used the same key, something over 6% of the bytes will be equal. If they have used different keys, then less than 0.4% will be equal (assuming random 8-bit bytes of key covering normal ASCII text). The smallest displacement which indicates an equal key is the length of the repeated key.
2. Shift the text by that length and XOR it with itself. This removes the key and leaves you with text XORed with itself. Since English has about 1 bit of real information per byte, 2 streams of text XORed together has 2 bits of info per 8-bit byte, providing plenty of redundancy for choosing a unique decryption. (And in fact one stream of text XORed with itself has just 1 bit per byte.)
If the key is short, it might be even easier to treat this as a standard polyalphabetic substitution. All the old cryptanalysis texts show how to break those. It's possible with those methods, in the hands of an expert, if there's only ten times as much text as key. See, for example, Gaines [GAI44], Sinkov [SIN66].
8.3. How do I send encrypted mail under UNIX? [PGP, RIPEM, PEM, ...]
cat file | compress | des private_key | uuencode | mail
Meanwhile, there is a de jure Internet standard in the works called PEM (Privacy Enhanced Mail). It is described in RFCs 1421 through 1424. To join the PEM mailing list, contact pem-dev-request@tis.com. There is a beta version of PEM being tested at the time of this writing.
There are also two programs available in the public domain for encrypting mail: PGP and RIPEM. Both are available by FTP. Each has its own newsgroup: alt.security.pgp and alt.security.ripem. Each has its own FAQ as well.
PGP is most commonly used outside the USA since it uses the RSA algorithm without a license and RSA's patent is valid only (or at least primarily) in the USA.
RIPEM is most commonly used inside the USA since it uses the RSAREF which is freely available within the USA but not available for shipment outside the USA.
Since both programs use a secret key algorithm for encrypting the body of the message (PGP used IDEA; RIPEM uses DES) and RSA for encrypting the message key, they should be able to interoperate freely. Although there have been repeated calls for each to understand the other's formats and algorithm choices, no interoperation is available at this time (as far as we know).
8.4. Is the UNIX crypt command secure?
8.5. How do I use compression with encryption?
Compression aids encryption by reducing the redundancy of the plaintext. This increases the amount of ciphertext you can send encrypted under a given number of key bits. (See "unicity distance")
Nearly all practical compression schemes, unless they have been designed with cryptography in mind, produce output that actually starts off with high redundancy. For example, the output of UNIX compress begins with a well-known three-byte ``magic number''. This produces a field of "known plaintext" which can be used for some forms of cryptanalytic attack. Compression is generally of value, however, because it removes other known plaintext in the middle of the file being encrypted. In general, the lower the redundancy of the plaintext being fed an encryption algorithm, the more difficult the cryptanalysis of that algorithm.
In addition, compression shortens the input file, shortening the output file and reducing the amount of CPU required to do the encryption algorithm, so even if there were no enhancement of security, compression before encryption would be worthwhile.
Compression after encryption is silly. If an encryption algorithm is good, it will produce output which is statistically indistinguishable from random numbers and no compression algorithm will successfully compress random numbers. On the other hand, if a compression algorithm succeeds in finding a pattern to compress out of an encryption's output, then a flaw in that algorithm has been found.
8.6. Is there an unbreakable cipher?
Of course, a cryptosystem need not be utterly unbreakable to be useful. Rather, it needs to be strong enough to resist attacks by likely enemies for whatever length of time the data it protects is expected to remain valid.
8.7. What does ``random'' mean in cryptography?
A software generator (also known as pseudo-random) has the function of expanding a truly random seed to a longer string of apparently random bits. This seed must be large enough not to be guessed by the opponent. Ideally, it should also be truly random (perhaps generated by a hardware random number source).
Those who have Sparcstation 1 workstations could, for example, generate random numbers using the audio input device as a source of entropy, by not connecting anything to it. For example,
cat /dev/audio | compress - >foo
gives a file of high entropy (not random but with much randomness in it). One can then encrypt that file using part of itself as a key, for example, to convert that seed entropy into a pseudo-random string.
When looking for hardware devices to provide this entropy, it is important really to measure the entropy rather than just assume that because it looks complicated to a human, it must be "random". For example, disk operation completion times sound like they might be unpredictable (to many people) but a spinning disk is much like a clock and its output completion times are relatively low in entropy.
8.8. What is the unicity point (a.k.a. unicity distance)?
Unicity distance, like all statistical or information-theoretic measures, does not make deterministic predictions but rather gives probabilistic results: namely, the minimum amount of ciphertext for which it is likely that there is only a single intelligible plaintext corresponding to the ciphertext, when all possible keys are tried for the decryption. Working cryptologists don't normally deal with unicity distance as such. Instead they directly determine the likelihood of events of interest.
Let the unicity distance of a cipher be D characters. If fewer than D ciphertext characters have been intercepted, then there is not enough information to distinguish the real key from a set of possible keys. DES has a unicity distance of 17.5 characters, which is less than 3 ciphertext blocks (each block corresponds to 8 ASCII characters). This may seem alarmingly low at first, but the unicity distance gives no indication of the computational work required to find the key after approximately D characters have been intercepted.
In fact, actual cryptanalysis seldom proceeds along the lines used in discussing unicity distance. (Like other measures such as key size, unicity distance is something that guarantees insecurity if it's too small, but doesn't guarantee security if it's high.) Few practical cryptosystems are absolutely impervious to analysis; all manner of characteristics might serve as entering ``wedges'' to crack some cipher messages. However, similar information-theoretic considerations are occasionally useful, for example, to determine a recommended key change interval for a particular cryptosystem. Cryptanalysts also employ a variety of statistical and information-theoretic tests to help guide the analysis in the most promising directions.
Unfortunately, most literature on the application of information statistics to cryptanalysis remains classified, even the seminal 1940 work of Alan Turing (see [KOZ84]). For some insight into the possibilities, see [KUL68] and [GOO83].
8.9. What is key management and why is it important?
Key management refers to the distribution, authentication, and handling of keys.
A publicly accessible example of modern key management technology is the STU III secure telephone unit, which for classified use employs individual coded ``Crypto Ignition Keys'' and a central Key Management Center operated by NSA. There is a hierarchy in that certain CIKs are used by authorized cryptographic control personnel to validate the issuance of individual traffic keys and to perform installation/maintenance functions, such as the reporting of lost CIKs.
This should give an inkling of the extent of the key management problem. For public-key systems, there are several related issues, many having to do with ``whom do you trust?''
8.10. Can I use pseudo-random or chaotic numbers as a key stream?
See [KNU81], exercise 3.5-7; [REE77]; and [BOY89].
8.11. What is the correct frequency list for English letters?
The second answer is that ``the English language'' varies from author to author and has changed over time, so there is no definitive list. Of course the lists in the books are ``correctly'' computed, but they're all different: exactly which list you get depends on which sample was taken. Any particular message will have different statistics from those of the language as a whole.
The third answer is that yes, no particular message is going to have exactly the same characteristics as English in general, but for all reasonable statistical uses these slight discrepancies won't matter. In fact there's an entire field called ``Bayesian statistics'' (other buzzwords are ``maximum entropy methods'' and ``maximum likelihood estimation'') which studies questions like ``What's the chance that a text with these letter frequencies is in English?'' and comes up with reasonably robust answers.
So make your own list from your own samples of English text. It will be good enough for practical work, if you use it properly.
8.12. What is the Enigma?
See [WEL82], [DEA85], [KOZ84], [HOD83], [KAH91].
8.13. How do I shuffle cards?
for ( n = ARRLTH-1; n > 0 ; n-- ) swap( &arr[modran( n+1 )], &arr[n] ) ;
modran(x) can not be achieved exactly with a simple (ranno() % x) since ranno()'s interval may not be divisible by x, although in most cases the error will be very small. To cover this case, one can take ranno()'s modulus mod x, call that number y, and if ranno() returns a value less than y, go back and get another ranno() value.
See [KNU81] for further discussion.
8.14. Can I foil S/W pirates by encrypting my CD-ROM?
As far as we know, this is impossible, since there is nothing in standard PC or workstation hardware which uniquely identifies the user at the keyboard. If there were such an identification, then the CD-ROM could be encrypted with a key based in part on the one sold to the user and in part on the unique identifier. However, in this case the CD-ROM is one of a kind and that defeats the intended purpose.
If the CD-ROM is to be encrypted once and then mass produced, there must be a key (or set of keys) for that encryption produced at some stage in the process. That key is useable with any copy of the CD-ROM's data. The pirate needs only to isolate that key and sell it along with the illegal copy.
8.15. Can you do automatic cryptanalysis of simple ciphers?
8.16. What is the coding system used by VCR+?