As a dedicated member of the American Cryptogram Association, I like to solve ciphers and I enjoy writing programs for that purpose. Regular readers will have read some of my posts on the subject. Recently I tried an experiment to modify my programs to make them faster. It didn’t work very well.
In its simplest form, computer solving can be described as two steps: 1. try a decryption and 2. test the result. This process is repeated, usually until a solution is found or the user stops it. It can be driven by running through a word list or by randomized guesswork such as hillclimbing. My idea for speeding it up is very simple: don’t test the entire decryption; just test the beginning.
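That loop can be sketched as follows. This is a toy illustration, not my actual solver: the Caesar cipher, the crude word-count scorer, and all the function names are stand-ins made up for the example.

```python
# Toy illustration of the two-step solve loop: try a decryption,
# then test the result.  The Caesar cipher and the crude scorer
# are stand-ins for a real cipher type and a real test.

def caesar_decrypt(ct, shift):
    """Shift each letter of an A-Z ciphertext back by `shift`."""
    return "".join(chr((ord(c) - 65 - shift) % 26 + 65) for c in ct)

def crude_score(text):
    """Count a few very common English words -- a stand-in scorer."""
    return sum(text.count(w) for w in ("THE", "AND", "ING"))

def solve(ciphertext, key_source, decrypt, score_text, threshold):
    """Step 1: try a decryption.  Step 2: test the result.
    Repeat until a candidate passes or the keys run out."""
    for key in key_source:
        plaintext = decrypt(ciphertext, key)      # step 1
        if score_text(plaintext) >= threshold:    # step 2
            return key, plaintext
    return None

# Encrypt a sample by "decrypting" with a negative shift, then solve.
ct = caesar_decrypt("THEQUICKBROWNFOXANDTHEHOUND", -3)
result = solve(ct, range(26), caesar_decrypt, crude_score, 2)
```

For a word-list key source, `key_source` would iterate over candidate key words instead of shifts; the shape of the loop stays the same.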
The way my programs usually do the testing is to score every tetragram (4-letter group) in the trial decryption for its frequency in English or another target language. The more high-frequency tetragrams in the trial decryption, the more likely it is to be near the solution. I found that testing just the first four tetragrams instead of the whole thing does save time. I found a strict cutoff score that lets 99.3% of valid decryptions through to the full test while rejecting up to 90% of false decryptions, depending on the cipher type. I thought this would improve my solving time by 90%. It didn’t turn out that way.
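A minimal sketch of tetragram scoring with the quick beginning test looks like this. The tiny table, the floor value, and the cutoff are all made-up toy numbers; a real log-frequency table is built from a large corpus and has thousands of entries.

```python
# Sketch of tetragram scoring with an early cutoff on the beginning.
# Table contents, floor, and cutoff here are toy values.

FLOOR = -10.0  # score for tetragrams missing from the table

def tetragram_score(text, table, floor=FLOOR):
    """Sum the score of every overlapping 4-letter group in the text."""
    return sum(table.get(text[i:i + 4], floor) for i in range(len(text) - 3))

def passes_quick_test(text, table, cutoff, n=4, floor=FLOOR):
    """Score only the first n tetragrams (the first n+3 letters) and
    compare against a cutoff tuned to pass nearly all valid decryptions."""
    return tetragram_score(text[:n + 3], table, floor) >= cutoff

# Toy table covering one plausible beginning.
TOY_TABLE = {"THEC": 1.0, "HECA": 1.0, "ECAT": 1.0, "CATS": 1.0}
```

A trial decryption that fails `passes_quick_test` is discarded without ever being scored at full length; that is where the time savings were supposed to come from.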
The first problem is that this method doesn’t reduce the number of decryptions, only the number of test cycles. For many cipher types, it is necessary to decrypt the entire thing before you have the beginning to test, and it turns out that the decryption phase takes longer than the testing phase. The time savings were there on several types, but so minimal that they weren’t worth it. Remember, the quick test isn’t 100% accurate, so every now and then it cuts off a valid solution, too, which is another limitation.
The second problem is that this only works for solving methods that rely on a 100% solution. For example, if the decryption method uses a word list as a key source, you expect that when the correct key word is tried, the solution will be 100% valid English, including the beginning. It is safe to assume that if the beginning has bad tetragrams, one should go on to the next key. But if one is using hillclimbing, that is not so. You might make a change to the key that improves the overall score, and should be preserved, but doesn’t necessarily produce good text at the beginning. My method won’t work at all on a hillclimber, simulated annealing, etc. It also doesn’t work on types with numerals, or on Playfair, unless your frequency data is modified to deal with the numbers or the extra X’s.
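As a sketch, here is how the quick test fits into a word-list loop. The Vigenere decrypt, the two-letter stand-in for the real tetragram test, and the toy scorer are illustrative inventions, not my actual code.

```python
# Sketch of the quick beginning test inside a word-list loop.
# quick_ok and full_score are toy stand-ins for real scoring.
from itertools import cycle

def vig_decrypt(ct, key):
    """Vigenere decryption over A-Z text."""
    return "".join(chr((ord(c) - ord(k)) % 26 + 65)
                   for c, k in zip(ct, cycle(key)))

def quick_ok(pt):
    """Stand-in for the cheap beginning test."""
    return pt[:2] in {"HE", "TH", "AN", "IN"}

def full_score(pt):
    """Stand-in for the full-length scorer."""
    return sum(pt.count(w) for w in ("HELLO", "WORLD"))

def word_list_solve(ciphertext, words):
    """The correct key word should give 100% valid English, so a bad
    beginning justifies skipping straight to the next word.  (Not safe
    for a hillclimber, where partial improvements must be preserved.)"""
    best = None
    for word in words:
        pt = vig_decrypt(ciphertext, word)
        if not quick_ok(pt):          # cheap test: beginning only
            continue
        s = full_score(pt)            # expensive test: whole text
        if best is None or s > best[0]:
            best = (s, word, pt)
    return best

best = word_list_solve("RIJVS", ["DOG", "KEY", "CAT"])
```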
I tried solving the first problem by extending the method to the decryption process. For many types, such as Vigenere, it is possible to decrypt just the beginning. So I did that, decrypting only the beginning and testing it. There were some time savings, but surprisingly, they amounted to only around 1%, and the approach still carried the same risk of cutting out the valid solution. Apparently interrupting the normal decryption loop to test the start slows down the whole process, especially for a type like Vigenere, which produces quite a few valid-looking beginnings even with wrong keys. The cost of the extra test cycle balanced out the savings from culling some full decryptions.
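For Vigenere, decrypting only the beginning might look like the sketch below; the function name and the optional length parameter are my own illustration of the idea, not my solver’s code.

```python
# Sketch of decrypting only the beginning for Vigenere.
from itertools import cycle

def vigenere_prefix_decrypt(ct, key, length=None):
    """Decrypt the first `length` letters (all of them if None).
    Because each Vigenere letter depends only on its own position,
    the beginning can be produced without touching the rest."""
    if length is None:
        length = len(ct)
    return "".join(chr((ord(c) - ord(k)) % 26 + 65)
                   for c, k in zip(ct[:length], cycle(key)))
```

Decrypting seven letters, for instance, yields just enough text for the first four overlapping tetragrams.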
Another limitation is that the method doesn’t work at all with transposition types. A transposed text has the same number of high-frequency letters as normal text, so nearly every trial decryption will pass the test and go on to the full-length decrypt/test cycle. In the end I decided to chuck the whole idea. It still seems like it ought to be possible to streamline the trial decrypt/test process. One way to do it, if you have a crib, is to test whether the crib appears in the trial decryption. Similarly, if your decryption process involves reducing the text to a simple substitution, test whether the pattern of the crib appears.
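The crib-pattern idea can be sketched like this, assuming the cipher has been reduced to a simple substitution; the function names are illustrative.

```python
# Sketch of the crib-pattern test for a cipher reduced to simple
# substitution.  Function names are illustrative.

def pattern(s):
    """Canonical letter pattern, e.g. BANANA -> (0, 1, 2, 1, 2, 1)."""
    seen, out = {}, []
    for ch in s:
        out.append(seen.setdefault(ch, len(seen)))
    return tuple(out)

def crib_pattern_appears(text, crib):
    """True if any window of the text shares the crib's letter pattern.
    A simple substitution preserves the pattern, so this test works
    even before the key is known."""
    p, n = pattern(crib), len(crib)
    return any(pattern(text[i:i + n]) == p
               for i in range(len(text) - n + 1))
```

If the crib’s pattern appears nowhere in the trial text, the candidate key can be rejected without any frequency scoring at all.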