Computer cipher solving – Lesson 2: Hillclimbing continued

Hillclimbing sounds fairly straightforward, but there are issues to be aware of. For example, step 1 is to pick a random key. Why not use a standard key, such as the regular alphabet A-Z for a cryptogram? After all, you end up swapping letters around randomly anyway. Actually, you can do this but it will usually take longer for statistical reasons too complicated to explain, and you may well get stuck on the same local maximum every time. See the last paragraph for more on this. More important is that you modify the key (step 4) randomly. For example, you cannot just swap every letter with every other letter in an orderly fashion, i.e. first with second, then first with third, fourth, etc. That first comparison you made will only be a valid test for that one arrangement of all the other letters. If you were to compare those same two letters in the same two places after many of the other letters have been swapped around, they may not produce the same result. If you don’t want to take my word for it, just try it yourself and you’ll see that choosing two letters randomly works better.

Another issue is how to modify the key. I’ve used swapping two letters in the key as an example, and that works fine for simple substitution ciphers because we know what the key is – 26 different letters, A-Z, in some order. But a different type of modification may be necessary for other cipher types. For a Playfair or other polybius-square based key you may need to swap entire columns or rows sometimes instead of letter pairs or use a mix of methods. Sometimes making a three-letter switch works better. For a Myszkowski, swapping doesn’t work because the key contains an unknown number of repeated letters. You might need to pick a random letter in the right range (but not necessarily another key letter) and replace one of the key letters for step 4. You need to experiment and adapt to the specific cipher type.

A very important issue is how to tell if you’re program is producing better keys. Metaphorically speaking, what do you use for your altimeter as you climb the hill? For that matter, how do you know that you’ve solved the cipher? This method produces thousands or even millions of decryptions in a matter of minutes. You can’t examine them all by eye and make a personal judgment call. There are several useful methods. One simple method applies when you have a crib, i.e. bit of known plaintext. Suppose you know the word “nuclear” appears in the text of the message, but you don’t know where. That’s called a crib. You can test your decryptions by checking to see if the word nuclear appears in the decryption and either save or display those decryptions and the keys that produced them. That might work pretty well with a short simple cipher like a Baconian, since if the crib is produced, usually the rest of the decryption will also be correct or close enough that you can figure it all out. But for other types this isn’t enough. You can produce that one word many ways and the rest of the decryption may be useless gibberish. Not only that, you may have the entire solution almost perfect but for the word “nuclear” misspelled as “unclear.” Your program will skip over that and never show it to you since the crib isn’t there. What you need for hillclimbing and most other forms of computer attack is a scoring method. You need to give points for improvements. For step 6 you will save a key if it produces a decryption with a higher score and reject it if it doesn’t. I’ll discuss in another lesson how to do this.

Before we leave hillclimbing, though, there is another big issue to deal with: local maxima, i.e. those incorrect decryptions that can’t be improved further by the incremental changes to the key. Remember how I mentioned that the paratrooper can end up on the top of the wrong hill or rise and not be able to climb any further? That happens constantly in hillclimbing, which is why I put in step 8. You can simply give up and go back and scramble your key completely and start over. If you always start with a straight A-Z alphabet or other “standard” key, that’s like always dropping onto the same exact starting point, the small hill  next to the correct mountain and you will never get to the radar station by climbing the hill. Start over with a random key. Computers can repeat a process millions of times very quickly, so this method generally works pretty well, but there are more sophisticated, more effective methods of getting off, or avoiding altogether, the hill. That, too, is a topic for a future lesson.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.