Computer cipher solving – Lesson 10: Genetic algorithms

I’ve discussed hill-climbing and its variants like simulated annealing in earlier lessons. They work well for most cipher types, at least most American Cryptogram Association (ACA) types.However, they don’t always work so well on certain types. For me Myszkowskis fall in that category. Thankfully, I learned about another technique that works better for Mysz ciphers: the genetic algorithm. It tends to improve its solutions more slowly than hill-climbing, but in the end it is usually more reliable.

So how does it work? Just like its name implies. You create a large population and continue to mate the individual members and their offspring until one of the children is the solution. Sounds simple, and from a programming standpoint, it is.

Let’s take the Myszkowski for an example. First, to create the population  you need to know, or assume, a key length. I’ll use 10 for the example. A Myszkowski uses a numeric key that has at least one doubled number (otherwise it would be a columnar, not a Mysz). That means the 10 columns can be keyed by any digit from 1 to 9. I use a data base of 100,000 keys. This may be larger or smaller than the ideal number, but it seems to work. I populate the database with 100,000 random 10-digit numbers using only the digits 1-9. If you want to get fancy and take advantage of Bayes’s Theorem you could weigh the distribution more heavily toward the lower end of that number range. I don’t bother with that, as there is a correcting step later in the algorithm.

Next you start the repeating loop by choosing two of these keys at random. One is arbitrarily chosen as the left (male) parent, and the other right (female) parent. Decipher the ciphertext with each parent, score the output for how well it resembles your target language, and keep track of the scores. Then choose a spot near the middle of of the key length. For key length 10, it should be a number from 4 through 7, say. I’ll use 5 for the example. Take the 5 left key digits of the male and the 5 rightmost digits of the female to make a new, 10-digit key. Decipher with that and score the output. If it scores worse than either parent, discard it and go back to the beginning of the loop to choose two new parents. If it outscores both parents, replace both with the child. If it only outscores one, replace that one with the child, but leave the other parent. Whenever a new best decryption results, display that or save it to a file. Repeat in an endless loop until you’re satisfied with the solution.

That’s pretty much all there is.  I have been told that every so often, just like in real life biology, you need to introduce a random variation – a mutation. My own very limited experiments have not proven this to be true, and in fact it is counterproductive. Perhaps it is more important when the key is longer. The theory is that your original population only contained 100,000 individuals. Yet there are 9^10 possible 10-digit Mysz keys, or almost 3.5 billion. The correct key has only about 1 chance in 3500 of being in that population at first (although there are usually quite a few equivalent keys using other digits). It may not be possible to produce the exact correct key by mixing parents’ genes if none of the parents have the necessary components. So at intervals when you are moving through the loop, in addition to mixing the two parents’ digits, make a random change of one or two digits and test that, keeping it if it improves the score. In effect this step is a form of hill-climbing, but since it is only occasional, not the primary altering step of each cycle, this is a fundamentally different algorithm.  Let me know if you have done any testing and can provide good data on optimization.

If the Mysz period is longer than 11 you have to use letters instead of digits for the keys. For other cipher types, like the Columnar, this method can be used, but you would have to insert an extra step to convert the child keys because there could be duplicate digits or letters. Obviously there are various adaptations that would be necessary for other cipher types, and it would not be suitable for many types at all.

2 thoughts on “Computer cipher solving – Lesson 10: Genetic algorithms

  1. Craig Worstell

    Thanks for your article on Myszkowski cipher solving. Prompted me to read the first 9 articles in the series. I have been using Hill -climbing and Annealing approaches for a few years now, and they work well . However, i haven’t had much luck with two key ciphers (like Quagmire III). Have you found related algorithms that work well with these types?

    Thanks

  2. Russ Post author

    In a word, yes. However, nothing that can be explained briefly in this response. If you have a reasonable crib, there are good techniques to break Quag IIIs. They work less well with the Q4, but can still be effective. Without a crib, they are less so, but still may work. I’ll send you a more useful description directly. I have your email.

Leave a Reply

Your email address will not be published. Required fields are marked *