Computer cipher solving – Lesson 5: Cribs

I use cribs in cipher solving in at least four ways: 1) Research; 2) Tetragram scoring; 3) Length scoring; 4) Restrictive coding.

Research: I use the crib to determine the subject matter of the plaintext. That allows me to guess other words in the text or recognize likely letter sequences or keys. For example, if the crib is “his beard,” I might be inclined to look for or at least recognize the words Lincoln or Hemingway in the plaintext or key. I can use the crib content to try to extend the crib. Google Ngrams, for example, will tell you what words most often follow a sequence of other words. Sometimes I find the full plaintext online, although I rarely do this except for Xenocrypts since I don’t want to spoil the fun of solving. For some of the toughest ones, though, it may be the only way. Pencil and paper solvers use cribs the same way.

Tetragram scoring: I mentioned this in an earlier lesson. At the beginning of the program I load the tetragram frequency data into an array. After doing that I add points for each tetragram that appears in the crib. In addition to making my hillclimber or other program recognize a better decryption, this has the advantage of not requiring significant additional run time. The extra points don’t have to be added in during each tetragram lookup, only once at the very start. This method has a minor drawback: sometimes the program may lock onto a false solution that happens to produce the crib, or some portion of the crib. This is rare and usually short-lived, and you can always rerun the program without entering a crib. The method has another disadvantage: it will not recognize a close match if the matching section contains no complete tetragram identical to one in the crib. For example, if the crib is “hisbeard” and a trial decryption produces “hixbeaqd,” every tetragram gets only its default score, so nothing tells the program that it has come close to the crib.
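A minimal sketch of this idea in Python, under some assumptions: the score table is held in a dictionary rather than an array, and the function names, the bonus of 2.0 points, and the default score of 0.0 are all hypothetical choices made for illustration.

```python
def add_crib_bonus(tetragram_scores, crib, bonus=2.0):
    """Return a copy of the score table with extra points for crib tetragrams.

    This runs once at start-up, so later lookups cost nothing extra.
    The bonus size is an assumed value, not a recommended one.
    """
    boosted = dict(tetragram_scores)
    letters = "".join(ch for ch in crib.lower() if ch.isalpha())
    # A set is used so a tetragram repeated in the crib is boosted only once.
    for gram in {letters[i:i + 4] for i in range(len(letters) - 3)}:
        boosted[gram] = boosted.get(gram, 0.0) + bonus
    return boosted


def tetragram_score(text, scores, default=0.0):
    """Sum the table score of every tetragram in a trial decryption."""
    return sum(scores.get(text[i:i + 4], default)
               for i in range(len(text) - 3))


# Toy table with a single entry; a real table would hold frequency data.
scores = add_crib_bonus({"hisb": 1.0}, "his beard")
print(tetragram_score("hisbeard", scores))  # exact crib: 11.0
print(tetragram_score("hixbeaqd", scores))  # near miss: 0.0
```

The second call shows the blind spot described above: the near miss shares no complete tetragram with the crib, so it earns no bonus at all.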

Length scoring: I’ve found this to be quite an effective improvement on tetragram scoring, although the two can be used together. Like tetragram scoring, it has the advantage of not requiring any additional programming for individual ciphertexts, but unlike tetragram scoring, it does use up a bit of extra run time. It solves the near-miss problem I described in the previous paragraph. What I do is slide the crib along the decryption and, at each position, count the number of letters that are in the same place in both crib and decrypt. In the example above, hisbeard and hixbeaqd have six letters in common. I then take the highest count found anywhere along the decryption, 6 in this example. I typically take that number, subtract 3 (provided it is at least 3), square the result, and add that to my score. In this example it would add 9 points, (6 - 3) squared, to the score, the equivalent of a high-scoring tetragram. I use this method mostly on cipher types that have longer cribs. It is good at keeping a hillclimber near the solution once it gets close. It works well with a wide variety of cipher types, but not as well on transposition types or combination transposition/substitution types like Bazeries or Myszkowskis. Those types may have the crib letters in close proximity to each other, but not in the right order, or with an extra letter or two between. I’ve considered writing something that will give extra points for those situations, but I haven’t been industrious enough to do that yet.
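The procedure can be sketched in a few lines of Python. The function names and the threshold parameter are hypothetical, and the sketch assumes the crib and the decryption are already normalized to the same case with spaces removed.

```python
def best_crib_overlap(decrypt, crib):
    """Slide the crib along the decryption and return the highest number
    of positions where crib and decryption hold the same letter."""
    best = 0
    for start in range(len(decrypt) - len(crib) + 1):
        window = decrypt[start:start + len(crib)]
        matches = sum(1 for a, b in zip(crib, window) if a == b)
        best = max(best, matches)
    return best


def length_bonus(decrypt, crib, threshold=3):
    """Bonus of (best - threshold) squared, or 0 when best is below threshold."""
    best = best_crib_overlap(decrypt, crib)
    return (best - threshold) ** 2 if best >= threshold else 0


print(best_crib_overlap("xxhixbeaqdyy", "hisbeard"))  # 6
print(length_bonus("xxhixbeaqdyy", "hisbeard"))       # 9
```

The extra run time comes from the sliding loop, which does roughly one letter comparison per crib letter per position in the decryption, on every trial decryption that gets scored.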

Restrictive coding: This term refers to the use of information from the crib to reduce the search space or execution time of a solving program. It can take many forms. For example, if you know the crib and its placement, you can write code into your solver that ignores trial decryptions that do not have that crib in that location, thus saving the time of scoring them and avoiding the problem of high-scoring false solutions crowding out the correct solution. I have a polybius square program I use to produce possible keys for many cipher types like Bifids, Two-squares, Playfairs, etc. It has a section in the source code where I program in the various letter relationships that I learned from the crib placement, such as requiring specific letters to be in the same row or column. Thus restrictive coding can be applied to the search for keys, not just to trial decryptions. The obvious disadvantage is that it requires programming for each individual cipher that is attacked, which can be fun but is also subject to the usual frustrations of bugs in the code and the time it takes to get to a solution. It helps if you can restrict the area of the code that is modified to a single compact module so that you don’t have to find (and later find again to undo) all the scattered modifications.
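As an illustration only, here is a rough Python sketch of both kinds of restriction, kept together in one small module. Every name in it is hypothetical, and it assumes a 5x5 polybius square stored as a 25-letter string read row by row.

```python
def crib_fits(decrypt, crib, position):
    """True if the trial decryption has the crib at the known position."""
    return decrypt[position:position + len(crib)] == crib


def same_row(key, a, b):
    """True if letters a and b share a row of the 5x5 square."""
    return key.index(a) // 5 == key.index(b) // 5


def same_column(key, a, b):
    """True if letters a and b share a column of the 5x5 square."""
    return key.index(a) % 5 == key.index(b) % 5


def key_allowed(key, row_pairs=(), column_pairs=()):
    """Check a candidate key against relationships deduced from the crib.

    The pair lists are the only part that changes from cipher to cipher.
    """
    return (all(same_row(key, a, b) for a, b in row_pairs)
            and all(same_column(key, a, b) for a, b in column_pairs))


key = "abcdefghiklmnopqrstuvwxyz"  # unkeyed square, no j
print(key_allowed(key, row_pairs=[("a", "e")], column_pairs=[("a", "f")]))  # True
print(key_allowed(key, row_pairs=[("a", "f")]))                             # False
print(crib_fits("xxhisbeardyy", "hisbeard", 2))                             # True
```

A solver would call crib_fits or key_allowed before scoring and skip any candidate that fails, so the per-cipher changes stay confined to the pair lists and the crib position.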