Tridigital cipher solving algorithm continued

In my last post, I mentioned key pruning. What I mean by that is halting the recursion (and allowing it to go back a level and continue with another word) based on the nature of the keyblock. In particular, the block always has three rows. This means any given digit may have at most three letter equivalents. Thus I put a testing routine in the recursion routine to look for conflicts of this sort. If the word being tested passes the first conflict test, e.g. THEIRSOME in the last post, it then goes through a routine to count how many letters are represented by each digit. THEIRSOME is represented by 107735607. The 7 enciphers E and I while the 0 enciphers H and M. Since no digit represents four or more letters, this is a possible solution so far. This is not surprising since we’re only on level 2. If one of the digits had represented four different letters, the recursion would end at that level, the test word SOME would be rejected, and the next word in the four-letter word list would be tried.
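The counting test described above can be sketched in a few lines of Python. This is only an illustration, not my actual program: the function names and the string representation of the digits are assumptions made for the example.

```python
from collections import defaultdict

# The keyblock has three rows, so a digit can stand for at most three letters.
MAX_LETTERS_PER_DIGIT = 3

def letters_per_digit(plaintext, digits):
    """Return a dict mapping each digit to the set of letters it enciphers."""
    mapping = defaultdict(set)
    for letter, digit in zip(plaintext, digits):
        mapping[digit].add(letter)
    return dict(mapping)

def passes_count_test(plaintext, digits):
    """True if no digit represents more than three different letters."""
    if len(plaintext) != len(digits):
        return False
    groups = letters_per_digit(plaintext, digits)
    return all(len(group) <= MAX_LETTERS_PER_DIGIT for group in groups.values())

# The level-2 example from the text: THEIR + SOME against 10773 5607.
groups = letters_per_digit("THEIRSOME", "107735607")
print(sorted(groups["7"]))   # the letters sharing digit 7
print(sorted(groups["0"]))   # the letters sharing digit 0
print(passes_count_test("THEIRSOME", "107735607"))
```

In a recursive solver, a False result here would be the signal to drop the current test word and move on to the next one in the word list.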

This example is so short that there are thousands of valid solutions (which is why Tridigital is a hobby cipher, not a real-life one). One my program found was “had some by their,” which even makes sense as a phrase. But with more words, especially longer ones, incorrect combinations soon force one or more digits to represent more than three letters. This typically happens at recursion level three or four.

The other technique I use is scoring. Since all solutions that pass the conflict testing consist of valid words, the usual scoring techniques such as tetragram frequency or word list scoring don’t work. So I reorder each potential solution array back to its original order and test pairs of adjacent words for frequency. I use Google N-gram data to determine this. The better the score, the higher in the display the solution is placed. Although this doesn’t speed up processing per se, it makes it much easier for me to spot a correct solution, or at least a likely correct segment in the solution, early on. I don’t have to continuously scroll through dozens or even thousands of possible solutions as the program runs. The best ones are right up at the top.
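Here is a rough Python sketch of word-pair scoring. The scoring formula (a sum of log counts) and the function names are assumptions for illustration, and the counts in the usage lines are made-up placeholders, not real Google N-gram figures.

```python
import math

def bigram_score(words, bigram_counts):
    """Sum a log-scaled frequency for each adjacent pair of words.

    bigram_counts maps (word1, word2) tuples to a count; unseen pairs
    contribute zero. The log scaling is an assumption, not a requirement.
    """
    return sum(
        math.log10(bigram_counts.get((first, second), 0) + 1)
        for first, second in zip(words, words[1:])
    )

def rank_solutions(solutions, bigram_counts):
    """Return the solutions sorted so the best-scoring one comes first."""
    return sorted(
        solutions,
        key=lambda words: bigram_score(words, bigram_counts),
        reverse=True,
    )

# Placeholder counts, invented purely to exercise the code.
toy_counts = {
    ("had", "some"): 1000,
    ("man", "some"): 10,
    ("some", "by"): 100,
    ("by", "their"): 10000,
}
found = [["man", "some", "by", "their"], ["had", "some", "by", "their"]]
for words in rank_solutions(found, toy_counts):
    print(" ".join(words))
```

Because the list is re-sorted by score, the order in which the recursion happened to find the solutions does not affect the order in which they are displayed.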

As an example, the demo problem produces these possible solutions immediately:
had some by their
man some by their
arm some by their
had some up their
may some up their
man some up their

The most natural-sounding one is on top because the three two-word combinations (had some, some by, by their) score the highest in frequency. One of the other solutions may actually have been the first one found, but the display is in order of frequency score. Of course, we know none of these are the correct solution, but they are all valid.

This in turn allows another shortcut. If I spot a solution, or even a partial solution, I can abandon the program and use the digit-letter equivalencies revealed to run through my word lists in a separate program to identify any keywords that can produce a keyblock with the same letters in the same columns, such as E and I in one column, H and M in another. Typically only about four or five words have to be solved before I have enough digits to find the original keyword and hat. Once I have those, I can solve any incorrectly solved or not yet solved parts by hand.
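The keyword search might be sketched as below. This is not my actual program. It assumes the keyblock is a keyed alphabet (keyword letters first, then the unused letters in order) written row by row across nine letter columns, and all the function names and the `width` parameter are hypothetical. Change `width` if your keyblock layout differs.

```python
import string

def keyed_alphabet(keyword):
    """Keyword letters (duplicates dropped) followed by the remaining alphabet."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def column_of(keyword, width=9):
    """Map each letter to its column when the keyed alphabet is written row by row.

    width=9 is an assumed layout for the letter columns.
    """
    return {ch: i % width for i, ch in enumerate(keyed_alphabet(keyword))}

def matches_groups(keyword, groups, width=9):
    """True if every group shares one column and no two groups share a column."""
    cols = column_of(keyword, width)
    group_columns = []
    for group in groups:
        columns = {cols[ch] for ch in group}
        if len(columns) != 1:
            return False
        group_columns.append(columns.pop())
    return len(set(group_columns)) == len(group_columns)

def candidate_keywords(word_list, groups, width=9):
    """Filter a word list down to keywords consistent with the known groups."""
    return [word for word in word_list if matches_groups(word, groups, width)]

# Groups recovered from a partial solution, e.g. E with I and H with M.
known_groups = [("E", "I"), ("H", "M")]
# candidate_keywords(my_word_list, known_groups) would then list possible keywords.
```

Each additional solved word adds letters to the known groups, so the list of surviving keywords should shrink quickly.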
