Monthly Archives: December 2020

The Running Key Cipher

The Running Key Cipher is very simple in concept, but very difficult to decipher. Strictly speaking, it is not a genuine cipher, since it cannot be deciphered uniquely; it is better thought of as a puzzle. It has a real cryptographic use, however: the ability to decipher a Running Key cipher is what allowed American counterintelligence analysts to decrypt Soviet messages during the Cold War (see the Venona Project).

To create a Running Key Cipher, take your message (plaintext) and break it in half, padding with an extra letter if there’s an odd number of letters. Use the first half as a key to encipher the second half as a Vigenere cipher. The resulting ciphertext will always be half the length of the padded plaintext. Here’s an example:

THISORGANIZATIONANDITSSYSTEMSPLI
TTHEAMERICANGRAPHOLOGYWORLDINTWO
MAPWODKRVKZNZZOCHBOWZQOMJEHUFIHW

The top row is the key (and the first half of the plaintext), the second row is the rest of the plaintext, and the third row is the ciphertext. Now try to decipher that ciphertext without knowing any of the text. I’ve written a program that brute-forces all possible combinations of key and plaintext for the first eight letters (MAPWODKR) to see which make sense. Here are some of the pairs of valid decryptions that resulted.

THESAMER TTLEORGA, HANDAMON FACTORWE, THESOLID TTLEASCO, THISISCO TTHEGLID.

All of these pairs, and many others, are valid-looking sentence beginnings and midsections. It takes skill and judgment to pick out the correct decryption. Note how any given letter or word, even if valid, can be part of either the key or the plaintext half. This is why Running Key cipher problems in the American Cryptogram Association are usually presented only with generous cribs. If you want to try your hand at a few to hone your skills or just pass the time, I’ve provided some below. They can be fun diversions.

ULLWLPXCZQWLQHXVHIGSZBEQUPD (crib: succeeding)

FWBTAFOIKDMCHBS (crib: swill)

FWYJIFWDECKGIXMVBXNOSAMS (crib: theproof)
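For the programmatically inclined, here is a minimal Python sketch of the encipherment and of “crib dragging,” the usual way to attack these puzzles when you have a crib. It is a simplified illustration rather than my actual brute-force program, and it assumes uppercase A-Z with no spaces or punctuation.

A = ord("A")

def encipher(plaintext):
    """Running Key encipherment: pad to an even length, then use the
    first half as a Vigenere key for the second half."""
    if len(plaintext) % 2:
        plaintext += "X"                     # pad odd-length messages
    half = len(plaintext) // 2
    key, rest = plaintext[:half], plaintext[half:]
    return "".join(chr((ord(k) + ord(p) - 2 * A) % 26 + A)
                   for k, p in zip(key, rest))

def drag_crib(ciphertext, crib):
    """Subtract the crib from the ciphertext at every offset. Wherever
    the crib truly belongs, the residue is the matching key or
    plaintext fragment, so scan the output for readable English."""
    for i in range(len(ciphertext) - len(crib) + 1):
        residue = "".join(chr((ord(c) - ord(k)) % 26 + A)
                          for c, k in zip(ciphertext[i:], crib))
        print(i, residue)

# Reproduces the ciphertext in the example above:
print(encipher("THISORGANIZATIONANDITSSYSTEMSPLI"
               "TTHEAMERICANGRAPHOLOGYWORLDINTWO"))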

Afterland by Lauren Beukes

Afterland by Lauren Beukes
My rating: 3 of 5 stars

Nicole (Cole) and her 13-year-old son Miles are South Africans stranded in the USA by an apocalyptic pandemic that struck three years earlier. The disease killed off almost all the males, so Miles, apparently immune, is a rare and precious commodity under government protection. All Cole wants is to get back home. They break out and begin a journey to the East Coast with Miles disguised as a girl. Meanwhile, her devious sister Billie wants to sell Miles, or at least his sperm, for big bucks. She teams up with some thugettes to chase them cross-country.

The author’s style is charitably described as irreverent, but more accurately as in-your-face. She writes almost as though challenging the reader, as if we are unwelcome eavesdroppers. The 90-page plot is stretched into 400 pages of irrelevant anecdotal digressions suspiciously resembling filler. The dialogue is peppered with gratuitous cursing, always a sign of a lazy author. The plot is wholly illogical, but the book is all about style, not substance. Some will find the author’s prose amusingly sardonic. Me, not so much, but she is at least imaginative. The basic concept is a new twist on the post-apocalyptic genre.


Review of YouTube TV

We recently cut the cord with our cable provider (AT&T U-verse) and signed up with YouTube TV (YTTV) – one of Google’s streaming TV options. YouTube TV costs about half of what U-Verse does in our area. That’s its biggest advantage. So what are the pros and cons?

Pros:

Without question, the biggest advantage is cost. If you already have wi-fi, there is only the subscription cost, which may be half of the cable or satellite subscription cost. It has all the major broadcast network channels – Fox, ABC, CBS, NBC, PBS – and all the major cable ones, too – ESPN, CNN, Hallmark, etc. The lineup of channels can change, so check what’s currently being offered if you have a favorite cable channel.

The main screen shows three options across the center top: Home, Library, and Live. You start with Home highlighted and move to your choice with the remote. In the far upper right is a search icon; the search function works well, though I found I rarely need it.

Library consists of your recorded shows. Select the Library button and a menu appears down the left side, with New in Your Library, Scheduled, Most Watched, Shows, and Sports among the top options. The screen shots to the right of each option show you the shows available there. Once you select one, say New in Your Library, you move to the right through the recordings, ordered from most recent on the left, and choose one. It will show you screen shots of all the recordings of that program (up to thirty days’ worth, I think). Recordings you’ve already watched are labeled as such, and all are labeled with the recorded time (e.g. “6 hours ago”). I find this a bit different, but no more cumbersome than choosing on U-verse. If the show is still being recorded, you are given the option of starting from the beginning or joining live. If you’ve paused a show, or turned off the TV and come back to it, the icon on the selection screen will have a red line at the bottom showing how far in you’ve already watched, and the show will resume where you left off.

It has a nice feature for sports coverage. For NFL and college football games, it gives you the option of watching key plays, either as a quick highlight reel after the game is over or, as we sometimes use it, as a way to catch up to live action. We usually record a game and only start watching a half hour or an hour after it has begun, so we can fast-forward through the time-outs, ads, etc. If a game gets too one-sided or dull, or if we just don’t have time to watch an entire game, we can select key plays to catch up to live time and then watch the end of the game live. I don’t know if it has this feature for other sports. Another really nice sports feature is that you can simply name a team you follow and it will record all of its broadcasts, regardless of channel; you no longer have to look up schedules and channels. This is true for all pro sports, I think, and at least for Division I college football. I don’t know whether this feature is available for non-sports content, like following a favorite actor or singer.

For the Live option, the menu setup is similar, with a list of channels on the left and, on each line, screen shots of what’s playing now and in the next few time slots. It’s pretty straightforward, although a bit more cumbersome than on cable, where you just press channel up/down or enter the channel number on the remote.

The picture quality is generally excellent. There are occasional pixelated strips that flash across the screen fleetingly, but this used to happen with cable, and with both our current and previous TVs, so I suspect the problem is with the network feed or our wi-fi provider (U-verse), not YouTube TV.

Cons:

Probably the biggest negative is that you may need a new TV. Our old Samsung smart TV was one generation too early and couldn’t run it, so we bought a new Samsung television just to be able to make the switch. We figure we’ll earn the cost back through savings in less than a year and have a bigger, newer television to boot. Be sure to check YTTV’s website for compatibility with the model you want to use.

Another irritation is that there is a lot more buffering of the signal than on cable. I don’t think it’s a router speed problem, because U-verse used the same router, although I suppose AT&T may give preferential bandwidth to its own service over Google’s. More likely, the problem is on Google’s end. The picture sometimes freezes while buffering, too, and won’t unfreeze unless I press rewind or fast forward. This can be irritating and interrupts the flow of a program. It seems to happen more with recorded live shows, especially ones where we’re catching up to live time.

The most noticeable shortcoming for me is the way it fast-forwards and rewinds. Unlike cable, where you see a more or less continuous stream of sped-up video as you scroll forward or back, with YTTV you get only still screen shots at fifteen-second intervals of the recording. At times you get only a black screen, so you can’t tell whether you’ve reached the point you’re seeking. On the plus side, the timing is precise per click, so I’ve learned how many clicks it takes to get past most ad breaks and between football plays. It’s often faster than cable scrolling for some shows, but overall scrolling is easier on cable.

Accessing closed captioning is slower and clunkier than with cable. On my old cable remote, a single button would turn closed captions on or off. With YTTV you have to push buttons at least four times, assuming you remember the correct sequence, and the exact sequence depends on the app (YTTV, Netflix, Prime), on your television manufacturer, and sometimes on whether the show is dubbed or pre-captioned by the producer.

Starting up YTTV is also slower, at least on the model of TV I have. Samsung does not ship with the YTTV app installed, at least my model didn’t have it; I had to go to the app store (it’s an Android-based system) and download the free app. Once installed, it sits in a whole row of pre-loaded apps, e.g. Netflix, Prime, Hulu, etc., along the bottom of the home screen. The only problem is that it’s not visible unless you scroll all the way to the right, and Samsung does not allow you to delete any of those other apps, even if you don’t use or want them, or to rearrange the icons. So every time I turn on the TV I have to wait several seconds for the boot-up process, then scroll all the way to the right, past a dozen or so icons, until I get to YouTube TV at the far right end, then select it. This is not exactly Google’s fault, although I’d bet that if they paid Samsung what Netflix and Hulu do, the app could get a better spot on the home screen and come pre-installed.

Another feature some people might or might not like is that YTTV promotes alternate services over the one you prefer. For example, we normally watch NBC Nightly News through our local NBC station and have it set to record, which YTTV does just fine. But when you go to recorded shows and select that choice, YTTV displays NBC News Now first. That is a streaming service, not the regular over-the-air broadcast: a prepackaged set of stories recorded for an earlier edition of the news using only the national feed. I can still choose the local station one icon off to the right, but it seems to me that should be the first choice. The local one is more current by an hour or three, since I’m on the west coast and News Now is mostly taken from the east coast version of the news. Also, if there is a breaking local story important enough, the local station will have broken into the national feed, and you’ll see that only on the local station’s version, not on the pre-recorded News Now. Similarly, the PBS station feed on YTTV is from a station hundreds of miles away instead of the local one. On the other hand, I think News Now and similar streaming news services have fewer ads, although I can’t guarantee that, and you may or may not be able to fast-forward through them. Some ad breaks even on Amazon Prime shows are now non-skippable, and the same may happen on YTTV network shows.

In short, navigation in general is more cumbersome than with cable and you are dependent on being able to retrieve streaming content quickly from the cloud instead of your local DVR. None of these drawbacks has made me regret the choice to switch, but there is a learning curve.


Vaccine interest across the U.S.

If you have been following the Covid-19 news, you know that the Pfizer vaccine has been authorized by the FDA and is being administered in the U.S. right now. You probably also know that the Moderna vaccine has been submitted to the FDA and is expected to be authorized within the week. The two vaccines are similar, according to reports, but the big difference is that the Pfizer one requires super-cold freezers and dry ice packing for storage and shipping, while the Moderna one can use conventional freezers and trucks.

This led me to take a look at which of these vaccines is of most interest to the public. Here are two charts taken from Google Trends. The top one compares searches for the two company names over the last year. The bottom one shows where people have been searching for dry ice over the last thirty days.

Bear in mind that the top chart takes in a long period when it was uncertain which company’s vaccine would be first to market. Once Pfizer was approved first, both in the U.K. and the U.S., searches for Pfizer greatly outnumbered those for Moderna, so the top chart would be all blue for almost any shorter time frame. It will be interesting to see if that changes in a week or two. I find it interesting that, for the most part, it was the more populated states that had more interest in Moderna. I have no explanation for that, nor for the fact that Wyoming joins them.

The bottom chart makes more sense to me. The dark blue states are where the interest was greatest on a relative basis. The greatest interest seems to be in more rural states with low population density. I expect those states have few of the special freezers needed for the Pfizer vaccine and have greater need for dry ice to store and transport it. Hospitals and nursing homes in those states are probably scrambling to see where they can obtain sufficient dry ice. This trend, too, could change once the Moderna vaccine is approved and begins shipping.
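For anyone who wants to pull similar data programmatically rather than from the Google Trends website, the unofficial pytrends Python package can fetch it. Here is a minimal sketch; the keywords and timeframes are my approximations of the queries behind the charts, not necessarily the exact ones.

from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US")

# Pfizer vs. Moderna search interest over the past year
pytrends.build_payload(["Pfizer", "Moderna"], timeframe="today 12-m", geo="US")
over_time = pytrends.interest_over_time()

# Relative search interest in dry ice over the past month, by state
pytrends.build_payload(["dry ice"], timeframe="today 1-m", geo="US")
by_state = pytrends.interest_by_region(resolution="REGION")
print(by_state.sort_values("dry ice", ascending=False).head(10))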

Grace Is Gone by Emily Elgar

Grace Is Gone by Emily Elgar
My rating: 3 of 5 stars

The setting is a small Cornwall village. Cara goes to her friend Grace’s house only to find the severely disabled teenage girl missing and her mother killed. The chief suspect is Grace’s father, Simon, who is mentally unstable and estranged from the family. Cara is determined to find Grace. She is aided by Jon, a journalist who took Simon’s side in an article about the family tragedy years earlier and has suffered the calumny of the town and the press for it. Jon, tritely, is also having marital troubles, and he neglects his parental duties as he delves deeper into the case.

It seemed like a good setup, but I can barely squeeze out three stars for it. None of the characters are likeable and the writing is pedestrian at best. The police seem to be doing almost nothing while Jon and Cara more or less stumble about and somehow figure out what’s going on without any real sleuthing. Grace’s diary plays a big role, yet the police totally miss it in their crime scene search. Entries from it appear normally at first, i.e., one of the characters reads a page and it is printed so we can see it, but later entries just appear amid chapters without explanation and apparently without the characters becoming aware of their content. I found this clumsy and confusing.

There is a “big reveal” about two-thirds of the way through the book, but the author telegraphed it so heavily beforehand that it would be hard to be surprised. I was planning a two-star review until the very end, when the author partially redeemed herself with a twist that made the story both more credible and somewhat more nuanced.


The Whitest and Blackest Names in America

I was looking at United States Census data again recently and noticed some interesting ways of examining and analyzing it. You can download the data yourself directly from the Census Bureau here: Surnames occurring 100 or more times in the 2010 Census. (The 2020 data won’t be out for quite some time.) I sorted it by race (self-identified), and here are the surnames with the highest percentage white among those held by at least 10,000 individuals. All were 96.99% white or higher.
STOLTZFUS
BYLER
HERSHBERGER
TROYER
YODER
BURKHOLDER
HOSTETLER
GRABER
MAST
ROUSH

The dominance of German- and Jewish-sounding names on this list continued for several hundred more entries. Now for the blackest names by percentage, again with a cutoff of at least 10,000 individuals:
WASHINGTON
BANKS
JOSEPH
JACKSON
WILLIAMS
ROBINSON
COLEMAN
HARRIS
SIMS
DIXON

These percentages range roughly from 39% to 87% black. Presumably both lists are shaped largely by the history of slavery, with colonial slaveholders giving their slaves their own surnames, or possibly, in some cases, freed slaves taking the surname of a well-known white person. If I restricted the list to only the 1,000 most common names, some Irish names like O’Connell appeared in the whitest list, but most were still Germanic. Germans and European Jews tended to arrive in the U.S. in large numbers only after slavery ended, and they generally settled in free states or territories.
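If you want to replicate the sorting, a few lines of Python with pandas will do it. A minimal sketch, assuming the file’s published layout with name, count, pctwhite, and pctblack columns, and with “(S)” marking suppressed values:

import pandas as pd

df = pd.read_csv("Names_2010Census.csv", na_values="(S)")
common = df[df["count"] >= 10000]            # at least 10,000 individuals

whitest = common.sort_values("pctwhite", ascending=False)
blackest = common.sort_values("pctblack", ascending=False)
print(whitest[["name", "count", "pctwhite"]].head(10))
print(blackest[["name", "count", "pctblack"]].head(10))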

You can also use the data to find out how common your surname is, or your mother’s and grandmother’s maiden names.

Parsing plaintext concluded

One more post on the problem of dividing plaintext and then I’ll leave the topic. I decided to try two more ways to divide text into words. The first method was a total failure: hillclimbing. That consisted of randomly choosing dividing points and testing how many valid words appeared between the spaces, then repeatedly making one or two random changes, checking whether more words were produced, and either keeping the new dividing points or reverting to the previous set. I won’t discuss the details, but take my word for it: it bombed.
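For the curious, here is a rough sketch of the hillclimbing idea, just to make the technique concrete; the particulars (the number of dividing points, the scoring, the perturbation) are placeholders rather than what my program actually did.

import random

def count_valid(text, breaks, WORDS):
    """Score a division by counting valid words between break points."""
    pts = [0] + sorted(breaks) + [len(text)]
    return sum(text[a:b] in WORDS for a, b in zip(pts, pts[1:]))

def hillclimb(text, WORDS, n_breaks=5, iters=10000):
    breaks = set(random.sample(range(1, len(text)), n_breaks))
    best = count_valid(text, breaks, WORDS)
    for _ in range(iters):
        trial = set(breaks)
        trial.remove(random.choice(sorted(trial)))   # move one point...
        trial.add(random.randrange(1, len(text)))    # ...somewhere random
        score = count_valid(text, trial, WORDS)
        if score > best:                             # keep improvements
            breaks, best = trial, score
    return sorted(breaks)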

The other method is to start at the beginning and shrink the string until a valid word is left at the front. For example, if the text you are parsing is “mydogatemylunch”, the program first checks whether the whole string is a word. Since it isn’t, it crops the last letter, tests again, and so on until only “my” is left, which is a valid word. It saves that, then starts from the next letter (d in this case) and repeats until all the letters are used. If no word is found at a given position, that single letter is saved as a “word” and skipped over.
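A minimal sketch of that shrinking approach, assuming WORDS is a set of valid words loaded from a word list:

def parse_greedy(text, WORDS):
    parsed = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):     # crop letters from the right
            if text[i:j] in WORDS or j == i + 1:
                parsed.append(text[i:j])      # lone letters kept as "words"
                i = j
                break
    return " ".join(parsed)

print(parse_greedy("mydogatemylunch", {"my", "dog", "ate", "lunch"}))
# -> my dog ate my lunch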

Simply put, the method I described in the previous two posts starts with valid words from one or more lists and checks whether they appear in the subject text. This new method takes sections of the subject text and checks whether they are valid words. Neither method is perfect. After testing numerous trial texts, it is clear to me that the previous version (Method A) is better than this new one (B). There are some texts where B performs better, and some where they’re equally good or bad, but in most cases A outperforms B. Here are some examples.

Both A and B got this perfect: oneostricheggwillfeedtwentyfourpeopleforbreakfastthejoyofcooking

A got this one perfect: slowandsteadywinstherace. B’s result: slow ands tea d y wins the race. (“ands” is a valid word as in “no ifs, ands, or buts”).

Both got this wrong, but differently: wedrinkallwecantherestwesell
A: we drink all we c anther est we sell. B: we drink all we cant heres t we sell

Lastly, one where B outperformed A:  asinthesongfreebirdcouldyou
A: a sin the …   B: As in the …

This exercise has given me a new appreciation for those pros who write autocorrect software. Of course they use AI and have massive data troves to mine, while I used just a few dozen test sentences. One good thing about trying this new method is that I learned how to determine whether a string is a valid word much more quickly than before. In the past I was taking a file of words and sequentially checking whether each matched my test string. That’s reasonably fast if the word is early in the list, but not otherwise. I was using lists ordered by frequency so that the most-used words would be found quickly, but it still involved a lot of unnecessary comparisons. For this new method I discovered a search technique that is probably old hat to programmers, but new to me. Basically, you start in the middle of an alphabetized word list and compare strings; if the test string is less than the list word, you do the same with the first half of the list, otherwise with the second half; you keep cutting the search space in half until you match or can’t shrink it any further.
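What I stumbled onto is the classic binary search. A minimal sketch, assuming sorted_words is an alphabetized Python list:

def is_word(s, sorted_words):
    """Binary search: repeatedly halve an alphabetized word list."""
    lo, hi = 0, len(sorted_words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_words[mid] == s:
            return True
        if s < sorted_words[mid]:
            hi = mid - 1          # keep searching the first half
        else:
            lo = mid + 1          # keep searching the second half
    return False

Python’s built-in bisect module packages the same halving, and a hash set does the lookup with no searching at all, at some cost in memory.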

Parsing undivided plaintext

In my last post I gave a few examples of my attempts at writing a program to divide up undivided text into words. Since then, I’ve been working on the program. It’s doing better. Here are examples from that post and how the program divides them now.

what is christmas it is tenderness for the past courage for the present hope for the future.

they were indeed a queer looking party that assembled on the bank the birds with draggle d feathers the animals with their fur clinging close to them and all dripping wet cross and uncomfortable

Not perfect, but much better. So how did I approach this task? I’m not going to provide code, just discuss my thinking process. I decided to start with long words first, since it is relatively rare for long words to appear accidentally, that is, by the juxtaposition of smaller words. The opposite is clearly not true: small words appear inside longer words all the time. Separating out small words like “to” and “in” first would break up almost every sentence incorrectly.

I used a word list to go through the text, inserting spaces before and after every found word, starting with words of length 24 and working down. Whenever it found a word, the program would effectively blank out that stretch so it could not be reused when searching for words farther down the list. I also used word lists ordered by frequency, so that the words most likely to appear were found before obscure words could take up that space. In my first iteration, that’s about all I did, and the examples I gave in my last post show the limitations of such an approach.
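To make that first iteration concrete, here is a rough sketch; the words_by_length structure (mapping a length to a frequency-ordered list of words of that length) is a stand-in for my actual lists, and the correction passes described below are omitted.

def parse_longest_first(text, words_by_length):
    """First pass: find long, frequent words first and blank out each
    match so shorter words cannot reuse those letters."""
    mask = text
    boundaries = set()
    for length in range(min(24, len(text)), 1, -1):
        for word in words_by_length.get(length, []):  # frequency order
            i = mask.find(word)
            while i != -1:
                boundaries.update((i, i + length))
                mask = mask[:i] + "*" * length + mask[i + length:]
                i = mask.find(word, i + length)
    # rebuild the text with spaces at the recorded word boundaries
    return "".join(" " + c if i in boundaries and i > 0 else c
                   for i, c in enumerate(text))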

My next step was to identify common errors this approach produced, such as “I twill” instead of “it will.” Twill is a five-letter word, so it’s found before the more common four-letter word will, and that leaves only the letter i. That division looks fine to the program, since I and twill are both valid words. I created a list of such examples, mostly involving very small words, such as “ha snot = has not”, “o four = of our” and so forth. The program checks that list after its initial parsing and fixes anything occurring there. Creating that list was a time-consuming process and is still ongoing. The only way to do it is to run many examples through the program and judge by eye, and it’s not always easy. For example, is “a man” better than “am an”? Should it be changed or not? I use Google Ngram as guidance for such hard cases. I call this my “fixit” list.
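A sketch of how such a fixit pass can work, seeded with entries from the examples above:

# Known bad divisions and their corrections, applied after parsing.
FIXIT = {
    "i twill": "it will",
    "ha snot": "has not",
    "o four": "of our",
}

def apply_fixit(parsed):
    for bad, good in FIXIT.items():
        parsed = parsed.replace(bad, good)
    return parsed

print(apply_fixit("i twill be"))   # -> it will be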

This improved things, but many errors still appeared. Most common were those where the “S form” of a word (i.e. the plural of a noun or the third person singular of a verb) was found when the correct form is the one without the final S. I wrote a routine to find and try to correct such cases. Words with a final S were easy to find: I went through the separated text pair by pair, and whenever word 1 ended with an S, I tested the frequency of that word followed by word 2, then removed the S from the first word, tacked it onto the front of word 2, and tested the frequency of that pair. Whichever scored best, I kept. The data for such word-pair frequencies can again be obtained from Google Ngram. This doesn’t always work. For example, “westernsquare” initially breaks into “westerns qua re,” all valid words. Moving the S from “westerns” to “qua” yields “western squa,” not “square,” so the parsing does not change. “Collisionsport” breaks up as “collisions port” without this step, but the program successfully changes it to “collision sport.”
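A sketch of the S-swapping routine, where pair_freq() is a stand-in for whatever lookup you build over the Google Ngram word-pair data:

def fix_s_endings(words, pair_freq):
    """Try moving a trailing S across each pair boundary and keep the
    more frequent pair, e.g. "westerns qua" vs. "western squa"."""
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i + 1]
        if w1.endswith("s") and len(w1) > 1:
            if pair_freq(w1[:-1], "s" + w2) > pair_freq(w1, w2):
                words[i], words[i + 1] = w1[:-1], "s" + w2
    return words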

Lastly, I did the same thing for every low-scoring pair, with an additional test. I set an arbitrary limit, X, and tested every successive pair. If a pair’s frequency was below X, I tried shifting the last letter of the first word to the beginning of word 2, and also shifting the first letter of word 2 to the end of word 1, and kept whichever of the original and the two variations scored highest. The improvements have been subtle, but real. The downside is that it slows down parsing tremendously: without this final word-pair step, parsing is essentially instantaneous, but with it a sentence often takes ten or fifteen seconds. I will continue to fiddle with the value of X. The higher I set it, the more word pairs get tested and the longer the program takes, so I have to weigh speed against accuracy.
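And a sketch of this final pass, with the same pair_freq() stand-in and X as the tunable threshold:

def improve_pairs(words, pair_freq, X):
    """For any pair rarer than X, try shifting one letter across the
    boundary in each direction and keep the best-scoring variant."""
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i + 1]
        if pair_freq(w1, w2) >= X:
            continue                     # common enough; leave it alone
        candidates = [(w1, w2)]
        if len(w1) > 1:
            candidates.append((w1[:-1], w1[-1] + w2))  # shift letter right
        if len(w2) > 1:
            candidates.append((w1 + w2[0], w2[1:]))    # shift letter left
        words[i], words[i + 1] = max(candidates, key=lambda p: pair_freq(*p))
    return words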

I’ll continue to add common errors to the fixit list, and missing words, including proper nouns, to my word lists, though longer lists add time. I also found it necessary to remove some words, like “doth.” It’s a valid word, but rarely used today, and it causes parsing errors with “do the,” “do they,” “do those,” etc.