Monthly Archives: April 2022

Origin: A Genetic History of the Americas by Jennifer Raff

Origin: A Genetic History of the Americas by Jennifer Raff
My rating: 4 of 5 stars

Raff sets forth alternative theories about the origins of Native American populations and the evidence supporting or weakening each. She is a geneticist, not an archaeologist, and focuses on the genetics, but there is a lot of archaeology, too. I enjoyed the detailed descriptions of her lab work. Recovering, replicating, and analyzing ancient DNA is a much more daunting and labor-intensive task than I had imagined. Dealing with modern DNA samples is less so, but still an impressive endeavor. The book is aimed at the scientifically inclined and educated lay reader. Be prepared for a great many technical and geographic terms, including ethnic ones that look odd and unpronounceable to most Americans. I also learned a lot about the various discoveries in the field, and I'm encouraged that many of the disputes between scientific factions will be mostly resolved in the near future.

The same cannot be said about the various indigenous peoples’ stories. The author bends over way too far, and spends way too much time, telling us all to respect these various traditions and myths (which she calls origin stories). She undercuts her scientific credibility in doing so. For example, in Chapter 5 she describes the elaborate procedures used to garb up and sanitize the workbench which she calls a “specific mindfulness” that “acknowledges responsibility for past transgressions and unscrupulous methodologies.” No. Sorry. The gowns and bleach and controlled airflow are to prevent contamination of the DNA, not to admit to being a racist. The sins of the father are not visited on the son and all that. The duty of a scientist is doing science, not baby-sitting the fictions of less educated people. Perhaps that sort of “woke” mindset, or pretense of one, is a mandatory prerequisite to working in the field, since cooperation from various tribes and academia in general is necessary, so it’s forgivable, but I notice that other reviewers had the same reaction I did.

One last peccadillo is worth mentioning: it has not been carefully proofread. I noted several errors like “adler” trees and doubled words. Even so, it is generally well-written and educational. I found it interesting.

View all my reviews

Pervasive illiteracy

Every day I see misspellings, wrong words, and grammar errors. My wife usually points out two or three amusing ones each day, too. They're everywhere. Today I heard or read four within a span of a half hour, so I thought I would share my despair.

  1. A real estate mailer advertised its resedential properties (a home for sedentary people?)
  2. A news report said a court printer had mixmatched names and juror numbers (almost makes sense)
  3. A radio pundit declared that he was reading between the tea leaves (a mixmatched metaphor?)
  4. A published hard copy science book I’m reading mentioned an ancient pine and adler forest (discovered by the Austrian psychotherapist?)

Sigh.

Electrify America

I have an electric vehicle (EV), my second, in fact. My first, a 2011 Nissan Leaf, could charge from several charger formats at different levels: Level 1 (120 V house current, normal plug), Level 2 (240 V, J1772 plug), and Level 3 (480 V, CHAdeMO plug). Levels 1 and 2 are alternating current (AC), while Level 3 is the much faster direct current (DC). The Leaf's theoretical fastest charge rate is 46 kW, but I never got it to charge that fast; the fastest I ever saw was about 19 kW. I sold the Leaf. Now I have a Volvo XC40 Recharge. It has a much bigger battery and longer range. It can also charge much faster. This is important because I am about to take a long road trip.

Today I tried charging at an Electrify America (EA) charger. It went well. EA is the network of EV chargers set up by Volkswagen. That company is trying to rectify the major booboo that got it in trouble with Uncle Sam. You may recall that a few years back it was discovered that VW had installed a software defeat device that let its diesel cars cheat on US emissions tests. As part of the settlement of the charges, VW agreed, among other things, to invest billions in zero-emission vehicle infrastructure in the US. That commitment became Electrify America, which has been ramping up efforts to spread EV charging stations in the locations necessary to encourage EV sales. Increasing the number of EV models VW sells is part of the same push.

I don't have a VW, but I appreciate the effort. It turns out that EA chargers not only charge at Level 3; their units are typically 150 kW or even 350 kW. My Volvo can only charge at up to 150 kW, and that's what I did today. That still turns out to be more than six times the charging rate I typically managed with my Leaf. I was impressed.

The charging station was at a large shopping center anchored by a Walmart. There were six stations, and all six were full when I got there. Within five minutes a car pulled out and I was able to pull in to charge. I went from 56% full to 90% in a half hour. I probably could have gone from 20% to 80% in about the same time, because charging is much faster at a low state of charge and slows down a lot as the battery nears full.

I had accounts and cards for other EV charging networks (ChargePoint, Blink, EVgo), and at first I was reluctant to add yet one more. But I've come to learn that those other stations aren't always working, often have only one or two chargers, and give you no advance knowledge of whether a charger is working or in use. They also typically max out at 50 kW. The EA installation was modern and clean-looking, and all six units were working. There are also more EA stations along I-5, the route I'll be taking on my trip next week. So I'm glad I chose to join EA, and I applaud VW.

I also got to see a bunch of other EVs, some for the first time: two different Kia models, a Porsche Taycan, a Tesla (not using a Tesla charger!), a Rivian, and a VW ID.4. Cool!

The Replacement Wife by Darby Kane

The Replacement Wife by Darby Kane
My rating: 3 of 5 stars

Elisa suffers from the trauma of a workplace shooting a few months earlier, with lingering dizziness and nausea. The author does everything possible to make us consider her the now-trendy-in-novels "unreliable narrator." She is convinced that her brother-in-law Josh killed his first wife and probably his second, Abby, who is now missing. Josh insists Abby left him. Elisa's husband, Harris, takes Josh's side and tells her she needs therapy. Then we find out there was yet another wife for Josh before the "first" one. That's the set-up.

Is she paranoid? I won't spoil it. Elisa does some illogical things and clearly isn't thinking straight, but she is frustrated at Harris's disloyalty and is convinced she is being gaslighted by Josh. Oh, yes, gaslighting is another trendy shtick in thrillers, especially in these dime-a-dozen unreliable-narrator books. The author uses a lot of tricks to stretch things out, like having Elisa get interrupted every time she's on the brink of something: reading a key text or telling Harris about a key incident. The reader is kept waiting so often you'll be tempted to just skip to the last chapter to see how it all comes out.

The writing is fluid enough, even if it relies on a lot of cheap tricks, and the plot is no worse than the others that came before it and which it shamelessly copies. It passed the time well enough that I can squeeze out three stars, although I can't really recommend it.

View all my reviews

Varsity Blues weirdness – Amin C. Khoury

The last two parents charged in the Varsity Blues college admissions scandal recently agreed to plead guilty, or so press reports say. Gregory and Amy Colburn of Palo Alto face prison time and are scheduled to be sentenced today. I say they are the last of the parents’ cases to be wrapped up because that’s what the press reports say and the DOJ website on the case shows no others. But that’s not true. What about Amin Khoury?

Khoury was indicted in September 2020 for allegedly paying a bribe to get his daughter into Georgetown as a tennis recruit. He was one of many in a similar position in the large-scale scandal. His name appeared regularly. As of last December he was mentioned in news reports as one of the last parents still fighting the charges. Then all mention of him disappeared. The DOJ website no longer lists him among the defendants. The only other parent who wasn't convicted, either through a plea deal or at trial, was a Miami businessman named Zangrillo, who was pardoned by President Trump on his last day in office. Another bribe, maybe? But Zangrillo still appears on the DOJ site as having been charged and then pardoned. So why is Khoury no longer listed there? I'm certain he was earlier. It's weird. If the charges were dropped, that should still be shown on the DOJ site.

I checked for Khoury's Wikipedia page but found only a mention of his father, Amin J. Khoury, reportedly a billionaire. It is the son, Amin C. Khoury, who was indicted, and there's no mention of the case on the father's page. The son doesn't have a Wikipedia page of his own, but he does have a blogspot page. I found this:

 

Amin C. Khoury

Listed links:

Website: http://aminckhoury.blogspot.com/
LinkedIn: https://www.linkedin.com/in/amin-c-khoury-223b3b10a
Twitter: https://twitter.com/AminCKhoury1
Others: https://www.facebook.com/Amin-C-Khoury-140545602972392/timeline

He was suspended from Twitter and his blogspot posts have also been blocked for inappropriate content. The other pages don’t exist now.

I contacted the public information office of the U.S. Attorney in Boston but got no response. However, I was able to find on the court’s public docket that Khoury is scheduled to appear for a pretrial conference today in the courtroom of Judge Patti B. Saris. Obviously he’s still an active case and he’s still resisting the charges. Why is the press overlooking him? Why is DOJ so close-mouthed about his case? If he goes to trial, I suspect it will be an interesting case to follow. All the coaches have either pled guilty, agreed to do so, or been convicted. Of course the big fish, Rick Singer, the mastermind turned informant, has yet to face final sentencing.

Bonecrack by Dick Francis

Bonecrack by Dick Francis
My rating: 4 of 5 stars

My wife likes what she refers to as "factory movies." We used to see those in school as kids. The term as she means it is a general one that applies to any movie or TV show that explains how things are made or, even more generally, takes an inside look at a field of endeavor. That's the main appeal of this book for me. All the inside scoop on thoroughbred horse training and racing was fun to learn. The writing was well done, but the plot really was all too predictable to rank high as a mystery or thriller. In fact, there was no mystery at all. Neil, the main character, is beaten by a very bad man who threatens even more harm if his son is not allowed to be the lead jockey on the best horse in England. The son is an arrogant little snot … at first. Then he and we readers are educated on the proper way to train horses and learn the jockeying ropes from the ground up. You can pretty much guess the rest. The ending was rather neat and lazy, but inevitable. I enjoyed the book, but now that I've seen the factory movie, I don't have any desire to read another Dick Francis book.

View all my reviews

Computer cipher solving – word list scoring part 2

In my last post I discussed how to break up undivided text strings into words in order to score trial decryptions. In that post I mentioned blanking out words that have been found so as not to double count letters. I'll provide an example to clarify: suppose your trial decryption is "HBSTUMBLEDTXERE." Using the length-based approach I recommended, your algorithm would first find the word STUMBLED when it gets to length 8. What you should do is then convert the test decryption to HB--------TXERE for the next loop. The final score would be 8, or, if you have ERE in your word list, 11. If you leave the trial decryption in the original form, the algorithm would find TUMBLED, STUMBLE, TUMBLE, BLED, and LED as it tested shorter word lengths. The same letters would be counted multiple times and artificially inflate the score. What you're really looking for is the percentage of the decryption that forms words.
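Here's a minimal sketch of that blanking step in Delphi (the procedure name is my own, illustrative):

```pascal
// Blank out a matched word so later passes over shorter word lengths
// can't re-count its letters. Delphi strings are 1-indexed.
procedure BlankWord(var Trial: string; Start, Len: Integer);
var
  i: Integer;
begin
  for i := Start to Start + Len - 1 do
    Trial[i] := '-';   // '-' will never match a dictionary word
end;
```

With the example above, BlankWord(S, 3, 8) turns HBSTUMBLEDTXERE into HB--------TXERE.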

I suggested the score should be the number of letters found to be in words, and that's a simple way that works. But I believe a test decryption with longer words in it is more likely to be on the right track, and the score should reflect that. So you may wish to give extra points for longer words. Perhaps adding L*(L-1) or L*(L-2), where L is the length of the word found, would work better. You may also want to deduct points for unmatched letters. But you have to balance this against the situation where your bonus rewards inaccurate parsing too much. For example, HULALIGHT would parse as HUL ALIGHT, while HOLYLIGHT would split after the Y. The second form is more likely right. With a simple letter count, the second one scores higher. But what if there's a bonus for length? HUL ALIGHT might score higher, depending on how big the bonus is.
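To make that concrete (my own arithmetic, using the L*(L-1) bonus): HUL ALIGHT scores 6*5 = 30 for ALIGHT alone, while HOLY LIGHT scores 4*3 + 5*4 = 32, so the better parse still wins, but only barely. A steeper bonus such as L*L*L flips them: 216 for ALIGHT against 64 + 125 = 189 for HOLY LIGHT, and now the bad parse wins.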

Now compare HULALIGHT to HALFLIGHT and to HOLYLIGHT. Without a length bonus the last two score the same, nine letters each. How do we fix that so the best one scores highest? It's possible to try alternate ways of parsing the word divisions and then score higher the result that is most common, as I'll explain below, but I've found this to be much too time-consuming to be practical in hill-climbing. Another way, which I do use for cipher types with word divisions such as Aristocrats, Key Phrase, and Tridigital, is to score word pairs together based on the frequency of the pairing. I've downloaded 2-grams from the Google N-gram site and processed the files to delete irrelevant data, adjust the frequency numbers to a more manageable size and spread, and alphabetize them. I use these files to score the word pairs. You might be tempted to use the frequency of the individual words rather than the pairs, since it would be faster to look up. That might work with the above example, since HALF is more common than HULA or HOLY, and HALF LIGHT is also the most common combination. But take TWENTY T??. The most frequent combination is TWENTY TWO, but TWENTY THE would score higher based on individual word frequency. Tetragram scoring can take care of some of this, and for simple substitution types the time cost of two-word scoring is rarely worth it. But for some very tough ones, like very short Headline puzzle headlines, it's the only way to crack them, and in a Key Phrase I've found it essential.
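As a rough sketch of the pair lookup in Delphi (the storage layout and names are illustrative, assuming the processed 2-gram file is loaded once into a sorted list):

```pascal
uses Classes;

var
  PairList: TStringList;   // Sorted := True; entries like 'HALF LIGHT',
                           // with the adjusted frequency stashed in Objects[]

// Score one word pair by binary-searching the sorted 2-gram list.
function PairScore(const W1, W2: string): Integer;
var
  Idx: Integer;
begin
  if PairList.Find(W1 + ' ' + W2, Idx) then
    Result := Integer(PairList.Objects[Idx])  // stored frequency score
  else
    Result := 0;                               // unseen pair: no credit
end;
```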

I mentioned above that trying alternate means of parsing is not practical for hillclimbing. But there is another application of this technique that is not used for solving at all. When I solve a cipher and the result is without word divisions, I like to submit the result in a more readable form for the Solutions Editor. That means I insert spaces at the word boundaries. My program to do this uses four steps. The first step is dividing up the text using the technique I described in my first post, though either method will work for step 1. The second step is to score the word pairs using the N-gram data as described in the last paragraph. These two steps are virtually instantaneous. Then the program goes back and checks for common parsing errors I've compiled in a list, such as changing "i snow" to "is now." This is still instantaneous. The final step is to go back through and try moving one and then two letters to the left and the right at every space, testing the resultant word pairs. For example, "butt he" would be tested against "but the" and "bu tthe." "But the" would outscore the others and become the output form. All this testing and altering of the output is very slow. For one already-solved text the forty seconds or so it takes is acceptable, but it can't be used in hillclimbing.
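A sketch of that final step for a single space, reusing the PairScore sketch above (names again illustrative):

```pascal
// Try the space one and two letters left and right of where it is now
// and keep whichever split forms the most common word pair.
function BestSplit(const W1, W2: string): string;
var
  Combined, CandL, CandR: string;
  Split, Best, Score: Integer;
begin
  Combined := W1 + W2;
  Best := PairScore(W1, W2);        // the current split is the one to beat
  Result := W1 + ' ' + W2;
  for Split := Length(W1) - 2 to Length(W1) + 2 do
    if (Split >= 1) and (Split < Length(Combined))
       and (Split <> Length(W1)) then
    begin
      CandL := Copy(Combined, 1, Split);
      CandR := Copy(Combined, Split + 1, MaxInt);
      Score := PairScore(CandL, CandR);
      if Score > Best then
      begin
        Best := Score;
        Result := CandL + ' ' + CandR;
      end;
    end;
end;
```

Called with BUTT and HE, it tests BU TTHE, BUT THE, and BUTTH E, and BUT THE wins on 2-gram frequency.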

Computer Cipher Solving – word scoring part 1

I've described various computer solving techniques in earlier blog posts, but I only briefly mentioned scoring using word lists. For most computer applications I use tetragram scoring because it's fast and easy to implement. But it's not always the most accurate way. If your solving program already "knows" the word break points, then it's relatively easy and fast to look up each trial string to see if it's a word. I'll get to that method in a minute. But what about the many cases where the trial output is one continuous string without breaks?

There are two approaches to that situation. The first is to begin at the first letter and keep appending letters until the string is no longer a word. For example, if the trial output begins “THEMAN” this program would go through “THEM,” which it recognizes as a word, and assume the break point comes after the M and begin looking for the next word at AN…. Or, if the word THEMA (Latin, but used in English) is in your reference word list, it will break there and start looking for the next word at the N. As you can see, this does not always find words as you would by eye. You naturally think it begins with the two words THE MAN or possibly THE MANY, etc. If it doesn’t find a word starting with the first letter, it moves on and does the same with letter 2 and so forth. The scoring algorithm won’t give an accurate number if it is breaking the text incorrectly. The final score is based on how many total letters are found to be in words.
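Here's a minimal Delphi sketch of that first approach. IsWord (the dictionary test, sketched at the end of this post) and the MaxWordLen constant are assumed helpers:

```pascal
// First approach: at each position keep the longest word that starts
// there (so THEMA beats THEM, which beats THE), then jump past it.
function ScoreLeftToRight(const Trial: string): Integer;
var
  Pos, Len, Longest: Integer;
begin
  Result := 0;
  Pos := 1;
  while Pos <= Length(Trial) do
  begin
    Longest := 0;
    for Len := 2 to Length(Trial) - Pos + 1 do
    begin
      if Len > MaxWordLen then
        Break;
      if IsWord(Copy(Trial, Pos, Len)) then
        Longest := Len;            // keep appending; remember the longest hit
    end;
    if Longest > 0 then
    begin
      Inc(Result, Longest);        // these letters count as being in a word
      Inc(Pos, Longest);           // look for the next word after it
    end
    else
      Inc(Pos);                    // no word starts here; slide one letter
  end;
end;
```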

The second approach is to start with the longest words first. Start with the longest word length in your available lists and then shorten the search length incrementally. Let's say you start with length 24, the maximum word length in my normal lists. You check to see if the first 24-letter string is a valid word. If it is, you blank that out and test the next string beginning at letter 25. If not, you do the same beginning at letter 2, and so on. You probably won't find any words of that length, but on the next cycle you try 23-letter words, and so on. Whenever you find a word, be sure to eliminate that section of the output from future testing so that it's not double counted using shorter words found within the longer ones. The final score in this version is also the total number of letters that are in words.
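And a matching sketch of the second approach, reusing the BlankWord routine from the part 2 post above:

```pascal
// Second approach: hunt for the longest words anywhere in the text
// first, blank each find, then repeat with shorter lengths.
function ScoreLongestFirst(Trial: string): Integer;  // by value: we blank a copy
var
  Len, Pos: Integer;
begin
  Result := 0;
  for Len := MaxWordLen downto 2 do
  begin
    Pos := 1;
    while Pos + Len - 1 <= Length(Trial) do
      if IsWord(Copy(Trial, Pos, Len)) then
      begin
        BlankWord(Trial, Pos, Len);  // prevents double counting later
        Inc(Result, Len);
        Inc(Pos, Len);
      end
      else
        Inc(Pos);
  end;
end;
```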

Neither method is perfect. Take a sentence beginning TWOSTRINGS. The first program will find TWOS T RINGS, dropping the T from the count. The second method would get it right. But with BUTOUR, the second method gets BU TOUR, losing the BU, while the first method gets BUT OUR, the correct parse. I've written programs both ways, and in my experience this second way is best.

Now for the question of how to determine whether a string is a word. Of course, it is very simple to write a program that reads through an alphabetical list and, when it comes to the point where your test string should appear, determines whether or not it is there. This works reasonably fast on very short lists, or if your candidate begins with an A or B. Otherwise, it is quite slow. Another approach is to use a hash function. I'm not going to discuss that since I don't use one. If you already know what this is and how to use one, you don't need me, and if you don't, I can't tell you. I believe Python and maybe other languages have hashing built in; it's fast even for unordered lists, but it's complicated and has some computing overhead to build the hash table.

But there's a fast method for Delphi, the language I use, which I believe is available in other languages, too. My FindWord function takes a string S as input and returns True if it's a word and False if it is not. It depends on the AnsiCompareStr function, which takes two strings as input and compares them: if the first one is less than (i.e., alphabetically earlier than) the second, it returns a negative number; if it's greater, a positive number; and if they are the same, zero. To use this, first load your reference word list in alphabetical order and count the number of entries; call that X. This only needs to be done once when the program is loaded, not each time the function is called. Your search function begins by comparing S with the word in the middle of the list, i.e., at position X/2. If S is less than that list word, you next compare S with the word at position X/4; if S is greater, your second test is against the word at 3X/4. In other words, you keep splitting the search domain in half with every comparison. When you get a match, the function returns True and stops looking. When you get down to the last word without a match, the function returns False. It's written recursively so that it automatically keeps calling itself until one of those two points is reached. For a list of 80,000 words, the maximum number of comparisons you'd have to do is 17.
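Here is roughly what that looks like in Delphi, as a sketch (the exact parameter layout is illustrative; the real function could equally use a global list):

```pascal
uses SysUtils, Classes;

// Recursive binary search of an alphabetized word list.
// First call: FindWord(S, WordList, 0, WordList.Count - 1).
function FindWord(const S: string; List: TStringList;
                  Lo, Hi: Integer): Boolean;
var
  Mid, Cmp: Integer;
begin
  if Lo > Hi then
    Exit(False);                              // nothing left: not a word
  Mid := (Lo + Hi) div 2;
  Cmp := AnsiCompareStr(S, List[Mid]);
  if Cmp = 0 then
    Result := True                            // exact match found
  else if Cmp < 0 then
    Result := FindWord(S, List, Lo, Mid - 1)  // search the lower half
  else
    Result := FindWord(S, List, Mid + 1, Hi); // search the upper half
end;
```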

To make it even faster, don't load all the words into a single list. Use a separate list for each word length. I sometimes use a two-dimensional array ranging from 2 to 24 for the word length and 1 to 13,000 for the word count; there are more 8-letter words than any other length, and there are just under 13,000 words in that list. I treat one-letter words separately in my function, returning False for anything besides A or I. This cuts the maximum number of comparisons to 14 for length 8, and for most strings, much fewer.
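Putting the refinements together, a sketch of the length-split lookup (again with illustrative names):

```pascal
var
  // one alphabetized word list per length, loaded once at startup
  WordsByLen: array[2..24] of TStringList;

// Dictionary test used by the scoring routines above.
function IsWord(const S: string): Boolean;
var
  L: Integer;
begin
  L := Length(S);
  if L = 1 then
    Result := (S = 'A') or (S = 'I')   // one-letter words handled specially
  else if (L >= 2) and (L <= 24) then
    Result := FindWord(S, WordsByLen[L], 0, WordsByLen[L].Count - 1)
  else
    Result := False;                   // no list for that length
end;
```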

This can be refined even more. I’ll discuss that in my next post.