My Wordle solver, Worbot, has the following statistical accomplishments:
It has missed the word entirely only twice. Ignore the current streak of 0; my program doesn’t actually track that. Averaged over all games, it comes to about 3.9 guesses per puzzle. It plays entirely in hard mode. Here’s how I created it.
First, I scraped the list of target words from the original Wordle website (before The New York Times bought it). By target words, I mean words that can actually appear as the winning word. There are many valid words that are accepted as guesses but never used as the target. I computed the frequency of each letter in each of the five positions in the target words. In case you want to know what they are, here are the top three letters for each position, written out so that the first letter of each line belongs to position one, the second to position two, and so on.
- SAAEE (most frequent in each position)
- COINY (2nd most frequent)
- BROST (3rd)
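For anyone who wants to reproduce the counts, here is a minimal sketch of the positional tally. The function names and the five-word sample are my own placeholders for illustration; the real input would be the full scraped target list.

```python
from collections import Counter

def positional_counts(words):
    """Count how often each letter occurs in each of the five positions."""
    counts = [Counter() for _ in range(5)]
    for word in words:
        for i, letter in enumerate(word.upper()):
            counts[i][letter] += 1
    return counts

def ranked_letters(counts, rank):
    """Return the letter at the given rank (0 = most frequent) for each position."""
    return "".join(c.most_common()[rank][0] for c in counts)

# Made-up sample, NOT the real target list.
sample = ["SAINT", "CRONE", "SLATE", "CRANE", "SHINE"]
counts = positional_counts(sample)
print(ranked_letters(counts, 0))
```

Run on the full target list, `ranked_letters(counts, 0)` should give the first line of the list above, `ranked_letters(counts, 1)` the second, and so on.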
Then I ran through the target list and selected 64 words whose letters all scored high in positional frequency, e.g. CRONE and SAINT. I already had data from Google Nwords about the frequency of various words in English. I have word lists, including a 5-letter list, that order the words by frequency. The most frequent 5-letter words are WHICH, THERE, THEIR, and WOULD. I wanted this list because I knew that Wordle used mostly common words as target words. That’s the data I needed.
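One way to pick such starting words is sketched below. The scoring rule is an assumption for illustration: each word gets the sum of its letters' positional counts, and the top `n` words are kept. The names `position_score` and `pick_starters` and the sample list are hypothetical.

```python
from collections import Counter

def positional_counts(words):
    """Count how often each letter occurs in each of the five positions."""
    counts = [Counter() for _ in range(5)]
    for word in words:
        for i, letter in enumerate(word):
            counts[i][letter] += 1
    return counts

def position_score(word, counts):
    """Assumed scoring rule: add up how common each letter is in its position."""
    return sum(counts[i][letter] for i, letter in enumerate(word))

def pick_starters(targets, n=64):
    """Return the n target words with the highest positional score."""
    counts = positional_counts(targets)
    ranked = sorted(targets, key=lambda w: position_score(w, counts), reverse=True)
    return ranked[:n]

# Made-up sample, NOT the real target list.
sample = ["SAINT", "CRONE", "SLATE", "CRANE", "SHINE", "WHICH"]
starters = pick_starters(sample, n=3)
print(starters)
```

With the real target list, calling `pick_starters(targets)` with the default `n=64` would give a pool of opening words of the same size as the one described above, though not necessarily the same words.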
Now, to the logic of the program. For the first guess, it randomly selects one of those 64 words. It receives the usual feedback of gray, yellow and green. After that the program ignores the target list and refers only to my list of 5-letter words in frequency order. That list includes many words that are not possible target words, so the program doesn’t “cheat.” It simply tests each word for conflicts with the feedback so far and, if it hits one, moves on to the next word. At each guess, it uses the colors from all the previous guess results to cull out bad words.
For example, suppose the correct word is LINEN and Worbot guesses SAINT for its first guess. The I and N would be yellow, the rest gray. For the second guess, it would start with WHICH, but reject it because it doesn’t have an N. THERE would be skipped because it has a T and doesn’t have an N, and so forth. The first word it would come to with an I and an N is AGAIN, but that would be rejected because of the A, which was gray earlier. The first word that has no conflicts is GIVEN, the 65th word on the frequency list. This would produce green for the I, E, and N. And so on. If you are writing a solver, or just want to improve your guessing, I suggest you consider using this simple logic. BTW – Worbot has never gotten the word on its first try, and neither have I.
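The walk-through above can be sketched in a few lines. This is a simplified reconstruction, not Worbot's actual code: the names `has_conflict` and `next_guess` are hypothetical, the colors are encoded as `G`, `Y` and `.` (gray), and the word list is a four-word excerpt rather than the real frequency list, where GIVEN sits at position 65.

```python
def has_conflict(candidate, guess, colors):
    """True if candidate contradicts the colors returned for an earlier guess."""
    for i, (letter, color) in enumerate(zip(guess, colors)):
        if color == "G":
            # Green: the candidate must have this letter in this position.
            if candidate[i] != letter:
                return True
        elif color == "Y":
            # Yellow: the letter must appear, but not in this position.
            if candidate[i] == letter or letter not in candidate:
                return True
        else:
            # Gray: normally the letter is absent. If the same letter was also
            # green or yellow elsewhere in the guess, only this position is ruled out.
            colored_elsewhere = any(
                g == letter and c != "." for g, c in zip(guess, colors)
            )
            if colored_elsewhere:
                if candidate[i] == letter:
                    return True
            elif letter in candidate:
                return True
    return False

def next_guess(words_by_frequency, history):
    """Return the first word, in frequency order, that fits every past result."""
    for word in words_by_frequency:
        if not any(has_conflict(word, g, c) for g, c in history):
            return word
    return None

# Excerpt only: the real list has many more words between these.
words = ["WHICH", "THERE", "AGAIN", "GIVEN"]
history = [("SAINT", "..YY.")]  # I and N yellow, the rest gray
print(next_guess(words, history))  # prints GIVEN
```

Because every earlier result stays in `history`, each new guess automatically respects all the colors seen so far, which is also what hard mode requires.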