I've been sitting on this one for a while, but +Karl Steinke was pondering which word length fills the largest percentage of its possibility space. That is, of all possible 3- (or 4-, or whatever) letter combinations, what length of combination has the largest percentage of real words? We decided it had to be short words, probably between 3 and 5 letters. My money is on 3, which is convenient because that's also the largest number of letters that's easy to graph (since you can put it in 3 dimensions) and the shortest length of a crossword answer. In this case, I'm using the looser definition of "word" that encompasses any 3-letter combination that shows up in one of my puzzles.
For 3-letter combinations, there are 26^3 (17,576) possible values, running from AAA to ZZZ. The official Scrabble word list has 1,014 of them (5.77%), and my crossword word list has 3,070 (17.5%). Check out the full graph and some highlights below the fold.
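As a quick check on the arithmetic, here is a minimal Python sketch. The helpers `possibility_space`, `coverage`, and `coverage_by_length` are hypothetical names for illustration, and the sketch assumes words contain only the letters A to Z. The last helper generalizes the original question: give it any word list and it reports, for each length, what fraction of that length's possibility space is filled.

```python
from collections import Counter
from string import ascii_uppercase

ALPHABET_SIZE = len(ascii_uppercase)  # 26 letters, A to Z

def possibility_space(length: int) -> int:
    """Number of letter combinations of a given length (AAA..ZZZ for length 3)."""
    return ALPHABET_SIZE ** length

def coverage(word_count: int, length: int) -> float:
    """Fraction of the possibility space filled by word_count real words."""
    return word_count / possibility_space(length)

def coverage_by_length(words):
    """Map each word length to the fraction of its possibility space that is filled.

    Assumes A-Z words only; case is ignored and duplicates are counted once.
    """
    unique = {w.upper() for w in words}
    counts = Counter(len(w) for w in unique)
    return {n: coverage(c, n) for n, c in sorted(counts.items())}

print(possibility_space(3))        # 17576
print(f"{coverage(1014, 3):.2%}")  # 5.77%
print(f"{coverage(3070, 3):.2%}")  # 17.47%
```

The last line prints 17.47%, which rounds to the 17.5% quoted above. Neither word list is included here, so `coverage_by_length` would need a real list to say anything about 4- or 5-letter words.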
This is a heat map generated from the 3,070-word set. Along the Y-axis is the first letter, along the X-axis is the second letter, and the size of the circle at each intersection is the number of words that share those first and second letters (a value between 0 and 26). So there are 8 words that start with A-A (AAA, AAE, AAH, AAM, AAP, AAR, AAS, and AAU; yeah, I know, I'm not sure those are words either), and no words start with Y-B.
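The counting behind the heat map can be sketched as follows, assuming the word list is a plain collection of strings. `prefix_counts` and `heat_map_grid` are hypothetical names, and the sample input is only the eight A-A words listed above, since the full 3,070-word list isn't reproduced here.

```python
from collections import Counter
from string import ascii_uppercase

def prefix_counts(words):
    """Count distinct 3-letter words by their first two letters.

    Assumes A-Z words only. Each count is between 0 and 26: the size of one
    heat-map circle.
    """
    three = {w.upper() for w in words if len(w) == 3 and w.isalpha()}
    return Counter(w[:2] for w in three)

def heat_map_grid(counts):
    """26x26 grid: row = first letter (Y-axis), column = second letter (X-axis)."""
    return [[counts.get(a + b, 0) for b in ascii_uppercase] for a in ascii_uppercase]

aa_words = ["AAA", "AAE", "AAH", "AAM", "AAP", "AAR", "AAS", "AAU"]
counts = prefix_counts(aa_words)
grid = heat_map_grid(counts)
print(counts["AA"])  # 8
print(grid[24][1])   # 0 (the Y-B cell: Y is row 24, B is column 1)
```

Each grid cell is what a plotting library would turn into one circle; the plotting step itself is left out.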
While no pair of first and second letters has all 26 third letters, four pairs allow 23: L-A, M-A, T-A, and N-E. Here's that distribution in handy graph form:
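Tallying that distribution, and finding the pairs that reach the maximum, takes only a few lines on top of per-prefix counts. This sketch assumes a dict mapping two-letter prefixes to word counts; `distribution` and `busiest_prefixes` are hypothetical names, and the toy input reuses only numbers quoted in this post, not the real list.

```python
from collections import Counter
from itertools import product
from string import ascii_uppercase

def distribution(prefix_counts):
    """How many of the 26 * 26 = 676 first/second-letter pairs have each count.

    `prefix_counts` maps a prefix like "LA" to its number of 3-letter words;
    prefixes missing from the dict count as 0.
    """
    pairs = ("".join(p) for p in product(ascii_uppercase, repeat=2))
    return Counter(prefix_counts.get(p, 0) for p in pairs)

def busiest_prefixes(prefix_counts):
    """Return (max_count, sorted list of prefixes that reach it)."""
    best = max(prefix_counts.values(), default=0)
    return best, sorted(p for p, c in prefix_counts.items() if c == best)

# Toy input built only from counts mentioned in the post.
toy = {"LA": 23, "MA": 23, "TA": 23, "NE": 23, "AA": 8}
print(distribution(toy)[0])   # 671
print(busiest_prefixes(toy))  # (23, ['LA', 'MA', 'NE', 'TA'])
```

With the full list, `distribution` gives the bar heights for counts 0 through 26, and the bars always sum to 676.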