ERA | 1260 |
AREA | 1258 |
ORE | 1019 |
ERE | 991 |
ARIA | 978 |
ERIE | 978 |
ALOE | 969 |
ONE | 967 |
ALE | 895 |
ATE | 877 |
ELSE | 846 |
ARE | 842 |
ERR | 801 |
ETA | 795 |
ALI | 771 |
ALA | 770 |
EAR | 767 |
OREO | 761 |
ODE | 760 |
ADO | 756 |
All but two of those would be valid Scrabble words; Ali (most likely Muhammad) and Oreo are both proper nouns. I'll probably look at vowel distribution next. I bet the more common words will show a heavy bias towards vowels, as well as towards the 6 over-represented letters I pulled out in the last post.
To have somewhere to start analyzing all these words, I divided them into buckets by number of occurrences, with the bucket boundaries at powers of 2. The first bucket contains all the words that occur 2^0 times (exactly once), the second covers 2^0+1 to 2^1 (exactly twice), the third covers 2^1+1 to 2^2, and so on, all the way up to 2^10+.
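Here's a rough sketch of that bucketing in Python, in case the description is easier to follow as code. The function name `bucket_index` and the cap of 11 are my own choices for illustration, and I'm assuming "2^10+" means a final catch-all bucket for anything above 2^10. The sample counts are a few rows from the table above.

```python
def bucket_index(count: int, max_index: int = 11) -> int:
    """Return k such that 2**(k-1) < count <= 2**k, capped at max_index.

    Bucket 0 holds words seen exactly once, bucket 1 exactly twice,
    bucket 2 covers 3-4, bucket 3 covers 5-8, and so on.
    The cap (an assumption) lumps everything above 2**10 together.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1,
    # without any floating-point rounding issues.
    return min((count - 1).bit_length(), max_index)


# A few rows from the table above.
sample = {"ERA": 1260, "AREA": 1258, "ORE": 1019, "ERE": 991, "ADO": 756}

buckets = {}
for word, count in sample.items():
    buckets.setdefault(bucket_index(count), []).append(word)
```

Using the integer `bit_length` trick instead of `math.log2` avoids off-by-one surprises right at the power-of-2 boundaries.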
Below the cut, check out my graphs of the number of words in each bucket, the total number of occurrences in each bucket, and the average word length in each bucket. I think they're kind of cool.
I really like the curvy shape of the Number of Total Usages graph; it's not far off the standard bell curve shape I was expecting. I was surprised, though, by how smooth the weighted average word length graph is. I guess it makes sense, since the longer a word is, the more likely it is to be unique, but I didn't expect the relationship to be that direct.
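For anyone who wants to reproduce the three per-bucket numbers behind those graphs, here's a minimal sketch. The function name `summarize` is hypothetical, and I'm assuming "weighted" means each word's length is weighted by how many times it was used. The bucket rule is the power-of-2 one described earlier, with everything above 2^10 lumped into one last bucket (also an assumption).

```python
from collections import defaultdict


def summarize(counts: dict[str, int]) -> dict[int, tuple[int, int, float]]:
    """Map bucket index -> (number of words, total usages, weighted avg length)."""
    words = defaultdict(int)    # distinct words per bucket
    usages = defaultdict(int)   # summed occurrence counts per bucket
    letters = defaultdict(int)  # sum of len(word) * count per bucket

    for word, n in counts.items():
        # Bucket k holds counts with 2**(k-1) < n <= 2**k, capped at 11.
        k = min((n - 1).bit_length(), 11)
        words[k] += 1
        usages[k] += n
        letters[k] += len(word) * n

    return {k: (words[k], usages[k], letters[k] / usages[k]) for k in sorted(words)}


# A few rows from the table at the top of the post.
result = summarize({"ERA": 1260, "AREA": 1258, "ORE": 1019, "ARIA": 978})
for k, (n_words, n_usages, avg_len) in result.items():
    print(f"bucket {k}: {n_words} words, {n_usages} usages, avg length {avg_len:.2f}")
```

Dividing by total usages rather than by the number of words is what makes the average "weighted"; an unweighted version would just average `len(word)` over the distinct words in each bucket.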