Tuesday, March 6, 2012

Global Distribution of Letters, Part 2, The Envisioning

As promised, here is my extended analysis of the Global Letter Distribution stats I posted last time. First, instead of using the chart from Google, I've switched to an analysis of all the words in the OED. As I probably should have expected, my initial, purely visual analysis was way off. In reality, the use of the letter T is practically identical between the OED and the XWords. So, to provide a little more mathematical rigor, I plotted the Percent Difference between the OED and XWord values, computed as (XWord val - OED val) / (OED val). What surprised me was that six letters (S, A, E, D, O, and T) are overused in the XWords relative to the OED, at the expense of the other twenty letters, so if you're ever in doubt, that's a good place to start.
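For anyone who wants to try the same calculation at home, here is a minimal Python sketch of the Percent Difference formula above. The two word lists are made-up toy stand-ins, not the real OED or crossword data, and the function names are purely illustrative; this is one way to do it, not necessarily how the chart in this post was produced.

```python
from collections import Counter
import string


def letter_frequencies(words):
    """Return each letter's share of all letters in `words` (shares sum to 1)."""
    counts = Counter(
        ch
        for word in words
        for ch in word.upper()
        if ch in string.ascii_uppercase
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("word list contains no letters A-Z")
    return {letter: counts[letter] / total for letter in string.ascii_uppercase}


def percent_difference(xword_freq, oed_freq):
    """Compute (XWord val - OED val) / (OED val) for each letter.

    Letters with a zero OED value are skipped to avoid dividing by zero.
    """
    return {
        letter: (xword_freq[letter] - oed_freq[letter]) / oed_freq[letter]
        for letter in oed_freq
        if oed_freq[letter] > 0
    }


# Hypothetical toy data: swap in real word lists to get meaningful numbers.
oed_words = ["stats", "letter", "quartz", "doubt"]
xword_words = ["ates", "oreo", "seat", "dote"]

diffs = percent_difference(
    letter_frequencies(xword_words),
    letter_frequencies(oed_words),
)

# Print the letters from most overused to most underused in the toy XWord list.
for letter, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{letter}: {diff:+.1%}")
```

A positive value means the letter shows up more often in the XWord list than in the baseline list, and a negative value means it shows up less often.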

Google Charts wasn't doing what I wanted, so now I'm experimenting with Tableau, and let me tell you, it's pretty hot (if you're on RSS, come view the whole post):


I'm definitely interested in any other functions you think would be fun to graph, as the little bit of formal stats I had was a long time ago. Please let me know what you think!
