As promised, here is my extended analysis of the Global Letter Distribution stats from my last post. First, instead of using the chart from Google, I've switched to an analysis of all the words in the OED. As I should probably have expected, my initial, purely visual analysis was way off. In reality, the use of the letter T is practically identical between the OED and the XWords. So, to provide a little more mathematical rigor, I plotted the Percent Difference between the OED and XWord values for each letter, calculated as (XWord val - OED val) / (OED val). A positive value means the letter shows up more often in the XWords than in the OED, and a negative value means it shows up less often. What I found surprising was that six letters (S, A, E, D, O, and T) are overused at the expense of the other twenty, so if you're ever in doubt, that's a good place to start.
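For anyone who wants to try the same calculation, here is a minimal Python sketch of the Percent Difference formula above. It is only an illustration, not the process I actually used: the function names `letter_shares` and `percent_difference` are made up for this example, and the two tiny word lists are placeholders standing in for the real OED and XWord lists.

```python
from collections import Counter
import string


def letter_shares(words):
    """Return each letter's share of all letters in `words`, as a fraction (0 to 1)."""
    counts = Counter(
        ch for word in words for ch in word.upper() if ch in string.ascii_uppercase
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("word list contains no letters")
    return {letter: counts[letter] / total for letter in string.ascii_uppercase}


def percent_difference(xword_shares, oed_shares):
    """Compute (XWord val - OED val) / (OED val) for each letter.

    Letters with an OED value of zero are skipped, since the ratio is undefined.
    """
    result = {}
    for letter, oed_val in oed_shares.items():
        if oed_val == 0:
            continue
        result[letter] = (xword_shares.get(letter, 0.0) - oed_val) / oed_val
    return result


# Placeholder word lists (hypothetical, NOT real data).
oed_words = ["tea", "dose", "quartz"]
xword_words = ["teas", "oats", "dose"]

diffs = percent_difference(letter_shares(xword_words), letter_shares(oed_words))

# List letters from most overused to most underused in the placeholder XWords.
for letter, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{letter}: {diff:+.1%}")
```

The function returns a fraction, so multiply by 100 (or format with `%`, as the loop does) to get a percentage. Dividing by the OED value is what makes the result relative: a rare letter and a common letter that are both off by the same absolute amount will show very different Percent Differences.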
Google Charts wasn't doing what I wanted it to, so now I'm experimenting with Tableau, and let me tell you, it's pretty hot (if you're on RSS, come view the whole post):
I'm definitely interested in any other functions you think would be fun to graph, as the little bit of formal stats I had was a long time ago. Please let me know what you think!