Letter Frequencies

The frequency of letters in written text has often been studied for use in cryptography, and in frequency analysis in particular. An exact analysis is not possible, as each person writes slightly differently; however, an approximate order of frequency is ETAOIN SHRDL UCMFG YPWBV KXJQZ. An analysis based on all the words in the Cambridge Encyclopedia gave a letter frequency list quite unlike that which shows up in most lists. From most common to least common, it gave EATIN ORSLH DCMUF PGBYW VKXJZQ. Note that more A's appeared than T's. The author stated that the variance from standard lists could be due to the many foreign words often repeated within articles. Note, too, that X is more frequent in this work than J.

This brings up an interesting point: letter frequencies, like word frequencies, tend to vary, both by writer and by subject. You cannot talk about x-rays without using x's frequently, and you cannot use a letter at all if its key on your keyboard is broken. Letter, digraph, trigraph, and word frequencies can be used to help prove or disprove authorship. Measures such as average word length and sentence length are also used. Everyone writes differently; Hemingway is not Faulkner, and so on. A precise average usage could only be gleaned from a huge mass of differing inputs, for example by analyzing usage in a number of different chatrooms or by covertly checking email.
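As a rough illustration of how an ordering such as ETAOIN SHRDL might be derived from a sample of text, here is a minimal Python sketch. The function names are hypothetical, and the sketch assumes that only the 26 unaccented letters a to z are counted, with case ignored; it is not the method used by any of the studies mentioned above.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: relative frequency} for the letters a-z found in text.

    Assumption: case is ignored and every character outside a-z
    (digits, punctuation, accented letters) is skipped.
    """
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: count / total for letter, count in counts.items()}


def frequency_order(text):
    """Return the letters that occur in text, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    freqs = letter_frequencies(text)
    ordered = sorted(freqs, key=lambda letter: (-freqs[letter], letter))
    return "".join(ordered).upper()
```

Running frequency_order on two different bodies of text (say, one author against another) is a simple way to see the variation by writer and subject described above; letters that never occur in the sample are simply absent from the result.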

Relative Frequencies of Letters in Text

Letter  Frequency    Letter  Frequency
a       0.08167      n       0.06749
b       0.01492      o       0.07507
c       0.02782      p       0.01929
d       0.04253      q       0.00095
e       0.12702      r       0.05987
f       0.02228      s       0.06327
g       0.02015      t       0.09056
h       0.06094      u       0.02758
i       0.06966      v       0.00978
j       0.00153      w       0.02360
k       0.00772      x       0.00150
l       0.04025      y       0.01974
m       0.02406      z       0.00074

Top 10 Beginning of Word Letters

Letter  Frequency
t       0.1594
a       0.155
i       0.0823
s       0.0775
o       0.0712
c       0.0597
m       0.0426
f       0.0408
p       0.040
w       0.0382

Top 10 End of Word Letters

Letter  Frequency
e       0.1917
s       0.1435
d       0.0923
t       0.0864
n       0.0786
y       0.0730
r       0.0693
o       0.0467
l       0.0456
f       0.0408
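
Beginning-of-word and end-of-word figures like these can be tallied from any sample of text. The following Python sketch shows one possible way to do it. The function name is hypothetical, and it assumes that a word is any unbroken run of the letters a to z, so "don't" would be counted as the two words "don" and "t"; the source does not say how words were delimited for the tables above.

```python
import re
from collections import Counter


def edge_letter_frequencies(text, top=10):
    """Return (first_letters, last_letters) for the words in text.

    Each result is a list of (letter, relative frequency) pairs,
    most frequent first, limited to `top` entries.
    Assumption: a word is a maximal run of the letters a-z, case ignored.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return [], []
    total = len(words)
    first = Counter(word[0] for word in words)
    last = Counter(word[-1] for word in words)
    first_top = [(letter, n / total) for letter, n in first.most_common(top)]
    last_top = [(letter, n / total) for letter, n in last.most_common(top)]
    return first_top, last_top
```

For the three-word sample "the cat sat", two of the three words end in t, so t heads the end-of-word list with a relative frequency of 2/3.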

Most Common Digrams (in order)

th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

Most Common Trigrams (in order)

the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men
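
Digram and trigram rankings can be produced with the same kind of counting. Entries such as edt, oft, and sth in the trigram list suggest that the counts ran across word boundaries, so the Python sketch below assumes that spaces and punctuation are removed before counting; that is an inference from the list, not something the source states. The function name is hypothetical.

```python
from collections import Counter


def top_ngrams(text, n, top=10):
    """Return the `top` most common n-letter sequences in text.

    Assumption: everything except the letters a-z is dropped first,
    so sequences may span word boundaries ("of the" yields "oft").
    """
    letters = "".join(ch for ch in text.lower() if "a" <= ch <= "z")
    grams = (letters[i:i + n] for i in range(len(letters) - n + 1))
    counts = Counter(grams)
    return [gram for gram, _ in counts.most_common(top)]
```

Calling top_ngrams(sample, 2) gives digrams and top_ngrams(sample, 3) gives trigrams. To count only within words instead, the text could be split into words first and each word counted separately.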

See Also

ETAOIN SHRDLU

 
