While I was compiling the linguistic kerning pair data, I realized that it might also be useful (in fact probably more so, when you think about it) to have adjacency data - to see what letters are likely to be on the left and right of a given letter. This can help not only in optimizing spacing (and kerning), but also in designing the letterforms themselves, for example by allowing fine-tuning of the whitespace relationships between/within the glyph bodies. As in the kerning data, this is based on an English corpus, so it's largely (but not entirely) limited to optimizing English setting.
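For anyone who wants to try this on their own corpus, here is a rough sketch of how adjacency counts could be tallied. It is not the script behind the data here. The function name `adjacency_counts` is made up for illustration, and it assumes that only letter pairs inside a word are counted (word boundaries and punctuation are ignored) and that case is folded.

```python
from collections import Counter, defaultdict
import re

def adjacency_counts(text):
    """Tally which letters sit immediately left and right of each letter."""
    left = defaultdict(Counter)   # left[c][p]: how often p directly precedes c
    right = defaultdict(Counter)  # right[c][n]: how often n directly follows c
    # Assumption: only pairs inside a word count; spaces and punctuation are skipped.
    for word in re.findall(r"[a-z]+", text.lower()):
        for a, b in zip(word, word[1:]):
            right[a][b] += 1
            left[b][a] += 1
    return left, right

# Tiny made-up sample, just to show the shape of the result.
left, right = adjacency_counts("the then than in an")
print(right["h"].most_common())  # followers of "h", most frequent first
print(left["n"].most_common())   # letters preceding "n", most frequent first
```

Keeping separate left and right tallies per letter is what distinguishes this from plain pair counts: the same pair feeds the right side of its first letter and the left side of its second.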
1) For each letter (center column), on each side are the letters most likely to occur in decreasing frequency away from the center. The spaces separate frequency groups, while beyond the dash it's pretty slim pickings in terms of frequency. The asterisk on a side of a letter indicates that that row is 1/10 less frequent than the overall table norm*. As a reference, the two most frequent adjacencies are "th" and "he", both above 100K instances**, then it drops to about 50K (for "an" and "in"), and the rest is mostly bunched up.
2) I used UC letters to avoid apparent-leading issues - obviously lc is the real name of the game.
3) Sorry it's so ugly...
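The ordering described in note 1 can be sketched roughly as below. This is only an illustration: `format_row` and the bracket layout are hypothetical, the counts in the example are invented, and the frequency-group spaces, the dash and the asterisk are left out because they depend on cutoffs not spelled out here.

```python
def format_row(letter, left_counts, right_counts):
    """Lay out one row with the most frequent neighbours nearest the centre."""
    # Left side: least frequent first, so the most frequent ends up next to the centre.
    left_side = "".join(sorted(left_counts, key=left_counts.get))
    # Right side: most frequent first, again nearest the centre.
    right_side = "".join(sorted(right_counts, key=right_counts.get, reverse=True))
    # Upper case, as in the table, to avoid apparent-leading issues.
    return f"{left_side.upper()} [{letter.upper()}] {right_side.upper()}"

# Invented counts for illustration only, not taken from the corpus.
row = format_row("h", {"t": 30, "c": 8, "s": 5}, {"e": 25, "a": 10, "i": 7})
print(row)  # SCT [H] EAI
```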
* So for example you can tell that b, p and v occur mostly as initial letters.
** The corpus has about 4.5 million pairs.