Monitor on Psychology interview with Kevin Larson

enne_son's picture

From the Microsoft Typography website:

Redmond - 2 November 2010
Monitor on Psychology, the magazine of the American Psychological Association (APA), has published an interview with Kevin Larson, of Microsoft’s Advanced Reading Technologies team, in the November 2010 issue. The interview focuses on the interactions of typographers and psychologists in trying to improve onscreen reading. Monitor on Psychology is a general psychology magazine, sent to every member of the APA.

Here is the url:

Chris Dean's picture

@enne_son: Citation for Pelli and Tillman please?

William Berkson's picture

Peter, I think from the examples I gave that there must be at least one more matrix involved: it is resolving the words into letters, or the mass of letters as a pattern into words, as you hold. I don't think identifying letters within a word is a trivial process to do so quickly. My suspicion, as I have said earlier, is that a "filtering" process for both words and letters is done simultaneously, and whichever wins the race to the semantic part of the brain wins.

enne_son's picture

Bill, you lose me in your first two sentences.

As for the third, I think the visual cortex is biased to select the most efficient route. It's not a matter of which process gets there first. I think the letter process is a phase in learning to read, but once the word is learned there is a remapping in role unit terms. I think the perceptual psychophysics of a well-spaced word is such that the phase alignment that it puts in place becomes the catalyst for the remapping. After this the only place the letter route is really used is when new or unfamiliar words are encountered or disturbances caused by things like alternating case frustrate access to the more elemental role-unit mapping.

John Hudson's picture

Bill, I'm with you with regard to the human facility for pattern recognition, but this is why, contra your last post, I think that identifying the learned pattern of letters within a word is, if not trivial in terms of the perceptual and cognitive processing involved, something most humans are likely to be very good at. Indeed, pattern recognition is properly understood as pattern isolation, i.e. identifying familiar patterns within a crowded context.

With regard to Peter's remapping of individual letter recognition to role unit recognition, I think it very likely that as we gain experience in reading we start to recognise the patterns caused by combinations of letters, especially frequent combinations and, I suspect, orthographically significant combinations, i.e. digraphs or diphthongs. But I'm doubtful if these super-letter patterns ever exceed two or three letters in length, and I think we're faced with something like a perceptual equivalent of the Arabic ligature problem: if we process a pair of letters as a unit, in a sequence of three letters do we process the first two or the second two? or if we process a triplet of letters as a unit, in a sequence of four letters do we process the first three or the last three? or do we instead process the first two and the second two? or what about the middle two?

It seems to me that in terms of accuracy relative to efficiency, there are likely to be diminishing returns in trying to recognise units greater than individual letters which remain, after all, the most reliable unit of recognition, the one we can fall back on. The net gain in efficiency may make such super-letter pattern recognition worth while to a point, but I wonder just how far that point is from letterwise word recognition?

enne_son's picture

John, the capacity gains will have to be assessed by network modeling. Echoing James Townsend, from the Cognitive Modeling Training Program in the Department of Psychological and Brain Science of the University of Indiana in a letter to me, I think they will be competitive until the phase alignment problem — if it can be quantified — is factored in. Then, I expect, there will be some clear differentiation.

I don’t see a reason to restrict the amount of role-unit information the system can form vectors over to the amount contained in two or three letters. Is the Arabic ligature problem really real?

There are probably bottom-up and top-down constraints. The bottom up constraint is probably defined by the width of the individual reader’s uncrowded span (Denis Pelli again), which varies. If I remember correctly, it can vary from 5 to 7 letters.

The top-down constraints are probably of several different orders. Letter vectors form over one or more connected or intersecting role-units. You mentioned digraphs and diphthongs, both of which map to phonologically distinctive complexes. What's to stop them from forming over morphemes where lexicality is the driving force in the vectorization?

William Berkson's picture

John, I agree that humans are very good at identifying letters. Letters are designed to be easy to recognize. But given that our capacity is so much greater, why would our brains forgo the efficiency of recognizing "the" as a whole pattern, and put a semantic association with that? It isn't any more complex than many Chinese root characters, so why not take advantage of that? That's my plausibility argument for Peter's view.

By the way, Peter, does "interactive activation" theory posit a look-up on the semantic level, or the "matrix resonance" I find plausible?

Kevin Larson's picture

I don’t fully understand the conversation at the moment, but it did make me think of these three papers on font tuning. Font tuning appears to be a perceptual process that allows readers to recognize letters or words faster when they are in the same font that they last looked at. This effect only seems to be true for native readers of an orthography.

Sanocki, T. (1988). ‘Font Regularity Constraints on the Process of Letter Recognition’, Journal of Experimental Psychology: Human Perception and Performance, 14(3), 472-480.

Gauthier, I., Wong, A.C., Hayward, W.G., Cheung, O.S. (2006). Font tuning associated with expertise in letter perception. Perception, 35(4), 541-559.

Walker, P. (2008). Font tuning: A review and new experimental evidence. Visual Cognition, 16, 1022-1058.

Kevin Larson's picture

David, I think you are suggesting that reading research is only a topic right now because of the poor state of on-screen reading, and if screen densities suddenly improved then the topic would disappear.

I think you are partially right. It’s certainly true that my job at Microsoft exists because of the poor state of on-screen reading, and a lot of what I focus on has to do with dealing with the problem of poor screen density.

But reading research is a bigger field than you might imagine with several conferences and journals dedicated exclusively to such research, including specialized conferences for reading education, eye movement studies, and dyslexia. Very few of the researchers at these conferences care at all about the density differences between print and on-screen. For some, having a good model of word recognition is important because it provides information about how to teach a child to read. Interestingly, the two best predictors for a pre-reader becoming a successful reader within the next year are knowledge of letter names and phonemic awareness, being able to manipulate the sounds of language.

In addition to reading research being interesting because learning how the brain works is intrinsically interesting, and because it is useful for learning how to teach reading, I will also bet that reading research will eventually improve some aspect of print typography.

enne_son's picture

Kevin, I’m not all that familiar with the font-tuning literature. But from what I’ve gathered, I’d guess phase alignment and implementation of cartesian space play a role in this. I include phase alignment, because the peak frequency in the x-dimension of reading shifts position from font to font, as does the narrowness of the alignment.

Bill, I’m not sure what ‘doing a look-up on the semantic level’ might mean in neurophysical / neurobiological / neural-network terms. I don’t see that the interactive activation account posits either a look-up or matrix resonance, unless you want to think of node-activation as a type of matrix resonance, which is plausible.

I see three classes of schemes:
Feed-forward node-activation schemes with feedback loops; matrix resonance schemes which posit vector-placement over networks of statements about identity, connection and location; trace-signature recognition schemes. If recognition manifests itself neurobiologically as a cascading series of neuro-transmissional events, then specific role units, and the registration of their relations in letters, bigrams, diphthongs and words can become uniquely associated with a particular series of such events. The series of events, being unique to the role-unit, etc., can then be said to have a trace signature.

I hope readers can find some of this illuminating, rather than just confusing. Those conspiring to sort out the perceptual and cognitive mechanics of reading in today’s environment might benefit from this.

Nick Shinn's picture

@Kevin: I will also bet that reading research will eventually improve some aspect of print typography.

BS has been mentioned in this thread, and making a wager you can't lose ("eventually" will outlast all bettors) qualifies as such.

John Hudson's picture

Peter: I don’t see a reason to restrict the amount of role-unit information the system can form vectors over to the amount contained in two or three letters. Is the Arabic ligature problem really real?

Note that I was suggesting a perceptual parallel to the Arabic ligature problem, which is a mechanical problem, not a perceptual one.

The Arabic ligature problem is this: if you have a ligature that represents a combination of two letters AB, and another ligature that represents a combination of two letters BC, what do you do when you encounter the sequence of letters ABC? You either need to favour AB over BC, or vice versa, or you need to add an ABC ligature to your font. But what happens if you also have a CD ligature, or even a BCD ligature, and encounter an ABCD sequence? You have to make the same choice about which sequence of letters to process as a unit, and the only way to optimise the typographic results is to add an ABCD ligature, and then an ABCDE ligature, and an ABCDEF ligature, all the way up to the maximum sequence of connected letters anticipated in the vocabulary of the target language(s). And this is exactly what some approaches to nastaliq typography ended up doing, resulting in fonts of >20,000 ligatures, many of them whole words. [And then a Pakistani newspaper would transliterate the name of a Soviet premier, and a new ligature would need to be added.]
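The choice described above can be illustrated with a small Python sketch. It is only a toy: the letters A to D, the ligature sets, and the function names `shape_leftmost` and `shape_rightmost` are hypothetical, and real font shaping engines are far more involved. The point is simply that the same ligature inventory yields different units depending on which overlap is favoured.

```python
def shape_leftmost(text, ligatures):
    """Greedy left-to-right shaping: at each position take the longest matching ligature."""
    longest = max((len(lig) for lig in ligatures), default=1)
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, down to two letters.
        for k in range(min(longest, len(text) - i), 1, -1):
            if text[i:i + k] in ligatures:
                out.append(text[i:i + k])
                i += k
                break
        else:
            # No ligature starts here, so emit the single letter.
            out.append(text[i])
            i += 1
    return out


def shape_rightmost(text, ligatures):
    """The same greedy policy applied from the end of the sequence."""
    reversed_ligs = {lig[::-1] for lig in ligatures}
    units = shape_leftmost(text[::-1], reversed_ligs)
    return [unit[::-1] for unit in units][::-1]


print(shape_leftmost("ABC", {"AB", "BC"}))                  # ['AB', 'C']
print(shape_rightmost("ABC", {"AB", "BC"}))                 # ['A', 'BC']
print(shape_leftmost("ABCD", {"AB", "BC", "CD", "BCD"}))    # ['AB', 'CD']
print(shape_rightmost("ABCD", {"AB", "BC", "CD", "BCD"}))   # ['A', 'BCD']
```

Under the left-to-right policy the BCD ligature is never reached in ABCD, which is the kind of arbitrary preference that adding ever longer ligatures tries to avoid.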

The Arabic ligature problem is resolved by not using ligatures.

What I am suggesting is that a similar problem might arise in any situation in which two or more letters are processed as a unit, including the perceptual situation. If we learn to recognise the pattern of features that constitute two letters in sequence, or three letters in sequence, what happens when we encounter overlapping sequences? You are familiar, I am sure, with Wittgenstein's rabbit/duck illustration, or with the famous images of the old crone / young beauty, or the portrait of Freud that can also be seen as a naked woman reclining? The point of such illustrations is that at any one moment we see either one image or the other: we don't see both at once. I believe -- I am not aware of evidence to the contrary -- that pattern isolation, recognition of patterns within a crowded context, works the same way. Otherwise, word search puzzles would be no challenge at all. So in reading a sequence of letters containing two overlapping patterns that we recognise as units, but whose combined form we do not recognise as a pattern, I predict that we will recognise only one of the patterns. Now, it probably doesn't matter which of the two patterns we recognise in that moment, except insofar as one or the other might have an advantage in terms of accuracy or efficiency of identifying the particular word (whereas the other pattern might have the advantage in a different word). But what may matter is that if we try perceptually to resolve the problem by recognising three, four, five letter sequences as patterns that subsume the shorter sequences, there are almost certainly diminishing returns in efficiency and accuracy, unless you think we read like a ligatured nastaliq font with thousands of patterns of letter sequences in our heads to be searched among for matches when reading.

John Hudson's picture

Bill: But given that our capacity is so much greater, why would our brains forgo the efficiency of recognizing "the" as a whole pattern, and put a semantic association with that?

I never said it didn't, and there seems to be good evidence from saccade length that short words are more easily recognised than long words and some kind of super-letter pattern recognition is likely. But your argument can go both ways. Instead of asking why the brain would forgo the efficiency of recognising 'the' as a whole pattern instead of a sequence of three letter patterns, you might just as well ask why the brain would do more than it needs to do to recognise the word? If letterwise recognition is as efficient as it needs to be to produce the kind of reading speed and accuracy that is typical of mature readers -- which is the claim of the reading psychologists --, is there any need for super-letter pattern recognition, especially within longer words in which, as I suggested to Peter, the necessary overlapping of such patterns would produce variable benefits and may even result in a net cost in some words?

dezcom's picture

The problem is that, when reading printed text from paper, reading the type is only a problem to the visually or cognitively impaired and they can certainly use some help. I can see highway signage and way-finding signage needing a look-see.
We have a temporary issue with online reading which is related to the technology--which will either soon enough change or be adapted to.

What amazes me is how WELL humans can read even the worst ßhit writing or type (including my own hand writing).
We may gain a very minute amount of reading speed and comprehension with further research so scientists can have at it all they want.

Personally, I am sure they can accurately count eye regressions but I don't know that they can say for sure what that means. I have several reasons that I know I go back and re-read a few words. Here is my top 10:
1. It was great writing and I enjoyed it;
2. It was really ßhitty writing and I couldn't understand it;
3. It said something that really pißsed me off;
4. It said something I strongly agreed with;
5. It was a blatant lie;
6. It was funny;
7. It was stupid;
8. It was about typography (or dark chocolate or coffee);
9. It was about my family;
10. It was from the IRS, Ivan the Terrible, or John Gotti.

My biggest reason would have to be:

Because I was daydreaming and forgot where I was.


Nick Shinn's picture

I'm presently reading a book with Th ligatures. (Garamond Premier Pro.)
It's really getting between me and the text.
I'll be reading along merrily, and then Boom, WTF was that, and I'm perusing an ugly ligature.

William Berkson's picture

>you might just as well ask why the brain would do more than it needs to do to recognise the word?

Well I did ask and answer that, though my answer is conjectural, of course. In my view the brain can *more quickly* do whole word recognition than by doing a look-up, given the way the brain works. We can easily recognize the difference between a cat and a dog, a holistic recognition problem that is currently beyond computers (as Kevin has explained) whereas adding 358 and 786 in our heads takes most of us a long time, if we can do it at all. Interactive activation, which is necessary to explain the word superiority effect at all under the "slot processing" model, I suspect operates more slowly than the kind of "matrix resonance" that is going on in pattern recognition. That's one reason why I don't buy that that is what's happening.

Your argument about increasing ligatures in Arabic is not relevant, in my view, because we are not talking about an endless combination of letters, but a finite and not that big collection of words, a few thousand, and a smaller group of phonetic combinations that I think are recognized as well.

Kevin Larson's picture

> Personally, I am sure they can accurately count eye regressions but I don't know that they can say for sure what that means.

That’s a great point, and why clever experiments are developed to infer what is happening. For example a test could be developed to compare the regressive saccade rates when writing is confusing or when writing is funny compared to writing that is less confusing or less funny respectively. These tests would tell us if confusing or funny writing causes an increase in regressive saccades.

I expect that most regressive saccades are not caused by the semantic content. One finding shows that blinking is a source of eye regressions. The rate of regressive saccades immediately following a blink is measurably higher than the rate of regressive saccades immediately before the blink. I think this tells us that blinking is disruptive to reading, but is necessary to keep the cornea moist.

dezcom's picture

Sounds like a plan, Kevin!

John Hudson's picture

Bill: Your argument about increasing ligatures in Arabic is not relevant, in my view, because we are not talking about an endless combination of letters, but a finite and not that big collection of words, a few thousand, and a smaller group of phonetic combinations that I think are recognized as well.

The combination of joined letters in Arabic is also not endless, and the maximum length of a lettergroup is something fairly close to the area of reasonably clear foveal focus in a fixation during reading. The problem that I described is not, in any case, one of length, but of pattern overlap. The length issue only arises if one tries to resolve the overlap issue by recognising larger patterns that incorporate the shorter, overlapping patterns.

Here is a word of average length: length. If we don't recognise the whole word as a pattern, then there is a very large number of possible combinations of single and multiple letter patterns that we potentially recognise within the word:

l e n g t h
le n g t h
l en g t h
l e ng t h
l e n gt h
l e n g th
le ng th
le n gt h
le n g th
l en g th
l en gt h
len g t h

Any one of these patterns will be more or less beneficial to accuracy and speed of word recognition than any other, and we can't predict or determine which patterns are actually recognised by an individual reader in a particular situation.
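How quickly the number of possible groupings grows can be checked with a short Python sketch. This is an illustration only, not a model of what readers do: the function name `segmentations` and the `max_len` cap on pattern length are assumptions introduced here.

```python
def segmentations(word, max_len):
    """Return every way to split `word` into consecutive chunks of 1..max_len letters."""
    if not word:
        return [[]]
    result = []
    for k in range(1, min(max_len, len(word)) + 1):
        head = word[:k]
        # Combine this first chunk with every split of the remaining letters.
        for rest in segmentations(word[k:], max_len):
            result.append([head] + rest)
    return result


for cap in (1, 2, 3, 6):
    print(cap, len(segmentations("length", cap)))

# A few of the splits that use at most two-letter patterns:
for parts in segmentations("length", 2)[:4]:
    print(" ".join(parts))
```

For the six-letter word "length" this gives 13 splits when patterns are at most two letters long, 24 when three-letter patterns are allowed, and 32 when any length is allowed (one of which is the whole word).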

William Berkson's picture

John, I believe that for long words the letter scrambling tests have found that the ends of the word and its length are the stronger influence. As I mentioned with my idea of many different parallel analyses going on and one winning the race, I think the brain is doing a lot simultaneously. This would include identifying sub-word phonemes by pattern. This may seem inefficient, but not when the prime goal is recognizing the word quickly, and given the massive resources available. And this is the phenomenon of "attention": a whole lot of the brain is tasked to the priority of the moment. At least for some of us, that focus means we may not hear our wives reminding us of household chores when absorbed in reading, writing or designing type!

enne_son's picture

Massive resources are no excuse for inefficiency. Bill, I think your “many different parallel analyses” scheme results in something I’ve seen referred to by David Boulton as the disambiguation overhead problem. See: I think a horse-race that relies on identifying sub-word patterns kicks in when the rapid automatic process of resolving the whole word pattern in purely role-unit terms fails.

I’ll have more to say, I hope, about processing analogues to John’s Arabic Ligature Problem after I’ve knocked down some looming deadlines. I’ll be referring to Colin J. Davis & Jeffrey S. Bowers [2006] Contrasting Five Different Theories of Letter Position Coding: Evidence From Orthographic Similarity Effects.

enne_son's picture

For more on disambiguation overhead see: My scheme avoids this because there is no explicit labelling at a level higher than role-units. Yet as we read we ‘see’ or are subliminally aware of words as lettered things because local combination detection still occurs in the simultaneous co-activation of letter parts process.

John Hudson's picture

Bill: As I mentioned with my idea of many different parallel analyses going on and one winning the race, I think the brain is doing a lot simultaneously.

I don't find this idea very persuasive. I find it easier to believe that the cognitive process is iterative, and that the first method tried is determined by previous success. This is also one of the reasons why I'm not convinced that there is a magic moment when someone who had learned to read by recognising letter shapes suddenly becomes a different kind of reader, one who recognises words using some other pattern of shapes: what is the impetus to change something that works? This is not to say that I don't think we gradually pick up the ability to recognise other patterns within words, but I suspect this happens willy nilly, happens as a byproduct of looking at lots of words and not as a cognitive progression. Of course, if Karl Marx were interested in cognitive psychology, this is presumably the point where he would remind me that sufficient quantitative changes result in a qualitative change.

enne_son's picture

[with edits]

John, there is no magic moment: the change is not that sudden. The literature on perceptual learning details how the change might happen.

The impetus to change is three-fold: 1) the process of learning to read, being letter-based and iterative, is highly deliberative and cognitively led but visual word-form resolution must ideally be perceptually driven, effortless and automatic, 2) the disambiguation overload caused by the phonetic assignments of discrete alphabetic markers must be silenced, and 3) narrow phase alignment — which the history of reading has selected for survival — makes the intervening segmentation required to isolate letters for independent (though parallel) recognition hard.

John Hudson's picture

Peter, before I respond, can you clarify what you mean by phase alignment? I want to be sure that I understand you.

I'm concerned that your first two impeti might be begging the question, that they suggest reasons for something to happen based on what you think should happen.

enne_son's picture

John, I'm not trying to prove anything. I’m just trying to lay out my notions in a persuasive or compelling manner. To do this I try to elucidate, using things I've gathered from years of immersion in the literature on reading — a kind of “critical retrieval,” if you will — how my scheme might elucidate the issues that you raise. The issue was that of impetus. I said a deliberative and iterative process was not ideal, and then highlighted a disambiguation issue and a potential perceptual psychophysical issue, both of which are insufficiently addressed. If that's “assuming the initial point,” so be it.

About phase alignment, Fourier transforms show that in the horizontal direction of reading visual information is fairly tightly packed around a single phasal peak, i.e., a single spatial frequency. In transforms of well set type there are typically no competing harmonics, and the packing is not absolute. Competing harmonics begin to appear when type becomes badly spaced. This seems consistent across many typefaces, covering different styles. The realist sans serifs and the Bodonis are most tightly packed, which accounts for the picket fence effect. Humanist sans serifs and traditional old style and transitional faces are better. So when I say narrow phase alignment I am referring to the tight packing around a single phasal peak.
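A toy Python sketch can show the kind of measurement meant here. It is not taken from any of the studies mentioned: it assumes a line of type can be crudely modelled as a row of one-pixel vertical stems, and the stem positions and the function name `harmonic_energy_fraction` are invented for illustration. It computes a plain discrete Fourier transform and reports how much of the non-constant energy sits at the stem frequency and its multiples.

```python
import cmath


def harmonic_energy_fraction(stem_positions, n_samples, period):
    """Share of non-DC spectral energy at the stem frequency and its multiples."""
    # Crude 1-D model of a line of type: 1.0 where a stem falls, 0.0 elsewhere.
    signal = [0.0] * n_samples
    for p in stem_positions:
        signal[p] = 1.0

    fundamental = n_samples // period  # DFT bin of the nominal stem rhythm
    harmonic = total = 0.0
    for k in range(1, n_samples // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        total += power
        if k % fundamental == 0:
            harmonic += power
    return harmonic / total


regular = [0, 8, 16, 24, 32, 40, 48, 56]     # evenly spaced stems
irregular = [0, 7, 17, 24, 31, 41, 48, 58]   # same stems, unevenly spaced

print(harmonic_energy_fraction(regular, 64, 8))
print(harmonic_energy_fraction(irregular, 64, 8))
```

In this toy the evenly spaced row keeps essentially all of its energy at the stem rhythm, while the unevenly spaced row spreads most of it into other frequencies. Whether real typefaces behave this way would need transforms of actual set type.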

William Berkson's picture

Peter, "efficiency" depends on the goal. If the overwhelming priority is on speed, then it is most efficient to throw a lot of resources at it, unless these are needed elsewhere. My reason for thinking that letter recognition is also starting simultaneously is that we are pretty good at it, as mixed case reading shows. We do slow down with mixed case, which I think shows that there is whole word pattern recognition involved. But the fact that we are still pretty good at mixed case, and scrambled words, I think shows that something like letter recognition is going on pretty often also. We seem to have some quite good skills at recognizing and assembling individual letters, even though that may not dominate.

Do you have an explanation of why we are pretty skilled at mixed case and scrambled letter reading?

enne_son's picture

Bill, the overwhelming priority is on sense-following. Effective sense following and typographical transparency are probably very compromised with mixed case and scrambled letter texts. I think we piece our way through scrambled text and alternating case much like a child learning to read pieces his or her way through unfamiliar words. Save your cognitive resources for sense following and meaning construction.

enne_son's picture

Bill, Kate Mayall, Glyn W. Humphreys, et al. [2001] “The Effects of Case Mixing on Word Recognition: Evidence from a PET Study” found that besides the increase in reaction times in naming and other word recognition tasks for alternating-case words, during both an implicit (feature detection) task and an explicit word-naming task, alternating-case words compared to same-case words produced increased activation in an area of the right parietal cortex previously associated with visual attention. They concluded that reading alternating-case words requires increased attentional processing.

Why are we pretty skilled at it? Because even at significant levels of disruption, with increased attentional processing, and maybe indeed some resultant activation of letter modes, matrix resonance can still happen?

enne_son's picture

In an effort to clarify for myself and others where my intrinsic integration account and a parallel letter recognition account might converge and diverge, let me try the following:

A letter recognizer, when applied to words, requires a parsing or segmentation routine. Suppose we think of this as a gathering of role-unit level information around salient centres. (We know that role-units — my term for independent letter parts like bowls and stems — are the perceptual processing primitives in reading because of some of Denis Pelli’s work with spatial frequency channels and efficiency.) The salient centres might be indexed in parafoveal preview. They might correspond with open counters, or areas of high role-unit density or ‘expressed structure’ like in the e or s. The gathering is keyed to determining the prototypical identity of the role-units and detecting their local combinations, such as where and how role-units touch or intersect.

My account can work on this principle as well.

In the parallel letter recognition account it is assumed that this gathering proceeds to the point of explicit labeling, constructed in the Interactive Activation scheme as an activation of an independent abstract-letter node. A look-up function, which can be schematized as projection to or feeding-forward to a set of word nodes, sits on top of this.

In my account the gathering around salient centres happens but node activation or explicit labelling is inhibited or suspended. (This is Edmund Burke Huey [1908]’s inhibition of incipient recognitions for letters.) The activation of word-nodes can be more direct.

It is not the gathering of information around salient centres that I object to, but the explicit labelling / letter-node activation caveat.

Because my account can effectively work on the principle of a parallel gathering around salient centres, it can in principle account for both our everyday experience — while reading — of words as lettered things and empirical data that is taken to suggest a letter-wise parsing operating in parallel.

To account for the capacity benefits experimenters see in Word Superiority tests, Marilyn Jager Adams speaks of interfacilitation [her term] at the letter level. My account imagines an interfacilitation routine operating at a role unit level and relying on phase alignment to be effective. The Interactive Activation scheme achieves the same or similar capacity benefits by positing facilitatory feedback loops between levels.

Richard Fink's picture

Bringhurst at ATYPI, Dublin. Photo by me.

I changed my mind about this guy. He's OK. Yes, he's a self acknowledged "type snob"; yes, he elevates writing above the other arts, so he can still KMA.
But I was reading the chapter on "Grooming The Font" from The Elements Of Typographic Style and it turns out the guy thinks most fonts need fixing. He's a tweaker! Go figure.

John Hudson's picture

Peter: I’ll have more to say, I hope, about processing analogues to John’s Arabic Ligature Problem...

I am very interested in pursuing this further, Peter, so please do post more when you are able.

Rich: ...[Bringhurst] elevates writing above the other arts...

He does? Robert is very knowledgeable on music and painting, so this statement surprises me. He has spent years trying -- not very successfully, I think -- to attain in typographic form something of the qualities of polyphonic music, so he is surely aware of some of the limitations of writing.

Also, it is perhaps better to say that literature is his principal interest, not writing per se, since he has done so much to champion oral literature.
