Hrant, your diverging boumas idea is perhaps worth pursuing. Diverging boumas, if you will, is one of the forces driving script evolution. The other pole is optimizing phase alignment and rhythmic cohesion to get consolidation of the word image. The latter reaches a high point in textura, but at the cost of divergent boumas. I believe this dynamic is described somewhere in Noordzij's writing.
Earlier you said: “If information in the parafovea is used to decide what word a bouma is (and that's clearly happening because at the top end only 1/3 of text is foveated)
then that's reading.”
I took this to be your description of how we actually read. I tried to show that your facts are skewed and your conclusions misleading, because they fail to take into account well-documented and widely accepted constraints on parafoveal vision.
Now you say: “A text face designer interested in making a non-trivial increase in reading performance isn't concerned with how people usually read; he's concerned with how they might read under the right conditions.” I think your ideas about how people might read don't recognize hard-wired perceptual psychophysical constraints, or that upper limit saccades are, if I'm not mistaken, typically followed by regressions, that is, they've gone too far. The thread of sense has become interrupted.
Nevertheless, I do think, as hinted at above, that the parafoveal preview benefit can be enhanced by strategic moves in some of the directions you describe.
I think that the time required for robust sense-following and accurate meaning construction is the real constraint on speed, not improperly leveraging the parafovea. The actual foveal uptake of information takes much less time than 1/4 of a second.
> So the question becomes: how do we make boumas more distinctive?
Ya folken ever wondered why the 'g' is so often double-storied while the 'q' never is? As long as the letters can be told apart cheaply, especially in parafoveal vision, the boumas can and will be resolved cheaply.
Chris: It is like saying, when talking about water, which is the more important, the hydrogen or the oxygen.
Not really, because if you give someone a black pen and a black piece of paper and ask them to write something, they'll still write the same letter shapes that they would if writing on white paper. We know what the figure shapes of letters are independent of our perception of the ground, and hence I find it reasonable to consider that we bias those figure shapes just as we bias the figure shapes of deer against a backdrop of trees. Indeed, given the great variety of weight and proportion in letters that we recognise as the same figure forms, necessarily independent of the shapes and sizes of the internal and intervening ground, it seems to me that the structure of the figure form is not only biased but the essential component, more than the area or edge of either black or white. We can't claim that a super heavy and a hairline letter 'a' are recognised on any basis but their shared figural structure, since the black is otherwise so dissimilar and the white even more so.
Hrant: The subconscious brain, being driven by efficiency, tries to eat
up text in the largest chunks possible.
I don't think that is proven, and it isn't necessarily a good intuition either. Efficiency does not always reside in eating up the largest chunk possible, as you'll discover if you try to eat a piece of cake that way while your older sister nibbles at hers and finishes first (thereby winning the coveted last slice on the plate). The key notion for efficiency seems to me whatever constitutes 'bite size' in reading, i.e. what is the optimal size of the perceptual unit for speed, accuracy and endurance (comfort over time)? Two things that seem obvious to me are that a) letters have a built-in advantage over sub-letter and super-letter elements as perceptual units in that they are self-contained visually and often linguistically (more so in many languages than in English), and b) the optimal perceptual unit varies as we read, most obviously relative to word length and familiarity. So I have no difficulty accepting that some of the time the optimal perceptual unit is something larger than a single letter, especially in the case of short and common words, whether that is conceived as a bouma or a particular role architecture or a particular gridded arrangement. But I don't think it is likely, let alone proven, that the brain is always trying to consume the biggest chunks of text possible. On the contrary, the process of becoming a good reader must involve the brain learning from experience how much of a text to bite off; which is why experienced readers make fewer regressions than inexperienced ones and do better in accuracy tests than so-called speed readers, who bite off more than they can chew.
[John] letters have a built-in advantage over sub-letter and super-letter elements as perceptual units in that they are self-contained visually and often linguistically (more so in many languages than in English), and b) the optimal perceptual unit varies as we read, most obviously relative to word length and familiarity.
John, think in perceptual processing terms. To facilitate this I suggest it is important to distinguish between units of perception and units of processing, tokenizable parts of composite structures, and featural primitives.
Words have a built-in advantage over letters as perceptual units in that they are space-delimited and internally cohesive bounded maps of blacks and whites. To get to letters a preliminary segmentation (tokenization) must occur within this unit that designers try their darndest to make internally cohesive, that is, to make into a single unit, a perceptual unit.* Letters are the tokenizable parts of composite structures and the composite structures have gestalt integrity.
Every perceptual scientist I know begins with feature detection or feature analytic processing. To me features (in the alphabetic domain) are things like expressedness and closure. In my scheme the units of processing in glyph-like structures are role-units. Most psychologists collapse the two (features and role-units).
If, at the role-unit level, blacks are in phase and whites rhythmically coordinated, the segmentation is psycho-physically discouraged or inhibited. This is why I think a single-tiered cross-letter gathering of role-units (prototypical structures — white and black — not literal shapes) with attention to their local combination characteristics and global distribution across word-integral between-letter reference points is the best way to address the recognitional problem.
* Should designers stop trying to do this? Don't designers try to achieve gestalt cohesion by making letters less self-contained and more open to each other?
> Diverging boumas, if you will, is one of the forces driving script evolution.
I wish. As far as I've seen (and it does make sense) divergence is only
implemented at the level of individual symbols, and that only in extreme
cases (like "1" vs "7" where the latter is given a middle bar). This is not
surprising because layman consciousness cannot grasp immersive reading;
it only grasps legibility, and that only in an informal, reactionary way.
> well-documented and widely accepted constraints on parafoveal vision.
As a rule, simpler data is more reliable. I think these "constraints"
might be the result of poor testing. How do you explain the simpler,
more reliable data showing that saccades can far exceed the letterwise
acuity of the fovea?
> upper limit saccades are, if I'm not
> mistaken typically followed by regressions
I've never seen that.
If you show me that all saccades that exceed the
fovea's span (4-5 letters) result in a regression
(or faulty comprehension) then I'm with you.
> the parafoveal preview benefit can be enhanced
If it can be enhanced to the point of picking out boumas, that
means the capability is there... Why this parafoveaphobia? :-)
> The actual foveal uptake of information
> takes much less time than 1/4 of a second.
I'd say that reinforces my view!
I posit that the remainder of the effort is going into reading the parafovea.
> if you give someone a black pen and a black piece of paper
> and ask them to write something, they'll still write the same
> letter shapes that they would if writing on white paper.
You can't be serious.
Try writing with your eyes closed.
> We can't claim that a super heavy and a hairline letter 'a' are
> recognised on any basis but their shared figural structure, since
> the black is otherwise so dissimilar and the white even more so.
But they're not dissimilar, in the ways that count.
> The key notion for efficiency seems to me
> whatever constitutes 'bite size' in reading
Actually, I agree.
But it doesn't make sense for that to be letters, because clusters
of them occur in high frequencies* and furthermore are easier
to pick out in the blurry parafovea.
* And for example "th" is much more frequent than "z".
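The frequency claim is easy to check on any sample of English. A minimal sketch counting within-word letter pairs against single letters; the sample sentence is my own, purely illustrative:

```python
from collections import Counter

def letter_and_bigram_counts(text):
    """Count single letters, and adjacent-letter pairs within each word."""
    words = [''.join(c for c in w if c.isalpha()) for w in text.lower().split()]
    letters = Counter(c for w in words for c in w)
    bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    return letters, bigrams

sample = ("the quick brown fox jumps over the lazy dog while "
          "the other foxes watch from the thicket")
letters, bigrams = letter_and_bigram_counts(sample)
# In ordinary English prose the cluster "th" outnumbers the letter "z".
print(bigrams["th"], letters["z"])
```

On large corpora the gap is dramatic: "th" is among the most frequent bigrams in English, while "z" sits near the bottom of the letter table.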
What I'd ask you to consider is this: if short and common words are
picked up as wholes, why not increasingly long and decreasingly
common as a reader's experience increases? Language after all is
incredibly redundant - English is around 50%. So the more you
read the more you can be sufficiently sure that blurry cluster in
the parafovea is what you think it is and take the leap, literally.
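The redundancy figure traces back to Shannon's estimates of the entropy of printed English. A minimal sketch of the idea, comparing the zeroth-order maximum (log2 26 ≈ 4.7 bits per letter) against a first-order estimate from single-letter frequencies; the sample is too small and the model too crude for a real estimate, it only shows the computation:

```python
import math
from collections import Counter

def unigram_entropy(text):
    """First-order entropy in bits per letter, from single-letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

sample = ("so the more you read the more you can be sufficiently sure that "
          "blurry cluster in the parafovea is what you think it is")
h1 = unigram_entropy(sample)
h_max = math.log2(26)  # zeroth-order maximum: all 26 letters equally likely
print(f"H1 = {h1:.2f} bits/letter; first-order redundancy = {1 - h1 / h_max:.0%}")
```

Higher-order models (bigrams, words, sentence context) drive the entropy estimate much lower and the redundancy correspondingly higher; the first-order figure is only a lower bound on redundancy.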
> experienced readers make fewer regressions than inexperienced ones
This is only true when the reader isn't being challenged, either
in terms of time pressure or difficulty of the material. My feeling
is that the proportion of regressions is maintained at the top end;
it's the length of saccades that increases with experience.
> Don't designers try to achieve gestalt cohesion by making
> letters less self-contained and more open to each other?
Not enough. And painting the letters necessarily
impedes that, even if one believes in the White.
Hrant, I guess I just don't buy your pattern of interpreting, questioning and assimilating data. I suspect the feeling is mutual. You don't find my conclusions compelling, and I don't find your arguments well founded. Yet I am interested in some of the things you are working at.
I'm certainly not nearly as versed in the science as you are, and
you know I'm a believer in science. But it has to make sense, and
the core questions (like how saccades can far exceed the acuity
of the fovea) have to be answered before we go much deeper.
I guess there are many levels of "answers", and we each try
to grasp the level that can best serve our near-term future.
[Hrant] I'm a believer in science.
I think that, to make bold statements about grasping immersive reading, you need also to immerse yourself in the science. My view goes against the conventional understanding of reading, but developed and is developing in constant conversation with a wealth of literature from various subdomains and in the context of conflicting perspectives within the literature. My priors going in were and are shaped by paying close attention to the obsessions and attunements of masters of the craft of type-design and typography, as well as to initiatives like yours and Bloemsma's.
As I said in Thessaloniki years ago, it's an ever-expanding sea to drink and a conflicting maze of wandering paths, but it can be wonderfully productive.
Hrant, is your idea that saccades are made when the reading system already has an idea about or knows what the n+1 or n+2 word is (where n is the fixated word)? The purpose of a fixation is then 1) to verify or falsify, and 2) to bring more words into parafoveal view?
I can understand the appeal of this, but I doubt it’s how immersive reading actually works.
This also helps me understand your line of questioning, the logic behind your snippet retorts, and that you might think all my foveal “rapid automatic visual word-form resolution” stuff is grossly inflated and largely beside the point, and that the parallel letter recognition view is right in the domain of foveal vision.
My understanding is that saccades are made when a foveally-based visual word-form resolution event has occurred, and because we don’t know from parafoveal preview what the n+1 or n+2 word is. The reason why some words are skipped and why the next saccade will go to the n+2 word instead is because of the anticipatory structure of sense-following which seems to be oriented to bringing substance words into view.
A basic principle for me is that, though the unit of perception is typically the word, rapid automatic visual word-form resolution requires robust perceptual discrimination affordance at least to the level of role-units / letter parts / features. I take this to be confirmed beyond the shadow of a reasonable doubt by Pelli’s work. Luckiesh expresses this by laying emphasis on the visibility or clear seeing of the critical details as a first requirement for ease of reading.
Earlier I wrote: “upper limit saccades are, if I'm not mistaken, typically followed by regressions.”
Hrant contested this.
In his comprehensive 1998 review article “Eye Movements in Reading and Information Processing: 20 Years of Research” [available here] Keith Rayner writes: “[…] saccades vary from 1 to over 15 letter spaces. […] Saccades as long as 15 letter spaces are quite rare and often occur immediately following a regression in which readers typically make a long saccade to place the eyes ahead of where they were prior to making the regression.”
1) You said long saccades are typically followed by regressions.
2) Things like "rare", "often" etc. leave room for what I'm saying.
3) What about between 5 and 15?
From where I stand I see no proof that the parafovea
cannot read boumas, and some evidence that it can.
Yes, I was correcting my error.
To be fair though: if very long saccades only serve to skip
over previously read boumas, then that negates my claim.
However, to me anything more than ~10 has implications
in terms of the parafovea's role.
Basically I think the fovea and parafovea are each used for what
they're good at: the former for its acuity, the latter for its range.
So we fixate on something, and simultaneously read what are in effect
single-letter (but see below) boumas in the fovea and multi-letter
boumas in the parafovea (in the direction of reading only). The depth
to which we can read in the parafovea depends on how high-frequency
and distinctive the given boumas are. The location of the next fixation
is simply where our confidence (which gets built up over time with
experience, and also depends on the difficulty of the material) has
run out. So we fixate on something because we have no choice in terms
of picking out all the boumas with sufficient confidence; and we
regress when the next fixation reveals a problem with the boumas the
parafovea gave us in the previous fixation - but that's OK because it
means we're making good speed overall.
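The "fixate where confidence runs out" account can be stated as a toy algorithm. The decay rate, familiarity scores, and threshold below are invented placeholders, not measured values; the sketch only shows the control structure being described, not a validated model of saccade targeting:

```python
def next_fixation(words, fixation, freq, threshold=0.5, decay=0.05):
    """Scan rightward from the current fixation, accumulating an
    eccentricity penalty; land the next saccade on the first word whose
    parafoveal confidence (familiarity minus penalty) runs out.
    freq maps word -> familiarity in [0, 1]."""
    eccentricity = 0.0
    for i in range(fixation + 1, len(words)):
        eccentricity += decay * (len(words[i]) + 1)  # farther out = blurrier
        confidence = freq.get(words[i], 0.1) - eccentricity
        if confidence < threshold:
            return i  # confidence ran out here
    return len(words) - 1  # nothing stopped us: end of line

words = "the cat sat on the mat".split()
freq = {"the": 0.95, "cat": 0.8, "sat": 0.8, "on": 0.95, "mat": 0.8}
print(next_fixation(words, 0, freq))  # skips the familiar "cat"
```

An unfamiliar word (absent from `freq`) pulls the landing site closer to the current fixation, which matches the claim that lower-frequency boumas need more foveal proximity.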
I have yet to hear a better explanation that addresses the fact that
saccades far exceed the letterwise acuity of the fovea. If Pelli for
example implies that long saccades cannot be functional, his data
must be flawed, because long saccades demonstrably exist! The subconscious brain is much
more clever than any researcher. The data cannot lead the way, only
our logic can do that.
I previously wrote:
> As Larson has convinced me at least, the fovea
> does not need more than individual letters.
Actually I do see room to move on that: if the brain
can pick out multi-letter boumas in the fovea, I guess
there's no reason it shouldn't, since that saves time.
I just don't know if the processing overhead (I mean
once the boumas are known) makes that moot or not.
In the parafovea we have no choice because individual
letters cannot be identified with enough confidence.
Peter: To get to letters a preliminary segmentation (tokenization) must occur within this unit that designers try their darndest to make internally cohesive, that is, to make into a single unit, a perceptual unit.
I don't think this is what designers are doing, although I understand that many of them may think that is what they are doing. I think what designers are actually doing is making the necessarily different (distinctly recognisable) letters visually similar in a variety of ways, such that no letter strays into a different spatial frequency channel or otherwise disrupts the reader. Part of this does involve what I call knitting of words -- making internally cohesive, as you say --: creating a flexible but reasonably consistent internal periodicity in the arrangement of features, which establishes the integrity of the word as a textual linguistic unit, but I don't think it follows that this implies making a word into a single perceptual unit. Obviously we perceive words as visual units, thanks to inter-word spacing and internal integrity (even in the parafovea and importantly for navigation), but it doesn't follow that we treat words as perceptual units for recognition purposes: it may well be that segmentation of the internally cohesive word is normal and necessary. Certainly that seems to be the case for all but the shortest and commonest words. I'm quite happy to accept that segmentation may occur at levels above, below and through the letter level, but once one gets within the word unit, as one must, letters seem to me to have a bias in being linguistically assignable things.
The brain is quite capable of suggesting what its perceptors search for and acquire in low-data areas of vision; after all, such a process works seamlessly to fill in the blind spot. Call it cloning, after Photoshop.
The brain decodes a streaming cascade of jumbled percepts (Peter’s term), continually directing the fovea towards emergent meaning as, upon closer inspection, shapeless forms become coherent words.
Hrant: You can't be serious. Try writing with your eyes closed.
Okay, I did. Actually, the admittedly short specimen looked somewhat better than my usual writing. The problem with writing with one's eyes closed isn't the shaping of the letters or their spacing -- muscle memory is good at this --, it's the growing sense of disorientation regarding the page as a whole, which limits one to short sequences. I suppose if one practised, one might be able to do longer, and presumably many people who have gone blind in later life have managed to write.
> The problem with writing with one's eyes closed
> isn't the shaping of the letters or their spacing
I think stand-alone letters can be formed decently
without looking, but visual feedback is immediately
helpful when you put down the second letter... which
is when the white really kicks in!
John, the word ‘segmentation’ might be misleading. Here it means taking discrete groups of 2 or 3 features (units of processing) and processing them in separate channels. I compare this to running a three- to five-legged race, where you have to team up with a neighbour that has identical shoelaces to yours. It’s awkward and involves a two-stage process, the first being feature (= leg) binding after you’ve figured out who your race partners are.
There are other problems as well, associated with the explicit labeling of letters. Because at this level phonetic equivalents come on line, there is a phonetic disambiguation overhead problem, which requires a significant array of feed-back loops to solve. David Boulton in his series of interviews for his Children of the Code website brings this up repeatedly. Huey in 1906 thought there had to be an inhibition of incipient recognitions for letters for the reading system to work efficiently.
These things aren't decisive for selecting between your and my alternatives. Both I think are strong hypotheses. You'll probably grant that. There will have to be a way to decide between the two.
1) I think the wikipedia graphic is more accurate. See: http://upload.wikimedia.org/wikipedia/commons/e/e4/EyeFixationsReading.gif
2) Try using mongrels instead of blurring. See: http://www.journalofvision.org/content/9/12/13/F2.expansion.html and http://www.journalofvision.org/content/9/12/13.full
3) Bring your images down to text sizes rather than display sizes, and limit viewing time to 1/4 to 1/2 second.
Peter: Because at this level phonetic equivalents come on line, there is a phonetic disambiguation overhead problem, which requires a significant array of feed-back loops to solve. David Boulton in his series of interviews for his Children of the Code website brings this up repeatedly. Huey in 1906 thought there had to be an inhibition of incipient recognitions for letters for the reading system to work efficiently.
In the case of non-phonetic spelling systems this does appear to present an issue, but I think it is answered by understanding letter recognition in a role-unit way in word recognition, i.e. the place of the letter within the word is crucial to efficient word recognition so may override potentially ambiguous phonetic associations. That is, it isn't necessary to inhibit incipient letter recognition per se if one readily inhibits phonetic associations by filtering based on the letter role within the word. By the time one has recognised letters and correctly identified their place in the word, one is at the word recognition stage, and phonetic association is something I would consider a bypassed mechanism, something the reader falls back on in the case of unfamiliar words.
John, there are other things that could be brought to the table, but this will do for now, at least for me on this tangent.
Why only lateral though?
> Try using mongrels instead of blurring.
I'd love to. Photoshop filter? :-)
BTW it's important to realize that the blurring
I show there is... allegorical - it's really the idea
of loss of acuity that counts. And since that page
is for laymen I don't mind that simplification.
You might have noted another simplification there:
I'm saying that whole words are taken in instead of
letters, while in fact I believe that clusters of
letters (so parts of words) are the typical bouma.
> Bring your images down to text sizes rather
> than display sizes, and limit viewing time
> to 1/4 to 1/2 second.
That actually can't work - and I don't just mean practically.
Taking conscious action is based on conscious understanding,
not actual experience. Think for example of how electron orbits
are shown; in really it's nothing like that.
hhp – The subconscious brain ...
I think 'unconscious' would have done the job while avoiding a specific theory. (Left Inception for an overdose of 'subconscious'.)
I had the same thought, Karsten, and have rumbled with Hrant over his use of 'subconscious' in the past.
Really? To me "unconscious" sounds too much like being knocked out...
But OK, let me read up some terminology and get back to you.
For information on how the mongrels are done, you'll have to consult the text of the paper in the link I provided, or the authors.
By keeping the display size and the unlimited timing you give a false impression of what's possible to gather in parafoveal vision. Using a line of mongrels keyed to distance from the fixation point, and a limited time window, will give you and the layman a much better simulation of what parafoveal vision allows. Your explanation sounds reasonable given your misleading simulations, and in the absence of exposure to the burgeoning literature on the effect of crowding in parafoveal vision.
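A crude stand-in for the eccentricity-keyed idea, not actual mongrel synthesis (that requires the texture-synthesis procedure described in the linked Journal of Vision paper): shuffle the interior letters of each word with a probability that grows with its distance from the fixated word. The 0.3-per-word falloff is an arbitrary illustrative constant:

```python
import random

def degrade_line(words, fixation, rng=None):
    """Scramble interior letters of each word with probability growing
    with its distance from the fixated word -- a toy eccentricity effect."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible illustration
    out = []
    for i, w in enumerate(words):
        p = min(1.0, 0.3 * abs(i - fixation))  # farther from fixation = worse
        if len(w) > 3 and rng.random() < p:
            mid = list(w[1:-1])
            rng.shuffle(mid)
            w = w[0] + "".join(mid) + w[-1]
        out.append(w)
    return " ".join(out)

line = "reading depends on both foveal acuity and parafoveal preview".split()
print(degrade_line(line, fixation=2))
```

The fixated word is always left intact and degradation worsens outward, which is the property a mongrel-per-eccentricity display would also have, only with a psychophysically grounded distortion instead of a letter shuffle.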
I'll stick to my version of when and why eye movements occur: http://typophile.com/node/88563?page=2#comment-489660
> By keeping the display size and the unlimited timing you give a
> false impression of what's possible to gather in parafoveal vision.
Without both of those things I couldn't give any impression.
Should I give a caveat to that effect? Maybe. But I still think
the main idea I'm trying to convey is being communicated.
> Using a line of mongrels keyed to distance from
> the fixation point, and a limited time window
But I assume you mean mongrels at the fixation point.
With this I agree (I just don't have the means). However
giving people limited time simply cannot work to convey
anything more than a "Huh?".
Remember, I'm not running an experiment (although that
does have some merit), I'm trying to explain something to laymen.
From that previous post:
> The reason why some words are skipped and why the next saccade
> will go to the n+2 word instead is because of the anticipatory
> structure of sense-following which seems to be oriented to
> bringing substance words into view.
OK, could you explain this in long/simpler form?
OK, I'll make you a deal: I'll switch to "unconscious" as
soon as somebody finds a good replacement for "stroke". :-)
[Hrant] However giving people limited time simply cannot work to convey
anything more than a "Huh?"
Exactly! It will provide an accurate simulation of the problem parafoveal vision faces. The "Huh?" is why the thing needs to be brought into foveal vision where it will appear in unscrambled view.
[Yes, mongrels at the fixation point.]
[A longer simpler explanation of what I mean by the anticipatory structure of sense-following will have to wait. It has to do with semantic and syntactic expectations.]
> The "Huh?" is why the thing needs to be brought into
> foveal vision where it will appear in unscrambled view.
But that's not my version. No matter what it looks like the stuff
is being jumped over, without loss of meaning. So I think the brain
can figure out the mongrels* (up to a point). Remember, boumas
don't have to be made of letters. Hmmm, it's actually pretty cool to
think there might be a secret dictionary of symbols that our brain
accumulates and processes in order to read.
* Assuming they're even real.
But that's not my version.
No matter what it looks like the stuff is being jumped over, without loss of meaning.
According to the Keith Rayner review, by and large only “function” words are skipped, not “content” words, and even those only about 25 percent of the time. What’s skipped is skipped because in these instances coarse featural coding has provided enough information about what's there to make full visual word-form resolution unnecessary. The same cannot be said for the new word that gets fixated on.
Assuming they're even real.
To make a judgment about that, you'll need to read the Balas paper.
With my contention being that there's no qualitative difference
between function words and content words, it's just that the
much lower absolute and contextual frequencies of the latter
require more experience, more foveal proximity and more
distinctive boumas to be saccaded over.
[Hrant] it's just that the much lower absolute and contextual frequencies of the latter require more experience, more foveal proximity and more distinctive boumas to be saccaded over.
The last two conditions sound like Hangul. Are Hangul words saccaded over?
I don't know. Certainly looking at how some key non-
Latin scripts are read would provide invaluable clues.
A replacement for 'stroke'? Hmm... A 'bullet' seems to be quite common. :-|
Hrant: With my contention being that there's no qualitative difference between function words and content words, it's just that the much lower absolute and contextual frequencies of the latter require more experience, more foveal proximity and more distinctive boumas to be saccaded over.
It's a reasonable hypothesis that frequency accounts for the bias for skipping function words, but I wouldn't rule out their syntactic role as a factor. By function words, I understand syntactically functional words, i.e. those that organise the sense of an utterance. Since there is a limited number of syntactical organisations of sense, and these are quite regular in languages such as English that have fairly consistent word ordering and heavy reliance on function words -- contra languages with extensive noun declension, flexible word order, and agglutination --, it seems to me quite likely that we can often skip perception of function words because our brains fill in the syntax readily and accurately.
Hrant: I'll switch to "unconscious" as soon as somebody finds a good replacement for "stroke".
I think it's possible to take 'stroke' as a metaphor, rather than accepting the stroke as a theory, but I understand your desire for an alternative. Tim Donaldson just says 'line', but that seems to me to fail to capture the sense of area of the black, and perhaps suggests something to you too much like a skeleton. The best descriptive alternative I can come up with is too much of a mouthful: bounded structural element.
[John] bounded structural element
My role-unit is, and has always been, a “bounded structural element” of an intermediate complexity somewhere between simple features and letter wholes. Role units are prototypical structures, not literal shapes.
I used to use just role, and got terms like role-architecture and role-architecturally evoked form from that. Bill Berkson suggested I use role-unit instead.
I picked up role from Douglas Hofstadter’s 1982 “Metafont, Metamathematics, and Metaphysics: Comments on Donald Knuth's Article ‘The Concept of a Meta-Font’”. For Hofstadter, roles can be filled in different ways. For example an old-style a fills the a bowl-role in a different way than a didoni a.
For the introduction of the term role into typographic discourse, see: Hofstadter, Douglas R., “Metafont, Metamathematics, and Metaphysics: Comments on Donald Knuth's Article ‘The Concept of a Meta-Font’” Visible Language, Vol. XVI no. 4, pp. 309-338 (republished in Hofstadter's book Metamagical Themas, NY: Basic Books, 1985)
A manuscript version is available here: ftp://ftp.cs.indiana.edu/pub/techreports/TR136.pdf
John, earlier I said:
the visual system
(1) breaks stimulus words down into oriented lines and curves, to the point where
(2) responsiveness to aspect, closure and expressedness accumulates and
(3) resolution or ‘quantization’ into role-units occurs.
For (1) let's say instead that in the context of edge detection and the detection of surface polarity the visual system binds to this information from the coding of feature primitives such as orientation, linearity and curvature, to the point where […]
If your “bounded structural elements” are my “role-units,” can we agree that (1) through (3) describe the first stages of visual word-form resolution?
In other words, can we agree that these terms are not simply descriptive conveniences, but might also represent the elemental units of processing.
Hrant, about the “well-documented and widely accepted constraints on parafoveal vision,” which you thought “might be the result of poor testing,” see this just-released and, by the looks of it, comprehensive paper: “Peripheral vision and pattern recognition: A review,” by Hans Strasburger, Ingo Rentschler, and Martin Jüttner.