Metamers of Moby Dick

enne_son's picture

The image here:

comes from a 2011 Nature Neuroscience paper entitled “Metamers of the Ventral Stream.” The intention is to provide a visualization of how printed text is represented in the periphery.

In “Peripheral vision and pattern recognition: A review” (Journal of Vision, 2011), Hans Strasburger et al. write: “It is commonly thought that blurriness of vision is the main characteristic of [“seeing sidelong”]. Yet [Jerome] Lettvin [“On Seeing Sidelong,” The Sciences, 1976] […] insisted that any theory of peripheral vision exclusively based on the assumption of blurriness is bound to fail: ‘When I look at something it is as if a pointer extends from my eye to an object. The “pointer” is my gaze, and what it touches I see most clearly. Things are less distinct as they lie farther from my gaze. It is not as if these things go out of focus—but rather it’s as if somehow they lose the quality of form.’”

In the image the red dot indicates where the ‘pointer’ touches down.

The science behind metamers is explained in the Nature Neuroscience paper, available at: http://www.jeremyfreeman.net/public/downloads/Freeman-Simoncelli-2011-Me.... The Strasburger et al. paper is at: http://www.journalofvision.org/content/11/5/13.full.pdf+html.

In a 2012 Journal of Vision paper, “A summary statistic representation in peripheral vision explains visual search,” Ruth Rosenholtz et al. use this science to generate metamers or ‘mongrels’ of patches of text or shapes, and to introduce a notion of patch discriminability. Applied to printed text, patch discriminability would translate to Hrant’s notion of bouma divergence. For Rosenholtz et al., visual patch discriminability using metamer stimuli predicts ease of visual search. So for our purposes, gauging patch discriminability might help us get a handle on, and test, the impact of bouma divergence (or optimizing notan) on the parafoveal preview benefit in reading.
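
For the computationally inclined, here is a minimal sketch (Python with NumPy) of a crude stand-in for patch discriminability: pool a handful of summary statistics over each patch and take the distance between the pooled vectors. This is emphatically not Rosenholtz’s model, which pools a much richer set of texture statistics, but it conveys the shape of the idea.

```python
# A crude stand-in for patch discriminability, not the Rosenholtz model:
# pool a handful of summary statistics over each patch and take the
# distance between the pooled vectors. The real model pools a much richer
# set of texture statistics.
import numpy as np

def summary_stats(patch):
    """Pool a few simple statistics over a grayscale patch (2-D array)."""
    gy, gx = np.gradient(patch.astype(float))
    energy = np.hypot(gx, gy)             # local contrast energy
    theta = np.arctan2(gy, gx)            # local orientation
    return np.array([
        patch.mean(), patch.std(),        # luminance statistics
        energy.mean(), energy.std(),      # edge-energy statistics
        np.cos(2 * theta).mean(),         # pooled orientation content
        np.sin(2 * theta).mean(),
    ])

def patch_discriminability(patch_a, patch_b):
    """Euclidean distance between pooled statistic vectors, as a rough proxy."""
    return np.linalg.norm(summary_stats(patch_a) - summary_stats(patch_b))

# Example with two random arrays standing in for rendered multi-letter patches.
rng = np.random.default_rng(0)
a, b = rng.random((32, 96)), rng.random((32, 96))
print(patch_discriminability(a, b))
```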

It would be interesting to test, using metamers of Moby Dick, whether the kinds of manipulations Hrant envisions to enhance bouma divergence yield patches with significantly greater patch discriminability. Whether they do or not, the graphic appears to supply a more telling representation of what visual information is available to parafoveal vision than we've had until now.

The Rosenholtz paper is here: http://www.journalofvision.org/content/12/4/14

Peter

hrant's picture

What a cool image.
Let me try to wrap my head around this...

hhp

John Hudson's picture

Very interesting, Peter.

On the subject of perception in the periphery, I have a quite good photo of Denis Pelli's final slide from last week's Reading Digital conference, which I will post here once I have confirmed some details of what is shown. It involves a technique developed by one of Denis' grad students to ameliorate the effects of crowding.

enne_son's picture

By the way, Jeremy Freeman and Eero P. Simoncelli, the authors of the metamers paper, write: “We envision that our model could be used to optimize fonts, letter spacing or line spacing for robustness to crowding effects, potentially improving reading performance.”

Another dimension of this line of research is that these “seeing sidelong” hypotheses conflict with the conventional interpretation of the results of eye-movement studies using the ‘moving window’ and ‘boundary’ techniques, which claim the reader is gathering letter information in parafoveal preview to facilitate word recognition. Kevin Larson adopted this interpretation to make his case for parallel letter recognition in “The Science of Word Recognition.”

The most we could probably say is that the reader is gathering summary statistics pooled over multi-letter patches from parafoveal preview to facilitate word recognition and control eye movements and skipping.

oldnick's picture

When setting up a joke, Jay Leno occasionally cautions his audience, “Don’t get ahead of me.”

While we are actually wading in a sea of visual perception, we imagine that we are swimming in a stream of consciousness; this illusion generally proves to be useful.

John Hudson's picture

I've not heard back from Denis Pelli yet, so can't give all the details, but I'll post this image now because it seems relevant and interesting. This photograph shows Denis explaining work done by one of his grad students, who used eye-tracking and software to reduce the contrast of text outside the immediate region of fixation (indicated by the red dot), and hence reduce the crowding effect on word recognition in the near parafovea. I don't recall the exact result, but there was a significant improvement in reading speed among test subjects.
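
To make the idea concrete, here is my own simplified sketch (in Python, with made-up parameters), not the actual experimental software: hold full contrast inside a window around the tracked fixation and scale the contrast down everywhere else.

```python
# My own simplified sketch of the gaze-contingent idea, not the experimental
# software: full contrast inside a window around the tracked fixation,
# reduced contrast everywhere else. Parameters are made up.
import numpy as np

def reduce_peripheral_contrast(page, fixation_xy, radius_px=80, peripheral_contrast=0.3):
    """page: 2-D grayscale array in [0, 1]; fixation_xy: (x, y) in pixels."""
    h, w = page.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - fixation_xy[0], ys - fixation_xy[1])
    factor = np.where(dist <= radius_px, 1.0, peripheral_contrast)
    mean = page.mean()
    return mean + (page - mean) * factor   # scale contrast about the page mean

# In the experiment this would be re-applied on every eye-tracker sample;
# here, one synthetic "page" with the fixation at x=200, y=50.
page = np.random.default_rng(1).random((100, 400))
displayed = reduce_peripheral_contrast(page, fixation_xy=(200, 50))
```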

5star's picture

'...mongrels of patches...'

What a great phrase!

n.

hrant's picture

1) Was the ghosting changed based on a saccade being initiated? If so, were subjects in fact able to make long saccades into the low-contrast area?
2) Why even bother showing "second"?
3) What wpm are we talking about?

hhp

John Hudson's picture

As I said, I'm hoping to get full details from Denis. The slide came at the end of his presentation and represented 'late breaking news' that wasn't covered in the older paper [PDF] on which the presentation was based.

1. Good question. Might there be an impact on saccade cueing, even if there is a net gain in reading speed?

2. I don't know for sure. It could be simply to prevent an unfamiliar visual experience at the fixation by maintaining symmetry. But I've not understood whether there is actually no information gleaned from the left of a fixation during reading. It seems to me that, at the least, information from that side of the fixation could be used to confirm reading from the previous fixation or to help trigger a regression.

3. Again, I'm awaiting details, but it seems to me that what matters in the study is relative wpm speeds, not absolute.

Nick Shinn's picture

I’ve always thought peripheral vision was quantized, rather than uniformly blurry.
I don’t believe distress (the “metamer” shown) is an adequate metaphor, either.
I suggest low res bitmapping.
Speaking of analogies from technology, how about the way the blind spot is filled in?
Just like Photoshop’s clone tool.
And if one sees the blind spot cloned in that manner, wouldn’t that suggest that there is a huge element of projected meaning shaping the periphery, not just raw interpretation of shapes? So what one “sees” already has a primitive typo-matrix-like form (because the reader is aware of the kind of document being read, so expects a certain kind of text to be in a certain place), assumed to have the fundamental structure of proto-text, not the decayed appearance of processed text.

And why do we assume that the image seen in the fovea is sharp and fully detailed like a photograph? If non-typographers can’t see the difference between Times and Century, why do we represent what they see by a fully-detailed image? Surely we only see what we recognize.

Just because light falls on the retina, doesn’t mean that it becomes information which has to be fully processed. There must be some redundancy built in, to predictively turn off rods and cones deemed irrelevant once a certain amount of semantic closure has occurred.

As Blake observed, if the doors of perception were cleansed, everything would appear as it is—infinite.

enne_son's picture

Perhaps the term ‘visualization’ in my initial post was misleading.

In a 1971 paper Herman Bouma introduced a series of shape descriptions of letters built around terms such as sidedness, expressedness (disturbed or undisturbed; rectangular or round), aspect (oblique or rectilinear), closure (or the lack of it), and extendedness. He did this with an eye on representation in the parafovea.

The metamers are not literal images of representation in the periphery, but statistically representative simulations that show to what degree location, aspect, expressedness, closure (or lack of it) and extendedness are compromised in crowded parafoveal vision.

The compromise is large enough to inhibit or prevent the sufficiently automatic and rapid visual word-form resolution or letter identification needed to support robust word recognition.

The other side of the decrement is that what is preserved (of expressedness, closure, aspect, etc.) seems to be at just the right level of informativeness to facilitate effective saccade planning and provide a priming-like preview benefit.

enne_son's picture

Nick, what happens at the retinal level may indeed be akin to low resolution bitmapping, but the metamers show what happens in the ‘ventral stream’ (as the title of the metamer paper indicates).

The ventral stream is the ‘belly’ or bottom part of the brain that includes the visual cortex and what has been dubbed the Visual Word Form Area. (The term Visual Word Form Area is controversial — my working tag for it is Intrinsic Integration Interface — but that's another story).

There is a ‘cortical magnification factor’ as information travels through the visual cortex, which compounds the issues surrounding the coarseness of the representation at the retinal level*, forcing a ‘pooling’ or ‘compulsory averaging’ of featural and positional statistics. The positional uncertainty and structural ambiguities captured in the metamers are the result of the compulsory averaging and pooling.
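
To give a rough sense of scale: Bouma's old rule of thumb puts the critical spacing for crowding at about half the eccentricity, so the span over which features are pooled together grows quickly with distance from fixation. A toy calculation (the 0.5 scaling and the 0.3-degree letter width are round-number assumptions, not fitted values):

```python
# Toy calculation of pooling that grows with eccentricity. The 0.5 scaling is
# Bouma's rough rule of thumb (critical spacing ~ half the eccentricity) and
# the 0.3-degree letter width is a round-number assumption, not a fitted value.

def pooling_width_deg(eccentricity_deg, scaling=0.5):
    """Approximate width (in degrees) of the region over which features are pooled."""
    return scaling * eccentricity_deg

def letters_per_pool(eccentricity_deg, letter_width_deg=0.3):
    """How many letters of the assumed width fall inside one pooling region."""
    return pooling_width_deg(eccentricity_deg) / letter_width_deg

for ecc in (1, 2, 5, 10):
    print(f"{ecc:>2} deg out: pool ~{pooling_width_deg(ecc):.1f} deg wide, "
          f"~{letters_per_pool(ecc):.1f} letters averaged together")
```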

Since researchers are now able to quantify the degree of pooling, accurate simulations of positional uncertainty and structural ambiguities are available using metamers.

*[adding to clarify] Representation at the retinal level is constrained by the much lower density of cone photoreceptors in the parafovea than in the fovea.

oldnick's picture

While the level of scholarship in this thread is impressive, I am still left wondering what the practical implications or applications might be.

Has anyone come close to developing a formula to delineate particularly effective letterforms; or, do patterns of recognition suggest an optimal method of grouping words together for maximum apprehension or clarity in conveying concept?

hrant's picture

what the practical implications or applications might be

That is of course the central question. But look what happens to the /y in "myself" and "involuntary" in the top image; I would suggest that if the descender terminal (which is in effect a serif - hint, hint) were heftier you might (at least sometimes) end up with at least a blip in the metamer instance, leading to a better bouma. Ergo: timid extenders are dysfunctional.

And I have no doubt that there are deeper guidelines to be formulated.

hhp

enne_son's picture

Metamers of Moby Dick — or any other text — in a condensed sans serif font, versus metamers of the same text in a non-condensed version of the same typeface, might indicate that patch discriminability is significantly compromised in the condensed version, signaling a possible reduced parafoveal preview benefit and a disruption of eye-movement mechanics. Patch discriminability might be compromised because, for example, the less pronounced obliquity in the arms of the x or the legs of the w could make the summary statistics of patches containing these forms less divergent from those of patches that don't, and hence less informative.
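
A crude way to put that prediction into testable form: render the same passage in a condensed and a non-condensed cut, slice both renders into multi-letter patches, and ask how spread out the pooled patch statistics are in each case. The sketch below (Python with Pillow; the font paths are placeholders and the statistics only a handful) is an outline of the procedure, not the measure itself.

```python
# Outline of the condensed-versus-regular comparison: render the same text in
# two fonts (the .ttf paths are placeholders), slice the renders into
# multi-letter patches, and compare how spread out the pooled patch statistics
# are. A wider spread is taken as a crude stand-in for patch discriminability.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_line(text, font_path, size=36, width=1400, height=60):
    img = Image.new("L", (width, height), color=255)
    font = ImageFont.truetype(font_path, size)
    ImageDraw.Draw(img).text((10, 10), text, font=font, fill=0)
    return np.asarray(img, dtype=float) / 255.0

def patch_stats(img, patch_w=90):
    """One crude statistic vector (mean, std, edge energy) per multi-letter patch."""
    stats = []
    for x in range(0, img.shape[1] - patch_w, patch_w):
        p = img[:, x:x + patch_w]
        gy, gx = np.gradient(p)
        stats.append([p.mean(), p.std(), np.hypot(gx, gy).mean()])
    return np.array(stats)

def mean_patch_spread(img):
    """Average distance of each patch's statistics from the centroid of all patches."""
    s = patch_stats(img)
    return np.linalg.norm(s - s.mean(axis=0), axis=1).mean()

text = "Call me Ishmael. Some years ago, never mind how long precisely,"
for label, path in [("regular", "SomeSans-Regular.ttf"),      # placeholder font files
                    ("condensed", "SomeSans-Condensed.ttf")]:
    print(label, mean_patch_spread(render_line(text, path)))
```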

The predictions could then be tested in the domain of eye-movement and parafoveal preview benefit research, and the results used to give evidential support to our intuitions about typeface selection for extended texts. It might also become possible to plot the degree of condensation against the disruption of the parafoveal preview benefit and of optimal eye-movement mechanics.

The effect of serifs on patch discriminability could also be looked at.

So no direct formula to delineate particularly effective letterforms, but contextual knowledge about what kinds of manipulations have what sort of effect, and in what part of the reading process.

[cross-posted with Hrant's post just above, whose observation is I think correct]

oldnick's picture

So, Peter, in Earth-speak what you're saying is we don’t have the data needed to determine whether or not condensing a typeface overall impairs the process of recognition; but we do know that condensing certain letterforms—specifically, w and x—does impair it.

Add Hrant’s point that descenders facilitate disambiguation—e.g., /i/j · /v/y—and you come up with a VIJWXY rule of thumb: news I can use. Cool.

Ryan Maelhorn's picture

VIJWXY rule of thumb?

Té Rowan's picture

Yep. Don't go all-thumbs on them.

oldnick's picture

VIJWXY rule of thumb?

Somehow, that arrangement of letters looks better than “The i·j/w·x/v·y Disambiguation Imperative.”

hrant's picture

It's more complex than that. :-)
From my work in 1998: http://www.themicrofoundry.com/other/m&s.gif

hhp

Nick Shinn's picture

@Peter: …the metamers show what happens in the ‘ventral stream’

But how do they show it? They look like a real image and not enough like a diagram—homunculism. They are a picture impersonating a diagram (or is it vice versa?), and will be interpreted as what peripheral vision “really” looks like.

They are a piece of graphic design (see the method quoted below) which may be able to simulate experimental results. But are they any less of a metaphor than simple blurring?

They are misleading, because they are high res. They (and the other grunged-up photos) lend allure to the research, but they are somewhat of a misrepresentation, because even scientists who read and understand the paper will nonetheless think that this is what the retinal field of vision looks like, when in fact it doesn’t look like anything!

“Synthetic images were then generated by initializing them with samples of Gaussian white noise and iteratively adjusting them (using a variant of gradient descent) until they matched the model responses of the original image (see Online Methods).”

I assume that blurring was also used to bring the noised image back to a consistent resolution across the entire image (why?).

This is a fairly common Photoshop technique: first you distress the image in a rough and graphic manner, then you smooth it so that it looks more naturalistic. A gradient mask blends the filtered selection into the rest of the image. It may be used in retouching skin, for instance, to tone down blemishes.

There are several deconstructed typefaces, mostly designed in the early 1990s, that might also be considered to represent what happens in the ventral stream, but with a degree of simple formality in their distress. Types such as Arbitrary Sans suggest an alternative en-masse effect to the trope of this research, which reminded me of this from Casablanca:

Ryan Maelhorn's picture

“The i·j/w·x/v·y Disambiguation Imperative.”?

oldnick's picture

“The i·j/w·x/v·y Disambiguation Imperative.”?

Make sure the pairs listed look different enough to tell them apart clearly.

Ryan Maelhorn's picture

Who has decreed such a thing? And made it, Imperative?

Té Rowan's picture

It has just become the new Prime Directive. Read a bit further upthread.

enne_son's picture

Nick [S], despite your caveats and comparisons I think the images are telling and the research enlightening. The synthesizing of the images is quantitatively controlled. It would be like controlling the randomization in Beowolf with evidence-based pooling algorithms, and extending this to stroke location and combination.
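
To make the analogy concrete: in the Freeman and Simoncelli procedure the synthesis starts from Gaussian noise and is nudged, step by step, until its pooled statistics match those of the original. The sketch below matches only pooled local means, with an arbitrary pooling size, so it is a bare-bones illustration of the control structure rather than the model itself.

```python
# Bare-bones illustration of statistic-matched synthesis: start from Gaussian
# white noise and iteratively adjust it until its pooled statistics match the
# original. Only pooled local means are matched, with an arbitrary pooling
# size; the actual model matches a far richer set of statistics, pooled over
# regions that grow with eccentricity.
import numpy as np

def pooled_means(img, pool=8):
    """Average the image over non-overlapping pool x pool regions."""
    h, w = img.shape
    img = img[:h - h % pool, :w - w % pool]
    return img.reshape(img.shape[0] // pool, pool,
                       img.shape[1] // pool, pool).mean(axis=(1, 3))

def synthesize(original, pool=8, steps=200, lr=16.0):
    rng = np.random.default_rng(0)
    synth = rng.normal(0.5, 0.2, original.shape)       # Gaussian white noise start
    target = pooled_means(original, pool)
    for _ in range(steps):
        err = pooled_means(synth, pool) - target       # mismatch in the pooled statistic
        # Descent step: spread each region's error back uniformly over its pixels.
        grad = np.kron(err, np.ones((pool, pool))) / pool**2
        synth[:grad.shape[0], :grad.shape[1]] -= lr * grad
    return synth

original = np.random.default_rng(1).random((64, 128))
metamer = synthesize(original)   # pooled means ~match the original; pixels differ
```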

The danger of [metaphysical and empirical] homunculism is always there in theories of representation. This fed the representationalist / anti-representationalist debate in cognitive science sparked by Hubert Dreyfus and Tim van Gelder in the early years of AI. I'm not sure of the current status of the debate, but accusing Freeman et al. of homunculism is as misleading as you think the simulations might be.

Here's the cover of a 1999 book on the debate.

Nick Shinn's picture

…accusing Freeman et al. of homunculism is as misleading as you think the simulations might be…

Homunculism isn’t a danger that might happen; it’s an inescapable paradox of any optical science that shows images which map the field of vision.

I criticized Freeman et al.’s images from the perspective of an artist/designer, for being too naturalistic.
It seems to me that the graphic process/metaphor they have designed/chosen (“feathered” filtering), which processes a high-res typographic image, could be improved upon.

Their overall premise starts with Lettvin’s remarkable insight that emergent images acquire the quality of form, in which case generating parafoveal representations by distressing already fully realized Garamond is wrong.

In dealing with type, researchers have an advantage over photographs of natural scenes, because the semantics of typographic imagery are pre-formatted in a manner familiar to test subjects.

That is why I’m proposing that, rather than the blurry, rainy-window effect, it might be useful to represent peripheral type by something like Arbitrary Sans, which looks (or is) partly or emergently formed, rather than post-facto distressed.

I get the feeling that there is something fractal happening, in the relationship between categoric shapes as they are perceived in the centre of vision, and the periphery. The design question is, how could parametric typefaces best be used to represent the visual process? (And make no mistake, this research is also graphic design.)

What are the alternatives to Photoshopping bitmapped Garamond, or simple Beowolf-style data-driven disruption of form?

hrant's picture

it might be useful to represent peripheral type by something like Arbitrary Sans

?
How could a piece of consciously designed personal expression fill in for what actually happens? It's like saying a Picasso shows how humans actually look.

I think you're underestimating the functional relevance of this new breed of rendering.

hhp

Nick Shinn's picture

The name Arbitrary Sans is a clue, suggesting that the letter forms are not fully resolved, which is indeed the case.


In the /a, for instance, the lower right terminal is decisively rendered, but it is neither quite big enough for a typical curled serif in that position, nor is it the subtle splaying of the vertical stem that is another standard treatment. It is deliberately indecisive with regard to the reader’s expectation. Protean, even.

This form therefore corresponds to Lettvin’s “I don’t exactly know but I’m beginning to get an idea” process of deciphering which underlies the research being discussed.

With this principle, text in different areas of the field of vision could be represented by parametric variants of a typeface, quantitatively deployed.

Ryan Maelhorn's picture

Where's that a from, Nick?

riccard0's picture

The name Arbitrary Sans is a clue

Ryan Maelhorn's picture

Thank you, Riccardo

hrant's picture

The first part of the name is a clue as well... :-)

hhp

dezcom's picture

I took debate and got hooked ;-)

enne_son's picture

Nick, positional uncertainty is an emergent characteristic of 1) retinal coarse sampling (in the parafovea) of sharply rendered but closely spaced stimuli, and 2) ventral-stream compulsory averaging of the result of that coarse sampling. How could this be represented by parametric variants of a typeface, quantitatively deployed?

Nick Shinn's picture

That would be the design brief.

enne_son's picture

As an adjunct to this thread, see:
http://denispelli.com/2012/05/05/the-brodmann-areas-a-new-ballet/
Read the entire blog post, but take special note of the instructions in the quotation from the New Criterion. Then view the video.

Spectacular!

enne_son's picture

[John] “On the subject of perception in the periphery, I have a quite good photo of Denis Pelli's final slide from last week's Reading Digital conference, which I will post here once I have confirmed some details of what is shown. It involves a technique developed by one of Denis' grad students to ameliorate the effects of crowding.”

The student’s name is Sarah Rosen. A ppt presentation of the work on the technique is here:
http://www.google.ca/url?sa=t&rct=j&q=&source=web&cd=6&ved=0CF8QFjAF&url...
View it normally to see the notes. Viewing it as a slideshow adds extra functionality but removes the notes, so the thread of the argument gets lost. This should provide details on what is shown in your slide.
