25 Perception and Colours

T. Raghuveera

Learning Objectives:

  • To introduce the approaches to visual perception
  • To learn the visual perception theory
  • To learn about colour principles, colour space and colour difference
  • To know about good and bad usage of colours

 

25.1 Visual Perception:

 

Visual perception is the ability to interpret the surrounding environment by processing information that is contained in visible light. The resulting perception is also known as eyesight, sight, or vision. The various physiological components involved in vision are referred to collectively as the visual system, and are the focus of research. The major problem in visual perception is that what people see is not simply a translation of retinal stimuli (i.e., the image on the retina). Thus people interested in perception have long struggled to explain what visual processing does to create what is actually seen.

 

 

The figure 1 above shows what may happen during the first two seconds of visual inspection. While the background is out of focus, representing the peripheral vision, the first eye movement goes to the boots of the man (just because they are very near the starting fixation and have a reasonable contrast). The following fixations jump from face to face. They might even permit comparisons between faces. It may be concluded that the icon face is a very attractive search icon within the peripheral field of vision. The foveal vision adds detailed information to the peripheral first impression.

 

It can also be noted that there are three different types of eye movements: vergence movements, saccadic movements and pursuit movements. Vergence movements involve the cooperation of both eyes to allow an image to fall on the same area of both retinas, resulting in a single focused image. Saccadic movements are jumps from one position to another and are used to rapidly scan a particular scene or image. Lastly, pursuit movements are smooth eye movements used to follow objects in motion.

 

There is considerable evidence that face and object recognition are accomplished by distinct systems. For example, prosopagnosic patients show deficits in face, but not object, processing, while object-agnostic patients show deficits in object processing with spared face processing. Behaviourally, it has been shown that faces, but not objects, are subject to inversion effects, leading to the claim that faces are “special”. Further, face and object processing recruit distinct neural systems. Notably, some have argued that the apparent specialization of the human brain for face processing does not reflect true domain specificity, but rather a more general process of expert-level discrimination within a given class of stimulus, though this latter claim is the subject of substantial debate.

 

25.2 The cognitive and computational approaches

 

The major problem with the Gestalt laws (and the Gestalt school generally) is that they are descriptive, not explanatory. For example, one cannot explain how humans see continuous contours by simply stating that the brain “prefers good continuity”. Computational models of vision have had more success in explaining visual phenomena and have largely superseded Gestalt theory. More recently, computational models of visual perception have been developed for Virtual Reality systems; these are closer to real-life situations as they account for the motion and activities that are prevalent in the real world. Regarding the Gestalt influence on the study of visual perception, Bruce, Green & Georgeson conclude that

 

“The physiological theory of the Gestaltists has fallen by the wayside, leaving us with a set of descriptive principles, but without a model of perceptual processing. Indeed, some of their “laws” of perceptual organisation today sound vague and inadequate.”

 

In the 1970s, David Marr developed a multi-level theory of vision, which analysed the process of vision at different levels of abstraction. In order to focus on the understanding of specific problems in vision, he identified three levels of analysis: the computational, algorithmic and implementation levels. Many vision scientists, including Tomaso Poggio, have embraced these levels of analysis and employed them to further characterize vision from a computational perspective. The computational level addresses, at a high level of abstraction, the problems that the visual system must overcome. The algorithmic level attempts to identify the strategy that may be used to solve these problems. Finally, the implementation level attempts to explain how solutions to these problems are realized in neural circuitry.

 

Marr suggested that it is possible to investigate vision at any of these levels independently. Marr described vision as proceeding from a two-dimensional visual array (on the retina) to a three-dimensional description of the world as output. His stages of vision include:

  • A 2D or primal sketch of the scene, based on feature extraction of fundamental components of the scene, including edges, regions, etc. Note the similarity in concept to a pencil sketch drawn quickly by an artist as an impression.
  • A 2½ D sketch of the scene, where textures are acknowledged, etc. Note the similarity in concept to the stage in drawing where an artist highlights or shades areas of a scene, to provide depth.
  • A 3D model, where the scene is visualized in a continuous, three-dimensional map.

 

25.3 Transduction

 

Transduction is the process through which energy from environmental stimuli is converted to neural activity for the brain to understand and process. The back of the eye contains three different cell layers: the photoreceptor layer, the bipolar cell layer and the ganglion cell layer. The photoreceptor layer is at the very back and contains rod photoreceptors and cone photoreceptors. Cones are responsible for colour perception, and there are three different types: red, green and blue. Rods are responsible for the perception of objects in low light. Photoreceptors contain a special chemical called a photopigment, embedded in the membrane of the lamellae; a single human rod contains approximately 10 million photopigment molecules. Each photopigment molecule consists of two parts: an opsin (a protein) and retinal (a lipid). There are three specific photopigments (each with its own colour sensitivity) that respond to specific wavelengths of light. When the appropriate wavelength of light hits a photoreceptor, its photopigment splits into two, which sends a message to the bipolar cell layer, which in turn sends a message to the ganglion cells, which then send the information through the optic nerve to the brain. If the appropriate photopigment is not in the proper photoreceptor (for example, a green photopigment inside a red cone), a condition called colour vision deficiency will occur.

 

Transduction involves chemical messages sent from the photoreceptors to the bipolar cells and on to the ganglion cells; several photoreceptors may send their information to one ganglion cell. There are two types of ganglion cells: red/green and yellow/blue. These neurons fire constantly, even when not stimulated; the brain interprets different colours (and, with enough information, an image) when the rate of firing of these neurons alters. Red light stimulates the red cone, which in turn stimulates the red/green ganglion cell. Likewise, green light stimulates the green cone, which stimulates the red/green ganglion cell, and blue light stimulates the blue cone, which stimulates the yellow/blue ganglion cell. The rate of firing of a ganglion cell is increased when it is signalled by one cone and decreased (inhibited) when it is signalled by the other. The first colour in the name of the ganglion cell is the colour that excites it and the second is the colour that inhibits it: a red cone excites the red/green ganglion cell and a green cone inhibits it. This is an opponent process. If the rate of firing of a red/green ganglion cell increases, the brain knows that the light was red; if the rate decreases, the brain knows that the light was green.
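The opponent coding described above can be sketched numerically. This is a toy model for illustration only: the channel arithmetic below (simple differences of hypothetical cone responses in [0, 1]) is an assumption, not the actual neural computation.

```python
def opponent_channels(red, green, blue):
    """Return (red/green, yellow/blue) opponent signals from cone responses."""
    red_green = red - green                  # red excites, green inhibits
    yellow_blue = (red + green) / 2 - blue   # yellow (red+green) excites, blue inhibits
    return red_green, yellow_blue

# Pure red light: the red/green channel's firing rate increases (positive signal)
rg, yb = opponent_channels(1.0, 0.0, 0.0)
assert rg > 0

# Pure green light: the same channel is inhibited (negative signal)
rg, yb = opponent_channels(0.0, 1.0, 0.0)
assert rg < 0
```

A positive value stands in for an increased firing rate and a negative value for an inhibited one, mirroring how the brain reads "red" versus "green" from one channel.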

 

Theories and observations of visual perception have been the main source of inspiration for computer vision (also called machine vision, or computational vision). Special hardware structures and software algorithms provide machines with the capability to interpret the images coming from a camera or a sensor. Artificial visual perception has long been used in industry and is now entering the domains of automotive systems and robotics.

 

 

25.4 Visual Perception Theory:

 

In order to receive information from the environment we are equipped with sense organs e.g. eye, ear, and nose. Each sense organ is part of a sensory system which receives sensory inputs and transmits sensory information to the brain.

 

A particular problem for psychologists is to explain the process by which the physical energy received by the sense organs forms the basis of perceptual experience. Sensory inputs are somehow converted into perceptions of desks and computers, flowers and buildings, cars and planes; into sights, sounds, smells, tastes and touch experiences.

 

A major theoretical issue on which psychologists are divided is the extent to which perception relies directly on the information present in the stimulus. Some argue that perceptual processes are not direct, but depend on the perceiver’s expectations and previous knowledge as well as the information available in the stimulus itself.

 

This controversy is discussed with respect to Gibson (1966), who proposed a direct theory of perception, a ‘bottom-up’ theory, and Gregory (1970), who proposed a constructivist (indirect) theory of perception, a ‘top-down’ theory. Psychologists distinguish between two types of processes in perception: bottom-up processing and top-down processing.

 

Bottom-up processing is also known as data-driven processing, because perception begins with the stimulus itself. Processing is carried out in one direction from the retina to the visual cortex, with each successive stage in the visual pathway carrying out ever more complex analysis of the input.

 

Top-down processing refers to the use of contextual information in pattern recognition. For example, understanding difficult handwriting is easier when reading complete sentences than when reading single, isolated words, because the meaning of the surrounding words provides a context to aid understanding.

 

25.4.1 Gregory (1970) and Top Down Processing

 

Psychologist Richard Gregory argued that perception is a constructive process which relies on top-down processing. For Gregory (1970) perception is a hypothesis. For Gregory, perception involves making inferences about what we see and trying to make a best guess. Prior knowledge and past experience, he argued, are crucial in perception. When we look at something, we develop a perceptual hypothesis, which is based on prior knowledge. The hypotheses we develop are nearly always correct. However, on rare occasions, perceptual hypotheses can be disconfirmed by the data we perceive.

 

25.4.2 Evidence to Support Gregory’s Theory

 

1. ‘Highly unlikely objects tend to be mistaken for likely objects’.

 

Gregory has demonstrated this with a hollow mask of a face. Such a mask is generally seen as a normal, convex face, even when one knows and feels the real mask. There seems to be an overwhelming need to reconstruct the face, similar to Helmholtz’s description of ‘unconscious inference’: an assumption based on past experience.

 

2. ‘Perceptions can be ambiguous’

 

 

The Necker cube is a good example of this. When you stare at the crosses on the cube, the orientation can suddenly change, or ‘flip’; it becomes unstable, and a single physical pattern can produce two perceptions. Gregory argued that the cube appears to flip between orientations because the brain develops two equally plausible hypotheses and is unable to decide between them. Because the perception changes even though there is no change in the sensory input, the change of appearance cannot be due to bottom-up processing; it must be set downwards by the prevailing perceptual hypothesis of what is near and what is far.

3. ‘Perception allows behavior to be generally appropriate to non-sensed object characteristics’.

 

For example, we respond to certain objects as though they are doors even though we can only see a long narrow rectangle as the door is ajar. What we have seen so far would seem to confirm that indeed we do interpret the information that we receive, in other words, perception is a top down process.

 

25.4.3 Critical Evaluation of Gregory’s Theory:

 

1. The Nature of Perceptual Hypotheses:

 

If perceptions make use of hypothesis testing, the question can be asked: what kind of hypotheses are they? Scientists modify a hypothesis according to the support they find for it, so are we as perceivers also able to modify our hypotheses? In some cases it would seem the answer is yes. For example, look at the figure below:

 

 

This probably looks like a random arrangement of black shapes. In fact there is a hidden face in there, can you see it? The face is looking straight ahead and is in the top half of the picture in the center. Now can you see it? The figure is strongly lit from the side and has long hair and a beard. Once the face is discovered, very rapid perceptual learning takes place and the ambiguous picture now obviously contains a face each time we look at it. We have learned to perceive the stimulus in a different way.

 

Although in some cases, as in the ambiguous face picture, there is a direct relationship between modifying hypotheses and perception, in other cases this is not so evident. For example, illusions persist even when we have full knowledge of them (e.g. the inverted face, Gregory 1974). One would expect that the knowledge we have learned (from, say, touching the face and confirming that it is not ‘normal’) would modify our hypotheses in an adaptive manner. The current hypothesis testing theories cannot explain this lack of a relationship between learning and perception.

 

2. Perceptual Development:

A perplexing question for the constructivists who propose perception is essentially top-down in nature is ‘how can the neonate ever perceive?’ If we all have to construct our own worlds based on past experiences why are our perceptions so similar, even across cultures? Relying on individual constructs for making sense of the world makes perception a very individual and chancy process.

 

The constructivist approach stresses the role of knowledge in perception and therefore is against the nativist approach to perceptual development. However, a substantial body of evidence has been accrued favoring the nativist approach, for example: Newborn infants show shape constancy (Slater & Morison, 1985); they prefer their mother’s voice to other voices (De Casper & Fifer, 1980); and it has been established that they prefer normal features to scrambled features as early as 5 minutes after birth.

 

3. Sensory Evidence:

 

Perhaps the major criticism of the constructivists is that they have underestimated the richness of sensory evidence available to perceivers in the real world (as opposed to the laboratory where much of the constructivists’ evidence has come from). Constructivists like Gregory frequently use the example of size constancy to support their explanations. That is, we correctly perceive the size of an object even though the retinal image of an object shrinks as the object recedes. They propose that sensory evidence from other sources must be available for us to be able to do this.

 

However, in the real world, retinal images are rarely seen in isolation (as is possible in the laboratory). There is a rich array of sensory information including other objects, background, the distant horizon and movement. This rich source of sensory information is important to the second approach to explaining perception that we will examine, namely the direct approach to perception as proposed by Gibson.

 

25.4.4 Gibson (1966) and Bottom Up Processing

 

Gibson argued strongly against the idea that perception involves top-down processing and criticized Gregory’s discussion of visual illusions on the grounds that they are artificial examples and not images found in our normal visual environments. This is crucial because Gregory accepts that misperceptions are the exception rather than the norm. Illusions may be interesting phenomena, but they might not be that informative about the debate.

 

James Gibson (1966) argues that perception is direct, and not subject to hypothesis testing as Gregory proposed. There is enough information in our environment to make sense of the world in a direct way. For Gibson, sensation is perception: what you see is what you get. There is no need for processing (interpretation), as the information we receive about size, shape, distance, etc. is sufficiently detailed for us to interact directly with the environment.

For example, one piece of support for the argument that perception is direct is motion parallax. As we move through our environment, objects which are close to us pass by faster than those further away. The relative speed of these objects indicates their distance from us. This is evident when we are travelling on a fast-moving train.
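The motion parallax cue can be illustrated with simple arithmetic: for an observer moving at speed v, a stationary object at perpendicular distance d sweeps past at a peak angular speed of roughly v/d radians per second. The speeds and distances below are illustrative values, not data from the text.

```python
def angular_speed(v_observer, distance):
    """Peak angular speed (rad/s) of a stationary object as the observer passes it."""
    return v_observer / distance

train_speed = 30.0  # m/s (about 108 km/h), an assumed value

near_fence = angular_speed(train_speed, 5.0)    # 6.0 rad/s: sweeps by in a blur
far_hill = angular_speed(train_speed, 3000.0)   # 0.01 rad/s: appears almost still

assert near_fence > far_hill
```

The large ratio between the two angular speeds is exactly the relative-motion signal that, on Gibson's account, directly specifies which object is nearer.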

 

Gibson (1972) argued that perception is a bottom-up process, which means that sensory information is analyzed in one direction: from simple analysis of raw sensory data to ever increasing complexity of analysis through the visual system. Gibson attempted to give pilots training in depth perception during the Second World War, and this work led him to the view that our perception of surfaces was more important than depth/space perception. Surfaces contain features sufficient to distinguish different objects from each other. In addition, perception involves identifying the function of the object: whether it can be thrown or grasped, or whether it can be sat on, and so on.

 

Gibson claimed that perception is, in an important sense, direct. He worked during World War II on problems of pilot selection and testing, and in this early work on aviation he discovered what he called ‘optic flow patterns’. When pilots approach a landing strip, the point towards which the pilot is moving appears motionless, with the rest of the visual environment apparently moving away from that point.

 

Figure 4: Optical flow patterns

The outflow of the optic array in a landing glide.

 

According to Gibson such optic flow patterns can provide pilots with unambiguous information about their direction, speed and altitude.

 

25.4.5 Three Important Components of Gibson’s Theory

 

1.  Optic Flow Patterns

2.  Invariant Features and

3.  Affordances.

These are now discussed.

 

1. Light and the Environment – Optic Flow Patterns

 

Changes in the flow of the optic array contain important information about what type of movement is taking place. For example:

 

i) Any flow in the optic array means that the perceiver is moving; if there is no flow, the perceiver is static.

ii) The flow of the optic array will either be coming from a particular point or moving towards one. The center of that movement indicates the direction in which the perceiver is moving. If the flow seems to be coming out from a particular point, the perceiver is moving towards that point; if the flow seems to be moving towards that point, the perceiver is moving away from it.

 

2. The Role of Invariants in Perception

 

We rarely see a static view of an object or scene. When we move our head and eyes or walk around our environment, things move in and out of our viewing fields. Textures expand as you approach an object and contract as you move away.

 

There is a pattern or structure available in such texture gradients which provides a source of information about the environment. This flow of texture is invariant, i.e. it always occurs in the same way as we move around our environment and, according to Gibson, is an important direct cue to depth. Two good examples of invariants are texture and linear perspective.

 

3. Affordances

 

Affordances are, in short, cues in the environment that aid perception. Important cues in the environment include:

 

OPTICAL ARRAY: The patterns of light that reach the eye from the environment.

 

RELATIVE BRIGHTNESS: Objects with brighter, clearer images are perceived as closer.

 

TEXTURE GRADIENT: The grain of texture gets smaller as the object recedes, giving the impression of surfaces receding into the distance.

 

RELATIVE SIZE: When an object moves further away from the eye the image gets smaller. Objects with smaller images are seen as more distant.

 

SUPERIMPOSITION: If the image of one object blocks the image of another, the first object is seen as closer.

 

HEIGHT IN THE VISUAL FIELD: Objects further away are generally higher in the visual field.
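The relative-size cue in the list above follows from simple geometry: the visual angle an object subtends shrinks as it recedes, so a smaller image is read as "further away". A small sketch, with illustrative sizes and distances:

```python
import math

def visual_angle(object_size, distance):
    """Visual angle (degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(object_size / (2 * distance)))

person_height = 1.8  # metres, an assumed value

near = visual_angle(person_height, 5.0)    # larger angle -> perceived as closer
far = visual_angle(person_height, 50.0)    # smaller angle -> perceived as further

assert near > far
```

The same person at ten times the distance subtends roughly a tenth of the angle, which is the regularity the visual system exploits when using relative size as a depth cue.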

 

25.4.6 Evaluation of Gibson’s (1966) Direct Theory of Perception: Visual Illusions

 

Gibson’s emphasis on direct perception provides an explanation for the (generally) fast and accurate perception of the environment. However, his theory cannot explain why perceptions are sometimes inaccurate, e.g. in illusions. He claimed that the illusions used in experimental work constituted extremely artificial perceptual situations unlikely to be encountered in the real world; however, this dismissal cannot realistically be applied to all illusions. For example, Gibson’s theory cannot account for perceptual errors like the general tendency for people to overestimate vertical extents relative to horizontal ones.

 

Neither can Gibson’s theory explain naturally occurring illusions. For example, if you stare for some time at a waterfall and then transfer your gaze to a stationary object, the object appears to move in the opposite direction.

Bottom-up or Top-down Processing?

 

Neither direct nor constructivist theories of perception seem capable of explaining all perception all of the time. Gibson’s theory appears to be based on perceivers operating under ideal viewing conditions, where stimulus information is plentiful and is available for a suitable length of time. Constructivist theories, like Gregory’s, have typically involved viewing under less than ideal conditions.

 

Research by Tulving et al. manipulated both the clarity of the stimulus input and the impact of the perceptual context in a word identification task. As the clarity of the stimulus (through exposure duration) and the amount of context increased, so did the likelihood of correct identification. However, as the exposure duration increased, the impact of context was reduced, suggesting that if stimulus information is high, the need to use other sources of information is reduced. One theory that explains how top-down and bottom-up processes may interact to produce the best interpretation of the stimulus was proposed by Neisser (1976), known as the ‘Perceptual Cycle’.

 

25.5 Colour Principles:

As shown in Figure 7, colours play an important role in visualizations. There are various colour models; the most widely used are RGB, HSV and HSL.

 

 

Figure 8 shows a pictorial representation of the HLS colour model. Used well, colour supports good design: it focuses attention through contrast and unifies through analogy. The three primary colours are red, blue and yellow; the three secondary colours are purple, green and orange; and different colours blend with one another. A chroma scale shows colours of the same value and hue but different saturation.

 

25.5.1 Controlling colour value:

 

Artists create different tints, tones and shades of colours, and think about gradations and mixtures that may not lie precisely along the tint, tone and shade axes. A tint is a hue lightened and desaturated by adding white; a shade is a hue darkened by adding black; a tone is a hue greyed by adding both white and black. Such mixtures can strengthen or weaken simultaneous contrast. Remember that depth of field varies with wavelength, and low intensity leads to vibrating edges, so it is good to avoid blue edges and combinations that vary in depth of field, especially on dark backgrounds like black.
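These mixtures can be sketched in code using the standard art definitions (tint = hue + white, shade = hue + black, tone = hue + grey). The linear RGB blending below is an assumption for illustration; real pigment mixing is not linear.

```python
def mix(colour, target, amount):
    """Linearly blend an RGB triple (components in [0, 1]) toward target by amount in [0, 1]."""
    return tuple(c + (t - c) * amount for c, t in zip(colour, target))

WHITE, BLACK, GREY = (1, 1, 1), (0, 0, 0), (0.5, 0.5, 0.5)

red = (1.0, 0.0, 0.0)
tint = mix(red, WHITE, 0.5)   # lightened: (1.0, 0.5, 0.5)
shade = mix(red, BLACK, 0.5)  # darkened:  (0.5, 0.0, 0.0)
tone = mix(red, GREY, 0.5)    # greyed:    (0.75, 0.25, 0.25)
```

Varying `amount` traces a gradation along each axis, which is a crude stand-in for the mixtures an artist explores by eye.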

 

25.5.2 Colour space:

 

A colour space is a mathematical model for describing colour; common colour spaces include RGB, HSB, HSL, Lab and LCH. RGB, shown in Figure 9, is the most common in computer use, but it is the least useful for design, because our eyes do not decompose colours into RGB constituents. HSV describes a colour in terms of its hue, saturation and value (lightness); because it models colour with intuitive parameters, it is more useful. Colour helps the human brain differentiate data faster, but colour used poorly is worse than no colour at all: it can cause the wrong information to stand out and make meaningful information difficult to see.
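Conversion between RGB and HSV can be tried with Python's standard-library colorsys module. All components are fractions in [0, 1], and hue is expressed as a fraction of a full turn (0.0 = red, 1/3 = green, 2/3 = blue).

```python
import colorsys

r, g, b = 1.0, 0.5, 0.0                  # an orange
h, s, v = colorsys.rgb_to_hsv(r, g, b)
# h is about 0.083 (30 degrees), s = 1.0 (fully saturated), v = 1.0 (full value)

# The round trip back to RGB recovers the original triple
r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
assert abs(r - r2) < 1e-9 and abs(g - g2) < 1e-9 and abs(b - b2) < 1e-9
```

Working in HSV makes design adjustments direct: to desaturate or darken the colour one changes `s` or `v` alone, instead of recomputing all three RGB components.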

 

25.5.3 Colour to Label:

 

Colour is a very low-level perceptual phenomenon: it makes features ‘pop out’, and this pop-out effect lets colour be used as a label to group items. Colour to label is most effective when a small number of colours is used against a neutral background. People remember colour names rather than exact hues, so be careful that the information does not conflict with the colour names, as shown in figure 10 (e.g. a green stop sign).

 

 

25.5.4 Colour to Quantify:

 

Most natural scales vary in value or saturation, as shown in figure 11, and are widely used in cartography. Three kinds of scale are common: a qualitative scale uses the same value but different hues, with no perceptual ordering of the hues implied; a sequential scale varies in value or saturation; and a diverging scale cross-fades through a neutral colour.
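The three scale types can be sketched by interpolating in RGB. The endpoint colours below are illustrative choices, and linear RGB interpolation is a simplification; a perceptually uniform space such as Lab would serve better in practice.

```python
def lerp(c1, c2, t):
    """Linearly interpolate between two RGB triples at fraction t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(c1, c2))

def ramp(c1, c2, n):
    """n evenly spaced colours from c1 to c2."""
    return [lerp(c1, c2, i / (n - 1)) for i in range(n)]

WHITE, NEUTRAL = (1, 1, 1), (0.95, 0.95, 0.95)

# Sequential: one hue, scaled in value/saturation (light -> dark blue)
sequential = ramp(WHITE, (0.0, 0.2, 0.6), 5)

# Diverging: two sequential ramps cross-fading through a neutral midpoint
diverging = ramp((0.7, 0.1, 0.1), NEUTRAL, 3) + ramp(NEUTRAL, (0.1, 0.1, 0.7), 3)[1:]

# Qualitative: distinct hues at roughly the same value (hand-picked here)
qualitative = [(0.9, 0.4, 0.3), (0.4, 0.7, 0.3), (0.3, 0.5, 0.9)]
```

Each list maps directly onto the cartographic uses above: sequential for an ordered quantity, diverging for deviation from a midpoint, qualitative for unordered categories.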

 

 

25.5.5 Colour selection and design:

 

Colour selection and design involves colour harmony and is constrained by practical and functional limits: it is dictated by perception and convention, and material costs matter. A warm red and yellow palette looks vibrant, while cool palettes use blues and greens; saturated colours suggest youth, and subdued, unsaturated colours suggest sophistication and maturity.

 

25.5.6 Making colour robust:

 

To make colour robust, accommodate viewers with anomalous colour vision: use good contrast in values, and reinforce colour with redundant encodings in shape and size (a stop sign is octagonal in addition to being red). Also accommodate different media: gamut mapping can lighten or darken colours with hue shifts, and scales can be uniform or non-uniform. Usually one tries to map a few key colours and to define some robust way to move between them in a consistent fashion.
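One way to check "good contrast in values" numerically is the WCAG relative-luminance and contrast-ratio formulas, sketched below for sRGB components in [0, 1]. The example colour pairs are illustrative.

```python
def _linear(c):
    """Linearize one sRGB component (undo gamma encoding)."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(r, g, b):
    """WCAG relative luminance of an sRGB colour."""
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)

def contrast_ratio(rgb1, rgb2):
    """WCAG contrast ratio, from 1:1 (identical) to 21:1 (black on white)."""
    l1, l2 = luminance(*rgb1), luminance(*rgb2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

black_on_white = contrast_ratio((0, 0, 0), (1, 1, 1))   # 21.0, the maximum
red_on_green = contrast_ratio((1, 0, 0), (0, 0.5, 0))   # low: hues differ but values are close

assert black_on_white > red_on_green
```

The red/green pair illustrates the point about anomalous vision: the hues are very different, but the contrast in value is poor, so a viewer who cannot separate the hues gets almost no signal.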

 

25.5.7 Good and Bad Uses:

 

Colours should clarify rather than confuse, and should be tasteful rather than clumsy. Colour use should be robust across viewers and media, and above all should follow a ‘do no harm’ policy.