
汪彪_毕业论文[排版]



摘 要

Many visual stimuli in natural scenes can easily be separated from the background by differences in luminance; these are the so-called "first-order stimuli". Likewise, many stimuli are defined by differences in contrast or texture; these are called "second-order stimuli". Research in human physiology and psychophysics has found that, for first- and second-order stimuli, there are two parallel signal-processing pathways in the early visual cortex of the human visual system. The first-order processing mechanism is well described by a linear filter model, while the second-order processing system can be modeled by a so-called "filter-rectify-filter" scheme.

First- and second-order processing mechanisms have already found important applications in computer vision. The first-order mechanism is the biological basis of several contour-extraction methods, while the second-order mechanism is valuable in texture segmentation and texture classification.

This thesis studies these two processing mechanisms in depth and attempts to model and simulate them on a computer, considering both single-scale and multi-scale processing. For first-order processing, a simple edge-detection method based on multi-scale processing is proposed, covering both grey and color images. For second-order processing, a texture-segmentation method based on multi-scale processing is proposed, likewise covering grey and color images. Since both mechanisms use multi-scale processing, the way the multiple channels are pooled is also discussed: for the first-order system we adopt a "winner-take-all" pooling strategy, which gives good results for detecting luminance-defined edges; for the second-order system we use the K-means clustering algorithm to assign pixels belonging to the same texture region the same region label. Experimental results show that the proposed segmentation strategy based on the "filter-rectify-filter" model achieves good results on grey images; for color images, although it performs better than simply converting them to grey first, the results are still not satisfactory.

Keywords: human visual system; first-order stimuli; second-order stimuli; multi-scale processing; pooling scheme; edge detection; texture segmentation


ABSTRACT

Naturally occurring visual stimuli are rich in examples of objects delineated from their backgrounds simply by differences in luminance, so-called first-order stimuli, as well as those defined by differences in contrast or texture, referred to as second-order stimuli. Research in human psychophysics has found that there are two separate parallel signal-processing pathways in the early visual cortex of the human visual system (HVS) for first- and second-order stimuli. While the first-order system can be described by a linear spatial-filter model, the second-order processing system can be modeled as a "filter -> rectify -> filter" (FRF) scheme.

The first- and second-order processing mechanisms have found their value in computer-vision applications. The first-order processing mechanism is the basis of some biologically inspired contour-detection methods, while the second-order processing mechanism has proved valuable in texture segmentation and classification.

In this thesis, I investigate these two processing mechanisms and simulate them on a PC; both single-scale and multi-scale processing are considered. For first-order processing, a simple edge-detection method based on multi-scale processing is proposed for both grey and color images. For second-order processing, a texture-segmentation method based on multi-scale processing is proposed, again for both grey and color images. The pooling schemes for first- and second-order multi-scale processing are also discussed. For the first-order system, we adopt a simple pooling scheme, the so-called "winner-take-all". For the second-order system, the K-means clustering algorithm is used to assign each pixel to the cluster containing the pixels of the same texture region. Experimental results show that the proposed texture-segmentation algorithm achieves good performance on grey images; on color images it is not yet satisfying, although it is better than converting the image directly to grey.

Keywords: human visual system, first-order stimuli, second-order stimuli, multi-scale processing, pooling scheme, edge detection, texture segmentation


Table of Contents

摘要
ABSTRACT
1. Introduction
    1.1 Motivation of This Research
    1.2 Background
    1.3 Content Arrangement for This Paper
2. First-order Processing
    2.1 Neural and Physiological Basis
    2.2 Gabor Filters
    2.3 Single-Scale First-order Processing
        2.3.1 Computational models of simple and complex cells
        2.3.2 Pooling scheme over orientations: winner-take-all
    2.4 Multi-Scale First-order Processing
        2.4.1 Pooling scheme over frequencies: winner-take-all
        2.4.2 Filter bank design
        2.4.3 Post-processing: thinning and thresholding
    2.5 Color Image Processing
        2.5.1 Color vision and the opponent theory
        2.5.2 First-order computational model for color images
3. Second-order Processing
    3.1 Neural and Physiological Basis
    3.2 Single-Scale Second-order Processing
    3.3 Multi-Scale Second-order Processing
        3.3.1 Problems to build the computational model
        3.3.2 Texture segmentation based on a different "FRF" scheme
    3.4 Texture Segmentation for Color Image
4. Conclusion
5. References


1. Introduction

1.1 Motivation of This Research

Computer vision is an important branch of computer science and artificial intelligence. Humans obtain about 80% of the information about their surroundings through the human visual system (HVS). To derive a scene description from input images, a series of complex information-processing and interpretation steps is required: the eyes receive the surrounding stimulus information, and the brain processes and interprets the stimuli via complex mechanisms, thus giving them a clear physical meaning. There are therefore two reasons why computer vision is important: first, it is attractive to simulate the functions of the HVS on a computer; second, computer vision may help us understand the mechanisms of the HVS better.

As can be seen in Fig.1.1, naturally occurring visual stimuli are rich in examples of objects delineated from their backgrounds simply by differences in luminance, so-called first-order stimuli, as well as those defined by differences in contrast or texture, referred to as second-order stimuli.

Fig.1.1. Boundary types in visual scenes: first-order and second-order. First-order boundaries vary in luminance; second-order boundaries do not. In this example, the second-order boundary is defined by a variation in texture.


During the last several decades, scientists in neurophysiology and human psychophysics have done a great deal of research on the stimulus-processing mechanisms of the HVS. Particularly compelling psychophysical evidence shows that there are separate parallel signal-processing pathways for first- and second-order stimuli. Luminance-defined stimuli are processed conventionally with a linear spatiotemporal filter, while second-order stimuli can be well understood in terms of a type of nonlinear "filter -> rectify -> filter" model[4].

My aim is to work out more details of these first- and second-order models and the choice of the related parameters, and then to simulate the models in software. With this work we can better understand the way the HVS works and establish what the first- and second-order systems can each see.

1.2 Background

Fig.1.2 shows a probable model of the human visual system. Part 1 is the optics of the eye, which focuses a picture onto the retina (part 2). The photoreceptors (cones and rods) are spread over the retina. After absorbing light, the photoreceptors transform the light signals into neuro-electrical signals and transmit them to the next stage through an opponent center-surround mechanism. Individual receptors do not have private lines up to the next visual centers; rather, multiple receptors converge onto subsequent neural units on their way to higher visual centers. This convergence results in a physiological concept known as receptive fields. Receptive fields of cells before the cortical stage (such as bipolar and ganglion cells in the retina, and LGN cells) have a concentric lateral-inhibition form. The optic nerves of the two eyes join and split in the optic chiasma (part 3). Within the lateral geniculate nucleus (part 4), visual information is divided into two parts: the magnocellular system, mainly dealing with motion perception and spatial information, and the parvocellular system, mainly dealing with color, shape, texture, etc. After the lateral geniculate nucleus (LGN), the visual information is delivered to the cortex (V1, part 5).


Fig.1.2 Model of human vision system

Early visual cortex neurons are selective for a remarkable variety of stimulus attributes, e.g. orientation and spatial frequency. Like neurons earlier in the visual pathway, they respond only to stimuli presented within a small region of visual space, the neuron's receptive field. The receptive fields of these neurons have spatially segregated sub-regions, alternating in excitatory and inhibitory effect (Fig.1.3A). Most past and current psychophysical research in this field uses specially designed artificial stimuli to find the physiological basis of the first- and second-order processing mechanisms, for the following reason:

The natural visual world consists of a complex interaction of texture, color, motion, surface reflectance, and changes in illumination and contrast, whereas much simpler stimuli such as gratings, plaids, white noise, etc. have typically been employed to study vision. These artificial stimuli have proven useful because, unlike natural images, they are simple to generate and have well-defined, quantifiable properties suitable for analysis of visual processing mechanisms.


Fig.1.3. (A) Cartoon model of oriented, excitatory and inhibitory sub-regions of a simple-type receptive field in visual cortex. (B) One frame of a first-order, luminance sine-wave grating with a superimposed cartoon receptive field. The spatial frequency and orientation of the stimulus optimally match the neuron's receptive field. (C) One frame of a second-order envelope stimulus. The carrier is vertical and of a high spatial frequency, whereas the envelope is oriented at 45° and at a lower spatial frequency. Notice that the gray levels falling in both excitatory and inhibitory regions of the receptive field average to zero (mid-gray). If the cell's receptive field summed linearly, then no response would be expected.

Examples of first- and second-order stimuli used in these experiments are shown in Fig.1.3B and C with a schematized neuron's receptive field superimposed. The luminance-defined sine-wave grating is optimally matched to the neuron's receptive field (Fig.1.3B) and should elicit a vigorous response. In this example, the second-order envelope stimulus in Fig.1.3C consists of a stationary, vertical, high-spatial-frequency sine-wave grating (carrier), whose contrast is modulated by a drifting 45°, low-frequency grating (envelope). Because the carrier is of a finer spatial grain than the receptive field, the average luminance of the envelope stimulus is the same across both inhibitory and excitatory regions. Nevertheless, neurons have been found that respond selectively to such envelope stimuli.
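A contrast-envelope stimulus of this kind is straightforward to synthesize. The sketch below (Python/NumPy; the 256×256 size, carrier wavelength 8, envelope wavelength 64, and 45° envelope orientation are illustrative choices, not values taken from the experiments cited here) builds a vertical carrier whose contrast is modulated by an oblique envelope while the local mean luminance stays constant:

```python
import numpy as np

def envelope_stimulus(size=256, carrier_wavelength=8, envelope_wavelength=64,
                      envelope_orientation_deg=45.0):
    """Second-order stimulus: a vertical high-frequency sine carrier whose
    contrast is modulated by an oblique low-frequency sinusoidal envelope."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    # Vertical carrier: luminance varies only along x.
    carrier = np.sin(2 * np.pi * x / carrier_wavelength)
    # Oblique envelope: modulates contrast between 0 and 1.
    theta = np.deg2rad(envelope_orientation_deg)
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = 0.5 * (1 + np.sin(2 * np.pi * u / envelope_wavelength))
    # Mean luminance is 0.5 everywhere; only the contrast varies.
    return 0.5 + 0.5 * envelope * carrier

stim = envelope_stimulus()
```

Because the mean luminance is constant, a purely linear (first-order) filter tuned to the envelope's scale sees nothing here; only a rectifying second-order pathway can recover the envelope.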

A continuing question has been whether second-order stimulus attributes are processed by a distinct mechanism from that which handles first-order ones.

Numerous studies have provided evidence for such separate processing, but the question has remained controversial. However, particularly compelling psychophysical evidence has emerged from recent work showing that first- and second-order stimuli are processed by separate parallel signal-processing pathways[4].

Fig.1.4. Model of cortical neuron response, having separate parallel signal-processing pathways for first- and second-order stimuli. In the top path, luminance-defined stimuli are processed conventionally with a linear spatiotemporal filter. The bottom path processes second-order stimuli, and has a "filter -> rectify -> filter" cascade consisting of early linear filtering subunits, a nonlinearity (e.g. rectification), and a late linear filter. The early filters are shown superimposed on a contrast envelope stimulus, to illustrate how they could mediate selectivity for carrier properties, while the late filter is shown on the full-wave rectified stimulus to illustrate its selectivity for properties of the envelope. The neuron's response reflects a combination of both pathways.

Fig.1.4 illustrates the complete model with separate streams or paths for processing first- and second-order stimuli. A simple luminance grating can elicit a response only via the first-order pathway: regardless of its spatial frequency, it can never satisfy the conflicting demands of the highly discrepant spatial-frequency tunings of the early and late filters of the Filter-Rectify-Filter (F->R->F for short) stream. Similarly, a contrast envelope stimulus will exert an effect on the final output only via the F->R->F stream, for the reasons discussed already.

Since 1980 several theorists concerned with spatial vision have recognized the suitability of Gabor's elementary signals as models for simple-cell receptive-field profiles, provided that these cells behave essentially linearly, as several investigators have confirmed. Typically two to five interleaved regions of excitatory and inhibitory influences weighted by a tapering envelope constitute the receptive-field profile of a simple cell, and Gabor signals with suitably chosen parameters invariably give a good fit to such spatial profiles.

Such receptive-field-like Gabor filters signal changes in properties defined by single points in an image, such as modulations in luminance or color (first-order image properties). However, first-order information does not convey second-order (or non-Fourier) properties of an image, such as variations in texture or contrast, whose detection requires the comparison of at least two image points. Psychophysical studies have shown that the human visual system is sensitive to these second-order properties and have indicated the existence of mechanisms specialized for their analysis. Additional evidence for distinct processing of first- and second-order information comes from studies of patients with cortical lesions, which show selective deficits in first- or second-order perception; from human brain imaging of cortical areas differentially responsive to first- or second-order stimuli; and from single-unit neurophysiology.

Whereas first-order information can be extracted by a single stage of filtering (Fig.1.5, top), most models of second-order processing use a filter–rectify–filter (FRF) cascade (Fig.1.5, bottom)[1]. In this scheme the first filter (bandpass for high spatial frequencies) detects fine-grain detail (texture, local contrast), while the second filter (bandpass for low spatial frequencies) detects large-scale variations in the textural properties, e.g., modulations of contrast, spatial frequency, or orientation. In an artificial image such as a contrast envelope (a high-spatial-frequency sine-wave grating carrier whose contrast is modulated by a low-spatial-frequency envelope, e.g., Fig.1.3C), the first filter would detect the carrier while the second filter would detect the envelope. Such a model is supported by both neurophysiology and human psychophysics, in which both the first- and second-stage filters are bandpass for spatial frequency and orientation. The intervening static nonlinearity acts to demodulate the early-stage filtered signal in order to enhance areas of large response variation. Most such models use full-wave rectification to simulate this nonlinearity, though any of a variety of static nonlinearities will suffice, provided that they contain even-order components[1].

Fig.1.5. Example of (top) a first-order model and (bottom) a filter–rectify–filter (FRF) model. The first-order model convolves the image with a single Gabor to detect luminance variations. The second-order model convolves the image with a high-spatial-frequency Gabor filter (F1), whose full-wave rectified response is then convolved with a second Gabor (F2) having a lower spatial frequency to detect texture variations. In this example the first-order (F0) and second-order (F2) filters (shown at magnified scale) have the same orientation, phase, and spatial wavelength. The first-order filtering detects the luminance change in the boat hulls and the dock, while the second-order filter detects the texture and contrast changes at the same locations.
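The FRF cascade just described can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy, not the implementation used later in this thesis: the wavelengths, Gaussian spreads, and kernel sizes are arbitrary example values, and both stages use the same (vertical) orientation.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(wavelength, theta, sigma, size):
    """Cos-phase Gabor: Gaussian-windowed cosine grating (zero-mean)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()   # zero mean: no response to uniform luminance

def frf(image):
    """Filter -> Rectify -> Filter cascade with example parameters."""
    # Stage 1: high-frequency filter picks up the fine-grain carrier/texture.
    f1 = ndimage.convolve(image, gabor_kernel(4, 0.0, 2.0, 11))
    # Static nonlinearity: full-wave rectification demodulates the response.
    rectified = np.abs(f1)
    # Stage 2: low-frequency filter picks up the coarse envelope.
    return ndimage.convolve(rectified, gabor_kernel(16, 0.0, 8.0, 33))
```

A uniform image passes through the cascade with zero response, while a contrast step in a fine texture (invisible to the mean luminance) produces a response at the second stage.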

Many problems arise when one tries to build computational models for the first- and second-order processing mechanisms from the existing results of neural and physiological research. For example, what exactly is the post-processing mechanism in the first- and second-order pathways after the input is filtered by the Gabor-like receptive fields? Is there a relationship between the optimal orientations of the first- and second-stage filters in the FRF model? Some results support the notion that the first-stage filters are predominantly of the same orientation as the second-stage filters, whereas other findings suggest orthogonal inputs. Conclusions from C. L. Baker's experiments[1] imply that there is no rigid orientation combination between the first- and second-stage filters. But is this true? Further physiological research will tell us.

Although many of the underlying neural and physiological mechanisms are currently unclear or unknown, many researchers are trying to build biologically motivated computational models, on the basis of existing physiological discoveries about the HVS, to solve various problems, e.g. motion perception, edge and contour detection, and texture segmentation and classification. Some of these models have achieved results as good as traditional non-biological theories and methods: several well-performing texture-segmentation models have been proposed[5][6][7]; first- and second-order motion perception[13] is one of the research fields of my supervisor, Prof. N. Sang; and a contour-detection model based on non-classical receptive-field inhibition has been proposed by Dr. Q.L. Tang[16].

1.3 Content Arrangement for This Paper

The remaining sections of this thesis are arranged as follows:

In section 2, we discuss the first-order processing model. First, a brief introduction to the neural and physiological basis of the first-order processing mechanism is given; second, some details and properties of the important Gabor filters are introduced; third, single-scale first-order processing for grey images is discussed and some experimental results are given; fourth, multi-scale first-order processing for grey-image edge detection is discussed and some experimental results are given; finally, we try to extend our work to color images, which are more common nowadays.

In section 3, we discuss the second-order processing model. First, a brief introduction to the neural and physiological basis of the second-order processing mechanism is given; second, single-scale second-order processing for grey images is discussed and some experimental results are shown; third, since few computational models have been proposed for contrast-defined second-order stimuli, we put our emphasis on texture-defined second-order stimuli: a texture-segmentation method based on a multi-scale Gabor filter bank is proposed, together with a post-processing pooling scheme based on the K-means clustering algorithm, and some experimental results are given. Finally, we also try to extend our work to color images.

In section 4, we draw conclusions from the preceding work and outline plans for future research.

2. First-order Processing

2.1 Neural and Physiological Basis

Simple visual stimuli whose principal attributes can be characterized as variations in local luminance are referred to as first-order stimuli. A first-order stimulus which has proven particularly useful for visual-science experiments is the sine-wave grating (Fig.1.3B): in a static image it is entirely specified by the parameters of orientation, spatial frequency, spatial phase, and contrast. Notice that attributes of this stimulus, such as its orientation and spatial frequency, are entirely specified by changes in luminance across space.

Early visual cortex neurons are selective for a remarkable variety of stimulus attributes. Like neurons earlier in the visual pathway, they respond only to stimuli presented within a small region of visual space, the neuron's receptive field. Studies have demonstrated selectivity for both the orientation and the spatial frequency of sine gratings.

These and other kinds of stimulus selectivity are easily understood for at least one kind of cortical neuron, termed "simple cells": the receptive fields of these neurons have spatially segregated sub-regions, alternating in excitatory and inhibitory effect (Fig.2.1A). A light bar presented in an excitatory region will elicit a transient discharge at its onset, an "On-response"; it will also elicit a transient discharge when removed from an inhibitory region (presumably due to a rebound from inhibition), termed an "Off-response". Dark bars have the opposite effects, eliciting On-responses from inhibitory zones and Off-responses from excitatory zones. In visual cortex receptive fields, these excitatory and inhibitory regions are elongated and occur in alternation. The stimulus orientation selectivity of such a neuron can be understood in terms of the competing influences of these regions, if it is assumed that they are linearly additive prior to a simple threshold. Thus if a light bar or a grating (Fig.2.1B) is presented at an orientation different from that of the elongated sub-regions, their excitatory and inhibitory influences will cancel one another out. But an oriented bar or grating at the same orientation (Fig.2.1C) will cause consistent stimulation of these sub-regions, which will synergistically sum to a strong response.

Selectivity for the spatial frequency of sine-wave gratings can be understood similarly. A maximal response is obtained only if the spacing of light and dark regions of a grating corresponds well with the spacing of excitatory and inhibitory regions (Fig.2.1C); otherwise there will be a failure of effective summation, even if the orientation is optimal (Fig.2.1D). The sine-wave grating which maximally activates a simple-type cell is one whose light and dark bars are in register with the alternating excitatory and inhibitory sub-regions (Fig.2.1C).



Fig. 2.1. Spatial selectivity of cortical neurons for sine-wave gratings understood in terms of linear summation. (A) Cartoon model of oriented, excitatory and inhibitory sub-regions of simple-type receptive field in visual cortex. (B) Sine-wave grating at differing orientation elicits poor summation of sub-regions, and no response. (C) An optimal grating is one whose dark and light bars have orientation and spacing corresponding to the neuron’s sub-regions, eliciting maximally effective summation. (D) Grating of correct orientation, but nonoptimal spatial frequency gives poor response because bars are not aligned with neuron’s sub-regions.

Many cortical neurons, termed ‘complex cells’, differ from simple-type cells in having On- and Off-responses which are not spatially segregated; nevertheless they also show a strong selectivity for orientation, spatial frequency, and velocity. However their selectivity can be understood in similar terms, if they add up the responses of a small pool of nearby simple-type cells, having the same selectivity but slightly differing receptive field positions.

2.2 Gabor Filters

A very large body of human psychophysics has supported the idea that the visual system can be considered an ensemble of such linear spatial filters, selective for orientation and spatial frequency. In such a ‘bank of filters’ scheme, threshold for stimulus detection is determined by the most sensitive filter[4].

But a question remains: to simulate early cortical vision on a computer, what kind of filter best fits the functionality of the spatial-frequency- and orientation-selective receptive fields of the visual neurons?

Since 1980 several theorists concerned with spatial vision have recognized the suitability of Gabor's elementary signals as models for simple-cell receptive-field profiles, provided that these cells behave essentially linearly, as several investigators have confirmed. Typically two to five interleaved regions of excitatory and inhibitory influences weighted by a tapering envelope constitute the receptive-field profile of a simple cell, and Gabor signals with suitably chosen parameters invariably give a good fit to such spatial profiles.

In 1985, John G. Daugman showed in his important paper[9] that 2D linear spatial filters are constrained by general uncertainty relations that limit their attainable joint resolution in the 2D space domain and the 2D frequency domain, i.e. their resolution for orientation, spatial frequency, and two-dimensional (2D) spatial position. The family of 2D Gabor filters optimizes these uncertainty relations and thus achieves the theoretical limit of joint resolution in an information hyperspace whose four axes are interpretable as the two coordinates of visual space plus the polar spectral variables of orientation and spatial frequency. Each such theoretical filter occupies one irreducible quantal volume in the information hyperspace, corresponding to an independent datum, and thus an ensemble of such filters can encode information with optimal efficiency along these four coordinates.

In conclusion, plenty of compelling evidence shows that in the early cortical vision of the HVS, the input signal is decomposed into various frequency and orientation channels by a bank of filters with different spatial-frequency and orientation selectivity. The Gabor transform fits this signal-decomposition process best, because it has two important features: first, it is well localized in both the spatial domain and the frequency domain; second, evidence shows that the 2D receptive-field profiles of simple cells in mammalian visual cortex are well described by Gabor elementary functions. So we can use a bank of Gabor filters with different spatial frequencies and orientations to process an image and obtain features at different scales and orientations.

Now let’s take a look at the properties of Gabor Filters.

Generally speaking, there are two types of Gabor filters: cos-phase Gabor filters and sin-phase Gabor filters. A Gabor function in the spatial domain is a sinusoid-modulated Gaussian. For a 2-D Gaussian with spreads σx and σy in the x and y directions, respectively, and a modulating frequency ωf, a cos-phase Gabor filter is given by

$$
f(x,y;\sigma_x,\sigma_y,\omega_f,\theta_f)=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left\{-\left[\frac{(x\cos\theta_f+y\sin\theta_f)^2}{2\sigma_x^2}+\frac{(-x\sin\theta_f+y\cos\theta_f)^2}{2\sigma_y^2}\right]\right\}\cos\left\{2\pi\omega_f(x\cos\theta_f+y\sin\theta_f)\right\}\qquad(1)
$$

Similarly, a sin-phase Gabor filter is a Gaussian envelope modulated by a sinusoid carrier.
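As a quick sanity check, Eq. (1) and its sin-phase counterpart can be implemented directly (Python/NumPy; the 31×31 support and the parameter values in the example calls are arbitrary):

```python
import numpy as np

def gabor(size, sigma_x, sigma_y, omega_f, theta_f, phase='cos'):
    """Cos-phase (even) or sin-phase (odd) Gabor filter of Eq. (1),
    sampled on a size x size grid centred at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta_f) + y * np.sin(theta_f)   # rotated coordinates
    yr = -x * np.sin(theta_f) + y * np.cos(theta_f)
    gauss = np.exp(-(xr**2 / (2 * sigma_x**2) + yr**2 / (2 * sigma_y**2)))
    gauss /= 2 * np.pi * sigma_x * sigma_y
    carrier = np.cos if phase == 'cos' else np.sin
    return gauss * carrier(2 * np.pi * omega_f * xr)

even = gabor(31, 4.0, 4.0, 0.1, 0.0, phase='cos')
odd = gabor(31, 4.0, 4.0, 0.1, 0.0, phase='sin')
```

Flipping the sampling grid through the origin leaves the cos-phase filter unchanged and negates the sin-phase one, which is exactly the even/odd symmetry noted in the text.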


A cos-phase Gabor filter is even and thus symmetric, while a sin-phase Gabor filter is odd and thus anti-symmetric, as shown in Fig.2.2:


Fig.2.2. 2D and 3D visualization of Gabor filters. Note that the cos-phase filter is even and thus symmetric, while the sin-phase filter is odd and thus anti-symmetric. (A) 2D visualization of a cos-phase Gabor filter. (B) 2D visualization of a sin-phase Gabor filter. (C) 3D visualization of a cos-phase Gabor filter. (D) 3D visualization of a sin-phase Gabor filter.

In the spatial-frequency domain, the Gabor filter becomes two shifted Gaussians located at plus and minus the modulating frequency. For a cos-phase Gabor filter, its Fourier transform is:

$$
F(\omega_x,\omega_y;\sigma_x,\sigma_y,\omega_f,\theta_f)=\frac{1}{2}\exp\left\{-2\pi^2\left[\sigma_x^2(\omega_x\cos\theta_f+\omega_y\sin\theta_f+\omega_f)^2+\sigma_y^2(-\omega_x\sin\theta_f+\omega_y\cos\theta_f)^2\right]\right\}+\frac{1}{2}\exp\left\{-2\pi^2\left[\sigma_x^2(\omega_x\cos\theta_f+\omega_y\sin\theta_f-\omega_f)^2+\sigma_y^2(-\omega_x\sin\theta_f+\omega_y\cos\theta_f)^2\right]\right\}\qquad(2)
$$

Fig.2.3 shows the frequency response graphically:


Fig.2.3. 2D and 3D visualization of the frequency response of the cos-phase Gabor filter shown in Fig.2.2A.


Relationship between Frequency bandwidth and center frequency

Since the relationship between frequency bandwidth and center frequency does not depend on orientation, we can consider Eq. (2) with θf = 0:

$$
F(\omega_x,\omega_y;\sigma_x,\sigma_y,\omega_f,0)=\frac{1}{2}\exp\left\{-2\pi^2\left[\sigma_x^2(\omega_x+\omega_f)^2+\sigma_y^2\omega_y^2\right]\right\}+\frac{1}{2}\exp\left\{-2\pi^2\left[\sigma_x^2(\omega_x-\omega_f)^2+\sigma_y^2\omega_y^2\right]\right\}\qquad(3)
$$

Set ωy = 0 and solve exp{−2π²σx²(ω1 − ωf)²} = 0.5 for the half-response frequency ω1; the frequency bandwidth is then 2|ω1 − ωf|. (Here we assume that when ω1 satisfies exp{−2π²σx²(ω1 − ωf)²} = 0.5, the value of exp{−2π²σx²(ω1 + ωf)²} is so small that it can be ignored.) Thus we get

$$
\omega_1=\omega_f\pm\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\sigma_x}
$$

The spatial-frequency bandwidth can be expressed in octaves so that it is commensurate with physiological data. For the octave bandwidth Bf we have:

$$
2^{B_f}=\left(\omega_f+\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\sigma_x}\right)\bigg/\left(\omega_f-\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\sigma_x}\right)\qquad(4)
$$

from which σx is determined by ωf and Bf:

$$
\sigma_x=\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\omega_f}\cdot\frac{2^{B_f}+1}{2^{B_f}-1}\qquad(5)
$$

Relationship between Orientation bandwidth and center frequency

Again set θf = 0 in Eq. (2), take ωx = ωf, and solve exp{−2π²σy²ωy²} = 0.5. (Here we assume that when ωy satisfies exp{−2π²σy²ωy²} = 0.5, the value of exp{−2π²[σx²·4ωf² + σy²ωy²]} is too small to matter.) Then we get

$$
\omega_y=\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\sigma_y}
$$

The orientation bandwidth is

$$
B_\theta=2\arctan\frac{\omega_y}{\omega_f}=2\arctan\left(\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\sigma_y\omega_f}\right)\qquad(6)
$$

from which σy is determined by ωf and Bθ:

$$
\sigma_y=\sqrt{\frac{\ln 2}{2}}\cdot\frac{1}{\pi\omega_f}\cdot\frac{1}{\tan(B_\theta/2)}
$$
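These closed-form relations are easy to wrap in a helper and check numerically. In the sketch below, the center frequency is in cycles/pixel, Bf is in octaves, Bθ is in radians, and the values in the example call are arbitrary:

```python
import numpy as np

def gabor_sigmas(omega_f, B_f, B_theta):
    """Gaussian spreads from center frequency omega_f, octave
    frequency bandwidth B_f and orientation bandwidth B_theta (radians)."""
    k = np.sqrt(np.log(2) / 2) / np.pi          # sqrt(ln2 / 2) / pi
    sigma_x = (k / omega_f) * (2**B_f + 1) / (2**B_f - 1)   # Eq. (5)
    sigma_y = (k / omega_f) / np.tan(B_theta / 2)           # sigma_y relation
    return sigma_x, sigma_y

sigma_x, sigma_y = gabor_sigmas(omega_f=0.1, B_f=1.0, B_theta=np.deg2rad(30))
```

Round-tripping recovers the bandwidths: the half-response offset computed from sigma_x gives back exactly one octave, and the half-response ωy computed from sigma_y gives back the 30° orientation bandwidth.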


Multi-Scale Gabor filters

In applications, a filter bank with several scales and orientations is usually adopted. We rotate and scale the cos-phase Gabor filter shown in Fig.2.2A to obtain a filter bank, as shown in Fig.2.4, in which circularly neighboring filters differ in orientation by π/8 and radially neighboring filters are spaced one octave apart in frequency.

Fig.2.4. Filter bank obtained by rotating and scaling the cos-phase Gabor filter shown in Fig.2.2A; the figure is drawn in the spatial-frequency domain.

The grey-level images of the above 4 (scales) × 8 (orientations) Gabor filters are shown in Fig.2.5:

Fig.2.5. Even Gabor filters with different scales and orientations (4 scales and 8 orientations)
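A 4 × 8 bank like the one in Fig.2.4/2.5 can be generated by rotating and scaling a single even Gabor. The sketch below assumes a one-octave bandwidth (so, by Eq. (5), σ = 3·√(ln2/2)/(πω)), uses an isotropic Gaussian for simplicity, and takes an illustrative base frequency of 0.25 cycles/pixel:

```python
import numpy as np

def even_gabor(omega, theta, sigma):
    """Even (cos-phase) Gabor kernel; support truncated at about 3 sigma."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * omega * xr)

def gabor_bank(base_omega=0.25, n_scales=4, n_orient=8):
    """4 x 8 bank: one-octave radial spacing, pi/8 orientation step."""
    k = np.sqrt(np.log(2) / 2)
    bank = []
    for s in range(n_scales):
        omega = base_omega / 2**s              # radial neighbours one octave apart
        sigma = (k / (np.pi * omega)) * 3      # B_f = 1 octave via Eq. (5)
        for o in range(n_orient):              # circular neighbours pi/8 apart
            bank.append(even_gabor(omega, o * np.pi / n_orient, sigma))
    return bank

filters = gabor_bank()
```

The kernel support grows with scale, so low-frequency filters get proportionally larger grids.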


2.3 Single-Scale First-order Processing

2.3.1 Computational models of simple and complex cells

Since simple cells play a substantial role in what follows, we first briefly introduce a computational model of this type of cell. The response r of a simple cell characterised by a receptive-field function g(x, y) to a luminance distribution image f(x, y), (x, y) ∈ Ω, is computed as follows (Ω is the visual-field domain):

$$
r=\chi\left(\iint_\Omega f(x,y)\,g(x,y)\,dx\,dy\right)\qquad(1)
$$

where χ is the half-wave rectification function (χ(z) = 0 for z < 0, χ(z) = z for z ≥ 0).

We use the following family of two-dimensional Gabor functions, mentioned above, to model the spatial summation properties of simple cells:

⎧⎪⎡x'2y'2⎤⎫⎪

exp⎨−⎢2+2⎥⎬cos{2πωfx'+ϕ}gε,η,wf,θ,ϕ(x,y)=

2πσxσy⎪⎩⎣2σx2σy⎦⎪⎭x'=(x−ε)cosθf−(y−η)sinθf

(2)

y'=(x−ε)sinθf+(y−η)cosθf

1

where the arguments x and y specify the position of a light impulse in the visual field, and ε, η, ωf, σx, σy, θ and φ are parameters as follows:

The pair (ε, η), which has the same domain as (x, y), specifies the centre of a receptive field within the visual field. The standard deviations σx, σy of the Gaussian factor determine the (linear) size of the receptive field. They are determined by ωf, Bf and Bθ just as calculated in section 2.1.2; physiological evidence shows that the half-response spatial frequency bandwidth Bf is approximately one octave [14]. The parameter φ (φ ∈ (−π, π]), a phase offset in the argument of the harmonic factor cos(2πωf·x' + φ), determines the symmetry of the function g_{ε,η,ωf,θ,φ}(x,y): for φ = 0 and φ = π it is symmetric, or even, with respect to the centre (ε, η) of the receptive field; for φ = −π/2 and φ = π/2 the function is antisymmetric, or odd; all other cases are asymmetric mixtures of these two. In our experiments we use the following values of φ: φ = 0 for symmetric receptive fields, to which we refer as 'centre-on' in analogy with retinal ganglion cell receptive fields whose central areas are excitatory; φ = π for symmetric receptive fields, to which we refer as 'centre-off', since their central lobes are inhibitory; and φ = π/2 and φ = −π/2 for antisymmetric receptive fields with opposite polarities.
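As an illustration, Eqs. (1) and (2) can be sampled on a pixel grid. The sketch below fixes the receptive-field centre at (ε, η) = (0, 0) and derives σx from the one-octave frequency bandwidth; the aspect ratio gamma (= σx/σy) is a free parameter of this sketch, not a value fixed by the text:

```python
import numpy as np

def gabor_rf(size, wavelength, theta, phi, bandwidth=1.0, gamma=0.5):
    """Sample the receptive-field function of Eq. (2) on a size x size
    grid (size should be odd), centred on (eps, eta) = (0, 0)."""
    # standard sigma/lambda relation for a given half-response
    # frequency bandwidth (in octaves)
    sigma_x = (wavelength / np.pi) * np.sqrt(np.log(2) / 2) \
              * (2 ** bandwidth + 1) / (2 ** bandwidth - 1)
    sigma_y = sigma_x / gamma
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) - y * np.sin(theta)   # x' of Eq. (2)
    yp = x * np.sin(theta) + y * np.cos(theta)   # y' of Eq. (2)
    g = np.exp(-(xp ** 2 / (2 * sigma_x ** 2) + yp ** 2 / (2 * sigma_y ** 2)))
    g *= np.cos(2 * np.pi * xp / wavelength + phi)
    return g / (2 * np.pi * sigma_x * sigma_y)

def simple_cell_response(image, rf):
    """Eq. (1): half-wave rectified inner product of image and RF."""
    return max(0.0, float(np.sum(image * rf)))
```

With phi = 0 the sampled field is even (centre-on); with phi = ±π/2 it is odd, matching the symmetry discussion above.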

From the above discussion, the function of a simple cell can be modeled as a linear Gabor filter followed by a half-wave rectifier (in many cases the rectifier is not necessary). Fig.2.6 shows the response to the sum of two luminance-defined sine gratings with different frequencies:


Fig.2.6. The response to the sum of two luminance-defined sine gratings with different frequencies. (A) A stimulus obtained as the sum of a sine grating (256×256) with wavelength 16 and one with another wavelength. (B) The amplitude Fourier spectrum of the stimulus shown in A. (C) The amplitude Fourier spectrum (magnified) of a Gabor filter with wavelength 16 and the same orientation as the stimulus. (D) The response of the stimulus filtered by the Gabor filter shown in C. (E) The amplitude Fourier spectrum (magnified) of a Gabor filter with the other wavelength and the same orientation as the stimulus. (F) The response of the stimulus filtered by the Gabor filter shown in E.

From Fig.2.6 we can clearly see how the Gabor filter's frequency selectivity works in the spatial-frequency domain.


Complex cells, in contrast to simple cells, display several strong nonlinear properties. Hence it is not possible to describe them adequately by linear models, and we have to consider nonlinear model neurons. As in a number of other studies, we chose the two-subunit energy model [11].

Each such model neuron consists of two subunits (Fig.2.7). Each subunit computes the scalar product of the same input patch I with a weight vector (W_{1,i} and W_{2,i}, respectively). Hence each neuron is characterized by two linear receptive fields. Both outputs are subsequently squared and summed to define the neuron's activity:

A_i = (W_{1,i} · I)² + (W_{2,i} · I)²

Fig.2.7 model of complex cells

These simulated neurons can, given appropriate weights, exhibit a large variety of response properties, most of which are never observed in real neurons. The simulated neurons can, however, also act like a complex cell if both subunits have Gabor-wavelet-like receptive fields with identical orientation and spatial frequency, and the two wavelets have a relative phase shift of 90°. If such a neuron is excited by a visual stimulus in the form of a bar moved over its receptive field, each subunit has an activity that depends on the bar's position. As the bar is shifted, the subunits alternate in having the larger squared activity. Thus the neuron's activity, the sum of the squared subunit activities, changes only slightly as the bar is moved within the receptive field.

As discussed above, the modulus of the even and odd Gabor filter responses can simulate a complex cell; we call this the Gabor energy:

Eg(x,y,θ) = √( Ee²(x,y,θ) + Eo²(x,y,θ) )    (3)

Ee(x,y,θ) = he(x,y,θ) ⊗ I(x,y)    (4)
Eo(x,y,θ) = ho(x,y,θ) ⊗ I(x,y)    (5)


Here ⊗ represents convolution, I is the input image, and he and ho are the symmetric and antisymmetric Gabor filters. Fig.2.8 gives the experimental results when the input image is filtered by the even filter, the odd filter, and the Gabor energy respectively, from which we can see that the Gabor energy represents the features of lines and edges better:


Fig.2.8. Simulated complex-cell function with even and odd Gabor filters and the Gabor energy. (A) Input image (B) Convolved with a vertical-orientation even Gabor filter (C) Convolved with a vertical-orientation odd Gabor filter (D) Result obtained from the Gabor energy
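A minimal sketch of the Gabor energy of Eqs. (3)–(5), assuming the even and odd kernels have already been constructed (e.g. with φ = 0 and φ = −π/2):

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_energy(image, even_kernel, odd_kernel):
    """Gabor energy: the modulus of the responses of an even (symmetric)
    and an odd (antisymmetric) Gabor filter pair, simulating a complex
    cell, as in Eqs. (3)-(5)."""
    e_even = fftconvolve(image, even_kernel, mode='same')  # Eq. (4)
    e_odd = fftconvolve(image, odd_kernel, mode='same')    # Eq. (5)
    return np.sqrt(e_even ** 2 + e_odd ** 2)               # Eq. (3)
```

Because the two responses are squared and summed, the result is phase-invariant: lines and edges produce a single strong ridge rather than the oscillating positive/negative lobes of a single linear filter.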

In all the above experimental results we used a Gabor filter with a particular scale and orientation. Now let us look at the results for several synthetic images filtered with a bank of the same scale but multiple orientations (Fig.2.8):

Original image

Channel 1: orientation 0°

Channel 2: orientation 45°

Channel 3: orientation 90°

Channel 4: orientation 135°

Pooling: WTA over orientations

Fig.2.8 Experimental results for several synthetic images with a filter bank of a single scale and 4 orientations (orientation bandwidth π/4). The first four rows show the filter results of the four orientation channels; the last row shows the result pooled over the four orientation channels by the "winner-takes-all" (WTA) scheme, which is discussed below. The left column shows the selectivity for edges of different orientations; note that the response to the smallest square's edges is weaker, an effect that disappears in the multi-scale scheme. The middle column shows the selectivity for sine gratings of the same frequency but different orientations. The right column shows that the linear-filter model has little response to the second-order envelope stimulus, even though the filter frequency here matches the envelope frequency.


2.3.2 Pooling scheme over orientations: winner-take-all

Different orientation channels extract features at different orientations, but what we perceive is the pooled response of these channels. However, the pooling mechanism in the HVS is currently unclear, so we use the simplest scheme here, the so-called "winner-takes-all" (WTA): for each pixel we choose the maximum response over all orientation channels. In other words, each channel extracts features at a particular orientation, so the channel responses at a given point can be seen as a feature vector; pooling finds a "feature scalar" for that point from the vector, and WTA takes the maximum of the vector as that scalar. To cover the whole "orientation space", if we choose K orientation channels we should take the orientation bandwidth as π/K, as shown in Fig.2.9.
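The WTA pooling over orientation channels can be sketched as follows; the input is assumed to be a stack of K per-channel response maps:

```python
import numpy as np

def wta_pool(responses):
    """'Winner-take-all' pooling over orientation channels.

    responses: array of shape (K, H, W), one response map per channel.
    Returns the per-pixel maximum (the 'feature scalar') together with
    the index of the winning channel at each pixel."""
    responses = np.asarray(responses)
    pooled = responses.max(axis=0)     # feature scalar per pixel
    winner = responses.argmax(axis=0)  # index of the winning channel
    return pooled, winner
```

Keeping the winning channel index costs nothing here and is convenient for any later processing that needs the local orientation.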


Fig.2.9 Filters covering the whole "orientation space". (A) Four orientation channels, each filter with an orientation bandwidth of π/4; (B) six orientation channels, each with bandwidth π/6; (C) eight orientation channels, each with bandwidth π/8.

Fig.2.10 shows the experimental results for a natural image:


Fig.2.10 Experimental results for the natural image, from which we can see that the result extracts most of the luminance-defined edges

2.4 Multi-Scale First-order Processing

Unlike simple synthetic images with relatively few frequency components, a natural image's spectrum always contains many different frequency components, so single-scale first-order processing cannot describe the visual cortex's behavior properly: as can be seen in Fig.2.10B, the fine variations of Lena's hat, which correspond to higher frequencies, do not appear in the results. Consider the image shown in Fig.2.11A.


Fig.2.11 Zebras, obtained by rotating and scaling a zebra image. (A) Input image (B) Large-scale filtered result (C) Middle scale (D) Small scale

From the results in Fig.2.11 we can see that large-scale filters extract large-scale features, middle-scale filters extract middle-scale features, and small-scale filters extract small-scale features. How, then, can we extract all the features? Multi-scale processing is the natural answer. In fact, the HVS is itself multi-scale, owing to the existence of cortical cells tuned to different frequencies and orientations.

2.4.1 Pooling scheme over frequencies: winner-take-all

As with the pooling scheme discussed previously, we use "winner-takes-all" over all frequencies. We can understand it this way: each channel extracts features at a particular orientation and frequency, so the channel responses at a given point can be seen as a feature matrix (rows: scales, columns: orientations); pooling finds a "feature scalar" for the point from this matrix, and WTA takes the maximum of the matrix as that scalar. To cover the whole "orientation space", if we choose K orientation channels we take the orientation bandwidth as π/K. Then how do we cover the whole "frequency space"?

2.4.2 Filter bank design

Specification of the filter-set properties involves choosing a set of frequencies and orientations that cover the spatial-frequency space and capture the image features as well as possible. [7] proposed that a filter bank with orientation separation angle π/6 and the following frequencies is a good choice:

1·√2, 2·√2, 4·√2, …, and (Nc/4)·√2 cycles/image-width, where Nc is the image width in pixels

Accordingly, the frequency and orientation bandwidths are 1 octave and π/6, respectively. This set of filter parameters was selected so that it properly captures image feature information. Fig 2.12 shows the filter bank arrangement in the spatial-frequency domain.


Fig 2.12 Gabor filter set in the spatial-frequency domain. Frequencies are one octave apart; the frequency bandwidth is one octave each; the orientation separation is π/6.

For an orientation separation of π/6, the number of filters required is 6·log2(Nc/2) (for an image of width 256 pixels, 42 filters are required). For efficiency, and because in natural images we care most about edge information, which usually corresponds to higher frequencies, only 3 scales are used in our experiments, namely:

(Nc/16)·√2, (Nc/8)·√2, and (Nc/4)·√2 cycles/image-width
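The frequency and orientation set described above can be generated as in the following sketch (the helper name and the full/reduced switch are mine, for illustration):

```python
import numpy as np

def gabor_bank_params(nc=256, n_orient=6, full=False):
    """Centre frequencies (cycles/image-width) and orientations for the
    Gabor bank: f = (2**k) * sqrt(2), one octave apart, up to
    (Nc/4)*sqrt(2). With full=False only the 3 highest scales are kept,
    as in the text."""
    k_max = int(np.log2(nc // 4))             # (Nc/4)*sqrt(2) == 2**k_max * sqrt(2)
    freqs = [2 ** k * np.sqrt(2) for k in range(k_max + 1)]
    if not full:
        freqs = freqs[-3:]                    # (Nc/16, Nc/8, Nc/4) * sqrt(2)
    thetas = [i * np.pi / n_orient for i in range(n_orient)]
    return freqs, thetas
```

For Nc = 256 the full set gives 7 scales × 6 orientations = 42 filters, matching the 6·log2(Nc/2) count above.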

Fig.2.13 shows the experimental results for the inputs shown in Fig.2.11A and Fig.2.10B with the above parameters and the WTA pooling scheme:


Fig.2.13 Experimental results for the inputs shown in Fig.2.11A and Fig.2.10B with the above parameters and the WTA pooling scheme; notice the much better performance than single-scale processing, as expected


2.4.3 Post-processing: thinning and thresholding

These are post-processing techniques standardly used in image processing. Thinning reduces edges in the output to one-pixel-wide edges by non-maxima suppression (NMS). The raw edge regions are more than one pixel wide; the actual grey-level edge is supposed to be at the position of maximum response strength, and all pixels in edge regions where this is not the case have to be suppressed. However, within an edge region, neighbouring pixels along the direction of the edge are supposed to have similar response strengths. Thus non-maxima suppression may only be performed in the direction of the grey-level gradient, i.e., orthogonal to the edge, as in Canny's method. It follows these steps:

1. From each position (x,y), step in the two directions perpendicular to edge orientation θ(x, y).

2. Denote the initial pixel (x, y) by C, the two neighbouring pixels in the perpendicular directions by A and B.

3. If M(A) > M(C) or M(B) > M(C), discard the pixel (x, y) by setting M(x, y) = 0, where M(·) denotes the edge response magnitude. Fig.2.14 illustrates the NMS processing:

Fig.2.14. Illustration of the non-maxima suppression. Pixels A and B are deleted because M(C) > M(A) and M(C) > M(B). Pixel C is not deleted.

In my experiments I adopt a method similar to the above NMS. It follows these steps [16]:

(1) Find the gradient orientation region for each pixel. The orientation of the gradient at each pixel corresponds to the orientation channel that generates the maximum response, that is:

θ(x,y) = argmax_α { E(x,y,α) | α = 1, …, K }

where E(x,y,α) denotes the filter response at (x,y) for orientation channel α, and K denotes the total number of orientation channels.

These gradient orientations are partitioned into 4 regions:

ρ = 1,  1 ≤ θ < K/4 + 1
ρ = 2,  K/4 + 1 ≤ θ < K/2 + 1
ρ = 3,  K/2 + 1 ≤ θ < 3K/4 + 1
ρ = 4,  3K/4 + 1 ≤ θ

(2) Apply non-maxima suppression to the gradient response. For each region, different neighbouring pixels are used. The four regions and the corresponding orientations are shown in Fig 2.15.


Fig.2.15 The four gradient orientation regions and the corresponding orientations

For example, if the gradient orientation of the center pixel belongs to region 4, we compare its response with those of the neighbouring pixels at the positions marked 4 in Fig.2.15. If its response is not the local maximum, we set the pixel's grey level to 0.
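Steps (1)–(2) can be sketched as below; the per-region neighbour offsets are an assumption, since Fig.2.15 is not reproduced here:

```python
import numpy as np

def nms_orientation_regions(mag, winner, K=8):
    """Non-maxima suppression driven by the winning orientation channel.

    mag: pooled response magnitude, shape (H, W).
    winner: index (0..K-1) of the maximally responding channel per pixel.
    The channel index is mapped to one of 4 gradient-orientation regions;
    the neighbour offsets per region follow the usual 4-direction scheme."""
    region = (winner * 4) // K
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}  # (dy, dx)
    out = mag.copy()
    h, w = mag.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = offsets[int(region[y, x])]
            # suppress the pixel if either neighbour along the gradient
            # direction has a larger response
            if mag[y, x] < mag[y + dy, x + dx] or mag[y, x] < mag[y - dy, x - dx]:
                out[y, x] = 0.0
    return out
```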

Fig 2.16 shows the experimental results of the above non-maxima suppression algorithm.


Fig.2.16 Experimental results of the above non-maxima suppression algorithm. (A) A circle 3 pixels wide, whose center pixels are lighter than the outer and inner neighbouring pixels (B) Response (C) Response with the non-maxima suppression algorithm


Hysteresis thresholding produces a binary output image. If it is enabled, two threshold values must be specified, T-high and T-low, given as fractions (between 0 and 1) of the maximum response value.

Pixels with responses higher than T-high are assigned the binary value 1 in the output, while pixels with responses below T-low are assigned the binary value 0. Pixels with responses between T-low and T-high are assigned the value 1 in the binary output if they can be connected to any pixel with a response larger than T-high through a chain of other pixels with responses larger than T-low.
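A sketch of this hysteresis rule using connected-component labelling (scipy's `ndimage.label` stands in for the chain-following description above; the two are equivalent):

```python
import numpy as np
from scipy import ndimage

def hysteresis_threshold(response, t_high=0.3, t_low=0.05):
    """Binary edge map by hysteresis thresholding.

    t_high and t_low are fractions of the maximum response. Weak pixels
    (between the two thresholds) survive only if their connected
    component of the weak mask also contains a strong pixel."""
    m = response.max()
    strong = response >= t_high * m
    weak = response >= t_low * m
    labels, _ = ndimage.label(weak)      # connected components of weak mask
    keep = np.unique(labels[strong])     # components containing a strong pixel
    return np.isin(labels, keep[keep > 0])
```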

Fig.2.17 shows the experimental results of the above non-maxima suppression algorithm and hysteresis thresholding applied to "Lena" as shown in Fig.2.10B.


Fig.2.17 Experimental results of the above non-maxima suppression algorithm and hysteresis thresholding applied to "Lena" as shown in Fig.2.10B. (A) Response of multi-scale processing (B) Response with non-maxima suppression (C) Response with non-maxima suppression and hysteresis thresholding, with T-high = 0.3, T-low = 0.05 (D) Edge detection using the Canny operator, for comparison.


2.5 Color Image Processing

2.5.1 Color vision and the opponent theory

Color is not a physical quantity but a perception. There are two kinds of photoreceptors in the retina, known as rods and cones. The rods are responsible for vision in dim illumination, while the cones are responsible for color vision in daylight. The cones contain three different pigments with different spectral sensitivities; we usually call the three cone types L-, M-, and S-cones (long-, medium-, and short-wavelength sensitive). The responses of the three cones are summed within a region and delivered to the brain, which roughly analyzes the content of the perceived spectrum and gives us the perception of color. Hering proposed the opponent theory, considering the antagonism between colors that occurs in the retina [6]. As shown in Fig. 2.18, after the photoreceptor stage the signals are coded in three opponent channels: a yellow-blue channel and a green-red channel (two chromatic channels) plus a white-black channel (one luminance channel); such a color space is the so-called YCbCr color space. The receptive fields of the three color-opponent channels at pre-cortical stages are concentric. At the cortex, receptive fields in the three channels become specialized to specific orientations and frequencies. Color information is perceived over a region, so the resolution in the two chromatic channels must be coarser than that in the luminance channel in both orientation and spatial frequency.

Fig.2.18 Schematic diagram of the opponent theory


Fig.2.19 shows the result of converting an RGB image to the YCbCr color space.


Fig.2.19. The result of converting an RGB image to the YCbCr color space, from which we can clearly see that color is an important cue for edge detection in color images. (A) Input color image (B) White-black (luminance) component (C) Yellow-blue (chromatic) component (D) Green-red (chromatic) component
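The decomposition can be sketched with a YCbCr transform; the ITU-R BT.601 coefficients below are an assumption for this sketch, since the text only fixes the opponent-channel roles:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (floats in 0..1) to a YCbCr opponent-like
    space. Y is the white-black (luminance) channel, Cb the yellow-blue
    and Cr the green-red chromatic channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + 0.564 * (b - y)    # 0.564 = 0.5 / (1 - 0.114)
    cr = 0.5 + 0.713 * (r - y)    # 0.713 = 0.5 / (1 - 0.299)
    return np.stack([y, cb, cr], axis=-1)
```

For an achromatic input (r = g = b) both chromatic channels sit at the neutral value 0.5, which matches the opponent-coding intuition above.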

2.5.2 First-order computational model for color images

The model is implemented in four stages: i) image decomposition in the YCbCr color space into two chromatic components (Cb and Cr) and one luminance component (Y); ii) processing of the three channel signals in parallel pathways to extract the first-order features; iii) pooling over the three channels by WTA; iv) post-processing by the non-maxima suppression algorithm and hysteresis thresholding. These stages are presented below with reference to Fig. 2.20.


Fig.2.20 First-order computational model for color images

A detailed description of the procedure and of the parameter choices is given below:

i. Input Image Decomposition: Following Hering's opponent theory, we select the YCbCr color space to present the image in three components.

ii. A three-scale, six-orientation Gabor filter bank extracts the luminance features: all eighteen filters in the bank are convolved with the signal in the Y (luminance) channel. The features in the chromatic channels are selective to lower frequencies and have coarser frequency resolution, so only the lowest-frequency filters process the signals in the Cb and Cr channels.

iii. WTA is adopted again to pool the outputs of the three channels; this in effect combines the color and luminance features.

iv. Post-processing by the non-maxima suppression algorithm and hysteresis thresholding.
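Stages ii–iii can be sketched as follows; the kernel lists are assumed to be precomputed Gabor masks, and post-processing (stage iv) is omitted:

```python
import numpy as np
from scipy.signal import fftconvolve

def color_first_order(ycbcr, y_kernels, chroma_kernels):
    """Filter Y with the full Gabor bank, Cb and Cr only with the
    lowest-frequency kernels, then pool every channel-filter
    combination by winner-take-all."""
    maps = []
    for k in y_kernels:                      # full 3-scale x 6-orientation bank
        maps.append(np.abs(fftconvolve(ycbcr[..., 0], k, mode='same')))
    for c in (1, 2):                         # chromatic channels: lowest frequency only
        for k in chroma_kernels:
            maps.append(np.abs(fftconvolve(ycbcr[..., c], k, mode='same')))
    return np.max(np.stack(maps), axis=0)    # winner-take-all pooling
```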

The experimental results are given in Fig.2.21; the input image is given in Fig.2.19A.


Fig.2.21 Experimental results for the image given in Fig.2.19A. (A) Response of the luminance (Y) channel (B) Response of the Cb channel (C) Response of the Cr channel (D) Pooled response (E) Post-processing (F) For comparison with (D), the response when the input is first converted from RGB to grey; one can easily see what an important role color plays in edge detection.

3. Second-order Processing

3.1 Neural and Physiological Basis

The earliest report of neuronal responses to second-order motion was by Albright (1992), who showed that neurons in primate area MT/V5 responded to moving regions of dynamic noise on a static noise background (Fig.3.1A).

Yet another line of research has investigated possible mechanisms of texture segregation, using stimuli having a central region differing from its background in local texture properties (Fig.3.1B); again, with the same net luminances in figure and ground, these can be considered second-order stimuli. For example, Olavarria et al. (1992) demonstrated responses in primate MT/V5 neurons using textures of small lines differing in orientation, and responses to texture borders were also found in primate area V1 (Nothdurft et al., 1999) [4].

Henning et al. (1975) introduced contrast envelope stimuli, like that shown in Fig. 1.2C, arguing that their detection could not be accounted for in terms of linear filter schemes. Several subsequent studies (e.g. Derrington and Badcock, 1985; Ledgeway and Smith, 1994b) indicated that contrast envelopes are detected by a different mechanism [4].


Fig.3.1. Examples of second-order stimuli. (A) Bar defined by dynamic noise, on a background of static noise. (B) Bar of vertical micro-elements, on a background of horizontal elements. (C) Oriented grating patch, on a background of the same average luminance.

Use of contrast envelope stimuli to characterize second-order responses in single neurons

To demonstrate and characterize second-order responses from early visual cortex neurons, we employed contrast envelope stimuli (Fig. 1.3C) consisting of a high-spatial-frequency sine-wave carrier whose contrast is modulated by a lower-spatial-frequency, drifting sine-wave envelope. A neuron like those introduced in section 2 should respond well to the luminance grating but not to the contrast envelope: in the latter case, the effects of the fine-grain light and dark regions of the carrier cancel out within the much larger summation sub-regions of the linear receptive field. To put this another (equivalent) way, the carrier spatial frequency is so high as to be outside the passband of the luminance-responsive linear receptive field.

A model of envelope-responsive neurons

That a neuron can show different preferred spatial and temporal frequencies, and differing degrees of direction selectivity, to luminance and envelope stimuli suggests that its firing reflects a combination of inputs from two distinct pathways or 'streams', one mediating first-order and the other second-order responses. An adequate model for the one handling luminance-grating responses could be a conventional linear filter model like that discussed above, but such a model would be entirely incapable of producing envelope responses, for reasons outlined earlier.

Contrast envelopes can be thought of as analogous to AM ('amplitude modulation') radio signals, in which the amplitude of a radio-frequency carrier has been modulated by a much lower-frequency audio signal; in an AM radio, the audio signal is 'demodulated' by rectification followed by low-pass filtering to smooth out the irregularities introduced by the high-frequency carrier. While a signal-processing scheme of this kind might mediate neuronal responses to contrast envelopes, some important modifications are needed to account for the physiological data. Firstly, the spatial and temporal tuning characteristics for envelope responses require that the filter following the rectifier be band-pass rather than low-pass. Secondly, to achieve the tuning to carrier spatial frequency and orientation, we need a front-end filter prior to the rectification (analogous to the radio-frequency tuner which selects which station we receive). The intervening rectification ensures that the fine-grain positive and negative portions of the carrier-frequency signal do not cancel one another when smoothed by the later filter. Thus we are led to a 'filter -> rectify -> filter' (F->R->F) model for the second-order stream mediating a cortical neuron's response to contrast envelopes (Fig. 1.4).

This is a very attractive idea because the results of our neurophysiological data map on to it so straightforwardly. The very different tunings for orientation and spatial frequency of the carrier and envelope are easily understood: the selectivity for stimulus parameters of the carrier reflect the properties of the first filter, while the selectivity for envelope attributes reflects the tuning of the second filter.

Fig.1.4 illustrates the complete model, with separate streams or paths for processing first- and second-order stimuli. A simple luminance grating can elicit a response only via the first-order pathway: regardless of its spatial frequency, it can never satisfy the conflicting demands of the highly discrepant spatial-frequency tunings of the early and late filters of the F->R->F stream. Similarly, a contrast envelope stimulus will exert an effect on the final output only via the F->R->F stream, for the reasons already discussed.


Many findings from human psychophysics can be similarly understood in terms of an F->R->F scheme. For example, second-order operations that are selective for carrier spatial frequency and orientation may indicate characteristics of the early filter, while specificity to attributes of the envelope could reflect properties of the late filter.

Possible physiological substrates of second-order processing

In the model of Fig.1.4, the upper branch, which handles first-order processing, could be any of the quasi-linear neurons of early visual cortical areas, though the low preferred spatial frequencies in our data suggest a greater contribution from second-tier areas (A18, V2). Our findings of selectivity for carrier spatial frequency and orientation suggest that the early filters of the F->R->F branch must be cortical, and the quite high optimal carrier frequencies suggest quasi-linear neurons in first-tier areas (A17, V1) as very likely candidates. The more interesting question is the neural substrate of the later filters of the F->R->F branch: neurons corresponding to this stage, if they exist, would in this scheme be exclusively responsive to second-order stimuli. They would not respond to full-field sine-wave gratings of any spatial frequency or orientation, but would respond to contrast envelope stimuli, with the same selectivities already described for spatial frequency and orientation of both carrier and envelope, and in many cases also to the direction of envelope motion. Neurons having all these properties have never been reported, though this might well be a consequence of not having used appropriate 'search stimuli', and of protocols in which neurons are always first characterized with sine-wave gratings [4].

3.2 Single-Scale Second-order Processing

The implementation of the FRF operation is similar, except that the response image of the first filter (F1) is full-wave rectified and then convolved with a second filter (F2) (Fig.1.5, bottom). To exclude biologically invalid combinations, the early-stage filter is constrained to higher spatial frequencies than the late-stage filter (i.e. λF2 > 2λF1; see Aaron P. Johnson [1]).
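A minimal sketch of this single-scale FRF operation, with F1 and F2 assumed to be precomputed Gabor kernels satisfying λF2 > 2λF1:

```python
import numpy as np
from scipy.signal import fftconvolve

def frf_response(image, f1_kernel, f2_kernel):
    """Single-scale Filter-Rectify-Filter: convolve with the early,
    carrier-tuned filter F1, full-wave rectify, then convolve with the
    late, envelope-tuned filter F2."""
    r1 = fftconvolve(image, f1_kernel, mode='same')   # first-stage filter
    r1 = np.abs(r1)                                   # full-wave rectification
    return fftconvolve(r1, f2_kernel, mode='same')    # second-stage filter
```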


Fig.3.2 shows the experimental results for a contrast envelope processed by the "Filter-Rectify-Filter" scheme, in which the first-stage filter is tuned to the carrier and the second-stage filter is tuned to the envelope.


Fig.3.2 Experimental results for a contrast envelope processed by the "Filter-Rectify-Filter" scheme. (A) A1 is the contrast envelope with carrier wavelength 4 and orientation π/2; A2 is its frequency spectrum. (B) B1 is the output of the 1st-stage filter with wavelength 4 and orientation π/2; B2 is its frequency spectrum. (C) C1 is the output after the full-wave rectifier; C2 is its frequency spectrum. (D) D1 is the output of the 2nd-stage filter, tuned to the envelope wavelength and orientation π/2; D2 is its frequency spectrum.

From Fig.3.2, and especially from the frequency spectra, we find that rectification is crucial for such an FRF scheme, because it transforms the frequency spectrum and enables the 2nd-stage filter to demodulate the envelope. [15] emphasises this importance in terms of average luminance; I think this is a good way to look at it, so I wrote an experiment program accordingly. Fig 3.3 gives the results.



Fig.3.3 (A) The luminance profile of the contrast envelope shown in Fig.3.2A1; the average luminance is constant, so a first-order system cannot give a net response. (B) The luminance profile of the rectified signal shown in Fig.3.2C1; after rectification the average luminance is no longer constant, so a subsequent first-order system (the 2nd-stage filter) can detect the luminance variation.
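The effect illustrated in Fig.3.3 can be reproduced numerically in one dimension: the local mean of a contrast envelope is roughly constant before rectification but follows the envelope afterwards (the window size and wavelengths below are illustrative choices, not the values used in the experiments):

```python
import numpy as np

# 1-D contrast envelope: a wavelength-4 carrier whose contrast is
# modulated by a wavelength-64 envelope; the mean luminance is constant.
x = np.arange(256)
carrier = np.cos(2 * np.pi * x / 4)
envelope = 0.5 * (1 + np.cos(2 * np.pi * x / 64))
stimulus = envelope * carrier                 # zero local mean everywhere

win = np.ones(8) / 8                          # local averaging window
# Before rectification the local mean is ~0: no first-order signal.
mean_before = np.convolve(stimulus, win, mode='same')
# After full-wave rectification the local mean follows the envelope,
# so a subsequent linear (second-stage) filter can detect it.
mean_after = np.convolve(np.abs(stimulus), win, mode='same')
```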


3.3 Multi-Scale Second-order Processing

3.3.1 Problems to build the computational model

For the FRF model mentioned above, many problems remain. For example, it is unclear whether λF2 has a fixed relationship with λF1; what exactly is the post-processing mechanism of the second-order pathway? Is there a relationship between the optimal orientations of the first- and second-stage filters? Some results support the notion that the first-stage filters are predominantly of the same orientation as the second-stage filter, whereas other findings suggest orthogonal inputs. Conclusions from C. L. Baker's experiments [1] imply that there is no rigid orientation combination between the first- and second-stage filters. As mentioned in section 3.1, there is not even evidence that the neurons corresponding to the later filters of the F->R->F branch exist in cortex.

Until now there has been no complete computational model of the FRF scheme, because of the problems just discussed. Baker in his paper [1] used a total of 1440 simulated F1-F2 combinations to cover the full range of orientations and spatial phases; with this method, for each image analyzed the convolutions required about 10 h to calculate and about 6.5 GB of storage. Moreover, he did not consider the pooling scheme in his paper, because he only wanted to find the relationship between the first- and second-order information. Besides this, I consulted many papers, and few mention the pooling scheme of the FRF model.

The FRF model mentioned above was originally proposed for contrast envelopes, and it is difficult to imagine a natural scene with attributes similar to a contrast envelope.

However, there is another line along which we can continue our research: texture-defined stimuli, which are also second-order. Several computational models based on the HVS have been proposed [5][6][7], all of which use an "FRF" scheme slightly different from the one above.


3.3.2 Texture segmentation based on a different “FRF” scheme

Texture is an important property of surfaces which characterizes the nature of the surface. An important task in image processing and machine vision is segmenting regions of different texture in an image. However, it is not precisely defined what constitutes a "proper" region. Ideally, we would want each region to represent a different "object" in the image, for the purpose of object recognition, for example, or scene analysis. Still, we cannot define exactly what constitutes an "object". For example, if a bookshelf is filled with books, do we want to consider each book a separate object, or do we want the bookshelf and everything on it to be considered as one object? It is clear, then, that there is no single segmentation of an image that can be considered "right." The "right" segmentation exists only in the mind of the observer, and can change not only between observers, but also within the same observer at different times.

To be able to solve the problem of texture segmentation, we have to define what texture is. Unfortunately, there exists no unified definition of texture. However, a typical definition in the literature is "a spatial arrangement of local (gray-level) intensity attributes which are correlated (in some way) within areas of the visual scene corresponding to surface regions". Another definition is "one or more basic local patterns that are repeated in a periodic manner" [7]. So it can be agreed that texture exhibits some sort of periodicity of basic patterns. This fact leads to the idea that people use texture properties to identify different textures. Rao and Lohse [3] indicate that people are sensitive to three texture properties: repetition, directionality and complexity. This fact is very important if we want to design a methodology for texture segmentation. Further investigation of how the Human Visual System (HVS) interprets texture has resulted in a robust approach to the texture segmentation problem.

According to the above investigation, a model for the HVS interpretation of texture has been based on multi-channel filtering of narrow bands. Simple cells in the visual cortex were found to be sensitive to different channels, i.e., combinations of various spatial frequencies and orientations. Since texture repetition can be characterized by its spatial frequency, and directionality by its orientation, we can fit the HVS model into a methodology that uses multi-channel filtering at different spatial frequencies and orientations for texture analysis.

The multi-channel filtering approach is actually a multi-scale decomposition process, similar to wavelet analysis. In fact, one well-known class of functions that achieve both spatial and spatial-frequency localization is the Gabor function mentioned in section 2.1.2. Gabor filter banks have been used extensively in texture segmentation, owing to the ability to tune a Gabor filter to a specific spatial frequency and orientation and to achieve localization in both the spatial and the spatial-frequency domains.

The process of texture segmentation using Gabor filters involves proper design of a filter bank tuned to different spatial-frequencies and orientations to cover the spatial-frequency space; decomposing the image into a number of filtered images; extraction of features from the filtered images; and the clustering of pixels in the feature space to produce the segmented image. Both supervised and unsupervised approaches were used in texture segmentation. Supervised approaches rely on training methods and reference segmentations for performance assessment, while unsupervised approaches mainly rely on subjective assessment.

In this part, I design and implement a texture segmentation system based on the work done in [5][6][7]; Fig.3.4 shows the scheme of the segmentation system. Segmentation is done in unsupervised mode. In section 3.4, I'll extend this method to color images using an approach similar to that in section 2.4.

As outlined in Fig.3.4, the process of texture segmentation using multi-channel filtering involves the following steps:

(1) Gabor and Gaussian filter bank design

(2) Decomposition of the input image using the Gabor filter bank to obtain the first-order features

(3) Full-wave rectification

(4) Spatial smoothing using the Gaussian filter bank to obtain the second-order features

(5) Clustering of pixels in the feature space
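The five steps above can be sketched end to end on a toy two-texture image. This is only an illustrative sketch of the pipeline, not the thesis implementation: a simple oriented derivative filter stands in for the Gabor bank, a box filter for the Gaussian, and the clustering step is reduced to picking the stronger of two orientation-energy channels. All names here are my own.

```python
import numpy as np

# Synthetic input: left half vertical stripes, right half horizontal stripes.
n = 64
x, y = np.meshgrid(np.arange(n), np.arange(n))
img = np.where(x < n // 2, (x // 2) % 2, (y // 2) % 2).astype(float)

def oriented_energy(image, axis):
    """Steps (2)-(4) in miniature: oriented filtering (a derivative filter
    as a crude stand-in for a Gabor), full-wave rectification, then
    smoothing (a box filter as a stand-in for the Gaussian)."""
    resp = np.abs(np.diff(image, axis=axis))            # filter + rectify
    k = np.ones(9) / 9.0                                # crude post smoothing
    resp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, resp)
    resp = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, resp)
    return resp

e_x = oriented_energy(img, axis=1)[:n - 1, :n - 1]  # responds to vertical stripes
e_y = oriented_energy(img, axis=0)[:n - 1, :n - 1]  # responds to horizontal stripes

# Step (5) reduced to its essence: label each pixel by the stronger channel.
seg = (e_x > e_y).astype(int)   # 1 = vertical-stripe region, 0 = horizontal
```

Away from the region boundary, pixels in the left half receive one label and pixels in the right half the other, which is exactly the behavior the full Gabor/K-means pipeline generalizes.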


Fig. 3.4. Proposed algorithm. First-order features are generated by a set of Gabor filters. The second-order features are generated by a full-wave rectifier and a set of Gaussian filters. The final step is a pooling, or rather clustering, process.

Gabor filter bank design

As discussed in section 2.3, specifying the filter-set properties involves choosing a set of frequencies and orientations that cover the spatial-frequency space and capture as much of the texture information as possible. [7] proposed that a filter bank with an orientation separation angle of π/6 and the following frequencies is a good choice:

√2, 2√2, 4√2, ..., (Nc/4)√2 cycles per image width, where Nc is the number of image columns (the image width).

With this choice, the frequency and orientation bandwidths are 1 octave and π/6, respectively, giving 6·log2(Nc/2) filters in total (for an image of width 256 pixels, 42 filters are required). For efficiency, we do not use the filters at frequencies √2 and 2√2, since the spatial variations they capture are too large to correspond to textural variations in an image [7]. Also, the DC gain of the filters is set to zero in order to prevent any response to constant areas in the image.

The above set of filter parameters was selected so that it properly captures texture information. The center frequencies of the channel filters must lie close to the characteristic texture frequencies, or the filter responses will fall off rapidly [7]. Care must also be taken that the filters do not overlap in the frequency domain, to avoid aliasing.
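As a sketch of this design (my own implementation: the two lowest frequencies are already dropped, the DC gain is forced to zero, and the σ heuristic of half a wavelength is an assumption of mine, not taken from [7]):

```python
import numpy as np

def gabor_kernel(freq, theta, sigma):
    """Real (even-symmetric) Gabor kernel tuned to spatial frequency
    `freq` (cycles/pixel) and orientation `theta` (radians)."""
    half = max(3, int(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2 * np.pi * freq * xr)
    return g - g.mean()          # zero DC gain: no response to constant areas

def build_filter_bank(image_width, n_orient=6):
    """Radial frequencies sqrt(2)*2^k cycles per image width,
    k = 0..log2(Nc/2)-1, with the two lowest dropped as in [7];
    orientations every pi/6."""
    n_freq = int(np.log2(image_width / 2))               # 7 for Nc = 256
    freqs = [np.sqrt(2) * 2**k for k in range(n_freq)][2:]
    bank = []
    for f_cpw in freqs:
        f = f_cpw / image_width                          # cycles per pixel
        sigma = 0.5 / f                                  # ~half a wavelength (heuristic)
        for i in range(n_orient):
            bank.append(gabor_kernel(f, i * np.pi / n_orient, sigma))
    return bank
```

For a 256-pixel-wide image this yields 30 kernels (42 minus the 12 at the two discarded frequencies), each with zero mean.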

Full-Wave Rectification

As in other filter-rectify-filter models, the rectifying operation is applied after the Gabor filters. The intervening rectification ensures that the fine-grain positive and negative portions of the carrier do not cancel each other when the subsequent smoothing operation is performed.

A test texture image is given in Fig.3.5A. Fig.3.5B shows the output of the Gabor filtering without rectification, and Fig.3.5C shows the result of the same operation with rectification (here we use only four orientation channels, according to the features of the texture). The white pixels in the image represent locations where matching features have been detected by the Gabor function; this behavior is similar to that of V1 cells. Because of display limitations, pixels with negative responses do not appear in Fig.3.5B. In Fig.3.5C the region boundaries are more apparent, because rectification turns the negative responses positive.
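A one-dimensional numerical illustration of why the rectifier matters (a toy of my own, not the thesis code): the raw narrow-band response to a texture carrier averages to nearly zero, so the later smoothing stage would erase it, whereas the full-wave-rectified response keeps its energy.

```python
import numpy as np

# Simulated narrow-band filter response over a textured region: a carrier
# whose positive and negative lobes are balanced.
response = np.cos(np.linspace(0.0, 16.0 * np.pi, 256))

smoothed_raw = response.mean()           # lobes cancel under smoothing
smoothed_rect = np.abs(response).mean()  # rectified energy survives (~2/pi)
```

The raw average is essentially zero while the rectified average is about 0.64, which is exactly the "fine-grain portions disabling one another" effect described above.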

Gaussian Post Filtering

After the cells are stimulated by a specific signal, for example a bar with a specific orientation, the outputs of the V1 cells responding to the same direction aggregate together. A region of cells sharing the same properties responds more strongly than other regions, which is consistent with the "localization" property of textures. This effect can be simulated by a Gaussian post filter: an averaging whose weights decrease with distance from the center of the filter. Fig.3.5D shows the results after this processing.


Fig.3.5. A test experiment illustrating the function of full-wave rectification and Gaussian post filtering. (A) test texture image; (B) first-stage filtering results for the four orientation channels (0°, 45°, 90°, 135°); (C) results after full-wave rectification; (D) results after Gaussian post filtering.

The σgaussian of the post Gaussian filter decides the smoothing level. Increasing σgaussian eliminates more noise, but the accuracy of the texture boundary may decrease. Because both the Gabor filters and the Gaussian filters carry spatial information, the value of σgaussian must cooperate with σgabor to obtain a good result. In my experiments, we choose σgaussian = 2σgabor.
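A minimal separable implementation of the post filter, with the σgaussian = 2σgabor coupling used above (the truncation of the kernel at 3σ is my own choice):

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian post filter: weights fall off with distance
    from the kernel centre, truncated at 3*sigma."""
    half = max(1, int(3 * sigma))
    t = np.arange(-half, half + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    k /= k.sum()                                   # unit gain
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)

sigma_gabor = 1.0
sigma_gaussian = 2.0 * sigma_gabor                 # the coupling used in the text
```

Applying the two 1-D passes instead of one 2-D convolution gives the same result at far lower cost, which matters when every one of the rectified channel images must be smoothed.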

Pooling/Clustering scheme


At the end of the feature-extraction step we are left with a set of feature images extracted from the filter outputs. Pixels that belong to the same texture region share the same texture characteristics and should be close to each other in the feature space. The final step in unsupervised texture segmentation is to cluster the pixels into a number of clusters representing the original texture regions; labeling each cluster yields the segmented image. In our experiment, we use the basic K-means clustering algorithm for simplicity. However, this means the number of clusters must be provided to the algorithm beforehand, i.e. the number of different textures in the image must be known or estimated.

K-means clustering starts by assigning the cluster centers to random points in the input set. It then calculates the distance from each point to the cluster centers and assigns each point to its nearest cluster center (based on Euclidean distance). The next step is to recalculate each cluster center as the mean of its cluster. The algorithm iterates, assigning points to their nearest cluster center and updating the centers, until it converges and no more changes occur. When clustering is done, each pixel is labeled with its respective cluster, finally producing the segmented image. These steps are shown in Fig.3.6.

Fig.3.6. Diagram of the basic K-means clustering algorithm.
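The steps of Fig.3.6 can be written as a minimal NumPy version (random centres drawn from the data, Euclidean assignment, mean update, stop when the labels no longer change); the two-blob demo data below stands in for the feature vectors of two texture regions:

```python
import numpy as np

def kmeans(features, k, n_iter=100, seed=0):
    """Basic K-means as described above."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.full(len(features), -1)
    for _ in range(n_iter):
        # Assign each point to its nearest centre (Euclidean distance).
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                   # converged: no more changes
        labels = new_labels
        # Recompute each centre as the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs standing in for two texture regions' features.
pts = np.vstack([np.zeros((20, 2)), 5.0 + np.zeros((20, 2))])
labels = kmeans(pts, 2)
```

On well-separated data like this the algorithm recovers the two groups; the weaknesses listed below show up when clusters overlap or the initialization is unlucky.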


Using the procedure described in Fig.3.4, I built a MATLAB program to perform the task of texture segmentation. Fig.3.7 gives some experimental results.

Note that this program does not always generate segmentation results as good as those shown in Fig.3.7; sometimes the program has to be run several times and the best segmentation chosen. This lack of robustness may result from the following weaknesses of the K-means clustering algorithm:

- When the number of data points is small, the initial grouping significantly determines the clusters.

- The number of clusters, K, must be determined beforehand.

- We never know the real clusters: the same data, input in a different order, may produce different clusters when the number of data points is small.

- It is sensitive to initial conditions; different initial conditions may produce different clusters, and the algorithm may be trapped in a local optimum.

- We never know which attribute contributes more to the grouping process, since we assume that each attribute has the same weight.

- The arithmetic mean is not robust to outliers; data very far from a centroid may pull the centroid away from the real one.

- The resulting clusters are circular in shape, because the grouping is based on distance.

Fig.3.7 Some experimental results of the proposed texture segmentation algorithm (panels A-I). Note that A, B, E and F are built from Brodatz textures, while G, H and I are natural textures. The number of clusters K is not the same for all of them, as can be seen from the number of colors in the results.

3.4 Texture Segmentation for Color Image

As mentioned in section 2.4, the YCbCr color space is in accordance with human color vision. Here again we use a method similar to that adopted in section 2.4: we first decompose the color image into the YCbCr color space, containing two chromatic components (Cb and Cr) and one luminance component (Y). Because the Gabor filter is sensitive to luminance-defined edges and our aim here is texture segmentation, we should not apply the Gabor filters to the two chromatic channels. Considering that the chromatic channels use luminance variation to denote chromatic variation, we can use the intensity of each chromatic component directly as the first-order feature of that channel; however, it should be normalized to the response-magnitude range of the luminance channel.

Fig.3.8 shows the scheme mentioned above.

Fig.3.8 Texture segmentation for color images
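A sketch of the colour-space step (the BT.601 full-range RGB-to-YCbCr conversion; `normalize_to` is my own name for the normalisation of a chromatic feature map to the luminance-channel response range described above):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 full-range RGB -> YCbCr (values in 0..255)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def normalize_to(channel, reference):
    """Rescale a chromatic feature map to the magnitude range of the
    luminance-channel responses."""
    span = channel.max() - channel.min()
    if span == 0:
        return np.zeros_like(channel) + reference.min()
    scaled = (channel - channel.min()) / span
    return scaled * (reference.max() - reference.min()) + reference.min()
```

For a gray pixel (R = G = B) the two chromatic components come out at the neutral value 128, so only the Y channel carries texture information, matching the decomposition in Fig.3.8.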

Note that if two images with different colors have the same texture features when converted to gray, we consider them different textures, as illustrated in Fig.3.9B.

Fig.3.9 gives some experimental results of the proposed algorithm. I should mention that the results are not as good as I expected; the reason is discussed below.


[Fig.3.9, rows A-C: each row shows the original color image, the result of converting to gray and processing, and the result of the proposed method]

Fig.3.9 Some experimental results of the proposed algorithm. From these results we can see that it performs better than processing the gray-converted images; however, the results given by the proposed method are still not satisfying and, as mentioned above, the method is not robust.

As mentioned in section 3.3, the weakness of K-means is one reason the results are not satisfying. There may be another factor we should not ignore: the weight relation between the chromatic channels and the luminance channel. In the scheme above I simply normalized the intensity of each chromatic component to the range of the luminance-channel response; however, the luminance channel has many subchannels, while each chromatic channel has none. A properly designed weighting among the three channels may therefore improve the performance; this is something I need to clarify in further research.

4. Conclusion

In the preceding sections, I have tried to build computational models of the first- and second-order processing mechanisms in the HVS, drawing on the existing work of other researchers.

In section 2, I built computational models of the linear spatial filters (first-order) for both gray and color input images, and post-processing schemes were also proposed. From this we can see that the first-order system mainly deals with the luminance-defined edges of the input image, whether natural or not.

In section 3, I built a computational model for the task of texture (second-order) segmentation on the basis of the FRF model derived from the HVS. Simple K-means clustering is used, and the performance is satisfying except that the algorithm is not robust, due to the weaknesses of K-means. From this section we can see that the second-order system mainly deals with contrast- and texture-defined stimuli, whether natural or not.

Frankly speaking, the proposed computational models may not achieve performance as good as many methods built on other foundations (not based on the HVS). But considering how many mechanisms of human vision remain unknown, I believe this approach has a bright future, and that is where my further research should go.

5. References

[1] Aaron P. Johnson, Curtis L. Baker, Jr., "First- and second-order information in natural images: a filter-based approach to image statistics", J. Opt. Soc. Am. A, Vol. 21, No. 6, 2004

[2] Aaron P. Johnson, Frederick A. A. Kingdom, Curtis L. Baker, Jr., "Spatiochromatic statistics of natural scenes: first- and second-order information and their correlational structure", J. Opt. Soc. Am. A, Vol. 22, No. 10, 2005

[3] A. R. Rao, G. L. Lohse, "Identifying high level features of texture perception", CVGIP: Graphical Models and Image Processing, 55(3), pp. 218-233, 1993

[4] Curtis L. Baker, Jr., Isabelle Mareschal, "Processing of second-order stimuli in the visual cortex", Progress in Brain Research, Vol. 134, 2001

[5] Chin-Teng Lin, Chao-Hui Huang, Shi-An Chen, "CNN-Based Hybrid-Order Texture Segregation as Early Vision Processing and Its Implementation on CNN-UM", IEEE Transactions on Circuits and Systems, Vol. 54, No. 10, 2007

[6] Chin-Teng Lin, Tsung-Heng Tsai, "Biological-Inspired Model for Hybrid-Order Chromatic Texture Segregation", 2005 9th International Workshop on Cellular Neural Networks and Their Applications, IEEE, May 2005

[7] Khaled Hammouda, Ed Jernigan, "Texture Segmentation Using Gabor Filters", Course Project of SD775 at the University of Waterloo, Ontario, Canada, May 2003

[8] J. F. Canny, "A computational approach to edge detection", IEEE Trans. Pattern Anal. Machine Intell., 8(6):679-698, 1986

[9] John G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", J. Opt. Soc. Am. A, Vol. 2, No. 7, 1985

[10] Javier R. Movellan, "Tutorial on Gabor Filters"

[11] Konrad P. Kording, Christoph Kayser, Wolfgang Einhauser, Peter Konig, "How Are Complex Cell Properties Adapted to the Statistics of Natural Stimuli?", J. Neurophysiol., 91:206-212, 2004

[12] Michael S. Landy, "Visual Perception of Texture"

[13] Nong Sang, Li Xiao, Bin Sun, "Computational experiment of first order motion perception", Proceedings of the Asia-Pacific Workshop on Visual Information Processing, pp. 81-86, November 7-9, 2006, Beijing, China

[14] N. Petkov, P. Kruizinga, "Computational models of visual neurons specialised in the detection of periodic and aperiodic oriented visual stimuli: bar and grating cells", Biological Cybernetics, 76(2):83-96, 1997

[15] Rémy Allard, Jocelyn Faubert, "Double dissociation between first- and second-order processing", Vision Research, 47:1129-1141, 2007

[16] Qiling Tang, Nong Sang, "A contour detection model based on the perceptual mechanism of the primary visual cortex and its applications", PhD dissertation, Huazhong University of Science and Technology, 2007

