Automatic Histogram Clustering of Images on Spectral and Textural Features.



1. Purpose of the program

The program is intended for automatic clustering of multispectral and textural images. It was designed by V.S. Sidorova in the Laboratory of Image Processing of the Institute of Computational Mathematics and Mathematical Geophysics, Siberian Branch of the Russian Academy of Sciences.

2. Basic principles of the algorithm

The vector space is clustered with a new technique that uses estimates of classification quality, where quality is understood as the isolation of the resulting clusters. The clustering is based on Narendra's fast nonparametric algorithm, which partitions the vector space into unimodal clusters corresponding to the local maxima of the multivariate histogram. The algorithm makes no assumptions about the distribution functions of the individual clusters. It also requires no a priori data (number of clusters, number of iterations, etc.).

Maxima are found by a steepest-gradient search in the discrete vector space. The algorithm is fast because the vectors are represented as a linear list sorted in increasing order. The algorithm is described in detail on the laboratory website, where a demonstration version is also available, including an executable (exe) file and a three-channel test image of melting snow cover on the land surface acquired by a NOAA satellite. The Narendra algorithm requires the vector space to be quantized beforehand, but the number of quantization levels used to be assigned arbitrarily.
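A minimal Python sketch of the two ideas in this paragraph, assuming integer-quantized feature vectors: occupied histogram cells are kept in a list sorted in increasing (lexicographic) order and looked up by binary search, and every cell is linked to its highest-count neighbour until a local maximum (mode) is reached. The function names, the full 3^d neighbourhood and the tie rule are assumptions of this sketch, not details of the program's Visual C++ implementation.

```python
from bisect import bisect_left
from collections import Counter
from itertools import product


def build_histogram(cells):
    """Sparse multivariate histogram: occupied cells as a list sorted in
    increasing (lexicographic) order, plus the matching counts."""
    counts = Counter(cells)
    keys = sorted(counts)
    return keys, [counts[k] for k in keys]


def lookup(keys, values, cell):
    """Count of `cell`, found by binary search in the sorted list (0 if empty)."""
    i = bisect_left(keys, cell)
    return values[i] if i < len(keys) and keys[i] == cell else 0


def hill_climb_clusters(cells):
    """Label each quantized vector with the local histogram maximum reached by
    repeatedly moving to the highest-count neighbouring cell (steepest ascent).
    Returns (labels, modes); the neighbourhood choice is an assumption."""
    keys, values = build_histogram(cells)
    dim = len(keys[0])
    offsets = [d for d in product((-1, 0, 1), repeat=dim) if any(d)]
    parent = {}
    for cell, count in zip(keys, values):
        best, best_count = cell, count
        for d in offsets:
            nb = tuple(c + o for c, o in zip(cell, d))
            n = lookup(keys, values, nb)
            if n > best_count:  # strict '>': a flat plateau keeps separate modes
                best, best_count = nb, n
        parent[cell] = best

    def mode_of(cell):
        while parent[cell] != cell:  # counts strictly increase, so this ends
            cell = parent[cell]
        return cell

    modes = sorted({mode_of(c) for c in keys})
    index = {m: i for i, m in enumerate(modes)}
    return [index[mode_of(c)] for c in cells], modes
```

Each mode found this way defines one unimodal cluster, and neither a number of clusters nor an iteration count has to be supplied.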

The new algorithm automates the choice of the number of levels by estimating cluster isolation with the proposed measure. A series of cluster partitions is built for the vector sets produced with different numbers of quantization levels. The best partition is then chosen as the one that minimizes the proposed isolation measure for unimodal clusters. This approach reduces the number of clusters and yields the best partition for each level of data detail.
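The selection loop can be sketched as follows. The uniform per-axis quantization over the observed range, and the callbacks `cluster_fn` and `measure_fn`, are hypothetical stand-ins: any clustering routine returning one label per quantized vector, and any isolation measure for which smaller is better, can be plugged in.

```python
def quantize(vectors, levels):
    """Uniformly quantize each feature axis into `levels` bins over its
    observed range (an assumed quantization scheme)."""
    dim = len(vectors[0])
    lo = [min(v[i] for v in vectors) for i in range(dim)]
    hi = [max(v[i] for v in vectors) for i in range(dim)]
    cells = []
    for v in vectors:
        cell = []
        for x, a, b in zip(v, lo, hi):
            k = int((x - a) / (b - a) * levels) if b > a else 0
            cell.append(min(k, levels - 1))  # the maximum falls in the last bin
        cells.append(tuple(cell))
    return cells


def choose_levels(vectors, level_range, cluster_fn, measure_fn):
    """Cluster at every candidate number of levels and keep the partition with
    the smallest isolation measure. Returns (score, levels, labels)."""
    best = None
    for n in level_range:
        cells = quantize(vectors, n)
        labels = cluster_fn(cells)
        score = measure_fn(cells, labels)
        if best is None or score < best[0]:
            best = (score, n, labels)
    return best
```

The search interval and step map directly onto a Python `range`; for example, `range(61, 256, 3)` tries every third level count starting at 61 and not exceeding 255.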

The proposed measure of partition quality is based on the average ratios of multivariate histogram values on the cluster border to those in the neighborhood of its modal vector. It is shown that this measure correctly estimates cluster isolation for the high-density data typical in remote sensing. It serves as a quality (validity) measure for unsupervised classification, unlike traditional measures based on variances and distances between clusters, which can be used correctly only in supervised classification.
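The description above gives the idea but not the exact formula, so the following Python sketch uses one plausible reading, labelled as an assumption: for each cluster, the mean histogram count over its border cells is divided by the count at its modal cell, and the ratios are averaged over all clusters. A cluster that touches no occupied cell of another cluster is treated as perfectly isolated (ratio 0).

```python
from collections import Counter
from itertools import product


def isolation_measure(cells, labels):
    """Hypothetical isolation measure over quantized cells and their labels.

    Border cells are occupied cells adjacent to an occupied cell of another
    cluster. Smaller values mean deeper histogram valleys between clusters,
    i.e. better isolation."""
    hist = Counter(cells)
    cell_label = dict(zip(cells, labels))
    dim = len(cells[0])
    offsets = [d for d in product((-1, 0, 1), repeat=dim) if any(d)]
    mode_count = {}
    border_counts = {}
    for cell, label in cell_label.items():
        mode_count[label] = max(mode_count.get(label, 0), hist[cell])
        for d in offsets:
            nb = tuple(a + b for a, b in zip(cell, d))
            if cell_label.get(nb, label) != label:  # neighbour in another cluster
                border_counts.setdefault(label, []).append(hist[cell])
                break
    ratios = []
    for label, peak in mode_count.items():
        border = border_counts.get(label)
        ratios.append(sum(border) / len(border) / peak if border else 0.0)
    return sum(ratios) / len(ratios)
```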

3. Specifics of texture clustering

Unlike spectral features, textural features are spatial rather than pointwise, so their clustering has some specific aspects. The statistical textural features are computed for one channel in addition to the spectral ones; spectral-only or texture-only features can also be used as special cases. Textural features are computed in a neighborhood of each image point and complement the multivariate vector. The neighborhood is a square window of the same size for all points.
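As an illustration of pointwise versus window-based features, the sketch below computes two simple statistics (local mean and standard deviation) over a square window on one channel and appends them to the spectral values of every channel. Clipping the window at the image edges is an assumption of this sketch; the program may treat borders differently.

```python
import math


def window_stats(image, size):
    """Mean and standard deviation of one channel in a size x size window
    around every pixel; the window is clipped at the image edges (assumed)."""
    h, w = len(image), len(image[0])
    r = size // 2
    feats = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - r), min(h, y - r + size))
                    for i in range(max(0, x - r), min(w, x - r + size))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            row.append((m, s))
        feats.append(row)
    return feats


def feature_vectors(channels, texture_channel, size):
    """Concatenate the spectral values of all channels with the texture
    features of one chosen channel into one vector per pixel (row-major)."""
    tex = window_stats(channels[texture_channel], size)
    h, w = len(channels[0]), len(channels[0][0])
    return [tuple(ch[y][x] for ch in channels) + tex[y][x]
            for y in range(h) for x in range(w)]
```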

The window size is computed automatically. The size of the window used for gathering statistics strongly affects the clustering results. The best cluster partition is found for each window size, starting from a certain minimum size. It is expected that, from a certain size on, the feature values stabilize for all textures in the image and the number of clusters no longer changes. This is the size that is sought.
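The stopping rule can be sketched as a scan over increasing window sizes that stops once the cluster count has stopped changing. `count_clusters` is a hypothetical callback standing for a full run (texture features, then best partition) at one window size, and `patience`, the number of further sizes that must agree, is an assumed parameter.

```python
def choose_window_size(sizes, count_clusters, patience=2):
    """Return the first window size whose cluster count is repeated unchanged
    for `patience` further sizes; fall back to the last size tried."""
    history = []
    for s in sizes:
        history.append((s, count_clusters(s)))
        tail = history[-(patience + 1):]
        # A plateau: the last patience + 1 runs all produced the same count.
        if len(tail) == patience + 1 and len({c for _, c in tail}) == 1:
            return tail[0][0]
    return history[-1][0] if history else None
```

Evaluating the callback lazily avoids running the expensive clustering for sizes beyond the plateau.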

False clusters may appear on the borders between objects with different textures. A threshold method is used to detect these narrow clusters: for a true cluster, the fraction of its area made up of border points should be less than the threshold. The threshold is set as a fraction of the window size. A false cluster is merged with a cluster adjacent to it in the image, chosen by proximity in feature space. An algorithm for merging false clusters in chains is proposed: a chain of false clusters is followed until a true cluster is reached, and then the whole chain is assigned that cluster's number.
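Both steps can be sketched in Python. The share-of-border-points test follows the text; the threshold value is left as a parameter because its exact link to the window size is not spelled out here. For chain merging, `nearest` is an assumed input mapping each false cluster to the image-adjacent cluster closest to it in feature space. A loop made up only of false clusters is not covered by the description, so this sketch simply keeps the cluster where the loop closes as the target.

```python
def is_false_cluster(border_points, area, threshold):
    """A cluster is treated as false (narrow) when border points make up at
    least `threshold` of its area; a true cluster stays below the threshold."""
    return border_points / area >= threshold


def merge_false_clusters(is_false, nearest):
    """Map every cluster to the cluster whose number it finally receives.

    is_false: dict cluster -> bool, the result of the threshold test
    nearest:  dict false cluster -> image-adjacent cluster closest in feature
              space (an assumed, precomputed input)"""
    final = {}
    for start in is_false:
        chain, seen, c = [], set(), start
        # Follow false -> neighbour -> ... until a true or already-resolved
        # cluster, or until the walk revisits a cluster (a loop of false ones).
        while is_false[c] and c not in final and c not in seen:
            seen.add(c)
            chain.append(c)
            c = nearest[c]
        target = final.get(c, c)
        for f in chain:  # the whole chain receives the target's number
            final[f] = target
        final.setdefault(c, target)
    return final
```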

4. Principles of feature selection

Different texture models and statistics are chosen automatically on the basis of the quality measure for unsupervised (unlabeled) classification. Two systems of features are implemented:

1. Haralick statistics;

2. SAR models.
A two-dimensional noncausal random field model, the "simultaneous autoregressive" (SAR) model, is used with a given set of neighbor pixels. Its parameters are estimated by a maximum likelihood (ML) approach. The estimated model parameters, which characterize the dependence of a pixel's gray level on those of its neighboring pixels, and the overall noise variance are computed for each image point and used as texture features forming the multivariate vector.
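The sketch below fits a symmetric first-order SAR model (horizontal and vertical neighbour pairs) to the mean-removed gray levels of one window. Note the swap: it uses ordinary least squares, which is simpler than the maximum likelihood estimation described above, so its parameter values will generally differ from ML estimates. The neighbour set and the function name are assumptions.

```python
def sar_ls_features(window):
    """Least-squares fit of  y(s) = t1*(y(s-E) + y(s+E)) + t2*(y(s-N) + y(s+N)) + e(s)
    on the mean-removed gray levels of a window (at least 3x3), using interior
    pixels only. Returns (t1, t2, nu), nu being the residual (noise) variance.
    NOTE: least squares stands in here for the maximum likelihood estimate."""
    h, w = len(window), len(window[0])
    mean = sum(map(sum, window)) / (h * w)
    y = [[v - mean for v in row] for row in window]
    a11 = a12 = a22 = b1 = b2 = 0.0
    samples = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            q1 = y[r][c - 1] + y[r][c + 1]  # horizontal neighbour pair
            q2 = y[r - 1][c] + y[r + 1][c]  # vertical neighbour pair
            a11 += q1 * q1
            a12 += q1 * q2
            a22 += q2 * q2
            b1 += q1 * y[r][c]
            b2 += q2 * y[r][c]
            samples.append((q1, q2, y[r][c]))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:  # degenerate window (e.g. constant): no dependence
        return 0.0, 0.0, sum(v * v for _, _, v in samples) / len(samples)
    # Solve the 2x2 normal equations by Cramer's rule.
    t1 = (a22 * b1 - a12 * b2) / det
    t2 = (a11 * b2 - a12 * b1) / det
    nu = sum((v - t1 * q1 - t2 * q2) ** 2 for q1, q2, v in samples) / len(samples)
    return t1, t2, nu
```

Applied in a sliding window, the triple (t1, t2, nu) gives three texture features per pixel.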

The foreign literature has shown how texture features based on SAR models are used for supervised (labeled) classification of standard textures from the Brodatz album. Using these models for unsupervised classification of forest images is a new development that raises the level of automatic recognition compared with its counterparts worldwide.

The results of clustering forest landscape images are compared for the two systems of features. The image was classified with the automatic clustering algorithm for textures. The comparison of the two systems, one based on random field (SAR) models and the other on Haralick statistics, showed that the quality measure takes higher values (that is, poorer isolation) for the second system at all levels of detail. Comparing the cluster maps with the skeleton map built by forestry specialists from a ground-based forest inventory confirmed this conclusion. Features based on random field (SAR) models show greater discriminating power than those based on Haralick statistics. They made it possible to recognize the age phases of plantings of different forest types that are visually similar, with the accuracy of a forest inventory survey.

5. Illustrations


Fig. 1b shows an autumn aerial image of a forest landscape in Western Siberia at a scale of 1:50000. The digital version of the image is 1157x1178 pixels, with a resolution of 5x5 m per pixel.

Fig. 1. a) cluster map; b) autumn aerial image of the forest landscape; c) skeleton map of the forest inventory.

Fig. 1c shows the map of forest strata obtained by forestry specialists from a ground-based forest inventory. Each stratum corresponds to forest plantings of a certain type and age, but may include some percentage of other elements. Most of the forests in the image are coniferous. The hardest to distinguish by texture are cedar and pine plantings of certain phases. Fig. 1a shows the cluster map for clustering on two features: the SAR model parameter corresponding to the noise level, which reflects texture granularity, and the average tone. This particular pair, rather than the various combinations of Haralick statistics (MEAN, CONT, ASM and ENT), gave the lowest value of the quality measure over the range of the number of quantization levels N of the feature space from 61 to 255. The result is 41 clusters for N = 82 quantization levels, with a quality measure of 0.47. The forest corresponds to 30 clusters. The positions of the clusters on the cluster map (Fig. 1a) agree with the skeleton map built by forestry specialists from the ground-based forest inventory.
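For comparison, the four Haralick statistics named above can be computed from a normalized, symmetric gray-level co-occurrence matrix (GLCM). This is a minimal sketch: the single displacement, the natural logarithm in the entropy and the definition of MEAN as the GLCM row mean are assumptions, and the study's exact definitions may differ.

```python
import math
from collections import Counter


def glcm_features(window, offset=(0, 1)):
    """Haralick-style MEAN, CONT, ASM and ENT from a normalized symmetric
    co-occurrence matrix for one displacement (default: one pixel right).
    Gray levels are assumed to be already quantized to small integers."""
    dy, dx = offset
    h, w = len(window), len(window[0])
    pairs = Counter()
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                i, j = window[r][c], window[r2][c2]
                pairs[(i, j)] += 1
                pairs[(j, i)] += 1  # count both directions (symmetric GLCM)
    total = sum(pairs.values())
    p = {k: v / total for k, v in pairs.items()}
    return {
        "MEAN": sum(i * v for (i, _), v in p.items()),
        "CONT": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "ASM": sum(v * v for v in p.values()),
        "ENT": -sum(v * math.log(v) for v in p.values()),
    }
```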


Fig. 2. a) Aerial image of the forest landscape; b) skeleton map of the forest inventory; c) cluster map.

Fig. 2a shows a black-and-white aerial photograph of a deciduous forest landscape in Western Siberia. The resolution of the digital image is 5x5 m per pixel, and its size is 1178x1157 pixels. The skeleton map of the forest strata obtained from the forest inventory is shown in Fig. 2b. It presents the strata of aspen and poplar forests in several phases of development. Birch forest occupies a small area at the upper left. A significant area is covered by motley grass, which also has a texture.

The feature space is clustered by the algorithm for textured images using three features: a SAR model parameter, the average tone and the standard deviation. These features make it possible to distinguish among the aspen plantings, the poplar plantings and the motley grass. Clustering yields a window size of 18x18 and the best classifications; the minimum of the isolation measure, 0.38, is obtained at n = 61 over the range 48 < n < 255. For n < 40, the grass cannot be distinguished from some phases of the aspen. The algorithm for merging narrow clusters was then applied. It automatically selected the 100 largest clusters whose segments are most compact in the image plane and merged the remaining ones into them. Fig. 2c shows the final cluster map. Of the 100 clusters, 44 correspond to the forest.

The poplar forest occupies four large strata: 38, 45, 97 and 68. They differ in age and composition. After clustering on three features, each of these strata is represented only by its own clusters. Most of the aspen stands present are communities of large 150-year-old trees with 40-year-old regrowth, forming two tiers. They were assigned to the largest cluster already at the first stage of clustering. The two largest strata, 47 and 96, correspond to communities of aspen regrowth aged 75 and 30 years; they fall into the same cluster. Young aspen (10-30 years), whose appearance in the image changes quickly with age, is represented by several phases, which fell into different clusters. The 150-year-old birch forests differ only slightly in appearance from aspen of the same age, yet they too were isolated into their own clusters.

Analysis of the results has shown that these maps agree with the map obtained from the forest inventory. Automatic recognition of plantings in the image is inferior to the forest inventory results for landscapes of deciduous and coniferous forests of different ages.

6. Program control.

The parameters that set the program's operating mode are described in detail in the Help menu of the user window. The interval and step of the number of quantization levels over which the best partitions are searched can be specified. The cluster map and tables are displayed for the best classifications or, in another mode, for all selected numbers of quantization levels.

The general characteristics of the partition found (the value of the distinguishability measure, the number of quantization levels and the number of clusters) and the characteristics of each cluster (modes, areas, probability density, boundary points in the spectral channels) are recorded in tables and can be displayed. The cluster map is stored as a 256-color BMP file. The algorithm is implemented in Microsoft Visual C++ 5.0, an object-oriented programming environment, using the MFC class library for Windows. The software uses the multiple-document interface.
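A 256-color BMP is an uncompressed 8-bit indexed image: a 14-byte file header, a 40-byte BITMAPINFOHEADER, a 256-entry palette stored as blue, green, red and a reserved byte, then pixel rows stored bottom-up and padded to multiples of 4 bytes. The Python sketch below writes a cluster map in that layout, with one palette index per cluster number; the function name and the 2835 pixels-per-metre resolution are assumptions.

```python
import struct


def write_cluster_bmp(labels, palette):
    """Encode a 2-D map of cluster numbers (0..255) as an uncompressed 8-bit
    (256-color) BMP and return the file bytes.
    palette: up to 256 (r, g, b) tuples; missing entries are filled with black."""
    h, w = len(labels), len(labels[0])
    row_size = (w + 3) // 4 * 4  # each stored row is padded to 4 bytes
    image_size = row_size * h
    offset = 14 + 40 + 256 * 4  # headers plus palette precede the pixels
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, w, h, 1, 8, 0, image_size,
                              2835, 2835, 256, 0)  # 2835 px/m is about 72 dpi
    colours = (list(palette) + [(0, 0, 0)] * 256)[:256]
    table = b"".join(struct.pack("<BBBB", b, g, r, 0) for r, g, b in colours)
    # Positive height means bottom-up storage, so the last image row comes first.
    rows = [bytes(row) + b"\x00" * (row_size - w) for row in reversed(labels)]
    return file_header + info_header + table + b"".join(rows)
```

For example, `open("clusters.bmp", "wb").write(write_cluster_bmp(label_rows, colours))` saves a map; the file name, `label_rows` and `colours` are placeholders.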

Immediately after the program starts, the main window contains the toolbar and the status line. The program is controlled by menu commands. When the cursor moves over a menu command or a toolbar button, a short description of the associated function is displayed in the left part of the status line.

The menu of the main window consists of the commands File, View and Help. Each of these commands, in turn, opens a menu with the following commands:
File: New... - create a new empty child window,
Open... - read an existing image file or cluster-table file,
View: ToolBar - show or hide the toolbar,
StatusBar - show or hide the status line,
Help: About... - information about the program,
Using... - detailed description of the program, its input parameters and output results.
The main window displays the prompt: Open the file of image. The File menu is used to choose the file to be processed. When the File/Open command is selected, a child window appears in the main window. When the child window is created, the main window menu is replaced by default with the child window menu, consisting of the commands File, Clust, Equal, View, Window and Help. Each of these menus contains a submenu.
The File menu contains the same commands as the main window menu, supplemented by the commands
File: Close - close the file,
Save - save the file,
Save as - save the file under another name,
Print - print,
PrintPreview - print preview,
PrintSetup - set the print parameters.
This group of standard print commands allows an image or a table to be printed. Note that PrintPreview makes it possible to see the whole image.
Equal - equalizes the image of the given channel.
Clust: ClustGist+Text - starts the program, which computes the selected textural features of the image (if any are used); computes the window size for gathering statistics; forms the combined (spectral + texture) vectors; builds the multivariate histogram of the vector space; classifies the vectors for different numbers of quantization levels of the vector space; finds several best classifications according to the quality measure; and merges false clusters with nearby ones. For each selected or obtained level (depending on the given mode), the algorithm builds a cluster map and shows it in a new window.
A table of the features of the resulting clusters is also shown automatically in a new window. Left-clicking a position on the cluster map inverts the color of the corresponding cluster and displays its number; using this number, the cluster's features can be found in the table.
ClustTable - an additional option to show the current table.
ClustMap - an additional option to show the current cluster map.
RestoreMap - restores the original colors of the clusters.
Window: New... - create a new window showing the current (active) document,
Cascade - arrange the windows in a cascade,
Tile - tile the windows horizontally (one below another),
ArrangeIcons - arrange the minimized table and image windows in the lower left corner of the main window.
The content of the Help menu remains unchanged.
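The Equal command is described only as equalizing one channel. A common way to do this is classic histogram equalization through the cumulative histogram, sketched below for an 8-bit channel stored as a list of rows; whether the program uses exactly this variant is an assumption.

```python
def equalize(channel, levels=256):
    """Histogram equalization of one channel (2-D list of ints in 0..levels-1):
    each gray level is remapped through the scaled cumulative histogram so the
    output levels spread as evenly as possible over 0..levels-1."""
    flat = [v for row in channel for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # cumulative count at the darkest level
    n = len(flat)
    if n == cdf_min:  # only one gray level present: nothing to spread
        return [row[:] for row in channel]
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0
           for c in cdf]
    return [[lut[v] for v in row] for row in channel]
```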