Last Update : 12 April 2013 17:25

Русская версия

Image Processing Laboratory

Instute of Computational Mathematics and Mathematical Geophysics SB RAS



Hierarchical Cluster Algorithm for Аautomation of Remote Sensing Data Recognition

 V.S. Sidorova


 The histogram cluster algorithm, which creates hierarchy of distributions with clusters separated best, is proposed. The new hierarchical algorithm optimizes average cluster separability at the choice of the different quantizing nets system for subareas of feature vector space. It allows us to study the structure of the data and get a small number of well-separated clusters. Application to unsupervised classification of land cover data is illustrated.

The algorithm is implemented in software environment of system of object-oriented programming Visual C++ versions 5.0 company Microsoft with library of classes MFC developed for OS Windows. In software development a multiple documents mode was used.

Considered remote sensing date has large volume, high density and significant correlation for different types of land cover. The histogram approximates the probability density of this data. The fast nonparametric Narendra technique separates multidimensional vector space of the features into the unimodal clusters. Modal vectors correspond to local maxima of histogram, clusters boundaries correspond to valleys that are areas of law density of vector space. Description of Narendra technique [1] and demonstration version of its using for clustering satellite image are brought on site http://loi.sscc.en/lab/Weblab/LeraKlas/DEMRU/DemonEn.htm.

The Narendra algorithm quantizes discrete vector space preliminary for reduction of data volume and number of clusters. The automatic choice of number of quantization levels which corresponds to best cluster separability was proposed in [2].  The quality measure for each individual unimodal cluster and the quality measure of the overall distribution is calculated as an average of over all K clusters was defined in [2] also. Measure of cluster separability allows cluster validity for unimodal clusters disposed closely. The nature of investigated types of land cover is that spectral features are combined into clusters closely verging in repressing majority data.

But data can be heterogeneous and some areas of feature space may be of different cluster structure and separability. New hierarchical algorithm proposed makes the automatic choice of different numbers of quantization levels for various areas of a given vector space depending on cluster separability in them. It uses the minimization of the average separability for all obtained clusters. Hierarchical technique first finds number of quantization levels at which unimodal clusters are separated best. Then, it finds new own number of quantization levels, greater than beside parent, within each cluster obtained, and own best cluster distribution, and so on.

Due separability measure of one cluster does not depend on other, the measure of separability for K sub-clusters is determined as the average separability of these sub-clusters for the hierarchical algorithm too. The goal of the proposed hierarchical algorithm is a choice of such general distribution, that the separability of the aggregate of all obtained sub-clusters is the best, when the measure of separability of all sub-clusters has reached the minimum.

 Example 1

Consider the fragment of an image the Earth's surface, obtained by the satellite NOAA 24 April 2003. (Figure.1). It is taken two channels for illustration: the near-infrared R and the blue of the visible spectrum B. Occupy the upper part of the image is melting snow, at the bottom – the surface without snow almost. Figure 2 shows classification of melting snow, made-valued Russian Hydromet  according to ground measurements in the mentioned square. Figure 3 shows the cluster map obtained with proposed  hierarchical algorithm. The best classification corresponds to the minimum measures 0.07, the number of clusters K = 13. This minimum is obtained for the fourth stage of the hierarchy. The number of quantization levels for the snow-covered areas has reached n = 23, and for thawing n = 48. Clusters correspond to snow are in a good agreement with the classification scheme of Figure 2. The bottom part of the image (melted) agrees with four clusters. Cluster 13 is the coniferous forest. The boundaries of cluster 11 are agreeing closely with borders of Kazakh hills due to elevation of the area apparently.

For comparison, to reach the same detail of classification for melted part (n = 48), the basic algorithm (without hierarchy) have got the number of clusters K=55 and value measure of separability 0,43


Example 2

 In Fig. 5a there is the black-and-white aerial photograph of deciduous forest landscape of Western Siberia. The resolution of electronic version of the image is 5×5 sq.m/pixel.  And the size is 1178×1157 pixels.  The skeleton map of the forest blocks obtained due to ground afforestation inspection is shown in Fig. 5b. The forest blocks of osiers and poplar forests in several phases of evolution are presented. Birch forest occupies the small area at upper left. Three features were used for classification: ρ of the SAR texture model, gray value’s expectation T and standard deviation D. In Fig. 5c the final map of clusters is shown.

For the first stage of the hierarchy the minimum of the cluster separability measure equal 0, 14 for n = 16, the number of clusters K =7.  In Fig. 6a the map of these clusters is shown. The objects, referring to water surfaces, meadows and songs are in the different clusters. Nearly all forest objects have fallen into one big cluster 1. In Fig. 6b the map for the second stage of the hierarchy is shown. The number of sub-clusters is 16.

As a result of clustering is received 55 clusters, 24 of them pertain to the forest. The cluster separability measure equal 0.28. Maximum number of quantization levels is 64. Analysis of results has shown the maps of automatic recognition correspond to one obtained with use of the forest assessment.

When clusterization was carried without hierarchical method, then minimum of the measure reached 0.38 under n=61. Number of clusters was 365 herewith moreover majority of them were bad separated fine false clusters, appearing on borders of textures on image.



1. Narendra P.M. and Goldberg M.  A non-parametric clustering scheme for  LANDSAT. //   Pattern Recognition. 1977.  9. P. 207.

2.Сидорова В.С. Оценка качества классификации многоспектральных изображений гистограммным методом.//Автометрия. 2007.Том 43, №1, С. 37– 43.

3. V.S. Sidorova. Unsupervised Classification of Forest’s Image by Texture Model Features. // Pattern Recognition and Image Analysis.­ 2009.Vol. 19, No.4. pp. 698 – 703


©2013 Institute of Computational Mathematics and Mathematical Geophysics SB RAS