Hierarchical Clustering Algorithm for Analyzing Areas Contaminated by Industrial Waste Using Satellite Remote Sensing Data



We consider the application of the divisive hierarchical histogram algorithm proposed by V. S. Sidorova to mapping polluted areas by their spectral features.

If the spectral remote sensing data for contaminated areas of the Earth's surface form clusters in some part of the spectral feature space, then cluster analysis can be applied. A global segmentation of the image based on the clustering results makes it possible to compare the resulting cluster map visually with known ground observations. If some of the objects in a cluster are known, cluster analysis will find other objects of the same kind and delineate the known object more accurately; one can also vary the colors assigned to the clusters to see them more clearly. In addition, cluster analysis determines the spectral characteristics of the clusters, such as the densest (modal) vector of a cluster, the range of variation in each spectral channel, the cluster area, and others. Clustering is all the more relevant because multispectral remote sensing data are very large in volume.

Large volumes of remote sensing data are usually clustered in one of two ways: by K-centers methods (which require the number of clusters K and the positions of their centers to be known in advance) or by histogram methods. Here we consider a histogram algorithm. Unlike K-centers methods, the histogram (a function of the multidimensional vector of spectral features) directly shows where the data are concentrated. The most popular histogram algorithm is that of Narendra [1]. It treats the multidimensional histogram as an approximation of the probability density of the feature vectors. The algorithm is nonparametric: it requires no prior assumptions about either the form of the distribution or the number of clusters. It is fast and solves all the clustering tasks at once: it finds the local maxima of the histogram and the corresponding modal vectors, and it partitions the vector space into unimodal clusters, drawing the boundaries between clusters along the valleys of the histogram. An advantage of the algorithm is that it stores only the vectors that actually occur in the data (the non-empty histogram cells) in a list, rather than the whole multidimensional space. In addition, this list is ordered in a special way, which makes the algorithm fast.
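
The histogram approach described above can be illustrated with a minimal sketch. It is not the algorithm of [1] itself, only a simplified illustration of the same idea: the histogram is kept as a sparse table of non-empty cells, each cell is linked to its densest neighbouring cell, and the cells that have no denser neighbour are taken as the modes of unimodal clusters. The function name histogram_clusters, the parameter cell_size, and the rule for breaking ties are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def histogram_clusters(vectors, cell_size):
    """Label each vector with the modal cell reached by hill climbing.

    Hypothetical helper: a simplified histogram mode-seeking sketch,
    not the exact procedure of reference [1].
    """
    # Quantize each vector to an integer cell index.
    # Only cells that actually occur are stored (sparse histogram).
    cells = [tuple(int(x // cell_size) for x in v) for v in vectors]
    hist = Counter(cells)
    dim = len(cells[0])
    # Offsets to all neighbouring cells (excluding the cell itself).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Link every cell to its densest neighbour if that neighbour is denser.
    # Ties are broken by cell index (an assumption), so links cannot cycle.
    parent = {}
    for cell, count in hist.items():
        best = (count, cell)
        for off in offsets:
            nb = tuple(c + o for c, o in zip(cell, off))
            if nb in hist and (hist[nb], nb) > best:
                best = (hist[nb], nb)
        parent[cell] = best[1]

    def mode_of(cell):
        # Follow the links uphill until a local maximum is reached.
        while parent[cell] != cell:
            cell = parent[cell]
        return cell

    return [mode_of(c) for c in cells]


# Two well-separated groups of 2-D feature vectors (made-up values).
sample = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.2), (1.1, 0.1),
          (5.1, 5.0), (5.2, 5.1), (5.15, 5.3), (6.2, 5.1)]
labels = histogram_clusters(sample, cell_size=1.0)
```

Because each cell points uphill to a denser neighbour, the boundaries between the resulting clusters fall in the low-density cells between modes, which corresponds to drawing the cluster boundaries along the valleys of the histogram.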

In general, a clustering algorithm has no built-in mechanism for evaluating the separability of clusters. However, the reliability and quality of a cluster distribution are determined precisely by the separability of its clusters [2]. That is, after clustering, additional work is needed to assess separability and to merge poorly separated clusters. But such merging destroys the unimodality of the clusters. Therefore, the author proposed the following approach [3]. Cluster distributions are built for different levels of detail (granularity) of the feature space, and the average separability of their clusters is compared. In this way the resulting clusters remain unimodal, albeit in a new system of averaged vectors. The method of averaging vectors and the measures used to evaluate cluster separability are given in [3]. Refining (or averaging) the vectors is reduced to gradually increasing the number of quantization levels of the vector feature space [3]. Further development of the algorithm involved a differentiated approach to different regions of the data. Mathematically, this led to the development of a divisive hierarchical histogram algorithm. It was observed that different data require different levels of detail to obtain the best cluster separability. To study the interplay between detail and separability, and to examine the complex hierarchical structure of remote sensing data more thoroughly, it was proposed to carry out hierarchical clustering in such a way that the separability of the clusters is no worse than a given value d, and thus to find the most detailed representation of the data, which differs from one region of the data to another [4,5].

Another important aspect is the choice of spectral channels, especially when there are many of them. The eigenvalues of the covariance matrix of the data vectors characterize the spread of the vectors along the direction of the corresponding eigenvector (for normally distributed vectors). If the quantization cell is chosen to be a hypercube (this cell shape gives the smallest loss of information in quantization), the numbers of quantization levels along the respective dimensions of the eigenspace must be proportional to the eigenvalues. If the maximum number of quantization levels is known for one dimension, the absolute numbers for the other dimensions can be calculated. The clustering algorithm considered here finds this number for a given minimum separability of each cluster. It is therefore possible to exclude some dimensions (those for which the number of quantization levels is less than two) and thereby reduce the dimension of the eigenspace of spectral features [6].
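
The dimension-reduction rule in the preceding paragraph amounts to simple arithmetic, sketched below. The sketch assumes that the eigenvalues have already been computed and that the largest eigenvalue corresponds to the dimension with the known maximum number of quantization levels. The function name quantization_levels and the numerical values are hypothetical and are not taken from [6].

```python
def quantization_levels(eigenvalues, max_levels):
    """Scale the numbers of quantization levels in proportion to eigenvalues.

    Hypothetical helper: the dimension with the largest eigenvalue is
    assumed to receive max_levels; the others are scaled proportionally.
    Returns the (possibly fractional) levels and the indices of the
    dimensions that keep at least two levels.
    """
    largest = max(eigenvalues)
    levels = [max_levels * ev / largest for ev in eigenvalues]
    # Dimensions with fewer than two levels cannot split the data
    # and are excluded from the eigenspace.
    kept = [i for i, n in enumerate(levels) if n >= 2]
    return levels, kept


# Made-up eigenvalues for a four-dimensional eigenspace.
levels, kept = quantization_levels([40.0, 10.0, 4.0, 1.0], max_levels=20)
```

In this made-up example the fourth dimension would receive fewer than two quantization levels, so only the first three dimensions are kept.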

An image of the Omsk region in seven spectral channels from the Landsat-8 satellite (15 m resolution, 02/08/2014) was kindly provided by the Siberian Center of FSUE SRC "PLANET" (Figure 1). The new clustering algorithm first reduces the dimension of the vector space of spectral features from seven to three. These three eigenspace components are linear combinations of the original channels (visible and infrared). It was shown that they are sufficient for the required level of clustering detail. The level of detail, which differs between the resulting clusters, is determined by the divisive hierarchical histogram algorithm for a limiting cluster separability of d = 0.15 (0

Fig. 1. Image of the Omsk region in seven spectral channels from the Landsat-8 satellite.
The first three channels are in the visible range (RGB), the rest are in the infrared.
Fig. 2. The first stage of the hierarchy. Six clusters were obtained. Two of them (red and black) correspond to fumes from the CHP plants of the Omsk region.
Fig. 3. Clustering after ten stages of the hierarchy. 27 unimodal clusters were obtained. The different cluster colors on snow in the Omsk region (pink, yellow, dark gray)
are interpreted by experts as areas with different degrees of snow contamination. Red and purple tones correspond to fumes.