Satellite image classification and segmentation using non-additive entropy

Lucas Assirati,1 Alexandre Souto Martinez,2 and Odemir Martinez Bruno1

1) Scientific Computing Group, São Carlos Institute of Physics, University of São Paulo (USP), cx 369, 13560-970, São Carlos, São Paulo, Brazil. http://www.scg.ifsc.usp.br. email: [email protected], [email protected]
2) Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo (USP), Avenida Bandeirantes, 3900, 14040-901, Ribeirão Preto, SP, Brazil. Instituto Nacional de Ciência e Tecnologia de Sistemas Complexos. email: [email protected]

Here we compare the Boltzmann-Gibbs-Shannon (standard) entropy with the Tsallis entropy on the pattern recognition and segmentation of coloured satellite images obtained via Google Earth. By segmentation we mean splitting an image to locate regions of interest. Here, we discriminate and partition an image into classes according to a training basis. This training basis consists of three pattern classes: aquatic, urban and vegetation regions. Our numerical experiments demonstrate that the Tsallis entropy, used as a feature vector composed of distinct entropic indexes q, outperforms the standard entropy. There are several applications of the proposed methodology, since satellite images can be used to monitor migration from rural to urban regions, agricultural activities, oil spreading on the ocean etc.

Keywords: Entropy, Segmentation, Satellite images

I. INTRODUCTION

Image pattern recognition is a common issue in medicine, biology, geography etc.; in short, in domains that produce huge amounts of data in image format. Entropy, in its origins, is interpreted as a disorder measure. Nowadays, however, it is interpreted as the lack of information. Thus, it has been used as a methodology to measure the information content of a signal or an image. In image analysis, the greater the entropy is, the more irregular and patternless a given image is. The additive property of the standard entropy allows its use in several situations just by summing up image characteristics. Among the non-additive entropies, we study the Tsallis entropy, which has been proposed to extend the scope of application of classical statistical physics. Here, we compare the additive Boltzmann-Gibbs-Shannon (standard)^1 and the non-additive Tsallis^2 entropies when dealing with colored satellite images.

We start by defining the standard entropy for black and white images and simply extend its use to coloured images, justified by its additive property. Next, we consider the Tsallis entropy for black and white images and extend it to coloured images. Due to its non-additiveness, we call attention to some characteristics that help to qualify these images more efficiently than the standard entropy.

II. NON-ADDITIVE ENTROPY

Firstly, consider a black and white image with Lx × Ly pixels. The integers i ∈ [1, Lx] and j ∈ [1, Ly] run along the x̂ and ŷ directions, respectively. Let the integer p̃_{i,j} ∈ [0, 255] represent the gray-level intensity of pixel (i, j). The histogram p̃(x) of a gray-level image is obtained by counting the number of pixels with a given intensity p̃_{i,j}. Figure 1 shows a grayscale image and the histogram p̃(x) produced from it.

Figure 1. Image in gray levels and its histogram p̃(x).

To properly use the entropic indexes, one must consider normalized quantities: p(x) = p̃(x)/(Lx × Ly), so that the normalization condition Σ_{x=0}^{255} p(x) = 1 is satisfied. The standard entropy of this image is H = −Σ_{x=0}^{255} p(x) ln p(x) = Σ_{x=0}^{255} p(x) ln[1/p(x)].

Images with low detail produce sparsely filled histograms and generate low entropy values, while images with high detail produce well-filled histograms, generating high entropy values. Figure 2 illustrates the comparison.

Figure 2. Comparison between an image with low entropy (H = 0.078) and one with high entropy (H = 6.487).

For colored images, a given pixel has three components: red (k = 1), green (k = 2) and blue (k = 3), and the integer intensity of each of these colours is written as p̃_{i,j,k} ∈ [0, 255], with k = 1, 2, 3. This leads to a different histogram for each color, p_k(x), and hence a different entropy for each color, H_k, with k = 1, 2, 3. For two images A and B, for a given color, the entropy of the composed image is the entropy of one image plus that of the other, H_k(A + B) = H_k(A) + H_k(B). This is the additivity property of the standard entropy, which leads to:

   H_k = Σ_{x=0}^{255} p_k(x) ln[1/p_k(x)],   k = 1, 2, 3.   (1)

Secondly, consider the black and white image mentioned before.
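For concreteness, the standard entropy above can be computed directly from the normalized 256-bin histogram of an 8-bit image. The following sketch is our own illustration (not code from the paper), using NumPy; the function name is ours:

```python
import numpy as np

def shannon_entropy(image):
    """Standard (BGS) entropy of an 8-bit grayscale image.

    Builds the 256-bin histogram p~(x), normalizes it by the pixel
    count Lx * Ly, and sums p(x) ln[1/p(x)] over non-empty bins.
    """
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / image.size          # p(x) = p~(x) / (Lx * Ly)
    p = p[p > 0]                   # empty bins contribute nothing
    return float(np.sum(p * np.log(1.0 / p)))

# A constant image fills a single bin and has zero entropy; an image
# using every intensity equally often attains the maximum ln(256).
flat = np.zeros((16, 16), dtype=np.uint8)
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)
print(shannon_entropy(flat))   # 0.0
print(shannon_entropy(ramp))   # ln(256) ~ 5.545
```

The two extreme test images mirror the low-detail versus high-detail comparison of Figure 2.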
The Tsallis entropy generalizes the standard entropy^3: S_q = Σ_{x=0}^{255} p(x) ln_q[1/p(x)], where the generalized logarithmic function is ln_q(x) = (x^{1−q} − 1)/(1 − q), so that, as q → 1, one retrieves the standard logarithm and, consequently, the standard entropy. To build a feature vector, one simply uses n different entropic values: S_bw = (S_{bw,q1}, S_{bw,q2}, ..., S_{bw,qn}); for n = 1 and q1 = 1, one retrieves the standard entropy image qualifier. Notice the richness introduced by this qualifier: already for n = 1 there is an infinite range of entropic indexes to address, and this richness is amplified for n > 1, considering instances of q < 1, q = 1 and q > 1.^4

Since ln_q(x1 x2) = ln_q(x1) + ln_q(x2) + (1 − q) ln_q(x1) ln_q(x2), see Ref. 5, S_{bw,q} is non-additive, leading to interesting results when composing two images A and B. The entropy of the composed image is S_{bw,q}(A + B) = S_{bw,q}(A) + S_{bw,q}(B) + (1 − q) S_{bw,q}(A) S_{bw,q}(B), which, for q ≠ 1, is not simply the summation of the two entropic values. This property leads to different entropic values depending on how one partitions a given image: the final image entropy is not simply the summation of the entropies of all its partitions, but depends on the sizes of these partitions.
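These identities are easy to check numerically. The sketch below (our own illustration, not code from the paper) implements the generalized logarithm and the Tsallis entropy of a histogram, then verifies both the q → 1 limit and the pseudo-additivity rule:

```python
import numpy as np

def ln_q(x, q):
    """Generalized logarithm ln_q(x) = (x**(1-q) - 1) / (1 - q); plain ln for q = 1."""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy(p, q):
    """S_q = sum_x p(x) ln_q[1/p(x)] over non-empty histogram bins."""
    p = p[p > 0]
    return float(np.sum(p * ln_q(1.0 / p, q)))

p = np.array([0.5, 0.25, 0.125, 0.125])

# q -> 1 recovers the standard entropy H = -sum p ln p.
H = -float(np.sum(p * np.log(p)))
print(abs(tsallis_entropy(p, 1.000001) - H) < 1e-4)   # True

# Pseudo-additivity: ln_q(xy) = ln_q(x) + ln_q(y) + (1 - q) ln_q(x) ln_q(y).
q, x, y = 0.5, 3.0, 7.0
lhs = ln_q(x * y, q)
rhs = ln_q(x, q) + ln_q(y, q) + (1 - q) * ln_q(x, q) * ln_q(y, q)
print(abs(lhs - rhs) < 1e-12)                          # True
```

The same cross term (1 − q) ln_q(x) ln_q(y) is what makes S_{bw,q}(A + B) depend on how the image is partitioned.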
For colored images, we proceed as before: we calculate the entropy of each color component, in principle with different entropic index values q(r), q(g) and q(b). For the sake of simplicity, we consider the same entropic index for all color components. For color k the entropy is:

   S_q(k) = Σ_{x=0}^{255} p_k(x) ln_q[1/p_k(x)],   k = 1, 2, 3,   (2)

which retrieves Eq. (1) for q = 1.

III. METHODOLOGY

Considering pattern recognition in images, the main objective is to classify a given sample according to a set of classes from a database. In supervised learning, the classes are predetermined. These classes can be conceived of as a finite set, previously arrived at by a human. In practice, a certain segment of data is labelled with these classifications. The classifier's task is to search for patterns and classify a sample as one of the database classes.

To perform this classification, classifiers usually use a feature vector that comes from a method of data extraction. Here, we use the multi-q analysis method^{6,7}, which composes a feature vector using certain q-entropy values: S_q = (S_{q1}(1); S_{q1}(2); S_{q1}(3); ...; S_{qn}(1); S_{qn}(2); S_{qn}(3)). The reason to use the multi-q analysis is that a feature vector gives us more and richer information than a single value of entropy. The correct choice of q indexes emphasizes characteristics and provides better classifications.

The following steps describe image treatment, training and validation:

• using the Google Earth software, capture images from several locations (Figure 3);
• segment each image into 16 × 16 pixel partitions;
• for each partition, write the Red, Green and Blue colors in a tridimensional array;
• for each array and for each color, build the histograms and calculate the Tsallis entropies (Eq. 2) for q ∈ [0, 2] in steps of 1/10;
• create the feature vector and apply the k-nearest neighbors (KNN), Support Vector Machine (SVM) and Best-First Decision Tree (BFTree) classifiers;
• deliver an output image with the segmented partitions highlighted (aquatic region = yellow, urban region = cyan, vegetation region = magenta), according to the classification of the KNN classifier.

Figure 3. Images obtained from Google Earth, from different regions. (a) Urban, (b) Aquatic, (c) Vegetation.

Figure 4 outlines the steps of the methodology presented.

Figure 4. Schematic drawing of the methodology.

Table I presents the hit-rate percentage of each classifier, evaluated for the three methods: multi-q analysis, multi-q analysis with attribute selection, and standard entropy analysis. Since the use of a feature vector gives us more information than a single entropy value, it also gives some redundant information. In this context, feature selection is important to eliminate those redundancies.

Table I. Several classifiers are used (SVM, KNN and BFTree) to compare the performance of the generalised entropy with respect to the standard one in pattern recognition. The number of features of each method is indicated in parentheses; * denotes attribute selection.

Classifier            Multi-q (60)   Multi-q* (8)   BGS (3)
SVM                   69.60 %        68.96 %        65.60 %
KNN (1 neighbour)     70.80 %        69.76 %        63.04 %
KNN (3 neighbours)    72.32 %        72.96 %        64.00 %
KNN (5 neighbours)    72.80 %        73.28 %        64.16 %
KNN (7 neighbours)    74.88 %        72.96 %        68.48 %
BFTree                72.16 %        72.80 %        67.36 %

Figure 5. Segmentation obtained by the multi-q method and highlights provided by the KNN classifier: (a) original image, (b) KNN with 1 neighbour, (c) KNN with 3 neighbours.
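The steps above can be sketched in code. The following is our own minimal illustration, not the authors' implementation: NumPy only, with a hand-rolled nearest-neighbour vote and synthetic patches standing in for the labelled Google Earth partitions; all names and the class setup are assumptions for the sake of the example.

```python
import numpy as np

def ln_q(x, q):
    """Generalized logarithm; reduces to ln for q = 1."""
    return np.log(x) if q == 1.0 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis(p, q):
    """S_q = sum_x p(x) ln_q[1/p(x)] over non-empty bins."""
    p = p[p > 0]
    return float(np.sum(p * ln_q(1.0 / p, q)))

# 20 values of q in (0, 2], step 1/10; times 3 colors -> 60 features,
# matching the "Multi-q (60)" column of Table I (our reading of the grid).
Q_GRID = tuple(round(0.1 * i, 1) for i in range(1, 21))

def multi_q_features(patch, qs=Q_GRID):
    """Feature vector of one 16 x 16 x 3 RGB partition: S_q(k) per q and color k."""
    feats = []
    for k in range(3):
        hist, _ = np.histogram(patch[..., k], bins=256, range=(0, 256))
        feats.extend(tsallis(hist / patch[..., k].size, q) for q in qs)
    return np.array(feats)

def knn_predict(train_X, train_y, x, n_neighbors=1):
    """Plain k-nearest-neighbours: majority vote among the closest training vectors."""
    idx = np.argsort(np.linalg.norm(train_X - x, axis=1))[:n_neighbors]
    labels, counts = np.unique(train_y[idx], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-ins for labelled partitions: dark, narrow-range "aquatic"
# patches versus full-range "urban" patches.
rng = np.random.default_rng(0)
aquatic = rng.integers(0, 40, (10, 16, 16, 3))
urban = rng.integers(0, 256, (10, 16, 16, 3))
X = np.array([multi_q_features(p) for p in np.concatenate([aquatic, urban])])
y = np.array([0] * 10 + [1] * 10)

test = rng.integers(0, 40, (16, 16, 3))        # unseen dark patch
print(knn_predict(X, y, multi_q_features(test), n_neighbors=3))
```

Because the two synthetic classes differ strongly in histogram spread, their multi-q vectors separate cleanly and the dark test patch is voted into the aquatic class.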
In the output, yellow indicates an aquatic region, cyan an urban region and magenta a vegetation region. Figure 5 depicts the image highlights produced by the KNN method, evaluated in a region that contains the three types of pattern classes: aquatic, urban and vegetation regions.

IV. CONCLUSION

Our study indicates that the Tsallis non-additive entropy can be successfully used in the construction of a feature vector for coloured satellite images. This entropy generalizes the Boltzmann-Gibbs one, which is retrieved with q = 1. For q ≠ 1, the image retrieval success is better than in the standard case (q = 1), since the entropic parameter q allows a more thorough image exploration.

ACKNOWLEDGMENTS

Lucas Assirati acknowledges a CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) grant. Odemir M. Bruno is grateful to the São Paulo Research Foundation, grant No. 2011/23112-3. Bruno also acknowledges the National Council for Scientific and Technological Development (CNPq), grant Nos. 308449/2010-0 and 473893/2010-0.

REFERENCES

1. C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal 27, 379–423, 623–656 (1948).
2. C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics 52, 479–487 (1988).
3. C. Tsallis, "The nonadditive entropy S_q and its applications in physics and elsewhere: Some remarks," Entropy 13, 1765–1804 (2011).
4. A. L. Barbieri, G. F. de Arruda, F. A. Rodrigues, O. M. Bruno, and L. da Fontoura Costa, "An entropy-based approach to automatic image segmentation of satellite images," Physica A: Statistical Mechanics and its Applications 390, 512–518 (2011).
5. E. P. Borges, "A possible deformed algebra and calculus inspired in nonextensive thermostatistics," Physica A: Statistical Mechanics and its Applications 340, 95–101 (2004).
6. R. Fabbri, W. N. Gonçalves, F. J. Lopes, and O. M. Bruno, "Multi-q pattern analysis: A case study in image classification," Physica A: Statistical Mechanics and its Applications 391, 4487–4496 (2012).
7. R. Fabbri, I. N. Bastos, F. D. M. Neto, F. J. P. Lopes, W. N. Gonçalves, and O. M. Bruno, "Multi-q pattern classification of polarization curves," arXiv:1305.2876 (2013).