AFRL-RI-RS-TR-2009-23
In-House Final Technical Report
January 2009

OZTURK ALGORITHM AUGMENTATIONS

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

STINFO COPY

AIR FORCE RESEARCH LABORATORY
INFORMATION DIRECTORATE
ROME RESEARCH SITE
ROME, NEW YORK

NOTICE AND SIGNATURE PAGE

Using Government drawings, specifications, or other data included in this document for any purpose other than Government procurement does not in any way obligate the U.S. Government. The fact that the Government formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to manufacture, use, or sell any patented invention that may relate to them.

This report was cleared for public release by the 88th ABW, Wright-Patterson AFB Public Affairs Office and is available to the general public, including foreign nationals. Copies may be obtained from the Defense Technical Information Center (DTIC) (http://www.dtic.mil).

AFRL-RI-RS-TR-2009-23 HAS BEEN REVIEWED AND IS APPROVED FOR PUBLICATION IN ACCORDANCE WITH ASSIGNED DISTRIBUTION STATEMENT.

FOR THE DIRECTOR:

/s/ STEVEN JOHNS, Chief, Multi-Sensor Exploitation Branch
/s/ JOSEPH CAMERA, Chief, Information & Intelligence Exploitation Division, Information Directorate

This report is published in the interest of scientific and technical information exchange, and its publication does not constitute the Government's approval or disapproval of its ideas or findings.

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington Headquarters Service, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

1. REPORT DATE (DD-MM-YYYY): JAN 09
2. REPORT TYPE: Final
3. DATES COVERED (From - To): May 07 – Dec 08
4. TITLE AND SUBTITLE: OZTURK ALGORITHM AUGMENTATIONS
5a. CONTRACT NUMBER: In-House
5b. GRANT NUMBER: N/A
5c. PROGRAM ELEMENT NUMBER: 62702F
5d. PROJECT NUMBER: 459E
5e. TASK NUMBER: PR
5f. WORK UNIT NUMBER: OJ
6. AUTHOR(S): Steven Salerno
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): AFRL/RIEC, 525 Brooks Rd., Rome NY 13441-4505
8. PERFORMING ORGANIZATION REPORT NUMBER: N/A
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): AFRL/RIEC, 525 Brooks Rd., Rome NY 13441-4505
10. SPONSOR/MONITOR'S ACRONYM(S): N/A
11. SPONSORING/MONITORING AGENCY REPORT NUMBER: AFRL-RI-RS-TR-2009-23
12. DISTRIBUTION AVAILABILITY STATEMENT: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. PA# 88ABW-2009-0176
13. SUPPLEMENTARY NOTES:
14. ABSTRACT: In this effort, augmentations to the Ozturk Algorithm were investigated, including additional graphical representations and determination of data correlation effects as introduced by data filtering. A graphical user interface was developed to demonstrate the algorithm concepts.
15. SUBJECT TERMS: Ozturk Algorithm, Rank-ordered statistics
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 32
19a. NAME OF RESPONSIBLE PERSON: Andrew J. Noga
19b. TELEPHONE NUMBER (Include area code): N/A

Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std. Z39.18

Table of Contents

1.0 Introduction .......... 1
1.1 Overview of the Ozturk Algorithm .......... 1
1.2 Summer Overview .......... 2
2.0 PDF Identification & Analysis – Ozturk Algorithm .......... 2
2.1 Procedure – Ozturk Algorithm .......... 3
2.2 Extension of Features – Ozturk Algorithm .......... 7
2.2.1 Extension – Plotting Techniques .......... 8
2.2.2 Extension – Correlation Test .......... 13
2.2.3 Extension – Multi-Distribution Identification .......... 18
3.0 GUI Implementation .......... 19
3.1 GUI – Normality Test .......... 20
3.2 GUI – Identification Charts .......... 21
3.3 GUI – Analysis with Identification Charts .......... 22
3.4 GUI – Correlation Test .......... 22
3.5 GUI – User Information Tabs .......... 23
4.0 Future Work .......... 25
4.1 Acknowledgements .......... 26
4.2 References .......... 27

List of Figures

Figure 2.1 – Vector Chart for Normal (n = 6) .......... 4
Figure 2.2 – Sample Data with Confidence Ellipses .......... 6
Figure 2.3 – Distribution Identification Chart (Old) [1] .......... 8
Figure 2.4 – Distribution Identification Chart (New) .......... 9
Figure 2.5 – 3-D Distribution Identification Chart .......... 9
Figure 2.6 – Multiple 3-D Distribution Identification Charts .......... 10
Figure 2.7 – Regular ID Chart with Data .......... 11
Figure 2.8 – Bivariate Histogram with Data [6] .......... 11
Figure 2.9 – Line Contour Chart with Data .......... 12
Figure 2.11 – FFT of Uncorrelated/Correlated Data .......... 15
Figure 2.12 – Low Pass FFT Real and Imaginary Plots .......... 16
Figure 2.13 – Bandpass FFT Real and Imaginary Plots .......... 16
Figure 2.14 – ROC Curve of Correlation Test .......... 17
Figure 2.15 – Laplace(0, 0.625) vs. Normal(0, 1) .......... 18
Figure 3.1 – Updated GUI Main Window .......... 20
Figure 3.2 – GUI Normality Test Panel .......... 21
Figure 3.3 – GUI Normality Test Plot .......... 21
Figure 3.4 – GUI Identification Charts Panel .......... 22
Figure 3.5 – GUI Analysis Panel .......... 22
Figure 3.6 – GUI Correlation Test Plot Area .......... 23
Figure 3.7 – GUI User Information Tabs .......... 24

1.0 Introduction

This report covers work initiated in the summer of 2006 and completed in the summer of 2008 by a summer engineering aide. The work applied rank-ordered statistics as a method of signal data analysis, focusing specifically on the Ozturk algorithm, which is considered to have demonstrated superior performance over histogram and simple parameter-based techniques [1-4]. The work includes the research and implementation of new techniques based on the Ozturk algorithm. A GUI was also created to house the previously completed work [5] and the newly created implementation tools.

1.1 Overview of the Ozturk Algorithm

The Ozturk algorithm is an identification algorithm that supports a variety of tests that not only identify a best-fit probability density function (PDF) but also analyze the sample data. The core algorithm was developed by Aydin Ozturk [3] and further developed in collaboration with E. J. Dudewicz and others [4]. Prior work was carried out with the help of Syracuse University, and the only documented implementation of the algorithm for real-world use was associated with research conducted at the AFRL Sensors Directorate at the Rome Research Site.

The basis of the algorithm is the construction of a null-linked graph of vectors from the rank-ordered statistics of the sample data. The algorithm is scale and location invariant, which makes it particularly attractive. From this graph one can readily identify a distribution based on its endpoint, Q_n, and the surrounding null points. From the ordered statistics, other tests can be carried out, including, but not limited to, testing for normality, parameter estimation, identifying the best-fit distribution and, the newest test, detecting correlation in Gaussian data. The power of the test comes from its ability to use small sample sizes while still correctly identifying the best-fit distribution. The algorithm is discussed in full in Section 2.

1.2 Summer Overview

Much of the work done in the summer of 2006 was based on the previous work done in the summer of 2005. Since there was now a working GUI implementation of the Ozturk algorithm, it was time to test both the algorithm's capabilities and its limits. In [1-4] there is no mention of the effect of correlated data. This presents a potential problem with real-world data, which is not always uncorrelated. Given such data, how does one determine whether it is correlated, and whether the correlation is strong enough to affect the algorithm? There needs to be either a pre-processing step or a test to determine whether the data is correlated and, if so, how strongly. The second major constraint is that some distributions can mimic other distributions when their parameters are set in a specific manner. In this case the algorithm can identify only one distribution, while there may actually be multiple distributions from which the data could have been drawn.
Both of these issues were the basis for most of the work, along with the full construction of a new Matlab GUI that gives the user the ability to apply the new tools.

2.0 PDF Identification & Analysis – Ozturk Algorithm

Most statistical tests in present use test a data set against a specific distribution. Often the test is designed specifically to assess the normality (a Gaussian PDF is assumed) of a data set. However, when the data do not match the hypothesis, no alternative possibility is given. This results in the tedious task of running multiple statistical tests in order to identify the best-fit distribution. In addition, a majority of the tests begin to lose their power as the sample size decreases. Both of these common issues are addressed by the Ozturk algorithm [1-4]. The algorithm uses its own test statistics to create "linked vectors" that provide a goodness-of-fit test for univariate and multivariate distributions.

2.1 Procedure – Ozturk Algorithm

General reference is made to [2, 3]. Let X_{1:n} ≤ X_{2:n} ≤ … ≤ X_{n:n} be the ordered sample observations from a distribution with cumulative distribution function (CDF) F(x; μ, σ) = F{(x − μ)/σ}, where μ and σ are the location and scale parameters, respectively. For our case we use the Normal distribution as the null hypothesis; however, any distribution could be used as the null.

For illustration we will assume that X_{1:n} ≤ X_{2:n} ≤ … ≤ X_{n:n} is from a Normal distribution with unknown mean μ and unknown variance σ². To form the sample linked vectors we need two main components: the length of the i-th vector and its angle θ_i to the horizontal axis. Let

Y_{i:n} = \frac{X_{i:n} - \bar{X}}{S},    (2.1)

where \bar{X} = \sum X_{i:n}/n and S = \{\sum (X_{i:n} - \bar{X})^2/(n-1)\}^{1/2}. Y_{i:n} is taken as the length of the i-th vector.

Now let m_{1:n}, m_{2:n}, …, m_{n:n} be the expected order statistics of the standard Normal distribution, m_{i:n} = E((X_{i:n} − μ)/σ). The angle to the horizontal (x) axis is θ_i = πΦ(m_{i:n}), where π = 3.14159… and

\Phi(m_{i:n}) = \int_{-\infty}^{m_{i:n}} (2\pi)^{-1/2} e^{-t^2/2}\, dt.    (2.2)

Since m_{1:n} < m_{2:n} < … < m_{n:n} and 0 < Φ(m_{i:n}) < 1, it follows that 0 < θ_1 < θ_2 < … < θ_n < π. θ_i is the angle between the x-axis and the i-th vector.

To obtain the sample linked vectors we use the points Q_i = (U_i, V_i), where U_i = (1/n) \sum_{j=1}^{i} |Y_{j:n}| \cos\theta_j and V_i = (1/n) \sum_{j=1}^{i} |Y_{j:n}| \sin\theta_j are the coordinates of Q_i (i = 1, …, n) and Q_0 = (0, 0). U_n and V_n can then be defined as the test statistics:

U_n = \frac{1}{n} \sum_{i=1}^{n} |Y_{i:n}| \cos\theta_i,    (2.3)

V_n = \frac{1}{n} \sum_{i=1}^{n} |Y_{i:n}| \sin\theta_i.    (2.4)

To obtain the null linked vectors that construct the "known" trajectory of the null hypothesis, m′_{i:n} = E(|Y_{i:n}|) and E(m_{i:n}) are used. To obtain these expected values, equation 2.5 must be numerically integrated for the specific sample size n of the data:

E(x_{k|n}) = \frac{n!}{(n-k)!\,(k-1)!} \int_{-\infty}^{\infty} x \left[\frac{1}{2} - \Phi(x)\right]^{k-1} \left[\frac{1}{2} + \Phi(x)\right]^{n-k} \varphi(x)\, dx,    (2.5)

where \varphi(x) = (2\pi)^{-1/2} e^{-x^2/2} and \Phi(x) = \int_0^x \varphi(t)\, dt.

Using the expected values of m_{i:n} and Y_{i:n} we obtain the point Q_n = (U_n, V_n), which becomes the endpoint of the null-linked vectors, the "known"; other extensions use this information for analysis. For a sample that we assume to be Normal, we would expect the sample linked vectors to closely follow the trajectory of the null-linked vectors, as in Figure 2.1.

Figure 2.1 – Vector Chart for Normal (n = 6): sample and null linked vectors from Q_0 to Q_n in the (U, V) plane, with angles θ_1 through θ_6.
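As a concrete illustration of equations 2.1 through 2.4, the following Matlab sketch computes the sample linked-vector coordinates Q_i = (U_i, V_i) and the endpoint Q_n for a data vector, using the standard Normal as the null. This is a minimal sketch, not the report's GUI code: the function name ozturk_linked_vectors is hypothetical, and the Blom approximation is substituted for the numerical integration of equation 2.5 when approximating the expected Normal order statistics m_{i:n}.

```matlab
function [U, V, Qn] = ozturk_linked_vectors(x)
% Minimal sketch of equations 2.1-2.4 with the standard Normal as the null.
% NOTE: m below uses the Blom approximation to the expected Normal order
% statistics rather than numerically integrating equation 2.5.
    x  = x(:);
    n  = numel(x);
    Xs = sort(x);                                           % X_{1:n} <= ... <= X_{n:n}
    Y  = (Xs - mean(Xs)) / std(Xs);                         % eq. 2.1 (S uses 1/(n-1))
    i  = (1:n)';
    m  = sqrt(2) * erfinv(2*((i - 0.375)/(n + 0.25)) - 1);  % approx. E of Normal order stats
    Phi   = 0.5 * (1 + erf(m / sqrt(2)));                   % standard Normal CDF, eq. 2.2
    theta = pi * Phi;                                       % angle of the i-th vector
    U  = cumsum(abs(Y) .* cos(theta)) / n;                  % U_i; eq. 2.3 at i = n
    V  = cumsum(abs(Y) .* sin(theta)) / n;                  % V_i; eq. 2.4 at i = n
    Qn = [U(n), V(n)];                                      % sample endpoint Q_n
end
```

Plotting [0; U] against [0; V] traces the sample trajectory from Q_0 = (0, 0) to Q_n, analogous to the vector chart in Figure 2.1.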
However, if the sample is not Normal, the two paths should differ; how much they differ depends on the sample distribution. This allows an ad hoc version of the algorithm (visual inspection) to be used for distribution identification. This ad hoc method is rarely used, however, because the algorithm generates its own set of test statistics that can, with far better accuracy, identify the best-fit distribution of a data set in an automated fashion.

Because of the properties of the test statistics generated by the Ozturk algorithm, we can derive a set of equations to model their behavior. Using m′_{i:n} = E(|Y_{i:n}|), we define the expected value of Q_n for the Normal distribution to be E(Q_n) = E(U_n, V_n) = (0, (1/n) \sum_{i=1}^{n} m′_{i:n} \sin\theta_i). Symmetry identities for the Normal distribution show that E(U_n) = 0, and a model can be fitted to V_n to obtain

E(V_n) = 0.326601 + 0.412921/n.    (2.6)

These models can also be applied to other symmetric distributions: as long as a distribution is symmetric, E(U_n) = 0, and a model for E(V_n) can be provided for any other symmetric distribution. Using additional properties of the test statistics and the linked vectors, we can obtain models for the variances of U_n and V_n:

Var(U_n) = \sigma_u^2 = 0.02123/n + 0.01765/n^2,    (2.7)

Var(V_n) = \sigma_v^2 = 0.04427/n - 0.0951/n^2.    (2.8)

For large values of n it can be shown that the distributions of U_n and V_n are both Normal. Since Q_n is the joint distribution of U_n and V_n, we can apply the bivariate Normal distribution to the Q_n endpoint. Using the exponent component of the general bivariate Normal we have

\frac{1}{2(1-\rho^2)} \left[ \frac{(U_n-\mu_u)^2}{\sigma_u^2} - 2\rho\,\frac{(U_n-\mu_u)(V_n-\mu_v)}{\sigma_u \sigma_v} + \frac{(V_n-\mu_v)^2}{\sigma_v^2} \right],    (2.9)

where μ_u and μ_v are the means, σ_u² and σ_v² are the variances of U_n and V_n respectively, and ρ is the coefficient of correlation between U_n and V_n. Under the assumption that μ_u is always 0 and that ρ between U_n and V_n is 0, we obtain the simplified version of equation 2.9:

\frac{U_n^2}{\sigma_u^2} + \frac{(V_n-\mu_v)^2}{\sigma_v^2} = -2\ln\alpha,    (2.10)

where α is the confidence level. For any given sample size, μ_v, σ_u², and σ_v² can be obtained from equations 2.6, 2.7 and 2.8, respectively. With these values, equation 2.10 can be solved for α, returning a specific α value for the endpoint of a given data set. Setting equation 2.10 to a set of ellipses generated by varying the value of α, centered on the null endpoint Q_n, we can create a set of 100(1 − α)% confidence ellipses around the Normal distribution. With these confidence ellipses we can set thresholds on the sample data in order to differentiate between potentially Gaussian and non-Gaussian data.
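To make the threshold computation concrete, the following Matlab sketch evaluates the models in equations 2.6 through 2.8 for a given sample size and solves equation 2.10 for α at a sample endpoint (U_n, V_n). It is a minimal sketch under the report's stated simplifying assumptions (μ_u = 0 and ρ = 0); the function name ozturk_alpha is hypothetical.

```matlab
function alpha = ozturk_alpha(Un, Vn, n)
% Sketch of the confidence computation: models 2.6-2.8 give the moments of
% the null (Normal) endpoint, and equation 2.10 is solved for alpha,
% assuming mu_u = 0 and zero correlation between U_n and V_n.
    mu_v  = 0.326601 + 0.412921/n;               % eq. 2.6
    var_u = 0.02123/n + 0.01765/n^2;             % eq. 2.7
    var_v = 0.04427/n - 0.0951/n^2;              % eq. 2.8
    d2    = Un^2/var_u + (Vn - mu_v)^2/var_v;    % left-hand side of eq. 2.10
    alpha = exp(-d2/2);                          % solve d2 = -2*ln(alpha)
end
```

Smaller returned α values correspond to endpoints farther from the null (Normal) endpoint, so a threshold on α, equivalently a chosen 100(1 − α)% confidence ellipse, can be used to flag data that is unlikely to be Gaussian, as described above.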
