SPEECH ANALYSIS AND SYNTHESIS OF THE PUNJABI LANGUAGE A Thesis submitted in fulfillment of the requirement for the award of the degree of DOCTOR OF PHILOSOPHY in COMPUTER SCIENCE By DR. SURINDERPAL SINGH DHANJAL (Registration No. 9051453) Department of Computing Science Thompson Rivers University, KAMLOOPS, BC, Canada [email protected] Supervisor: DR. SATVINDER SINGH BHATIA Professor, School of Mathematics & Computer Applications Thapar University, PATIALA, Punjab, India [email protected] School of Mathematics & Computer Applications Thapar University Patiala-147004, Punjab, India August 25, 2014 Sri Satguru Jagjit Singh Ji eLibrary [email protected] ii Sri Satguru Jagjit Singh Ji eLibrary [email protected] DEDICATED TO MY PARENTS Late Rala Singh Dhanjal (Father) and Late Nand Kaur Dhanjal (Mother) MY BROTHERS Late Malkit Singh Kundan (eldest Brother) and Late Mrs. Tej Kaur Dhanjal Late Amarjit Singh Dhanjal M.A., B.Ed. (elder Brother) and Mrs. Manjit Kaur Dhanjal Amritpaul Singh Dhanjal M.A., LL.B. (elder Brother) and Mrs. Mandip Kaur Dhanjal MY SISTERS Late Amar Kaur Sond (eldest Sister) and Late Mr. Harjit Singh Sond Late Ranjit Kaur Ghattaurha (elder Sister) and Mr. Gurcharan Singh Ghattaurha Harbans Kaur Ghattaurha (elder Sister) and Late Mr. Bhagwan Singh Ghattaurha Gurcharan Kaur Dhiman (younger Sister) and Mr. Ranjit Singh Dhiman MY WIFE Balbir Kaur Dhanjal B.Sc.N. and MY ONLY CHILD Ms Samta Dhanjal B.A. (Math and Bus. Mgmt.), Post-Bac Teaching Diploma (TESL) iii Sri Satguru Jagjit Singh Ji eLibrary [email protected] I DO NOT KNOW WHAT I MAY APPEAR TO THE WORLD, BUT TO MYSELF I SEEM TO HAVE BEEN ONLY LIKE A BOY PLAYING ON THE SEA-SHORE, AND DIVERTING MYSELF IN NOW AND THEN FINDING A SMOOTHER PEBBLE OR A PRETTIER SHELL THAN ORDINARY, WHILST THE GREAT OCEAN OF TRUTH LAY ALL UNDISCOVERED BEFORE ME. - ISAAC NEWTON (1642-1727) WHAT WE KNOW IS NOT MUCH, WHAT WE DO NOT KNOW IS IMMENSE. - PIERRE-SIMON LAPLACE (1749-1827) IF WE KNEW WHAT IT WAS WE WERE DOING, IT WOULD NOT BE CALLED RESEARCH, WOULD IT? - ALBERT EINSTEIN (1879-1955) iv Sri Satguru Jagjit Singh Ji eLibrary [email protected] ACKNOWLEDGEMENTS The work leading to this doctoral thesis has been carried out by me as a research scholar at the School of Mathematics and Computer Applications (SMCA) of Thapar University (formerly Thapar Institute of Engineering & Technology), Patiala since July 2005. The time utilized to complete this project has been educational for me in the true sense of the word. First of all, I express my heartiest gratitude to my Supervisor Dr. Satvinder Singh Bhatia (Professor and former Head, School of Mathematics & Computer Applications, Thapar University, Patiala, India) for his inspiring guidance and many stimulating discussions throughout the completion of this work. In fact, he was such a constant source of inspiration that his real contribution to this project and my life cannot be expressed in words. I take this opportunity to thank Prof. Prakash Gopalan (Director, Thapar University) for providing me University's necessary resources and facilities to carry out this research work. I am deeply thankful to Dr. P. K. Bajpai (Dean, Research and Sponsored Projects) and Dr. S. K. Mohapatra (Dean, Academic Affairs), Thapar University for their support and needful help during the various stages of this research. I express my gratitude to the Doctoral committee comprising Dr. Rajesh Kumar (Head, SMCA) and Dr. Mahesh Kumar Sharma (Associate Professor, SMCA) for monitoring the progress and providing valuable suggestions for improvement of this work. I also take this opportunity to express my sincere thanks to all the faculty members and staff members in the SMCA, Thapar University, Patiala and in the Department of Computing Science, Thompson Rivers University Kamloops, Canada (in particular Dr. Faheem Ahmed, Ms. Mridula Sharma, Ms. Marcy Desrosiers, and Mr. Jagvinder Basran) for discussions, suggestions, co-operation, and technical support to complete this work. I will be missing in my duties if I do not thank Mr. Gurbinder Singh (Deputy Registrar, Academic) for his untiring help throughout my stay in Thapar University as well as when I was away to my work place at Thompson Rivers University Kamloops, Canada. Navjot Josan and Priya Shahi (Research Scholars, SMCA), Digamber Singh (SMCA), Narender Sharma, Mahender Yadav, and Surat Singh (guest house staff) at Thapar University deserve a special mention for their kind support. Many stimulating discussions with distinguished scientists at International Institute of Information Technology (IIIT), Hyderabad, India (Dr. Yegnanarayana, and Dr. Peri Bhaskararao), the continuous encouragement from Prof. Manmandar Singh (Punjabi University, Patiala), along with the technical support from Research Scholars Nivedita and Ganga Mohan (IIIT, Hyderabad), Shipra Arora and Gautam Kumar (KIIT, Gurgaon, India), Kulvinder Kular (Omni TV, Vancouver, Canada), Kirpal S. Pannu (Toronto, Canada), and Gurmail Rai (Surrey, Canada) are gratefully acknowledged. v Sri Satguru Jagjit Singh Ji eLibrary [email protected] vi Sri Satguru Jagjit Singh Ji eLibrary [email protected] ABSTRACT The present thesis entitled “Speech Analysis and Synthesis of the Punjabi Language” embodies the investigations carried out by me at the School of Mathematics and Computer Applications (SMCA), under the supervision of Dr. Satvinder Singh Bhatia (Professor, SMCA, Thapar University, Patiala). The quality of the synthesized speech, also known as (aka) synthetic speech, has been an important issue for speech researchers. This project is an attempt to address this important issue of synthesizing high quality speech in the Punjabi language. The Punjabi language is a modern Indo-Aryan language. The Punjabi language has been ranked amongst the top 15 spoken languages of the world. Over the years, this ranking has varied between 10 and 18. With more than 90 million native speakers, and more than 140 million speakers in 150 countries of the world, the Punjabi is considered amongst the world’s 10 most influential languages, and is respectfully included in an article by the same name, “The World’s 10 Most Influential Languages” by George Weber. Andronov reports that the Institute of Oriental Studies of the Russian Academy of Sciences has prepared and preserved dozens of reports on the Punjabi language. Any research on the Punjabi language, therefore, assumes an international significance. There are many dialects of the Punjabi language mainly written in the two scripts. Punjabi University (Patiala, India) published a list of 31 dialects of the Punjabi language. In the Punjab state of India, the Punjabi Language is written in the Gurmukhi script, whereas in the Punjab state of Pakistan, the Punjabi language is written in the Persian (or Shahmukhi) script. It will be an enormous task to deal with all dialects of both scripts of the Punjabi language. This research work, therefore, concentrates only on one dialect (the Malwai dialect) and one script (the Gurmukhi script) of the Punjabi language. This research work has been carried out with the following three objectives in view as set in the Institute Research Board (IRB): (i) To design a new phonetic alphabet for speech processing in the Punjabi language consistent with the ARPAbet because at present, the symbols consistent with the ARPAbet phonetic transcription do not exist in the Punjabi Language. (ii) To develop a new text and speech corpus for the Punjabi language because no computer database/corpus exists in the Punjabi language, where the representative speech sentences concentrate on a specific dialect (the Malwai dialect) of the Punjabi language. (iii) To conduct the linear prediction analysis and synthesis (aka Linear Predictive Coding or LPC) of the Punjabi speech sentences because no work has so far been vii Sri Satguru Jagjit Singh Ji eLibrary [email protected] reported in the literature where the Punjabi speech has been analyzed/synthesized using the linear prediction model of speech production. The first issue is the actual representation of the phonemes. In linguistics, the International Phonetic Alphabet (IPA) designed by the International Phonetic Association is used to represent the phonemes. However, the major limitation of this representation is that it needs some special symbols that are not readily available on computer keyboards. In this work, a new phonetic alphabet consistent with the ARPAbet phonetic transcription of the Punjabi language has been developed in CHAPTER II. This newly designed scheme called PUNJARPAbet has been graphically presented in CHAPTER VI. This achieves our first objective set for this study. The second issue is to develop a new corpus suitable for achieving the third objective. A new text and speech corpus with several original features has been developed in CHAPTER IV. The contents of this chapter meet the second objective set for this research. The new corpus has at least twenty original features such as complete sentences (rather than words only), some lines of the folk songs in the original form and the modified form, some single line folk songs (bolis), some newly-written bolis, some slowly fading words of the Punjabi language, and some ṭheṭh pendu shabads (rustic words). In the corpus developed in this work, the speech sentences have been recorded in the Malwai dialect of the Punjabi language, and the sentences have been written in the Gurmukhi script. The fact that the new corpus is very rich and versatile due to a wide variety of its linguistic and cultural features makes it an ideal corpus for any serious speech processing work in the Punjabi language. The third issue is to analyze and synthesize the Punjabi speech. No work has so far been reported in the literature where the Punjabi speech has been analyzed/synthesized using the Linear Predictive Coding (LPC). This work investigated the linear prediction analysis and synthesis of the Punjabi speech for the first time. The linear prediction technique is rated very high in the broad field of speech processing. This technique has been chosen because of its popularity, sound mathematical properties, simplicity, and its effectiveness as evidenced by the SPEECH UNDERSTANDING RESEARCH (SUR) project of the Advanced Research Projects Agency (ARPA) started in early 1970s. The third objective set for this project has been accomplished in CHAPTER V and graphically presented in CHAPTER VI. The present thesis consists of seven chapters, and five appendices. The work carried out in this thesis can be described as follows: Chapter I is introductory and it presents the basic concepts required in this thesis. Chapter II introduces the main concepts required to understand the broad field of Linguistics so that one can understand the corpus designed in Chapter IV. A new phonetic alphabet (called PUNJARPAbet) consistent with ARPAbet is designed in this chapter. This chapter achieves objective 1 set for this research. Chapter III prepares the viii Sri Satguru Jagjit Singh Ji eLibrary [email protected] reader for the linear prediction speech analysis and synthesis of the Punjabi language to be conducted in Chapter V. In Chapter IV, a new text and speech corpus has been designed. It covers objective 2 set for this study. Chapter V describes in detail the mathematical concepts involved in the Linear Prediction analysis and synthesis of the Punjabi speech sentences. It accomplishes objective 3 set for this project. In Chapter VI, the spectrographic analysis is conducted by using Praat and MATLAB where the graphs for formant frequencies, intensity, pitch, V/UV contours, and the Fast Fourier Transform (FFT) of the speech waveforms have been plotted to gain additional insight into the results obtained in Chapter V. The graphical analysis evaluating the newly-designed coding scheme PUNJARPAbet is also included in Chapter VI. Chapter VII summarizes the whole work, highlighting the original contribution of this project. This chapter concludes by stating the future directions. The thesis also includes five Appendices. Appendix A and Appendix B include the Text Corpus developed in this project. Appendix A includes about 200 sentences and single line folk songs (bolis) while Appendix B includes more than 100 bolis. In both appendices, the complete corpus has been transcribed both in the IPA and the newly developed phonetic alphabet PUNJARPAbet. Appendix C describes the Levinson- Durbin algorithm to efficiently solve the Normal Equations resulting from the Autocorrelation Method for the linear prediction analysis. Appendix D includes the SIFT algorithm for computing the V/UV decision and the pitch extraction. Appendix E lists twenty-five Punjabi speech sentences synthesized in the present study. The thesis concludes by listing the Bibliography of various publications and websites cited in this work. ■ ix Sri Satguru Jagjit Singh Ji eLibrary [email protected] PUBLICATIONS AND PRESENTATIONS The following papers have been published in refereed International/National Journals. Presentations have been given at six different organizations/universities in Canada and India based on the work completed during this project. Papers Published (Refereed Journals and International Conferences): 1. Surinder Dhanjal and Satvinder Singh Bhatia; “PUNJARPAbet: A New Phonetic Alphabet for Speech Processing in the Punjabi Language”; Journal of Circuits, Systems, and Computers, World Scientific Publishing Co.; Vol. 23, No. 5, June 2014 (DOI: 10.1142/S0218126614500704) (SCI) 2. Surinder Dhanjal and Satvinder Singh Bhatia; “Punjabi Speech Processing: LP Analysis and Synthesis”; 4th World Conference on Innovation and Computer Sciences (INSODE-2014), Rome, Italy; April 11-13, 2014 (accepted for publication in AWERProcedia Information Technology and Computer Science). 3. Surinder Dhanjal and Satvinder Singh Bhatia; “Development of a Standard Text and Speech Corpus for the Punjabi Language”; 16th International Oriental COCOSDA / Conference on Asian Spoken Language Research and Evaluation (CASLRE) 2013 Conference, KIIT Gurgaon, India; Nov 25-27, 2013. - (IEEE Xplore Digital Library DOI: 10.1109/ICSDA.2013.6709891) - Shortlisted by Department of Electronics and Information Technology (DeitY), MC&IT, Government of India for upload on TDIL (Technology Development for Indian Languages) Data Center (http://tdil-dc.in) portal*. 4. Surinder Dhanjal and Satvinder Singh Bhatia; “Computerization of the Punjabi Language: L.P. Analysis and Synthesis”; PARKH (biannual Journal), Panjab University, Chandigarh, India; Vol. I, Jan-June 2013, pp. 179-188. 5. Surinder Dhanjal and Satvinder Singh Bhatia, “A New Corpus for the Punjabi Speech Processing”, International Symposium on Frontiers of Research on Music and Speech (FRSM-2012); Kamrah International Institute of Technology (KIIT), Gurgaon, India; Jan 18-19, 2012; pp. 223-227. 6. Surinder Dhanjal and Satvinder Singh Bhatia, “Computerization of the Punjabi Language”, 2nd World Punjabi Conference; Panjab University, Chandigarh; Feb 24-25, 2009. x Sri Satguru Jagjit Singh Ji eLibrary [email protected]