
Quantitative linguistics PDF

252 Pages·1992·23.938 MB·English

Preview Quantitative linguistics

QUANTITATIVE LINGUISTICS

LINGUISTIC & LITERARY STUDIES IN EASTERN EUROPE (LLSEE)

The emphasis of this scholarly series is on recent developments in Linguistic and Literary Research in Eastern Europe. It includes analysis, translations and syntheses of current research as well as studies in the history of linguistic and literary scholarship.

Founding Editor: John Odmark †
General Editor: Philip A. Luelsdorff

Volume 37
Marie Těšitelová
Quantitative Linguistics

QUANTITATIVE LINGUISTICS
by Marie Těšitelová
JOHN BENJAMINS PUBLISHING COMPANY
AMSTERDAM / PHILADELPHIA
1992

Scientific Editor: Prof. Dr. Jan Petr, DrSc.
Reviewer: Ing. Josef Machek, CSc.
Translated from Czech by Ivana Hajičová PhD

Co-edition with ACADEMIA, Publishing House of the Czechoslovak Academy of Sciences, Prague, 1992

Sole rights world-wide, with the exception of Albania, Bulgaria, China, Cuba, Czechoslovakia, Hungary, Mongolia, North Korea, Poland, Rumania, U.S.S.R., Vietnam and Yugoslavia: John Benjamins B. V., Amsteldyk 44, P. O. Box 75577, 1070 AN, Amsterdam, Netherlands

Library of Congress Cataloging-in-Publication Data
Těšitelová, Marie.
Quantitative linguistics / by Marie Těšitelová
p. cm. - (Linguistic & literary studies in Eastern Europe (LLSEE), ISSN 0165-7712; v. 37)
Includes bibliographical references.
1. Linguistics - Statistical methods. 2. Mathematical linguistics. I. Title. II. Series.
P138.5.T48 1992 410M'51 -dc 20 91-27930
ISBN 90 272 1546 4 (Eur.)/1-55619-262-2 (US) (alk. paper) CIP

© Copyright 1992 - Marie Těšitelová
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
Printed in Czechoslovakia

CONTENTS

I. QUANTITATIVE LINGUISTICS
1. The scope of quantitative linguistics
2. The object of quantitative linguistics
3. A note about the foundations of quantitative linguistics

II. METHODS OF RESEARCH
1. Unit of population
1.1. Unit of population in lexical statistics
1.2. Unit of population in grammatical statistics
1.2.1. Unit of population in morphological statistics
1.2.2. Unit of population in syntactic statistics
1.3. Unit of population in semantic statistics
1.4. Unit of population in other domains of quantitative linguistics
1.5. Conclusions
2. Problems of sampling the material
2.1. Sampling of material from the qualitative viewpoint
2.1.1. Linguistic criteria
2.1.2. Psychological criteria
2.1.3. Sociological criteria
2.1.4. Other criteria
2.1.5. Conclusions
2.2. Sampling of material from the quantitative viewpoint
2.2.1. Types of sampling
2.2.1.1. Systematic sampling
2.2.1.2. Random sampling
2.2.1.2.1. Random sampling of pages
2.2.1.2.1.1. Random sampling of pages in lexicon
2.2.1.2.1.2. Random sampling of pages in grammar
2.2.1.2.2. Random sampling of words
2.2.1.2.3. Conclusions
2.2.1.3. Cluster sampling
2.2.1.3.1. Cluster sampling methods
2.2.1.3.2. Cluster sampling in lexical statistics
2.2.1.3.3. Cluster sampling in grammatical statistics
2.2.1.3.3.1. Cluster sampling in morphological statistics
2.2.1.3.3.2. Cluster sampling in syntactic statistics
2.2.1.3.4. Cluster sampling in semantic statistics
2.2.1.3.5. Conclusions
3. Some statistical and other characteristics common in quantitative linguistics
3.1. Frequency, rank, order
3.1.1. The Zipf Laws
3.1.1.1. The First Zipf Law
3.1.1.2. The Second Zipf Law
3.1.1.3. The Third Zipf Law
3.1.1.4. Conclusions
3.2. Mean
3.3. Variance and standard deviation
3.4. Frequency distribution
3.5. Coefficients
3.5.1. Coefficient of dispersion
3.6. Correlation, correlation coefficient
3.7. Concepts of information theory, entropy, redundancy
3.8. Conclusions

III. THE MAIN AREAS OF QUANTITATIVE LINGUISTICS
1. Lexical statistics
1.1. The object of lexical statistics
1.2. Problems of methods of investigation
1.2.1. Unit of population and size of corpus
1.2.2. Word-frequency distribution
1.2.2.1. The zone of words of the higher and highest frequency
1.2.2.2. The zone of words of medium frequency
1.2.2.3. The zone of words of the lower and lowest frequency
1.2.2.4. Conclusions
1.2.3. The so-called richness of vocabulary
1.2.3.1. The formula of P. Guiraud
1.2.3.2. The formula of J. Mistrik
1.2.3.3. The formula of M. Těšitelová
1.2.3.4. Conclusions
1.3. Selected publications on lexical statistics
1.3.1. General characteristics
1.3.2. Publications on lexical statistics concerning Slavonic languages
1.3.2.1. Czech
1.3.2.2. Slovak
1.3.2.3. Russian and Ukrainian
1.3.2.4. Polish
1.3.2.5. Other Slavonic languages
1.3.3. Publications on lexical statistics concerning Germanic languages
1.3.3.1. German
1.3.3.2. English
1.3.3.3. Other Germanic languages
1.3.4. Publications on lexical statistics concerning Romanic languages
1.3.4.1. French
1.3.4.2. Spanish
1.3.4.3. Roumanian
1.3.4.4. Italian
1.3.5. Publications on lexical statistics concerning other languages
1.3.5.1. Latvian
1.3.5.2. Estonian
1.3.5.3. Hungarian
1.3.5.4. Finnish
1.3.5.5. Chinese
1.4. Conclusions
2. Grammatical statistics
2.1. The object of grammatical statistics
2.2. Methods of research
2.3. The components of grammatical statistics
2.3.1. Morphological statistics
2.3.1.1. The object of morphological statistics
2.3.1.2. Methods of research
2.3.1.2.1. Unit of population in morphological statistics
2.3.1.2.2. Sampling of the material in morphological statistics
2.3.1.2.3. Conclusions
2.3.1.3. Selected publications on morphological statistics
2.3.1.3.1. Czech
2.3.1.3.2. Slovak
2.3.1.3.3. Russian and other Slavonic languages
2.3.1.3.4. Other languages
2.3.1.4. Conclusions
2.3.2. Syntactic statistics
2.3.2.1. The object of syntactic statistics
2.3.2.2. Methods of research
2.3.2.2.1. The unit of population in syntactic statistics
2.3.2.2.2. Sampling of the material in syntactic statistics
2.3.2.3. Selected publications on syntactic statistics
2.3.2.3.1. Czech
2.3.2.3.2. Slovak
2.3.2.3.3. Russian and other Slavonic languages
2.3.2.3.4. Other languages
2.3.2.4. Conclusions
3. Semantic statistics
3.1. The object of semantic statistics
3.2. Methods of research
3.2.1. Unit of population in semantic statistics
3.2.2. Selection of methods and material in semantic statistics
3.3. Selected publications on semantic statistics
3.4. Conclusions

IV. OTHER DOMAINS OF QUANTITATIVE LINGUISTICS
1. Phonological statistics
1.1. The object of phonological statistics
1.2. Methods of research
1.2.1. Unit of population in phonological statistics
1.2.2. Sampling of the material in phonological statistics
1.3. Selected publications on phonological statistics
1.3.1. Czech
1.3.2. Slovak
1.3.3. Russian and other Slavonic languages
1.3.4. Other languages
1.3.5. Conclusions
2. Graphemic statistics
2.1. The object of graphemic statistics
2.2. Methods of research
2.2.1. Unit of population in graphemic statistics
2.2.2. Sampling of the material in graphemic statistics
2.3. Selected publications on graphemic statistics
2.3.1. Czech
2.3.2. Other languages
2.4. Conclusions
3. Stylistic statistics
3.1. The object of stylistic statistics
3.2. Methods of research
3.2.1. Unit of population in stylistic statistics
3.2.2. Sampling of the material in stylistic statistics
3.3. Selected publications on stylistic statistics
3.3.1.1. Czech
3.3.1.2. Slovak
3.3.1.3. Russian and other Slavonic languages
3.3.1.4. Other languages
3.4. Conclusions
4. Typological statistics
4.1. The object of typological statistics
4.2. Methods of research
4.2.1. Unit of population in typological statistics
4.2.2. Sampling of the material in typological statistics
4.3. Selected publications in typological statistics
4.4. Conclusions
5. Statistics concerning the development of language(s)
5.1. The object of quantification
5.2. Research methods
5.3. Selected publications on statistics concerning the development of language(s)
5.4. Conclusions
6. Word-formation statistics

V. THE APPLICATION OF THE RESULTS OF QUANTITATIVE LINGUISTICS
1. Linguistic applications
2. Applications in education
2.1. Teaching the mother tongue
2.2. Foreign language teaching
3. Interdisciplinary applications
4. Technical applications
5. Conclusions

VI. QUANTITATIVE LINGUISTICS AND COMPUTERS

VII. PERSPECTIVES OF QUANTITATIVE LINGUISTICS

Notes
References
List of abbreviations of the analyzed texts and other language materials
List of other abbreviations
Name index
Subject index
