STATISTICAL METHODS
by STEFAN SZULC
Formerly Professor of Statistics, University of Warsaw
Translated from the Polish by J. STADLER
Translation edited by H. INFELD, C.D.I. FORRESTER

PERGAMON PRESS
OXFORD · LONDON · EDINBURGH · NEW YORK · PARIS · FRANKFURT
PANSTWOWE WYDAWNICTWO EKONOMICZNE
WARSAW

Pergamon Press Ltd., Headington Hill Hall, Oxford
4 & 5 Fitzroy Square, London W.1
Pergamon Press (Scotland) Ltd., 2 & 3 Teviot Place, Edinburgh 1
Pergamon Press Inc., 122 East 55th St., New York 22, N.Y.
Pergamon Press GmbH, Kaiserstrasse 75, Frankfurt-am-Main

Copyright © 1965 PANSTWOWE WYDAWNICTWO EKONOMICZNE
Library of Congress Catalog Card No. 64-18438
Printed in Poland

INTRODUCTION

1. THE CHIEF TREND OF DEVELOPMENT IN STATISTICS

The presentation of selected problems from the history of statistics is not the main purpose of this book. Rather, it is concerned with certain facts, the knowledge of which helps in understanding modern statistics. The presentation of these facts will be neither exhaustive nor systematic: many important facts and the names of some eminent authors will be omitted. On the other hand, certain facts may be mentioned which would rank low in importance in a systematic exposition. Emphasis will be laid on the presentation of facts in such a way as to stress their relevance and meaning.

a. The collection of numerical data for purposes of government administration. Even in remote antiquity we can find traces of research that, in present terminology, could be called statistical research. In Egypt, Babylon, Persia, ancient China and Israel censuses of population are known to have been taken. In Egypt, with her centralized and bureaucratic state organization, the population census was one of the mainstays of public administration.
Censuses of population and capital assets, primarily for fiscal purposes, were made from the time of the 2nd dynasty, that is, at least 3000 years before our era, regularly every two years, and later annually.¹

In the Bible we find information about several population censuses. Their description is so detailed that we can effectively reconstruct their technique. In the Fourth Book of Moses, "Numbers", there is a description of two censuses of the male population in Israel "from the age of 20 and over, of those who can go to war". These censuses were made in the desert during the migration. In order to make them, special officials were appointed, one for each tribe, and these reported the number of persons according to tribes. The censuses did not apply to the Levites, who were counted separately: all males "one month old and over". Besides this, records were made of all the Levites aged 30-50 working on the Ark of the Covenant. On another occasion all firstborn males one month old and over were counted.² Numerical results of these censuses were given, but they are not at all reliable.

¹ Plinio Fraccaro, Enciclopedia Italiana, vol. 9, Milano-Rome 1931, p. 734.
² "Numbers", chapters 1, 3, 4, 26.

In the Second Book of Samuel there is a description of a population census which was made at the time of King David. The census applied to "strong, sword-wielding men". On the order of the King it was made by the Military Commander, Joab, who visited all the territories of Israel and Judah. The census lasted for 9 months and 20 days. The Levites and the Benjaminites were excluded from the census.³

Apart from these censuses we find numerical records which we would call statistical today. For instance, in the First Book of Esdras, Chapter 2, there are statistics on the repatriation from Babylonian captivity at the time of Cyrus and Artaxerxes.
The number of the repatriates is given by tribes, as well as the number of male and female servants, male and female singers and livestock which the repatriates brought with them.

These were not censuses in the modern meaning of this term, but even then the basic features of a census did appear. The objective of this research, and this should be particularly stressed, was the accumulation of factual data necessary for military and fiscal administration, as well as for religious administration, as was the case, for instance, with the Israelites. We therefore come across such research primarily where there is a state organization conscious of its objectives.

The most far-reaching in this respect was the ancient Roman census, whose beginnings date back to the epoch of the Kings, and which later extended over the whole period of the Republic and the beginning of the Empire, about five hundred years altogether. The name of census was attached to fairly regular lists, usually prepared at 5-year intervals, with the names of all Roman citizens and the members of their families, both adults and children, and with a detailed description of fixed property and livestock, farm animals and slaves. The objective of the census was not "statistics", i.e. the numerical presentation of socio-economic relations, but the preparation of a list of citizens, persons who had definite rights and obligations (military service, taxes, etc.). The census contained an abundance of material, so diversified, and reaching so deep into the essence of economic relations, that the extent of the information it provided compares favourably with the most exhaustive and comprehensive statistical studies of modern times. Similar censuses were taken in newly conquered provinces, immediately after their annexation.

There is also fairly abundant material of a statistical nature from the Middle Ages.
In contrast to the Roman census it pertains more to private economy, in line with the feudal State system. I refer, first of all, to the estate inventories of royal families, of princes, feudal overlords and monasteries: quite a few of these records have been preserved intact.⁴

³ The Second Book of Samuel, ch. 24.
⁴ Most detailed instructions on keeping such inventories are contained in the Capitulare de Villis of Charlemagne.

The detailed description of England in the Domesday Book, ordered by William the Conqueror in 1085 and completed in 1086, is an example of a nation-wide survey. The description comprised the property of the king, the clergy, the church institutions, the barons, and so on. It provided an estimate of the value of real estate, the number of plough teams (a team comprised eight oxen), the area of arable land, meadows, woods, pastures, fish ponds and rivers (harnessed by mill dams), water mills and other sources of income, and finally the number of peasants of different categories. For certain areas of England livestock were also included.⁵ This description does not contain the numerical tables characteristic of present-day statistical studies: according to our terminology it is raw statistical material which can be tabulated just as easily as the census material taken in our times.

Very early censuses were also taken in Russia, first by the Tartars (several times, starting from the middle of the thirteenth century) and then by the Grand Dukes of Moscow, Dymitri Donskii (1359-89) and Vasyl Dymitrovitch (1389-1425).⁶

Together with the centralized state, whose activities began to affect man's life to an increasing extent toward the end of the fourteenth century, there appeared a greater need for statistical data covering a greater variety of problems. The beginnings can already be found in the early Italian states, particularly in Venice and Florence.
"The governments of Venice and Florence collected detailed data on the balance of trade, the level of production in each industry, the population and its wealth, tax and customs revenues, the tonnage of ships, education, religious relations and social institutions. They did not confine themselves to collecting detailed information on the internal situation of their countries, but also gathered exhaustive statistical data on the situation in the neighbouring states with which they maintained relations or were at war."⁷ Information and statistical material was collected in Spain during the reign of Philip II, in France at the time of Henry IV (under the direction of Sully) and Louis XIV (under Colbert), in Russia during the reign of Peter I, and in Prussia during the reigns of Frederick William I and Frederick II. Not all information was numerical: there were verbal descriptions, and so it was not always what is called statistics today. The use of numerical data, however, was on the increase and they were becoming more diversified. At the end of the eighteenth century they pertained to population, professions and trades, agriculture, handicrafts, commerce, taxes, church relations, etc.

⁵ Encyclopaedia Britannica, vol. VII, 1947, pp. 514-5.
⁶ Bolshaya Sovyetskaya Entseclopedya, vol. 45, 1940.
⁷ S. Inglot, Historia spoleczna i gospodarcza sredniowiecza (Social and Economic History of the Middle Ages), 2nd ed., Wroclaw, 1949, p. 255.

The French Revolution and the growth of State organization in the post-revolutionary period intensified the need for numerical data in a growing number of fields. At the beginning of the nineteenth century the first statistical bureaux were set up with the special and exclusive task of collecting numerical data on the state and society. Such bureaux existed in France, Italy, Spain, in some German States (particularly in Bavaria and Prussia), and also in Poland (the Duchy of Warsaw in 1810).
Not all of these bureaux survived long in their original form, but a start had been made. During the nineteenth century the activities of government statistical offices increased, and so did the number of statistical publications. At first the information gathered was regarded as a government secret: the government did not feel obliged to publish it. If any statistical information leaked out and became public knowledge, it was through the studies of individual scholars who had permission to use official statistics. In the eighteenth century, and at the beginning of the nineteenth century, there were many such individual studies, some of them very valuable. With the passage of time the views on this matter changed: it was recognized that statistical offices should not only collect and process statistical data, but should also make them available to the public, so as to make possible the formation of independent, individual opinions and scientific research. The publications of statistical offices increased in number and gradually formed whole libraries. For instance, the publications of the Central Statistical Office of the Republic of Poland between 1919 and 1939 exceeded 300 volumes, some of them very large, in spite of rather modest financial means.

In this period of the development of statistics, data were collected for administrative purposes. The scope and nature of information changed in the course of history, depending upon the changes in the scope and nature of government activities, but as a rule the objective was always to satisfy the needs of the administration, particularly those of a military and economic nature.

b. The "science" of government. The origin of the term "statistics". The expression "statistics" (German Statistik) was first used in print in the middle of the eighteenth century by Gottfried Achenwall, Professor at the University of Marburg and later in Göttingen.
However, judging from the way the word "statistics" was referred to, it would appear that it had been in use before. Achenwall himself speaks of the "so-called statistics", and in his handwritten notes he mentions that this name "bloomed" ("florebat") in the seventeenth century. The adjective "statisticus" was also already in use in the seventeenth century.

According to Achenwall's definition, statistics is the science of the structure of the state in the broad sense of this word, and the structure of the state is defined as "the totality of essential peculiarities of the state" ("Der Inbegriff der wirklichen Staatsmerkwürdigkeiten eines Reiches oder einer Republik"). To this belongs everything that is related to the welfare of the country in the positive or negative sense, i.e. physiographical conditions, population, the system of government, the social system, legislature, economy, etc. As we shall see later, the word statistics was used here to denote a subject which is very remote from what we call statistics today.

The subject denoted by this term, the science of "the peculiarities of the State", or the science of the State, or more exactly, the Science of States ("Staatskunde"), has evolved from the same needs of modern government administration that led to the collection of statistical data. A strong and active State, conscious of its objectives, must know its resources as well as the means and resources of other states.

Achenwall was not by any means the first to make an attempt at a comparative systematization of this kind of information for a number of countries. The beginnings should perhaps be sought in classical antiquity, in the lost work by Aristotle, Politeiai, which contained a description of the Greek states in the author's time. As direct antecedents of Achenwall, numerous Italian, French and Dutch authors from the end of the sixteenth century and from the seventeenth century should be mentioned.
The more important of them are the Italians Francesco Sansovino (1562) and Giovanni Botero (1589), the Frenchman Pierre d'Avity (1614), the Dutchman Jan de Laet and his coworkers (1624-[end year illegible]),⁸ then the German professors whose lectures, partly published, pertained to the same subject; Herman Conring, professor at Helmstedt, is considered the most outstanding of them. He was the first, in the seventeenth century, to introduce the science of the State to the university curriculum.

The works of these professors were primarily of a descriptive nature. Numerical material was not yet available. Later, the authors used figures to a limited extent. In principle, however, the idea of presenting the phenomena studied in figures had not yet been conceived at that time. On the other hand, it should be realized that the "Science of the State" was in fact a collection of useful information on different subjects, rather than a science with a defined subject. No wonder, therefore, that with the development of knowledge this science was divided into a number of separate subjects, such as economics, state and administrative law, geography, etc.

After Achenwall, many authors published their works in this field; some of them tried to introduce methodically new points of view (as did Anton Friedrich Büsching and August Ludwig von Schlözer). From the beginning of the nineteenth century one can only talk of epigones. However, even in the middle of the century there still appeared quite numerous monographs of a descriptively numerical character, in Polish as well, e.g. "The geographical-statistical outline", or "The historical-statistical outline" of a given country, or of a part of it. Mention should be made here of a small study by Staszic, On Poland's Statistics: a short collection of information needed by those who want to liberate this country and those who want to rule it (Warsaw, 1807).
In the title itself there is a reference to the "Science of the State" as a collection of information needed by statesmen. "Statistics" in this sense and under this name survived longest in Austrian school text-books comprising a collection of information on the system of government, geography, law, economics, etc. In Austria this subject "statistics" was included in civil service examinations. (Its contents largely correspond to those of what is called in Poland "The science of Poland and the contemporary world").

⁸ The figures in brackets denote the dates of publication of principal works on the Science of the State by the authors mentioned above.

On the whole the descriptive "Science of the State" does not actually bring anything essential to the history of statistics, except the very name "statistics". But just here something strange happened: the branch of science for which this name had been created disappeared, while the name "statistics" itself remained, to denote something quite different.

Even the old authors, particularly Conring, emphasized that the information about the state of the country should be concrete. There are certain problems, however, with regard to which concrete information can be provided only in figures. The amount of numerical material increased greatly during the eighteenth century: authors writing on the Science of the State provided more numerical tables to illustrate their points. There even developed a special trend among the "Scientists of the State" to limit the description of a state to the table form. These tables, incidentally, contained not only numbers, but also, placed in appropriate panels, descriptive information, for example, of the political system. The originator of this method was the Russian J. K. Kurillov and, perhaps independently, the Dane J. P. Anchersen, the author of Descriptio Statuum Cultiorum in Tabulis (Copenhagen and Leipzig, 1741).
Towards the end of the eighteenth century this method of presenting facts found many imitators, to the great dismay of the believers in descriptive statistics of the Achenwall variety, who considered that this soulless method of presentation deprived statistics of important qualities. At the same time, it became fashionable to collect, for one's own pleasure, numerical data on different subjects and from different sources. This led to the practice of using the term statistics for mere collections of figures. Finally, the name statistics began to be applied to collections of figures gathered by the government for administrative purposes. In this way the meaning of the term broadened and almost imperceptibly began to denote something that was only loosely related to the old meaning of statistics. Nor did the evolution stop here.

c. Graunt and the political arithmeticians. Statistical method. John Graunt's work was published in London in 1662. Its abbreviated title was Natural and Political Observations Made upon the Bills of Mortality... of the City of London. This date can be truly regarded as the date of the birth of statistics in the modern sense: as a specific method of numerical analysis of a certain type of phenomena.

Graunt (1620-74) was a merchant; his hobby was collecting; his knowledge was acquired by self-education. Graunt's Observations were presented to the Society of Philosophers (later the Royal Society), the most important learned Society in London; this resulted in Graunt's election to membership of the Society. His activity in the Society, however, was negligible and short-lived. Apart from the Observations and a small essay on the propagation and growth of carp and salmon presented to the Society of Philosophers, no other works by Graunt are known.

The Observations are based on records of births and deaths.
Records of baptisms and funerals, and sometimes also of marriages, were kept in various European cities, including London, as early as the sixteenth century. In London they had been published since the beginning of the seventeenth century as a weekly called Bills of Mortality. At first only the number of deaths was given (deaths due to the plague were shown separately), but later data were introduced on baptisms, causes of death, and the sex of the deceased and baptised. Graunt had at his disposal a sizeable collection comprising several dozen annual volumes, and he proved that out of these "meager" Bills, as he called them, varied and very important scientific conclusions could be drawn.

His conclusions were of different kinds. He determined the population of London, the number of men and women, married persons, fertile women, men capable of bearing arms, etc. He was surprised to find that there was an almost perfect balance between the sexes of new-born babies, with a small surplus of boys over girls (in London 14:13, more or less) maintained on the same level in spite of the apparently random nature of this phenomenon. The author arrived at the frequencies of different causes of death, and particularly the low probability of death from certain diseases particularly feared by the people. Graunt discovered that in London the number of deaths exceeded the number of births, and that in spite of this the population of the city increased because of the influx from outside. He described how the plague decimated the population of the city (e.g. in 1625, one of the worst periods, only 6,983 baptisms were recorded, as compared with 54,265 deaths, of which 35,417 were caused by the plague).

The population of London was at that time usually estimated as at least one million. Graunt arrived at a different figure by using the following method:
(1) Assuming that the number of child-bearing women was twice as high as the number of births (i.e.
that every married woman bore a child every two years), and knowing that the annual number of births was 12,000, he arrived at the figure of 24,000 child-bearing women and 24,000 × 2 = 48,000 families. (The other 24,000 were families in which there were no married women of child-bearing age.) Considering a family to consist of 8 persons (the parents, 3 children and 3 servants) he arrived at the figure of 384,000 persons.
(2) According to the data from several parishes there were 3 deaths per 11 families per annum; at the average rate of 13,000 deaths (in the years in which there was no plague) this also gave approximately 48,000 families.
(3) On the basis of the map of the centre of London, Graunt estimated that within the city walls there were 11,880 houses (and as many families). Since the number of deaths in all London was four times as high as the number of deaths within the city walls, he again arrived at the figure of 11,880 × 4 = 47,520 houses (families), which is almost identical with the figures previously arrived at.

The results achieved in this way may not have been, and probably were not, very accurate, but the figure 384,000 showed the order of magnitude of the population of the city, whereas the statements of his contemporaries about one or several million were sheer fantasy.

To realise the pioneering importance of Graunt's reasoning it is necessary to recapture the atmosphere of a time when statistical data were met for the first time. The author found "much pleasure in drawing so many profound and unexpected conclusions" from these "meager and despised Bills". He rightly considered it as his merit that he "reduced a great number of complex volumes to a few, clear tables, and formulated in a few, concise paragraphs, without elaborate arguments, the conclusions that naturally follow from them."
Graunt showed that a penetrating analysis of figures permits one to discover many essential and important things which are hidden from the eyes of people who do not know how to approach a problem in the right way. His contribution in this respect cannot be denied in spite of the fact that many of his conclusions turned out to be wrong in the light of later studies based on more detailed material. Graunt was also the first to understand that there is a regularity, an "order", in the nature of the laws governing phenomena, that can be discovered if large numbers of them are studied. Thus Graunt laid the foundations for what we call today statistical methods.

We have mentioned here the works of Graunt but not of William Petty (1623-87), the most prominent of the political arithmeticians, who, like Graunt, presented social and economic relations on the basis of numerical estimates, but whose main contribution lies in a different field. Petty was primarily an economist: according to Marx, the first of the scientists to create classical political economy. Petty often used figures to prove his statements, but as far as the depth of statistical analysis is concerned, as well as a critical approach to numerical data, he comes second to Graunt.⁹

⁹ Extensive information on the history of statistics in earlier times can be found in V. John's Geschichte der Statistik, pt. 1, Von dem Ursprung der Statistik bis auf Quetelet (1835), Stuttgart 1884, p. 376 (pt. 2 has never been published), and M. Ptucha, Otcherky po istoree statistikee 17-18 viekov, Ogiz, 1945, p. 352; the latter also provides information on the development of statistics in Russia up to the end of the eighteenth century. A short chapter on the development of statistics in pre-revolutionary Russia can be found in the textbook by S. S. Ostroumov, entitled Soudyebnaya Statistika, Tchast Obshtchaya, Moscow, 1949, p. 239. The author points out the independent achievements of Russian statistical thought, and the great practical importance of Russian statistics.

d. The theory of probability and statistics. The theory of probability is a branch of higher mathematics. It was first applied, as early as the seventeenth century, to