Lecture Notes in Statistics 94
Edited by S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, and N. Wermuth

Jane F. Gentleman and G. A. Whitmore (Editors)
Case Studies in Data Analysis

Springer-Verlag
New York Berlin Heidelberg London Paris Tokyo Hong Kong Barcelona Budapest

Jane F. Gentleman, Health Statistics Division, Statistics Canada, Ottawa, Ontario K1A 0T6, Canada
G. A. Whitmore, Faculty of Management, McGill University, Montreal, Quebec H3A 1G5, Canada

Library of Congress Cataloging-in-Publication Data Available

Printed on acid-free paper.

© 1994 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Camera ready copy provided by the editor.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-387-94410-4 e-ISBN-13: 978-1-4612-2688-8 DOI: 10.1007/978-1-4612-2688-8

Preface

This volume is a collection of eight Case Studies in Data Analysis that appeared in various issues of the Canadian Journal of Statistics (CJS) over a twelve-year period from 1982 to 1993. One follow-up article to Case Study No. 4 is also included in the volume. The CJS's Section on Case Studies in Data Analysis was initiated by a former editor who wanted to increase the analytical content of the journal. We were asked to become Section Co-Editors and to develop a format for the case studies.
Each case study presents analyses of a real data set by two or more analysts or teams of analysts working independently in a simulated consulting context. The section aimed at demonstrating the process of statistical analysis and the possible diversity of approaches and conclusions. For each case study, the Co-Editors found a set of real Canadian data, posed what they thought was an interesting statistical problem, and recruited analysts working in Canada who were willing to tackle it. The published case studies describe the data and the problem, and present and discuss the analysts' solutions. For some case studies, the providers of the data were invited to contribute their own analysis.

Although the section attempted to mirror the data-analytic process of real-life problem solving, the reports are necessarily somewhat artificial because the Co-Editors and the analysts had to work under abnormal constraints. The Co-Editors had to limit the scope and complexity of the data and the problem so that the case study remained manageable. For their part, the analysts had to accept the data as provided without having been able to participate in defining the problem, designing the experiment, or supervising the collection of the data. Finally, limited time and journal space necessarily restricted the analysts' freedom of investigation and reporting. The Co-Editors offered editorial advice to the analysts and sometimes obtained outside technical reviews. But, given the desire to reflect a real-life consulting situation in which refereeing and error correction are not generally performed, the Co-Editors attempted to limit to some extent their criticisms and requests for revision.

The Co-Editors strove to obtain data of varying types and from assorted subject matter areas. The data sets analyzed include data from a large household survey (Case Study No. 7 on child care needs), data gathered in an industrial setting (Case Study No.
1 on equipment failure times), administrative data from the field of criminal statistics (Case Study No. 4 on homicide time trends), medical data (Case Study No. 2 on damage to firefighters' lungs), environmental data (Case Study No. 3 on iceberg paths and Case Study No. 5 on extreme wind speeds), laboratory data emanating from a designed experiment (Case Study No. 8 on mutagenicity of environmental chemicals), and market research data (Case Study No. 6 on beer preferences). The only trimming of the data sets involved sometimes reducing the number of variables in order to confine the problem to a manageable size. To preserve the realism of the analytical situation, no observations were ever removed. Each published case study displays a representative fragment of the data, and in one case, all of the data are presented.

Copies of the case study data were made available to readers, who were invited to submit their own follow-up analyses to the section. Readers of this volume may obtain copies of the case study data at a nominal cost. Data for all case studies except Case Study No. 7 are available upon request from the Statistical Society of Canada, Dunton Tower, 6th Floor, Carleton University, Ottawa, Ontario, Canada K1S 5B6. For Case Study No. 7, Statistics Canada has kindly agreed to make the Family History Survey data tape and documentation available; contact the Housing, Family and Social Statistics Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6.

Case Study No. 8 marks the end of the series. We were honored and pleased that the Statistical Society of Canada chose to recognize the Section in general and Case Study No. 8 in particular by presenting us with the 1993 Canadian Journal of Statistics Award, which annually recognizes a CJS article for excellence, innovation and presentation. The production of this series has been challenging, educational, and enjoyable to us.
We thank the case study analysts and data providers for their participation in this enterprise. We also thank the editors, managing editors and others who have assisted us in publishing these case studies. We especially want to thank Karen Robertson of McGill University who has so competently handled case study manuscripts and extensive correspondence during the term of the series and who has painstakingly assembled the manuscript of this book in camera-ready form. We hope that readers of this collection will find the case studies interesting and instructive and that the analyses and data will prove to be useful for teaching purposes.

Jane F. Gentleman, Statistics Canada
G. A. Whitmore, McGill University

CONTENTS

Preface
Reference Information for Case Studies
Measuring the Impact of an Intervention on Equipment Lives
Measurements of Possible Lung Damage to Firefighters at the Mississauga Train Derailment
Iceberg Paths and Collision Risks for Fixed Marine Structures
Temporal Patterns in Twenty Years of Canadian Homicides
Extreme-value Analysis of Canadian Wind Speeds
Beer Chemistry and Canadians' Beer Preferences
Estimation of the Need for Child Care in Canada
Estimation of the Mutagen Potency of Environmental Chemicals Using Short-term Bioassay

Reference Information for Case Studies

The following list presents the original journal reference information for the case studies in this collection.

Case Study No. 1. G. A. Whitmore and J. F. Gentleman: "Measuring the impact of an intervention on equipment lives," in Case Studies in Data Analysis (with contributed analyses by J. D. Kalbfleisch and C. A. Struthers and by D. C. Thomas), Canadian Journal of Statistics, 10, 1982, pp. 237-259.

Case Study No. 2. J. F. Gentleman and G. A. Whitmore: "Measurement of possible lung damage to firefighters at the Mississauga train derailment," in Case Studies in Data Analysis (with contributed analyses by R. Kusiak and J. Roos and by K. J.
Worsley), Canadian Journal of Statistics, 12, 1984, pp. 7-25.

Case Study No. 3. G. A. Whitmore and J. F. Gentleman: "Iceberg paths and collision risks for fixed marine structures," in Case Studies in Data Analysis (with contributed analyses by M. Moore and F. Zwiers), Canadian Journal of Statistics, 13, 1985, pp. 83-108.

Case Study No. 4. J. F. Gentleman and G. A. Whitmore: "Temporal patterns in twenty years of Canadian homicides," in Case Studies in Data Analysis (with contributed analyses by C. McKie, by A. I. McLeod, I. B. MacNeill and J. D. Bhattacharyya, and by A. Nakamura and M. Nakamura), Canadian Journal of Statistics, 13, 1985, pp. 261-291.

Follow-up article to Case Study No. 4. E. B. Dagum, G. Huot and M. Morry: "A new look at an old problem: Finding temporal patterns in homicide series. A Canadian problem," in Case Studies in Data Analysis (with contributed discussion by A. I. McLeod and I. B. MacNeill and by A. Nakamura and M. Nakamura), Canadian Journal of Statistics, 16, 1988, pp. 117-134.

Case Study No. 5. G. A. Whitmore and J. F. Gentleman: "Extreme-value analysis of Canadian wind speeds," in Case Studies in Data Analysis (with contributed analyses by F. W. Zwiers and W. H. Ross), Canadian Journal of Statistics, 15, 1987, pp. 311-337.

Case Study No. 6. G. A. Whitmore and J. F. Gentleman: "Beer chemistry and Canadians' beer preferences," in Case Studies in Data Analysis (with contributed analyses by J.-P. Carmichael, G. Daigle and L.-P. Rivest and by B. Li and A. J. Petkau), Canadian Journal of Statistics, 18, 1990, pp. 93-125.

Case Study No. 7. J. F. Gentleman and G. A. Whitmore: "Estimation of the need for child care in Canada," in Case Studies in Data Analysis (with contributed analyses by E. M. Gee and J. B. McDaniel and by C. A. Struthers), Canadian Journal of Statistics, 19, 1991, pp. 241-282.

Case Study No. 8. J. F. Gentleman and G. A. Whitmore: "Mutagenic potency of environmental chemicals," in Case Studies in Data Analysis (with contributed analyses by G.
A. Darlington, by B. J. Eastwood and by B. G. Leroux and D. Krewski), Canadian Journal of Statistics, 21, 1993, pp. 421-465.

Measuring the Impact of an Intervention on Equipment Lives

Key words and phrases: Equipment-failure data, survival analysis, intervention analysis, Cox regression, Poisson process, legal proceeding.

AMS 1980 subject classifications: Primary 62N05; secondary 62J99, 62M99.

ABSTRACT

A 1967 strike at a Quebec aluminum smelter resulted in the uncontrolled shutdown of aluminum-reduction cells in the smelter's potrooms. In a subsequent legal action against the union, which was before the courts for more than a decade, the company claimed that the shutdown had reduced the operating lives of the hundreds of cells in service at the time. This study describes the background and outcome of the court case and presents the data used by expert witnesses to argue for and against the company's claim. Our analysts independently examine the data and arrive at their own conclusions.

1. STUDY DESCRIPTION

1.1. The Setting.

Aluminum is produced in an electrolytic cell operating at a very high temperature. The anode is a carbon electrode. The cathode is a carbon-lined crucible which contains the molten aluminum and electrolyte. Steel stubs and bars embedded in the anode and in the cathode lining act as conductors for the cell. These aluminum-reduction cells (or pots, as they are sometimes called) are arranged in series in smelter potrooms. A moderate-size smelter may have several hundred cells in operation at one time. These cells generally have operating lives measured in years, but eventually fail because of distortion of the crucible or because of cracking of the pot lining, which leads to loss of contents or iron contamination of the molten aluminum.
In 1967, at the present Canadian Reynolds Metals Company smelter in Baie Comeau, Quebec, electric power to the potrooms was cut during a labour dispute, resulting in an uncontrolled shutdown of the cells. The consequent cooling of cell contents and the subsequent difficulties of restarting the cells after the disruption ended were believed by the company to have damaged all or some of the cells in service at the time of the shutdown. The shutdown eventually led to a legal action by the Company against the smelter workers' union (Confédération des Syndicats Nationaux) and others to recover costs associated with lost production and damage to equipment. The case of Société Canadienne de Métaux Reynolds Limitée v. Confédération des Syndicats Nationaux et Autres was heard in Quebec Superior Court before Judge Vincent Masson. The legal proceedings were complex and lengthy: a decision was finally rendered on 6 February 1979, after the case had been before the court for more than a decade. This statistical study is concerned with one issue which arose in the proceedings in 1977, namely, the statistical estimation of the total loss of operating life, if any, for the several hundred cells in service at the time of the uncontrolled shutdown.

1.2. The Data.

The data for this study were extracted from Court Exhibit D-9 and are found in Tables A-1 and A-2 of the Appendix. Table A-1 contains failure data for 499 cells, of which 349 were in circuit at the time of the shutdown (subsequently referred to as the intervention). The cells vary in both design and time in circuit. Consider first the matter of design. Because of the economic importance of extending the operating life of an aluminum-reduction cell, there is continual experimentation with new designs and with variations of established designs. New designs or design variations are tested in groups of cells which are installed in the normal course of replacing failed cells.
Design groups in this case vary in size from a handful to a few dozen cells. The standard cell design in use at the time of the intervention is denoted in Table A-1(a) by the generic label A. Within type-A cells, however, there are twenty minor design variants which are labelled A1, A2, ..., A20. Table A-1(a) contains data for 395 cells of the standard design, of which 297 were in service at the time of the intervention. The data in Table A-1(b) refer to experimental cells. Labels B, C, ..., K identify experimental groups of cells having distinct and largely untried design features. Operating experience with all of the cells in a design group is relevant for estimating the remaining life of cells in the group which were still in service at the intervention. Thus, Table A-1 contains data for all of the cells of each design group represented by one or more cells in service at the time of the intervention.

Specifically, for each cell, the table shows its failure age (in days) and its age at intervention (in days). Note, therefore, that any cell for which the age at intervention exceeds the failure age did not experience the intervention. For example, refer to design group A1 in Table A-1(a). This first variant of the standard design consists of 20 cells, of which the first 17 failed before the intervention and the last 3 were still in service at the time of the intervention. The first cell, for instance, failed at age 468 days and would have been 2236 days old if it had survived until the day of the intervention. In contrast, the last cell failed at age 2541 days, 254 days after the intervention. On the day of the intervention, its age was 2287 days. The cells in each group are ordered by failure age. The failure ages given in the tables exclude the days during which cells were out of service during the shutdown.
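
The rule for reading Table A-1 can be illustrated with a minimal Python sketch. The class and field names are hypothetical and are not part of the original study; the two records are the first and last cells of design group A1 quoted above. Treating a cell whose failure age equals its age at intervention as having experienced the intervention is an assumption, since the text addresses only the case in which the age at intervention exceeds the failure age.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One row of Table A-1 (hypothetical field names); ages are in days."""
    group: str
    failure_age: int
    age_at_intervention: int

    @property
    def experienced_intervention(self) -> bool:
        # A cell whose age at intervention exceeds its failure age had
        # already failed, so it did not experience the intervention.
        # The equality case is counted as "experienced" (an assumption).
        return self.age_at_intervention <= self.failure_age

    @property
    def days_after_intervention(self) -> Optional[int]:
        # Operating days between the intervention and failure, or None
        # for cells that failed before the intervention.
        if not self.experienced_intervention:
            return None
        return self.failure_age - self.age_at_intervention


# First and last cells of design group A1, as quoted in the text.
cells = [
    Cell(group="A1", failure_age=468, age_at_intervention=2236),
    Cell(group="A1", failure_age=2541, age_at_intervention=2287),
]

for cell in cells:
    print(cell.group, cell.failure_age, cell.age_at_intervention,
          cell.experienced_intervention, cell.days_after_intervention)
```

For the last A1 cell the sketch computes 2541 - 2287 = 254 days, which agrees with the figure given in the text. Because the tabulated failure ages exclude days out of service during the shutdown, this difference counts operating days only.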