Distant Horizons
Digital Evidence and Literary Change

Ted Underwood

The University of Chicago Press
Chicago and London

The University of Chicago Press, Chicago 60637
The University of Chicago Press, Ltd., London
© 2019 by The University of Chicago
All rights reserved. No part of this book may be used or reproduced in any manner whatsoever without written permission, except in the case of brief quotations in critical articles and reviews. For more information, contact the University of Chicago Press, 1427 East 60th Street, Chicago, IL 60637.
Published 2019
Printed in the United States of America

ISBN-13: 978-0-226-61266-9 (cloth)
ISBN-13: 978-0-226-61283-6 (paper)
ISBN-13: 978-0-226-61297-3 (e-book)
DOI: https://doi.org/10.7208/chicago/9780226612973.001.0001

Library of Congress Cataloging-in-Publication Data
Names: Underwood, Ted, author.
Title: Distant horizons : digital evidence and literary change / Ted Underwood.
Description: Chicago : The University of Chicago Press, 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2018036446 | ISBN 9780226612669 (cloth : alk. paper) | ISBN 9780226612836 (pbk. : alk. paper) | ISBN 9780226612973 (e-book)
Subjects: LCSH: Literature--Research--Methodology. | Digital humanities.
Classification: LCC PN73.U53 2019 | DDC 807.2--dc23
LC record available at https://lccn.loc.gov/2018036446

♾ This paper meets the requirements of ANSI/NISO Z39.48–1992 (Permanence of Paper).

Contents
List of Illustrations
Preface: The Curve of the Literary Horizon
1 Do We Understand the Outlines of Literary History?
2 The Life Spans of Genres
3 The Long Arc of Prestige
4 Metamorphoses of Gender
5 The Risks of Distant Reading
Acknowledgments
Appendix A: Data
Appendix B: Methods
Index

Illustrations

Figures
1.1. Frequency of color terms in a random sample of fiction, 1700–1922
1.2. Frequency of Stanford “hard seeds” as percentage of fiction and biography
1.3. Predictive accuracy of models that attempt to distinguish fiction from biography
1.4. Probability of being fiction or biography
1.5. Mean time narrated in 250 words, 1700–2000
2.1. Probability of being detective fiction
2.2. Probability of being science fiction
2.3. Pace of change in detective fiction and science fiction
3.1. Probability of belonging to reviewed poetry set
3.2. Probability of belonging to reviewed fiction set
3.3. Ratio of best sellers by reviewed authors to those by random authors
3.4. The literary field, 1850–74
3.5. The literary field, 1925–49
4.1. Accuracy of models predicting the gender of characters
4.2. Gendering of felt, read, and got
4.3. Gendering of heart, mind, and passion
4.4. Gendering of room, chamber, and house
4.5. Gendering of laughed, smiled, grinned, and chuckled
4.6. Gendering of eyes, hair, chest, and pocket
4.7. Gender dimorphism in characterization, 1800–2000
4.8. Percentage of words used in characterization that describe women
4.9. Percentage of English-language fiction titles written by women
4.10. Books by women as a percentage of all books
B.1. Logistic regression

Tables
1.1. Categories that distinguish fiction from biography
3.1. Periodicals used to construct a reviewed sample
3.2. Categories that distinguish reviewed and random volumes of poetry
3.3. Categories that distinguish reviewed and random volumes of fiction

Preface: The Curve of the Literary Horizon

This is a book about recent discoveries in literary history. The word “discovery” may sound odd, because the things that matter in literary history are usually arguments, not discoveries. Although lost manuscripts do occasionally turn up in an attic, uncovering new evidence is rarely the main purpose of literary research.
Instead, scholars reinterpret the well-known outlines of the past (Romantic, Victorian, modern) by drawing new connections between texts or by moving something marginal to center stage. Or so I thought ten years ago. Over the past decade, I have gradually lost confidence that the broad outlines of the literary past are as well known as I once thought. As scholars have learned to compare thousands of volumes at a time, we have stumbled onto broad, century-spanning trends that are not described in textbooks and not explained by period concepts. It is becoming clear that we have narrated literary history as a sequence of discrete movements and periods because chunks of that size are about as much of the past as a single person could remember and discuss at one time. Apparently, longer arcs of change have been hidden from us by their sheer scale, just as you can drive across a continent noticing mountains and political boundaries but