
CLARIN: The Infrastructure for Language Resources PDF

821 Pages · 2022 · 58.921 MB

Preview CLARIN: The Infrastructure for Language Resources

CLARIN

Digital Linguistics
Edited by Andreas Witt
Volume 1

CLARIN
The Infrastructure for Language Resources
Edited by Darja Fišer and Andreas Witt

ISBN 978-3-11-076734-6
e-ISBN (PDF) 978-3-11-076737-7
e-ISBN (EPUB) 978-3-11-076740-7
ISSN 2751-1278
DOI https://doi.org/10.1515/9783110767377

This work is licensed under the Creative Commons Attribution 4.0 International License. For details go to https://creativecommons.org/licenses/by/4.0/.

Creative Commons license terms for re-use do not apply to any content (such as graphs, figures, photos, excerpts, etc.) not original to the Open Access publication and further permission may be required from the rights holder. The obligation to research and clear permission lies solely with the party re-using the material.

Library of Congress Control Number: 2022940325

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the internet at http://dnb.dnb.de.

© 2022 with the author(s), editing © 2022 Darja Fišer and Andreas Witt, published by Walter de Gruyter GmbH, Berlin/Boston
This book is published open access at www.degruyter.com.

Cover image: piranka/E+/Getty Images
Typesetting: Integra Software Services Pvt. Ltd.
Printing and binding: CPI books GmbH, Leck
www.degruyter.com

Preface

During the first decade of its existence, the CLARIN research infrastructure for language resources and technology has made great strides in creating and maintaining an infrastructure to support the sharing, use and sustainability of language data and tools for research in the humanities and social sciences. It has grown into a network of 25 member and observer third-party countries with 70 CLARIN centres, over 900,000 records in its repositories and an immeasurable number of contributors, users, and trainers.

As CLARIN transitions from the phase of conception and development to the phase of stable growth, CLARIN’s explicit and implicit institutional memory is invaluable not only for all types of the current and future members of CLARIN’s network but also for the educational institutions, funding bodies, policy makers, and fellow research infrastructures. While CLARIN’s achievements have been individually documented in numerous workshop, conference and journal articles, they have never been collected and presented in a comprehensive, single volume, which was the main motivation behind the call for contributions for this book.

Our primary aim was to offer a volume that will be useful for researchers and lecturers in various fields of humanities and social science, such as linguistics, digital humanities, literary studies, history, media studies, communication studies, and political science. Moreover, as CLARIN is one of the first ERICs set up by the European Commission, we also wanted to make it relevant for everyone interested in EU Research and Development policy.

In November 2020 we published a call for contributions documenting CLARIN’s organization and its members, its goals and its functioning, the tools and resources hosted by the CLARIN infrastructure as well as prominent use cases and success stories. The response has far exceeded our expectations, with 31 submissions by 109 authors from all corners of the CLARIN network, which were then carefully reviewed by the editors. The process, which was completed in September 2022, resulted in an impressive volume of ca. 800 pages that is organized into 4 parts: Introduction to CLARIN, CLARIN Technical infrastructure, CLARIN Knowledge infrastructure and Research driven by CLARIN.

We are especially proud that we are able to present a rich body of work that not only describes how CLARIN is built and what it offers but also lets readers hear directly from researchers with highly diverse profiles and research interests whose work has benefitted from the infrastructure.

The editors would like to thank everyone who has contributed to the success of this volume, which, because of the Covid-19 pandemic, required extra flexibility and dedication: the authors of the chapters for their inspiring contributions, the technical editors for copyediting and CLARIN ERIC for their support with making the book openly accessible. In particular, the editors would like to thank Paweł Kamocki for his support throughout the editing process, and Jennifer Ecker, whose role in handling the communication with the authors and with the publisher cannot be overestimated. The editors accept full responsibility for all mistakes and shortcomings in this volume.

Open Access. © 2022 the author(s), published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License. https://doi.org/10.1515/9783110767377-202

Darja Fišer & Andreas Witt

Contents

Preface

Part I: CLARIN: An Introduction of the ERIC

Steven Krauwer and Bente Maegaard
  CLARIN – How It Started

Franciska de Jong, Dieter Van Uytvanck, Francesca Frontini, Antal van den Bosch, Darja Fišer, and Andreas Witt
  Language Matters

Part II: Technical Infrastructure

Jan Hajič, Eva Hajičová, Barbora Hladká, Jozef Mišutka, Ondřej Košarko, and Pavel Straňák
  LINDAT/CLARIAH-CZ: Where We Are and Where We Go

Claus Zinn and Emanuel Dima
  The CLARIN Language Resource Switchboard

Luís Gomes, Ruben Branco, João Silva, and António Branco
  Open and Inclusive Language Processing

Daan Broeder and Jan Odijk
  Sustainability and Genericity of CLARIN Services in the Netherlands

Marc Kupietz, Nils Diewald, and Eliza Margaretha
  Building Paths to Corpus Data

Menzo Windhouwer and Twan Goosen
  Component Metadata Infrastructure

Martina Trognitz, Matej Ďurčo, and Karlheinz Mörth
  Text Technology for the Digital Humanities

Gisle Andersen and Peder Gammeltoft
  The Role of CLARIN in Advancing Terminology: The Case of Termportalen – the National Terminology Portal for Norway

Christoph Draxler, Alexander Geyken, Erhard Hinrichs, Annette Klosa-Kückelhaus, Elke Teich, and Thorsten Trippel
  How to Connect Language Resources, Infrastructures, and Communities

Piotr Bański and Hanna Hedeland
  Standards in CLARIN

Part III: Knowledge Infrastructure

Jakob Lenardič and Darja Fišer
  The CLARIN Resource and Tool Families

Henk van den Heuvel, Nelleke Oostdijk, Caroline Rowland, and Paul Trilsbeek
  The CLARIN Knowledge Centre for Atypical Communication Expertise

Tanja Wissik, Leon Wessels, and Frank Fischer
  The DH Course Registry: A Piece of the Puzzle in CLARIN’s Technical and Knowledge Infrastructure

Martin Hennelly, Langa Khumalo, Juan Steyn, and Menno van Zaanen
  Training of Digital Language Resources Skills in South Africa

Nikola Ljubešić, Tomaž Erjavec, Maja Miličević Petrović, and Tanja Samardžić
  Together We Are Stronger: Bootstrapping Language Technology Infrastructure for South Slavic Languages with CLARIN.SI

Paweł Kamocki, Aleksei Kelli, and Krister Lindén
  The CLARIN Committee for Legal and Ethical Issues and the Normative Layer of the CLARIN Infrastructure

Krister Lindén, Tommi Jauhiainen, Mietta Lennes, Mikko Kurimo, Aleksi Rossi, Tommi Kurki, and Olli Pitkänen
  Donate Speech

Rūta Petrauskaitė, Darius Amilevičius, Virginijus Dadurkevičius, Tomas Krilavičius, Gailius Raškinis, Andrius Utka, and Jurgita Vaičenonienė
  CLARIN-LT: Home for Lithuanian Language Resources

Margunn Rauset, Gyri Smørdal Losnegaard, Helge Dyvik, Paul Meurer, Rune Kyrkjebø, and Koenraad De Smedt
  Words, Words!

Eva Pettersson and Lars Borin
  Swedish Diachronic Corpus

Part IV: Research Driven by Infrastructure

João Silva, Sara Grilo, Márcia Bolrinha, Rodrigo Santos, Luís Gomes, António Branco, and Rui Vaz
  Where do I Belong in Six Centuries of Literature?

Eva Hajičová, Jan Hajič, Barbora Hladká, Jiří Mírovský, Lucie Poláková, Kateřina Rysová, Magdaléna Rysová, Pavel Straňák, Barbora Štěpánková, and Šárka Zikánová
  Corpus Annotation as a Feasible and Scientifically Beneficial Task

Silvia Calamai, Duccio Piccardi, Niccolò Pretto, Giovanni Candeo, Maria Francesca Stamuli, and Monica Monachini
  Not Just Paper: Enhancement of Archive Cultural Heritage

Anna Lindahl and Stian Rødven-Eide
  Argumentative Language Resources at Språkbanken Text

Jack Hoeksema, Kees de Glopper, and Gertjan van Noord
  Syntactic Profiles in Secondary School Writing Using PaQu and SPOD

Jan Odijk
  CLARIN’s Support for Research into the Acquisition of Lexical Properties

Riccardo Pozzo, Timon Gatta, Hansmichael Hohenegger, Jonas Kuhn, Axel Pichler, Marco Turchi, and Josef van Genabith
  Aligning Immanuel Kant’s Work and its Translations
