
Intelligent Document Processing with AWS AI/ML: A comprehensive guide to building IDP pipelines with applications across industries

246 Pages·2022·23.948 MB·English


Intelligent Document Processing with AWS AI/ML
A comprehensive guide to building IDP pipelines with applications across industries

Sonali Sahu

BIRMINGHAM-MUMBAI

Intelligent Document Processing with AWS AI/ML

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s) nor Packt Publishing or its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria
Content Development Editor: Priyanka Soam
Technical Editor: Sweety Pagaria
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Joshua Misquitta
Marketing Coordinator: Shifa Ansari

First published: October 2022
Production reference: 1300922

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-80181-056-2
www.packt.com

For my parents, for always loving and believing in me. For all the women who can do whatever they want, when they want – or will one day.

Contributors

About the authors

Sonali Sahu is a leading Intelligent Document Processing Artificial Intelligence (AI) and Machine Learning (ML) solutions architect on the team at Amazon Web Services. She is a passionate technophile and enjoys working with customers to solve complex problems using innovation. Her core area of focus is AI and ML. She has both breadth and depth of experience working with technology, with industry expertise in healthcare and insurance. She has significant architecture and management experience in delivering large-scale programs across various industries and platforms.

About the reviewer

Winnie Tung has over 30 years of experience solving some of the world’s most difficult technical problems in the financial services industry. She is currently modernizing the AI/ML platform at JPMC. Before that, she worked in AWS Professional Services, specializing in developing AI/ML solutions for the real world. She helps customers to operationalize and manage AI/ML solutions at scale.

Table of Contents

Preface

Part 1: Accurate Extraction of Documents and Categorization

1 Intelligent Document Processing with AWS AI and ML
    Understanding common document processing use cases across industries
    Understanding the AWS ML and AI stack
    Introducing Intelligent Document Processing pipeline
        Data capture
        Document classification
        Document extraction
        Document enrichment
        Document post-processing (review and verification)
        Consumption
    Summary
    References

2 Document Capture and Categorization
    Technical requirements
    Signing up for an AWS account
    Understanding data capture with Amazon S3
    Data store
    Data sources
    Sensitive document processing
    Understanding document classification with the Amazon Comprehend custom classifier
    Training a Comprehend custom classification model
    Understanding document categorization with computer vision
    Summary

3 Accurate Document Extraction with Amazon Textract
    Technical requirements
    Understanding the challenges in legacy document extraction
    Using Amazon Textract for the accurate extraction of different types of documents
    Introducing Amazon Textract
    Using Amazon Textract for the accurate extraction of specialized documents
    Accurate extraction of ID document (driver’s license)
    ID document (US passport) accurate extraction
    Receipt document accurate extraction
    Invoice document accurate extraction
    Summary

4 Accurate Extraction with Amazon Comprehend
    Technical requirements
    Using Amazon Comprehend for accurate data extraction
    Understanding document extraction – the IDP extraction stage with Amazon Comprehend
    Understanding custom entities extraction with Amazon Comprehend
    Training an Amazon Comprehend custom entity recognizer
    Checking the performance of a trained model
    Inference result from the Amazon Comprehend custom entity recognizer
    Summary

Part 2: Enrichment of Data and Post-Processing of Data

5 Document Enrichment in Intelligent Document Processing
    Technical requirements
    Understanding document enrichment
    Learning to use Amazon Comprehend Medical for accurate extraction of medical entities
    Amazon Comprehend Medical
    Learning to use Amazon Comprehend Medical for medical ontology
    Summary

6 Review and Verification of Intelligent Document Processing
    Technical requirements
    Learning post-processing for a completeness check
    Post-processing sensitive data
    Learning about the document review process with human-in-the-loop
    Summary
    References

7 Accurate Extraction, and Health Insights with Amazon HealthLake
    Technical requirements
    Introducing Fast Healthcare Interoperability Resources (FHIR)
    Using Amazon HealthLake as a health data store
    FHIR operations with Amazon HealthLake
    READ operation
    HealthLake PUT request
    Handling documents with an FHIR data store
    Summary
    References
