Data Analytics VCE Units 3&4 © 2019 Cengage Learning Australia Pty Limited 5th Edition Gary Bass Copyright Notice Natalie Heath This Work is copyright. No part of this Work may be reproduced, stored in a Therese Keane retrieval system, or transmitted in any form or by any means without prior Anthony Sullivan written permission of the Publisher. Except as permitted under the Mark Kelly Copyright Act 1968, for example any fair dealing for the purposes of private study, research, criticism or review, subject to certain limitations. These limitations include: Restricting the copying to a maximum of one chapter or 10% of this Senior publisher: Eleanor Gregory book, whichever is greater; providing an appropriate notice and warning with the Editor: Scott Vandervalk copies of the Work disseminated; taking all reasonable steps to limit access to Proofreader: Nadine Anderson these copies to people authorised to receive these copies; ensuring you hold the Indexer: Bruce Gillespie appropriate Licences issued by the Copyright Agency Limited (“CAL”), supply a Visual designer: James Steer remuneration notice to CAL and pay any required fees. For details of CAL licences Cover design: Chris Starr, MakeWork and remuneration notices please contact CAL at Level 11, 66 Goulburn Street, Text design: Leigh Ashforth, Watershed Art & Design Sydney NSW 2000, Tel: (02) 9394 7600, Fax: (02) 9394 7601 Permissions researcher: Lyahna Spencer Email: [email protected] Production controller: Karen Young Website: www.copyright.com.au Typeset by: DiacriTech For product information and technology assistance, Any URLs contained in this publication were checked for currency during the in Australia call 1300 790 853; production process. Note, however, that the publisher cannot vouch for the in New Zealand call 0800 449 725 ongoing currency of URLs. For permission to use material from this text or product, please email [email protected] Acknowledgements Extracts from the VCE Applied Computing Study Design (2020–2023), are ISBN 978 0 17 044087 5 reproduced by permission, © VCAA. VCE is a registered trademark of the VCAA. The VCAA does not endorse or make any warranties regarding this study Cengage Learning Australia resource. Current VCE Study Designs, past VCE exams and related content can be Level 7, 80 Dorcas Street accessed directly at www.vcaa.vic.edu.au South Melbourne, Victoria Australia 3205 Cengage Learning New Zealand Unit 4B Rosedale Office Park 331 Rosedale Road, Albany, North Shore 0632, NZ For learning solutions, visit cengage.com.au Printed in Singapore by 1010 Printing International Limited. 1 2 3 4 5 6 7 23 22 21 20 19 Contents Preface v About the authors vi How to use this book vii Outcomes ix Problem-solving methodology xiii Key concepts xvi 3 Unit Introduction 1 Chapter 3 Project management and data analysis 113 Chapter 1 Data and presentation 2 What is data? 114 Project management 114 What is data? 3 Why must we begin with a research question? 124 Referencing data sources 14 Primary and secondary data 128 Data types 16 Quantitative and qualitative data 129 Data structures 19 Data types and data structures 136 Data integrity 20 Referencing data sources 137 Data validation 24 Data integrity 139 Data visualisation 25 Data security 152 Designing solutions 37 Next steps 158 Design principles 37 Formats and conventions 39 Design tools for data visualisation 46 Chapter 4 Drawing conclusions 165 Continuing Unit 3, Outcome 2 166 Chapter 2 Data manipulation and Solution specifications 166 presentation 57 Design principles 171 Generating design ideas 176 Databases 58 Design tools 187 Creating a database structure 59 Types of infographics and data visualisations 190 Spreadsheet tools 84 Preparing for Unit 3, Outcome 2 204 Data visualisation 89 Testing 96 Review the Outcome’s steps 106 Preparing for Unit 3, Outcome 1 112 iii 9780170440875 4 Unit Introduction 207 Chapter 6 Information management 262 Networks 263 Chapter 5 Development and evaluation 208 Threats to data and information 272 Physical and software security controls 275 Data visualisations 209 Managing data on a network 284 Procedures and techniques for managing files 214 Network attached storage and cloud computing 289 Functional capabilities of software 220 An e�ective infographic or data visualisation 221 Manipulating data 234 Chapter 7 Cyber security measures 297 Formats and conventions 237 The importance of data and information to organisations 298 Manipulating data with software 238 Information management strategies 301 Verification and validation 243 Data security 302 Testing 244 The importance of diminished data integrity in Evaluating your solution 248 information systems 303 Documenting the progress of projects 250 Key legislation relating to data and information 304 Assessing your project plan 254 Ethics and security practices 314 Next steps 255 Resolving legal and ethical tensions 317 Preparing for Unit 4, Outcome 1 261 Reasons to prepare for disaster 318 Consequences of security failure 328 Evaluating information management strategies 331 Preparing for Unit 4, Outcome 2 338 Index 339 iv CONTENTS 9780170440875 Preface This fifth edition of Data Analytics VCE Units 3 & 4 incorporates the changes to the VCAA VCE Applied Computing Study Design that took e�ect from 2020. This textbook looks at how individuals and organisations use, and can be a�ected by, information systems in their daily lives. We believe that teachers and students require a text that focuses on the Areas of Study specified in the Study Design and which presents information in a sequence that allows easy transition from theory into practical assessment tasks. We have, therefore, written this textbook so that a class can begin at Chapter 1 and work their way systematically through to the end. Students will encounter material relating to the key knowledge dot points for each Outcome before they reach the special section that describes the Outcome. The Study Design outlines key skills that indicate how the knowledge can be applied to produce a solution to an information problem. These Outcome preparation sections occur regularly throughout the textbook and flag an appropriate point in the student’s development for each Outcome to be completed. The authors have covered all key knowledge for the Outcomes from the Data Analytics VCE Units 3 & 4 course. Our approach has been to focus on the key knowledge required for each school-assessed Outcome, and to ensure that students are well prepared for these; however, there is considerable duplication in the Study Design relating to the knowledge required for many of the Outcomes. We have found that, with an Outcomes approach, we are covering material sometimes several times. For example, knowledge of a problem-solving methodology is listed as key knowledge for many di�erent Outcomes. In these cases, we have tried to provide a general coverage in the first instance, and specifically apply the concept to a situation relevant to the related Outcome on subsequent encounters. The authors assume that teachers will develop the required key skills with their students within the context of the key knowledge addressed in this textbook and the resources available to them. We have incorporated a margin column in the text to provide additional information and reinforce key concepts. This margin column also includes activities that relate to the topics covered in the text and considers issues relevant to information systems usage. Outcome features appear at several points in the book, indicating the nature of the tasks that students undertake in the completion of the school-assessed Outcome. We have listed the steps required to complete the Outcome, together with advice and suggestions for approaching the task. We have also described the output and support material needed for submission. You will also find sample tasks and further advice relating to the Outcomes are available at https://nelsonnet.com.au. The chapters are organised to present the optimum amount of information in the most e�ective manner. The book is presented in concise, clearly identified sections to guide students through the text. Each chapter is organised into the sections described on pages vii–viii. v 9780170440875 About the authors Gary Bass teaches VCE Applied Computing at Year 11 and Year 12 in an online course environment at Virtual School Victoria. Previously, he has taught VCE Physics, as well as developing and delivering middle school ICT courses. Gary has presented at DLTV DigiCON and the annual IT teachers conference on many topics including Pop-up Makerspace; Big Data requires huge analysis – data visualisation; AR + VR = Mixed reality; and Marshall McLuhan-Medium is the message. Gary was selected as an Apple Distinguished Educator (ADE) in 2002 and 2011. In 2016, he was presented with DLTV’s IT Leader of the Year award. Natalie Heath is the eLearning/ICT Leading Teacher at Eltham High School. She has been an IT specialist teacher for nearly 20 years, teaching at all secondary levels including VCE Informatics, IT Applications, Information Processing and Management, and Software Development. She has extensive experience around VCE, having assessed examinations in various subjects for more than two decades. Natalie has also developed many resources for VCE Computing subjects over the years, including trial examinations. She has presented at teacher professional learning conferences as an expert in Unit 3 and 4 subjects and in 2018 was presented with the DLTV’s Maggie Iaquinto VCE Computing Educator of the Year award for her services to the VCE Computing teaching community and resource development. Associate Professor Therese Keane is Deputy Chair of the Department of Education at Swinburne University and has worked in a variety of school settings where she has taught IT in K–12 education as the Director of ICT. Her passion and achievements in ICT in the education and robotics space have been acknowledged by her peers in her receiving numerous national and state awards. Therese has presented seminars and workshops for teachers involved in the teaching of IT. She has written several textbooks in all units of VCE Information Technology. Therese’s research interests include the use of technology in education, gender inequalities in STEM-based subjects, robotics in education and computers in schools for teaching and learning purposes. Therese is involved with the FIRST LEGO League as the Championship Tournament Director for Victoria and she is a lead mentor for the RoboCats – a female school student only robotics team that participates in the FIRST Robotic Competition. Anthony Sullivan is a Curriculum and Learning Specialist at Monash College where he is responsible for creating assessment and learning materials for accounting and computing subjects as part of the Monash University Foundation Year program. Anthony has more than 25 years experience teaching business and computing subjects. He has taught in both government and non-government settings in Australia and taught computing and information technology courses in schools in Asia and the United Kingdom. Anthony has also been a VCE Examination Assessor, a member of the committee that reviewed and wrote the previous Study Design for VCE Computing, and has written a range of commercial resources related to VCE Computing. He has presented at conferences and professional development events and student examination preparation sessions. vi 9780170440875 How to use this book KEY KNOWLEDGE The key knowledge from the VCE Applied Computing Study Design that you will cover in each chapter is listed on the first page of each chapter. The list includes key knowledge specified in the Outcome related to the chapter. FOR THE STUDENT Each chapter’s opening page includes an overview of that chapter’s contents so that you are aware of the material you will encounter. FOR THE TEACHER This section is for your teacher and outlines how the chapter fits into the overall study of Data Analytics, and outlines how the material relates to the completion of Outcomes. CHAPTERS The major learning material that you will encounter in the chapter is presented as text, photographs and illustrations. The text describes in detail the theory associated with the stated Outcomes of the VCE Applied Computing Study Design in easy-to-understand language. The photographs show hardware, software and other objects that have been described in the text. Illustrations are used to demonstrate concepts that are more easily explained in this manner. Throughout the chapter, glossary terms are highlighted in bold and you can find their definitions at the end of each chapter, in Essential terms. The School-assessed Task Tracker at the bottom of every odd-numbered page provides you with a visual reminder to help you track your progress in the School-assessed Task (SAT), which is derived from Unit 3, Outcome 2 and Unit 4, Outcome 1, so that you can complete 3.1 all required stages on time. THINK ABOUT DATA ANALYTICS MARGIN COLUMN Project management tools are useful to find The margin column contains further explanations that support the main text, weblink icons, the perfect number additional material outside the Study Design and cross-references to material covered of people needed on a task so it is finished elsewhere in the textbook. Issues relevant to Data Analytics that you can discuss with your as quickly as possible classmates are also included in the form of ‘Think about Data Analytics’ boxes. without anyone being idle. Use software to CHAPTER SUMMARY develop a Gantt chart to plan the baking of a The chapter summary at the end of each chapter is divided into two main parts to help you cake. Assume you can use as many cooks as review each chapter. you want. Essential terms are the glossary terms that have been highlighted throughout the chapter. The Important facts section is a list of summaries, ideas, processes and statements relevant to the chapter, in the order in which they occur in the chapter. vii 9780170440875 TEST YOUR KNOWLEDGE Short-answer questions will help you to review the chapter material. The questions are grouped and identified with a section of the text to allow your teacher to direct appropriate questions based on the material covered in class. Teachers will be able to access answers to these questions at https://nelsonnet.com.au. APPLY YOUR KNOWLEDGE Each chapter concludes with a set of questions requiring you to demonstrate that you can apply the theory from the chapter to more complex questions. The style of questions reflects what you can expect in the end-of-year examination. Teachers will be able to access suggested responses to these questions at https://nelsonnet.com.au. PREPARING FOR THE OUTCOMES This section appears at points in the course where it is appropriate for you to complete an Outcome task. The information provided describes what you need to do in the Outcome, the suggested steps to be followed in the completion of the task and the material that needs to be submitted for assessment. NELSONNET The NelsonNet student website contains: • multiple-choice quizzes for each chapter, mirroring the VCAA Unit 3 & 4 exam • additional material such as spreadsheets and infographics. An open-access weblink page is also provided for all weblinks that appear in the margins throughout the textbook. This is accessible at https://nelsonnet.com.au. The NelsonNet teacher website is accessible only to teachers and it contains: • answers for the Test your knowledge and Apply your knowledge questions in the book • sample School-assessed Coursework (SAC) • chapter tests • practice exam. Please note that complimentary access to NelsonNet and the NelsonNetBook is only available to teachers who use the accompanying student textbook as a core educational resource in their classroom. Contact your sales representative for information about access codes and conditions. viii HOW TO USE THIS BOOK 9780170440875 Outcomes OUTCOME KEY KNOWLEDGE LOCATION Unit 3 Data analytics Area of Study 1 On completion of this unit the student should be able to respond to teacher-provided solution Outcome 1 requirements and designs to extract data from large repositories, manipulate and cleanse data and apply a range of functions to develop software solutions to present findings. Data and • techniques for e�cient and e�ective data collection, including methods to collect census, pp. 6–10 information Geographic Information System (GIS) data, sensor, social media and weather • factors influencing the integrity of data, including accuracy, authenticity, correctness, pp. 20–23 reasonableness, relevance and timeliness • sources of, and methods and techniques for, acquiring authentic data stored in large pp. 10–13 repositories • methods for referencing primary and secondary sources, including American Psychological pp. 14–16 Association (APA) referencing system • characteristics of data types pp. 16–18 Approaches to • methods for documenting a problem, need or opportunity p. 37 problem solving • methods for determining solution requirements, constraints and scope p. 37 • naming conventions to support e�cient use of databases, spreadsheets and data visualisations p. 67 • a methodology for creating a database structure: identifying entities, defining tables and fields to represent entities; defining relationships by identifying primary key fields and foreign key pp. 58–66 fields; defining data types and field sizes, normalisation to third normal form • design tools for representing databases, spreadsheets and data visualisations, including data pp. 46–9, 84 dictionaries, tables, charts, input forms, queries and reports • design principles that influence the functionality and appearance of databases, spreadsheets pp. 37–9, 84–93 and data visualisations • functions and techniques to retrieve required information through querying data sets, pp. 74–5 including searching, sorting and filtering to identify relationships and patterns • software functions, techniques and procedures to e�ciently and e�ectively validate, pp. 80–2, 88 manipulate and cleanse data including files, and applying formats and conventions • types and purposes of data visualisations pp. 25–31 • formats and conventions applied to data visualisations to improve their e�ectiveness for pp. 39–45 intended users, including clarity of message • methods and techniques for testing databases, spreadsheets and data visualisations pp. 96–106 Interactions and • reasons why organisations acquire data p. 4 impact Key skills • interpret solution requirements and designs to develop data visualisations pp. 37, 167–71 • identify, select and extract relevant data from large repositories pp. 4, 9–13 • use a standard referencing system to acknowledge intellectual property p. 14 • organise, manipulate and cleanse data using database and spreadsheet software pp. 74–5, 80–8 • select, justify and apply functions, formats and conventions to create e�ective data visualisations pp. 39–45 • develop and apply suitable validation and testing techniques to software tools used pp. 24–5 ix 9780170440875 OUTCOME KEY KNOWLEDGE LOCATION Unit 3 Data analytics: Analysis and design Area of Study 2 On completion of this unit the student should be able to propose a research question, formulate Outcome 2 a project plan, collect and analyse data, generate alternative design ideas and represent the preferred design for creating infographics or dynamic data visualisations. Digital systems • roles, functions and characteristics of digital system components p. 148–52 • physical and software security controls used by organisations for protecting stored and p. 152–8 communicated data Data and • primary and secondary data sources and methods of collecting data, including interviews, p. 128 information observation, querying of data stored in large repositories and surveys • techniques for searching, browsing and downloading data sets p. 126 • suitability of quantitative and qualitative data for manipulation p. 129–30 • characteristics of data types and data structures relevant to selected software tools p. 136–7 • methods for referencing secondary sources, including the APA referencing system p. 137–9 • criteria to check the integrity of data, including accuracy, authenticity, correctness, p. 139–47 reasonableness, relevance and timeliness • techniques for coding qualitative data to support manipulation p. 130–6 Approaches to • features of a research question, including a statement identifying the research question as an p. 124–6 problem solving information problem • functional and non-functional requirements, including data to support the research question, p. 167–71 constraints and scope • types and purposes of infographics and dynamic data visualisations p. 190–8 • design principles that influence the appearance of infographics and the functionality and p. 171–6 appearance of dynamic data visualisations • design tools for representing the appearance and functionality of infographics and dynamic p. 187–9 data visualisations, including data manipulation and validation, where appropriate • techniques for generating alternative design ideas p. 176–85 • criteria for evaluating alternative design ideas and the e�ciency and e�ectiveness of p. 185–7 infographics or dynamic data visualisations • features of project management using Gantt charts, including the identification and p. 114–23 sequencing of tasks, time allocation, dependencies, milestones and the critical path Interactions and • key legal requirements for the storage and communication of data and information, including p. 307–14 impact human rights requirements, intellectual property and privacy Key skills • frame a research question p. 124–6 • analyse and document requirements, constraints and scope of infographics or dynamic data pp. 167–171 visualisations • apply techniques for searching, downloading, browsing and referencing data sets p. 126 • select and apply design tools to represent the functionality and appearance of infographics or p. 187–9 dynamic data visualisations • generate alternative design ideas p. 176–85 x OUTCOMES 9780170440875