
Data Quality Assurance and Analysis for Food Science Data PDF

101 Pages·2015·4.47 MB·English

Preview Data Quality Assurance and Analysis for Food Science Data

ETH Library

Data Quality Assurance and Analysis for Food Science Data

Master Thesis

Author(s): Gemenetzi, Konstantina
Publication date: 2015
Permanent link: https://doi.org/10.3929/ethz-a-010510135
Rights / license: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information, please consult the Terms of use.

Konstantina Gemenetzi
Prof. Dr. Moira C. Norrie
David Weber
Global Information Systems Group
Institute of Information Systems
Department of Computer Science
ETH Zurich
Monday 6th April, 2015

Copyright © 2015 Global Information Systems Group.

Abstract

Data is of paramount significance in every information system. Even though data quality might have a different meaning for different users and applications, it is always recognized that the quality of the data in an information system can have a large impact on the quality and value of every other aspect of the system.

In this master thesis, we model and quantify data quality in information systems with the use of constraints. We propose a novel concept for a highly flexible and configurable data quality management framework that can easily be adapted to the needs of different users and applications. For this purpose, we introduce the notion of contracts, which are simple, interchangeable files that allow each user and application of the system to specify an individual importance weight for each quality-related constraint, thus enabling the creation of unique quality profiles.

Our framework allows the user to validate data during input, as well as analyze, quantify and visualize the overall quality of the data at any time. For both the data input validation and the data quality analysis, the user is able to select the data entities and constraint groups that should be validated and analyzed, as well as the contract (quality profile) that should be used for the validation and analysis. To measure quality, we defined a novel quality calculation scheme and scale. For the definition, validation and management of the constraints in a unified way, we use the Bean Validation specification (http://beanvalidation.org/).

For evaluating our approach, we implemented our concept within the FoodCASE food science data management system. FoodCASE is used for the administration of nutrients for food composition studies and contaminants for Total Diet Studies (TDS). Currently, FoodCASE is used to manage the data of the Swiss Food Composition Database as well as the data of TDS-Exposure (http://www.tds-exposure.eu/) studies for several countries across Europe. Our data quality management framework and its FoodCASE implementation prototype were presented to the participants of the TDS-Exposure workshop that took place during the third TDS-Exposure General Assembly in February 2015 and received positive feedback, which is also discussed in this work.

Contents

1 Introduction
    1.1 Motivation
    1.2 Structure

2 Background
    2.1 Data Quality
        2.1.1 Data versus Information
        2.1.2 Data Quality Dimensions
        2.1.3 Data Constraints
        2.1.4 Data Quality as a Constraint Problem
    2.2 Data Quality Management Frameworks
        2.2.1 Conceptual Frameworks
        2.2.2 Industry Guidelines and Implemented Frameworks
    2.3 FoodCASE
        2.3.1 General Information
        2.3.2 Architecture
    2.4 Issues to be Addressed

3 Approach
    3.1 Concept
        3.1.1 Bean Validation
        3.1.2 Data Quality Measurement and Calculation
        3.1.3 Contracts
        3.1.4 Warning to Error Threshold
        3.1.5 Validation Groups
        3.1.6 Payload Labels and Weights
    3.2 Data Quality Analysis Module
        3.2.1 Visualizations and Data Cleansing
    3.3 Data Input Validation Module
    3.4 Administrative Tools Module
        3.4.1 Contract Editing and Creation
        3.4.2 General Validation and Analysis Settings
        3.4.3 Input Validation Settings

4 Implementation
    4.1 Technologies
        4.1.1 Enterprise JavaBeans
        4.1.2 Hibernate Validator
        4.1.3 Java Swing
        4.1.4 JFreeChart
    4.2 Main Concept Implementation Details
        4.2.1 Constraint Declaration and Definition
        4.2.2 Contracts
        4.2.3 Validation Groups
        4.2.4 Payload Weights
    4.3 TDS Data Quality Analysis Bean
    4.4 Main Helper Classes
        4.4.1 Constraint Values
        4.4.2 Constraint Map
    4.5 Data Quality Analysis Module
        4.5.1 General Information
        4.5.2 Entity and Group Selection Tree Creation
        4.5.3 Quality Analysis and Calculation
        4.5.4 Visualizations
    4.6 Data Input Validation Module
        4.6.1 General Information
        4.6.2 Input Validation and Quality Evaluation
        4.6.3 Validation Feedback
    4.7 Administrative Tools Module
        4.7.1 General Information
        4.7.2 Contract Editing and Creation Panel
        4.7.3 General Validation and Analysis Settings
        4.7.4 Input Validation Settings
    4.8 Open Implementation Issues
        4.8.1 Displaying Previously Selected User Configurations
        4.8.2 Loading of the Database Settings by the TDS Quality Analysis Bean

5 Comparison with FoodCASE Food Composition Data Quality Management Framework
    5.1 General Concept
    5.2 Constraints
    5.3 Input Validation
    5.4 Data Quality Analysis

6 User Evaluation
    6.1 TDS-Exposure Workshop
    6.2 Questionnaire Findings
        6.2.1 Contracts
        6.2.2 Input Validation
        6.2.3 Analysis
        6.2.4 Access Rights
        6.2.5 Constraints
    6.3 Overview of Key Findings

7 Extensibility
    7.1 Use of Different Data Quality Calculation Schemes
    7.2 Additional Data Quality Analysis Visualizations
    7.3 Support for Method Constraints

8 Conclusion
    8.1 Summary of Work
        8.1.1 Concept
        8.1.2 Data Quality Analysis Module
        8.1.3 Data Input Validation Module
        8.1.4 Administrative Tools Module
        8.1.5 Implementation for the FoodCASE System
        8.1.6 User Evaluation
    8.2 Contribution
    8.3 Future Work
        8.3.1 Inclusion of Additional Analyzable Entities and Constraints
        8.3.2 Additional Analysis of User Evaluation Results
        8.3.3 Performance Tests and Comparison with the Food Composition DQMF
        8.3.4 Persistence of Data Quality Analysis Results
        8.3.5 Contract Editing and Creation Panel Enhancement
        8.3.6 Improvement of Input Validation Feedback Mechanism

A Acronyms and Abbreviations

B Comparison between Hibernate Validator and OVal

C Questions from the TDS-Exposure Workshop Questionnaire
    C.1 Contracts
    C.2 Input Validation
    C.3 Analysis
    C.4 Access Rights
    C.5 Constraints
    C.6 Constraint Grouping
    C.7 Comparison with existing Food Composition Data Quality Framework
    C.8 Additional comments or suggestions

List of Figures

List of Tables

Bibliography
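The contract idea described in the abstract (one importance weight per quality-related constraint, with interchangeable contracts producing different quality profiles) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the constraint names, the weights, the `Violation` type and the scoring formula are hypothetical. The preview does not state the thesis's own calculation scheme or scale, and the thesis itself builds on Java Bean Validation rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """A failed constraint on one data entity (hypothetical type)."""
    entity_id: str
    constraint: str


# Hypothetical contracts: constraint name -> importance weight in [0, 1].
STRICT = {"name_not_empty": 1.0, "value_in_range": 1.0, "unit_present": 0.8}
LENIENT = {"name_not_empty": 1.0, "value_in_range": 0.3, "unit_present": 0.0}


def quality_score(checked, violations, contract):
    """Return a score in [0, 1] as the weighted share of satisfied constraints.

    checked:    dict mapping entity id -> list of constraint names evaluated
    violations: iterable of Violation objects
    contract:   dict mapping constraint name -> weight
    This formula is an illustrative assumption, not the thesis's scheme.
    """
    violated = {(v.entity_id, v.constraint) for v in violations}
    total = 0.0
    lost = 0.0
    for entity_id, constraints in checked.items():
        for name in constraints:
            weight = contract.get(name, 0.0)  # unknown constraints count as 0
            total += weight
            if (entity_id, name) in violated:
                lost += weight
    # With no weighted constraints there is nothing to penalize.
    return 1.0 if total == 0 else 1.0 - lost / total
```

Swapping the contract changes the score for the same data and the same violations, which is the sense in which each user or application gets its own quality profile.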

Description:
... process of being replaced by JavaFX as the standard GUI library for Java SE, even though both will be ... (footnote 5: http://docs.oracle.com/javase/tutorial/uiswing/start/about.html) ... Artech House, Inc., Norwood, MA, USA, 1st edition, 1997.