Using Driverless AI
Release 1.4.0
H2O.ai
Oct 30, 2018

RELEASE NOTES

1 H2O Driverless AI Release Notes
  1.1 Architecture
  1.2 Roadmap
  1.3 Change Log
2 Why Driverless AI?
3 Key Features
  3.1 Flexibility of Data and Deployment
  3.2 NVIDIA GPU Acceleration
  3.3 Data Visualization
  3.4 Automatic Feature Engineering
  3.5 Machine Learning Interpretability (MLI)
  3.6 Time Series
  3.7 NLP with TensorFlow
  3.8 Automatic Scoring Pipelines
4 Installing and Upgrading Driverless AI
  4.1 Linux Docker Images
  4.2 Linux RPMs
  4.3 Linux DEBs
  4.4 Linux TAR SH
  4.5 Linux in the Cloud
  4.6 Mac OS X
  4.7 Windows 10 Pro
  4.8 IBM Power
5 Launching Driverless AI
  5.1 Messages
6 The Datasets Page
  6.1 Adding Datasets
  6.2 Dataset Details
  6.3 Visualizing Datasets
7 Experiments
  7.1 Before You Begin
  7.2 New Experiments
  7.3 Experiment Graphs
  7.4 Completed Experiment
  7.5 Viewing Experiments
8 Interpreting a Model
  8.1 Interpret this Model Button - Non-Time-Series
  8.2 Interpret this Model Button - Time-Series
  8.3 Model Interpretation on Driverless AI Models
  8.4 Model Interpretation on External Models
  8.5 Understanding the Model Interpretation Page
  8.6 General Considerations
9 Viewing Explanations
10 Score on Another Dataset
11 Transform Another Dataset
12 The Python and MOJO Scoring Pipelines
  12.1 Which Pipeline Should I Use?
  12.2 Driverless AI Standalone Python Scoring Pipeline
  12.3 Driverless AI MLI Standalone Python Scoring Package
  12.4 Driverless AI MOJO Scoring Pipeline
13 What’s Happening in Driverless AI?
14 Driverless AI Transformations
  14.1 Available Transformers
  14.2 Example Transformations
15 Internal Validation Technique
16 Time Series in Driverless AI
  16.1 Understanding Time Series
  16.2 Time Series Constraints
  16.3 Time Series Use Case: Sales Forecasting
17 NLP in Driverless AI
  17.1 A Typical NLP Example: Sentiment Analysis
18 Using the config.toml File
  18.1 Docker Image Users
  18.2 Native Install Users
  18.3 Sample Config.toml File
19 Setting Environment Variables
  19.1 Setting Variables in Docker Images
  19.2 Setting Variables in Native Installs
20 Enabling Data Connectors
  20.1 Using Data Connectors with the Docker Image
  20.2 Using Data Connectors with Native Installs
21 Configuring Authentication
  21.1 Enabling Authentication in Docker Images
  21.2 Enabling Authentication in Native Installs
  21.3 LDAP Authentication Example
  21.4 PAM Authentication Example
22 Enabling Notifications
  22.1 Script Interfaces
  22.2 Examples
23 Driverless AI Logs
  23.1 Accessing Driverless AI Logs
  23.2 Sending Logs to H2O
24 FAQ
  24.1 General
  24.2 Installation/Upgrade/Authentication
  24.3 Data/Experiments/Predictions
  24.4 Time Series
25 Appendix A: The Python Client
  25.1 Installing the Python Client
  25.2 Credit Card Demo
  25.3 Time Series Analysis on a Driverless AI Model
  25.4 Driverless AI NLP Demo - Airline Sentiment Dataset
26 References

H2O Driverless AI is an artificial intelligence (AI) platform for automatic machine learning. Driverless AI automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection and model deployment. It aims to achieve highest predictive accuracy, comparable to expert data scientists, but in much shorter time thanks to end-to-end automation. Driverless AI also offers automatic visualizations and machine learning interpretability (MLI). Especially in regulated industries, model transparency and explanation are just as important as predictive performance. Modeling pipelines (feature engineering and models) are exported (in full fidelity, without approximations) both as Python modules and as pure Java standalone scoring artifacts.

Driverless AI runs on commodity hardware. It was also specifically designed to take advantage of graphical processing units (GPUs), including multi-GPU workstations and servers such as IBM’s Power9-GPU AC922 server and the NVIDIA DGX-1 for order-of-magnitude faster training.

This document describes how to install and use Driverless AI. For more information about Driverless AI, please see https://www.h2o.ai/products/h2o-driverless-ai/.

For a third-party review, please see https://www.infoworld.com/article/3236048/machine-learning/review-h2oai-automates-machine-learning.html.

Have Questions?

If you have questions about using Driverless AI, post them on Stack Overflow using the driverless-ai tag at http://stackoverflow.com/questions/tagged/driverless-ai.
CHAPTER ONE

H2O DRIVERLESS AI RELEASE NOTES

H2O Driverless AI is a high-performance, GPU-enabled, client-server application for the rapid development and deployment of state-of-the-art predictive analytics models. It reads tabular data from various sources and automates data visualization, grand-master level automatic feature engineering, model validation (overfitting and leakage prevention), model parameter tuning, model interpretability and model deployment. H2O Driverless AI is currently targeting common regression, binomial classification, and multinomial classification applications including loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money-laundering, and predictive asset maintenance models. It also handles time-series problems for individual or grouped time-series such as weekly sales predictions per store and department, with time-causal feature engineering and validation schemes. The ability to model unstructured data is coming soon.

High-level capabilities:

• Client/server application for rapid experimentation and deployment of state-of-the-art supervised machine learning models
• Automatically creates machine learning modeling pipelines for highest predictive accuracy
• Automatically creates stand-alone scoring pipeline for in-process scoring or client/server scoring via HTTP or TCP protocols, in Python and Java (low-latency scoring)
• Python API or GUI (Java API coming soon)
• Multi-GPU and multi-CPU support for powerful workstations and NVIDIA DGX supercomputers
• Machine Learning model interpretation module with global and local model interpretation
• Automatic Visualization module
• Multi-user support
• Backward compatibility

Problem types supported:

• Regression (continuous target variable, for age, income, house price, loss prediction, time-series forecasting)
• Binary classification (0/1 or “N”/“Y”, for fraud prediction, churn prediction, failure prediction, etc.)
• Multinomial classification (0/1/2/3 or “A”/“B”/“C”/“D” for categorical target variables, for prediction of membership type, next-action, product recommendation, etc.)

Data types supported:

• Tabular structured data, rows are observations, columns are fields/features/variables
• i.i.d. (identically and independently distributed) data
• Numeric, categorical and textual fields
• Missing values are allowed
• Time-series data with a single time-series (time flows across the entire dataset, not per block of data)
• Grouped time-series (e.g., sales per store per department per week, all in one file, with 3 columns for store, dept, week)
• Time-series problems with a gap between training and testing (i.e., the time to deploy), and a known forecast horizon (after which model has to be retrained)

Data types NOT supported:

• Image/video/audio

Data sources supported:

• Local file system
• Network file system
• S3 (Amazon)
• Google cloud storage
• Google big query
• Hadoop (HDFS)
• Minio
• Snowflake

File formats supported:

• Plain text formats of columnar data (.csv, .tsv, .txt)
• Compressed archives (.zip, .gz, .bz2)
• Excel
• Parquet
• Feather
• Python datatable
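
The grouped time-series layout and the train/test gap listed under the supported data types can be illustrated with a small sketch. It uses only the Python standard library and does not call any Driverless AI API. The column names (store, dept, week, sales), the helper names (build_rows, split_with_gap, to_csv), and the toy sales values are all assumptions made for illustration.

```python
import csv
import io
from itertools import product


def build_rows(stores, depts, n_weeks):
    """Build one row per (store, dept, week) with a made-up sales value."""
    rows = []
    for store, dept, week in product(stores, depts, range(1, n_weeks + 1)):
        # Toy target: a linear trend over the weeks (illustrative only).
        rows.append({"store": store, "dept": dept, "week": week,
                     "sales": 100.0 + 5.0 * week})
    return rows


def split_with_gap(rows, train_end, gap, horizon):
    """Train on weeks <= train_end, skip `gap` weeks, test on the next `horizon` weeks."""
    test_start = train_end + gap + 1
    test_end = test_start + horizon - 1
    train = [r for r in rows if r["week"] <= train_end]
    test = [r for r in rows if test_start <= r["week"] <= test_end]
    return train, test


def to_csv(rows):
    """Serialize all groups into a single CSV text, one file for every series."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["store", "dept", "week", "sales"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = build_rows(["S1", "S2"], ["D1", "D2"], n_weeks=10)
    train, test = split_with_gap(rows, train_end=6, gap=1, horizon=2)
    print(to_csv(train[:3]))
    print(len(train), "training rows;", len(test), "test rows")
```

With 10 weeks of data, train_end=6, gap=1 and horizon=2, the training set covers weeks 1 to 6, week 7 is skipped to stand in for the time to deploy, and the test set covers weeks 8 and 9.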