Simba Spark ODBC Driver with SQL Connector
Installation and Configuration Guide

Simba Technologies Inc.
Version 1.2.5
August 4, 2017

Copyright © 2017 Simba Technologies Inc. All Rights Reserved.

Information in this document is subject to change without notice. Companies, names and data used in examples herein are fictitious unless otherwise noted. No part of this publication, or the software it describes, may be reproduced, transmitted, transcribed, stored in a retrieval system, decompiled, disassembled, reverse-engineered, or translated into any language in any form by any means for any purpose without the express written permission of Simba Technologies Inc.

Trademarks

Simba, the Simba logo, SimbaEngine, and Simba Technologies are registered trademarks of Simba Technologies Inc. in Canada, United States and/or other countries. All other trademarks and/or servicemarks are the property of their respective owners.

Contact Us

Simba Technologies Inc.
938 West 8th Avenue
Vancouver, BC Canada V5Z 1E5
Tel: +1 (604) 633-0008
Fax: +1 (604) 633-0004
www.simba.com

About This Guide

Purpose

The Simba Spark ODBC Driver with SQL Connector Installation and Configuration Guide explains how to install and configure the Simba Spark ODBC Driver with SQL Connector. The guide also provides details related to features of the driver.

Audience

The guide is intended for end users of the Simba Spark ODBC Driver, as well as administrators and developers integrating the driver.

Knowledge Prerequisites

To use the Simba Spark ODBC Driver, the following knowledge is helpful:

• Familiarity with the platform on which you are using the Simba Spark ODBC Driver
• Ability to use the data source to which the Simba Spark ODBC Driver is connecting
• An understanding of the role of ODBC technologies and driver managers in connecting to a data source
• Experience creating and configuring ODBC connections
• Exposure to SQL

Document Conventions

Italics are used when referring to book and document titles.
Bold is used in procedures for graphical user interface elements that a user clicks and text that a user types.
Monospace font indicates commands, source code, or contents of text files.

Note: A text box with a pencil icon indicates a short note appended to a paragraph.

Important: A text box with an exclamation mark indicates an important comment related to the preceding paragraph.

Table of Contents

About the Simba Spark ODBC Driver
Windows Driver
  Windows System Requirements
  Installing the Driver on Windows
  Creating a Data Source Name on Windows
  Configuring a DSN-less Connection on Windows
  Configuring Authentication on Windows
  Configuring Advanced Options on Windows
  Configuring HTTP Options on Windows
  Configuring SSL Verification on Windows
  Configuring Server-Side Properties on Windows
  Configuring Logging Options on Windows
  Configuring Kerberos Authentication for Windows
  Verifying the Driver Version Number on Windows
macOS Driver
  macOS System Requirements
  Installing the Driver on macOS
  Verifying the Driver Version Number on macOS
Linux Driver
  Linux System Requirements
  Installing the Driver Using the RPM File
  Installing the Driver Using the Tarball Package
  Verifying the Driver Version Number on Linux
AIX Driver
  AIX System Requirements
  Installing the Driver on AIX
  Verifying the Driver Version Number on AIX
Solaris Driver
  Solaris System Requirements
  Installing the Driver on Solaris
  Verifying the Driver Version Number on Solaris
Configuring the ODBC Driver Manager on Non-Windows Machines
  Specifying ODBC Driver Managers on Non-Windows Machines
  Specifying the Locations of the Driver Configuration Files
Configuring ODBC Connections on a Non-Windows Machine
  Creating a Data Source Name on a Non-Windows Machine
  Configuring a DSN-less Connection on a Non-Windows Machine
  Configuring Authentication on a Non-Windows Machine
  Configuring SSL Verification on a Non-Windows Machine
  Configuring Server-Side Properties on a Non-Windows Machine
  Testing the Connection on a Non-Windows Machine
  Configuring Logging Options on a Non-Windows Machine
Authentication Mechanisms
  Shark Server
Using a Connection String
  DSN Connection String Example
  DSN-less Connection String Examples
Features
  SQL Connector for HiveQL
  Data Types
  Catalog and Schema Support
  spark_system Table
  Server-Side Properties
  Get Tables With Query
  Active Directory
  Write-back
  Security and Authentication
Driver Configuration Options
  Configuration Options Appearing in the User Interface
  Configuration Options Having Only Key Names
Third-Party Trademarks
Third-Party Licenses

About the Simba Spark ODBC Driver

The Simba Spark ODBC Driver is used for direct SQL and HiveQL access to Apache Hadoop / Spark distributions, enabling Business Intelligence (BI), analytics, and reporting on Hadoop-based data. The driver efficiently transforms an application's SQL query into the equivalent form in HiveQL, which is a subset of SQL-92. If an application is Spark-aware, then the driver is configurable to pass the query through to the database for processing. The driver interrogates Spark to obtain schema information to present to a SQL-based application. Queries, including joins, are translated from SQL to HiveQL. For more information about the differences between HiveQL and SQL, see the SQL Connector for HiveQL section.

The Simba Spark ODBC Driver complies with the ODBC 3.80 data standard and adds important functionality such as Unicode and 32- and 64-bit support for high-performance computing environments.
ODBC is one of the most established and widely supported APIs for connecting to and working with databases. At the heart of the technology is the ODBC driver, which connects an application to the database. For more information about ODBC, see the Data Access Standards Glossary: http://www.simba.com/resources/data-access-standards-library. For complete information about the ODBC specification, see the ODBC API Reference: http://msdn.microsoft.com/en-us/library/windows/desktop/ms714562(v=vs.85).aspx.

The Simba Spark ODBC Driver is available for Microsoft® Windows®, Linux, Solaris, AIX, and macOS platforms.

The Installation and Configuration Guide is suitable for users who are looking to access data residing within Hadoop from their desktop environment. Application developers might also find the information helpful. Refer to your application for details on connecting via ODBC.

Note: For basic configuration instructions that allow you to quickly set up the Windows driver so that you can evaluate and use it, see the Simba ODBC Drivers Quick Start Guide for Windows. The Quick Start Guide also explains how to use the driver in various applications.

Windows Driver

Windows System Requirements

The Simba Spark ODBC Driver supports Apache Spark versions 0.8 through 2.2.

Install the driver on client machines where the application is installed. Each machine that you install the driver on must meet the following minimum system requirements:

• One of the following operating systems:
  • Windows 7, 8.1, or 10
  • Windows Server 2008 or later
• 100 MB of available disk space
• Visual C++ Redistributable for Visual Studio 2013 installed (with the same bitness as the driver that you are installing). You can download the installation packages at https://www.microsoft.com/en-ca/download/details.aspx?id=40784.

To install the driver, you must have Administrator privileges on the machine.
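Because the driver and the Visual C++ Redistributable must match the bitness of the client application, it can help to check the bitness of the process that will load the driver. The following is a minimal sketch, assuming the client application is a Python script (an assumption for illustration; the guide does not prescribe a client language). The function names are hypothetical; the installer file names are the ones this guide uses for the 32-bit and 64-bit Windows packages.

```python
import struct


def process_bitness() -> int:
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size in bytes of a native pointer.
    return struct.calcsize("P") * 8


def matching_installer(bitness: int) -> str:
    """Return the installer file name for the given bitness (hypothetical helper)."""
    if bitness not in (32, 64):
        raise ValueError(f"unsupported bitness: {bitness}")
    return f"SimbaSparkODBC{bitness}.msi"


if __name__ == "__main__":
    bits = process_bitness()
    print(f"This process is {bits}-bit; install {matching_installer(bits)}")
```

Note that the result describes the running process, not the operating system: a 32-bit interpreter on 64-bit Windows still needs the 32-bit driver.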
Installing the Driver on Windows

On 64-bit Windows operating systems, you can execute both 32- and 64-bit applications. However, 64-bit applications must use 64-bit drivers, and 32-bit applications must use 32-bit drivers. Make sure that you use the version of the driver that matches the bitness of the client application:

• SimbaSparkODBC32.msi for 32-bit applications
• SimbaSparkODBC64.msi for 64-bit applications

You can install both versions of the driver on the same machine.

To install the Simba Spark ODBC Driver on Windows:

1. Depending on the bitness of your client application, double-click to run SimbaSparkODBC32.msi or SimbaSparkODBC64.msi.
2. Click Next.
3. Select the check box to accept the terms of the License Agreement if you agree, and then click Next.
4. To change the installation location, click Change, then browse to the desired folder, and then click OK. To accept the installation location, click Next.
5. Click Install.
6. When the installation completes, click Finish.
7. If you received a license file through email, then copy the license file into the \lib subfolder of the installation folder you selected above. You must have Administrator privileges when changing the contents of this folder.

Creating a Data Source Name on Windows

Typically, after installing the Simba Spark ODBC Driver, you need to create a Data Source Name (DSN).

Alternatively, for information about DSN-less connections, see the Configuring a DSN-less Connection on Windows section.

To create a Data Source Name on Windows:

1. Open the ODBC Administrator:
   • If you are using Windows 7 or earlier, click Start > All Programs > Simba Spark ODBC Driver 1.2 > ODBC Administrator.
   • Or, if you are using Windows 8 or later, on the Start screen, type ODBC administrator, and then click the ODBC Administrator search result.

    Note: Make sure to select the ODBC Data Source Administrator that has the same bitness as the client application that you are using to connect to Spark.

2. In the ODBC Data Source Administrator, click the Drivers tab, and then scroll down as needed to confirm that the Simba Spark ODBC Driver appears in the alphabetical list of ODBC drivers that are installed on your system.
3. Choose one:
   • To create a DSN that only the user currently logged into Windows can use, click the User DSN tab.
   • Or, to create a DSN that all users who log into Windows can use, click the System DSN tab.

    Note: It is recommended that you create a System DSN instead of a User DSN. Some applications load the data using a different user account, and might not be able to detect User DSNs that are created under another user account.

4. Click Add.
5. In the Create New Data Source dialog box, select Simba Spark ODBC Driver and then click Finish. The Simba Spark ODBC Driver DSN Setup dialog box opens.
6. In the Data Source Name field, type a name for your DSN.
7. Optionally, in the Description field, type relevant details about the DSN.
8. In the Spark Server Type list, select the appropriate server type for the version of Spark that you are running:
   • If you are running Shark 0.8.1 or earlier, then select Shark Server
   • If you are running Shark 0.9, or Spark 1.1 or later, then select Spark Thrift Server
9. In the Host field, type the IP address or host name of the Spark server.
10. In the Port field, type the number of the TCP port that the Spark server uses to listen for client connections.
11. In the Database field, type the name of the database schema to use when a schema is not explicitly specified in a query.

    Note: You can still issue queries on other schemas by explicitly specifying the schema in the query. To inspect your databases and determine the appropriate schema to use, type the show databases command at the Spark command prompt.

12. In the Authentication area, configure authentication as needed. For more information, see the Configuring Authentication on Windows section.
    Note: Shark Server does not support authentication. Most default configurations of Spark Thrift Server require User Name authentication. To verify the authentication mechanism that you need to use for your connection, check the configuration of your Hadoop / Spark distribution. For more information, see the Authentication Mechanisms section.

13. Optionally, if the operations against Spark are to be done on behalf of a user that is different from the authenticated user for the connection, type the name of the user to be delegated in the Delegation UID field.

    Note: This option is applicable only when connecting to a Spark Thrift Server instance that supports this feature.

14. In the Thrift Transport drop-down list, select the transport protocol to use in the Thrift layer.
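The values entered in the dialog box fields above (host, port, database, and so on) can also be thought of as key-value pairs of the kind that an ODBC connection string carries. The sketch below shows one way to assemble such a string in Python. It is illustrative only: the function name is hypothetical, the key names Host, Port, and Schema and the example values are assumptions not confirmed in this section, and the authoritative key names are those listed in the Driver Configuration Options section of this guide.

```python
def build_connection_string(options: dict) -> str:
    """Join key-value pairs into an ODBC-style connection string.

    Values containing ';', '{', or '}' are wrapped in braces, with any
    closing brace doubled, which is the usual ODBC escaping convention.
    """
    parts = []
    for key, value in options.items():
        text = str(value)
        if any(ch in text for ch in ";{}"):
            text = "{" + text.replace("}", "}}") + "}"
        parts.append(f"{key}={text}")
    return ";".join(parts)


if __name__ == "__main__":
    # All key names and values below are assumptions for illustration.
    conn_str = build_connection_string({
        "Driver": "Simba Spark ODBC Driver",  # driver name as shown in the ODBC Administrator
        "Host": "spark.example.com",          # hypothetical host
        "Port": 10000,                        # hypothetical port
        "Schema": "default",                  # hypothetical schema name
    })
    print(conn_str)
```

A string built this way could be passed to an ODBC client library, for example pyodbc.connect(conn_str), once the key names have been checked against the driver's documented options.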

