Kylo Documentation
Release 0.8.3
Think Big, a Teradata Company
Dec 04, 2017
About
1 Features
2 FAQ
3 Terminology
4 Release Notes
5 Downloads
6 Overview
7 Review Dependencies
8 Prepare Install Checklist
9 Create Service Accounts
10 Prepare Offline TAR
11 Install Kylo
12 Install Additional Components
13 Enable Kerberos
14 Additional Configuration
15 Grant HDFS Privileges
16 Start Services
17 Import Templates
18 Create Sample Feed
19 Validate Configuration
20 HDP 2.5 Kerberos/Ranger Cluster Deployment Guide
21 Overview
22 Adjust Memory
23 Change Java Home
24 Log Files
25 Yarn Cluster Mode Configuration
26 Kylo Spark Properties
27 Postgres Metastore Configuration
28 Overview
29 Encrypting Configuration Properties
30 Enable Kerberos for Kylo
31 Enable Kerberos for NiFi
32 Enable Ranger Authorization
33 Enable Sentry Authorization
34 Kylo UI and SSL
35 NiFi and SSL
36 Authentication
37 Kylo Kerberos SPNEGO
38 Access Control
39 Spark User Impersonation Configuration
40 Setup A NiFi Cluster in a Kylo Sandbox
41 Clustering Kylo
42 NiFi & Kylo Provenance
43 NiFi Processor Guide
44 Kylo Templates Guide
45 Kylo Datasources Guide
46 Feed Lineage Configuration
47 Accessing S3 from the Data Wrangler
48 S3 Standard Ingest Template
49 SUSE Configuration Changes
50 Configuration Properties
51 Validator Tuning
52 Configure Kylo & Global Search
53 Service Monitor Plugins
54 JMS Providers
55 Database Upgrades
56 Icons and Icon Colors
57 Twitter Sentiment with Kafka and Spark Streaming Tutorial
58 Contributing to Kylo
59 Developer Getting Started Guide
60 Plugin APIs
61 Kylo REST API
62 Cleanup Scripts
63 Cloudera Docker Sandbox Deployment Guide
64 Hortonworks Sandbox Configuration
65 Kerberos Installation Example - Cloudera
66 Kerberos Installation Example - HDP 2.4
67 Operations Guide
68 Troubleshooting & Tips
69 Best Practices
Kylo website:
The documentation for the site is organized into a few sections:
• About
• Installation
• Installation Examples
• Common Configuration
• Security
• How to guides
• Developer guides
• User guides
• Tips and tricks
Chapter 1. Features
Kylo is a full-featured Data Lake platform built on Apache Hadoop and Spark. Kylo provides a turn-key, business-friendly Data Lake solution enabling data ingest, data preparation, and data discovery.
Features                                  Description
License                                   Apache 2.0

Major Features
Data Ingest                               Users can easily configure feeds in guided UI
Data Preparation                          Visual SQL builder and data wrangling
Operations dashboard                      Feed health and service monitoring
Global search                             Lucene search against data and metadata

Data Processing
Data Ingest                               Guided UI for data ingest into Hive (extensible)
Data Export                               Export data to RDBMS or other targets
Data Wrangling                            Visually wrangle data and build/schedule recipes
PySpark, Spark Jobs                       Execute Spark jobs
Custom Pipelines                          Build and templatize new pipelines
Feed Chaining                             Trigger feeds based on dependencies and rules

Ingest Features
Batch                                     Batch processing
Streaming                                 Streaming processing
Snapshot/Incremental Loads                Track high water using date field or replace target
Schema Discovery                          Infer schema from source file samples
Data Validation                           Configure field validation in UI
Data Profile                              Automatically profile statistics
Data Cleanse/Standardization              Easily configure field standardization rules
Custom Partitioning                       Configure Hive partitioning

Ingest Sources
FTP, SFTP                                 Source from FTP, SFTP
Filesystem                                Poll files from a filesystem
HDFS, S3                                  Extract files from HDFS and S3
RDBMS                                     Efficiently extract RDBMS data
JMS, KAFKA                                Source events from queues
REST, HTTP                                Source data from messages

Ingest Targets
HDFS                                      Store data in HDFS
HIVE                                      Store data in Hive tables
HBase                                     Store data in HBase

Ingest Formats
ORC, Parquet, Avro, RCFile, Text          Store data in popular table formats
Format Compression                        Specify compression for ORC and Parquet types
Extensible source formats                 Ability to define custom schema plug-in Serdes

Metadata
Tag/Glossary                              Add tags to feeds for searchability
Business Metadata (extended properties)   Add business-defined fields to feeds
REST API                                  Powerful REST APIs for automation and integration
Visual Lineage                            Explore process lineage
Profile History                           View history of profile statistics
Search/Discover                           Lucene syntax search against data and metadata
Operational Metadata                      Extensive metadata capture

Security
Kerberos Support                          Supports Kerberized clusters
Obfuscation                               Configure field-level data protection
Encryption at Rest                        Compatible with HDFS encryption features
Access Control (LDAP, KDC, AD, SSO)       Flexible security options
Data Protection                           UI configurable data protection policies
Application Groups, Roles                 Admin configured roles

Operations
Dashboard                                 KPIs, alerts, performance, troubleshooting
Scheduler                                 Timer, Cron-style based on Quartz engine
SLA Monitoring                            Service level agreements tied to feed performance
Alerts                                    Alerts with integration options to enterprise
Health Monitoring                         Quickly identify feed and service health issues
Performance Reporting                     Pivot on performance statistics

Scalability
Edge Clustering                           Scale edge resources
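
The Snapshot/Incremental Loads entry in the table refers to tracking a high-water mark on a date field. The general idea can be sketched in a few lines of Python. This is only an illustration of the technique, not Kylo's implementation: the function name `incremental_batch`, the `modified` field, and the sample rows are all hypothetical.

```python
from datetime import datetime


def incremental_batch(rows, date_field, high_water):
    """Return the rows newer than high_water, plus the updated high-water mark.

    Hypothetical helper for illustration only; not part of Kylo.
    """
    # Keep only rows whose date field is strictly after the last mark.
    fresh = [r for r in rows if r[date_field] > high_water]
    # Advance the mark to the newest row seen; leave it unchanged if none.
    new_mark = max((r[date_field] for r in fresh), default=high_water)
    return fresh, new_mark


# Hypothetical source rows with a last-modified date field.
rows = [
    {"id": 1, "modified": datetime(2017, 12, 1)},
    {"id": 2, "modified": datetime(2017, 12, 3)},
    {"id": 3, "modified": datetime(2017, 12, 4)},
]

# First run: only rows modified after the stored mark are picked up.
batch, mark = incremental_batch(rows, "modified", datetime(2017, 12, 2))

# Second run with the advanced mark and no new source rows: nothing to load.
batch_again, mark_again = incremental_batch(rows, "modified", mark)
```

A snapshot load, by contrast, would ignore the mark and replace the target with the full set of source rows on every run.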