
Apache Kafka Guide PDF

175 Pages·2017·2.96 MB·English


Apache Kafka Guide

Important Notice

© 2010-2021 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder. If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License Version 2.0, including any notices, is included herein. A copy of the Apache License Version 2.0 can also be found here: https://opensource.org/licenses/Apache-2.0

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. For information about patents covering Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA 94306
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Release Information
Version: CDH 6.0.x
Date: August 2, 2021

Table of Contents

Apache Kafka Guide
    Ideal Publish-Subscribe System
    Kafka Architecture
        Topics
        Brokers
        Records
        Partitions
        Record Order and Assignment
        Logs and Log Segments
        Kafka Brokers and ZooKeeper
Kafka Setup
    Hardware Requirements
        Brokers
        ZooKeeper
    Kafka Performance Considerations
    Operating System Requirements
        SUSE Linux Enterprise Server (SLES)
        Kernel Limits
Kafka in Cloudera Manager
Kafka Clients
    Commands for Client Interactions
    Kafka Producers
    Kafka Consumers
        Subscribing to a topic
        Groups and Fetching
        Protocol between Consumer and Broker
        Rebalancing Partitions
        Consumer Configuration Properties
        Retries
    Kafka Clients and ZooKeeper
Kafka Brokers
    Single Cluster Scenarios
        Leader Positions
        In-Sync Replicas
    Topic Configuration
        Topic Creation
        Topic Properties
    Partition Management
        Partition Reassignment
        Adding Partitions
        Choosing the Number of Partitions
        Controller
Kafka Integration
    Kafka Security
        Client-Broker Security with TLS
        Using Kafka’s Inter-Broker Security
        Enabling Kerberos Authentication
        Enabling Encryption at Rest
        Topic Authorization with Kerberos and Sentry
    Managing Multiple Kafka Versions
        Kafka Feature Support in Cloudera Manager and CDH
        Client/Broker Compatibility Across Kafka Versions
        Upgrading your Kafka Cluster
    Managing Topics across Multiple Kafka Clusters
        Consumer/Producer Compatibility
        Topic Differences between Clusters
        Optimize Mirror Maker Producer Location
        Destination Cluster Configuration
        Kerberos and Mirror Maker
        Setting up Mirror Maker in Cloudera Manager
    Setting up an End-to-End Data Streaming Pipeline
        Data Streaming Pipeline
        Ingest Using Kafka with Apache Flume
        Using Kafka with Apache Spark Streaming for Stream Processing
    Developing Kafka Clients
        Simple Client Examples
        Moving Kafka Clients to Production
    Kafka Metrics
        Metrics Categories
        Viewing Metrics
        Building Cloudera Manager Charts with Kafka Metrics
Kafka Administration
    Kafka Administration Basics
        Broker Log Management
        Record Management
        Broker Garbage Log Collection and Log Rotation
        Adding Users as Kafka Administrators
    Migrating Brokers in a Cluster
        Using rsync to Copy Files from One Broker to Another
    Setting User Limits for Kafka
    Quotas
        Setting Quotas
    Kafka Administration Using Command Line Tools
        Unsupported Command Line Tools
        Notes on Kafka CLI Administration
        kafka-topics
        kafka-configs
        kafka-console-consumer
        kafka-console-producer
        kafka-consumer-groups
        kafka-reassign-partitions
        kafka-log-dirs
        kafka-*-perf-test
        Enabling DEBUG or TRACE in command line scripts
        Understanding the kafka-run-class Bash Script
    JBOD
        JBOD Setup and Migration
        JBOD Operational Procedures
Kafka Performance Tuning
    Tuning Brokers
    Tuning Producers
    Tuning Consumers
    Mirror Maker Performance
    Kafka Tuning: Handling Large Messages
    Kafka Cluster Sizing
        Cluster Sizing - Network and Disk Message Throughput
        Choosing the Number of Partitions for a Topic
    Kafka Performance Broker Configuration
        JVM and Garbage Collection
        Network and I/O Threads
        ISR Management
        Log Cleaner
    Kafka Performance: System-Level Broker Tuning
        File Descriptor Limits
        Filesystems
        Virtual Memory Handling
        Networking Parameters
        Configuring JMX Ephemeral Ports
    Kafka-ZooKeeper Performance Tuning
Kafka Reference
    Metrics Reference
    Useful Shell Command Reference
        Hardware Information
        Disk Space
        I/O Activity and Utilization
        File Descriptor Usage
        Network Ports, States, and Connections
        Process Information
        Kernel Configuration
Kafka Public APIs
Kafka Frequently Asked Questions
    Basics
    Use Cases
    References
Appendix: Apache License, Version 2.0

Apache Kafka Guide

Apache Kafka is a streaming message platform. It is designed to be high performance, highly available, and redundant. Examples of applications that can use such a platform include:

• Internet of Things. TVs, refrigerators, washing machines, dryers, thermostats, and personal health monitors can all send telemetry data back to a server through the Internet.
• Sensor Networks. Areas (farms, amusement parks, forests) and complex devices (engines) can be designed with an array of sensors to track data or current status.
• Positional Data. Delivery trucks or massively multiplayer online games can send location data to a central platform.
• Other Real-Time Data. Satellites and medical sensors can send information to a central area for processing.
Ideal Publish-Subscribe System

The ideal publish-subscribe system is straightforward: Publisher A’s messages must make their way to Subscriber A, Publisher B’s messages must make their way to Subscriber B, and so on.

Figure 1: Ideal Publish-Subscribe System

An ideal system has the following benefits:

• Unlimited Lookback. A new Subscriber A1 can read Publisher A’s stream at any point in time.
• Message Retention. No messages are lost.
• Unlimited Storage. The publish-subscribe system has unlimited storage of messages.
• No Downtime. The publish-subscribe system is never down.
• Unlimited Scaling. The publish-subscribe system can handle any number of publishers and/or subscribers with constant message delivery latency.

Now let's see how Kafka’s implementation relates to this ideal system.

Kafka Architecture

As is the case with all real-world systems, Kafka's architecture deviates from the ideal publish-subscribe system. Some of the key differences are:

• Messaging is implemented on top of a replicated, distributed commit log.
• The client has more functionality and, therefore, more responsibility.
• Messaging is optimized for batches instead of individual messages.
• Messages are retained even after they are consumed; they can be consumed again.

The results of these design decisions are:

• Extreme horizontal scalability
• Very high throughput
• High availability
• But different semantics and message delivery guarantees

The next few sections provide an overview of some of the more important parts, while later sections describe design specifics and operations in greater detail.

Topics

In the ideal system presented above, messages from one publisher would somehow find their way to each subscriber. Kafka implements the concept of a topic. A topic allows easy matching between publishers and subscribers.

Figure 2: Topics in a Publish-Subscribe System

A topic is a queue of messages written by one or more producers and read by one or more consumers. A topic is identified by its name. This name is part of a global namespace of that Kafka cluster.

Specific to Kafka:

• Publishers are called producers.
• Subscribers are called consumers.
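To make the topic model concrete, here is a minimal in-memory sketch in Python. It is not Kafka's implementation or client API: the `Topic` and `Consumer` classes and their methods are hypothetical names invented for illustration. It only models two points from the text above: a topic is a named, append-only sequence of messages, and messages stay available after they are read, so each consumer keeps its own position and can read them again.

```python
class Topic:
    """Toy stand-in for a topic: a named, append-only list of messages."""

    def __init__(self, name):
        self.name = name
        self._log = []  # reading never removes anything from this list

    def append(self, message):
        """Producer side: add a message and return its position in the log."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset, max_messages=10):
        """Consumer side: return up to max_messages starting at offset."""
        return self._log[offset:offset + max_messages]


class Consumer:
    """Hypothetical consumer that tracks its own read position."""

    def __init__(self, topic):
        self.topic = topic
        self.position = 0

    def poll(self, max_messages=10):
        batch = self.topic.read(self.position, max_messages)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the read position, for example back to 0 to re-read."""
        self.position = offset


topic = Topic("telemetry")
for reading in ["t=20.5", "t=20.7", "t=21.0"]:
    topic.append(reading)

first = Consumer(topic)
second = Consumer(topic)

batch_a = first.poll()   # the first consumer reads all three messages
batch_b = second.poll()  # the second consumer still sees the same messages
first.seek(0)
replayed = first.poll()  # retained messages can be consumed again
```

Keeping the read position in the consumer rather than in the topic is what lets several readers share one log without interfering with each other, which matches the guide's remark that the client carries more responsibility.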
As each producer or consumer connects to the publish-subscribe system, it can read from or write to a specific topic.

Brokers

Kafka is a distributed system that implements the basic features of the ideal publish-subscribe system described above. Each host in the Kafka cluster runs a server called a broker that stores messages sent to the topics and serves consumer requests.

Figure 3: Brokers in a Publish-Subscribe System

Kafka is designed to run on multiple hosts, with one broker per host. If a host goes offline, Kafka does its best to ensure that the other hosts continue running. This solves part of the “No Downtime” and “Unlimited Scaling” goals from the ideal publish-subscribe system.

Kafka brokers all talk to ZooKeeper for distributed coordination, which provides additional help toward the “Unlimited Scaling” goal of the ideal system.

Topics are replicated across brokers. Replication is an important part of the “No Downtime,” “Unlimited Scaling,” and “Message Retention” goals.

There is one broker that is responsible for coordinating the cluster. That broker is called the controller.

As mentioned earlier, an ideal topic behaves as a queue of messages. In reality, having a single queue has scaling issues. Kafka implements partitions for adding robustness to topics.

Records

In Kafka, a publish-subscribe message is called a record. A record consists of a key/value pair and metadata including a timestamp. The key is not required, but can be used to identify messages from the same data source. Kafka stores keys and values as arrays of bytes. It does not otherwise care about the format.

The metadata of each record can include headers. Headers may store application-specific metadata as key-value pairs. In the context of the header, keys are strings and values are byte arrays.

For specific details of the record format, see the Record definition in the Apache Kafka documentation.

Partitions

Instead of storing all records handled by the system in a single log, Kafka divides records into partitions. A partition can be thought of as a subset of all the records for a topic. Partitions help with the ideal of “Unlimited Scaling”.

Records in the same partition are stored in order of arrival.
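The record structure described above can be sketched as a small Python data class. This is an illustration only, not Kafka's wire format or client API: the `Record` and `PartitionLog` names and their fields are assumptions made for this example. It shows an optional key, a value held as bytes, a timestamp, headers with string keys and byte-array values, and a partition that stores records in order of arrival.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record model: opaque bytes plus metadata."""

    value: bytes
    key: Optional[bytes] = None  # the key is not required
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    # Header keys are strings, header values are byte arrays.
    headers: Tuple[Tuple[str, bytes], ...] = ()


class PartitionLog:
    """Toy partition: records are kept in the order they arrive."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, record: Record) -> int:
        """Store the record and return its position in this partition."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Record]:
        """Return every record at or after the given position."""
        return self._records[offset:]


log = PartitionLog()
log.append(Record(
    key=b"sensor-17",
    value=b'{"temp": 20.5}',
    headers=(("content-type", b"application/json"),),
))
log.append(Record(value=b"raw bytes, no key"))

stored = log.read_from(0)
```

Because the sketch treats keys and values as plain bytes, any serialization format (JSON here, purely as an example) is the application's choice, in line with the statement that Kafka does not otherwise care about the format.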
When a topic is created, it is configured with two properties:

partition count
    The number of partitions that records for this topic will be spread among.
replication factor
    The number of copies of a partition that are maintained to ensure consumers always have access to the queue of records for a given topic.

Each partition has one leader copy, referred to here as the leader partition. If the replication factor is greater than one, there will be additional follower partitions. (For the replication factor = M, there will be M-1 follower partitions.)

Any Kafka client (a producer or consumer) communicates only with the leader partition for data. All other partitions exist for redundancy and failover. Follower partitions are responsible for copying new records from their leader partitions. Ideally, the follower partitions have an exact copy of the contents of the leader. Such partitions are called in-sync replicas (ISR).

With N brokers and topic replication factor M:

• If M < N, each broker will have a subset of all the partitions
• If M = N, each broker will have a complete copy of the partitions

In the following illustration, there are N = 2 brokers and M = 2 replication factor. Each producer may generate records that are assigned across multiple partitions.

Figure 4: Records in a Topic are Stored in Partitions, Partitions are Replicated across Brokers

Partitions are the key to keeping good record throughput. Choosing the correct number of partitions and partition replications for a topic:

• Spreads leader partitions evenly on brokers throughout the cluster.
• Makes partitions within the same topic roughly the same size.
• Balances the load on brokers.

Record Order and Assignment

By default, Kafka assigns records to partitions round-robin. There is no guarantee that records sent to multiple partitions will retain the order in which they were produced. Within a single consumer, your program will only have record ordering within the records belonging to the same partition. This tends to be sufficient for many use cases, but does add some complexity to the stream processing logic.

Tip: Kafka guarantees that records in the same partition will be in the same order in all replicas of that partition.
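The ordering behavior described above can be seen in a short simulation. The following Python sketch is a simplified model, not Kafka's partitioner: the `assign_round_robin` function is a hypothetical helper, and it ignores record keys and broker placement entirely. It spreads six records over three partitions in round-robin fashion and then reads the partitions one after another.

```python
from itertools import cycle


def assign_round_robin(records, partition_count):
    """Spread records over partitions in turn (illustrative helper only)."""
    partitions = [[] for _ in range(partition_count)]
    next_partition = cycle(range(partition_count))
    for record in records:
        partitions[next(next_partition)].append(record)
    return partitions


# Records are numbered in the order they were produced.
produced = [0, 1, 2, 3, 4, 5]
partitions = assign_round_robin(produced, partition_count=3)
# partitions is now [[0, 3], [1, 4], [2, 5]]

# A consumer that drains one partition after another sees this sequence:
consumed = [record for partition in partitions for record in partition]
# consumed is [0, 3, 1, 4, 2, 5]: the overall production order is lost

# Within each partition, the production order is preserved.
ordered_within_partition = all(p == sorted(p) for p in partitions)
```

If an application needs related records to stay in order, they have to end up in the same partition; the simulation shows why ordering across partitions cannot be assumed.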
