STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA
Processing billions of events every day

Neha Narkhede
- Co-founder and Head of Engineering @ Stealth Startup
- Prior to this…
  - Lead, Streams Infrastructure @ LinkedIn (Kafka & Samza)
  - One of the initial authors of Apache Kafka, committer and PMC member
- Reach out at @nehanarkhede

Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing

The Data Needs Pyramid
Maslow's hierarchy of needs (top to bottom):
- Self actualization
- Esteem
- Love/Belonging
- Safety
- Physiological
Data needs (top to bottom):
- Automation
- Understanding
- Data processing
- Data collection

Increase in diversity of data
- 1980+
  - Database data (users, products, orders, etc.)
- 2000+
  - Events (clicks, impressions, pageviews)
  - Application logs (errors, service calls)
  - Application metrics (CPU usage, requests/sec)
- 2010+
  - IoT sensors
Siloed data feeds

Explosion in diversity of systems
- Live Systems
  - Voldemort
  - Espresso
  - GraphDB
  - Search
  - Samza
- Batch
  - Hadoop
  - Teradata

Data integration disaster
[Diagram. Labels: Espresso, Voldemort, Oracle, User Tracking, Logs, Operational Metrics; Hadoop, Log Search, Monitoring, Data Warehouse, Social Graph, Rec. Engine, Search, Security, Email, ...; Production Services]

Centralized service
[Diagram. Labels: Espresso, Voldemort, Oracle, User Tracking, Logs, Operational Metrics; Data Pipeline; Hadoop, Log Search, Monitoring, Data Warehouse, Social Graph, Rec Engine & Life, Search, Security, Email, ...; Production Services]
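
The contrast between the "Data integration disaster" and "Centralized service" slides can be illustrated by counting integration links. The sketch below is not from the talk: the system names come from the diagram labels, but the split into sources and consumers, the worst-case assumption that every consumer reads from every source, and the function names are all illustrative assumptions.

```python
# Labels taken from the slides; which side each system sits on is an assumption.
SOURCES = ["Espresso", "Voldemort", "Oracle",
           "User Tracking", "Logs", "Operational Metrics"]
CONSUMERS = ["Hadoop", "Log Search", "Monitoring", "Data Warehouse",
             "Social Graph", "Rec. Engine", "Search", "Security", "Email"]


def point_to_point_links(sources, consumers):
    """Worst case (assumed): each consumer integrates with each source directly."""
    return [(s, c) for s in sources for c in consumers]


def centralized_links(sources, consumers, hub="Data Pipeline"):
    """Each system connects only to the central pipeline."""
    return [(s, hub) for s in sources] + [(hub, c) for c in consumers]


p2p = point_to_point_links(SOURCES, CONSUMERS)
hub = centralized_links(SOURCES, CONSUMERS)
print(f"point-to-point links: {len(p2p)}")  # 6 * 9 = 54
print(f"centralized links:    {len(hub)}")  # 6 + 9 = 15
```

Under these assumptions, adding one more source costs nine new links in the point-to-point layout but only one in the centralized layout.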