Living with Big Data: Challenges and Opportunities
132 Pages · 2012 · English

Living with Big Data: Challenges and Opportunities
Jeff Dean, Sanjay Ghemawat (Google)
Joint work with many collaborators
Friday, September 14, 2012

Computational Environment
• Many datacenters around the world

Zooming In...
[two photo slides zooming from a datacenter down to individual racks and machines; no text beyond the titles is recoverable]

Decomposition into Services
[diagram: a user query enters the Frontend Web Server, which calls the Ad System and a super root; the super root fans out to backends for Spelling correction, News, Local, Video, Images, Blogs, Books, and Web, all atop shared Storage, Scheduling, and Naming services, ...]
(A parallel fan-out sketch follows the slides.)

Communication Protocols
• Example:
  – Request: query: “ethiopiaan restaurnts”
  – Response: list of (corrected query, score) results
      correction { query: “ethiopian restaurants” score: 0.97 }
      correction { query: “ethiopia restaurants” score: 0.02 }
      ...
• Benefits of structure:
  – easy to examine and evolve (add user_language to the request)
  – language independent
  – teams can operate independently
• We use Protocol Buffers for RPCs, storage, etc.
  – http://code.google.com/p/protobuf/
(A toy model of this request/response pair follows the slides.)

The Horrible Truth...
Typical first year for a new cluster:
  ~1 network rewiring (rolling ~5% of machines down over a 2-day span)
  ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
  ~5 racks go wonky (40-80 machines see 50% packet loss)
  ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
  ~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
  ~3 router failures (have to immediately pull traffic for an hour)
  ~dozens of minor 30-second blips for DNS
  ~1000 individual machine failures
  ~thousands of hard drive failures
  slow disks, bad memory, misconfigured machines, flaky machines, etc.
  Long-distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
• Reliability/availability must come from software!
(The implied event rates are worked out after the slides.)

Replication
• Data loss
  – replicate the data on multiple disks/machines (GFS/Colossus)
• Slow machines
  – replicate the computation (MapReduce)
• Too much load
  – replicate for better throughput (nearly all of our services)
• Bad latency
  – utilize replicas to improve latency (a replica-racing sketch follows the slides)
  – improved worldwide placement of data and services

Shared Environment
• Linux
[diagram slide: a stack of services sharing machines, with Linux at the bottom; only “Linux” survives extraction]
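
To make the “Decomposition into Services” fan-out concrete, here is a minimal Python sketch of a frontend dispatching one query to many independent backends in parallel and keeping whatever answers arrive by a deadline. The service list, the query_service stub, and the timeout are illustrative assumptions, not the deck's actual interfaces.

```python
# Hypothetical fan-out: the frontend queries every backend service in
# parallel and returns whatever answered within the deadline.
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

SERVICES = ["web", "images", "video", "news", "blogs", "books", "local"]

def query_service(service: str, query: str) -> str:
    # Stand-in for an RPC to one backend; a real frontend would issue a
    # network call here and handle per-service failures.
    return f"{service}: results for {query!r}"

def handle_query(query: str, deadline_s: float = 0.5) -> list:
    results = []
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = [pool.submit(query_service, s, query) for s in SERVICES]
        try:
            for fut in as_completed(futures, timeout=deadline_s):
                results.append(fut.result())
        except TimeoutError:
            pass  # drop stragglers; partial results beat a slow page
    return results

if __name__ == "__main__":
    for line in handle_query("ethiopian restaurants"):
        print(line)
```

Because each backend is an independent service behind its own interface, teams can deploy and scale their pieces separately, which is the point of the decomposition.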
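The “Communication Protocols” slide shows the spelling-correction exchange in protocol-buffer text format. Below is a hedged Python model of the same messages using plain dataclasses in place of generated protobuf classes; the message and field names (SpellRequest, user_language, and so on) are guesses from the slide, not Google's actual schema.

```python
# A toy model of the spelling-correction RPC from the slide, using
# dataclasses instead of generated protobuf classes.
from dataclasses import dataclass, field

@dataclass
class SpellRequest:
    query: str
    # New optional fields (like user_language on the slide) can be added
    # with defaults, so existing callers keep working unchanged.
    user_language: str = ""

@dataclass
class Correction:
    query: str
    score: float

@dataclass
class SpellResponse:
    corrections: list = field(default_factory=list)

def correct(request: SpellRequest) -> SpellResponse:
    # Stand-in for the real spelling service; returns the slide's example.
    return SpellResponse(corrections=[
        Correction(query="ethiopian restaurants", score=0.97),
        Correction(query="ethiopia restaurants", score=0.02),
    ])

response = correct(SpellRequest(query="ethiopiaan restaurnts"))
for c in response.corrections:
    print(f"{c.query!r}: {c.score}")
```

Because added fields carry defaults, evolving the schema does not break old callers, which is the evolvability the slide credits to structured, language-independent messages.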
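As a rough sanity check on “The Horrible Truth...” numbers, the yearly counts translate into strikingly frequent events. The sketch below does the arithmetic; reading “thousands of hard drive failures” as roughly 3000 is my assumption.

```python
# Back-of-the-envelope: convert the slide's first-year failure counts
# into an average interval between events of each class.
HOURS_PER_YEAR = 365 * 24

events_per_year = {
    "rack failure": 20,
    "router reload": 12,
    "individual machine failure": 1000,
    "hard drive failure": 3000,  # assumption: "thousands" taken as ~3000
}

for name, count in events_per_year.items():
    print(f"{name}: one every {HOURS_PER_YEAR / count:.1f} hours")
```

At roughly one machine failure every nine hours and a drive failure every three, no human-driven recovery process can keep up, which is why the slide concludes that reliability and availability must come from software.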
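For the “Bad latency” bullet on the Replication slide, one common way to utilize replicas for latency is to race the same read against more than one replica and keep the first answer. This is an illustrative sketch under that assumption, not the deck's stated mechanism; the replica names and fetch stub are invented, and real systems often delay the backup request slightly to cap the extra load.

```python
# Illustrative replica racing: issue the same read to two replicas and
# return whichever responds first.
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def read_from_replica(replica: str, key: str) -> str:
    time.sleep(random.uniform(0.01, 0.2))  # simulated variable latency
    return f"{key} served by {replica}"

def replicated_read(key: str, replicas=("replica-a", "replica-b")) -> str:
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(read_from_replica, r, key) for r in replicas]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # don't block on the slower replica
    return next(iter(done)).result()

if __name__ == "__main__":
    print(replicated_read("user:42"))
```

The same replicas that protect against data loss and absorb load double as a latency tool, since the client only ever waits for the fastest copy.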
