HTCONDOR ADMIN TUTORIAL
Greg Thain
Center for High Throughput Computing
University of Wisconsin – Madison
[email protected]
© 2015 Internet2

Overview
• What makes HTCondor different?
• Nitty-gritty of HTCondor configuration

HTCondor differences and implications
• Goal: run jobs on as many machines as possible*
• -> implies heterogeneity:
  • Generational: different speeds, memory
  • OS distribution and type (Linux/Windows)
  • HTCondor versions
  • Installed software base // installed datasets
  • Geography: building, campus,
  • Temporal

HTCondor differences and implications
• Goal: run jobs on as many machines as possible*
• -> implies scalability and reliability:
  • Worker machines can come and go, and there can be a lot of them
  • Horizontal scalability: many independent submit nodes
  • No single point of failure
  • An HTCondor job is “like money in the bank”

Requires networking
• Not just fibers and packets and routers
• Community building:
  – Breaking out of silos
• Asymmetric relationships

HTCondor history
• Cycle scavenging on desktops
  – Implies temporal disruptions
  – Delegating ownership temporarily (see the desktop policy sketch at the end of this section)
• Not just desktop scavenging anymore
  – But the same model applies at different scales

The Lehigh story
• A professor needed more compute power
• Access to a small cluster was not sufficient
• The power of one lunch…
• Computing demand outstrips capacity

Condominium cluster
• Several groups pool money for a big buy:
  – Chemistry, physics, and biology each buy 500 cores
  – When there is sufficient demand, each group gets its 500 cores
  – When there is less demand, a group can get more than 500
• Same model as delegated ownership (see the group-quota sketch at the end of this section)

HPC backfill
• HPC / MPI implies low utilization
  – HTCondor can backfill an HPC cluster
  – Same model as delegated ownership
• HPC offload
  – Many “HPC” workloads are really HTC and clog up the scheduler
  – Make HPC admins happy

Cloud annexation
• EC2 “spot instances”
  – Delegated ownership yet again
• Can build a pool in the cloud (Cycle Computing)
• Or can expand an existing pool into the cloud
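
Desktop scavenging policy (sketch)
The classic cycle-scavenging model can be expressed in the startd's policy expressions. The following is a minimal sketch, not a complete policy from this tutorial; the thresholds (15 minutes of console idle, load below 0.3) are illustrative assumptions.

    # Start jobs only after the console has been idle 15 minutes and the load is low
    START    = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
    # Suspend the job as soon as the machine's owner returns
    SUSPEND  = KeyboardIdle < 60
    # Resume once the desktop has been idle again for 5 minutes
    CONTINUE = KeyboardIdle > 5 * 60

Tightening or loosening these expressions is how ownership is "delegated temporarily": the owner's interactive use always wins, and HTCondor only harvests the idle cycles.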
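
Condominium cluster via group quotas (sketch)
The condominium model maps onto HTCondor's accounting-group quotas in the negotiator configuration. This is a minimal sketch; the group names and the flat 500-core quotas are illustrative assumptions, not configuration taken from this tutorial.

    # Accounting groups for the three buying departments (names are illustrative)
    GROUP_NAMES = group_chem, group_physics, group_bio

    # Each group is guaranteed roughly the cores it paid for
    GROUP_QUOTA_group_chem    = 500
    GROUP_QUOTA_group_physics = 500
    GROUP_QUOTA_group_bio     = 500

    # Let idle cores from an under-loaded group flow to the others
    GROUP_ACCEPT_SURPLUS = True

Users then opt into their group by adding, for example, accounting_group = group_chem to their submit files, and the negotiator enforces the guarantees while sharing any surplus.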