HTCONDOR ADMIN TUTORIAL
Greg Thain
Center for High Throughput Computing
University of Wisconsin – Madison
[email protected]
© 2015 Internet2

Overview
• What makes HTCondor different?
• Nitty-gritty of HTCondor configuration

HTCondor differences and implications
• Goal: run jobs on as many machines as possible*
• -> implies heterogeneity:
  • Generational: different speeds, memory
  • OS distribution and type (Linux/Windows)
  • HTCondor versions
  • Installed software base // installed datasets
  • Geography: building, campus,
  • Temporal

HTCondor differences and implications
• Goal: run jobs on as many machines as possible*
• -> implies scalability and reliability:
  • Worker machines can come and go, and there can be a lot of them
  • Horizontal scalability: many independent submit nodes
  • No single point of failure
  • An HTCondor job is “like money in the bank”

Requires networking
• Not just fibers and packets and routers
• Community building:
  – Breaking out of silos
• Asymmetric relationships

HTCondor history
• Cycle scavenging on desktops
  – Implies temporal disruptions
  – Delegating ownership temporarily (see the desktop policy sketch at the end of this section)
• Not just desktop scavenging anymore
  – But the same model applies at different scales

The Lehigh story
• A professor needed more compute power
• Access to a small cluster was not sufficient
• The power of one lunch…
• Computing demand outstrips capacity

Condominium cluster
• Several groups pool money for a big buy:
  – Chemistry, physics, and biology each buy 500 cores
  – When there is sufficient demand, each group gets its 500 cores
  – When there is less demand, a group can get more than 500
• Same model as delegated ownership (see the group-quota sketch at the end of this section)

HPC backfill
• HPC / MPI implies low utilization
  – HTCondor can backfill an HPC cluster
  – Same model as delegated ownership
• HPC offload
  – Many “HPC” workloads are really HTC and clog up the scheduler
  – Make HPC admins happy

Cloud annexation
• EC2 “spot instances”
  – Delegated ownership yet again
• Can build a pool in the cloud (Cycle Computing)
• Or can expand an existing pool into the cloud
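
Desktop scavenging policy (sketch)
The classic cycle-scavenging model can be expressed in the startd's policy expressions. The following is a minimal sketch, not a complete policy from this tutorial; the thresholds (15 minutes of console idle, load below 0.3) are illustrative assumptions.

    # Start jobs only after the console has been idle 15 minutes and the load is low
    START    = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
    # Suspend the job as soon as the machine's owner returns
    SUSPEND  = KeyboardIdle < 60
    # Resume once the desktop has been idle again for 5 minutes
    CONTINUE = KeyboardIdle > 5 * 60

Tightening or loosening these expressions is how ownership is "delegated temporarily": the owner's interactive use always wins, and HTCondor only harvests the idle cycles.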
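
Condominium cluster via group quotas (sketch)
The condominium model maps onto HTCondor's accounting-group quotas in the negotiator configuration. This is a minimal sketch; the group names and the flat 500-core quotas are illustrative assumptions, not configuration taken from this tutorial.

    # Accounting groups for the three buying departments (names are illustrative)
    GROUP_NAMES = group_chem, group_physics, group_bio

    # Each group is guaranteed roughly the cores it paid for
    GROUP_QUOTA_group_chem    = 500
    GROUP_QUOTA_group_physics = 500
    GROUP_QUOTA_group_bio     = 500

    # Let idle cores from an under-loaded group flow to the others
    GROUP_ACCEPT_SURPLUS = True

Users then opt into their group by adding, for example, accounting_group = group_chem to their submit files, and the negotiator enforces the guarantees while sharing any surplus.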