Mallob

Decentralized job scheduling platform for HPC and cloud environments, featuring an award-winning distributed engine for automated reasoning (SAT solving)

1
contributor

Cite this software

What Mallob can do for you

The platform Mallob (Malleable Load Balancer, or Massively Parallel Logic Backend) allows to fairly schedule and process jobs with unknown execution time in a (massively) parallel and distributed system on demand. Mallob's flexible and decentralized approach to online job scheduling results in scheduling latencies in the range of milliseconds, near-optimal system utilization, and high resource efficiency.

In particular, we have explored with great success the NP-complete application of propositional satisfiability (SAT) solving, where a Boolean formula is processed and a satisfying assignment to its variables (or an attestation of unsatisfiability) is returned. Due to its astounding applicability to a wide range of academic as well as industrial disciplines, our distributed general-purpose SAT solving engine has received a large amount of attention, four gold medals of the International SAT Competition's Cloud Track in a row, and Amazon's proposition that our system is, "by a wide margin, the most powerful SAT solver on the planet" (Byron Cook, Amazon Science blog post).

Mallob is liberally (LGPL) licensed and openly developed on Github, where detailed instructions for building, testing, and running Mallob can be found.

Background

Despite SAT being a notoriously difficult problem, practically efficient SAT solving approaches have empowered a wide range of real-world applications for SAT such as software verification, circuit design, cryptography, automated planning, explainable AI, and theorem proving.
With respect to modern computing paradigms such as cloud computing and high-performance computing (HPC), the limited scalability of established parallel and distributed SAT solvers has become a pressing issue.
Moreover, since processing times of SAT jobs are unknown in advance, conventional HPC scheduling approaches applied to such jobs can lead to prohibitively large scheduling latencies and to suboptimal utilization of computational resources. Instead, we suggest to make use of so-called malleable scheduling. A parallel computing task is called malleable if the amount of computational resources allotted (i.e., the number of cores or machines) can be adjusted during its execution. Malleability is a powerful paradigm in the field of job scheduling and load balancing as it allows to schedule incoming jobs rapidly and to continuously rebalance the present tasks according to their priority and other properties.

We believe that a cloud service which combines a scalable SAT solving engine with malleable job scheduling can significantly improve many SAT-related academic and industrial workflows in terms of productivity and (resource-)efficiency. Moreover, our decentralized scheduling approach is applicable to further applications beyond SAT where processing times are unknown in advance and a modest amount of data is transferred in between the workers.

System Overview

Mallob is a C++ application designed for parallel and distributed systems between 16 and 10000 cores and allows to resolve propositional problems in a (massively) parallel manner. Our award-winning SAT solving engine scales up to three thousand cores using a selection of well-established sequential SAT solvers and a careful exchange of information among them.

Each distributed SAT solving task is malleable, allowing users to submit formulae to Mallob at will with scheduling latencies in the range of milliseconds. Computational resources are allotted proportional to each job's priority and also respecting each job's (maximum) demand of resources. Our scheduling approach is fully decentralized and uses a small part of each worker's CPU time (< 5%) to perform scheduling negotiations. As such, Mallob achieves optimal system utilization almost always, i.e., either all available processes are utilized or all resource demands are fully met. We achieve this feat by arranging each active job as a binary tree of workers and growing or shrinking each job tree dynamically based on the current system state. We describe these techniques in depth in our related publications.

Keywords
Programming languages
  • C++ 93%
  • Python 3%
  • Shell 3%
  • C 1%
License
  • LGPL-3.0-or-later
</>Source code

Contributors

DS
Dominik Schreiber
Developer
Karlsruher Institut für Technologie