Next Generation High Throughput Computing – Nitro Version 1.1

We’ve been busy this summer giving Nitro its first upgrade and running a lot of performance tests on various systems. We’ve added some features and also found some significant performance improvements. Here are some highlights from our summer of development:

Multi-core Tasks

Nitro was originally conceived to launch single-processor tasks as quickly as possible, but many users also need to run short tasks that require multiple processors. Version 1.0 accommodated this through thread scaling, as long as the workload was homogeneous, but sometimes it isn’t. Nitro 1.1 adds the ability to specify the number of cores each task requires, so a task file can now look like this:

cores=4 myprog --calculate-some-important-things-with-4-cores
cores=2 myprog --calculate-some-important-things-with-2-cores
cores=8 myprog --calculate-some-important-things-with-8-cores
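To make the format concrete, here is a minimal sketch of how a launcher might peel the leading `key=value` directives off a task line before handing the rest to the program. The directive names (`cores`, `memory`, `shell`) come from the examples in this post; the parsing rules, the `TaskLine` class, and `parse_task_line` are hypothetical and are not Nitro's actual grammar.

```python
from dataclasses import dataclass, field

# Directive names taken from the task-file examples in this post; the
# parsing rules below are assumptions, not Nitro's actual grammar.
KNOWN_KEYS = {"cores", "memory", "shell"}


@dataclass
class TaskLine:
    options: dict = field(default_factory=dict)  # e.g. {"cores": "4"}
    command: str = ""                             # everything after the directives


def parse_task_line(line):
    """Peel leading key=value directives off a task line."""
    task = TaskLine()
    rest = line.strip()
    while rest:
        head, _, tail = rest.partition(" ")
        key, sep, value = head.partition("=")
        if not sep or key not in KNOWN_KEYS:
            break  # first word that isn't a known directive starts the command
        task.options[key] = value
        rest = tail.lstrip()
    task.command = rest
    return task
```

For example, `parse_task_line("cores=4 myprog --calculate")` yields options `{"cores": "4"}` and command `"myprog --calculate"`, while an option such as `--x=1` after the program name is left untouched because parsing stops at the first word that is not a known directive.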

Nitro uses a first-fit algorithm to place tasks on available cores and sets each task’s affinity to those cores when it launches. The bonus is that even though Nitro now has additional bookkeeping to do (tracking available cores, verifying that a task will fit, and setting its affinity), we didn’t give up any speed. And because CPU resources are now tracked and each task is pinned to its assigned cores, tasks generally run a bit faster than when they were free to migrate between cores.
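A minimal sketch of first-fit core placement on a single node, assuming that a task's cores should form one contiguous run of core IDs (the post does not say whether Nitro requires this). `CoreMap` and `launch_pinned` are hypothetical names; pinning uses Python's Linux-only `os.sched_setaffinity`, applied in the child before the program starts.

```python
import os
import subprocess


class CoreMap:
    """Tracks which cores on a single node are busy (hypothetical sketch)."""

    def __init__(self, ncores):
        self.free = [True] * ncores

    def first_fit(self, n):
        """Claim the first run of n consecutive free cores, or return None."""
        if n <= 0:
            raise ValueError("a task needs at least one core")
        run_start, run_len = 0, 0
        for i, is_free in enumerate(self.free):
            if not is_free:
                run_len = 0  # a busy core breaks the current run
                continue
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                cores = list(range(run_start, run_start + n))
                for c in cores:
                    self.free[c] = False
                return cores
        return None  # the task does not fit right now; leave it queued

    def release(self, cores):
        for c in cores:
            self.free[c] = True


def launch_pinned(argv, cores):
    """Start argv with its CPU affinity limited to `cores` (Linux only)."""
    return subprocess.Popen(
        argv, preexec_fn=lambda: os.sched_setaffinity(0, cores)
    )
```

On an 8-core node, the three example tasks would receive cores `[0, 1, 2, 3]`, then `[4, 5]`, and the 8-core task would wait until both earlier tasks release their cores.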

Resource Management

Many clusters are configured so that nodes perform only the work assigned to them by the resource manager. Other configurations, however, run background services on many or all of the nodes in the cluster, such as parallel, distributed file systems. These background tasks consume resources and can leave the system sluggish if the Nitro workload is doing too much at the same time. Nitro now has two mechanisms to help prevent overload: a memory threshold and a CPU load threshold.

The memory threshold specifies a minimum amount of physical memory that must be available on a node before Nitro will run another task. This is useful for balancing tasks with unknown memory requirements, as well as for accounting for background tasks. It is also the most responsive form of workload management, since available memory is evaluated before each additional task launches.
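The check itself can be quite small. This is a sketch assuming a Linux node, where the kernel reports an estimate of available memory as `MemAvailable` (in kB) in `/proc/meminfo` on kernels 3.14 and later; how Nitro actually measures available memory is not stated in this post, and the function names are hypothetical.

```python
def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text (values in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found (needs Linux 3.14 or later)")


def available_memory_bytes():
    """Read the current MemAvailable figure from the running kernel."""
    with open("/proc/meminfo") as f:
        return parse_mem_available(f.read())


def may_launch_another(min_free_bytes, available=None):
    """True if at least min_free_bytes of physical memory is available right now."""
    if available is None:
        available = available_memory_bytes()
    return available >= min_free_bytes
```

Because the value is read immediately before each launch decision, a background process that suddenly grows is noticed on the very next check.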

You can also throttle Nitro by setting a CPU load threshold. Since many clusters don’t have homogeneous nodes, Nitro expresses this threshold as a percentage of full load rather than as a raw load value. If a node has 16 processors, 100% utilization equates to a load of around 16, since load is the number of processes contending for CPU cores. Because the system reports load averages, this measurement lags behind the instantaneous load, but it is still a useful throttling mechanism.
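A sketch of the percentage-of-full-load calculation, assuming the 1-minute load average and the node's logical CPU count as the definition of "full load" (the post does not say which load average or core count Nitro uses). It relies on Python's `os.getloadavg` (Unix only) and `os.cpu_count`; the function names are hypothetical.

```python
import os


def load_percent(load1=None, ncpus=None):
    """Load average as a percentage of full load (load equal to the core count)."""
    if load1 is None:
        load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    return 100.0 * load1 / ncpus


def should_throttle(threshold_pct, load1=None, ncpus=None):
    """True when the node is at or above threshold_pct of full load."""
    return load_percent(load1, ncpus) >= threshold_pct
```

With a 90% threshold, a 16-processor node reporting a load of 15.0 (93.75%) would stop accepting new tasks, while one reporting 8.0 (50%) would keep launching. Expressing the threshold as a percentage lets the same setting apply to nodes of different sizes.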

Tasks can also specify the amount of memory they require. Nitro reads the total available physical memory on startup and uses this number as a memory budget. Tasks are allocated on a first-fit basis, and Nitro runs only as many tasks concurrently as will fit in its memory budget. Here’s an example of task memory constraints in a task definition file:

memory=4GB myprog --calculate-some-important-things-using-4GB
memory=8GB myprog --calculate-some-important-things-using-8GB
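The memory budget described above can be sketched as a simple reservation counter plus a first-fit pass over the pending queue: each waiting task is started if its request still fits in the remaining budget, and skipped (left queued) otherwise. This is one reading of "first fit" under stated assumptions: `GB` is taken to mean 2^30 bytes, and `parse_size`, `MemoryBudget`, and `admit_tasks` are hypothetical names.

```python
# Binary multiples are an assumption; the post does not say whether GB is 10**9 or 2**30.
UNITS = {"TB": 1024**4, "GB": 1024**3, "MB": 1024**2, "KB": 1024, "B": 1}


def parse_size(text):
    """Convert a size such as '4GB' or '512MB' to bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():  # longer suffixes are checked before "B"
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # bare number: treat as bytes


class MemoryBudget:
    def __init__(self, total_bytes):
        self.free = total_bytes  # e.g. physical memory read at startup

    def try_reserve(self, nbytes):
        if nbytes > self.free:
            return False
        self.free -= nbytes
        return True

    def release(self, nbytes):
        self.free += nbytes  # call when a task finishes


def admit_tasks(pending, budget):
    """Return, in queue order, the names of pending tasks that fit the budget."""
    admitted = []
    for name, request in pending:
        if budget.try_reserve(parse_size(request)):
            admitted.append(name)
    return admitted
```

With a 10GB budget and requests of 4GB, 8GB, and 4GB in that order, the first and third tasks start and the 8GB task waits until enough memory is released.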

Speed Improvements

Sometimes a shell provides more than you really need to get your task running. Before finally launching your binary, the shell reads system configuration files, expands environment variables, and parses command-line syntax (in case you have multiple instructions on a line). If you don’t need much of that functionality, or none at all, Nitro gives you options to speed up your task’s launch.

Most systems use /bin/bash as the default shell, and it carries considerable weight. A first option is to check what your /bin/sh is linked to. On my system it points to /bin/dash, and dash does perform better than bash (about 20% faster). Next, you might consider installing the Korn shell: in my tests, using ksh with Nitro was 33% faster than using bash. If you don’t need any command-line interpretation at all, Nitro lets you run your command without a shell. Your binary still receives its command-line options, but environment variables won’t be expanded. If the shoe fits, as they say, wear it! In my tests, launching the binary directly resulted in launch times 92% faster than with bash. You can set the default shell system-wide or specify an override in the task definition file. Here’s an example of a task file with various shell options:

shell=/bin/bash myprog --run-pretty-fast $GET_SOME_DATA
shell=/bin/ksh myprog --run-really-fast $GET_SOME_DATA
shell=no-shell myprog --run-super-fast --get-some-data-from-here /usr/share/data_location
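The difference between the shell and no-shell options comes down to what gets executed. This sketch, built on Python's `subprocess` and `shlex` modules, shows the idea under stated assumptions: with a shell, the whole line is handed to the shell via `-c`; with `no-shell`, the line is split into words directly (quote-aware, which may be more than Nitro itself does) and the program is executed with no shell process in between, so `$GET_SOME_DATA` would reach the program as literal text. `build_argv` and `run_task` are hypothetical names.

```python
import shlex
import subprocess


def build_argv(command, shell="/bin/bash"):
    """Build the argv used to start one task line.

    shell is a path such as /bin/bash or /bin/ksh, or the string "no-shell".
    """
    if shell == "no-shell":
        # Split into words directly (quote-aware); no shell starts, so
        # $VARIABLES reach the program literally instead of being expanded.
        return shlex.split(command)
    # Otherwise the chosen shell parses the line and expands variables.
    return [shell, "-c", command]


def run_task(command, shell="/bin/bash"):
    return subprocess.run(build_argv(command, shell), check=False)
```

Skipping the shell saves an extra process start plus the shell's own startup and parsing work, which is where the launch-time savings come from; the trade-off is that any line relying on variables, pipes, or redirection needs a real shell.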