Throttling execution of array job tasks

Posted by chris Wed, 02 Dec 2009 16:27:11 GMT

I've long found that SGE users are perfectly willing to do the right thing when it comes to sharing a computing infrastructure among multiple competing workgroups. What has often been lacking are SGE features accessible to non-admin users that give them more control over how their jobs run and are prioritized.

A very common example of this is a situation where a user will say:

"I need to submit 100,000 jobs but I don't want to totally take over the cluster and upset my coworkers - can I limit how many of my jobs run at any given time so that resources are left free for others?"

As a Grid Engine consultant, trainer and administrator, I've personally felt that working with people wanting to be "good citizens" has sometimes been a challenge. Most of the common SGE methods for limiting or controlling job execution and policies are available only to users with SGE Administrator privileges. As nice as it is to handle one-off cluster resource allocation situations, these sorts of requests can consume lots of admin time and can occasionally cause problems if people make SGE quota or scheduler changes without tight coordination and planning.

Well, it was undocumented in the initial release but ever since SGE version 6.2u4 people have had the ability to limit concurrent execution of tasks within array jobs that they submit. The syntax looks like:

$ qsub -t 1-20 -tc 5 test.sh

... where the "-tc" argument is new. The example above shows a 20-task array job being submitted with a request to run no more than 5 at any one time.
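For anyone who has not written an array job before, here is a minimal sketch of what a task script like test.sh might contain. The input file naming scheme and the input_for_task helper are hypothetical, made up purely for illustration. The only Grid Engine pieces assumed are the SGE_TASK_ID environment variable, which SGE sets for each task of an array job, and the standard "#$" embedded options.

```shell
#!/bin/sh
# test.sh: hypothetical array job task script (illustration only).
#$ -S /bin/sh
#$ -cwd

# SGE sets SGE_TASK_ID for every task in an array job.
# Fall back to 1 so the script can also be dry-run outside the cluster.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical helper: map a task number to an input file name.
input_for_task() {
    printf 'input.%s.dat\n' "$1"
}

# Real work would go here; this sketch only reports what it would do.
echo "task ${task_id} would process $(input_for_task "${task_id}")"
```

Submitted with the qsub line from this post, each of the 20 tasks would see its own SGE_TASK_ID, and the -tc limit would keep no more than 5 of them running at once.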

This feature is now documented as of SGE 6.2u5:

-tc max_running_tasks

allow users to limit concurrent array job task execution.
Parameter max_running_tasks specifies maximum number of simultaneously
running tasks. For example we have running SGE with 10 free slots. We
call qsub -t 1-100 -tc 2 jobscript. Then only 2 tasks will be
scheduled to run even when 8 slots are free.
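The arithmetic in that man page example can be sketched as a tiny calculation. This is a simplified model of my own and not how the scheduler is implemented: it assumes every task needs exactly one slot and ignores quotas, priorities and every other scheduling policy. The function names are hypothetical.

```shell
#!/bin/sh
# Simplified model (an assumption, not SGE's scheduler logic):
# running tasks = min(free slots, -tc limit, tasks still waiting)

# Print the smaller of two integers.
min() {
    if [ "$1" -le "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# max_running FREE_SLOTS TC_LIMIT PENDING_TASKS
max_running() {
    min "$(min "$1" "$2")" "$3"
}

# The man page example: 10 free slots, -tc 2, 100 tasks.
max_running 10 2 100    # prints 2
```

Without the -tc limit, the same model would let 10 tasks start at once, which is exactly the "take over the cluster" situation the option is meant to avoid.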

This is a very welcome addition to Grid Engine. I suspect it will be popular and well received by the user community.

Grid Engine 6.2 Update 3 is out

Posted by chris Tue, 23 Jun 2009 14:07:30 GMT

Important Note: Sun has changed the license terms for this release. The full release from Sun.com can only be used for 90 days for free. The courtesy binaries are still free for all to use but the distribution will not include the Amazon EC2 cloud adaptor or the excellent "sgeinspect" tool. Source code for both of these components is available under the SISSL license so theoretically community members can build versions for themselves.

The full release announcement is here:
http://gridengine.sunsource.net/news/SGE62u3-announce.html

For me, the most important new features are the SGEInspect tool (screenshots of which you can view online at http://www.flickr.com/photos/chrisdag/sets/72157617805352910/) and the exclusive host scheduling feature, which now removes the need for PE-based 'hacks' to achieve the same goal.

The license change is interesting. I need to see how hard it is to build sgeinspect from source code; it really is a powerful new tool. It's a shame that this won't be part of the free distribution, but then again I want Sun to make product and support revenue off of SGE, so I can see the point.

SGE 6.2u3 beta release

Posted by chris Fri, 24 Apr 2009 10:22:07 GMT

Wow, some great new features have been added to this beta. See below for the full details or just read the full announcement. The beta will last through 5/20/2009, so get your feedback in quickly. The team is planning for an official production release in June 2009.


Sun Grid Engine Inspect
SGE Inspect, a new Monitoring and Configuration Console, allows to monitor Sun Grid Engine clusters and to monitor and configure the Service Domain Manager (SDM).

Service Domain Manager (SDM) - Cloud Adapter and Initial Power Saving Support
The new SDM service adapter interface adds support to manage external virtual resources. The implementation provides an interface to manage Amazon's EC2 AMIs. Those AMIs could include SGE execution hosts which can be added to a local Sun Grid Engine cluster.

The enhanced spare pool implementation can be configured to power cycle machines when being added or removed from the spare pool.

"Exclusive Job" Scheduling Enhancement
Jobs and all parallel tasks of a job can request exclusive scheduling on a host. Jobs requiring resources only available for one job per host or jobs having access patterns to hardware resources like CPUs or memory, which make it useful to run only one job per machine can request non shared access.

Complete Microsoft Windows Vista Support for GUI Jobs
A native Windows job can now open a GUI on a Windows Vista and Windows Server 2008 desktop (SGE 6.2u2 added support for those Windows operating systems)

SGE 6.2 Screencasts

Posted by chris Mon, 30 Mar 2009 05:13:09 GMT

Lubomir has a few screencasts up on his blog. The first one covers the process of preparing things for the new Java-based GUI installer and the second covers the GUI installation process itself.