Evading quota limits when resources are available
DanT has an interesting answer to the following list question:
Hi, There is any way to have some kind of "dynamic" quota in the sense that, if someone has use all his quota's resource but there is still available free resource, he can use it but until someone need it?
Dan's reply covers an interesting configuration in which resource quotas constrain access to the main SGE queues while an unmanaged subordinate queue exists to catch jobs from quota-limited users should surplus resources still be available:
You'd have to do it with queue configuration and RQS. You could change the queue_sort_method to seqno and set up a queue with a high sequence number that is subordinated to all the other queues. The RQS would limit usage of the main queues, but not the subordinated queue. When a job comes in, it tries to go to one of the main queues. If it can't, then it ends up in the subordinated queue. If another job comes in and lands in a main queue on the same machine as a job in the subordinated queue, the job in the subordinated queue will be suspended.
Selectively blocking user access to queues or hosts
In this mailing list thread, both Reuti and Dan provide answers to a common question:
"...I need to find a way to limit [...] users to a specific user list on a specific host, while the [other] queue is available to everyone."
There are two methods for this in Grid Engine 6.1 and later. The first makes use of user groups and Access Control Lists ("ACLs") applied to particular queue configuration(s):
... user_lists NONE,[@quad64gb=mylist1 mylist2], [@quad128gb=mylist1 mylist2] ...
In human readable terms, the above ACL access syntax applied to the queue "user_lists" parameter will result in the following behavior: "Only members in mylist1/2 can run on the machines which are in @quad64gb and @quad128gb."
Make sense?
The 2nd method makes use of the powerful SGE Resource Quota framework. Using the RQS method one could construct a rule such as this:
{
name global
description prohibit some users from large memory hosts
enabled TRUE
limit users !@mylist1,!@mylist2 queues all.q hosts @quad64gb,@quad128gb to slots=0
}
In human readable terms we are defining a new resource quota rule that grants zero slots to any user requesting access to hosts defined in the @quad64gb and @quad128gb host lists who IS NOT named in the user group list @mylist1 or @mylist2. The end result of limiting users who match this limit rule to "slots=0" is that they will be prohibited from running on the large memory nodes until their user names are added to the ACL lists.
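To make the matching logic concrete, here is a toy Python model of that rule. It is only a sketch of the behavior described above, not how Grid Engine evaluates quotas internally; the user and host names are made up, and the two ACLs are assumed to be merged into a single set.

```python
def blocked(user, host, acl_members, restricted_hosts):
    """Return True if the zero-slot rule would match this user/host pair.

    The rule matches only when the host is one of the large memory
    hosts AND the user is absent from every listed ACL.
    """
    return host in restricted_hosts and user not in acl_members


# Hypothetical data: the union of mylist1 and mylist2, and the hosts
# that @quad64gb and @quad128gb might expand to.
acl_members = {"alice", "bob"}
restricted_hosts = {"big01", "big02"}

print(blocked("carol", "big01", acl_members, restricted_hosts))   # True
print(blocked("alice", "big01", acl_members, restricted_hosts))   # False
print(blocked("carol", "small07", acl_members, restricted_hosts)) # False
```

A user outside the lists is only shut out of the large memory hosts; everywhere else the rule does not match and other limits (if any) apply.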
Keeping single slot jobs off of certain nodes
In this thread, Paul asks:
"I'm looking at finding a way to either limit single-slot jobs, or requiring all jobs in a given queue to be running in a pe. Specifically, I have some SMP nodes, that I'd rather not waste on single thread, and also keep the single thread jobs off of the infiniband connected nodes. I have gigE small cpu count nodes for this task."
Dan replied with another example of clever use of the new SGE Resource Quota syntax within SGE 6.1 and later:
You can use resource quota sets to restrict non-PE jobs to certain queues or hosts.
limit pes !* hosts @smp to slots=0
Slick!
Be careful with your RQS syntax
I believe pretty strongly that the 6.1 release is going to be a big deal, primarily because the new Resource Quota support in Grid Engine 6.1 is going to take a solid whack at a whole bunch of problems and issues that SGE admins have been bothered by for years.
The nice clean resource quota syntax is going to replace entire bodies of clever hacks and workarounds that the developers, users and community have created. Many of the kludgy-yet-clever hacks involving the intentional (ahem...) misuse of custom parallel environment objects and dedicated queues will simply become obsolete.
RQS is going to take some getting used to, however, as this mailing list discussion thread makes clear.
Now that people outside of the developer community are putting RQS through its paces, it is becoming easier to spot areas where documentation can be improved and/or fixed. The discussion referenced above provided a great example, one that I'll turn into a little quiz for the reader:
What is the difference between the following 2 resource quota sets?
{
name peruser_limit
enabled TRUE
limit users * to slots = 10
}
{
name peruser_limit
enabled TRUE
limit users {*} to slots = 10
}
The answer after the jump ...
Well, this was not much of a quiz, since the answer is contained in the mailing list thread, but I'll summarize it here:
users * -- Means "apply limit globally across ALL users"
users {*} -- Means "apply limit INDIVIDUALLY to EACH user"
Something to think about as you start writing rule sets for 6.1!
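A small Python sketch may help make the distinction stick. It is a toy model of the two readings, assuming single-slot jobs and made-up user names, and has nothing to do with how the scheduler actually implements quotas.

```python
from collections import Counter


def admits(running, user, limit=10, per_user=False):
    """Return True if one more single-slot job from `user` fits the limit.

    running  -- Counter mapping user name -> slots currently in use
    per_user -- False models `users *`   (one pool shared by everyone)
                True  models `users {*}` (a separate pool per user)
    """
    used = running[user] if per_user else sum(running.values())
    return used + 1 <= limit


# Hypothetical cluster state: alice already holds 10 slots, bob holds none.
running = Counter({"alice": 10, "bob": 0})

print(admits(running, "bob", per_user=False))   # False: shared pool is full
print(admits(running, "bob", per_user=True))    # True: bob has his own 10
print(admits(running, "alice", per_user=True))  # False: alice used hers
```

With `users *` a single busy user can starve everyone else; with `users {*}` each user gets their own allowance of 10.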
Enhanced dynamic limits in the new resource quota system
A bit of interesting news via the GE issues mailing list recently concerning the newly announced "Resource Quota" feature that will be part of the upcoming Grid Engine 6.1 release. The specification document for the new Resource Quota facility makes specific mention of "dynamical limits". The specific example of a given "dynamical limit" is the following:
limit hosts {@linux_hosts} to slots=$num_proc*5
... that limit would change from machine to machine depending on the number of CPUs resident in each machine. Useful.
Roland filed (and then fixed!) a new issue asking for this functionality to be extended to allow the following types of usage:
'slots=$num_proc*2-1' or 'slots=$num_proc*2+2'
The new enhancements extend the operators that can be used for defining these new limits. Because the codebase is shared, the enhancement also applies to load_formula syntax. The new syntax definition looks like this:
{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
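To see how an expression in that grammar resolves to a per-host number, here is a minimal Python evaluator. It is a sketch written from the syntax definition above, not Grid Engine's own parser; the function name `eval_limit` and the assumption that weights are plain decimal numbers are mine.

```python
import re

# One term: either "$complex" with an optional "*weight", or a bare number.
TERM = re.compile(
    r"\s*(?:\$(?P<name>\w+)(?:\s*\*\s*(?P<weight>\d+(?:\.\d+)?))?"
    r"|(?P<const>\d+(?:\.\d+)?))\s*"
)


def eval_limit(expr, complexes):
    """Evaluate terms joined by + or -, using per-host complex values."""
    pos, total, sign = 0, 0.0, 1
    while True:
        m = TERM.match(expr, pos)
        if m is None:
            raise ValueError(f"expected a term at position {pos}: {expr!r}")
        if m.group("name") is not None:
            value = complexes[m.group("name")] * float(m.group("weight") or 1)
        else:
            value = float(m.group("const"))
        total += sign * value
        pos = m.end()
        if pos == len(expr):
            return total
        if expr[pos] not in "+-":
            raise ValueError(f"expected + or - at position {pos}: {expr!r}")
        sign = 1 if expr[pos] == "+" else -1
        pos += 1


# Hypothetical host reporting 4 processors.
host = {"num_proc": 4}
print(eval_limit("$num_proc*5", host))    # 20.0
print(eval_limit("$num_proc*2-1", host))  # 7.0
print(eval_limit("$num_proc*2+2", host))  # 10.0
```

The same host-specific lookup is what makes the limit "dynamic": a host reporting 8 processors would get 15 slots from the second expression instead of 7.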
