Solaris 9 Resource Manager software is
a new set of features designed to enhance resource control and accounting
in the Solaris™ Operating Environment
(Solaris OE). Detailed information can be found in the Solaris™ 9 Operating Environment documentation set.
Here are some related links:
FAQ: http://wwws.sun.com/software/solaris/faqs/resource_manager.html
Overview: http://wwws.sun.com/software/solaris/ds/ds-srm/
Solaris 9 Resource Manager manual: http://docs.sun.com/db/doc/806-4076/6jd6amqor?q=Solaris+9+resource+Manager&a=view
Sun BluePrints: http://www.sun.com/solutions/blueprints/0902/816-7753-10.pdf
Two papers on this subject were presented at Sun's SuperG 2002 conference.
What are the benefits?
It can be used to generate detailed accounting information, to better enforce limits on a job, and to prevent a job from using more CPUs than requested.
How do I use it?
Before you start, set up the projects database and extended accounting facility. Please refer to the Solaris OE documentation for detailed setup instructions.
The key is to associate each job with a taskid of the Solaris 9 Resource Manager software. To do this, run newtask at job start-up, either in the starter method or in the prolog.
The starter method is simpler, since it execs the end user's script. An example of a starter method is:
exec /usr/bin/newtask $SGE_STARTER_SHELL_PATH $*
A sample starter method is provided in Appendix E.
The advantage of using the queue/host prolog is that it can be run as root, and can thus perform privileged operations without the need for setuid helper programs. On the other hand, the prolog process does not exec the user script. To make the changes "stick", you have to change the taskid of the shepherd process instead of the prolog's own process:
newtask -c `ps -o ppid= -p $$`
This changes the shepherd's taskid, because the shepherd is the prolog's parent process. Once the shepherd has the new taskid, the starter method, user script, and epilog script will all inherit the proper taskid from it.
Projects
Note: The Sun™ ONE Grid Engine, Enterprise Edition software also has projects, but in this document only the Solaris 9 Resource Manager software concept of projects is used.
newtask can be used to create a new taskid as well as to bill the job to a project. To do this, use the -p <<project name>> flag on the newtask command line inside the starter method or prolog script.
You can have resource pools associated with a project; therefore, billing the job to a project will also bind it to the associated resource pool.
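As a sketch, a starter method that bills jobs to a project might look like the following. The project name "bigproject" is an assumption for illustration; it must already exist in the projects database.

```shell
#!/bin/sh
# Hypothetical starter method: create a new task for the job and bill it
# to the assumed project "bigproject" (must exist in /etc/project).
# exec replaces this process, so the user's job runs inside the new task.
exec /usr/bin/newtask -p bigproject $SGE_STARTER_SHELL_PATH $*
```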
Resource Pools
There are two ways of using resource pools with jobs: (1) associate a resource pool with a project and use that project in the newtask call, or (2) use the poolbind command to explicitly bind the job to a resource pool. The poolbind command requires root privileges, so it should be used in the queue's prolog or called from the starter method through a setuid wrapper.
The prolog example in Appendix B uses the poolbind method. Resource pools can be used to limit the number of CPUs a job can use; therefore, a grid administrator can prevent a multithreaded program from using more CPUs than requested. Note that if a parallel job is bound to a resource pool with fewer CPUs than specified by the PARALLEL environment variable, the parallel job will slow down severely.
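For illustration, a prolog fragment using the explicit poolbind method might look like the following sketch. The pool name "single" is an assumption (see the pools in Appendix C); the fragment must run as root.

```shell
#!/bin/sh
# Hypothetical prolog fragment, run as root. Bind the shepherd (the
# prolog's parent process) to the assumed resource pool "single"; the
# starter method and user script inherit the binding from the shepherd.
SHEPHERD_PID=`ps -o ppid= -p $$`
/usr/sbin/poolbind -p single $SHEPHERD_PID
```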
Resource controls
At this point, resource controls do not offer much new functionality to the Grid Engine software. However, you can use them to enforce CPU time limits and lightweight process (LWP) limits on a job, by using the prctl command in the prolog or starter method to set the job's resource controls. The prolog in Appendix B illustrates the usage of prctl.
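As a sketch, such a prolog fragment might look like the following. The limit values are purely illustrative, and the exact prctl flags can vary between Solaris releases, so verify them against prctl(1) before use.

```shell
#!/bin/sh
# Hypothetical prolog fragment, run as root. Apply example resource
# controls to the shepherd's task: cap CPU time at 3600 seconds and the
# number of LWPs at 64 (both values are assumptions, not recommendations;
# flag details per prctl(1) on your release).
SHEPHERD_PID=`ps -o ppid= -p $$`
/usr/bin/prctl -n task.max-cpu-time -r -v 3600 -i process $SHEPHERD_PID
/usr/bin/prctl -n task.max-lwps -r -v 64 -i process $SHEPHERD_PID
```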
How do I get an exacct report?
There are no reporting tools for exacct at this time; only the C API and a demo program that prints all the exacct records from a file are available. The demo program, written in C, is in the SUNWosdem package and is usually installed in /usr/demo/libexacct. You will need to compile it before you use it.
I wrote a simple tool that scans the process exacct file and prints selected parts of the records of processes that used more than 0.5 seconds of CPU time. You can download this tool here.
Limitations
The setup presented here only works for BATCH queues. Interactive loads have to be controlled by projects, because in.rlogind starts a new task for the login session, ignoring the task created by the starter method or prolog.
Parallel Environments are not covered in this HOWTO.
Appendix A
The following is a minimal prolog script to assign a taskid to a job. You can download it here.
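As a sketch (a hypothetical stand-in, not the original download), such a minimal prolog might be:

```shell
#!/bin/sh
# Minimal prolog sketch (hypothetical). SGE runs the prolog as a child
# of the shepherd, so the prolog's parent PID is the shepherd's PID.
SHEPHERD_PID=`ps -o ppid= -p $$`
# Move the shepherd into a new Solaris 9 Resource Manager task; the
# starter method, user script and epilog then inherit the taskid.
/usr/bin/newtask -c $SHEPHERD_PID
```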
Appendix B
Here is an example of a fancier prolog script. It creates a new taskid, assigns it to the job, sets resource controls, and binds the job to a resource pool. You can download it here.
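Such a fancier prolog might be sketched as follows. The project name ("bigproject"), pool name ("single"), and prctl value are assumptions, and the prctl flags should be checked against your release.

```shell
#!/bin/sh
# Hypothetical fancier prolog sketch; run as root. All names and limit
# values below are assumptions for illustration.
SHEPHERD_PID=`ps -o ppid= -p $$`

# New task for the shepherd, billed to an assumed project.
/usr/bin/newtask -p bigproject -c $SHEPHERD_PID

# Example resource control: cap LWPs per task (flags per prctl(1)).
/usr/bin/prctl -n task.max-lwps -r -v 64 -i process $SHEPHERD_PID

# Bind the shepherd to an assumed resource pool.
/usr/sbin/poolbind -p single $SHEPHERD_PID
```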
To use this prolog in a queue, use the following command:
qconf -mqattr prolog root@<<path to the prolog script>> <<queues>>
Appendix C
Here are the commands that create three resource pools on a 4-CPU machine. One pool is reserved for the system, and the other two ("single", with 1 CPU, and "dual", with 2 CPUs) can be used by SGE prologs. You can download this file here.
create system mymachine
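Only the first line of the file is shown above. Assuming it is a poolcfg(1M) command file (which that first line suggests), the remaining commands might be sketched as follows; the processor-set and pool names are assumptions, the set sizes follow the text, and the resulting configuration would be activated with pooladm -c.

```
create pset single_pset (uint pset.min = 1; uint pset.max = 1)
create pset dual_pset (uint pset.min = 2; uint pset.max = 2)
create pool single
create pool dual
associate pool single (pset single_pset)
associate pool dual (pset dual_pset)
```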
Appendix D
To use this epilog in a queue, edit the variable EPILOG_SUMMARY to point to the proper path of proclist, then use the following command:
qconf -mqattr epilog <<path to the epilog script>> <<queues>>
Appendix E: Sample Starter Method
Simple starter_method, similar to the minimal prolog (available for download
here):
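Such a starter_method might be sketched as follows (a hypothetical stand-in, not the original download; the /bin/sh fallback is an assumption):

```shell
#!/bin/sh
# Hypothetical starter_method sketch. SGE sets $SGE_STARTER_SHELL_PATH;
# fall back to /bin/sh if it is unset (fallback is an assumption).
SHELL_PATH=${SGE_STARTER_SHELL_PATH:-/bin/sh}
# exec replaces this process, so the user's job runs directly inside
# the newly created task.
exec /usr/bin/newtask $SHELL_PATH $*
```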
To use this starter_method in a queue, use the following command:
qconf -mqattr starter_method <<path to the starter_method script>> <<queues>>
Trademarks
Copyright © 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
California 95054, U.S.A. All rights reserved. This distribution may include
materials developed by third parties. Sun, Sun Microsystems, Sun™ ONE, and Solaris are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States and other countries.
By Paulo Tibério Muradas Bulhões, November 2002.