Setting Up A Load Sensor in Grid Engine
Overview

Grid Engine tracks a built-in set of load parameters automatically. To track a load value that Grid Engine does not track itself, use a load sensor. A load sensor is a small script that writes one or more name-value pairs to standard output; each pair consists of a resource name and its current value.

The example below shows how to set up a load sensor that tracks the amount of /tmp space on each Grid Engine host. It can serve as a template for load sensors that monitor other values. Once a load sensor is added, the new resource can be used as a load threshold or as a consumable resource.

The steps for adding a load sensor are as follows:

Step 1: Define the resource
Step 2: Configure the resource
Step 3: View/Verify the resource
Step 4: Request the resource

Step 1: Define the resource attributes in the host complex

name     shortcut  type    value  relop  requestable  consumable  default
----------------------------------------------------------------------
tmpfree  tmpfree   MEMORY  0      <=     YES          YES         0
tmptot   tmptot    MEMORY  0      <=     YES          NO          0
tmpused  tmpused   MEMORY  0      >=     NO           NO          0
This says: there is a complex attribute called "tmpfree" with the shortcut "tmpfree", of type MEMORY. The "value" is supplied by the load sensor. The attribute is requestable ("YES") and consumable ("YES"). The "default" should be set to 0.

% qconf -sc host
name     shortcut  type    value  relop  requestable  consumable  default
------------------------------------------------------------------------
tmpfree  tmpfree   MEMORY  0      <=     YES          YES         0
tmptot   tmptot    MEMORY  0      <=     YES          NO          0
tmpused  tmpused   MEMORY  0      >=     NO           NO          0
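Before moving on, it can help to confirm that all three attributes really appear in the host complex. The following is a minimal sketch of such a check; check_attrs is a hypothetical helper written for this note, not a Grid Engine tool, and it only assumes that the attribute name is the first column of the "qconf -sc host" listing shown above.

```shell
#!/bin/sh
# Illustrative sketch only: check_attrs is a hypothetical helper, not part of
# Grid Engine. It reads a "qconf -sc host"-style listing on stdin and prints
# every requested attribute name that does not occur in the first column.
# The exit status is 0 when all names were found and 1 otherwise.
check_attrs() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        { seen[$1] = 1 }                      # remember every first-column name
        END {
            missing = 0
            for (i = 1; i <= n; i++) {
                if (!(names[i] in seen)) {
                    print "missing: " names[i]
                    missing = 1
                }
            }
            exit missing
        }'
}

# Sample input copied from the listing above. On a live cluster one would
# instead run:  qconf -sc host | check_attrs tmpfree tmptot tmpused
check_attrs tmpfree tmptot tmpused <<'EOF'
name     shortcut  type    value  relop  requestable  consumable  default
tmpfree  tmpfree   MEMORY  0      <=     YES          YES         0
tmptot   tmptot    MEMORY  0      <=     YES          NO          0
tmpused  tmpused   MEMORY  0      >=     NO           NO          0
EOF
echo "exit status: $?"
```

The check is deliberately loose: it looks only at names, not at the type or consumable columns, so it catches a forgotten attribute but not a mistyped setting.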
Step 2: Configure the "global" host in the cluster configuration

Step 3: View the new global resources

% qhost -F tmpfree,tmptot,tmpused
HOSTNAME  ARCH      NPROC  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------
BALROG    solaris6  2      1.47  1.0G    974.0M  150.0M  130.0M
    Host Resource(s):   hl:tmpfree=337.744000M
    hl:tmptot=338.808000M
    hl:tmpused=1.014709M

See the qhost(1) man page for more information.

Step 4: Requesting a resource

% qsub -l tmpfree=100M myjob.sh

This dispatches the job only to machines whose tmpfree value is greater than or equal to 100 MB.

Note on Using a Load Sensor for Floating Licenses

For global consumable resources that are not attached to a particular machine, such as floating licenses, the load sensor only needs to run on one host. In that case the sensor reports the string 'global' in place of the host name, as noted in the comments of the script below.

# ----------------< tmpspace.sh >-----------------------------
#!/bin/sh
# Grid Engine will automatically start/stop this script on exec hosts, if
# configured properly. See the application note for configuration
# instructions or contact support@gridware.com

# fs to check
FS=/tmp

if [ "$SGE_ROOT" != "" ]; then
   root_dir=$SGE_ROOT
fi

# invariant values
myarch=`$root_dir/util/arch`
myhost=`$root_dir/utilbin/$myarch/gethostname -name`

ende=false
while [ $ende = false ]; do

   # ----------------------------------------
   # wait for an input
   #
   read input
   result=$?
   # a failed read (for example end of input) ends the sensor
   if [ $result != 0 ]; then
      ende=true
      break
   fi

   # the string "quit" ends the sensor as well
   if [ "$input" = "quit" ]; then
      ende=true
      break
   fi

   # ----------------------------------------
   # send mark for begin of load report
   # NOTE: for global consumable resources not attached
   # to each machine (ie. floating licenses), the load
   # sensor only needs to be run on one host. In that case,
   # echo the string 'global' instead of '$myhost'.
   echo "begin"

   # df -k columns used here: $2 = total, $3 = used, $4 = available (kilobytes)
   dfoutput="`df -k $FS | tail -1`"
   tmpfree=`echo $dfoutput | awk '{ print $4}'`
   tmptot=`echo $dfoutput | awk '{ print $2}'`
   tmpused=`echo $dfoutput | awk '{ print $3}'`

   # one line per value, in the form host:name:value
   echo "$myhost:tmpfree:${tmpfree}k"
   echo "$myhost:tmptot:${tmptot}k"
   echo "$myhost:tmpused:${tmpused}k"

   echo "end"
done

#----------------------< CUT HERE >--------------------------------
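The request/report cycle of the script can be tried out by hand before the sensor is handed to Grid Engine. The sketch below is a cut-down variant for that purpose, not a replacement for tmpspace.sh: the function names report_tmp and sensor_loop are hypothetical, and "uname -n" and "df -kP" are assumed stand-ins for the gethostname utility and the "df -k" call used above, chosen so that the sketch runs without a Grid Engine installation.

```shell
#!/bin/sh
# Illustrative sketch only. report_tmp and sensor_loop are hypothetical names;
# uname -n and df -kP are stand-ins for the gethostname utility and df -k
# used in tmpspace.sh.

FS=/tmp

# Print one load report (begin ... end) for the file system named in $FS.
report_tmp() {
    host=`uname -n`
    # -P keeps each file system on a single line, so the last line is complete.
    # After "set --": $2 = total, $3 = used, $4 = available (kilobytes).
    set -- `df -kP "$FS" | tail -1`
    echo "begin"
    echo "$host:tmpfree:${4}k"
    echo "$host:tmptot:${2}k"
    echo "$host:tmpused:${3}k"
    echo "end"
}

# Emit one report per input line; stop on "quit" or at end of input.
sensor_loop() {
    while read input; do
        if [ "$input" = "quit" ]; then
            break
        fi
        report_tmp
    done
}

# An empty line requests one report; "quit" then ends the loop.
printf '\nquit\n' | sensor_loop
```

Run interactively, the same loop prints a fresh report each time Enter is pressed and exits when "quit" is typed, which mirrors how the original script reacts to its input.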