Relocating Jobs From a User's Workstation
Grid Engine can be configured to relocate a job. This is useful when the host is a desktop system that should only be used while its owner is not actively working on it. The setup uses the checkpointing facility of Grid Engine to kill the job and restart it elsewhere when the user returns and moves the mouse or presses a key. To set up this configuration, the following steps are needed:
1) Configure Grid Engine to track interactive idle time
Please see the following application note: Tracking Interactive Idle Time
2) Configure the checkpointing interface
The checkpointing interface needs to be created. This can be done in qmon.
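As a command-line alternative to the qmon dialog, the checkpointing interface can probably also be created with qconf. The following is only a sketch, assuming a Grid Engine 6.x-style checkpoint definition as described in checkpoint(5). The name "reloc", the "userdefined" interface type, the /tmp directory and the "when x" setting (act when the job is suspended) are illustrative assumptions, not values taken from this note.

```shell
#!/bin/sh
# Sketch: define a checkpointing environment named "reloc" (assumed name).
# All field values below are illustrative assumptions; adjust for your site.
CKPT_FILE="${TMPDIR:-/tmp}/reloc.ckpt"

cat > "$CKPT_FILE" <<'EOF'
ckpt_name          reloc
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
EOF

# Register the definition only if the Grid Engine admin tool is on this host.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ackpt "$CKPT_FILE"   # add the checkpointing environment from the file
    qconf -sckpt reloc          # show what was stored
else
    echo "qconf not found; definition left in $CKPT_FILE"
fi
```

The "when" field is the part that matters for relocation: with "x", a suspended job is aborted and becomes eligible to be rescheduled elsewhere.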
3) Add the checkpoint ability to the appropriate queues
In qmon, modify the appropriate queues to give them the checkpointing ability.
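The same change can likely be made from the command line. This sketch assumes a Grid Engine 6.x installation, where each queue carries a ckpt_list attribute. The queue name "desktop.q" is hypothetical, and "reloc" is the assumed name of the checkpointing interface.

```shell
#!/bin/sh
# Sketch: attach the checkpointing interface to a queue.
QUEUE="desktop.q"   # hypothetical queue name
CKPT="reloc"        # assumed checkpointing interface name

# qconf -aattr adds a value to a list attribute of the named object.
CMD="qconf -aattr queue ckpt_list $CKPT $QUEUE"

if command -v qconf >/dev/null 2>&1; then
    $CMD
    qconf -sq "$QUEUE" | grep ckpt_list   # confirm the queue now lists the interface
else
    echo "qconf not found; would run: $CMD"
fi
```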
4) Set the load threshold in the queues to trigger the relocation
In qmon "Queue Control", select an appropriate queue and modify it. On the "Load/Suspend Thresholds" tab, add an entry to the currently set load thresholds by clicking on the heading labelled "Load" and selecting the idle time resource. Then enter the desired value in the value column.
When submitting a job that is eligible to be moved, the checkpointing interface needs to be specified. For example, if the interface created above was named "reloc", the job would be submitted as follows:
qsub -ckpt reloc myjob.sh
The job will be eligible to run in any queue that has the checkpointing ability. If the job is subsequently suspended (for example, when the queue it is running in is suspended because the returning user resets the interactive idle time), it will be killed and then requeued.
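Because a relocated job is killed and requeued, it starts again from the beginning unless the job itself saves its progress. The following is a minimal sketch of how a script such as myjob.sh might do that. It assumes the working directory is on a file system shared by all execution hosts, and the progress file name and step count are hypothetical. RESTARTED is the environment variable that Grid Engine sets to 1 when a checkpointing job is restarted.

```shell
#!/bin/sh
#$ -ckpt reloc
#$ -cwd
# Sketch of a restartable job: progress is kept in a file so a requeued
# run can continue where the killed run stopped. Names are hypothetical.
STATE_FILE="${STATE_FILE:-./myjob.progress}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Resume from the saved step if a progress file exists, else start at 1.
if [ -f "$STATE_FILE" ]; then
    step=$(cat "$STATE_FILE")
else
    step=1
fi

if [ "${RESTARTED:-0}" = "1" ]; then
    echo "restarted by Grid Engine, resuming at step $step"
fi

while [ "$step" -le "$TOTAL_STEPS" ]; do
    echo "working on step $step"
    # ... the real work for this step would go here ...
    step=$((step + 1))
    echo "$step" > "$STATE_FILE"   # record the next step to run
done

rm -f "$STATE_FILE"   # finished: remove the progress marker
echo "done"
```

The embedded "#$ -ckpt reloc" line has the same effect as passing -ckpt reloc on the qsub command line.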