Close Integration
With the Sun HPC ClusterTools™ 5 software release, Sun CRE (Cluster
Runtime Environment) provides close integration with several
distributed resource managers. In this integration, Sun CRE retains
most of its original functions but delegates others to the resource
manager.
The Sun HPC ClusterTools 5 Software Administrator's Guide describes in
detail how close integration works and how to configure it with Sun
Grid Engine, which is based on the Grid Engine open source community
project.
We recommend close integration for Sun HPC ClusterTools 5 software
because it provides significantly better resource monitoring, control,
and accounting of Sun MPI processes through Grid Engine commands than
the loose integration introduced for the Sun HPC ClusterTools 3.1 and 4
releases.
However, appropriate suspend and resume methods must be provided for
Grid Engine queues in order to run Sun MPI jobs in a Grid Engine
environment. These suspend and resume methods (scripts) deliver SIGSTOP
and SIGCONT signals to Sun MPI processes when Sun MPI jobs are
suspended or resumed with Grid Engine commands such as
"qmod -s $sge_jid" and "qmod -us $sge_jid". They are needed because
Grid Engine and HPC ClusterTools differ in how they trap and deliver
signals to their child processes. The enhancement package includes the
following files:
README
suspend_sunmpi_ci.sh
resume_sunmpi_ci.sh
pe_sunmpi_ci.template
The README file in this package describes all the other files, provides
technical background on this enhancement, and explains how to configure
the suspend and resume methods.
Loose Integration
A loose integration package distributed with Grid Engine 5.3 makes it
possible to integrate Grid Engine loosely with Sun HPC ClusterTools
software with little effort. The package works with the Sun HPC
ClusterTools 3.1 and 4 releases.
The loose integration package is located in the
$SGE_ROOT/mpi/sunhpc/loose-integration directory after the Grid Engine
software is installed. The package includes all the necessary files and
an integration script. The README file in the package gives a detailed
technical description of the loose integration and a step-by-step
procedure for anyone who wants to implement it manually.
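As a small convenience, the package location given above can be checked
from the shell before starting. The helper names below are hypothetical,
and the sketch assumes the README is stored under the file name README
directly in that directory.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Grid Engine.

# Print the loose-integration directory for a given Grid Engine root.
loose_integration_dir() {
    printf '%s/mpi/sunhpc/loose-integration\n' "$1"
}

# Succeed only if a README file exists in that directory
# (the file name README is an assumption).
has_loose_readme() {
    [ -f "$(loose_integration_dir "$1")/README" ]
}
```

For example, has_loose_readme "$SGE_ROOT" returns success when the
package is installed under the current Grid Engine root.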