Integration Between Grid Engine and HPC ClusterTools Software (MPI, MPI2, OpenMP)

Close Integration

With the Sun HPC ClusterTools 5 software release, Sun CRE (Cluster Runtime Environment) provides close integration with several distributed resource managers. In this integration, Sun CRE retains most of its original functions but delegates others to the resource manager.

The Sun HPC ClusterTools 5 Software Administrator's Guide describes in detail how this integration works and how to configure it with Sun Grid Engine, which is based on the Grid Engine open source community project.

We recommend close integration for Sun HPC ClusterTools 5 software because, through Grid Engine commands, it provides significantly better monitoring, control, and accounting of Sun MPI processes than the loose integration introduced for the Sun HPC ClusterTools 3.1 and 4 releases.

However, running Sun MPI jobs under Grid Engine requires appropriate suspend and resume methods for the Grid Engine queues. These methods (scripts) deliver SIGSTOP and SIGCONT signals to the Sun MPI processes when Sun MPI jobs are suspended or resumed with Grid Engine commands such as "qmod -s $sge_jid" and "qmod -us $sge_jid". They are needed because Grid Engine and HPC ClusterTools trap and deliver signals to their child processes differently. The enhancement package includes the following files:

README
suspend_sunmpi_ci.sh
resume_sunmpi_ci.sh
pe_sunmpi_ci.template

The README file in this package describes the other files, provides technical background on this enhancement, and explains how to configure the suspend and resume methods.
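The signalling step that such a suspend or resume method performs can be sketched as follows. This is a minimal Python illustration, not the contents of suspend_sunmpi_ci.sh or resume_sunmpi_ci.sh: it assumes the PIDs of the Sun MPI processes are already known, and how the real scripts locate those processes is not covered here. The helper names signal_job_processes, suspend, and resume are hypothetical.

```python
import os
import signal
from typing import Iterable, List


def signal_job_processes(pids: Iterable[int], sig: int) -> List[int]:
    """Send `sig` to each PID and return the PIDs that were signalled.

    Hypothetical helper: assumes the caller already knows which PIDs
    belong to the Sun MPI job being suspended or resumed.
    """
    delivered = []
    for pid in pids:
        try:
            os.kill(pid, sig)
            delivered.append(pid)
        except ProcessLookupError:
            # The process has already exited; nothing to stop or continue.
            continue
    return delivered


def suspend(pids: Iterable[int]) -> List[int]:
    # SIGSTOP cannot be caught or ignored, so each target stops unconditionally.
    return signal_job_processes(pids, signal.SIGSTOP)


def resume(pids: Iterable[int]) -> List[int]:
    # SIGCONT restarts processes previously stopped with SIGSTOP.
    return signal_job_processes(pids, signal.SIGCONT)
```

Because SIGSTOP cannot be trapped, the stop takes effect even if the MPI runtime installs its own signal handlers. Any PermissionError (for example, a PID owned by another user) is deliberately left to propagate to the caller.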

Loose Integration

A loose integration package distributed with Grid Engine 5.3 makes it easy to loosely integrate Grid Engine with Sun HPC ClusterTools software. The package works with the Sun HPC ClusterTools 3.1 and 4 releases.

After Grid Engine is installed, the loose integration package is located in the $SGE_ROOT/mpi/sunhpc/loose-integration directory. It includes all the necessary files and the integration script. The README file in the package gives a detailed technical description of the loose integration and a step-by-step procedure for anyone who wants to implement it manually.
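As a small convenience, the package location can be resolved from $SGE_ROOT and its contents listed before following the README. This is a minimal Python sketch; the function names loose_integration_dir and list_package_files are hypothetical, and the sketch does not assume any file names beyond what it finds in the directory.

```python
import os
from pathlib import Path
from typing import List, Optional

# Subdirectory of $SGE_ROOT named in the text above.
LOOSE_INTEGRATION_SUBDIR = Path("mpi") / "sunhpc" / "loose-integration"


def loose_integration_dir(sge_root: Optional[str] = None) -> Path:
    """Return the loose integration package path under SGE_ROOT.

    Uses the explicit argument if given, otherwise the SGE_ROOT
    environment variable.
    """
    root = sge_root if sge_root is not None else os.environ.get("SGE_ROOT")
    if not root:
        raise RuntimeError("SGE_ROOT is not set")
    return Path(root) / LOOSE_INTEGRATION_SUBDIR


def list_package_files(sge_root: Optional[str] = None) -> List[str]:
    """List the entries in the loose integration package, sorted by name."""
    pkg = loose_integration_dir(sge_root)
    if not pkg.is_dir():
        raise FileNotFoundError(f"no loose integration package at {pkg}")
    return sorted(entry.name for entry in pkg.iterdir())
```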