Grid Scheduler / Grid Engine HOWTOs

Last Update: June 2012

Table of Contents

General Grid Engine concepts
Resource management
Cluster management
Special Applications
Tight Integration of Parallel Libraries
Accounting and Reporting Database (ARCo)
Profiling and Tracing
DRMAA

Content

General Grid Engine concepts

Introduction to Grid Engine video
Basic Usage
Common Administrative Tasks
Customization of Qmon
Migration of Qmaster to Another Machine
Setting Up a Shadow Master
Commonly Seen Problems
Troubleshooting

Resource management

Managing Resources Abstractly
Consumable Resources
Setting Up Load Sensors to Track Resource Availability/Utilization
Different resource management approaches with Grid Engine
Tracking interactive idle time of desktop workstations
Relocating Jobs From a User's Workstation
Grid Engine Enterprise Edition
Sun Grid Engine, Enterprise Edition -- Configuration Use Cases and Guidelines
Scheduler Policies for Job Prioritization in the N1 Grid Engine 6 System
File Staging
Logical resource expressions
Resource quotas

Cluster management

Command Line and Scripting of Administrative Tasks
Submitting Binaries
Configure qrsh and qlogin to use ssh as the transport protocol
Rotating and truncating Log Files
Reducing and Eliminating NFS Usage
Installing on a system with multiple network interfaces
Installing on a system with Solaris IP Multipathing
Deploying PCs with Grid Engine enabled KNOPPIX boot images
Using Host Groups and Cluster Queues
Running jobs on data kept on a USB-connected hard disk in a separate network via sshfs

Special Applications

Grid Engine Hadoop Integration
Olesen-FLEXlm-Integration (see also the wiki documentation of the Olesen method)
Using Clearcase
Using Mentor ModelSim and Mentor JobSpy
Mathematica
Ansys
MultiClustering using Transfer Queues
Integration of SGE and Solaris 9 Resource Manager
SGE-Globus integration
Checkpointing jobs using SGE's checkpointing support
Checkpointing under Linux with Berkeley Lab Checkpoint/Restart
Integration of Meiosys MetaCluster HPC with N1™ Grid Engine 6
JAM - Job & Application Manager
JGrid - an RMI-based Java interface for Grid Engine

Tight Integration of Parallel Libraries

Tight Integration of LAM/MPI and SGE
Tight Integration of MPICH and SGE -- With Application Notes
Tight Integration of MPICH2 and SGE
Tight Integration of PVM and SGE
MVAPICH (MPICH over InfiniBand) + Loose/Tight SGE Integration
Sun HPC Cluster Tools parallel jobs (MPI, MPI2, OpenMP)
Tight integration of Open MPI with SGE

Accounting and Reporting Database (ARCo)

ARCo and Oracle 10g Database
ARCo on MySQL Database
Space Requirements for the ARCo database

Profiling and Tracing

Grid Engine Tuning guide
Grid Engine Profiling HOWTO
Monitoring SGE Performance with DTrace

DRMAA

DRMAA C Binding
File Staging in Grid Engine 6.0 with DRMAA
DRMAA Java Language Binding
DRMAA Python Tutorial