How to install the Grid Engine software on hosts with multiple network interfaces

This document describes how to install Grid Engine on machines with multiple network interfaces (multi-homed hosts). A special case, which uses the Solaris™ Operating Environment IP Multipathing (IPMP) technology for IP failover, is described in a separate HOWTO.

Systems on multiple networks

Suppose each machine has two Ethernet interfaces, hme0 and hme1. One interface (hme0) carries general traffic, NFS file sharing and so on, and the other (hme1) is dedicated to Grid Engine communications. We want Grid Engine to communicate only over this dedicated network. In this example there is a Grid Engine master node (sun-master) and three exec hosts (sun-1, sun-2 and sun-3).

Setting up the networks

Each host has two files in /etc: hostname.hme0, containing the host name (e.g. sun-1), and hostname.hme1, containing the name of the Grid Engine interface (e.g. sun-1-grid). The /etc/hosts file should, of course, contain entries for the SGE interfaces as well as for the standard interfaces:


#
# Grid Engine Network 
#
192.168.7.2     sun-master-grid
192.168.7.3     sun-1-grid
192.168.7.11    sun-2-grid
192.168.7.12    sun-3-grid
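Before installing, it can help to check the hosts file mechanically. The function below is a minimal sketch, not part of Grid Engine. It assumes, as in the example above, that every Grid Engine interface name ends in "-grid" and that the dedicated network is 192.168.7.x. It lists each -grid entry with its address and flags any that fall outside that network. It needs a POSIX shell and an awk that supports -v; on Solaris that means /usr/xpg4/bin/sh (or ksh) and nawk rather than the legacy /bin/sh and /usr/bin/awk.

```shell
# Hypothetical helper: list "-grid" names from a hosts file and check that
# each one's address starts with the Grid Engine network prefix.
# Usage: check_grid_hosts [hosts_file] [address_prefix]
# Exit status is non-zero if any -grid name is on the wrong network.
check_grid_hosts() {
    hosts_file=${1:-/etc/hosts}
    prefix=${2:-192.168.7.}      # assumed prefix of the dedicated network
    awk -v prefix="$prefix" '
        /^[ \t]*#/ { next }                 # skip whole-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # rest of the line is a comment
                if ($i ~ /-grid$/) {
                    if (index($1, prefix) == 1) {
                        print $i, $1, "ok"
                    } else {
                        print $i, $1, "WRONG-NETWORK"
                        bad = 1
                    }
                }
            }
        }
        END { exit bad }
    ' "$hosts_file"
}
```

Run it as `check_grid_hosts /etc/hosts 192.168.7.` on each host. The trailing dot in the prefix matters: without it, an address such as 192.168.70.1 would also be accepted.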

When both networks are functioning correctly, install Grid Engine.
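One part of "functioning correctly" is that every host can resolve both names of every other host. The sketch below, assuming getent(1) is available (it is on Solaris and on glibc-based systems), prints the address each name resolves to and reports names that do not resolve. It checks name resolution only; it does not prove the link is up.

```shell
# Hypothetical helper: check that each given host name resolves through the
# system's configured name services (files, NIS, DNS) via getent.
# Prints "name address" or "name UNRESOLVED"; returns 1 if any name failed.
check_names_resolve() {
    rc=0
    for name in "$@"; do
        # Keep only the first address getent returns for this name.
        addr=$(getent hosts "$name" | awk '{ print $1; exit }')
        if [ -n "$addr" ]; then
            echo "$name $addr"
        else
            echo "$name UNRESOLVED"
            rc=1
        fi
    done
    return $rc
}
```

For this example you might run `check_names_resolve sun-master sun-master-grid sun-1 sun-1-grid sun-2 sun-2-grid sun-3 sun-3-grid` on each host, then `ping sun-1-grid` and so on, which on Solaris reports "sun-1-grid is alive" when the interface answers.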


Making Grid Engine use the SGE network

Modify the configuration

Start up Grid Engine

Start up the master node now:


# /etc/init.d/rcsge
   starting sge_qmaster
starting program: /gridware/sge/bin/solaris64/sge_commd
using service "sge_commd"
bound to port 536
Reading in complexes:
        Complex "host".
        Complex "queue".
Reading in execution hosts.
Reading in administrative hosts.
Reading in submit hosts.
Reading in usersets:
        Userset "defaultdepartment".
        Userset "deadlineusers".
Reading in queues:
        Queue "sun-1.q".
        Queue "sun-2.q".
        Queue "sun-3.q".
Reading in parallel environments:
        PE "make".
Reading in scheduler configuration
cant load sharetree (cant open file sharetree: No such file or directory), 
starting up with empty sharetree
   starting sge_schedd
# 
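The startup output shows sge_commd looking up the service "sge_commd" and binding to port 536, and every exec host must resolve the same port. A minimal sketch for comparing hosts, assuming the entry lives in /etc/services rather than in NIS, prints the TCP port a services file assigns to a service name (or to one of its aliases). Where getent supports the services database, `getent services sge_commd` gives the same answer through the configured name services.

```shell
# Hypothetical helper: print the TCP port for a service name (or alias)
# from a services(4)-format file. Prints nothing if there is no tcp entry.
# Usage: service_port name [services_file]
service_port() {
    svc=$1
    file=${2:-/etc/services}
    awk -v svc="$svc" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            split($2, pp, "/")               # "536/tcp" -> pp[1]=536, pp[2]=tcp
            if (pp[2] != "tcp") next
            hit = ($1 == svc)
            for (i = 3; i <= NF && !hit; i++) {
                if ($i ~ /^#/) break         # trailing comment, no more aliases
                if ($i == svc) hit = 1
            }
            if (hit) { print pp[1]; exit }
        }
    ' "$file"
}
```

Running `service_port sge_commd` on the master and on each exec host should print the same number, 536 in this example.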


Now start up the SGE exec hosts:


# /etc/init.d/rcsge start
   starting sge_execd
starting program: /gridware/sge/bin/solaris64/sge_commd
using service "sge_commd"
bound to port 536
#
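Instead of logging in to each exec host in turn, you could start them from one machine. The helper below is hypothetical: it assumes root can reach sun-1, sun-2 and sun-3 with ssh (or rsh on older installations) without a password, and that each host has the /etc/init.d/rcsge script used above. With DRY_RUN=1 it only prints the commands it would run.

```shell
# Hypothetical helper: run the exec-host startup script on each named host.
# Set DRY_RUN=1 to print the commands instead of executing them.
start_exec_hosts() {
    for h in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $h /etc/init.d/rcsge start"
        else
            # Report a failure but carry on with the remaining hosts.
            ssh "$h" /etc/init.d/rcsge start || echo "failed on $h" >&2
        fi
    done
}
```

For example, `DRY_RUN=1 start_exec_hosts sun-1 sun-2 sun-3` shows the three commands; drop DRY_RUN to run them.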

Check that it has worked

Snoop the network to check that the correct interfaces are being used:


# qsub -q sun-1.q test.sh
# snoop -V -d hme1 sun-1-grid
sun-master-grid -> sun-1-grid TCP D=46883 S=536 Syn Ack=2694354350 \
  Seq=2537161622 Len=0 Win=49640 Options=<mss 1460,nop,nop,sackOK>
sun-1-grid -> sun-master-grid TCP D=536 S=46883 Ack=2537161623 \
  Seq=2694354350 Len=0 Win=49640
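Reading snoop output by eye works for a few packets; for longer captures a filter can do the checking. The sketch below assumes, as in this example, that all Grid Engine interface names end in "-grid" and that snoop resolves addresses to names. It reads snoop output on standard input, skips indented continuation lines, and reports any "source -> destination" line where either end is not a -grid name.

```shell
# Hypothetical helper: flag snoop lines whose source or destination name
# does not end in "-grid". Exit status is 1 if any such line was seen.
check_grid_traffic() {
    awk '
        /^[ \t]/ { next }                    # continuation of a previous line
        $2 == "->" {
            if ($1 !~ /-grid$/ || $3 !~ /-grid$/) {
                print "OFF-GRID: " $0
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

A possible invocation is `snoop -d hme1 -c 50 port 536 | check_grid_traffic`, which captures 50 packets to or from the sge_commd port on the Grid Engine interface and prints nothing if they all stayed on the -grid names.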

Trademarks 

Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.