How to install the Grid Engine software on hosts with the Solaris™ Operating Environment IP Multipathing (IPMP) technology
This document describes how to install Grid Engine on machines with
multiple network interfaces (multi-homed hosts). Particular
attention is given to the Solaris Operating Environment IP Multipathing
technology. The procedure presented here should work for other environments
as well.
What is IP Multipathing?
IP Multipathing is a technology that allows grouping of TCP/IP interfaces
for failover and load balancing purposes. If an interface within an IP
Multipathing group fails, the interface is disabled and its IP address
is relocated to another interface in the group. Outbound IP traffic is
distributed across the interfaces of a group.
For further details on IP Multipathing and an overview of its features,
refer to the Solaris Operating Environment documentation.
Issues between IPMP and Grid Engine
The only major issue is that error messages appear
while starting the Grid Engine daemons on a machine whose main interface
is part of an IPMP group. This occurs because IPMP load balancing distributes
outbound connections across the interfaces in the group; the IP packets
therefore show up at the receiving end as coming from a different host than
the one associated with the main interface.
For example, consider a machine with three interfaces named qfe0, qfe1,
and qfe2, whose IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3
respectively. IPMP would need an extra address for each interface
for testing, but we will ignore those in this example. Each of these addresses
has a hostname associated with it. The hosts table looks like:
10.1.1.1 sge
10.1.1.2 sge-qfe1
10.1.1.3 sge-qfe2
The machine's hostname is sge. When a connection is established from sge
to another machine, it might go through sge, sge-qfe1, or sge-qfe2. Upon
installation, Grid Engine will only recognize sge. When it receives a
connection from sge-qfe2, it closes the connection because it is not from
one of the authorized (or known) nodes.
To solve this issue, use the host_aliases file (see the host_aliases
man page for details). This file can be used to "tell" Grid
Engine that sge, sge-qfe1, and sge-qfe2 all belong
to the same machine. The host_aliases file for this case
would look like this:
sge sge-qfe1 sge-qfe2
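To illustrate how such a line is read, the following shell sketch maps any name found on a host_aliases line back to the first name on that line. It assumes that the first entry is the name Grid Engine uses and that the remaining entries are its aliases, and that lines beginning with # are comments (check the host_aliases man page for your release). The function name canonical_host is hypothetical and is not part of Grid Engine.

```shell
# canonical_host FILE NAME
# Prints the first entry of the line in FILE that contains NAME.
# If NAME appears on no line, NAME itself is printed unchanged.
canonical_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }            # assumption: "#" starts a comment line
        {
            for (i = 1; i <= NF; i++) {
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { if (!found) print name }
    ' "$1"
}
```

For the file shown above, canonical_host "$SGE_ROOT/$SGE_CELL/common/host_aliases" sge-qfe2 would print sge.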
Note: If you make any changes to the
$SGE_ROOT/$SGE_CELL/common/host_aliases
file, all running Grid Engine daemons (sge_qmaster, sge_schedd,
sge_execd and sge_commd) must be stopped and restarted.
Log in as root on all your Grid Engine hosts and enter:
/etc/init.d/rcsge stop
/etc/init.d/rcsge start
How to install the Grid Engine master node with IPMP
There are at least two options:
A) Ignore the error messages during installation. The procedure is:
1. Run inst_sge -m, ignoring the error messages during the start
up of the daemons.
2. Shut down the daemons with /etc/init.d/rcsge stop. Due to the
networking errors, some daemons fail to shut down
and must be killed with kill -9. To check which daemons
failed to shut down, use: ps -e | grep sge_.
3. Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common
directory.
4. Restart the daemons with /etc/init.d/rcsge start.
Note: This procedure is Operating System independent.
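Steps 2 to 4 of option A can be sketched as a small shell script. This is an illustration under stated assumptions, not a tested installation tool: the function names sge_pids and finish_master_install are hypothetical, the prepared host_aliases file is passed in as an argument, and ps -e is assumed to print the PID in the first column and the command name in the last column. It must be run as root with SGE_ROOT and SGE_CELL set.

```shell
# Reads `ps -e` output on stdin and prints the PIDs of processes
# whose command name starts with "sge_".
sge_pids() {
    awk '$NF ~ /^sge_/ { print $1 }'
}

# finish_master_install PREPARED_ALIASES_FILE   (hypothetical helper)
finish_master_install() {
    aliases_src="$1"

    # Step 2: stop the daemons; some may survive because of the networking errors.
    /etc/init.d/rcsge stop

    # Kill whatever sge_* daemons are still running.
    for pid in $(ps -e | sge_pids); do
        kill -9 "$pid"
    done

    # Step 3: install the host_aliases file.
    cp "$aliases_src" "$SGE_ROOT/$SGE_CELL/common/host_aliases" || return 1

    # Step 4: restart the daemons.
    /etc/init.d/rcsge start
}
```

Because kill -9 gives a daemon no chance to clean up, it is worth confirming the PID list with ps -e | sge_pids before calling finish_master_install.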
B) Temporarily disable IPMP on the interface associated with the machine's
hostname. The procedure is:
1. Identify the interface associated with the machine's hostname.
2. Verify the interface has IPMP enabled with:
ifconfig <<interface>> | grep groupname.
3. Take note of the group name.
4. Disable IPMP with: ifconfig <<interface>> group ""
5. Install the Grid Engine master node.
6. Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common
directory.
7. Restart all the Grid Engine daemons.
8. Re-enable IPMP: ifconfig <<interface>> group <<IPMP group>>.
Note: This procedure is valid only for Solaris™ 8 Operating Environment
or newer.
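Option B can be wrapped in a shell sketch that remembers the group name, clears it, runs the installation steps, and then restores it. The function names ipmp_group and with_ipmp_disabled are hypothetical, and the sketch assumes that ifconfig prints the group as a "groupname <name>" pair, as the grep in step 2 suggests. It must be run as root.

```shell
# Reads `ifconfig <interface>` output on stdin and prints the IPMP group name,
# or nothing if the interface is not in a group.
ipmp_group() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "groupname") { print $(i + 1); exit }
        }
    }'
}

# with_ipmp_disabled INTERFACE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND while INTERFACE is removed from its IPMP group.
with_ipmp_disabled() {
    ifc="$1"
    shift

    # Steps 2 and 3: verify IPMP is enabled and note the group name.
    grp=$(ifconfig "$ifc" | ipmp_group)
    if [ -z "$grp" ]; then
        echo "no IPMP group configured on $ifc" >&2
        return 1
    fi

    # Step 4: disable IPMP on the interface.
    ifconfig "$ifc" group "" || return 1

    # Steps 5 to 7: whatever command the caller supplies.
    "$@"
    status=$?

    # Step 8: re-enable IPMP even if the command failed.
    ifconfig "$ifc" group "$grp"
    return $status
}
```

The group is restored regardless of the command's exit status, so a failed installation does not leave the interface outside its IPMP group.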
How to install a Grid Engine execution host with IPMP
Once the host_aliases file is
installed and the Grid Engine daemons are restarted, you can simply start
the execution host installation without further problems.
How to enable administrative and submit hosts with IPMP
You can either follow the same procedure
used for the execution host (i.e. update host_aliases before installation;
see the note on changes to the host_aliases file),
or add all the hostnames associated with the administrative or submit
host with:
qconf -ah <<hostname>> <<alias 1>> <<alias 2>> ...
(for the administrative host) or
qconf -as <<hostname>> <<alias 1>> <<alias 2>> ...
(for the submit host).
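When a host has many aliases, the registration can be scripted. The sketch below calls qconf once per name rather than passing all names in one invocation, which avoids depending on how a given release expects the list to be separated. The function name add_host_names and the QCONF override variable are hypothetical and exist only to make the sketch easy to try without a running cluster.

```shell
# add_host_names FLAG NAME [NAME...]   (hypothetical helper)
# FLAG is -ah (administrative host) or -as (submit host).
add_host_names() {
    flag="$1"
    shift
    for name in "$@"; do
        # QCONF defaults to the real qconf binary found on PATH.
        "${QCONF:-qconf}" "$flag" "$name" || return 1
    done
}
```

For the example machine, add_host_names -ah sge sge-qfe1 sge-qfe2 would register all three names as administrative hosts, and the same call with -as would register them as submit hosts.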
Trademarks
Sun and Solaris are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries.