Setting Up A Shadow Master In Grid Engine

To set up master shadowing in Grid Engine, complete the following steps:
    1) Create the shadow_masters file
    2) Verify correct permissions
    3) Start the shadow daemons

1) Create the shadow_masters file

Create the file in $SGE_ROOT/default/common. Its first line must be the name of the primary master host. List the other hosts that may assume master responsibility on the following lines, one per line, in the order in which they should take over. For example:

% cat shadow_masters
host1
host2
host3

Here, host1 is the primary master host. Should host1 fail, host2 will take over as the master server after a period of approximately 10 minutes. Further, if host2 should then fail, host3 will take over.
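
As a minimal sketch of step 1, the file can be generated from an ordered host list. The helper name write_shadow_masters is hypothetical and is not part of Grid Engine; the demo writes to a scratch directory rather than to $SGE_ROOT/default/common.

```shell
#!/bin/sh
# write_shadow_masters is a hypothetical helper, not a Grid Engine command.
# Usage: write_shadow_masters <common_dir> <primary> [<shadow> ...]
write_shadow_masters() {
    common_dir=$1
    shift
    # One hostname per line; the argument order becomes the failover order.
    printf '%s\n' "$@" > "$common_dir/shadow_masters"
}

# Demo in a scratch directory. On a real cluster the target directory
# would be $SGE_ROOT/default/common.
demo_dir=$(mktemp -d)
write_shadow_masters "$demo_dir" host1 host2 host3
cat "$demo_dir/shadow_masters"
rm -rf "$demo_dir"
```

Passing the hosts as arguments keeps the primary master in the first position, which is the one ordering rule the file has to respect.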

2) Verify correct permissions

All shadow master hosts must have read/write access to the qmaster spool directory ($SGE_ROOT/default/spool/qmaster in the examples in this document).
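
One way to verify this is to create and read back a small probe file in the spool directory on each shadow master host. The sketch below assumes a hypothetical helper named check_spool_access; it should be run under the account whose access is in question, and the demo uses a scratch directory in place of the real spool directory.

```shell
#!/bin/sh
# check_spool_access is a hypothetical helper, not a Grid Engine command.
# It tests read/write access by creating, reading, and deleting a probe file.
check_spool_access() {
    spool_dir=$1
    probe="$spool_dir/.shadow_access_probe.$$"
    if { echo probe > "$probe"; } 2>/dev/null && cat "$probe" > /dev/null 2>&1; then
        rm -f "$probe"
        echo "ok: $spool_dir is readable and writable"
        return 0
    fi
    rm -f "$probe"
    echo "FAILED: cannot read and write in $spool_dir" >&2
    return 1
}

# Demo in a scratch directory. On a real cluster the argument would be
# the qmaster spool directory, checked on every shadow master host.
demo_dir=$(mktemp -d)
check_spool_access "$demo_dir"
rm -rf "$demo_dir"
```

Writing a real file is a stricter check than inspecting permission bits, because it also catches restrictions imposed by a shared file system.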

3) Start the shadow daemons

The shadow daemon must be started on all shadow master hosts. This is done via the startup script: rcsge in version 5.3 and sgemaster in version 6 or later. As root on each host, run the command that matches your version:

 $SGE_ROOT/default/common/rcsge -shadowd       [Version 5.3 and its patches]
 $SGE_ROOT/default/common/sgemaster -shadowd   [Version 6 or later]

After these steps are completed successfully, master shadowing is active for the Grid Engine cluster. Refer to the Shadow Master Documentation and the Shadow Master Man Page for more information about the shadowd failover delay (SGE_DELAY_TIME) and check interval (SGE_CHECK_INTERVAL).
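
The version-dependent choice of startup script in step 3 can be sketched as follows. The helper pick_startup_script is hypothetical, and the assumption that only one of the two scripts is relevant on a given installation is mine rather than something the documentation states. The demo prints the command instead of running it.

```shell
#!/bin/sh
# pick_startup_script is a hypothetical helper, not a Grid Engine command.
# It prints the path of the startup script found in <common_dir>:
# sgemaster (version 6 or later) is preferred over rcsge (version 5.3).
pick_startup_script() {
    common_dir=$1
    for name in sgemaster rcsge; do
        if [ -x "$common_dir/$name" ]; then
            echo "$common_dir/$name"
            return 0
        fi
    done
    echo "no startup script found in $common_dir" >&2
    return 1
}

# Demo with a stand-in script in a scratch directory. On a real cluster
# the argument would be $SGE_ROOT/default/common.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\n' > "$demo_dir/rcsge"
chmod +x "$demo_dir/rcsge"
script=$(pick_startup_script "$demo_dir")
# Print the command rather than run it; the real call must be made as root.
echo "would run: $script -shadowd"
rm -rf "$demo_dir"
```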

NOTES:
When the shadow master feature is used on master hosts with multiple network interfaces, the following points must be addressed.

  • Version 6 requires the Update 1 patch for the shadow master to work.
  • Version 5.3 and its patch releases require a symbolic link for each shadow master in the $SGE_ROOT/default/spool/qmaster directory. The link is needed because the shadow daemon still looks for the file named after the old hostname, while the rcsge script looks for the file named after the new hostname assigned to Grid Engine traffic. For example:
       % ls -l $SGE_ROOT/default/spool/qmaster 
       ...
       lrwxrwxrwx 1 sge sge 17 Sep 28 09:01 shadowd_host1-ge.pid -> shadowd_host1.pid
       -rw-r--r-- 1 sge sge 17 Sep 28 09:00 shadowd_host1.pid
       lrwxrwxrwx 1 sge sge 17 Sep 28 09:02 shadowd_host2-ge.pid -> shadowd_host2.pid
       -rw-r--r-- 1 sge sge 17 Sep 28 09:00 shadowd_host2.pid
    
    In this example, host1 and host2 are the hostnames of the two shadow masters, and host1-ge and host2-ge are the names assigned to their Grid Engine network interfaces.
       % cat /etc/hosts
       #
       # hostnames
       #
       192.168.8.10   host1
       192.168.8.11   host2
       #
       # Grid Engine Network 
       #
       192.168.9.10   host1-ge
       192.168.9.11   host2-ge
    
  • Version 5.3 and its patch releases must list both the new and the old hostnames in the shadow_masters file, for the reason given above. For example:
       % cat shadow_masters
       host1
       host2
       host1-ge
       host2-ge
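
The two version 5.3 workarounds above can be scripted along these lines. Both helper names (link_shadowd_pids and list_both_names) are hypothetical, and the -ge suffix simply follows the naming used in the example; the demo works in a scratch directory with empty stand-in pid files rather than in the real spool directory.

```shell
#!/bin/sh
# Hypothetical helpers, not Grid Engine commands.

# For each host, create shadowd_<host><suffix>.pid -> shadowd_<host>.pid
# in the given spool directory.
link_shadowd_pids() {
    spool_dir=$1
    suffix=$2
    shift 2
    for host in "$@"; do
        # Relative link target, as in the ls listing above.
        ln -sf "shadowd_${host}.pid" "$spool_dir/shadowd_${host}${suffix}.pid"
    done
}

# Print the old hostnames first, then the interface hostnames, in the
# same order as the shadow_masters example above.
list_both_names() {
    suffix=$1
    shift
    printf '%s\n' "$@"
    for host in "$@"; do
        printf '%s\n' "${host}${suffix}"
    done
}

# Demo in a scratch directory with empty stand-in pid files.
demo_dir=$(mktemp -d)
touch "$demo_dir/shadowd_host1.pid" "$demo_dir/shadowd_host2.pid"
link_shadowd_pids "$demo_dir" -ge host1 host2
ls -l "$demo_dir"
list_both_names -ge host1 host2 > "$demo_dir/shadow_masters"
cat "$demo_dir/shadow_masters"
rm -rf "$demo_dir"
```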