Monitoring SGE Performance with DTrace

DTrace support was added to the V60s2_BRANCH release of the Grid Engine CVS codebase.

This means that the qmaster and scheduler of a production SGE cluster can be performance-tuned and troubleshot without compiling special binaries and without restarting any daemons. At the time of writing, DTrace was available only on Solaris 10, FreeBSD, NetBSD, and Mac OS X.

A wrapper shell script (monitor.sh) is also included to invoke the DTrace script from the command line with all the parameters needed:

   % monitor.sh -help
   monitor.sh [options]
   options:
      -cell            use $SGE_CELL other than "default"
      -interval        use statistics interval other than "15sec"
      -spooling        show qmaster spooling probes
      -requests        show incoming qmaster request probes

                 Monitoring Grid Engine Masters with dtrace
                 ------------------------------------------

Content
-------
1. Introduction
2. Master bottleneck analysis with dtrace
3. Copyright

1. Introduction
---------------

   DTrace is a comprehensive framework for tracing dynamic events in
   Solaris 10. For more detailed information about DTrace, see

      http://www.sun.com/bigadmin/content/dtrace/

2. Master bottleneck analysis with dtrace
-----------------------------------------

   Understanding the bottlenecks of distributed systems is crucial for 
   performance tuning. The script $SGE_ROOT/util/dtrace/monitor.sh allows 
   a Grid Engine master to be monitored wherever the Solaris 10 dtrace(1)
   command is available.

   monitor.sh measures throughput-relevant data of a running Grid Engine
   master and condenses it into a few indices, which are printed as a
   single line per interval using the columns described below.

      Spooling:
        #wrt 
           Number of qmaster write operations via spool_write_object() and
           spool_delete_object(). Almost every significant write operation
           goes through one of these functions, with both Berkeley DB (bdb)
           and classic spooling.

        wrt/ms
           Total time, in milliseconds, that all threads spent in
           spool_write_object().
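
   The two Spooling columns can be combined into a mean time per write. A
   minimal Python sketch follows; the helper names are hypothetical and not
   part of Grid Engine, and it assumes wrt/ms is reported in milliseconds.
   Because #wrt also counts spool_delete_object() calls while wrt/ms covers
   only spool_write_object(), the mean is an approximation. Since wrt/ms is
   summed over all threads, its ratio to the interval length can exceed 1.0
   when several threads spool concurrently.

```python
# Hypothetical helpers for interpreting the Spooling columns of one
# monitor.sh output line. Assumes wrt/ms is in milliseconds.


def avg_spool_write_ms(wrt_count: int, wrt_ms: float) -> float:
    """Mean time per spool write operation in one interval (0.0 if none)."""
    if wrt_count == 0:
        return 0.0
    return wrt_ms / wrt_count


def spool_busy_ratio(wrt_ms: float, interval_s: float = 15.0) -> float:
    """Spooling time relative to the interval length.

    wrt/ms is summed over all threads, so values above 1.0 are possible
    when several threads spool at the same time.
    """
    return wrt_ms / (interval_s * 1000.0)


if __name__ == "__main__":
    print(avg_spool_write_ms(200, 50.0))  # 0.25
    print(spool_busy_ratio(30000.0))      # 2.0
```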

      Message processing:
        #rep
           Number of reports qmaster processed through sge_c_report().
           Most data sent by execds to qmaster arrives as such a report
           (job, load, or configuration report).

        #gdi 
           Number of GDI requests qmaster processed through do_gdi_request().
           Almost everything sent by client commands reaches qmaster as a
           GDI request; execds and the scheduler use GDI requests as well.

        #ack
           Number of ACK messages qmaster processed through do_c_ack().
           A high number of ACK messages can indicate job signalling,
           but ACK messages are also used for other purposes.
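
   The message counters are totals per statistics interval, so runs made
   with different -interval settings are only comparable after conversion
   to rates. A minimal Python sketch, assuming the 15 second default shown
   in the monitor.sh help output; the function name is hypothetical.

```python
def per_second(counts: dict[str, int], interval_s: float = 15.0) -> dict[str, float]:
    """Convert per-interval counters such as #rep, #gdi and #ack to rates per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {name: n / interval_s for name, n in counts.items()}


# Example: one interval with 30 reports, 450 GDI requests and 15 ACKs.
print(per_second({"#rep": 30, "#gdi": 450, "#ack": 15}))
# {'#rep': 2.0, '#gdi': 30.0, '#ack': 1.0}
```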
           
      Scheduling:
         #dsp
           Number of calls to dispatch_jobs() in schedd. Each call
           to dispatch_jobs() can be seen as one scheduling run.

         dsp/ms
           Total time the scheduler spent in all calls to dispatch_jobs().

         #sad
           Number of calls to select_assign_debit(). Each call to
           select_assign_debit() can be seen as an attempt by the
           scheduler to find an assignment or a reservation for a job.
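
   Dividing the Scheduling columns by #dsp gives per-run averages: how long
   a scheduling run takes and how many assignment attempts it makes. A
   minimal Python sketch with a hypothetical helper name, assuming dsp/ms
   is reported in milliseconds:

```python
def scheduling_summary(dsp: int, dsp_ms: float, sad: int) -> dict[str, float]:
    """Per-run averages for one interval's Scheduling columns.

    ms_per_run:  mean duration of one dispatch_jobs() call (scheduling run)
    sad_per_run: mean number of select_assign_debit() attempts per run
    """
    if dsp == 0:
        # No scheduling run happened in this interval.
        return {"ms_per_run": 0.0, "sad_per_run": 0.0}
    return {"ms_per_run": dsp_ms / dsp, "sad_per_run": sad / dsp}


# Example: 4 runs taking 1000 ms in total with 200 assignment attempts.
print(scheduling_summary(4, 1000.0, 200))
```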

      Qmaster/Schedd synchronization:
         #snd
           Number of event packages sent by qmaster to schedd. If this
           number drops to zero and stays there for a longer period,
           something is wrong and qmaster and schedd are out of sync.

         #rcv
           Number of event packages received by schedd from qmaster.
           If this number drops to zero and stays there for a longer
           period, something is wrong and qmaster and schedd are out of
           sync.
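
   "A longer period" is left open here; one way to make it concrete is to
   flag a run of consecutive zero samples in either column. A minimal
   Python sketch; the threshold of 4 intervals (one minute at the 15 second
   default) and the function names are hypothetical choices, not Grid
   Engine settings:

```python
def longest_zero_run(samples: list[int]) -> int:
    """Length of the longest run of consecutive zero samples."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value == 0 else 0
        longest = max(longest, current)
    return longest


def sync_suspect(snd: list[int], rcv: list[int], min_zero_intervals: int = 4) -> bool:
    """Flag a possible qmaster/schedd desync when #snd or #rcv stays at zero.

    min_zero_intervals is a hypothetical threshold, not an SGE parameter.
    """
    return (longest_zero_run(snd) >= min_zero_intervals
            or longest_zero_run(rcv) >= min_zero_intervals)


# Example: #snd stops for four intervals while #rcv keeps moving.
print(sync_suspect([5, 4, 0, 0, 0, 0], [5, 4, 3, 2, 1, 1]))
```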

      Qmaster communication:
         #in++
           Number of messages added to qmaster's receive buffer.

         #in--
           Number of messages removed from qmaster's receive buffer.
           If more messages are added than removed during an interval,
           the backlog of messages not yet processed grows.

         #out++
           Number of messages added to qmaster's send buffer.

         #out--
           Number of messages removed from qmaster's send buffer.
           If more messages are added than removed during an interval,
           the backlog of messages not yet delivered grows.
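
   Summing the per-interval differences shows whether a buffer backlog
   keeps growing across intervals. The result is relative to the unknown
   backlog at the moment monitoring started. A minimal Python sketch with a
   hypothetical function name; it applies to #in++/#in-- as well as to
   #out++/#out--:

```python
from itertools import accumulate


def backlog_trend(added: list[int], removed: list[int]) -> list[int]:
    """Cumulative change in buffered messages, one value per interval.

    A steadily rising result means messages are added faster than they
    are removed, i.e. the backlog keeps growing.
    """
    if len(added) != len(removed):
        raise ValueError("added and removed must have the same length")
    return list(accumulate(a - r for a, r in zip(added, removed)))


# Example: #in++ and #in-- over three intervals.
print(backlog_trend([10, 12, 15], [10, 9, 11]))
```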

      Qmaster locks:
         #lck0/#ulck0
           Number of calls to sge_lock()/sge_unlock() for qmaster's
           "global" lock. This lock must always be obtained when
           qmaster-internal lists (job list, queue list, etc.) are
           accessed.

         #lck1/#ulck1
           Number of calls to sge_lock()/sge_unlock() for qmaster's
           "master_config" lock. This is a secondary lock, but it also
           plays a role.
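
   To post-process monitor.sh output, each data line can be split into the
   columns described above. A minimal Python sketch, assuming
   (hypothetically) that a data line consists of exactly these 18
   whitespace-separated numbers in the order listed here; the actual output
   may include header lines, timestamps, or a different column order, so
   check it against your monitor.sh version first:

```python
# Hypothetical column order: the order in which the columns are described
# in this README, not a documented monitor.sh output format.
COLUMNS = [
    "#wrt", "wrt/ms",
    "#rep", "#gdi", "#ack",
    "#dsp", "dsp/ms", "#sad",
    "#snd", "#rcv",
    "#in++", "#in--", "#out++", "#out--",
    "#lck0", "#ulck0", "#lck1", "#ulck1",
]


def parse_line(line: str) -> dict[str, float]:
    """Map one data line to column names; '#' columns are counts (int)."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {
        name: (int(value) if name.startswith("#") else float(value))
        for name, value in zip(COLUMNS, fields)
    }
```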

   Note: currently, the following options are supported:

      -interval