NAME
     sched_conf - Sun Grid Engine default scheduler configuration
     file

DESCRIPTION
     sched_conf defines the configuration  file  format  for  Sun
     Grid  Engine's  scheduler. In order to modify the configura-
     tion, use the graphical user's interface qmon(1)  or  the  -
     msconf  option of the qconf(1) command. A default configura-
     tion is provided together with the Sun Grid Engine distribu-
     tion package.

     Note, Sun Grid Engine allows  backslashes  (\)  be  used  to
     escape  newline (\newline) characters. The backslash and the
     newline are replaced with a space (" ") character before any
     interpretation.

FORMAT
     The following parameters are  recognized  by  the  Sun  Grid
     Engine scheduler if present in sched_conf:

  algorithm
     Note: Deprecated, may be removed in future release.
     Allows for the selection  of  alternative  scheduling  algo-
     rithms.

     Currently default is the only allowed setting.

  load_formula
     A simple  algebraic  expression  used  to  derive  a  single
     weighted  load value from all or part of the load parameters
     reported by sge_execd(8) for each host and from all or  part
     of  the  consumable  resources  (see complex(5)) being main-
     tained for each host.  The load formula expression syntax is
     that of a summation weighted load values, that is:

          {w1|load_val1[*w1]}[{+|-}{w2|load_val2[*w2]}[{+|-}...]]

     Note, no blanks are allowed in the load formula.
     The load values and consumable  resources  (load_val1,  ...)
     are  specified  by the name defined in the complex (see com-
     plex(5)).
     Note: Administrator defined load values (see the load_sensor
     parameter   in   sge_conf(5)  for  details)  and  consumable
     resources available for all hosts (see  complex(5))  may  be
     used as well as Sun Grid Engine default load parameters.
     The weighting factors (w1, ...) are positive integers. After
     the  expression  is  evaluated for each host the results are
     assigned to the  hosts  and  are  used  to  sort  the  hosts
     corresponding  to the weighted load. The sorted host list is
     used to sort queues subsequently.
     The default load formula is "np_load_avg".

  job_load_adjustments
     The load, which is imposed by the Sun Grid Engine jobs  run-
     ning on a system varies in time, and often, e.g. for the CPU
     load, requires some amount of time to  be  reported  in  the
     appropriate  quantity by the operating system. Consequently,
     if a job was started very recently, the  reported  load  may
     not provide a sufficient representation of the load which is
     already imposed on that host by the job. The  reported  load
     will  adapt  to  the  real load over time, but the period of
     time, in which the reported load is  too  low,  may  already
     lead  to  an  oversubscription of that host. Sun Grid Engine
     allows the  administrator  to  specify  job_load_adjustments
     which  are  used in the Sun Grid Engine scheduler to compen-
     sate for this problem.
     The job_load_adjustments are specified as a comma  separated
     list  of  arbitrary  load parameters or consumable resources
     and (separated by an equal sign) an associated load  correc-
     tion  value.  Whenever  a job is dispatched to a host by the
     scheduler, the load parameter and consumable  value  set  of
     that  host  is  increased  by  the  values  provided  in the
     job_load_adjustments  list.  These  correction  values   are
     decayed      linearly      over     time     until     after
     load_adjustment_decay_time from the  start  the  corrections
     reach  the  value  0.   If  the job_load_adjustments list is
     assigned the special denominator NONE, no  load  corrections
     are performed.
     The adjusted load and consumable values are used to  compute
     the  combined  and  weighted  load  of  the  hosts  with the
     load_formula (see above) and to compare the load and consum-
     able  values against the load threshold lists defined in the
     queue   configurations   (see   queue_conf(5)).    If    the
     load_formula consists simply of the default CPU load average
     parameter np_load_avg, and if  the  jobs  are  very  compute
     intensive,  one  might  want to set the job_load_adjustments
     list to np_load_avg=1.00, which means  that  every  new  job
     dispatched  to  a host will require 100 % CPU time, and thus
     the machine's load is instantly increased by 1.00.

  load_adjustment_decay_time
     The load  corrections  in  the  "job_load_adjustments"  list
     above  are  decayed linearly over time from the point of the
     job start, where the corresponding load or consumable param-
     eter  is  raised by the full correction value, until after a
     time  period  of  "load_adjustment_decay_time",  where   the
     correction      becomes     0.     Proper     values     for
     "load_adjustment_decay_time" greatly depend upon the load or
     consumable   parameters  used  and  the  specific  operating
     system(s). Therefore, they can only  be  determined  on-site
     and experimentally.  For the default np_load_avg load param-
     eter a "load_adjustment_decay_time" of 7 minutes has  proven
     to yield reasonable results.

  maxujobs
     The maximum number of jobs any user may have  running  in  a
     Sun  Grid  Engine  cluster  at  the  same  time. If set to 0
     (default) the users may run an arbitrary number of jobs.

  schedule_interval
     At the time the scheduler thread initially registers at  the
     event     master     thread     in     sge_qmaster(8)process
     schedule_interval is used to set the time interval in  which
     the  event  master  thread sends scheduling event updates to
     the scheduler thread.  A scheduling event is a status change
     that has occurred within sge_qmaster(8) which may trigger or
     affect scheduler decisions (e.g. a job has finished and thus
     the allocated resources are available again).
     In the Sun Grid Engine default scheduler the  arrival  of  a
     scheduling  event  report  triggers  a  scheduler  run.  The
     scheduler waits for event reports otherwise.
     Schedule_interval is a time value (see queue_conf(5)  for  a
     definition of the syntax of time values).

  queue_sort_method
     This parameter determines in which  order  several  criteria
     are  taken  into  account  to  product  a sorted queue list.
     Currently, two settings are valid:  seqno and load.  However
     in  both  cases,  Sun  Grid  Engine attempts to maximize the
     number of soft requests (see qsub(1) -s option)  being  ful-
     filled  by  the  queues for a particular as the primary cri-
     terion.
     Then, if the queue_sort_method parameter is  set  to  seqno,
     Sun  Grid Engine will use the seq_no parameter as configured
     in the current queue configurations (see  queue_conf(5))  as
     the  next criterion to sort the queue list. The load_formula
     (see above) has only a meaning  if  two  queues  have  equal
     sequence  numbers.   If queue_sort_method is set to load the
     load according the load_formula is the criterion after  max-
     imizing  a  job's  soft  requests and the sequence number is
     only used if two hosts have the  same  load.   The  sequence
     number  sorting is most useful if you want to define a fixed
     order in which queues are to be filled (e.g.   the  cheapest
     resource first).

     The default for this parameter is load.

  halftime
     When executing under a share  based  policy,  the  scheduler
     "ages"  (i.e. decreases) usage to implement a sliding window
     for achieving the share entitlements as defined by the share
     tree.  The halftime defines the time interval in which accu-
     mulated usage will have been decayed to  half  its  original
     value.  Valid  values are specified in hours or according to
     the time format as specified in queue_conf(5).
     If the value is set to 0, the usage is not decayed.

  usage_weight_list
     Sun  Grid  Engine  accounts  for  the  consumption  of   the
     resources  CPU-time,  memory  and  IO to determine the usage
     which is imposed on a system by a job. A single usage  value
     is computed from these three input parameters by multiplying
     the individual values by weights and  adding  them  up.  The
     weights  are defined in the usage_weight_list. The format of
     the list is

          cpu=wcpu,mem=wmem,io=wio

     where wcpu, wmem and wio are the configurable  weights.  The
     weights  are real number. The sum of all tree weights should
     be 1.

  compensation_factor
     Determines how fast Sun Grid Engine  should  compensate  for
     past  usage  below of above the share entitlement defined in
     the share tree. Recommended values are  between  2  and  10,
     where 10 means faster compensation.

  weight_user
     The relative importance of the user shares in the functional
     policy.  Values are of type real.

  weight_project
     The relative importance of the project shares in  the  func-
     tional policy.  Values are of type real.

  weight_department
     The relative importance of  the  department  shares  in  the
     functional policy. Values are of type real.

  weight_job
     The relative importance of the job shares in the  functional
     policy. Values are of type real.

  weight_tickets_functional
     The maximum number of functional tickets available for  dis-
     tribution by Sun Grid Engine. Determines the relative impor-
     tance of the functional policy.  See  under  sge_priority(5)
     for an overview on job priorities.

  weight_tickets_share
     The maximum number of share based tickets available for dis-
     tribution by Sun Grid Engine. Determines the relative impor-
     tance of the share tree policy.  See  under  sge_priority(5)
     for an overview on job priorities.

  weight_deadline
     The weight applied on the remaining time until a jobs latest
     start  time.  Determines  the  relative  importance  of  the
     deadline. See under sge_priority(5) for an overview  on  job
     priorities.

  weight_waiting_time
     The weight applied on the jobs waiting  time  since  submis-
     sion.  Determines  the  relative  importance  of the waiting
     time.  See under sge_priority(5)  for  an  overview  on  job
     priorities.

  weight_urgency
     The weight applied on jobs normalized urgency when determin-
     ing  priority  finally used.  Determines the relative impor-
     tance of urgency.  See under sge_priority(5) for an overview
     on job priorities.

  weight_priority
     The weight applied on jobs normalized  POSIX  priority  when
     determining  priority  finally used. Determines the relative
     importance of POSIX priority.  See under sge_priority(5) for
     an overview on job priorities.

  weight_ticket
     The weight applied on normalized ticket amount  when  deter-
     mining  priority  finally  used.   Determines  the  relative
     importance of the ticket policies. See under sge_priority(5)
     for an overview on job priorities.

  flush_finish_sec
     The parameters are provided for tuning the system's schedul-
     ing  behavior.   By default, a scheduler run is triggered in
     the scheduler interval. When this parameter is set to  1  or
     larger,  the  scheduler  will be triggered x seconds after a
     job has finished. Setting this parameter to 0  disables  the
     flush after a job has finished.

  flush_submit_sec
     The parameters are provided for tuning the system's schedul-
     ing  behavior.   By default, a scheduler run is triggered in
     the scheduler interval.  When this parameter is set to 1  or
     larger,  the  scheduler will be triggered  x seconds after a
     job was submitted to the system. Setting this parameter to 0
     disables the flush after a job was submitted.

  schedd_job_info
     The default scheduler can keep track why jobs could  not  be
     scheduled  during  the  last  scheduler  run. This parameter
     enables or disables the observation.  The value true enables
     the monitoring false turns it off.

     It is also possible to activate  the  observation  only  for
     certain  jobs.  This will be done if the parameter is set to
     job_list followed by a comma separated list of job ids.
     The user can obtain the collected information with the  com-
     mand qstat -j.

  params
     This is foreseen for passing additional  parameters  to  the
     Sun  Grid  Engine scheduler. The following values are recog-
     nized:

     DURATION_OFFSET
          If set, overrides the  default  of  value  60  seconds.
          This parameter is used by the Sun Grid Engine scheduler
          when planning resource utilization as the delta between
          net  job runtimes and total time until resources become
          available again. Net job runtime as specified  with  -l
          h_rt=...   or  -l  s_rt=...  or default_duration always
          differs from total job runtime due to delays before and
          after  actual  job  start  and finish. Among the delays
          before job start  is  the  time  until  the  end  of  a
          schedule_interval,  the  time it takes to deliver a job
          to sge_execd(8) and the  delays  caused  by  prolog  in
          queue_conf(5)   ,   start_proc_args  in  sge_pe(5)  and
          starter_method      in      queue_conf(5)      (notify,
          terminate_method   or  checkpointing),  procedures  run
          after actual job  finish,  such  as  stop_proc_args  in
          sge_pe(5)  or  epilog  in queue_conf(5) , and the delay
          until a new schedule_interval.
          If the offset is too low,  resource  reservations  (see
          max_reservation)  can  be  delayed repeatedly due to an
          overly optimistic job circulation time.

     JC_FILTER
          Note: Deprecated, may be removed in future release.
          If set to true, the scheduler limits the number of jobs
          it  looks  at during a scheduling run. At the beginning
          of the scheduling run it assigns each  job  a  specific
          category,  which is based on the job's requests, prior-
          ity settings, and the job owner. All  scheduling  poli-
          cies will assign the same importance to each job in one
          category. Therefore the number  of  jobs  per  category
          have  a  FIFO order and can be limited to the number of
          free slots in the system.

          A exception are jobs, which request a resource reserva-
          tion.  They  are  included  regardless of the number of
          jobs in a category.

          This setting is turned off per default, because in very
          rare cases, the scheduler can make a wrong decision. It
          is also advised to turn report_pjob_tickets off. Other-
          wise qstat -ext can report outdated ticket amounts. The
          information shown with a qstat -j for a job,  that  was
          excluded in a scheduling run, is very limited.

     PROFILE
          If set equal to 1, the scheduler logs profiling  infor-
          mation summarizing each scheduling run.

     MONITOR
          If set equal to 1, the  scheduler  records  information
          for  each  scheduling  run  allowing  to  reproduce job
          resources      utilization      in       the       file
          <sge_root>/<cell>/common/schedule.

     PE_RANGE_ALG
          This parameter sets the algorithm for the pe range com-
          putation.  The  default  is automatic, which means that
          the scheduler will select the best one, and  it  should
          not be necessary to change it to a different setting in
          normal operation. If a custom setting  is  needed,  the
          following values are available:
          auto       : the scheduler selects the best algorithm
          least      : starts  the  resource  matching  with  the
          lowest slot amount first
          bin        : starts the resource matching in the middle
          of the pe slot range
          highest    : starts  the  resource  matching  with  the
          highest slot amount first

     Changing params will take immediate effect.  The default for
     params is none.

  reprioritize_interval
     Interval (HH:MM:SS) to reprioritize jobs  on  the  execution
     hosts  based  on  the  current ticket amount for the running
     jobs. If the interval is set to 00:00:00  the  reprioritiza-
     tion  is  turned  off.  The  default value is 00:00:00.  The
     reprioritization tickets are calculated by the scheduler and
     update  events  for  running  jobs  are  only sent after the
     scheduler calculated new  values.  How  often  the  schedule
     should   calculate   the   tickets   is   defined   by   the
     reprioritize_interval.  Because the scheduler is only  trig-
     gered in a specific interval (scheduler_interval) this means
     the reprioritize_interval has only a meaning if set  greater
     than   the   scheduler_interval.    For   example,   if  the
     scheduler_interval is 2 minutes and reprioritize_interval is
     set  to  10  seconds, this means the jobs get re-prioritized
     every 2 minutes.

  report_pjob_tickets
     This parameter allows to tune the  system's  scheduling  run
     time.  It is used to enable / disable the reporting of pend-
     ing job tickets to the qmaster.  It does not  influence  the
     tickets  calculation.  The  sort  order of jobs in qstat and
     qmon is only based on the submit time, when the reporting is
     turned off.
     The reporting should be turned off in a system with  a  very
     large amount of jobs by setting this parameter to "false".

  halflife_decay_list
     The halflife_decay_list allows to configure different  decay
     rates  for  the "finished_jobs usage types, which is used in
     the pending job ticket calculation to account for jobs which
     have just ended. This allows the user the pending jobs algo-
     rithm to count finished jobs against a user or project for a
     configurable decayed time period. This feature is turned off
     by default, and the halftime is used instead.
     The halflife_decay_list also allows one  to  configure  dif-
     ferent  decay  rates for each usage type being tracked (cpu,
     io, and mem). The list is specified in the following format:

          <USAGE_TYPE>=<TIME>[:<USAGE_TYPE>=<TIME>[:<USAGE_TYPE>=<TIME>]]

     <Usage_TYPE> can be one of the following: cpu, io, or mem.
     <TIME> can be -1, 0 or a timespan specified in  minutes.  If
     <TIME>  is  -1,  only the usage of currently running jobs is
     used. 0 means that the usage is not decayed.

  policy_hierarchy
     This parameter sets up a dependency chain  of  ticket  based
     policies.  Each  ticket based policy in the dependency chain
     is influenced by the previous policies  and  influences  the
     following  policies.  A  typical  scenario is to assign pre-
     cedence for the override policy over the share-based policy.
     The  override  policy  determines  in such a case how share-
     based tickets are assigned among jobs of the  same  user  or
     project.   Note  that  all policies contribute to the ticket
     amount assigned to a particular job regardless of the policy
     hierarchy  definition. Yet the tickets calculated in each of
     the    policies    can    be    different    depending    on
     "POLICY_HIERARCHY".

     The "POLICY_HIERARCHY" parameter can be a  up  to  3  letter
     combination of the first letters of the 3 ticket based poli-
     cies S(hare-based), F(unctional) and O(verride). So a  value
     "OFS"  means  that the override policy takes precedence over
     the functional policy, which finally influences  the  share-
     based  policy.   Less  than  3 letters mean that some of the
     policies do not influence other policies and  also  are  not
     influenced  by other policies. So a value of "FS" means that
     the functional policy influences the share-based policy  and
     that there is no interference with the other policies.

     The special value "NONE" switches off policy hierarchies.

  share_override_tickets
     If set to "true" or "1", override tickets  of  any  override
     object  instance  are  shared equally among all running jobs
     associated with the object. The pending  jobs  will  get  as
     many  override  tickets,  as they would have, when they were
     running. If set to "false" or "0", each job  gets  the  full
     value  of  the  override tickets associated with the object.
     The default value is "true".

  share_functional_shares
     If set to "true" or "1", functional shares of any functional
     object  instance  are  shared  among all the jobs associated
     with the object. If set to "false" or "0", each job  associ-
     ated  with  a  functional  object,  gets the full functional
     shares of that object. The default value is "true".

  max_functional_jobs_to_schedule
     The maximum number of pending jobs to schedule in the  func-
     tional policy. The default value is 200.

  max_pending_tasks_per_job
     The maximum number of subtasks  per  pending  array  job  to
     schedule.  This parameter exists in order to reduce schedul-
     ing overhead. The default value is 50.

  max_reservation
     The  maximum  number  of  reservations  scheduled  within  a
     schedule  interval.  When  a runnable job can not be started
     due  to  a  shortage  of  resources  a  reservation  can  be
     scheduled   instead.  A  reservation  can  cover  consumable
     resources with the global host, any execution host  and  any
     queue.  For  parallel  jobs  reservations  are done also for
     slots resource as specified in sge_pe(5).   As  job  runtime
     the  maximum  of  the  time specified with -l h_rt=... or -l
     s_rt=... is assumed. For jobs that have neither of them  the
     default_duration  is  assumed.  Reservations prevent jobs of
     lower priority as specified in sge_priority(5) from  utiliz-
     ing  the reserved resource quota during the time of reserva-
     tion.  Jobs of lower priority are allowed to  utilize  those
     reserved  resources  only  if  their  prospective job end is
     before the start of the reservation (backfilling).  Reserva-
     tion  is  done  only  for  non-immediate jobs (-now no) that
     request reservation (-R y). If max_reservation is set to "0"
     no job reservation is done.

     Note, that reservation scheduling can be performance consum-
     ing  and  hence  reservation  scheduling  is switched off by
     default. Since reservation scheduling  performance  consump-
     tion  is  known to grow with the number of pending jobs, the
     use of -R y option is recommended only for those jobs  actu-
     ally  queuing  for  bottleneck  resources. Together with the
     max_reservation parameter this technique can be used to nar-
     row down performance impacts.


  default_duration
     When job  reservation  is  enabled  through  max_reservation
     sched_conf(5)  parameter  the default duration is assumed as
     runtime for jobs  that  have  neither  -l  h_rt=...  nor  -l
     s_rt=...  specified.  In  contrast to a h_rt/s_rt time limit
     the default_duration is not enforced.

FILES
     <sge_root>/<cell>/common/sched_configuration
                scheduler thread configuration

SEE ALSO
     sge_intro(1), qalter(1), qconf(1), qstat(1),  qsub(1),  com-
     plex(5),  queue_conf(5),  sge_execd(8),  sge_qmaster(8), Sun
     Grid Engine Installation and Administration

COPYRIGHT
     See sge_intro(1) for a full statement of rights and  permis-
     sions.
Man(1) output converted with man2html